How to scrape data from the Geiger world map (public information about radiation levels)

Hello,

This post is of public interest, not just about my own needs.

I recently purchased a cheap GMC Geiger counter in order to publish radiation levels on the GMC world map:
https://gmcmap.com

You can scroll the world map and find a Geiger counter nearby.
Each Geiger counter has a unique ID.

The data is available here:
https://gmcmap.com/historyData.asp?Param_ID=32489957699

How can I scrape this data and create two sensors, one for CPM and one for uSv/h?
The timestamp should be the date shown on the web page (the date of publication), not the date the data was fetched.

I tried to parse the <td> elements without success, and I am lost.
It is also possible to download the data in CSV format.

Could someone skilled create a parser? I have been using HA for only 2 days …
It would be very interesting to have an official integration where we would only enter the Geiger ID …
But let’s do it with scraping first.

Kind regards,
Ffries


Do you just want the latest data? I would have thought a select of td and an index of 1 (CPM) and 3 (uSv/h) would work.

Whoever wrote the code that generates the HTML needs a slap. Each row is in its own <tbody>!

Thank you.

I would prefer to have all data but latest data is also acceptable.

I don’t understand how to do it.

Let’s start with the latest. If those are historical readings, HA will gradually build up the history itself as it refreshes.

  • Settings
  • Devices & Services (see “Integrations” in the sub-text?)
  • Click Integrations at the top if not already selected
  • Click ADD INTEGRATION button
  • Type and select Scrape
  • Paste your URL in the Resource box, click NEXT
  • Type something in the Name box (“Geiger CPM” perhaps)
  • Put td in the Select box
  • Put 1 in the Index box (we want the second <td>, but we count from zero)
  • Set device class and state class to “No … class”
  • Type CPM in the Unit of Measurement box.
  • SUBMIT

Then repeat for the uSv/h one with a different name, 3 in the index (the fourth <td>) and uSv/h in the Unit box.
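To make the Select/Index pair less mysterious, here is a minimal sketch of what "select td, index 1" and "select td, index 3" pick out. It uses only the Python standard library rather than BeautifulSoup (which Scrape uses internally), and the sample HTML is made up: only the positions of CPM (index 1) and uSv/h (index 3) come from this thread, while the other columns and all values are placeholders.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> element, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []       # text of each <td> seen so far
        self._in_td = False   # True while the parser is inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


# Hypothetical sample (invented values), with each row in its own <tbody>.
SAMPLE_HTML = """
<table>
  <tbody><tr><td>2023-06-27 12:50:00</td><td>18</td><td>17.5</td><td>0.12</td></tr></tbody>
  <tbody><tr><td>2023-06-27 12:45:00</td><td>20</td><td>17.9</td><td>0.13</td></tr></tbody>
</table>
"""

collector = CellCollector()
collector.feed(SAMPLE_HTML)
cells = [c.strip() for c in collector.cells]

cpm = cells[1]  # Select "td", Index 1: the second <td> on the page
usv = cells[3]  # Select "td", Index 3: the fourth <td> on the page
print(cpm, usv)
```

The index counts every matching <td> on the whole page from zero, which is why the first row (the latest reading) is the one that ends up in the sensor.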


This works great, thank you!!!

Now I would like to scrape data from a local web server running on the Raspberry Pi. The page source is below.

I would like to fetch the first value (14.74):

</td><td id="cpm1stA">17.97</td><td id="cpm1stB">14.74</td><td id="cpm1stC">12.47</td></tr><tr>
        <td class="left">[µSv/h]

I tried various selectors, such as td or td id="cpm1stA", without success.

Any ideas?

Kind regards,


GeigerLog Monitor Server

Monitor Plot Data Info

Quick Log Start Log Stop Log

Avg:     1 min   3 min   10 min
CPM:     ---     ---     ---
[µSv/h]  ---     ---     ---
CPS:     ---     ---     ---
    </td></tr><tr>
        <td class="left">CPM1st:
        </td><td id="cpm1stA">17.97</td><td id="cpm1stB">14.74</td><td id="cpm1stC">12.47</td></tr><tr>
        <td class="left">[µSv/h]
        </td><td id="usv1stA">0.12</td><td id="usv1stB">0.10</td><td id="usv1stC">0.08</td></tr><tr>
        <td class="left">CPS1st:
        </td><td id="cps1stA">0.31</td><td id="cps1stB">0.26</td><td id="cps1stC">0.21</td></tr><tr>
        <td colspan="4" style="padding:2px; background: white;">

    </td></tr><tr>
        <td class="left">CPM2nd:
        </td><td id="cpm2ndA">---</td><td id="cpm2ndB">---</td><td id="cpm2ndC">---</td></tr><tr>
        <td class="left">[µSv/h]
        </td><td id="usv2ndA">---</td><td id="usv2ndB">---</td><td id="usv2ndC">---</td></tr><tr>
        <td class="left">CPS2nd:
        </td><td id="cps2ndA">---</td><td id="cps2ndB">---</td><td id="cps2ndC">---</td></tr><tr>
        <td colspan="4" style="padding:2px; background: white;">

    </td></tr><tr>
        <td class="left">CPM3rd:
        </td><td id="cpm3rdA">---</td><td id="cpm3rdB">---</td><td id="cpm3rdC">---</td></tr><tr>
        <td class="left">[µSv/h]
        </td><td id="usv3rdA">---</td><td id="usv3rdB">---</td><td id="usv3rdC">---</td></tr><tr>
        <td class="left">CPS3rd:
        </td><td id="cps3rdA">---</td><td id="cps3rdB">---</td><td id="cps3rdC">---</td></tr><tr>
        <td colspan="4" style="padding:2px; background: white;">

    </td></tr><tr><td class="left">T:[°C] </td><td id="tempA">---</td><td id="tempB">---</td><td id="tempC">---</td></tr><tr><td class="left">P:[hPa]</td><td id="pressA">---</td><td id="pressB">---</td><td id="pressC">---</td></tr><tr><td class="left">H:[%]  </td><td id="humidA">---</td><td id="humidB">---</td><td id="humidC">---</td></tr><tr><td class="left">X:[%]  </td><td id="xtraA">---</td><td id="xtraB">---</td><td id="xtraC">---</td></tr></tbody></table>

You can’t just guess select statements. In this case, you want td#cpm1stB.

The Scrape docs for select point you to the BeautifulSoup docs, as that’s what it uses under the hood. You want this section:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors-through-the-css-property

For this select, we need the id selector: a # followed by the element's id, so td#cpm1stB matches the <td> whose id is cpm1stB.
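As a rough illustration of what td#cpm1stB matches, here is a standard-library sketch that finds the <td> whose id is cpm1stB. It is not what Scrape runs internally; with BeautifulSoup the same lookup would be soup.select("td#cpm1stB"). The IdTextFinder class is a hypothetical helper written for this example, and the row is adapted from the page source quoted above.

```python
from html.parser import HTMLParser


class IdTextFinder(HTMLParser):
    """Capture the text of the first element with a given tag name and id."""

    def __init__(self, tag, element_id):
        super().__init__()
        self.tag = tag
        self.element_id = element_id
        self.text = None         # stays None if nothing matches
        self._capturing = False  # True while inside the matching element

    def handle_starttag(self, tag, attrs):
        is_match = tag == self.tag and dict(attrs).get("id") == self.element_id
        if self.text is None and is_match:
            self._capturing = True
            self.text = ""

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.text += data


# Row adapted from the GeigerLog page source posted earlier in the thread.
ROW = (
    '<tr><td class="left">CPM1st:</td>'
    '<td id="cpm1stA">17.97</td><td id="cpm1stB">14.74</td>'
    '<td id="cpm1stC">12.47</td></tr>'
)

finder = IdTextFinder("td", "cpm1stB")  # same target as the selector td#cpm1stB
finder.feed(ROW)
print(finder.text)
```

Because ids are unique on a page, no index is needed here, unlike the plain td select used for the GMC map.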

However, as it’s a local web server, do you have the option to publish the data in a better format, such as JSON or XML? Scraping HTML is a horrible last resort for when there’s no other way to do it.

Thank you for this example!
We have connected the GGreg20_V3 modules available on the GMCmap service to our Home Assistant.

Everything works perfectly!
Best Regards,
Oleksii
Team @iot-devices