ffries
(French Fries)
June 27, 2023, 10:59am
1
Hello,
This post relates to public interest, not just my needs.
Recently I purchased a cheap GMC Geiger counter in order to publish radiation levels on the GMC world map:
https://gmcmap.com
You can scroll the world map and find a Geiger counter nearby.
Each Geiger counter has a unique ID.
The data is available here:
https://gmcmap.com/historyData.asp?Param_ID=32489957699
How can I scrape this data and create two sensors, CPM and uSv/h?
The time should be the date downloaded on the web page (date of publication), not the fetching date.
I tried to parse the td elements without success, and I am lost.
There is also a possibility to download data in CSV format.
Could someone skilled create a parser? I have been using HA for only 2 days …
It would be very interesting to have an official integration where we would only input the Geiger ID …
But let’s do it with scraping first.
Kind regards,
Ffries
Troon
(Troon)
June 27, 2023, 11:06am
2
Do you just want the latest data? I would have thought a select of td and an index of 1 (CPM) and 3 (uSv/h) would work.
Whoever wrote the code that generates the HTML needs a slap. Each row is in its own <tbody>!
ffries
(French Fries)
June 27, 2023, 11:08am
3
Thank you.
I would prefer to have all data but latest data is also acceptable.
I don’t understand how to do it.
Troon
(Troon)
June 27, 2023, 11:54am
4
Let’s start with the latest. If those are historical readings, HA will gradually build up the history itself as it refreshes.
Settings
Devices & Services (see “Integrations” in the sub-text?)
Click Integrations at the top if not already selected
Click ADD INTEGRATION button
Type and select Scrape
Paste your URL in the Resource box, click NEXT
Type something in the Name box (“Geiger CPM” perhaps)
Put td in the Select box
Put 1 in the Index box (we want the second <td>, but we count from zero)
Set device class and state class to “No … class”
Type CPM in the Unit of Measurement box.
SUBMIT
Then repeat for the uSv/h one with a different name, 3 in the Index box (the fourth <td>) and uSv/h in the Unit of Measurement box.
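To make the Select and Index pair more concrete, here is a rough sketch of what a select of td with an index of 1 and 3 picks out. It uses only Python's standard library rather than BeautifulSoup (which Scrape relies on), and the sample rows are made up: the values and the column order (date, CPM, ACPM, uSv/h) are assumptions chosen to agree with index 1 being CPM and index 3 being uSv/h, not a copy of the real gmcmap.com page.

```python
from html.parser import HTMLParser

# Hypothetical sample: each data row sits in its own <tbody>.
# Values and column order are assumptions for illustration only.
SAMPLE_HTML = """
<table>
  <tbody><tr><td>2023-06-27 10:55:00</td><td>18</td><td>17.5</td><td>0.12</td></tr></tbody>
  <tbody><tr><td>2023-06-27 10:50:00</td><td>21</td><td>17.6</td><td>0.14</td></tr></tbody>
</table>
"""


class TdCollector(HTMLParser):
    """Collect the text of every <td> in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Only keep text that appears inside a <td>.
        if self._in_td:
            self.cells[-1] += data


def select_td(html, index):
    """Return the stripped text of the index-th <td>, counting from zero."""
    parser = TdCollector()
    parser.feed(html)
    return parser.cells[index].strip()


cpm = select_td(SAMPLE_HTML, 1)  # second <td> on the page
usv = select_td(SAMPLE_HTML, 3)  # fourth <td> on the page
print(cpm, usv)
```

Because the index counts every td on the page from the top, this only returns the latest reading if the newest row comes first, as it does in the sample.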
ffries
(French Fries)
June 27, 2023, 1:12pm
5
This works great, thank you!!!
ffries
(French Fries)
June 28, 2023, 9:54am
6
Now I would like to scrape data from a local webserver running on the Raspberry Pi. Below is the page code.
I would like to fetch this value (14.74):
</td><td id="cpm1stA">17.97</td><td id="cpm1stB">14.74</td><td id="cpm1stC">12.47</td></tr><tr>
<td class="left">[µSv/h]
I tried various values such as td or td id="cpm1stA" without success.
Any ideas?
Kind regards,
div {max-width:380px; margin:auto;}
td,th {padding:4px 0px 4px 1px; font-size:20px; text-align:center;}
th {font-weight:900;}
p {margin-bottom:0px; font-weight:normal;}
Data
Monitor
Plot
Data
Info
Quick Log
Start Log
Stop Log
Avg:
1 min 3 min 10 min
CPM:
--- --- ---
[µSv/h]
--- --- ---
CPS:
--- --- ---
</td></tr><tr>
<td class="left">CPM1st:
</td><td id="cpm1stA">17.97</td><td id="cpm1stB">14.74</td><td id="cpm1stC">12.47</td></tr><tr>
<td class="left">[µSv/h]
</td><td id="usv1stA">0.12</td><td id="usv1stB">0.10</td><td id="usv1stC">0.08</td></tr><tr>
<td class="left">CPS1st:
</td><td id="cps1stA">0.31</td><td id="cps1stB">0.26</td><td id="cps1stC">0.21</td></tr><tr>
<td colspan="4" style="padding:2px; background: white;">
</td></tr><tr>
<td class="left">CPM2nd:
</td><td id="cpm2ndA">---</td><td id="cpm2ndB">---</td><td id="cpm2ndC">---</td></tr><tr>
<td class="left">[µSv/h]
</td><td id="usv2ndA">---</td><td id="usv2ndB">---</td><td id="usv2ndC">---</td></tr><tr>
<td class="left">CPS2nd:
</td><td id="cps2ndA">---</td><td id="cps2ndB">---</td><td id="cps2ndC">---</td></tr><tr>
<td colspan="4" style="padding:2px; background: white;">
</td></tr><tr>
<td class="left">CPM3rd:
</td><td id="cpm3rdA">---</td><td id="cpm3rdB">---</td><td id="cpm3rdC">---</td></tr><tr>
<td class="left">[µSv/h]
</td><td id="usv3rdA">---</td><td id="usv3rdB">---</td><td id="usv3rdC">---</td></tr><tr>
<td class="left">CPS3rd:
</td><td id="cps3rdA">---</td><td id="cps3rdB">---</td><td id="cps3rdC">---</td></tr><tr>
<td colspan="4" style="padding:2px; background: white;">
</td></tr><tr><td class="left">T:[°C] </td><td id="tempA">---</td><td id="tempB">---</td><td id="tempC">---</td></tr><tr><td class="left">P:[hPa]</td><td id="pressA">---</td><td id="pressB">---</td><td id="pressC">---</td></tr><tr><td class="left">H:[%] </td><td id="humidA">---</td><td id="humidB">---</td><td id="humidC">---</td></tr><tr><td class="left">X:[%] </td><td id="xtraA">---</td><td id="xtraB">---</td><td id="xtraC">---</td></tr></tbody></table>
Troon
(Troon)
June 28, 2023, 3:09pm
7
You can’t just guess select statements. In this case, you want td#cpm1stB.
The Scrape docs for select point you to the BeautifulSoup docs, as that’s what it uses under the hood. You want this section:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors-through-the-css-property
For this select, we need the id selector: td#cpm1stB matches the <td> whose id is cpm1stB.
However, as it’s a local webserver, do you have an option to publish the data through a better format, like JSON or XML? Scraping HTML is a horrible last resort option for when there’s no other way to do it.
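As a quick way to check what td#cpm1stB should return, here is a small sketch that mimics the id selector with Python's standard library only. It is not how Scrape does it (that goes through BeautifulSoup's CSS selectors); the TdById class and td_by_id helper are hypothetical names, and the fragment is adapted from the page code posted above.

```python
from html.parser import HTMLParser

# Fragment adapted from the page code posted earlier in the thread.
PAGE_FRAGMENT = (
    '<tr><td class="left">CPM1st:</td>'
    '<td id="cpm1stA">17.97</td><td id="cpm1stB">14.74</td>'
    '<td id="cpm1stC">12.47</td></tr>'
)


class TdById(HTMLParser):
    """Capture the text of the first <td> whose id matches wanted_id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.text = None
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        is_match = tag == "td" and dict(attrs).get("id") == self.wanted_id
        if is_match and self.text is None:
            self._capturing = True
            self.text = ""

    def handle_endtag(self, tag):
        if tag == "td":
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.text += data


def td_by_id(html, wanted_id):
    """Rough stand-in for the selector td#<wanted_id>; None if no match."""
    parser = TdById(wanted_id)
    parser.feed(html)
    return None if parser.text is None else parser.text.strip()


print(td_by_id(PAGE_FRAGMENT, "cpm1stB"))  # prints 14.74
```

Since an id should be unique on the page, no Index value other than 0 should be needed with this selector.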
Thank you for this example!
We have connected the GGreg20_V3 modules available on the GMCmap service to our Home Assistant.
Everything works perfectly!
Best Regards,
Oleksii
Team @iot-devices