Debugging scrape integration

I’m trying to scrape rain data off the Tempest weather station map using the Scrape integration configured via the UI. I was successful in scraping local pollen data, so am somewhat familiar with how Scrape works and CSS selectors. However, I can’t get this to work.

If I go to Tempest, select “rain yesterday”, inspect the value, and copy the CSS selector, I get “div.weather-tile:nth-child(3) > div:nth-child(2) > div:nth-child(3) > div:nth-child(2) > p:nth-child(2)”. When I enter that as the select, the sensor shows “Unavailable”.

I tried reading the BeautifulSoup docs and attempted to select by attribute with

p[data-param="param-precip_accumm_local_yesterday_final_display_with_units"]

but that also gave an Unavailable error.

Anyone have any idea what I’m doing wrong? Is there some way to enable extended logging to get more visibility into what’s happening?

Thanks!

You won’t be able to use the scrape integration for this data, because it is not present in the original HTML that is first downloaded when you open the page. Instead, it is filled in by a websocket connection that gets the data and then continues to receive updates (as new weather observations are available from the station, I presume).

To see this yourself, you can do a “View Page Source” on the page. What you see here is all that is available to the Scrape integration.
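The same check can be sketched in a few lines of Python. This is only an illustration of the idea, not what the Scrape integration runs internally: it uses the standard library's html.parser instead of BeautifulSoup, and both HTML snippets (including the 0.12 in reading) are made up for the example rather than copied from the Tempest page.

```python
from html.parser import HTMLParser


class DataParamText(HTMLParser):
    """Collect the text inside elements whose data-param attribute matches."""

    def __init__(self, param):
        super().__init__()
        self.param = param
        self.depth = 0    # > 0 while we are inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Note: this simple depth counter assumes no unclosed void tags
        # (such as <br>) appear inside the matching element.
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("data-param") == self.param:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract(html, param):
    """Return the matched element's text, or None if it is missing or empty."""
    parser = DataParamText(param)
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks).strip()
    return text or None


PARAM = "param-precip_accumm_local_yesterday_final_display_with_units"

# Hypothetical markup: what "View Page Source" might show (placeholder only).
STATIC_HTML = '<div class="weather-tile"><p data-param="%s"></p></div>' % PARAM

# Hypothetical markup: the same element after the browser has filled it in.
RENDERED_HTML = '<div class="weather-tile"><p data-param="%s">0.12 in</p></div>' % PARAM
```

With these samples, extract(STATIC_HTML, PARAM) finds no text, which is the situation a scraper is in when it only sees the initial download, while extract(RENDERED_HTML, PARAM) returns the value a browser would display.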

There are some other options for more advanced scraping, but I see that there is a Tempest integration for Home Assistant. Any reason that isn’t working for you?

Ah, that explains it, thanks! I’m used to scraping with changedetection.io which uses a browser engine to create the web page prior to scraping. I have my changedetection instance sending me notifications via HA, so perhaps I can leverage that to scrape the rain data once a day and pull it out of the notification… Or perhaps I can find another source of local historical rain data and scrape that.

As for the Tempest integration, I don’t own a Tempest weather station, so I don’t have an API key. Tempest was the best source of local rain data I could find, but I guess it’s back to the drawing board.

Thanks again for the help!

If this page is your only option and you’re not afraid of a little bit of JavaScript, the Browserless Chromium add-on might work for you. The add-on uses Puppeteer at heart to control a real web browser. This guide uses it to output the fully rendered HTML (in your case, after the websocket connection has filled in the values) so you can then use “traditional” web scraping integrations to get the data you want into Home Assistant.
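As a rough sketch of the "get the rendered HTML first" idea: Browserless documents a /content REST endpoint that loads a page in the browser and returns the resulting HTML. The snippet below uses that endpoint rather than the Puppeteer script from the guide, so treat it as an alternative under assumptions. Whether your add-on version exposes /content, which host and port it listens on, and whether it needs an access token all depend on your setup; the function names and the URLs in the usage note are hypothetical.

```python
import json
import urllib.request


def build_content_request(browserless_url, page_url):
    """Build a POST request asking Browserless for a page's rendered HTML.

    Assumes the Browserless instance exposes the /content endpoint and
    accepts a JSON body with a "url" field.
    """
    endpoint = browserless_url.rstrip("/") + "/content"
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_rendered_html(browserless_url, page_url, timeout=60):
    """Send the request and return the response body as text."""
    request = build_content_request(browserless_url, page_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_rendered_html("http://homeassistant.local:3000", "https://example.com/station-map") would return the HTML as the browser sees it, with both URLs standing in for your own. Because the values arrive over a websocket after the page loads, the browser may need to wait before the HTML is captured, so check the Browserless documentation for the waiting options your version supports.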


Thanks for the tip! I was going to check out Multiscrape as well, but I didn’t realize there was a Browserless Chromium add-on. With so many options, I’m sure one will work out.

Thanks!