That JavaScript builds a single-page app that uses websockets for its communication. When you use DevTools to inspect an element, you're looking at what the JavaScript has built. Scrape only sees the initial HTML (what View Source shows), as pasted above.
Working out a good select takes some understanding of HTML elements and attributes, BeautifulSoup selectors, how HA treats them, and how website connections work. The Kerkstraat one won't work because if you access it from a terminal (requesting the site via curl, for example) it returns an error saying you need JavaScript and cookies enabled.
Try it: from a terminal, enter curl https://huispedia.nl/amsterdam/1017gl/kerkstraat/8 and note that what comes back isn’t the source you were expecting.
For the Yahoo! one, I looked at the source (pretty-printed via DevTools, as the original is a mess):
and saw that we want the first fin-streamer element under the div with id="Lead-5-QuoteHeaderProxy". We could also use the next div down as our “anchor”, with id="quote-header-info".
Note that ids should be unique within the document, whereas the same class can appear on many elements. That’s important for understanding whether your select is going to return one item or many.
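To make the id-versus-class point concrete, here’s a small BeautifulSoup sketch. The HTML is a simplified stand-in I made up to mimic the structure described above, not the real Yahoo! page:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the quote-header markup (not the real page).
html = """
<div id="quote-header-info">
  <fin-streamer data-field="regularMarketPrice">191.45</fin-streamer>
  <fin-streamer data-field="regularMarketChange">+1.82</fin-streamer>
</div>
<fin-streamer data-field="regularMarketPrice">elsewhere on the page</fin-streamer>
"""
soup = BeautifulSoup(html, "html.parser")

# An id is unique, so anchoring on it narrows the search to one subtree.
# select_one() returns the first match in document order.
first = soup.select_one("div#quote-header-info fin-streamer")
print(first.get_text())  # the price inside the div, not the stray element

# select() returns a list, which shows how many elements a tag-based
# match can pick up; worth checking before trusting "the first one".
all_matches = soup.select("div#quote-header-info fin-streamer")
print(len(all_matches))
```

The same CSS selector string works in Scrape’s “select” field, since Scrape uses BeautifulSoup underneath.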
Despite being a strong advocate of YAML, I use the UI for Scrape configuration as it allows for changes without restarting. Also check your logs if the entities aren’t returning what you expect.
It’s important to realise that Scrape should be treated as a last-resort way of accessing information. It is messy, unreliable and dependent on the site owners not changing their site structure. Always look for another source of the data, ideally something machine-readable like XML or JSON.
I dug deep into it again.
Is it always looking at the first element (fin-streamer in this case)? What if I also want to scrape the second value (regularMarketChangePercent)? Can I use data-field instead of fin-streamer, since fin-streamer appears more than once?
Note that I’m still anchoring off the <div> with id="quote-header-info" as there are lots of other <fin-streamer>s including many with data-field="regularMarketChange" in the document.
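An attribute selector lets you pick the field by name rather than by position, while the id keeps the match inside the header block. A sketch, again using invented markup that mirrors the structure described:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the page structure discussed above.
html = """
<div id="quote-header-info">
  <fin-streamer data-field="regularMarketPrice">191.45</fin-streamer>
  <fin-streamer data-field="regularMarketChangePercent">(+0.96%)</fin-streamer>
</div>
<fin-streamer data-field="regularMarketChangePercent">(elsewhere)</fin-streamer>
"""
soup = BeautifulSoup(html, "html.parser")

# [data-field="..."] matches on the attribute value; combined with the
# id anchor, only the element inside the header div is returned.
pct = soup.select_one(
    'div#quote-header-info fin-streamer[data-field="regularMarketChangePercent"]'
)
print(pct.get_text())
```

That selector string should drop straight into the Scrape sensor’s “select” field unchanged.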
It’s really easy to accidentally select the wrong thing, which is why I’d recommend using the UI rather than YAML for scrape — a YAML change to a scrape sensor currently requires a full restart.
The Quick Reload is identical to the “ALL YAML CONFIGURATION” action, which is the same as clicking all the individual items. Scrape isn’t one of them — multiscrape may be, but that’s not what’s being discussed here.
hm I thought I was almost there but it doesn’t recognise the % as a numeric value. That’s true because it adds an ( and ) before and after the numeric value I’m trying to scrape…
Can I somehow remove the ( ) ?
thats my last question haha
ValueError: Sensor sensor.test has device class None, state class None unit % and suggested precision None thus indicating it has a numeric value; however, it has the non-numeric value: (+0.96%) (<class 'str'>)
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/components/sensor/__init__.py", line 581, in state
numerical_value = float(value) # type:ignore[arg-type]
ValueError: could not convert string to float: '(+0.96%)'
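In the Scrape config you’d do this with a value_template that strips the wrapping characters before the state is stored. A quick Python sketch of the cleaning that template needs to perform on the scraped string:

```python
# The scraped state is "(+0.96%)"; the sensor needs a plain number.
raw = "(+0.96%)"

# str.strip() with a set of characters removes any of them from both
# ends of the string, leaving only the signed number in the middle.
cleaned = raw.strip("()%")
print(cleaned)         # "+0.96"
print(float(cleaned))  # now parses as a float without a ValueError
```

In Jinja the equivalent would chain replace filters (or a regex) over `value` inside the template; the goal is the same, to hand Home Assistant something `float()` can parse.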