In the past I used scrape quite heavily to get data from the web into HA.
Here is an example from my configuration.yaml where I retrieve an exchange rate string from an open webpage:
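The original snippet did not survive, so here is a hedged sketch of what a legacy platform-style scrape sensor typically looked like. The resource URL is a placeholder; the CSS selector and sensor name are taken from later in this thread.

```yaml
# Sketch of the OLD platform-style config (pre-December update).
# URL is a placeholder; selector and name come from this thread.
sensor:
  - platform: scrape
    resource: https://www.example.com/chf-eur
    name: CHF EUR String
    select: "#keyelement_kurs_update > div.realtime-indicator > span"
    scan_interval: 3600
```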
When I installed the December update I found out that this is no longer the way to do it.
So I tried to muddle my way through the new scrape integration, and frankly have no idea what to do.
Is there a tutorial, or is there anybody who could lead me through this new UI so I can generate the same sensor (sensor.chf_eur_string) I am using now?
Comment out the scrape sensor settings in configuration.yaml.
Restart Home Assistant.
Add the Scrape integration.
A form will open. Enter the resource URL.
Click Next.
Add a sensor by filling in the Name and Select fields.
Submit.
Thank you, pedjas, for your help - I did most of that, but when it comes to:
.
.
Click Next
Add a sensor by filling in the Name and Select fields
.
.
That is where there are a number of cryptic fields that need to be filled in.
I can't seem to correlate these with the information I have: what goes where?
In the Select part you need to fill in the CSS selector WITHOUT quotes.
So use #keyelement_kurs_update > div.realtime-indicator > span
and not
“#keyelement_kurs_update > div.realtime-indicator > span”
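To see what that selector actually matches, here is a small sketch using BeautifulSoup (the library the Scrape integration uses under the hood). The HTML fragment is hypothetical, since the real page's markup is not shown in this thread:

```python
from bs4 import BeautifulSoup  # same parsing library the Scrape integration relies on

# Hypothetical fragment of the exchange-rate page (the real HTML is not shown in the thread).
html = """
<div id="keyelement_kurs_update">
  <div class="realtime-indicator">
    <span>1.0234</span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# This is the exact string that goes into the "Select" field -- no surrounding quotes.
element = soup.select_one("#keyelement_kurs_update > div.realtime-indicator > span")
print(element.get_text(strip=True))  # prints "1.0234"
```

The quotes you see in YAML examples are YAML string syntax, not part of the selector itself, which is why pasting them into the GUI field breaks the match.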
So I re-read the documentation and realized that the GUI is not the exclusive new way to scrape - thank god for that, because in its current form it is not really usable in my opinion, especially if you have a number of different resources (how is that even done in the GUI?).
So I went back to the configuration.yaml approach, and it would appear that the new format to achieve what I wanted is now something like this:
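The snippet itself is missing, so here is a hedged reconstruction of the modern top-level format, assuming a placeholder URL and the selector and name used elsewhere in this thread:

```yaml
# Sketch of the NEW integration-style config: scrape is its own top-level key,
# with one or more sensors nested under each resource.
# URL is a placeholder; selector and name come from this thread.
scrape:
  - resource: https://www.example.com/chf-eur
    scan_interval: 3600
    sensor:
      - name: CHF EUR String
        select: "#keyelement_kurs_update > div.realtime-indicator > span"
```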
I added my scrape sensors using the user interface but cannot find them in YAML to edit. I also prefer editing such stuff in a text editor; it is much simpler.
I don’t think that the sensors you create with the GUI will show up in the configuration.yaml.
You have to edit the sensors via the GUI: go to the integrations page, find the Scrape integration, click on “Configure”, and then you can select and edit the sensor from that sub-menu.
Super complicated if you ask me, and I never figured out how to assign a certain resource to a specific sensor. The GUI needs some rework; at the moment it is not really user friendly.
I ended up deleting all sensors I tried to create with the GUI.
Then I went to the configuration.yaml and edited my existing scrape sensors as shown above.
Seems to work - for now I do not get a notice under “Repairs” that I have to do anything else.
You can use either YAML or the GUI to set up the Scrape sensor, but the two ways don’t “sync” with each other. It’s whatever you personally prefer.
What is being deprecated however is configuring scrape as a platform in your configuration.yaml.
In the past, scrape was a platform under sensor in your configuration.yaml.
It looked like this in the past:
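The snippet is missing here, so as a hedged illustration (made-up URL and selector), the old style looked roughly like this:

```yaml
# OLD, now-deprecated style: scrape configured as a platform under sensor.
# URL and selector are made-up placeholders for illustration.
sensor:
  - platform: scrape
    resource: https://www.example.com/page
    name: Example Value
    select: ".value"
```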
The above (platform) way is being deprecated. If you prefer configuration.yaml over the GUI, then you have to use scrape as an integration, not as a platform anymore. Example:
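The example itself is missing, so here is a hedged sketch of the integration style with made-up placeholders:

```yaml
# NEW style: scrape is a top-level integration key,
# with sensors nested under each resource.
# URL and selector are made-up placeholders for illustration.
scrape:
  - resource: https://www.example.com/page
    sensor:
      - name: Example Value
        select: ".value"
```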
Do you see the difference? The same happened to MQTT, REST and template in the past. Platform got deprecated and it became an integration.
I hope I explained it well enough.
OK - while the GUI takes a bit of getting used to, I am starting to like the functionality.
It seems as if the sensors you set up in the GUI auto-update every 10 minutes.
If you need more/less frequent updates, you can disable the auto-update in the system options for each sensor. Then you have to add an automation to update them at your chosen time interval (or any other trigger).
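A minimal sketch of such an automation, assuming the sensor name from earlier in the thread and using the built-in `homeassistant.update_entity` service:

```yaml
# Poll the scrape sensor once an hour instead of the default 10 minutes.
# Assumes the sensor's built-in polling has been disabled in its system options.
automation:
  - alias: "Update CHF/EUR scrape hourly"
    trigger:
      - platform: time_pattern
        minutes: 0          # fires at the top of every hour
    action:
      - service: homeassistant.update_entity
        target:
          entity_id: sensor.chf_eur_string
```

Any other trigger (a state change, a time of day, a button press) works the same way; only the `trigger:` section changes.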
Another irritating thing was the sensor name that appears by default in the little integration window.
Inspired by sammyke007 (who obviously is not a Tesla user and had nice names for the sensors) - I figured out how to change the names - by clicking on the 3 dots bottom right after selecting the sensor in the integration.
So my apologies to the devs (should they read my negative comments above) - once you understand the GUI a bit more, it actually works quite well.
I really hope, this method survives.
I found the UI way just too complicated, and I see a potential problem with it when it comes to sharing and caring.
Without YAML it is impossible to just copy this sensor and paste it on a forum or in a messaging app so someone else can use it too. It would take a long explanation of where to put what in the UI. Not a way of easing things, in my mind…
Yes, I realized that, but too late. I already created all over again using GUI.
The docs should be clearer on that matter. At first, I actually thought the docs had not been updated, since they still explained entering the config in configuration.yaml.
And, of course, having two totally separate ways to create the configuration, where the configuration is not even shared between them, is confusing by definition.
It still does not work as it should: there is the option scan_interval in the docs, but it has no effect whatsoever, scan is always done every 10 minutes…
But I will say that for a scrape feature where the user consumes bandwidth and server CPU time from a 3rd party a configurable scan interval is essential.
Either you make it configurable like it was in yaml.
Or you remove the scan interval completely and document that people should always use an automation to fetch the data.
Except for HTML resources that belong to you, a 10-minute scan interval is really bad behaviour.
I run a website, and I have banned many IP addresses because some punk suddenly starts auto-scraping some detail every 10 minutes. You only need a few handfuls of those and your network link is overloaded. Please friends, do not scrape at 10-minute intervals anywhere unless it is your own internal resource. And if it is your own resource, then you should have enough control to use something far better than HTML scraping.
A 10-minute default scan interval is luring inexperienced HA users into being bad internet citizens.
No, I’d like to have much less frequent updates than 10 minutes, that is why I usually set up 3600 seconds in yaml, but the sensors are still updated every 10 minutes.
Seems to be a bug, so I’ve created a ticket about it.