Help with scrape sensor

@jocnnor

Thanks. Very useful to know (it’s easy for me to implement).
However, I tried both of your suggestions and I get state unknown.
Can you find out why this is happening?

##Stathmos Xalandri
  - platform: scrape
    resource: "http://chalandri.meteoclub.gr/"
    select: ".pricetab > .price > h2"
    index: 1
    name: xalandri
    value_template: "{{ value[:-5] }}"
    #value_template: '{{ value | regex_findall_index(find="\d+") }}'

Can we rephrase it like this?
Inside double quotes, a backslash (\) can be used to instruct the YAML parser to treat the next character as part of the string (it’s called escaping). For example, that way another double quote (") will not be considered the end of the string, as in "this string has \" in it".
Single-quoted strings do not do anything like that, so 'this string has \' in it' won’t work, with or without a backslash.

In the case of regular expressions, their syntax requires backslashes, but not for escaping: that’s just the way we write them, so we want them to stay as they are (\d, for example).
Considering the above, one way to preserve these backslashes is to use single quotes around regular expressions. (Alternatively, you can still use double quotes but you’ll need to escape each backslash with a backslash).
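
A minimal sketch of the two quoting styles, reusing the value_template from the first post. The regex_findall_index filter and its find argument come from that post; the double-quoted variant is my own rewording of it and is left commented out, so treat it as an illustration rather than tested config:

```yaml
# Single quotes: YAML leaves the backslash alone, so the template receives \d+ as written.
value_template: '{{ value | regex_findall_index(find="\d+") }}'

# Double quotes: YAML treats the backslash as an escape character, so it has to be doubled
# for one backslash to survive (inner quotes switched to single quotes to avoid a clash).
# value_template: "{{ value | regex_findall_index(find='\\d+') }}"
```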

tl;dr @jocnnor, I know you know that; I just wanted to give @Makis a more comprehensive explanation, as “escape characters (i.e., the backslashes) are being consumed by the YAML parser” can sound a bit abstract for some :wink:

My mistake.
I had kept the wrong select: value (from the previous station).
I replaced it and it is working now.
Sorry.

Perfect. I only kind of understood it, but not well enough to be able to explain it. Thanks!

yeah, it’s kind of here
I spent some time finding out why we do it the way we do, since here we potentially have at least YAML, Jinja and Python involved.

Eh? Regex doesn’t use a backslash to escape the literal meaning of a character? Of course it does. Certain characters are reserved for special use and, to escape their special meaning, you prepend them with a backslash.

From here:

If you want to use any of these characters as a literal in a regex, you need to escape them with a backslash

A single period means match any character. Prepend it with a backslash and now it means match the literal period character.
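
As a rough illustration in the context of this thread’s scrape sensor (the pattern is my own assumption, not taken from the earlier posts), a template that pulls a decimal number out of the scraped text needs the period escaped so that it matches only a literal dot:

```yaml
# \d+  one or more digits
# \.   a literal period (an unescaped . would match any character)
# Single quotes keep the YAML parser from touching the backslashes.
value_template: '{{ value | regex_findall_index(find="\d+\.\d+") }}'
```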

May I ask if these kinds of sensors could make HA respond more slowly, or consume a lot of resources?

Right now, in order to test and find out how accurate these stations are, I am scraping 5-6 different sites from areas that are close or very close to my home.
HA is installed on a NUC, so I guess it is not a problem, but I would like to know the effect on overall performance.

Technically, ALL sensors make HA respond slower. As the database grows and the number of things it has to manage increases, more CPU is used.

Scrape sensors are polling sensors. So, every 30 seconds, it’s going to ask for a new update. And sadly, it seems like you can’t change this value easily. They should be async tasks, so they shouldn’t affect the core thread responsible for everything else. I’m not too familiar with the core architecture, just going by this

So, yes, these could make other things slower…but hopefully the core thread won’t be affected. Doesn’t mean you won’t ever notice it though…especially if an automation needs to run an async task which might have to wait for all of your other sensors to update before getting a chance to run.

It’s my understanding that the polling interval, for this polling-based sensor, can be adjusted using the scan_interval option. The last example in the documentation shows it used to set the interval to 3600 seconds.

Beyond knowing of the scrape sensor’s existence, I have no practical experience with it. Do you know if scan_interval works for it or is there a bug (or is the documentation incorrect)?

Oh, fair enough. I missed that completely and just referenced the first old forum post I happened to come across. Looks like it was added early 2019ish?

But yeah, if it’s in the docs, it should work just fine. It’s been in there for 2 years, so it would have been fixed by now if it didn’t work I would assume…

Right now I have the following sensors.
I am going to watch their results and general performance (for example, some may go offline). I am looking for a way to take an average of the best 3 for my automation. I don’t know if you could suggest something more efficient.
Can you give me an example of an average sensor for the best 3 stations?

[Screenshot: Annotation 2020-04-09 222111]

I think it’s an old thing.
Without digging deep, I just checked the code: the sensor is based on the Entity class, and because of that it does accept the scan_interval variable from YAML.

So it’s still possible to add scan_interval: 3600 to the sensor’s config and it’ll update once per hour.
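
A sketch of what that could look like, assuming the scrape sensor from the first post. The select value is a placeholder of mine, since the working selector was never posted in this thread:

```yaml
sensor:
  - platform: scrape
    resource: "http://chalandri.meteoclub.gr/"
    select: ".your-selector"   # placeholder: use the selector that matches your page
    name: xalandri
    value_template: "{{ value[:-5] }}"
    scan_interval: 3600        # seconds between polls (here, once per hour)
```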

All polling happens inside the EntityPlatform class.

Not sure if it’s what you’re looking for, but take a look at the statistics or min/max sensor.
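
For the average, a minimal sketch using the min/max sensor with type: mean. The sensor name and the last two entity IDs are hypothetical; substitute your own scrape sensors. As far as I know it averages every entity listed and expects their states to be plain numbers, so taking the “best 3” would mean listing only those three:

```yaml
sensor:
  - platform: min_max
    name: stations_average      # hypothetical name
    type: mean                  # other types include min and max
    entity_ids:
      - sensor.xalandri         # the sensor from the first post
      - sensor.station_2        # hypothetical
      - sensor.station_3        # hypothetical
```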

Interesting. Can this work the other way too? Can it force an update, e.g. every 5 seconds, or is there a lower limit of 30 seconds?

You’ll have to change it to find out!

there’s no limit. even 1 will work.
btw, check my previous post re. average

I will try min/max. It looks close to what I need. Thanks.

As mentioned, you can reduce scan_interval to 5 seconds (or less) but take a moment to consider the implications:

  • You are hammering away at someone’s web-site every 5 seconds. Some sites may have an automatic rule in place that throttles or blocks requests from an overly demanding IP address.
  • Your Home Assistant server is engaged in a very repetitive task that returns data that may not have changed significantly in the last 5 seconds (or even over the last 5 minutes).
  • Outdoor temperature, humidity, pressure, wind speed, etc. don’t change much over a 5-second period, even if the site is updating them at that frequency (many weather-data sites don’t).

Question: Is there no weather-data service (such as OpenWeatherMap) that can provide the information you seek so that you don’t have to resort to scraping a web-site? Or is there something you need that’s very specific to what is offered by http://chalandri.meteoclub.gr/?

two more things to consider:

  • database size (it will impact HA’s responsiveness)
  • number of disk writes (if using SD card, will kill it much faster)

and yes, 5 sec for wind speed is a bit overkill.

My question was mostly for educational reasons. I am not going to do such a thing.
By the way, does anybody know what the default scan interval is?

Unfortunately, the only one I know is the Windguru page I asked about in the beginning, but it is not easy to scrape.
However, I have a friend whose hobby is fishing, and he told me that it’s by far the most accurate site, especially for wind prediction and reporting.

it’s 30s by default