yeah, it’s kind of here
I spent some time finding out why we do it the way we do, since here we potentially have at least YAML, Jinja and Python involved.
Eh? Regex doesn’t use a backslash to escape the literal meaning of a character? Of course it does. Certain characters are reserved for special use and, to escape their special meaning, you prepend them with a backslash.
From here:
If you want to use any of these characters as a literal in a regex, you need to escape them with a backslash
A single period means match any character. Prepend it with a backslash and now it means match the literal period character.
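To make the point about the period concrete, here is a minimal sketch using Python's standard `re` module. The sample string is made up for illustration.

```python
import re

# Made-up sample text for illustration only.
text = "v1.2 and v1x2"

# Unescaped period: matches any single character, so both tokens match.
any_char = re.findall(r"v1.2", text)

# Escaped period: matches only a literal "." character.
literal_dot = re.findall(r"v1\.2", text)

# re.escape() adds the backslash for you when the pattern comes from plain text.
escaped = re.escape("1.2")

print(any_char)
print(literal_dot)
print(escaped)
```

Note that when such a pattern is embedded in YAML or a Jinja template, each layer may have its own escaping rules on top of the regex one.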
May I ask whether these kinds of sensors could make HA respond more slowly, or consume more resources?
Right now, in order to test how accurate these stations are, I am scraping 5-6 different sites from areas that are close or very close to my home.
HA is installed on a NUC, so I guess it is not a problem, but I would like to know the effect on overall performance.
Technically, ALL sensors make HA respond slower. As the database grows and the number of things it has to manage increases, more CPU is used.
Scrape sensors are polling sensors. So, every 30 seconds, it’s going to ask for a new update. And sadly, it seems like you can’t change this value easily. They should be async tasks, so they shouldn’t affect the core thread responsible for everything else. I’m not too familiar with the core architecture, just going by this
So, yes, these could make other things slower… but hopefully the core thread won’t be affected. That doesn’t mean you won’t ever notice it though… especially if an automation needs to run an async task which might have to wait for all of your other sensors to update before getting a chance to run.
It’s my understanding that the polling interval, for this polling-based sensor, can be adjusted using the scan_interval option. The last example in the documentation shows it used to set the interval to 3600 seconds.
Beyond knowing of the scrape sensor’s existence, I have no practical experience with it. Do you know if scan_interval works for it or is there a bug (or is the documentation incorrect)?
Oh, fair enough. I missed that completely and just referenced the first old forum post I happened to come across. Looks like it was added early 2019ish?
But yeah, if it’s in the docs, it should work just fine. It’s been in there for 2 years, so I would assume it would have been fixed by now if it didn’t work…
Right now I have the following sensors.
I am going to watch their results and general performance (for example, some may go offline). I am looking for a way to take an average of the best 3 for my automation. I don’t know if you could suggest something more efficient.
Can you give me an example of an average sensor for the best 3 stations?
I think it’s an old thing.
And without digging deep I just checked the code: the sensor is based on the Entity class, and because of that it does accept the scan_interval variable from YAML.
So it’s still possible to add scan_interval: 3600 to the sensor’s config and it’ll update once per hour.
All polling happens inside EntityPlatform class.
Not sure if it’s what you’re looking for, but take a look at the statistics or min/max sensor.
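For reference, the "average of the best 3" idea could be sketched in plain Python like this. It is only an illustration of the arithmetic, not how the statistics or min/max sensors work internally. The assumption here is that "best" means the three readings closest to the median of all stations that are currently reporting; the station names and values are made up.

```python
from statistics import mean, median


def average_of_best(readings, keep=3):
    """Average the `keep` readings closest to the median; None if no data."""
    # Stations that are offline are represented as None and skipped.
    valid = [r for r in readings.values() if r is not None]
    if not valid:
        return None
    mid = median(valid)
    # Sort by distance from the median and keep the closest ones.
    closest = sorted(valid, key=lambda r: abs(r - mid))[:keep]
    return mean(closest)


# Hypothetical readings: one outlier and one offline station.
readings = {
    "station_a": 20.0,
    "station_b": 21.0,
    "station_c": 22.0,
    "station_d": 35.0,
    "station_e": None,
}

print(average_of_best(readings))
```

With these made-up values the outlier and the offline station are ignored and the result is the mean of the remaining three.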
Interesting. Can this work the other way too? Can it force an update, e.g. every 5 sec? Or is there a limit of 30 sec?
You’ll have to change it to find out!
there’s no limit. even 1 will work.
btw, check my previous post re. average
I will try min/max. Looks close to what I need. Thanks.
As mentioned, you can reduce scan_interval to 5 seconds (or less) but take a moment to consider the implications:
- You are hammering away at someone’s web-site every 5 seconds. Some sites may have an automatic rule in place that throttles or blocks requests from an overly demanding IP address.
- Your Home Assistant server is engaged in a very repetitive task that returns data that may not have changed significantly in the last 5 seconds (or even over the last 5 minutes).
- Outdoor temperature, humidity, pressure, wind-speed, etc. don’t change much over a 5-second period even if the site is updating them at that frequency (many weather-data sites don’t).
Question: Is there no weather-data service (such as OpenWeatherMap) that can provide the information you seek so that you don’t have to resort to scraping a web-site? Or is there something you need that’s very specific to what is offered by http://chalandri.meteoclub.gr/?
two more things to consider:
- database size (it will impact HA’s responsiveness)
- number of disk writes (if using SD card, will kill it much faster)
and yes, 5 sec for wind speed is a bit overkill.
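To put rough numbers on the request and write load discussed above, here is a back-of-the-envelope sketch. The count of 6 sensors comes from the "5-6 different sites" mentioned earlier; the function name is made up, and the figures are an upper bound on database writes, assuming every poll produces a new value.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polls_per_day(scan_interval_s, sensors=1):
    """Number of web requests per day for `sensors` polling sensors."""
    return sensors * SECONDS_PER_DAY // scan_interval_s


# Compare a few intervals mentioned in this thread (in seconds).
for interval in (5, 30, 180, 3600):
    print(f"{interval:>5} s -> {polls_per_day(interval, sensors=6)} polls/day")
```

Going from 5 seconds to 3 minutes cuts the load by a factor of 36, which matters both for the remote site and for the local database.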
My question was mostly for educational reasons. I am not going to actually do this.
By the way, does anybody know what the default scan interval is?
Unfortunately, the only one I know is the windguru page I asked about in the beginning, but it is not easy to scrape.
However, I have a friend whose hobby is fishing, and he told me that it’s by far the most accurate site, especially for wind prediction and reporting.
Thanks
Probably I will change it to 2-3 minutes.
I find it interesting that Scrape Sensor inherits from the EntityComponent helper class which has scan_interval set to 15 seconds but overrides it to 30 seconds.
I wonder how many other sensors (inheriting from the same class) also override scan_interval (or don’t)?
I found a station near me which seems accurate.
I can get the result with a Python script, but when I configure the sensor the result is unknown.
Can someone spot the problem?
site: https://www.wunderground.com/dashboard/pws/IVRILISS2
#!/usr/bin/python3
from bs4 import BeautifulSoup
import requests

# Change these 2 things
URL = "https://www.wunderground.com/dashboard/pws/IVRILISS2"
# This is the select line you will use in the config
SELECT = "lib-display-unit"
# You may need to use a template after the fact...
INDEX = 2

r = requests.get(URL)
data = r.text
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(data, "html.parser")
#print(soup)
val = soup.select(SELECT)

# List every match with its index so the right one can be picked
print("********** Output of SELECT: **********")
for v in range(len(val)):
    print(" index[{}]: {}".format(v, val[v].text))
print("***************************************")

value = val[INDEX].text
print(value)
results
********** Output of SELECT: **********
index[0]: 68.7 F
index[1]: 68.7
index[2]: 1.8
index[3]: 3.4 mph
index[4]: 34.3 F
index[5]: 0.00 in/hr
index[6]: 30.10 in
index[7]: 28 %
index[8]: 0.00 in
index[9]: 4
index[10]: 34.3 F
index[11]: 28 %
index[12]: 1.8
index[13]:
index[14]: 3.4 mph
index[15]: 30.10 in
index[16]: 0.00 in/hr
index[17]: 0.00 in
index[18]: 4
index[19]: 68.4 F
index[20]: 52.3 F
index[21]: 56.7 F
index[22]: 38.1 F
index[23]: 27.1 F
index[24]: 32.7 F
index[25]: 52 %
index[26]: 28 %
index[27]: 40 %
index[28]: 0.00 in
index[29]: 4.0 mph
index[30]: 0.0 mph
index[31]: 1.0 mph
index[32]: 5.8 mph
index[33]: 2.0 mph
index[34]: 30.10 in
index[35]: 30.05 in
***************************************
1.8
>>>
sensor
##Stathmos
- platform: scrape
  resource: "https://www.wunderground.com/dashboard/pws/IVRILISS2"
  select: ".lib-display-unit"
  index: 2
  name: vrilissia wunderground
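One difference worth checking: the script selects lib-display-unit (a tag-name selector, which clearly found elements), while the sensor config selects .lib-display-unit (a class selector, because of the leading dot). Whether that is the whole cause of the unknown state is an assumption, but the two selectors do not mean the same thing. A minimal sketch of the difference, using only the standard library and made-up markup rather than the real page:

```python
from html.parser import HTMLParser


class SelectorCounter(HTMLParser):
    """Count elements matched by tag name vs. by class name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.by_tag = 0    # what the selector "name" would match
        self.by_class = 0  # what the selector ".name" would match

    def handle_starttag(self, tag, attrs):
        if tag == self.name:
            self.by_tag += 1
        classes = (dict(attrs).get("class") or "").split()
        if self.name in classes:
            self.by_class += 1


# Hypothetical markup: a custom element named lib-display-unit
# that does not carry a class of the same name.
html = '<div><lib-display-unit class="ng-star">1.8</lib-display-unit></div>'

counter = SelectorCounter("lib-display-unit")
counter.feed(html)
print("tag matches:", counter.by_tag)
print("class matches:", counter.by_class)
```

If the real page is structured like this, using the same selector string in the config as in the script (without the dot) would be the first thing to try.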