Web scraper Sensor question

Hi,

I am trying to scrape some data from the AccuWeather web page (their MinuteCast, a minute-by-minute forecast for the next two hours, works nicely in my opinion).
https://www.accuweather.com/en/cu/havana/122438/minute-weather-forecast/122438

Unfortunately, I cannot find a working selector. I tried:
title, .title, .minutecast-dial .title, .minutecast-dial.title, and all of these options with p at the end.
No luck. I always get this error:
ERROR (SyncWorker_18) [homeassistant.components.scrape.sensor] Unable to extract data from HTML
Any hint?

I managed to get this working from the Ginlong monitoring website after setting it to public, but it doesn’t seem to be “recording” the information, only displaying it on a card. Does anyone happen to know how I would get this to work, please? I’m using InfluxDB at the moment for thermostats, so I would like to add it there if possible.

Here is my sensor in configuration.yaml:

This returns: [image]

Sorted it. The system wouldn’t record it because it was seeing it as a text value rather than a number, as per below.
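
For anyone landing here with the same symptom, here is a minimal sketch of one common way to get a numeric state, assuming the scraped text is a number followed by a unit (for example "1.23 kW"). The resource URL, selector, and sensor name are placeholders, not the values from the post above:

```yaml
# Sketch only: resource, select and name are hypothetical placeholders.
sensor:
  - platform: scrape
    resource: https://example.invalid/plant-status
    name: Solar Power Now
    select: ".power"
    # Assumes the scraped text looks like "1.23 kW": keep the part before
    # the first space and convert it to a number (0 if conversion fails).
    value_template: '{{ value.split(" ")[0] | float(0) }}'
    unit_of_measurement: "kW"
```

With a unit_of_measurement set and a state that parses as a number, Home Assistant should graph the sensor as a line, and the InfluxDB integration should be able to store it as a numeric value.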

Hi Dietlman,
I’m trying to grab the temperature from the same HWg-STE box to my HA with no luck.
My code looks like this:
sensor:
  - platform: scrape
    resource: MY_HWG-STE_IP_XML_WEB_SITE
    name: Tamir_Boiler
    select: "#s215"

And it shows me the same A sign in the temperature result.
Can you please share your code that works?
Where should I add the value_template?

Thanks a lot,
Tamir

Hi Tamir,

this is how it works for me:

  - platform: scrape
    resource: http://my_sensor_http
    name: WW-Boiler
    select: '.value'
    value_template: '{{ value[:-4] | float }}'
    unit_of_measurement: "°C"
    scan_interval: 360

Hi,
I’m trying to use this component but I get the following error during HA initialization:
FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Any idea how to solve the problem?
I’m using Home Assistant 0.113.1

arch | armv7l
dev | false
docker | false
hassio | false
installation_type | Home Assistant Core
os_name | Linux
os_version | 4.19.50-v7+
python_version | 3.7.3
timezone | UTC
version | 0.113.1
virtualenv | true

Hi, I’m trying to build a scrape sensor for my weather station. I’ll be happy to share it once it is fully working.
I can scrape a row of a table which holds the temperature with “select”:

<input name="outTemp" disabled type="text" class="item_2" style="WIDTH: 80px" value="8.3" maxlength="5">

Now I need to set a value_template in the scrape config so that the result is only the content of value="8.3", i.e. 8.3 (which could also be -1.5 if it’s freezing…).
How can I do this? Thanks in advance!

I’m quoting myself and reporting back in case others need this. With two splits I get the needed value.

{{ value.split('value=')[1].split('"')[1] }}
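
A possible alternative to splitting the raw HTML, assuming your Home Assistant version has the scrape sensor's attribute option: select the input tag itself and read its value attribute. The resource URL and sensor name here are made up for illustration:

```yaml
# Sketch only: resource and name are hypothetical placeholders.
sensor:
  - platform: scrape
    resource: http://weather-station.local/livedata.htm
    name: Outdoor Temperature
    # Match the <input name="outTemp" ...> tag shown above.
    select: 'input[name="outTemp"]'
    # Return the tag's value attribute instead of its text content.
    attribute: value
    unit_of_measurement: "°C"
```

This avoids depending on the exact order of attributes in the tag, which the split-based template relies on.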

Can you help me with this other one?


  - platform: scrape
    resource: https://zzzzzz
    select: "div > div:nth-child(1) > div > div.PollenBreakdown--outlookContainer--3Jjts > ul > li:nth-child(1) > strong"
    name: zzzzzzz
    scan_interval: 310

This sensor returns “Alto”, which is OK, but I want to map the values to numbers so I can graph them. I tried this, but of course it is not working:


  - platform: scrape
    resource: https://zzzzz
    select: "div > div:nth-child(1) > div > div.PollenBreakdown--outlookContainer--3Jjts > ul > li:nth-child(1) > strong"
    value_template: >
          {% if is_state("value", "Ninguno") %}
            "0"
          {% elif is_state("value", "Muy bajo") %}
            "1"
          {% elif is_state("value", "Bajo") %}
            "2"
          {% elif is_state("value", "Moderado") %}
            "3"
          {% elif is_state("value", "Alto") %}
            "4"
          {% elif is_state("value", "Muy alto") %}
            "5"
          {% elif is_state("value", "unavailable") %}
            false
          {% else %}
            true
          {%- endif %}
    name: zzzzz
    scan_interval: 310

working:

    value_template: >
          {% if value.lower() == 'ninguno' %}
          0
          {% elif value.lower() == 'muy bajo' %}
          1
          {% elif value.lower() == 'bajo' %}
          2
          {% elif value.lower() == 'moderado' %}
          3
          {% elif value.lower() == 'alto' %}
          4
          {% elif value.lower() == 'muy alto' %}
          5
          {% else %}
          -1
          {% endif %}
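
The same mapping can be written more compactly with a dictionary lookup. This is only a sketch: the resource and name are placeholders, and the unit_of_measurement label is an assumption added so that the history graph treats the state as a number:

```yaml
# Sketch only: resource, name and unit label are hypothetical.
sensor:
  - platform: scrape
    resource: https://example.invalid/pollen
    name: Pollen Level
    select: "div > div:nth-child(1) > div > div.PollenBreakdown--outlookContainer--3Jjts > ul > li:nth-child(1) > strong"
    # Look the lower-cased, trimmed text up in a dict; unknown text maps to -1.
    value_template: >
      {% set levels = {'ninguno': 0, 'muy bajo': 1, 'bajo': 2, 'moderado': 3, 'alto': 4, 'muy alto': 5} %}
      {{ levels.get(value | lower | trim, -1) }}
    unit_of_measurement: "level"
    scan_interval: 310
```

Adding a new level then only means adding one entry to the dictionary rather than another elif branch.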

I posted this one a while ago. I have tried to use the select option, but in this case I have no clue which word to look for, as the word I am trying to use appears several times. Maybe somebody could help me get these values out for use in HA. I need the three figures marked in yellow in the screenshot.
The weblink for the site is http://77.119.243.51:86

I am afraid it will not be very easy, as there are no ids or classes to select on. You could try matching all elements of one type and picking one by index, like in the example of the scrape integration:

# Example configuration.yaml entry
sensor:
  - platform: scrape
    resource: http://77.119.243.51:86
    name: Temperature Cellar
    select: "td"
    index: 60 #experiment with this number to find the correct value
    unit_of_measurement: "C"

Once you find the correct TD element it should be easy to clone this code and get the next value.

Hi Alekxsy,

Thanks for your answer. I am not sure what you mean by cloning the code to get the next value. I am not an expert in this topic, so I feel it will be too complicated for me to get that to work.
So far I have added the example you gave me to my config, I have restarted HA but don’t see a sensor/entity called Temperature Cellar yet, so maybe I am doing something wrong.

In theory this should work, but I am getting an error in the logs… something to do with headers. Possibly related to the port being different…

sensor:
  - platform: scrape
    resource: "http://77.119.243.51:86"
    name: Temperature Cellar
    select: "td"
    index: 7 #experiment with this number to find the correct value
    value_template: '{{ ((value.split(" ")[0]) | replace (",", ".")) }}'
    unit_of_measurement: "°C"
   
  - platform: scrape
    resource: "http://77.119.243.51:86"
    name: Humidity Cellar
    select: "td"
    index: 8 #experiment with this number to find the correct value
    value_template: '{{ ((value.split(" ")[0]) | replace (",", ".")) }}'
    unit_of_measurement: "%"
    
  - platform: scrape
    resource: "http://77.119.243.51:86"
    name: Air Pressure Cellar
    select: "td"
    index: 9 #experiment with this number to find the correct value
    value_template: '{{ ((value.split(" ")[0]) | replace (",", ".")) }}'

Hi and thanks for your response!
I have added your config to my setup, but I get the following error after restarting HA:

Logger: homeassistant.components.rest.data
Source: components/rest/data.py:69
Integration: RESTful
First occurred: 20:24:39 (12 occurrences)
Last logged: 20:28:14

Error fetching data: http://77.119.243.51:86 failed with illegal chunk header: bytearray(b'F9 \r\n')
Error fetching data: http://77.119.243.51:86 failed with

maybe you know what that means… I have no clue to be honest

it looks like a bug.
I have submitted an issue here

You might try this

If someone could help that would be great. Thank you!!

Trying to get the current outdoor temp on this site. I am unable to get this to work.

Website is ambientweather.net/dashboard/kbck

Here is the select:

#root > div > div.page-container > div > div > div > div.device-device-realtime-dashboard > div > div > div.device-widget.square.temp > div.device-temp-widget.center-aligned > div > div.top > span > span.fdp-val

Not sure how to fix. Any ideas?

It looks like the data is loaded via JavaScript; see the page source:
<noscript>Sorry, Javascript must be enabled to use the Ambient Weather Dashboard.</noscript>
That means the data is not in the HTML when the page is first loaded; it is fetched afterwards by JavaScript, and the scrape sensor does not run JavaScript, so you can’t scrape it.
May I suggest either using one of the multiple weather integrations already available or creating your own, since Ambient Weather has an API.
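
If you go the API route, a rest sensor is usually the simplest starting point. The sketch below is not the real Ambient Weather endpoint or response format: the URL, key parameter, and JSON shape are all assumptions for illustration, so check the API documentation for the actual values:

```yaml
# Sketch only: endpoint, query parameter and JSON layout are hypothetical.
sensor:
  - platform: rest
    resource: https://example.invalid/api/devices?apiKey=YOUR_KEY
    name: Outdoor Temperature
    # Assumed response shape: [{"lastData": {"tempf": 71.2}}]
    value_template: "{{ value_json[0].lastData.tempf }}"
    unit_of_measurement: "°F"
    scan_interval: 300
```

Unlike scraping, this reads the JSON the service returns directly, so it does not matter that the dashboard page is rendered in the browser.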

Thank you. I didn’t notice that. Dang.

Hi all, I need some help with my scraping.
I’m trying to extract the values from my solar panel controller’s web page. It used to work, but I updated multiscrape and now it doesn’t support the index property anymore.
My web page looks like this:

<html>
 <head>
   <meta http-equiv=pragma content=no-cache>
   <meta http-equiv=expire content=now>
   <title></title>
</head>
<body bgcolor=ffffff text=black><br><br>
 <table align=center border=1 cellpadding=0 cellspacing=0 bordercolor=#008000 bordercolorlight=#ffffff borderdark=#808000 width=1024>
  <center>
     <tr bgcolor=#43CD80>
         <td align=center>Inverter ID</td>
         <td align=center>Current Power</td>
         <td align=center>Grid Frequency</td>
         <td align=center>Grid Voltage</td>
         <td align=center>Temperature</td>
         <td align=center>Date</td>
        </tr>
      </center>
      <center>
         <tr>
            <td align=center>404000066234-A</td>
            <td align=center> 62&nbsp;W</td>
            <td align=center> 50.0&nbsp;Hz</td>
            <td align=center> 233&nbsp;V</td>
            <td align=center> 23&nbsp;<sup>o</sup>C</td>
           <td align=center> 2022-06-07 11:38:22</td>
        </tr>
        .....
    </table><br><br>
    <hr></hr><center><tr><td>&copy2013 Altenergy Power System Inc.</td></tr></center>
 </body>
</html>

I’m trying to extract the values in rows after the header. I tried body > table > tr:nth-child(2) > td:nth-child(1) and also table > center > tr:nth-child(2) > td:nth-child(5) but I keep getting errors like:

2022-06-10 12:54:34 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Panel 1 Name # Start scraping to update sensor
2022-06-10 12:54:34 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Panel 1 Name # Tag selected: None
2022-06-10 12:54:34 ERROR (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Panel 1 Name # Unable to scrape data: Could not find a tag for given selector.
Consider using debug logging and log_response for further investigation.
2022-06-10 12:54:34 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Panel 1 Name # On-error, set value to None
2022-06-10 12:54:34 DEBUG (MainThread) [custom_components.multiscrape.entity] Scraper_noname_0 # Panel 1 Name # Updated sensor and attributes, now adding to HA

Any idea what I’m doing wrong?
Thanks a lot, appreciate any help.

Could you enable file logging and post the HTML page as logged?
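
In case it helps, here is a rough sketch of what that could look like, based on the log_response option named in the error message above. The resource address is a placeholder, and the selector is simply the first one from the post:

```yaml
# Sketch only: the resource address is a hypothetical placeholder.
logger:
  default: warning
  logs:
    custom_components.multiscrape: debug   # show multiscrape debug lines

multiscrape:
  - resource: http://192.0.2.10/index.cgi
    scan_interval: 300
    # Ask multiscrape to write the fetched page to a file for inspection.
    log_response: true
    sensor:
      - name: Panel 1 Name
        select: "body > table > tr:nth-child(2) > td:nth-child(1)"
```

The logged file should show the HTML as the parser sees it, which may differ from the browser's view of this page because of the center tags inside the table. See the multiscrape README for where the files are written.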