Web scraper Sensor question

Hi, I’m trying to build a scrape sensor for my weather station. I’ll be happy to share it once it is fully working.
With “select” I can scrape the row of a table that holds the temperature:

<input name="outTemp" disabled type="text" class="item_2" style="WIDTH: 80px" value="8.3" maxlength="5">

Now I need to set a value_template in the scrape config that returns only the number from value="8.3", i.e. the 8.3 (which could also be -1.5 if it’s freezing…).
How can I do this? Thanks in advance!

Replying to myself in case others need this: two splits give me the value I need.

{{ value.split('value=')[1].split('"')[1] }}
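
For anyone who wants to check the split logic outside HA, here is a minimal Python sketch. It assumes the scraped value is the raw tag text exactly as posted above; the helper names `value_by_split` and `value_by_regex` are made up for illustration, and the regex variant is only a possible alternative, not something from the original post.

```python
import re

# The raw tag from the post above, as a plain string (assumed input).
TAG = ('<input name="outTemp" disabled type="text" class="item_2" '
       'style="WIDTH: 80px" value="8.3" maxlength="5">')


def value_by_split(raw: str) -> str:
    """Same two-step split as the template: cut at 'value=', then take the quoted part."""
    return raw.split('value=')[1].split('"')[1]


def value_by_regex(raw: str) -> float:
    """Hypothetical alternative: a regex that also accepts a leading minus sign."""
    match = re.search(r'value="(-?\d+(?:\.\d+)?)"', raw)
    if match is None:
        raise ValueError("no numeric value attribute found")
    return float(match.group(1))


print(value_by_split(TAG))
print(value_by_regex(TAG.replace("8.3", "-1.5")))
```

The split version returns a string, so negative readings such as -1.5 pass through unchanged; the regex version additionally fails loudly if the attribute is missing.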

Can you help me with this other one?


  - platform: scrape
    resource: https://zzzzzz
    select: "div > div:nth-child(1) > div > div.PollenBreakdown--outlookContainer--3Jjts > ul > li:nth-child(1) > strong"

    name: zzzzzzz
    scan_interval: 310

This sensor returns the value “Alto”, which is fine, but I want to map the values to numbers so I can graph them. I tried this, but it does not work:


  - platform: scrape
    resource: https://zzzzz
    select: "div > div:nth-child(1) > div > div.PollenBreakdown--outlookContainer--3Jjts > ul > li:nth-child(1) > strong"
    value_template: >
          {% if is_state("value", "Ninguno") %}
            "0"
          {% elif is_state("value", "Muy bajo") %}
            "1"
          {% elif is_state("value", "Bajo") %}
            "2"
          {% elif is_state("value", "Moderado") %}
            "3"
          {% elif is_state("value", "Alto") %}
            "4"
          {% elif is_state("value", "Muy alto") %}
            "5"
          {% elif is_state("value", "unavailable") %}
            false
          {% else %}
            true
          {%- endif %}
    name: zzzzz
    scan_interval: 310

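This works. is_state() checks the state of an entity by its entity ID, whereas value here is the scraped text itself, so compare value directly: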

    value_template: >
          {% if value.lower() == 'ninguno' %}
          0
          {% elif value.lower() == 'muy bajo' %}
          1
          {% elif value.lower() == 'bajo' %}
          2
          {% elif value.lower() == 'moderado' %}
          3
          {% elif value.lower() == 'alto' %}
          4
          {% elif value.lower() == 'muy alto' %}
          5
          {% else %}
          -1
          {% endif %}
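
The same mapping can be written as a lookup table, which is easier to extend than a chain of elif branches. This is only a Python sketch of the idea; the names `POLLEN_LEVELS` and `pollen_level` are made up, and the extra `strip()` is an assumption that the scraped text may carry surrounding whitespace.

```python
# Lookup table mirroring the template above: lowercase label -> number.
POLLEN_LEVELS = {
    "ninguno": 0,
    "muy bajo": 1,
    "bajo": 2,
    "moderado": 3,
    "alto": 4,
    "muy alto": 5,
}


def pollen_level(value: str) -> int:
    """Return the numeric level, or -1 for anything not in the table."""
    # strip() is an addition not present in the template (assumption).
    return POLLEN_LEVELS.get(value.strip().lower(), -1)


print(pollen_level("Alto"))
```

Unknown labels such as "unavailable" fall through to -1, matching the else branch of the template.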

I posted this a while ago: I have tried to use the select option, but in this case I have no clue which word to look for, since the word I am trying to use appears several times on the page. Maybe somebody could help me get these values out for use in HA. I need the three figures marked in yellow in the screenshot.
The weblink for the site is http://77.119.243.51:86

I am afraid it will not be very easy, as there are no unique selectors to target. You could select all elements of one type and pick one by index, as in the example from the scrape integration:

# Example configuration.yaml entry
sensor:
  - platform: scrape
    resource: http://77.119.243.51:86
    name: Temperature Cellar
    select: "td"
    index: 60 #experiment with this number to find the correct value
    unit_of_measurement: "°C"

Once you find the correct TD element it should be easy to clone this code and get the next value.

Hi Alekxsy,

Thanks for your answer. I am not sure what you mean by cloning the code to get the next value. I am not an expert in this topic, so I fear it will be too complicated for me to get working.
So far I have added the example you gave me to my config, I have restarted HA but don’t see a sensor/entity called Temperature Cellar yet, so maybe I am doing something wrong.

In theory this should work, but I am getting an error in the logs… something to do with headers. Possibly related to the port being different…

sensor:
  - platform: scrape
    resource: "http://77.119.243.51:86"
    name: Temperature Cellar
    select: "td"
    index: 7 #experiment with this number to find the correct value
    value_template: '{{ ((value.split(" ")[0]) | replace (",", ".")) }}'
    unit_of_measurement: "°C"
   
  - platform: scrape
    resource: "http://77.119.243.51:86"
    name: Humidity Cellar
    select: "td"
    index: 8 #experiment with this number to find the correct value
    value_template: '{{ ((value.split(" ")[0]) | replace (",", ".")) }}'
    unit_of_measurement: "%"
    
  - platform: scrape
    resource: "http://77.119.243.51:86"
    name: Air Pressure Cellar
    select: "td"
    index: 9 #experiment with this number to find the correct value
    value_template: '{{ ((value.split(" ")[0]) | replace (",", ".")) }}'
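
To find the right index without restarting HA each time, the "select all td, pick by index, convert the decimal comma" idea can be tried offline. The sketch below uses only the Python standard library; `SAMPLE` is invented HTML standing in for the real page, and `CellCollector` and `cell_as_number` are hypothetical helper names.

```python
from html.parser import HTMLParser

# Invented stand-in for the real page (assumption): values use a decimal comma.
SAMPLE = "<table><tr><td>Cellar</td><td>12,5 °C</td><td>61 %</td></tr></table>"


class CellCollector(HTMLParser):
    """Collects the text of every <td> element in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def cell_as_number(html: str, index: int) -> float:
    """Pick the td at `index`, keep the part before the first space, swap ',' for '.'."""
    parser = CellCollector()
    parser.feed(html)
    text = parser.cells[index].strip()  # strip() is an addition (assumption)
    return float(text.split(" ")[0].replace(",", "."))


print(cell_as_number(SAMPLE, 1))
```

Printing `parser.cells` with its indices for the real page would show which number to put in `index`.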

Hi and thanks for your response!
I have added your config to my setup, but I get the following error after restarting HA:

Logger: homeassistant.components.rest.data
Source: components/rest/data.py:69
Integration: RESTful (documentation, issues)
First occurred: 20:24:39 (12 occurrences)
Last logged: 20:28:14

Error fetching data: http://77.119.243.51:86 failed with illegal chunk header: bytearray(b'F9 \r\n')
Error fetching data: http://77.119.243.51:86 failed with

maybe you know what that means… I have no clue to be honest

it looks like a bug.
I have submitted an issue here

You might try this

If someone could help that would be great. Thank you!!

Trying to get the current outdoor temp on this site. I am unable to get this to work.

Website is ambientweather.net/dashboard/kbck

Here is the select:

#root > div > div.page-container > div > div > div > div.device-device-realtime-dashboard > div > div > div.device-widget.square.temp > div.device-temp-widget.center-aligned > div > div.top > span > span.fdp-val

Not sure how to fix. Any ideas?

Looks like the data is loaded via JavaScript; see the page source:
<noscript>Sorry, Javascript must be enabled to use the Ambient Weather Dashboard.</noscript>
That means the data is not in the HTML when the page is first loaded; it is fetched afterwards via JavaScript, so you can’t scrape it.
I suggest either using one of the weather integrations already available or creating your own, since Ambient Weather has an API.

Thank you. I didn’t notice that. Dang.

Hi all, I need some help with my scraping.
I’m trying to extract the values of my solar panel controller web page. It used to work but I updated multiscrape and now it doesn’t support the property index anymore.
My web page looks like this:

<html>
 <head>
   <meta http-equiv=pragma content=no-cache>
   <meta http-equiv=expire content=now>
   <title></title>
</head>
<body bgcolor=ffffff text=black><br><br>
 <table align=center border=1 cellpadding=0 cellspacing=0 bordercolor=#008000 bordercolorlight=#ffffff borderdark=#808000 width=1024>
  <center>
     <tr bgcolor=#43CD80>
         <td align=center>Inverter ID</td>
         <td align=center>Current Power</td>
         <td align=center>Grid Frequency</td>
         <td align=center>Grid Voltage</td>
         <td align=center>Temperature</td>
         <td align=center>Date</td>
        </tr>
      </center>
      <center>
         <tr>
            <td align=center>404000066234-A</td>
            <td align=center> 62&nbsp;W</td>
            <td align=center> 50.0&nbsp;Hz</td>
            <td align=center> 233&nbsp;V</td>
            <td align=center> 23&nbsp;<sup>o</sup>C</td>
           <td align=center> 2022-06-07 11:38:22</td>
        </tr>
        .....
    </table><br><br>
    <hr></hr><center><tr><td>&copy2013 Altenergy Power System Inc.</td></tr></center>
 </body>
</html>

I’m trying to extract the values in rows after the header. I tried body > table > tr:nth-child(2) > td:nth-child(1) and also table > center > tr:nth-child(2) > td:nth-child(5) but I keep getting errors like:

2022-06-10 12:54:34 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Panel 1 Name # Start scraping to update sensor
2022-06-10 12:54:34 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Panel 1 Name # Tag selected: None
2022-06-10 12:54:34 ERROR (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Panel 1 Name # Unable to scrape data: Could not find a tag for given selector.
Consider using debug logging and log_response for further investigation.
2022-06-10 12:54:34 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Panel 1 Name # On-error, set value to None
2022-06-10 12:54:34 DEBUG (MainThread) [custom_components.multiscrape.entity] Scraper_noname_0 # Panel 1 Name # Updated sensor and attributes, now adding to HA

Any idea what I’m doing wrong?
Thanks a lot, appreciate any help.
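
One thing that may be worth checking (a guess, not a confirmed cause): the `<center>` elements sit directly inside `<table>`, which is not valid HTML, so a parser may move or drop them, and a selector path that goes through `center` or relies on `nth-child` may then match nothing. The Python sketch below avoids selectors altogether and just groups `td` cells by `tr`, using a trimmed copy of the page above; `RowCollector` and `table_rows` are hypothetical names.

```python
from html.parser import HTMLParser

# Trimmed copy of the page from the post above.
SAMPLE = """
<table align=center border=1>
  <center>
    <tr bgcolor=#43CD80>
      <td align=center>Inverter ID</td>
      <td align=center>Current Power</td>
      <td align=center>Temperature</td>
    </tr>
  </center>
  <center>
    <tr>
      <td align=center>404000066234-A</td>
      <td align=center> 62&nbsp;W</td>
      <td align=center> 23&nbsp;<sup>o</sup>C</td>
    </tr>
  </center>
</table>
"""


class RowCollector(HTMLParser):
    """Groups td text by tr, ignoring wrapper tags such as <center>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def table_rows(html: str) -> list:
    """Return rows of cleaned cell text (non-breaking spaces become plain spaces)."""
    parser = RowCollector()
    parser.feed(html)
    return [[cell.replace("\xa0", " ").strip() for cell in row] for row in parser.rows]


rows = table_rows(SAMPLE)
print(rows[1])
```

Row 0 is the header and row 1 the first inverter, so `rows[1][1].split()[0]` gives the power reading without the unit.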

Could you enable file logging and post the HTML page as logged?

I’m using multiscrape to try to get my Logitech mouse battery level. The app I’m using provides this XML file:

<xml>
<device_id>devxxxxxxxxx</device_id>
<device_name>PRO X Wireless</device_name>
<device_type>Mouse</device_type>
<battery_voltage>-0,00</battery_voltage>
<battery_percent>100,00</battery_percent>
<charging>False</charging>
</xml>

My current code is this: (I used the Chrome inspect copy selector to get the selector)

multiscrape:
  - resource: http://mypcIP:12321/device/dev4f7137224093c0940000
    name: PRO X Wireless
    scan_interval: 60
    sensor:
      - name:       PRO X Wireless Battery level
        unique_id:  pro_x_wireless_battery_level
        icon:       mdi:mouse-bluetooth
        select:     "folder0 > div.opened > div:nth-child(5) > span:nth-child(2)"
    log_response: true

I’m getting the error PRO X Wireless # PRO X Wireless Battery level # Unable to scrape data: Could not find a tag for given selector

Any idea what I’m doing wrong?

I can’t test it but just try:
select: battery_percent

That worked, thanks so much!

Is there a reason why the copy selector method didn’t work? I used that before and I had good results.

Because it’s XML instead of HTML.
Good to hear it works!
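
A small Python sketch of why the plain element name works: in the XML response the value sits in a `battery_percent` element, while names like `folder0` and `div.opened` most likely come from the browser's XML viewer and are not in the file itself. The snippet uses only the standard library and the XML from the post; `battery_percent` as a function name and the comma-to-dot conversion are illustrative additions.

```python
import xml.etree.ElementTree as ET

# The XML exactly as posted above.
SAMPLE = """<xml>
<device_id>devxxxxxxxxx</device_id>
<device_name>PRO X Wireless</device_name>
<device_type>Mouse</device_type>
<battery_voltage>-0,00</battery_voltage>
<battery_percent>100,00</battery_percent>
<charging>False</charging>
</xml>"""


def battery_percent(xml_text: str) -> float:
    """Read <battery_percent> and convert the decimal comma to a float."""
    root = ET.fromstring(xml_text)
    text = root.findtext("battery_percent")
    if text is None:
        raise ValueError("battery_percent element not found")
    return float(text.replace(",", "."))


print(battery_percent(SAMPLE))
```

Note that none of the element names a browser inspector showed appear in the parsed tree; only the tags from the file do.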

I have one more question. The scraper works perfectly, but it generates a lot of errors when my PC is off. That makes sense of course, but is there a way to suppress errors for one “scrape” without suppressing all Multiscrape errors?

Yes, see the on-error section of the multiscrape documentation.