Web scraper Sensor question

dietlman · October 1, 2021, 7:40am

I posted this a while ago: I have tried to use the select option, but in this case I have no clue which word to look for, as the word I am trying to use appears several times. Maybe somebody could help me get these values out to use in HA. I need the three figures marked in yellow in the screenshot.
The weblink for the site is http://77.119.243.51:86

alekseyn · October 5, 2021, 2:48am

I am afraid it will not be very easy, as the page has no distinctive selectors to target. You could try selecting all td elements and picking one by index, as in the example from the scrape integration:

# Example configuration.yaml entry
sensor:
  - platform: scrape
    resource: http://77.119.243.51:86
    name: Temperature Cellar
    select: "td"
    index: 60 #experiment with this number to find the correct value
    unit_of_measurement: "C"

Once you find the correct TD element it should be easy to clone this code and get the next value.
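
To avoid guessing the index by trial and error, the td cells can be listed with their positions outside of HA first. The following is only a rough sketch using Python's standard library; the SAMPLE markup and its values are made up for illustration (the real page is not shown in this thread), and it assumes the page has no nested tables.

```python
from html.parser import HTMLParser


class CellLister(HTMLParser):
    """Collect the text of every <td> in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []       # one string per <td>, in document order
        self._in_td = False   # True while we are inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells.append("")
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


def list_cells(html):
    """Return the whitespace-normalised text of each <td>."""
    parser = CellLister()
    parser.feed(html)
    parser.close()
    return [" ".join(cell.split()) for cell in parser.cells]


# Hypothetical markup, only to demonstrate the idea.
SAMPLE = """
<table>
  <tr><td>Temperature</td><td>21,5 °C</td></tr>
  <tr><td>Humidity</td><td>48,0 %</td></tr>
</table>
"""

if __name__ == "__main__":
    for index, text in enumerate(list_cells(SAMPLE)):
        print(index, repr(text))
```

Feeding the saved page source to list_cells instead of SAMPLE should show which position holds the value you want; that position is the number to try for index.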

dietlman · October 5, 2021, 8:35am

Hi Aleksey,

Thanks for your answer. I am not sure what you mean by cloning the code to get the next value. I am not an expert in this topic, so I fear it will be too complicated for me to get working.
So far I have added the example you gave me to my config and restarted HA, but I don’t see a sensor/entity called Temperature Cellar yet, so maybe I am doing something wrong.

alekseyn · October 6, 2021, 6:00pm

In theory this should work, but I am getting an error in the logs… something to do with headers. Possibly related to the port being different…

sensor:
  - platform: scrape
    resource: "http://77.119.243.51:86"
    name: Temperature Cellar
    select: "td"
    index: 7 #experiment with this number to find the correct value
    value_template: '{{ ((value.split(" ")[0]) | replace (",", ".")) }}'
    unit_of_measurement: "°C"
   
  - platform: scrape
    resource: "http://77.119.243.51:86"
    name: Humidity Cellar
    select: "td"
    index: 8 #experiment with this number to find the correct value
    value_template: '{{ ((value.split(" ")[0]) | replace (",", ".")) }}'
    unit_of_measurement: "%"
    
  - platform: scrape
    resource: "http://77.119.243.51:86"
    name: Air Pressure Cellar
    select: "td"
    index: 9 #experiment with this number to find the correct value
    value_template: '{{ ((value.split(" ")[0]) | replace (",", ".")) }}'
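
For reference, the value_template in these sensors takes the first space-separated token of the cell text and swaps the decimal comma for a dot. Below is a small Python sketch of the same logic; the input strings are assumed examples, since the actual cell contents are not shown here.

```python
def clean_reading(value):
    """Same steps as the template: first space-separated token, comma -> dot."""
    return value.split(" ")[0].replace(",", ".")


def clean_reading_any_space(value):
    """Variant that splits on any whitespace, including non-breaking spaces."""
    return value.split()[0].replace(",", ".")


if __name__ == "__main__":
    print(clean_reading("21,5 °C"))
    # A non-breaking space (\xa0) is not matched by split(" "),
    # so the unit stays attached in the first version.
    print(repr(clean_reading("21,5\xa0°C")))
    print(clean_reading_any_space("21,5\xa0°C"))
```

If the page separates number and unit with a non-breaking space, the first version leaves the unit in the result, which is one thing to check if the sensor state does not come out as a plain number.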

dietlman · October 6, 2021, 6:30pm

Hi and thanks for your response!
I have added your config to my setup, but I get the following error after restarting HA:

Logger: homeassistant.components.rest.data
Source: components/rest/data.py:69
Integration: RESTful (documentation, issues)
First occurred: 20:24:39 (12 occurrences)
Last logged: 20:28:14

Error fetching data: http://77.119.243.51:86 failed with illegal chunk header: bytearray(b'F9 \r\n')
Error fetching data: http://77.119.243.51:86 failed with

Maybe you know what that means… I have no clue, to be honest.

alekseyn · October 6, 2021, 10:08pm

It looks like a bug.
I have submitted an issue here

You might try this

scubieman · April 13, 2022, 7:29pm

If someone could help that would be great. Thank you!!

Trying to get the current outdoor temp on this site. I am unable to get this to work.

Website is ambientweather.net/dashboard/kbck

Here is the select:

#root > div > div.page-container > div > div > div > div.device-device-realtime-dashboard > div > div > div.device-widget.square.temp > div.device-temp-widget.center-aligned > div > div.top > span > span.fdp-val

Not sure how to fix. Any ideas?

lolouk44 · April 13, 2022, 9:24pm

It looks like the data is loaded via JavaScript; see the page source:
<noscript>Sorry, Javascript must be enabled to use the Ambient Weather Dashboard.</noscript>
That means the data is not in the page HTML when it is first loaded; it is filled in afterwards by JavaScript, so you can’t scrape it.
I’d suggest either using one of the weather integrations already available or creating your own, since Ambient Weather has an API.
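
A quick way to check for this before writing a selector is to look at the HTML as served, without a browser. This is a minimal sketch using only Python's standard library; the SAMPLE string is an assumed stand-in for the page source (only the noscript line is taken from the real page), and fdp-val is the class from the selector above.

```python
from html.parser import HTMLParser


class NoscriptFinder(HTMLParser):
    """Collect the text found inside <noscript> elements."""

    def __init__(self):
        super().__init__()
        self.messages = []
        self._in_noscript = False

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.messages.append("")
            self._in_noscript = True

    def handle_endtag(self, tag):
        if tag == "noscript":
            self._in_noscript = False

    def handle_data(self, data):
        if self._in_noscript:
            self.messages[-1] += data


def static_html_report(html, marker):
    """Return (noscript messages, whether `marker` occurs in the raw HTML)."""
    finder = NoscriptFinder()
    finder.feed(html)
    finder.close()
    return [m.strip() for m in finder.messages], marker in html


# Assumed stand-in for the served page: an empty root container plus the notice.
SAMPLE = (
    '<html><body><div id="root"></div>'
    "<noscript>Sorry, Javascript must be enabled to use the "
    "Ambient Weather Dashboard.</noscript></body></html>"
)

if __name__ == "__main__":
    messages, found = static_html_report(SAMPLE, "fdp-val")
    print(messages)
    print("marker present in static HTML:", found)
```

If the class or value you are after does not appear anywhere in the raw source, no selector will find it, because the scraper only sees what the server sends.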

scubieman · April 13, 2022, 9:36pm

Thank you. I didn’t notice that. Dang.

Backus · June 10, 2022, 11:32pm

Hi all, I need some help with my scraping.
I’m trying to extract the values from my solar panel controller’s web page. It used to work, but I updated multiscrape and now it no longer supports the index property.
My web page looks like this:

<html>
 <head>
   <meta http-equiv=pragma content=no-cache>
   <meta http-equiv=expire content=now>
   <title></title>
</head>
<body bgcolor=ffffff text=black><br><br>
 <table align=center border=1 cellpadding=0 cellspacing=0 bordercolor=#008000 bordercolorlight=#ffffff borderdark=#808000 width=1024>
  <center>
     <tr bgcolor=#43CD80>
         <td align=center>Inverter ID</td>
         <td align=center>Current Power</td>
         <td align=center>Grid Frequency</td>
         <td align=center>Grid Voltage</td>
         <td align=center>Temperature</td>
         <td align=center>Date</td>
        </tr>
      </center>
      <center>
         <tr>
            <td align=center>404000066234-A</td>
            <td align=center> 62&nbsp;W</td>
            <td align=center> 50.0&nbsp;Hz</td>
            <td align=center> 233&nbsp;V</td>
            <td align=center> 23&nbsp;<sup>o</sup>C</td>
           <td align=center> 2022-06-07 11:38:22</td>
        </tr>
        .....
    </table><br><br>
    <hr></hr><center><tr><td>&copy2013 Altenergy Power System Inc.</td></tr></center>
 </body>
</html>

I’m trying to extract the values in rows after the header. I tried body > table > tr:nth-child(2) > td:nth-child(1) and also table > center > tr:nth-child(2) > td:nth-child(5) but I keep getting errors like:

2022-06-10 12:54:34 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Panel 1 Name # Start scraping to update sensor
2022-06-10 12:54:34 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Panel 1 Name # Tag selected: None
2022-06-10 12:54:34 ERROR (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Panel 1 Name # Unable to scrape data: Could not find a tag for given selector.
Consider using debug logging and log_response for further investigation.
2022-06-10 12:54:34 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Panel 1 Name # On-error, set value to None
2022-06-10 12:54:34 DEBUG (MainThread) [custom_components.multiscrape.entity] Scraper_noname_0 # Panel 1 Name # Updated sensor and attributes, now adding to HA

Any idea what I’m doing wrong?
Thanks a lot, appreciate any help.
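
One possible factor: the markup has center elements directly inside the table, which is not valid HTML, so the tree that browser dev tools show (after the browser repairs it) may differ from the tree the scraper's parser builds. To see which row and column a value sits in regardless of that nesting, here is a rough Python sketch using only the standard library. SAMPLE is a trimmed copy of the page above, and this is not multiscrape's own parsing, just a way to inspect the cells.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Group <td> text by <tr>, ignoring whatever else wraps the rows."""

    def __init__(self):
        super().__init__()
        self.rows = []        # list of rows, each a list of cell strings
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.rows[-1][-1] += data


def table_rows(html):
    """Return rows of whitespace-normalised cell text (&nbsp; becomes a space)."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [[" ".join(cell.split()) for cell in row] for row in parser.rows]


# Trimmed copy of the page from the post above.
SAMPLE = """
<table align=center border=1 width=1024>
  <center>
    <tr bgcolor=#43CD80>
      <td align=center>Inverter ID</td>
      <td align=center>Current Power</td>
      <td align=center>Temperature</td>
    </tr>
  </center>
  <center>
    <tr>
      <td align=center>404000066234-A</td>
      <td align=center> 62&nbsp;W</td>
      <td align=center> 23&nbsp;<sup>o</sup>C</td>
    </tr>
  </center>
</table>
"""

if __name__ == "__main__":
    for row_number, row in enumerate(table_rows(SAMPLE)):
        print(row_number, row)
```

Running the same function on the response logged with log_response would show whether the rows arrive as expected before any selector is applied.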

danieldotnl · June 20, 2022, 8:23am

Could you enable file logging and post the HTML page as logged?

Sjorsa · January 3, 2023, 6:55pm

I’m using multiscrape to try to get my Logitech mouse battery level. The app I’m using provides this XML file:

<xml>
<device_id>devxxxxxxxxx</device_id>
<device_name>PRO X Wireless</device_name>
<device_type>Mouse</device_type>
<battery_voltage>-0,00</battery_voltage>
<battery_percent>100,00</battery_percent>
<charging>False</charging>
</xml>

My current code is this: (I used the Chrome inspect copy selector to get the selector)

multiscrape:
  - resource: http://mypcIP:12321/device/dev4f7137224093c0940000
    name: PRO X Wireless
    scan_interval: 60
    sensor:
      - name:       PRO X Wireless Battery level
        unique_id:  pro_x_wireless_battery_level
        icon:       mdi:mouse-bluetooth
        select:     "folder0 > div.opened > div:nth-child(5) > span:nth-child(2)"
    log_response: true

I’m getting the error PRO X Wireless # PRO X Wireless Battery level # Unable to scrape data: Could not find a tag for given selector

Any idea what I’m doing wrong?

danieldotnl · January 6, 2023, 2:54pm

I can’t test it but just try:
select: battery_percent

Sjorsa · January 6, 2023, 7:32pm

That worked, thanks so much!

Is there a reason why the copy selector method didn’t work? I used that before and I had good results.

danieldotnl · January 6, 2023, 7:59pm

Because it’s XML instead of HTML.
Good to hear it works!
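
To illustrate: the raw document contains only the XML elements themselves, so the element name is the whole selector. The div and span parts of the copied selector presumably come from the tree view the browser builds to display XML, and do not exist in the file. A minimal Python sketch with the standard library, using the XML from the post above as SAMPLE; converting the decimal comma to a float is an extra step added here for illustration.

```python
import xml.etree.ElementTree as ET

# The XML as posted above.
SAMPLE = """<xml>
<device_id>devxxxxxxxxx</device_id>
<device_name>PRO X Wireless</device_name>
<device_type>Mouse</device_type>
<battery_voltage>-0,00</battery_voltage>
<battery_percent>100,00</battery_percent>
<charging>False</charging>
</xml>"""


def battery_percent(xml_text):
    """Read <battery_percent> and convert its decimal comma to a float."""
    root = ET.fromstring(xml_text)
    raw = root.findtext("battery_percent")
    if raw is None:
        raise ValueError("no <battery_percent> element found")
    return float(raw.replace(",", "."))


if __name__ == "__main__":
    print(battery_percent(SAMPLE))
```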

Sjorsa · January 11, 2023, 5:43pm

I have one more question, the scraper works perfectly, but it generates a lot of errors when my PC is off. Which makes sense of course, but is there a way to suppress errors for one “scrape” without just suppressing all Multiscrape errors?

danieldotnl · January 12, 2023, 7:51pm

Yes, see on-error

rr19-hub · February 11, 2023, 3:33pm

Hello, I’ve been trying to retrieve the status of urbackup for hours. The URL is local and can be scraped, but so far I have only succeeded with the selector div > div:nth-child(1), but here I only get the alt-text for the top right button (“Toggle navigation UrBackup”).
If I copy the selector via Chrome or Firefox, I get for example #status_table > tbody > tr.even > td:nth-child(5) or #status_table_wrapper > div.dataTables_scroll > div.dataTables_scrollHead. And very reliably the message ‘unavailable’. Also all attempts with #body, #tbody, #root or other selectors after div lead to the message ‘unavailable’.
Does anyone have a tip how I could somehow get into the table?

When I look in the log, I find the following messages:

“Unable to scrape data: Could not find a tag for given selector Consider using debug logging and log_response for further investigation.”
“homeassistant.exceptions.InvalidStateError: Invalid state encountered for entity ID: sensor.urbackup_server. State max length is 255 characters.”

This is strange, because the ID has a normal length:

- resource: http://192.168.xx.xx:55414
  scan_interval: 3600
  sensor:
    - unique_id: urbackup-server
      name: Urbackup RP4
      select: '.div.dataTables_scrollBody > table > tbody > tr:first-child'

rr19-hub · February 12, 2023, 9:07am

It seems to be JavaScript-generated, so I solved it with Python: GitHub - uroni/urbackup-server-python-web-api-wrapper: Python wrapper to access and control an UrBackup server

n3xT · May 3, 2023, 12:50pm

Hi,

Can anyone help me?

I am trying to get the EPEX DAM value from this site: Paramètres d'indexation d'électricité | ENGIE

but I cannot get the value. I keep getting this error: Index ‘0’ not found in sensor.epex_dam