Scrape sensor improved - scraping multiple values

@danieldotnl Will you update the integration to solve the following warning/requirement?

No 'version' key in the manifest file for custom integration 'multiscrape'. This will not be allowed in a future version of Home Assistant. Please report this to the maintainer of 'multiscrape'

Thank you. :slight_smile:

Hello,

A quick question.
Is it possible to keep an HTTP live stream of a website open in order to retrieve the data live and keep it updated?
My weather station offers a website that plays back real-time data, so it would be cool if I could use this as well.

I updated the opening post of this thread with an update on the new repository that’s now in the default HACS store. Please read it if you are using the multiscrape custom component!

Somehow I keep missing notifications from this thread, but the version has been added!

So you still do not plan to add scraping of pages that require logging in first?

I’m actually looking into that, by popular request :grinning:

Merging from my dev branch should not be that bad. It’s not up to date, but async is working just fine.

The biggest problem I have with it is that it doesn’t allow logging in to the same page with multiple different credentials, i.e. having multiple sensors for the same page but with different credentials. It uses the same session, so it’s already logged in.

Hi,

I need help with two sensors that are not working; the value shows up as empty.

select: ‘#acc’ is not working, the value ends up empty

<span id="acc">ON</span>

select: ‘#gpsSpeed’ works

<span id="gpsSpeed">0</span>

select: ‘#driverName’ works

<div class="col-sm-12 c-009934 fw-b" id="driverName">Uplander LT</div>

select: ‘#coordinate’ is not working, the value ends up empty

<span id="coordinate">-73.63061 / 45.52565</span>
<input type="hidden" id="c_latitude" value="45.52565">
<input type="hidden" id="c_longitude" value="-73.63061">

Does it work in Chrome? You can try out the CSS selectors in the Chrome console like this:

$$("#acc")

This is how my config looks,

select: “#coordinate” not working

select: “#gpsSpeed” working

select: “#acc” not working

select: “#gpsTime” not working

select: “#driverName” working

- platform: multiscrape
  resource: https://example.com/
  name: Uplander LT
  scan_interval: 10
  headers:
    User-Agent: Mozilla/5.0
  selectors:
    uplander_lt_location:
      name: Uplander LT Location
      select: "#coordinate"

    uplander_lt_speed:
      name: Uplander LT Speed
      select: "#gpsSpeed"

    uplander_lt_status:
      name: Uplander LT Status
      select: "#acc"

    uplander_lt_time:
      name: Uplander LT Update Time
      select: "#gpsTime"

    uplander_lt_drivername:
      name: Uplander LT Driver Name
      select: "#driverName"

I need help here as well :frowning:

I thought I had it, but I’ve been struggling for weeks now. I guess there are a few links missing in my brain.

I have a local gateway with live weather data.
How the heck do I get those plain HTML lines out of this website?
PLEAAAAAAAAAASE please help me out, I’m totally stuck.

This is a part of the livedata.htm page.
For example, if I want to scrape the wind speed, which has the name avgwind, I’m totally lost as to how to write the correct scrape configuration.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <title>LiveData</title>
        <link href="axcss0.css" rel="stylesheet" type="text/css" />
    </head>
    <body>
        <table width="800" border="0" align="center" cellpadding="0" cellspacing="0">
            <tr>
                <td colspan="2" align="right" bgcolor="#0088F7">&nbsp;</td>
            </tr>
            <tr>
                <td colspan="2" bgcolor="#FFFFFF"><table border="0" cellpadding="0" cellspacing="0">
                        <tr>
                            <td width="20" height = "80">&nbsp;</td>                                             
                            <td ><img src="img/1.jpg" width="74" height="80" ></td>
							<td width="10">&nbsp;</td>
                           	<td class="txtstyle_1" >AmbientWeather 4.5.8</td>                                   
                        </tr>
                </table></td>
            </tr>
            <tr> 
                <td colspan="2" align="right" bgcolor="#60B7FF">&nbsp;</td>
            </tr>
            <tr>
                <td colspan="2" align="left" bgcolor="#C0C0C0">
                    <table width="20" border="0" cellpadding="0" cellspacing="0">
                        <tr>
                            <td bgcolor="#C0C0C0"><div class="menuitem_1"><a href="bscsetting.htm">Local Network</a></div></td>
                            <td bgcolor="#C0C0C0"><div class="menuitem_1"><a href="weather.htm">Weather Network</a></div></td>
                            <td bgcolor="#C0C0C0"><div class="menuitem_1"><a href="station.htm">Station Settings</a></div></td>
                            <td bgcolor="#EDEFEF"><div class="menuitem_1"><a href="livedata.htm">Live Data</a></div></td>
                            <td bgcolor="#C0C0C0"><div class="menuitem_1"><a href="correction.htm">Calibration</a></div></td>     
                        </tr>
                    </table>
                </td>
            </tr>
            <form name="livedata" method="POST" onsubmit="return chkForm(0);">  
                <tr>
                    <td colspan="2" bgcolor="#EDEFEF">&nbsp;</td>
                </tr> 
                <tr>
                    <td colspan="2" bgcolor="#EDEFEF"><div class="subitem_1">Live Data</div></td>
                </tr>   
                 <tr>
                    <td width="448" bgcolor="#EDEFEF"><div class="item_1">Receiver Time:</div></td>
                    <td width="352" bgcolor="#EDEFEF">                    
                    <input name="CurrTime" disabled="disabled" type="text" class="item_2" style="WIDTH: 120px" value="19:07 5/27/2021" maxlength="16"/></td>
                </tr>

                <tr>
                    <td width="448" bgcolor="#EDEFEF"><div class="item_1">Indoor Sensor ID and  Battery </div></td>
                    <td width="352" bgcolor="#EDEFEF">
                    <input name="IndoorID" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="0x--" maxlength="5" />
                    <input name="inBattSta" disabled="disabled" type="text" class="item_2" style="WIDTH: 100px" value="- -" maxlength="12" />
                    </td>
                </tr>                
                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Outdoor Sensor ID and Battery</div></td>
                    <td bgcolor="#EDEFEF">                    
                        <input name="Outdoor1ID" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="0xcb" maxlength="5" />
                        <input name="outBattSta1" disabled="disabled" type="text" class="item_2" style="WIDTH: 100px" value="Normal" maxlength="12" />
                    </td>
                </tr>                
                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Outdoor2 Sensor ID and Battery</div></td>
                    <td bgcolor="#EDEFEF">                    
                        <input name="Outdoor2ID" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="0x--" maxlength="5" />
                        <input name="outBattSta2" disabled="disabled" type="text" class="item_2" style="WIDTH: 100px" value="- -" maxlength="12" />
                    </td>
                </tr>             

                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Indoor Temperature</div></td>
                    <td bgcolor="#EDEFEF"><input name="inTemp" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="--.-" maxlength="5" /></td>
                </tr>

                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Indoor Humidity</div></td>
                    <td bgcolor="#EDEFEF"><input name="inHumi" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="--" maxlength="3" /></td>
                </tr>
                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Absolute Pressure </div></td>
                    <td bgcolor="#EDEFEF"><input name="AbsPress" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="----" maxlength="6" /></td>
                </tr>	
                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Relative Pressure </div></td>
                    <td bgcolor="#EDEFEF"><input name="RelPress" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="----" maxlength="6" /></td>
                </tr>	
                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Outdoor Temperature</div></td>
                    <td bgcolor="#EDEFEF"><input name="outTemp" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="13.9" maxlength="5" /></td>
                </tr>

                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Outdoor Humidity </div></td>
                    <td bgcolor="#EDEFEF"><input name="outHumi" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="55" maxlength="3" /></td>
                </tr>

                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Wind Direction </div></td>
                    <td bgcolor="#EDEFEF"><input name="windir" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="255" maxlength="5" /></td>
                </tr>

                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Wind Speed </div></td>
                    <td bgcolor="#EDEFEF"><input name="avgwind" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="11.2" maxlength="5" /></td>
                </tr>

Small scraping guide:

  • Load the page in Chrome
  • Right-click the value you want to scrape
  • Choose ‘Inspect’
  • Right-click on the selected line in the html
  • Copy → Copy Selector

This is the value you should paste in the ‘select’ field of the sensor.
Note that Chrome sometimes adds a ‘tbody’ element to tables in the HTML which might not be in the original site’s HTML and is therefore not retrieved by multiscrape. In this case, you need to remove those from the select field.
E.g.: table > tbody > tr:nth-child(7) > td:nth-child(2) > font > b
becomes:
table > tr:nth-child(7) > td:nth-child(2) > font > b
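
Applied to the livedata.htm page quoted earlier in this thread: the readings there sit in the `value` attribute of `<input>` elements rather than in the element text, so a selector on its own would likely come back empty. Below is a minimal sketch, not a tested configuration. It assumes multiscrape's `attribute` option behaves like the one in the core scrape sensor, it uses the `multiscrape:` layout that appears further down the thread, and the IP address and scan interval are placeholders.

```yaml
multiscrape:
  # Placeholder address: replace with your own gateway's IP or hostname
  - resource: "http://192.168.1.50/livedata.htm"
    scan_interval: 60
    sensor:
      - name: Wind speed
        # Match the <input> by its name attribute instead of a long table path
        select: 'input[name="avgwind"]'
        # Read the value="..." attribute rather than the (empty) element text
        attribute: value
      - name: Outdoor temperature
        select: 'input[name="outTemp"]'
        attribute: value
```

Selecting on the `name` attribute also sidesteps the ‘tbody’ issue, because the selector no longer depends on the table structure.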

They might be in the original, if the site was properly-written. You just have to check manually.

Yup thanks, slightly updated the guide.

@everybody: feel free to contribute with some nice screencasts :slight_smile:

Has anyone been testing pre-release 4.0.0? I would like to receive some feedback before releasing it as a stable release.

I’m a bit lost with the configuration of this integration.

Goal is to create 1 sensor with four attributes capturing the toner level of my printer. I tried:

# Dell Printer
multiscrape:
- resource: "http://printer/status.asp"
  scan_interval: 600
  sensor:
  - name: DELL Printer Toner Level
    selectors:
      cyan:
        name: Cyan
        select: 'body > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table > tbody > tr:nth-child(2) > td:nth-child(1) > b'
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
      magenta:
        name: Magenta
        select: 'body > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table > tbody > tr:nth-child(4) > td:nth-child(1) > b'
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
      yellow:
        name: Yellow
        select: 'body > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table > tbody > tr:nth-child(6) > td:nth-child(1) > b'
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
      black:
        name: Black
        select: 'body > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table > tbody > tr:nth-child(8)) > td:nth-child(1) > b'
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'

but this only results in:

Error while setting up multiscrape platform for sensor
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 250, in _async_setup_platform
    await asyncio.shield(task)
  File "/config/custom_components/multiscrape/sensor.py", line 34, in async_setup_platform
    if rest.data is None:
UnboundLocalError: local variable 'rest' referenced before assignment

How can I do that?

Hi, did you check the upgrade notes?
It should look like this (note that it creates separate sensors rather than attributes; that is the goal of this component):

multiscrape:
- resource: "http://printer/status.asp"
  scan_interval: 600
  sensor:
  - name: cyan
    select: 'body > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table > tbody > tr:nth-child(2) > td:nth-child(1) > b'
    value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
  - name: magenta
    select: 'body > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table > tbody > tr:nth-child(4) > td:nth-child(1) > b'
    value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'  

Thanks for your support. Here is the working code:

# Dell Printer
multiscrape:
  - resource: "http://192.168.0.20/status.asp"
    scan_interval: 10
    sensor:
      - name: Cyan
        select: "body > table > tr > td > table:nth-child(6) > tr > td > table > tr:nth-child(2) > td:nth-child(1) > b"
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
      - name: Magenta
        select: "body > table > tr > td > table:nth-child(6) > tr > td > table > tr:nth-child(4) > td:nth-child(1) > b"
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
      - name: Yellow
        select: "body > table > tr > td > table:nth-child(6) > tr > td > table > tr:nth-child(6) > td:nth-child(1) > b"
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
      - name: Black
        select: "body > table > tr > td > table:nth-child(6) > tr > td > table > tr:nth-child(8) > td:nth-child(1) > b"
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'

However, this still does not do what I want. It now creates four separate sensors but I was under the impression that I can use the component to make the values available as attributes on one sensor instead of having four sensors. Or did I misunderstand?

The main aim of multiscrape is to make multiple sensors available from a single REST call, rather than having to make multiple requests. If you want a single sensor with attributes, you can create a template sensor from your four separate ones, but that seems a bit pointless.

template:
  - sensor:
      - name: Dell printer inks
        state: 'dummy state'
        attributes:
          cyan: "{{ states('sensor.cyan') }}"
          magenta: "{{ states('sensor.magenta') }}"
          yellow: "{{ states('sensor.yellow') }}"
          black: "{{ states('sensor.black') }}"

Why would you prefer a single sensor with attributes over four individual ones, though, which can then have unit_of_measurement and are easily put into Lovelace?
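
For comparison, here is a rough sketch of what the individual sensors gain. With the percent sign moved outside the regex capture group, the state is just the number and the unit can be declared separately. It assumes the scraped text contains something like `85%` and that multiscrape accepts `unit_of_measurement` on a sensor; the resource and selector are copied from the working config above, and only one of the four sensors is shown.

```yaml
multiscrape:
  - resource: "http://192.168.0.20/status.asp"
    scan_interval: 600
    sensor:
      - name: Cyan
        select: "body > table > tr > td > table:nth-child(6) > tr > td > table > tr:nth-child(2) > td:nth-child(1) > b"
        # Capture only the digits so the state is numeric; the unit is set below
        value_template: '{{ value | regex_findall_index(find="(\d+)%", index=0) }}'
        unit_of_measurement: "%"
```

A numeric state with a unit is what lets Lovelace draw the sensor as a graph or gauge instead of plain text.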

Understood, thanks for clarifying. It might help to make your description clearer; I, at least, was confused. If you like, I can suggest a pull request.

I was looking for a combined sensor to keep things that belong together close to each other and to avoid flooding my entity space.
