Scrape sensor improved - scraping multiple values

Wow the template seems to work great! Now I only have the issue that
tags are removed and just replaced by spaces, so I cannot format the text nicely.

The latest multiscrape component is working fine for me. It logs in (username/password) and scrapes multiple values, which are available as attributes on the sensor.

My question is: Is it possible to get the value of ‘totaal:’ as Entity State value? (current Entity State is none)

    selectors:
      levering_dag:
        name: Levering dag
        select: "div.col-lg-4:nth-child(4) > div:nth-child(1) > div:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(2) > a:nth-child(1)"
        unit_of_measurement: "kWh"
      levering_nacht:
        name: Levering nacht
        select: "div.col-lg-4:nth-child(4) > div:nth-child(1) > div:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(2) > a:nth-child(1)"
        unit_of_measurement: "kWh"
      totaal:
        name: Totaal (levering – teruglevering)
        select: "div.col-lg-4:nth-child(4) > div:nth-child(1) > div:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(5) > td:nth-child(2) > a:nth-child(1)"
        unit_of_measurement: "kWh"

@drogfild how can I point to the login form, since it does not have a name? My page looks like this:
https://ebok.mpwik.lublin.pl/login

Hi @majkers! Good question, and thanks for giving an example. After @BrianHanifin’s commit, the “prelogin” script also looks for the attributes name, id, class or action. So in your case you should be able to use its class

      preloginform: 'form-horizontal'

or even its action

      preloginform: '/login'

Unfortunately my script isn’t updated with the latest multiscrape improvements and is based on quite an old version of it. I haven’t got my head around async requests yet :expressionless:
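
To illustrate the idea of locating a form by its name, id, class or action, here is a minimal Python sketch using only the standard library. It is not the fork's actual code: `FormFinder`, `find_form` and the sample markup are hypothetical names and data made up for this example, and the real script may match attributes differently.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect the attributes of every <form> tag on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # attrs is a list of (name, value) pairs; a value may be None
            self.forms.append({k: v or "" for k, v in attrs})


def find_form(html, needle):
    """Return the attributes of the first form whose name, id, class
    or action equals needle, or None if no form matches."""
    finder = FormFinder()
    finder.feed(html)
    for form in finder.forms:
        candidates = [form[k] for k in ("name", "id", "action") if form.get(k)]
        # class may hold several space-separated names
        candidates += form.get("class", "").split()
        if needle in candidates:
            return form
    return None


# Hypothetical login markup, not copied from the real site
page = '<form class="form-horizontal" action="/login" method="post"></form>'
print(find_form(page, "form-horizontal"))
print(find_form(page, "/login"))
```

Both calls return the same form here, which is why either the class or the action can identify a form that has no name.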

Hi,
I’m trying to get today’s exchange rates from the national bank’s page, but it seems I’m not writing the proper syntax.
Once someone can enlighten me, I’d like to get the exchange rates for 4 currencies with a single request. (I tried openexchange, but with the free API the result is only for USD and I want the base currency to be “ron”.)
Thank you

 - platform: scrape
   resource: https://bnr.ro/Cursul-de-schimb-524.aspx
   select: "chf"
   name: leutu
   value_template: '{{ (value | int) / 10 }}'

I am trying to use the fork by @drogfild https://github.com/drogfild/hass-multiscrape to scrape data from my heat pump (thread https://community.home-assistant.io/t/is-there-any-interest-in-a-stiebel-eltron-climate-platform), but could not get the login working. A GET request by curl, browser, or the requests module from Python returns the expected content of the login page, but this module always gets a 400 - bad request page. I also tried adding a user-agent to the headers, no luck.

Any idea what I am missing?

That’s weird. I haven’t experienced that error myself. Have you been able to verify whether you get that error on the first page load or after the login attempt?

This most probably doesn’t affect the problem, but I have a beta of a new version of my fork. It’s quite up to date with the original. You can find it on the dev branch. The config should be identical.

@danieldotnl Will you update the integration to solve the following warning/requirement?

No 'version' key in the manifest file for custom integration 'multiscrape'. This will not be allowed in a future version of Home Assistant. Please report this to the maintainer of 'multiscrape'

Thank you. :slight_smile:

Hello,

A little question.
Is it possible to maintain an HTTP livestream of a website in order to retrieve the data live and keep it updated?
My weather station offers a website where real-time data is played back, so it would be cool if I could use this as well.

I updated the opening post of this thread with an update on the new repository that’s now in the default HACS store. Please read this if you are using the multiscrape custom component!

Somehow I keep missing notifications from this thread, but the version has been added!

So you still do not plan to add scraping after logging in, where required?

I’m actually looking into that, by popular request :grinning:

Merging from my dev branch should not be that bad. It’s not up to date, but async is working just fine.

The biggest problem I have with it is that it doesn’t allow logging in to the same page with multiple different credentials, that is, having multiple sensors for the same page but with different credentials. It uses the same session, so it’s already logged in.

Hi,

I need help with two sensors not working, value shows up as empty.

select: ‘#acc’ is not working, the value ends up empty

<span id="acc">ON</span>

select: ‘#gpsSpeed’ works

<span id="gpsSpeed">0</span>

select: ‘#driverName’ works

<div class="col-sm-12 c-009934 fw-b" id="driverName">Uplander LT</div>

select: ‘#coordinate’ is not working, the value ends up empty

<span id="coordinate">-73.63061 / 45.52565</span>
<input type="hidden" id="c_latitude" value="45.52565">
<input type="hidden" id="c_longitude" value="-73.63061">

Does it work in Chrome? You can try out the CSS selectors in the Chrome console like:

$$("#acc")

This is how my config looks,

select: “#coordinate” not working

select: “#gpsSpeed” working

select: “#acc” not working

select: “#gpsTime” not working

select: “#driverName” working

- platform: multiscrape
  resource: https://example.com/
  name: Uplander LT
  scan_interval: 10
  headers:
    User-Agent: Mozilla/5.0
  selectors:
    uplander_lt_location:
      name: Uplander LT Location
      select: "#coordinate"

    uplander_lt_speed:
      name: Uplander LT Speed
      select: "#gpsSpeed"

    uplander_lt_status:
      name: Uplander LT Status
      select: "#acc"

    uplander_lt_time:
      name: Uplander LT Update Time
      select: "#gpsTime"

    uplander_lt_drivername:
      name: Uplander LT Driver Name
      select: "#driverName"

I need help here as well :frowning:

I thought I got it, but I’ve been struggling for weeks now. I guess there are a few links missing in my brain.

I have a local gateway with live weather data.
How the heck do I get those plain htm lines out of this website?
PLEAAAAAAAAAASE please help me out, I’m totally stuck.

For example, if I want to scrape the wind speed with the name avgwind, I’m totally lost on how to achieve the correct scrape configuration.
This is a part of the livedata.htm page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <title>LiveData</title>
        <link href="axcss0.css" rel="stylesheet" type="text/css" />
    </head>
    <body>
        <table width="800" border="0" align="center" cellpadding="0" cellspacing="0">
            <tr>
                <td colspan="2" align="right" bgcolor="#0088F7">&nbsp;</td>
            </tr>
            <tr>
                <td colspan="2" bgcolor="#FFFFFF"><table border="0" cellpadding="0" cellspacing="0">
                        <tr>
                            <td width="20" height = "80">&nbsp;</td>                                             
                            <td ><img src="img/1.jpg" width="74" height="80" ></td>
							<td width="10">&nbsp;</td>
                           	<td class="txtstyle_1" >AmbientWeather 4.5.8</td>                                   
                        </tr>
                </table></td>
            </tr>
            <tr> 
                <td colspan="2" align="right" bgcolor="#60B7FF">&nbsp;</td>
            </tr>
            <tr>
                <td colspan="2" align="left" bgcolor="#C0C0C0">
                    <table width="20" border="0" cellpadding="0" cellspacing="0">
                        <tr>
                            <td bgcolor="#C0C0C0"><div class="menuitem_1"><a href="bscsetting.htm">Local Network</a></div></td>
                            <td bgcolor="#C0C0C0"><div class="menuitem_1"><a href="weather.htm">Weather Network</a></div></td>
                            <td bgcolor="#C0C0C0"><div class="menuitem_1"><a href="station.htm">Station Settings</a></div></td>
                            <td bgcolor="#EDEFEF"><div class="menuitem_1"><a href="livedata.htm">Live Data</a></div></td>
                            <td bgcolor="#C0C0C0"><div class="menuitem_1"><a href="correction.htm">Calibration</a></div></td>     
                        </tr>
                    </table>
                </td>
            </tr>
            <form name="livedata" method="POST" onsubmit="return chkForm(0);">  
                <tr>
                    <td colspan="2" bgcolor="#EDEFEF">&nbsp;</td>
                </tr> 
                <tr>
                    <td colspan="2" bgcolor="#EDEFEF"><div class="subitem_1">Live Data</div></td>
                </tr>   
                 <tr>
                    <td width="448" bgcolor="#EDEFEF"><div class="item_1">Receiver Time:</div></td>
                    <td width="352" bgcolor="#EDEFEF">                    
                    <input name="CurrTime" disabled="disabled" type="text" class="item_2" style="WIDTH: 120px" value="19:07 5/27/2021" maxlength="16"/></td>
                </tr>

                <tr>
                    <td width="448" bgcolor="#EDEFEF"><div class="item_1">Indoor Sensor ID and  Battery </div></td>
                    <td width="352" bgcolor="#EDEFEF">
                    <input name="IndoorID" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="0x--" maxlength="5" />
                    <input name="inBattSta" disabled="disabled" type="text" class="item_2" style="WIDTH: 100px" value="- -" maxlength="12" />
                    </td>
                </tr>                
                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Outdoor Sensor ID and Battery</div></td>
                    <td bgcolor="#EDEFEF">                    
                        <input name="Outdoor1ID" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="0xcb" maxlength="5" />
                        <input name="outBattSta1" disabled="disabled" type="text" class="item_2" style="WIDTH: 100px" value="Normal" maxlength="12" />
                    </td>
                </tr>                
                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Outdoor2 Sensor ID and Battery</div></td>
                    <td bgcolor="#EDEFEF">                    
                        <input name="Outdoor2ID" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="0x--" maxlength="5" />
                        <input name="outBattSta2" disabled="disabled" type="text" class="item_2" style="WIDTH: 100px" value="- -" maxlength="12" />
                    </td>
                </tr>             

                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Indoor Temperature</div></td>
                    <td bgcolor="#EDEFEF"><input name="inTemp" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="--.-" maxlength="5" /></td>
                </tr>

                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Indoor Humidity</div></td>
                    <td bgcolor="#EDEFEF"><input name="inHumi" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="--" maxlength="3" /></td>
                </tr>
                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Absolute Pressure </div></td>
                    <td bgcolor="#EDEFEF"><input name="AbsPress" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="----" maxlength="6" /></td>
                </tr>	
                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Relative Pressure </div></td>
                    <td bgcolor="#EDEFEF"><input name="RelPress" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="----" maxlength="6" /></td>
                </tr>	
                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Outdoor Temperature</div></td>
                    <td bgcolor="#EDEFEF"><input name="outTemp" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="13.9" maxlength="5" /></td>
                </tr>

                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Outdoor Humidity </div></td>
                    <td bgcolor="#EDEFEF"><input name="outHumi" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="55" maxlength="3" /></td>
                </tr>

                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Wind Direction </div></td>
                    <td bgcolor="#EDEFEF"><input name="windir" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="255" maxlength="5" /></td>
                </tr>

                <tr>
                    <td bgcolor="#EDEFEF"><div class="item_1">Wind Speed </div></td>
                    <td bgcolor="#EDEFEF"><input name="avgwind" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="11.2" maxlength="5" /></td>
                </tr>
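
One thing worth noting about the excerpt above: the readings are stored in the `value` attribute of `<input>` elements, not as text inside a tag, so a selector that only reads element text would come back empty. In CSS terms the element is `input[name="avgwind"]`. Whether your scrape component can return an attribute instead of the text depends on its version, so check its documentation. As a rough way to confirm offline what there is to extract, here is a small Python sketch using only the standard library (`InputValues` is a hypothetical helper name):

```python
from html.parser import HTMLParser


class InputValues(HTMLParser):
    """Map each <input> name to the content of its value attribute."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr = dict(attrs)
            if attr.get("name"):
                self.values[attr["name"]] = attr.get("value")


# Two rows taken from the livedata.htm excerpt above
snippet = '''
<input name="windir" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="255" maxlength="5" />
<input name="avgwind" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="11.2" maxlength="5" />
'''

parser = InputValues()
parser.feed(snippet)
print(parser.values["avgwind"])  # prints 11.2
```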

Small scraping guide:

  • Load the page in Chrome
  • Right-click the value you want to scrape
  • Choose ‘Inspect’
  • Right-click on the selected line in the html
  • Copy → Copy Selector

This is the value you should paste in the ‘select’ field of the sensor.
Note that Chrome sometimes adds a ‘tbody’ element to tables in the html, which might not be in the original site’s html and is therefore not retrieved by multiscrape. In this case, you need to remove those from the select field.
E.g.: table > tbody > tr:nth-child(7) > td:nth-child(2) > font > b
becomes:
table > tr:nth-child(7) > td:nth-child(2) > font > b
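
If you have many copied selectors to clean up, the tbody removal can be scripted. The following is a minimal Python sketch, assuming the selector only uses the `>` child combinator as in Chrome's "Copy selector" output; `strip_tbody` is a hypothetical helper, not part of multiscrape. Only use it after confirming that the page source really has no tbody.

```python
import re


def strip_tbody(selector):
    """Drop every 'tbody' step (with any pseudo-class suffix such as
    :nth-child(1)) from a selector built with the '>' combinator."""
    steps = [step.strip() for step in selector.split(">")]
    kept = [
        step
        for step in steps
        if not re.fullmatch(r"tbody(:[\w-]+(\([^)]*\))?)*", step)
    ]
    return " > ".join(kept)


print(strip_tbody("table > tbody > tr:nth-child(7) > td:nth-child(2) > font > b"))
# prints: table > tr:nth-child(7) > td:nth-child(2) > font > b
```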

They might be in the original, if the site was properly-written. You just have to check manually.