Scrape sensor improved - scraping multiple values

I am trying to log in at https://login.ns.nl/login.
My config is:

multiscrape:
  - resource: 'https://www.ns.nl/mijnns#/dashboard'
    form_submit:
      submit_once: False
      resource: 'https://login.ns.nl/login'
      select: '#loginForm'
      input:
        email: '[email protected]'
        password: 'PASSWORD'
    sensor:
      - unique_id: ns_test
        name: NS
        select: '#mijnns-app > mns-app > div > mijnns-dashboard > div > div > div.grid__unit.s-4-4.m-12-12.l-8-12.ng-tns-c157-0 > div > section > div > ovcp-overview > div > div:nth-child(1) > div > h3'

But I get the status "unknown"…

Hi @danieldotnl, I’m trying to add my (scraped) solar values to the new Energy Monitor.
There are a few requirements for that and one of them is the last_reset parameter.

      - name: Energy Solar Total
        device_class: energy
        state_class: measurement
        last_reset: '1970-01-01T00:00:00+00:00'
        unit_of_measurement: "kWh"
        icon: "mdi:solar-panel"
        select: "#param_link_15684226"
        value_template: "{{value.split(' ')[0] | replace(',', '.')}}"

Configuration invalid
Invalid config for [multiscrape]: [last_reset] is an invalid option for [multiscrape]. Check: multiscrape->multiscrape->0->sensor->6->last_reset. (See /config/configuration.yaml, line 55).

However, the last_reset parameter isn’t a valid option for the Multiscrape sensor (yet).
Is it perhaps possible to add this parameter for the Multiscrape Component?
It would be very helpful. :slight_smile:

Pre-release v5.4.0 supports fixed values for sensors/attributes by using the value_template and omitting a select. So you can add the last_reset attribute like this:

attributes:
  - name: last_reset
    value_template: '1970-01-01T00:00:00+00:00'
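
For the Energy Solar Total sensor above, that could look roughly like the sketch below. It assumes v5.4.0 or later is installed and that the rest of the sensor stays as posted; only the rejected last_reset line is moved into an attributes block.

```yaml
    sensor:
      - name: Energy Solar Total
        device_class: energy
        state_class: measurement
        unit_of_measurement: "kWh"
        icon: "mdi:solar-panel"
        select: "#param_link_15684226"
        value_template: "{{value.split(' ')[0] | replace(',', '.')}}"
        # Fixed-value attribute: a value_template without a select (v5.4.0+)
        attributes:
          - name: last_reset
            value_template: '1970-01-01T00:00:00+00:00'
```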

Daniel, thanks again. Works perfectly.

Hey Guys,

I’m trying to scrape from this site https://covidlive.com.au to look at the New Cases in the Last 24 hours, but no matter which way I try, I get an invalid response from the integration.

e.g.

#content > div > div:nth-child(1) > section > table > tbody > tr:nth-child(2) > td.COL5.NET > span

doesn’t return the value in red, which at the moment is 80.

Any ideas what I’m doing wrong? It works perfectly for scraping from other sites.

[image not preserved]

and the logs show me this:

[image not preserved]

Hi,

Looks like something has broken since the core September updates: my sensors, which were working perfectly until the end of August, now all have the value ‘unknown’.

What is weird is that, in the debug log, the multiscrape integration happily shows the right values. For instance:

2021-09-04 11:39:16 DEBUG (MainThread) [custom_components.multiscrape.sensor] Sensor Température piscine selected: 26.0

The values just don’t seem to make it from multiscrape to the HA sensors.

I wonder if this has something to do with the recent changes in sensors for the new long-term statistics. I tried adding

        device_class: temperature
        state_class: measurement
        force_update: true

to no avail.

Any idea?

Please install the latest pre-release (or wait for a regular release). See: https://github.com/danieldotnl/ha-multiscrape/issues/55
and
https://github.com/danieldotnl/ha-multiscrape/issues/50

Thanks a lot, works perfectly 👍

Enjoy your holiday!

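It’s the ‘tbody’ bit. Browsers often add tbody to the DOM you see in the developer tools, while it is missing from the raw HTML that multiscrape receives. Just remove it and you should be good (other than the depression due to the numbers themselves!), e.g.: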

 
  - resource: 'https://covidlive.com.au/'
    scan_interval: 120
    sensor:
      - select: '#content > div > div:nth-child(1) > section > table > tr:nth-child(2) > td.COL5.NET'
        name: multiscrape-COVID-NSW    
      - select: '#content > div > div:nth-child(1) > section > table > tr:nth-child(3) > td.COL5.NET'
        name: multiscrape-COVID-Vic    

Cheers

Great tool. I wonder if anyone could help me though: I’m having some issues trying to get it to log in to one site. Not sure if it is something specific to the site, or if I’ve missed something, but any suggestions are welcome!

  - resource: 'https://connectmypool.com.au/Account/Chemistry.aspx'
    scan_interval: 120
    form_submit:
      submit_once: True
      resource: 'https://connectmypool.com.au/'
      select: "#loginPanel > table > tr > td:nth-child(2) > table > tr:nth-child(2) > td:nth-child(2)"
# Have also tried:
#      select: "#ucLogin1_txtUserName"
#      select: "#loginPanel > table > tbody > tr > td:nth-child(2) > table > tbody > tr:nth-child(2) > td:nth-child(2)"
      input:
        username: [email protected]
        password: 'passw0rd here'
        extra: field
    sensor:
      - select: '#dvPh > table > tr > td.chem_val_td > div'
        name: multiscrape-pH

I don’t immediately spot a mistake. Any errors in debug logging?
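
If debug logging is not enabled yet, a minimal sketch for configuration.yaml using Home Assistant's logger integration is below. The logger name is the one that appears in the multiscrape log lines quoted in this thread; the default level of warning is just an assumption, use whatever you normally run with.

```yaml
logger:
  default: warning
  logs:
    # Only raise the level for the multiscrape custom component
    custom_components.multiscrape: debug
```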

Thanks for looking, Daniel. Nothing that points to what the issue is, but I’ll keep hunting. Cheers.

Minor update: going through the logs in debug mode, I am now pretty sure that multiscrape is successfully logging in with:

      select: "#loginPanel

as I am seeing happy messages like:

2021-09-11 09:01:24 DEBUG (MainThread) [httpx._client] HTTP Request: GET https://connectmypool.com.au/Account/Chemistry.aspx "HTTP/1.1 302 Found"

I am however now seeing errors such as:

2021-09-11 09:01:25 DEBUG (MainThread) [custom_components.multiscrape.sensor] Exception selecting sensor data: list index out of range

when it is trying to populate the sensor. So a little more troubleshooting required. :slight_smile:

Below is the raw HTML generated on the page being scraped. I’m just trying to get the data out of #lblPHMeasure, which is currently 7.9. I’m no longer seeing index out of range errors, just:

2021-09-11 13:52:46 ERROR (MainThread) [custom_components.multiscrape.sensor] Sensor multiscrape-pH was unable to extract data from HTML

I’m getting the same error if I just use the standard scrape sensor, so I’m starting to think that something else is happening here. I might move on to something else for now so I can look at it with fresh eyes later…

    <div id="updpnlChemistry">
	
            <div class="con_area" style="min-height: 200px;">
                <span id="lblMessage"></span>
                <div id="dvAllData">

                    <div id="dvPh" class="cg">
                        <div class="chem_gauge">
                             <img src="../App_Themes/Default/images/gauge-needle.png" id="imgNeedle" style="position:relative;left:28px;top:52px;" />                          
                       </div>
                       <table class="chem_table">

                            <tr>
                                <td class="chem_txt_td">Ph Level:</td>
                                <td class="chem_val_td">
                                    <div class="chem_label chem_txtok">
                                        <span id="lblPHMeasure">7.9</span>
                                    </div>
                                    <br /><small><span id="lblPHLast">29 minutes ago</span></small>
                                </td> 
                            </tr>

                        </table>
                    </div>
                              
                    <div class="cg">
                        <table class="chem_table">

                            <tr id="dvOM">
		<td class="chem_txt_td">ORP Status:</td>
		<td class="chem_val_td">
                                    <div id="dvORPMeasure" class="chem_label chem_txtok">
                                        <span id="lblORPMeasure">OK</span>
                                    </div>
                                    <br /><small><span id="lblORPLast">34 minutes ago</span></small>
                                </td>
	</tr>
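
Given the markup above, a shorter selector that targets the span directly by its id might be worth a try. This is only a sketch: it assumes the page multiscrape fetches after the login step really contains this HTML, which the errors above do not confirm.

```yaml
    sensor:
      # Select the span by its id instead of walking the table structure
      - select: '#lblPHMeasure'
        name: multiscrape-pH
```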

Thanks for this component!
I have an issue with my boiler page.

I am getting error messages from the logs:

2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.scraper] Updating from http://10.4.149.20/login.cgi
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.scraper] Submitting form data {'username': 'secret', 'password': 'secret2', 'submit': None, 'extra': 'field'} to http://10.4.149.20/login.cgi
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.scraper] Updating from http://10.4.149.20
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape] Finished fetching scraper data data in 0.969 seconds (success: True)
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.sensor] Exception selecting sensor data: list index out of range
2021-09-11 18:08:01 ERROR (MainThread) [custom_components.multiscrape.sensor] Sensor boiler_temp was unable to extract data from HTML
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.sensor] Exception selecting sensor data: list index out of range
2021-09-11 18:08:01 ERROR (MainThread) [custom_components.multiscrape.sensor] Sensor solar_temp_tpo was unable to extract data from HTML
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.sensor] Exception selecting sensor data: list index out of range
2021-09-11 18:08:01 ERROR (MainThread) [custom_components.multiscrape.sensor] Sensor solar_temp_tpm was unable to extract data from HTML
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.binary_sensor] Exception selecting sensor data: list index out of range
2021-09-11 18:08:01 ERROR (MainThread) [custom_components.multiscrape.binary_sensor] Sensor boiler_state was unable to extract data from HTML

My configuration looks like this:

multiscrape:
  - resource: 'http://10.4.149.20'
    scan_interval: 300
    form_submit:
      submit_once: True
      resource: 'http://10.4.149.20/login.cgi'
      select: "#login-container"
      input:
        username: secret
        password: 'secret2'
        extra: field
    binary_sensor:
      - unique_id: boiler_state
        name: boiler_state
        select: '#value-FA0_L_kesselstatus'
    sensor:
      - unique_id: boiler_temp
        name: boiler_temp
        select: '#value-FA0_L_kesseltemperatur'
      - unique_id: solar_temp_tpo
        name: solar_temp_tpo
        select: '#value-L_pu0_einschaltfuehler_ist'
      - unique_id: solar_temp_tpm
        name: solar_temp_tpm
        select: '#value-L_pu0_ausschaltfuehler_ist'

After successfully transmitting the login form, I am trying to scrape from the following code:

<!DOCTYPE html>
<html>
<head>------8<----------</head>
<body class="default">
<script type="text/javascript">
  main.showLoader();
</script>
<div id="wrapper">
  <div id="header"><a class="btn" href="#" id="logout"></a></div>
  <div id="content">
    <div id="menu"></div>
    <div id="infobox">
      <div id="messages" style="display:none">
        <h1 id="messages-title"></h1>
        <div id="message-area"></div>
      </div>
      <h1 id="infobox-title"></h1>
      <div class="value" style="background:none">
        &nbsp;
        <div class="values">
          <span id="infobox-is" class="value-is" style="font-weight:bold"></span>
          <span id="infobox-set" style="font-weight:bold"></span>
        </div>
      </div>
      <div id="value-area">
        <div id="value-group-0"></div>
        <div id="value-group-1"></div>
        <div id="value-group-2"></div>
        <div id="value-group-3-0">
          <div class="value">PE1 Boiler Mode<div class="values">
            <span class="value-is" id="value-FA0_L_kesselstatus">Off</span>
          </div>
        </div>
      </div>
      <div id="weather-actual"></div>
    </div>
    <div class="clear"></div>
  </div>
</div>
</body>
</html>

The div <div id="value-area"> is a list, which is dynamically populated and I left out all but the first relevant value.

edit: I updated everything a few hours ago.

Does anyone spot a mistake?

Best,
Ck

Never mind, I found a way to fetch JSON via the RESTful API…

Great, and that API is accessible without login?

The JSON path contains a password…

@danieldotnl Can you give me an example of how/where I can implement the on-error option? I see it twice, once under Sensor/Binary Sensor and once under Sensor Attributes.

For my sensors I would like to add the values:

log: info
value: last

Invalid config for [multiscrape]: [on-error] is an invalid option for [multiscrape]. Check: multiscrape->multiscrape->0->sensor->1->on-error.
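
One thing that may be worth checking: the error names the key on-error, with a hyphen. The sketch below assumes the option is spelled on_error with an underscore (as I understand it from the multiscrape README) and that the installed version supports it. The sensor name and selector are hypothetical placeholders.

```yaml
    sensor:
      - unique_id: my_sensor        # hypothetical sensor, for illustration only
        name: My sensor
        select: '#some-element'     # hypothetical selector
        # Assumed spelling: underscore, not hyphen
        on_error:
          log: info
          value: last
```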

Did you manage to get it to work? I am still encountering the same issue even after upgrading to the latest version and removing the tbody tag.