Scrape sensor improved - scraping multiple values

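It’s the ‘tbody’ bit. Just remove it and you should be good (other than the depression due to the numbers themselves!). Browsers insert a <tbody> into tables when the page source doesn’t contain one, so a selector copied from the developer tools can include a tbody that isn’t in the HTML the scraper actually downloads. E.g.:
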
  - resource: 'https://covidlive.com.au/'
    scan_interval: 120
    sensor:
      - select: '#content > div > div:nth-child(1) > section > table > tr:nth-child(2) > td.COL5.NET'
        name: multiscrape-COVID-NSW    
      - select: '#content > div > div:nth-child(1) > section > table > tr:nth-child(3) > td.COL5.NET'
        name: multiscrape-COVID-Vic    

Cheers

Great tool. I wonder if anyone could help me, though: I’m having some issues trying to get it to log in to one site. I’m not sure whether it is something specific to the site or something I’ve missed, but any suggestions are welcome!

  - resource: 'https://connectmypool.com.au/Account/Chemistry.aspx'
    scan_interval: 120
    form_submit:
      submit_once: True
      resource: 'https://connectmypool.com.au/'
      select: "#loginPanel > table > tr > td:nth-child(2) > table > tr:nth-child(2) > td:nth-child(2)"
# Have also tried:
#      select: "#ucLogin1_txtUserName"
#      select: "#loginPanel > table > tbody > tr > td:nth-child(2) > table > tbody > tr:nth-child(2) > td:nth-child(2)"
      input:
        username: 'email address here'
        password: 'passw0rd here'
        extra: field
    sensor:
      - select: '#dvPh > table > tr > td.chem_val_td > div'
        name: multiscrape-pH

I don’t immediately spot a mistake. Any errors in debug logging?
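The debug messages quoted later in this thread are written by the `custom_components.multiscrape.*` loggers, with the HTTP request lines coming from `httpx`. A minimal sketch of turning them on with Home Assistant's standard `logger` integration in `configuration.yaml`; the `default: warning` level is just an example choice for everything else:

```yaml
# configuration.yaml
logger:
  default: warning
  logs:
    # parent logger: also covers .scraper, .sensor and .binary_sensor
    custom_components.multiscrape: debug
    # optional: HTTP request/response lines such as "302 Found"
    httpx: debug
```

Restart Home Assistant after editing, then look for the multiscrape lines in the log.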

Thanks for looking, Daniel. Nothing that points to what the issue is, but I’ll keep hunting. Cheers.

Minor update: going through the logs in debug mode, I am now fairly sure that multiscrape is successfully logging in with:

      select: "#loginPanel

as I am seeing happy messages like:

2021-09-11 09:01:24 DEBUG (MainThread) [httpx._client] HTTP Request: GET https://connectmypool.com.au/Account/Chemistry.aspx "HTTP/1.1 302 Found"

I am however now seeing errors such as:

2021-09-11 09:01:25 DEBUG (MainThread) [custom_components.multiscrape.sensor] Exception selecting sensor data: list index out of range

when it is trying to populate the sensor. So a little more troubleshooting required. :slight_smile:

The raw HTML of the page being scraped is below; I’m just trying to get the value out of #lblPHMeasure, which is currently 7.9. I’m no longer seeing index out of range errors, just:

2021-09-11 13:52:46 ERROR (MainThread) [custom_components.multiscrape.sensor] Sensor multiscrape-pH was unable to extract data from HTML

I’m getting the same error if I use the standard scrape sensor, so I’m starting to think something else is going on here. I might move on to something else for now and look at it with fresh eyes later…

    <div id="updpnlChemistry">
	
            <div class="con_area" style="min-height: 200px;">
                <span id="lblMessage"></span>
                <div id="dvAllData">

                    <div id="dvPh" class="cg">
                        <div class="chem_gauge">
                             <img src="../App_Themes/Default/images/gauge-needle.png" id="imgNeedle" style="position:relative;left:28px;top:52px;" />                          
                       </div>
                       <table class="chem_table">

                            <tr>
                                <td class="chem_txt_td">Ph Level:</td>
                                <td class="chem_val_td">
                                    <div class="chem_label chem_txtok">
                                        <span id="lblPHMeasure">7.9</span>
                                    </div>
                                    <br /><small><span id="lblPHLast">29 minutes ago</span></small>
                                </td> 
                            </tr>

                        </table>
                    </div>
                              
                    <div class="cg">
                        <table class="chem_table">

                            <tr id="dvOM">
		<td class="chem_txt_td">ORP Status:</td>
		<td class="chem_val_td">
                                    <div id="dvORPMeasure" class="chem_label chem_txtok">
                                        <span id="lblORPMeasure">OK</span>
                                    </div>
                                    <br /><small><span id="lblORPLast">34 minutes ago</span></small>
                                </td>
	</tr>
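Since the pH value sits in a span with its own id, the sensor can target that id directly instead of walking the table structure. A sketch of the sensor part, assuming the HTML the scraper receives matches what is shown above (i.e. the value is not filled in by JavaScript) and that the `form_submit` block stays exactly as in the earlier config:

```yaml
multiscrape:
  - resource: 'https://connectmypool.com.au/Account/Chemistry.aspx'
    scan_interval: 120
    # form_submit: keep it unchanged from the earlier config
    sensor:
      - select: '#lblPHMeasure'  # the <span> that holds "7.9"
        name: multiscrape-pH
```

If this still fails, the scraper is probably not getting this page at all. Newer multiscrape releases document a `log_response` option that writes the fetched HTML to files; whether your installed version has it is an assumption to check against its README.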

Thanks for this component!
I have an issue with my boiler page.

I am getting error messages from the logs:

2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.scraper] Updating from http://10.4.149.20/login.cgi
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.scraper] Submitting form data {'username': 'secret', 'password': 'secret2', 'submit': None, 'extra': 'field'} to http://10.4.149.20/login.cgi
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.scraper] Updating from http://10.4.149.20
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape] Finished fetching scraper data data in 0.969 seconds (success: True)
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.sensor] Exception selecting sensor data: list index out of range
2021-09-11 18:08:01 ERROR (MainThread) [custom_components.multiscrape.sensor] Sensor boiler_temp was unable to extract data from HTML
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.sensor] Exception selecting sensor data: list index out of range
2021-09-11 18:08:01 ERROR (MainThread) [custom_components.multiscrape.sensor] Sensor solar_temp_tpo was unable to extract data from HTML
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.sensor] Exception selecting sensor data: list index out of range
2021-09-11 18:08:01 ERROR (MainThread) [custom_components.multiscrape.sensor] Sensor solar_temp_tpm was unable to extract data from HTML
2021-09-11 18:08:01 DEBUG (MainThread) [custom_components.multiscrape.binary_sensor] Exception selecting sensor data: list index out of range
2021-09-11 18:08:01 ERROR (MainThread) [custom_components.multiscrape.binary_sensor] Sensor boiler_state was unable to extract data from HTML

My configuration looks like this:

multiscrape:
  - resource: 'http://10.4.149.20'
    scan_interval: 300
    form_submit:
      submit_once: True
      resource: 'http://10.4.149.20/login.cgi'
      select: "#login-container"
      input:
        username: secret
        password: 'secret2'
        extra: field
    binary_sensor:
      - unique_id: boiler_state
        name: boiler_state
        select: '#value-FA0_L_kesselstatus'
    sensor:
      - unique_id: boiler_temp
        name: boiler_temp
        select: '#value-FA0_L_kesseltemperatur'
      - unique_id: solar_temp_tpo
        name: solar_temp_tpo
        select: '#value-L_pu0_einschaltfuehler_ist'
      - unique_id: solar_temp_tpm
        name: solar_temp_tpm
        select: '#value-L_pu0_ausschaltfuehler_ist'

After the login form is submitted successfully, I am trying to scrape the following HTML:

<!DOCTYPE html>
<html>
<head>------8<----------</head>
<body class="default">
<script type="text/javascript">
  main.showLoader();
</script>
<div id="wrapper">
  <div id="header"><a class="btn" href="#" id="logout"></a></div>
  <div id="content">
    <div id="menu"></div>
    <div id="infobox">
      <div id="messages" style="display:none">
        <h1 id="messages-title"></h1>
        <div id="message-area"></div>
      </div>
      <h1 id="infobox-title"></h1>
      <div class="value" style="background:none">
        &nbsp;
        <div class="values">
          <span id="infobox-is" class="value-is" style="font-weight:bold"></span>
          <span id="infobox-set" style="font-weight:bold"></span>
        </div>
      </div>
      <div id="value-area">
        <div id="value-group-0"></div>
        <div id="value-group-1"></div>
        <div id="value-group-2"></div>
        <div id="value-group-3-0">
          <div class="value">PE1 Boiler Mode<div class="values">
            <span class="value-is" id="value-FA0_L_kesselstatus">Off</span>
          </div>
        </div>
      </div>
      <div id="weather-actual"></div>
    </div>
    <div class="clear"></div>
  </div>
</div>
</body>
</html>

The <div id="value-area"> is a list that is populated dynamically; I left out all but the first relevant value.

edit: I updated everything a few hours ago.

Does anyone spot a mistake?

Best,
Ck

Never mind, I found a way to fetch JSON via the RESTful API…

Great, and that API is accessible without login?

The JSON path contains a password…
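The earlier failure fits the note that value-area is populated dynamically: multiscrape fetches the page over HTTP and parses it with BeautifulSoup without running the page’s JavaScript, so values filled in by scripts are not in the HTML it sees and the selectors match nothing. Reading the JSON directly avoids that. A sketch using Home Assistant’s built-in RESTful integration, with the password-bearing URL kept in `secrets.yaml`; the secret name `boiler_json_url` and the JSON key are hypothetical, since the real API path and response layout are not shown here:

```yaml
# secrets.yaml (hypothetical entry):
#   boiler_json_url: "<full JSON URL, including the password>"

# configuration.yaml
rest:
  - resource: !secret boiler_json_url  # keeps the password out of the main config
    scan_interval: 300
    sensor:
      - name: boiler_temp
        # hypothetical key: adjust to the structure of the actual JSON response
        value_template: '{{ value_json["FA0_L_kesseltemperatur"] }}'
```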

@danieldotnl Can you give me an example of how/where to use the on-error option? I see it in two places: once under Sensor/Binary Sensor and once under Sensor Attributes.

For my sensors I would like to add these values:

    log: info
    value: last

but I get this error:

Invalid config for [multiscrape]: [on-error] is an invalid option for [multiscrape]. Check: multiscrape->multiscrape->0->sensor->1->on-error.

Did you manage to get it to work? I am still encountering the same issue even after upgrading to the latest version and removing the tbody tag.

I ended up switching to the scrape sensor, and it’s doing what I want it to. That said, trying to find the correct selector does my head in; I have no idea why it’s so hard.

This seems to work for me: #content > div > div:nth-child(1) > section > table > tr:nth-child(2) > td.COL5.NET

Though, the value is now (fortunately) “-”.

Here is an example of on_error:

multiscrape:
  - resource: https://www.home-assistant.io
    scan_interval: 3600
    sensor:
      - unique_id: ha_latest_version
        name: Latest version
        select: ".current-version > h1:nth-child(1)"
        value_template: '{{ (value.split(":")[1]) }}'
        on_error:
          log: warning
          value: last
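The config error quoted earlier comes from the dash in `on-error`; the key is `on_error`, as in this example. A sketch with the values asked for (`log: info`, `value: last`), plus the attribute-level variant mentioned in the question. That attributes accept `on_error` with a `log` key is an assumption based on the README section referred to there; the attribute name and its selector are hypothetical:

```yaml
multiscrape:
  - resource: https://www.home-assistant.io
    scan_interval: 3600
    sensor:
      - unique_id: ha_latest_version
        name: Latest version
        select: ".current-version > h1:nth-child(1)"
        value_template: '{{ (value.split(":")[1]) }}'
        on_error:        # underscore, not "on-error"
          log: info      # log scrape failures at info level
          value: last    # keep the last successfully scraped value
        attributes:
          - name: Version heading  # hypothetical attribute
            select: ".current-version > h1:nth-child(1)"
            on_error:
              log: info  # assumed to behave like the sensor-level option
```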

@danieldotnl I’m trying to scrape some weather data into a sensor’s attributes. Everything works except one attribute, and I don’t know whether it’s a bug or I’m just doing something wrong :frowning:
The config below returns null for the attribute wind_direction:

  - resource: https://www.meteoblue.com/ro/vreme/widget/daily/bucure%c8%99ti_rom%c3%a2nia_683506?geoloc=fixed&days=5&tempunit=CELSIUS&windunit=KILOMETER_PER_HOUR&precipunit=MILLIMETER&coloured=coloured&pictoicon=0&pictoicon=1&maxtemperature=0&maxtemperature=1&mintemperature=0&mintemperature=1&windspeed=0&windspeed=1&windgust=0&windgust=1&winddirection=0&winddirection=1&uv=0&humidity=0&precipitation=0&precipitation=1&precipitationprobability=0&precipitationprobability=1&spot=0&pressure=0&layout=light
    scan_interval: 180
    sensor:
      - unique_id: meteoblue_next_days
        name: Meteoblue Next Days
        select: ".day_long"
        value_template: "{{ value }}"
        attributes:
          - name: "Wind Direction"
            select_list: ".wind.dir > span"
            attribute: "class"
            value_template: |
              {{value}}

but if I change select_list to select:

          - name: "Wind Direction"
            select: ".wind.dir > span"
            attribute: "class"
            value_template: |
              {{value}}

the result is:

['glyph', 'winddir', 'NE']

The HTML I’m parsing from the webpage looks like this:

<div class="wind dir">
<span class="glyph winddir N"></span>
</div>
...
<div class="wind dir">
<span class="glyph winddir SW"></span>
</div>
...
<div class="wind dir">
<span class="glyph winddir N"></span>
</div>

so when I use select_list I expect the result to be something like this:
glyph winddir N, glyph winddir SW, glyph winddir N

Please tell me, am I doing something wrong?

Many thanks in advance!

I’m trying to get a sensor for the newest BIOS version, but without success.

The site I’m trying to scrape the value from: ASRock > B450M Steel Legend

multiscrape:
- resource: "https://www.asrock.com/mb/AMD/B450M%20Steel%20Legend/index.asp#BIOS1"
  scan_interval: 30
  sensor:
  - name: bios3
    select: 'table tbody tr:nth-child(1) td:nth-child(1)'

and the only thing I can get is a value from the “Manual” section…


What am I doing wrong?

I also tried Copy selector and deleting the tbody part, but then the sensor is unknown:

#BIOS > table > tr:nth-child(1) > td:nth-child(1)

@Szaman there are other tables in the page and the scraper extracts the first it finds. That is why you should be more specific with the selector.
Your second try was correct but you shouldn’t delete the tbody part.
Try again with

select: '#BIOS > table > tbody > tr:nth-child(1) > td:nth-child(1)'
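The suggested selector as a complete config. Two changes are editorial choices, not part of the suggestion: the `#BIOS1` fragment is dropped from the URL, because the part after `#` is never sent to the server and so cannot change the HTML the scraper receives; and `scan_interval` is raised, since BIOS releases are infrequent. Whether the BIOS table is present in the server-sent HTML at all (rather than loaded later by script) is an assumption worth checking in the page source:

```yaml
multiscrape:
  - resource: "https://www.asrock.com/mb/AMD/B450M%20Steel%20Legend/index.asp"
    scan_interval: 86400  # once a day
    sensor:
      - name: bios3
        # keep tbody: this page's source is assumed to contain it
        select: '#BIOS > table > tbody > tr:nth-child(1) > td:nth-child(1)'
```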

@iulisir Unfortunately, with this selector the value is still unknown.

With ['glyph', 'winddir', 'NE'], if you want just the wind direction, use:

{{ value[2] }}

If you want glyph winddir N, use:

{{ value|join(" ") }}
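Put together, the attribute from the original config with `select` and a template picking out the direction. This assumes `value` reaches the template as a list, as the `['glyph', 'winddir', 'NE']` output suggests. `value[-1]` is a small variation on `value[2]` that takes the last class, so it does not depend on the direction always being third. With `select`, only the first matching span is used, so this gives the first day’s direction only:

```yaml
    # resource and scan_interval: unchanged from the config above
    sensor:
      - unique_id: meteoblue_next_days
        name: Meteoblue Next Days
        select: ".day_long"
        value_template: "{{ value }}"
        attributes:
          - name: "Wind Direction"
            select: ".wind.dir > span"         # first matching span only
            attribute: "class"                 # e.g. ['glyph', 'winddir', 'NE']
            value_template: "{{ value[-1] }}"  # last class, e.g. NE
```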