Scrape sensor improved - scraping multiple values

@danieldotnl I have a question. I want to scrape some data from a website, for example
www.vodafone.com, but also some info from a second page: Bestel je iPhone 12 bij Vodafone

I want the data from the second link as an attribute of the first sensor.
Is that possible?

For this you would need to create 2 multiscrape sensors and combine them with a template sensor.
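
A minimal sketch of that approach, assuming two pages and made-up selectors. The URLs, selectors, and sensor names below are placeholders and not taken from the question; the template sensor reads the two scraped sensors by the entity IDs Home Assistant would normally derive from their names.

```yaml
# Sketch only: URLs, selectors, and names are hypothetical placeholders.
multiscrape:
  - resource: "https://www.example.com/page-one"   # first page (placeholder)
    scan_interval: 3600
    sensor:
      - unique_id: page_one_value
        name: Page One Value
        select: ".price"                            # hypothetical selector
  - resource: "https://www.example.com/page-two"   # second page (placeholder)
    scan_interval: 3600
    sensor:
      - unique_id: page_two_value
        name: Page Two Value
        select: ".stock"                            # hypothetical selector

template:
  - sensor:
      - name: Combined Value
        # The state comes from the first scraped sensor...
        state: "{{ states('sensor.page_one_value') }}"
        attributes:
          # ...and the second scraped sensor is exposed as an attribute.
          second_page: "{{ states('sensor.page_two_value') }}"
```

If the entity IDs end up different in your installation, adjust the two `states(...)` calls accordingly.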

Thx for your reply!

Hi,
I receive this type of error in my log. Can someone explain what this error is about?

I used select “p”, but now my value is in the “<body>” tag. How can I retrieve it?

If I use select: "body" it doesn’t work.

Hi,

I am trying to use multiscrape to load a table of data into the attributes of a sensor (one attribute would be a list of times, the other a list of corresponding forecast temperatures) so that I can then plot these forecasts in Lovelace using ApexCharts. Ideally I would get a sensor which looks a little like this:

TempNow: 25.6
ForecastTimes:
  - '18:00'
  - '19:00'
  - '20:00'
...
ForecastTemps:
  - 25
  - 24
  - 23
...

And at the moment, the best I can come up with is this:

- name: in-počasi Praha
  resource: "https://www.in-pocasi.cz/predpoved-pocasi/cz/praha/praha-324/"
  scan_interval: 300
  sensor:
    - unique_id: inpo_praha_temperature
      name: In-Počasi Temperature Now
      select: ".alfa"
      value_template: "{{(value|trim)[:-2]|float}}"
      unit_of_measurement: °C
      attributes:
        - name: "Forecast Time"
          select: ".day-hour-ext.flex-column .day-hour-time"
          value_template: "{{value}}"
        - name: "Forecast Temperature"
          select: ".day-hour-ext .h-100 .font-weight-bold"
          value_template: "{{value}}"

The CSS selectors bring in the whole table row, looking like this according to BeautifulSoup:

[<div class="day-hour-time text-black font-weight-bold">19:00</div>,
 <div class="day-hour-time text-black font-weight-bold">20:00</div>,
 <div class="day-hour-time text-black font-weight-bold">21:00</div>,
...

but multiscrape only takes the first item. I’ve tried using some sort of list or .split() filters, but this just filters the first item and not the others.

Is it possible to do some sort of iteration/split the selected row of table data into a list of some sort, without defining a separate attribute for each data point in the forecast?

Interesting use case! It is not possible yet. Could you create a GitHub issue?

I’ve created a feature request here:

@danieldotnl has very kindly introduced a new feature in multiscrape 5.2, select_list, to deal with use cases such as scraping a data table into a sensor. Here is my example config to scrape the next 24h of weather forecast from my local provider:

- name: in-počasi Praha
  resource: "https://www.in-pocasi.cz/predpoved-pocasi/cz/praha/praha-324/"
  scan_interval: 300
  sensor:
    - unique_id: inpo_praha_temperature
      name: In-Počasi Temperature Now
      select: ".alfa"
      value_template: "{{(value|trim)[:-2]|float}}"
      unit_of_measurement: °C
      attributes:
        - name: "Forecast Time"
          select_list: ".day-hour-ext.flex-column .day-hour-time"
          value_template: |
            {%-set value = value.split(",")-%}
            {%for x in value%}
            - {{x}}
            {%-endfor-%}
        - name: "Forecast Temperature"
          select_list: ".day-hour-ext .h-100 .font-weight-bold"
          value_template: |
            {%-set value = value.split(",")-%}
            {%for x in value%}
            - {{x[0:-2]|float}}
            {%-endfor-%}

Hi all, can anyone help me?

I want to extract this:

I use multiscrape.

Example:

multiscrape:
  - resource: https://www.tmb.cat/ca/barcelona/metro/-/lineametro/estacion/224
    scan_interval: 60
    sensor:
      - unique_id: l2-metro-santroc
        name: Prox metro
        #NO select: "#detall-panel-1 > section.imetro > div.imetro-content > div > ul > li:nth-child(2) > div > span:nth-child(1)"
        #NO select: "#detall-panel-1 > section.imetro > div.imetro-content > div.imetro-item > ul.next > li.next-item > div.next-item__info > span"
        #NO select: "#detall-panel-1 > section.imetro > div.imetro-content > div.imetro-item > ul.next > li.next-item:nth-child(2) > div.next-item__info > span:nth-child(1)"

I tried the three selects above (removing the #NO prefix), but each one returns “unknown”.

Can you help me?

It looks like the values are injected by JavaScript, so they are not in the HTML that multiscrape receives. The network monitor shows some promising JSON responses, though. If they contain the data you are looking for, you could retrieve it with the rest sensor.
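
As a rough sketch of that suggestion, a rest sensor could look like the following. The JSON endpoint and the `next_arrival` key are hypothetical; use the URL and field names you actually see in the browser's network monitor.

```yaml
# Sketch only: the endpoint and JSON key are hypothetical placeholders.
sensor:
  - platform: rest
    name: Prox metro
    resource: "https://www.example.com/api/station/224.json"
    scan_interval: 60
    # value_json is the parsed JSON response body.
    value_template: "{{ value_json.next_arrival }}"
```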

Thanks for your reply!!

I found the JSON, and I got the results with the rest platform. Thanks again!


I am trying to log in at https://login.ns.nl/login.
My config is:

multiscrape:
  - resource: 'https://www.ns.nl/mijnns#/dashboard'
    form_submit:
      submit_once: False
      resource: 'https://login.ns.nl/login'
      select: '#loginForm'
      input:
        email: 'EMAIL'
        password: 'PASSWORD'
    sensor:
      - unique_id: ns_test
        name: NS
        select: '#mijnns-app > mns-app > div > mijnns-dashboard > div > div > div.grid__unit.s-4-4.m-12-12.l-8-12.ng-tns-c157-0 > div > section > div > ovcp-overview > div > div:nth-child(1) > div > h3'

But I get the status “unknown”.


Hi @danieldotnl, I’m trying to add my (scraped) solar values to the new Energy Monitor.
There are a few requirements for that and one of them is the last_reset parameter.

      - name: Energy Solar Total
        device_class: energy
        state_class: measurement
        last_reset: '1970-01-01T00:00:00+00:00'
        unit_of_measurement: "kWh"
        icon: "mdi:solar-panel"
        select: "#param_link_15684226"
        value_template: "{{value.split(' ')[0] | replace(',', '.')}}"

Configuration invalid
Invalid config for [multiscrape]: [last_reset] is an invalid option for [multiscrape]. Check: multiscrape->multiscrape->0->sensor->6->last_reset. (See /config/configuration.yaml, line 55).

However, the last_reset parameter isn’t a valid option for Multiscrape sensor (yet).
Is it perhaps possible to add this parameter for the Multiscrape Component?
It would be very helpful.

Pre-release v5.4.0 supports fixed values for sensors/attributes by using the value_template and omitting a select. So you can add the last_reset attribute like this:

attributes:
  - name: last_reset
    value_template: '1970-01-01T00:00:00+00:00'
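
Put together with the sensor from the question, this might look like the sketch below. It assumes the other options are accepted as in the original config and simply moves last_reset from a sensor option to a fixed-value attribute.

```yaml
      - name: Energy Solar Total
        device_class: energy
        state_class: measurement
        unit_of_measurement: "kWh"
        icon: "mdi:solar-panel"
        select: "#param_link_15684226"
        value_template: "{{value.split(' ')[0] | replace(',', '.')}}"
        attributes:
          # No select here: the value_template acts as a fixed value (v5.4.0+).
          - name: last_reset
            value_template: '1970-01-01T00:00:00+00:00'
```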

Daniel, thanks again. Works perfectly.

Hey Guys,

I’m trying to scrape from this site https://covidlive.com.au to look at the New Cases in the Last 24 hours, but no matter which way I try, I get an invalid response from the integration.

e.g.

#content > div > div:nth-child(1) > section > table > tbody > tr:nth-child(2) > td.COL5.NET > span

doesn’t return the value in red, which at the moment is 80.

Any ideas what I’m doing wrong? It works perfectly for scraping from other sites.

image

and the logs show me this:

image

Hi,

Looks like something has broken since the core September updates: my sensors, which were working perfectly until the end of August, now all have the value ‘unknown’.

What is weird is that, in the debug log, the multiscrape integration happily shows the right values. For instance:

2021-09-04 11:39:16 DEBUG (MainThread) [custom_components.multiscrape.sensor] Sensor Température piscine selected: 26.0

The values just don’t seem to make it from multiscrape to the HA sensors.

I wonder if this has something to do with the recent changes in sensors for the new long-term statistics. I tried adding

        device_class: temperature
        state_class: measurement
        force_update: true

to no avail.

Any idea?

Please install the latest pre-release (or wait for a regular release). See:
https://github.com/danieldotnl/ha-multiscrape/issues/55 and
https://github.com/danieldotnl/ha-multiscrape/issues/50


Thanks a lot, works perfectly!

Enjoy your holiday!