Scrape sensor improved - scraping multiple values

@mobile.andrew.jones That’s not the problem. The problem is that I need to use select_list, and it returns null.

@Szaman The problem is not with the scraper but with the site. If you open View Source for the page, you won’t find the data you’re looking for, because the data is retrieved dynamically on click, probably via AJAX or something similar.
So the scraper can’t retrieve something that doesn’t exist in the page source.
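A quick way to run the same check outside the browser is to download the page the way a scraper does (without running JavaScript) and search the raw HTML for the value you see on screen. Below is a minimal Python sketch that uses only the standard library. The names `fetch_raw_html` and `value_in_raw_source` and the sample markup are illustrative. The User-Agent header is an assumption, since some sites respond differently to scripts.

```python
from urllib.request import Request, urlopen


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Download the page as the server sends it, i.e. what View Source shows.

    No JavaScript runs here, so content loaded later via AJAX is missing,
    just as it is for a scrape sensor.
    """
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def value_in_raw_source(raw_html: str, expected_text: str) -> bool:
    """True if the text you want to scrape appears anywhere in the raw HTML."""
    return expected_text in raw_html


# Offline demo with hypothetical markup: the value is filled in by a script
# after a click, so the raw HTML only holds an empty placeholder.
raw = '<div id="result"></div><script>loadResult()</script>'
print(value_in_raw_source(raw, "42.5"))  # False: a scraper cannot see it
```

Against a live site, you would call `value_in_raw_source(fetch_raw_html(url), "...")` with a value copied from the rendered page. A `False` result suggests the data is added dynamically.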

I understand now. This one is a lost cause, but you’ve opened the door to a few others I had trouble scraping. I didn’t know about the View Source trick. Thank you for your help.

@iulisir Could I kindly ask you if you could look at this case?

I’m using this selector, but the sensor value is still unknown:

multiscrape:
  - resource: "http://www.herospeed.net/en/index.php?m=content&c=index&a=lists&catid=14"
    scan_interval: 30
    sensor:
      - name: NVR
        select: "body > div.viewport.J-viewport > table > tbody > tr:nth-child(14) > td:nth-child(1)"

I tried with and without tbody, plus tons of other selectors. The data is visible in the site’s View Source.

@Szaman I looked into your HA code: there’s nothing wrong with it. But if we look closely at the site’s source, we see that the HTML is broken, so the scraper can’t scrape the data you’re asking for.
Let me explain:
In HTML, every opening tag should have a corresponding closing tag, e.g. <tr> should have a matching </tr>.
Browsers are forgiving of badly written code: if the programmer forgot to close some tags, the browser closes them for him. That is why, if you look at the page’s code with Inspect, you find clean, curated code, and of course you take the selector you need from that curated code.
But View Source reveals the raw code of the page as it was written, good or bad. In our case, bad.

In short, Inspect shows the data in the 14th TR tag ( tr:nth-child(14) ), but in View Source the 14th TR tag doesn’t exist. Some opening TRs are missing, and the scraper can’t cope with that because it doesn’t know what to do with the broken tags.

Let me exemplify:
[Screenshots: the table as shown by Inspect, and the same table in View Source]
Fun fact: Inspect shows 22 opening <tr tags, but View Source (the raw code) contains only 6. Really bad code…
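You can count the tags in the raw markup yourself. Below is a minimal sketch using Python’s built-in `html.parser`. Like View Source, it reports tags exactly as they are written and does not repair them. The sample table is made up to mimic missing opening `<tr>` tags.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts opening and closing tags as they appear in the raw markup."""

    def __init__(self) -> None:
        super().__init__()
        self.opened = Counter()
        self.closed = Counter()

    def handle_starttag(self, tag, attrs):
        self.opened[tag] += 1  # tag names arrive lower-cased

    def handle_endtag(self, tag):
        self.closed[tag] += 1


def count_tags(raw_html: str, tag: str) -> tuple:
    """Return (number of <tag ...> openings, number of </tag> closings)."""
    counter = TagCounter()
    counter.feed(raw_html)
    counter.close()
    return counter.opened[tag], counter.closed[tag]


# Hypothetical broken table: three rows end with </tr>, but only one <tr> opens.
broken = "<table><tr><td>a</td></tr><td>b</td></tr><td>c</td></tr></table>"
print(count_tags(broken, "tr"))  # (1, 3)
```

If you feed it the raw source from View Source and the two numbers differ by a lot, positional selectors such as `tr:nth-child(14)` copied from Inspect are unlikely to match.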

Thank you very much for your explanations. It’s clearer to me now. Sad that the code on the NVR manufacturer’s website is crap. Maybe I should be concerned about the software and network security of my camera recorder :smiley:

Glad I could help. I didn’t get an answer to my question, but at least I helped you :smiley:
As for the NVR manufacturer… they probably outsourced the site’s programming to a cheap company :slight_smile: Let’s hope they don’t do the same with their main business :joy:

Hello, I read this thread carefully and tried on my own to scrape some French train arrivals, but I didn’t succeed…
The URL I would like to scrape is: Prochains trains au départ de votre gare | Transilien (“Next trains departing from your station”)

I would like to get the times and the gate. Is that even possible?

Thanks!

Hey,
I have a problem with the integration too.

This is my configuration of the integration:


 - platform: scrape
   resource: https://www.pegelonline.wsv.de/gast/stammdaten?pegelnr=23700600#graphs?id=2cb8ae5b-c5c9-4fa8-bac0-bb724f2754f4
   name: Pegel Rhein Speyer 2
   select: "#content2 table:nth-child(4) tr:nth-child(3) td:nth-child(2)"
   unit_of_measurement: "cm"

Logger: homeassistant.components.scrape.sensor
Source: components/scrape/sensor.py:154
Integration: scrape (documentation, issues)
First occurred: 13:53:01 (38 occurrences)
Last logged: 14:01:33

  • Unable to extract data from HTML for Pegel Rhein Speyer
  • Unable to extract data from HTML for Pegel Rhein Speyer 2

Can anyone please help me with my problem? Thanks.
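One detail in this configuration is worth checking. This is an observation, not a confirmed cause of the error. Everything after `#` in the `resource` URL is a fragment, which the client never sends to the server. The scraper therefore receives the same HTML as for the URL without `#graphs?id=...`, and whatever that fragment displays is handled in the browser. Python’s `urllib.parse` shows how the URL splits:

```python
from urllib.parse import urlsplit, urlunsplit

url = (
    "https://www.pegelonline.wsv.de/gast/stammdaten?pegelnr=23700600"
    "#graphs?id=2cb8ae5b-c5c9-4fa8-bac0-bb724f2754f4"
)

parts = urlsplit(url)
print(parts.query)     # pegelnr=23700600
print(parts.fragment)  # graphs?id=2cb8ae5b-c5c9-4fa8-bac0-bb724f2754f4

# What is actually requested over HTTP: the same URL with the fragment dropped.
requested = urlunsplit(parts._replace(fragment=""))
print(requested)  # https://www.pegelonline.wsv.de/gast/stammdaten?pegelnr=23700600
```

It is worth comparing the selector against View Source of `requested`, not against what the browser shows for the full URL.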

I keep getting unknown as my sensor’s value.

This is what I used:
select: "#app > div > main > div > div > div:nth-child(1) > div.col-md-8.col-lg-8.col-xl-8.col-12 > div > div.v-card__text > div > div:nth-child(7) > span.float-right.item-value.item-width > span"

Hi,
I am trying to get the values of the table on this website
(“Drawbridge openings in Saint Petersburg in 2021 - bridge opening schedule, timetable”)

There are two values in one <td>, separated by a <br>, and I can’t manage to get them into different sensors.

Could you please help me?

Thanks in advance
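The two values are separate pieces of text on either side of the `<br>`. When a scraper takes the cell’s combined text, they typically end up glued together. Below is a minimal standard-library Python sketch that keeps them apart by starting a new text segment at each `<br>`. `CellSplitter`, `split_cells` and the sample times are hypothetical and not taken from the site.

```python
from html.parser import HTMLParser


class CellSplitter(HTMLParser):
    """Collects each <td>'s text, starting a new segment at every <br>."""

    def __init__(self) -> None:
        super().__init__()
        self.cells = []      # one list of text segments per <td>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append([""])
        elif tag == "br" and self._in_td:
            self.cells[-1].append("")  # text after <br> goes in a new segment

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1][-1] += data


def split_cells(raw_html: str) -> list:
    parser = CellSplitter()
    parser.feed(raw_html)
    parser.close()
    return [[segment.strip() for segment in cell] for cell in parser.cells]


row = "<tr><td>Bridge A</td><td>01:25<br>04:50</td></tr>"
print(split_cells(row))  # [['Bridge A'], ['01:25', '04:50']]
```

Each segment could then feed its own sensor. Self-closing `<br/>` works too, because `HTMLParser` reports it as a start tag followed by an end tag, and the end tag is ignored here.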

My Home Assistant configuration is split into multiple files. Multiscrape runs fine when it is in configuration.yaml, but I can’t get it to run in the split variant.

Can someone please help me?

Thanks in advance :slight_smile:

multiscrape:
  - resource: https://soziales.hessen.de/gesundheit/corona-in-hessen/taegliche-uebersicht-der-bestaetigten-sars-cov-2-faelle
    scan_interval: 21600
    sensor:
      - unique_id: hkm_ffm_name
        name: HKM Name des Landkreises
        select: "tr:nth-of-type(25) td:nth-of-type(1)"
      - unique_id: hkm_7_tage_inzidenz
        name: HKM 7 Tage Inzidenz
        select: "tr:nth-of-type(25) td:nth-of-type(7)"

Try this:
select: ".next-departure-result-page__departure-time"

or this one (quoted, because in YAML an unquoted value starting with # is treated as a comment):
select: "#next-departure-result-page-app > div > div.nd-result__body > div.tn-accordion.next-departure-result-page.hidden-xs .next-departure-result-page__departure-time-container"

Hi, I’m trying to scrape my Generac PwrCell inverter with the custom multiscrape component. I can correctly scrape the solar panel production (Energy Today), but I can’t for the life of me get the Battery Charge %. I’m not very proficient in CSS and don’t know what I’m doing wrong. Sorry if I posted in the wrong place; I wasn’t sure where this question should be asked.

Here’s my current code:

multiscrape:
  - resource: https://pwrcell.generac.com/users/XXXXX/dashboard
    username: "user"
    password: "pass"
    sensor:
      - select: "#content > div > div > div > div.row-fluid.dash-statistics > div:nth-child(1) > div:nth-child(2) > h4"
        value_template: '{{ value.rstrip(" kWh") | float }}'
        unit_of_measurement: "kWh"
        unique_id: generac_inverter_energy_today
        name: Generac Energy Today
  - resource: https://pwrcell.generac.com/users/XXXXX/dashboard
    username: "user"
    password: "pass"
    sensor:
      - select: "#content > div > div > div > div.row-fluid.dash-statistics > div:nth-child(3) > div:nth-child(1) > h4"
        unique_id: generac_inverter_home_backup_charge
        value_template: '{{ value.rstrip("%") | float }}'
        name: Generac Backup Charge %
        unit_of_measurement: "%"
        on_error:
          log: debug
          value: last

Here’s a screenshot of the page’s CSS.

The highlighted part is the part that includes the Battery %. If I right-click and choose Copy > Copy selector, I only get #battery-charge, and using the id on its own doesn’t seem to work. I’ve also tried putting the full CSS path in my YAML and adding the id at the end, but that doesn’t work either. Does anyone have tips, or is there something I’m overlooking? I’ve tried multiple variations and even tried to get the value from another place on the dashboard, but I’m still stuck. Any help would be greatly appreciated!
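An id selector matches the element’s `id` attribute anywhere in the document, so `#battery-charge` on its own is a valid selector. If it returns nothing or the wrong text, one likely cause is that the raw HTML the scraper downloads differs from what Elements shows. Below is a minimal standard-library Python sketch of an id lookup on raw HTML. `text_by_id` and the sample markup, including the `87%` value, are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collects the text of the first element whose id equals target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while inside the target element
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(raw_html: str, target_id: str):
    """Return the element's text, or None if no element has that id."""
    parser = TextById(target_id)
    parser.feed(raw_html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None


raw = ('<div class="row-fluid dash-statistics"><div>'
       '<h4 id="battery-charge" class="title text-success">87%</h4>'
       '</div></div>')
print(text_by_id(raw, "battery-charge"))  # 87%
print(text_by_id(raw, "missing"))         # None
```

Running it on the page’s raw source, instead of the sample string, shows what text the `#battery-charge` element actually contains before any JavaScript runs.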

Use the Chrome CSS selector tester plugin. This scrape is harder to set up, because the selector needs to start much higher up in the page structure.
As I see it, it needs to start with #container-fluid.
You can test it with that selector and see when the element is highlighted.

Would you happen to have a link to the plugin? It’s not available in the Chrome Web Store anymore, and the GitHub file doesn’t seem to work.

EDIT: Never mind, I found it and I’m playing around with it now. I can see the element highlighted, but it still isn’t working. Here’s the selector I tried:

#content > div.container-fluid > div.row-fluid > div.span12 > div.row-fluid.dash-statistics > div:nth-child(3) > div:nth-child(1) > H4#battery-charge.title.text-success

I get the following error in the log, but no further detail:

Logger: custom_components.multiscrape.sensor
Source: custom_components/multiscrape/sensor.py:139 
Integration: Multiscrape scraping component (documentation, issues) 
First occurred: 7:07:40 PM (8 occurrences) 
Last logged: 7:14:40 PM

Sensor Generac Battery Charge Percent was unable to extract data from HTML

I think you’re scraping it wrong;
#content is not a class…

Sadly I can’t test it for you, because it’s not a public site…

You need to use #container-fluid.

Try something like this:

#container-fluid > div.container-fluid > div.row-fluid > div.span12 > div.row-fluid.dash-statistics > div:nth-child(3) > div:nth-child(1)

Use the CSS selector plugin for Chrome, it helps a lot.
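In CSS selectors, `#name` matches an element whose `id` is `name`, while `.name` matches an element that has `name` among its classes. So `#container-fluid` and `div.container-fluid` select different things. Below is a toy Python matcher for a single compound selector such as `div.row-fluid.dash-statistics`. The `matches` helper is hypothetical. Real scrapers use a full CSS engine that also handles combinators such as `>` and pseudo-classes such as `:nth-child`.

```python
import re


def matches(selector: str, tag: str, attrs: dict) -> bool:
    """Check one compound selector (tag, #id, .class parts) against one element."""
    tokens = re.findall(r"[#.]?[\w-]+", selector)
    classes = attrs.get("class", "").split()
    for token in tokens:
        if token.startswith("#"):
            if attrs.get("id") != token[1:]:      # #name -> id attribute
                return False
        elif token.startswith("."):
            if token[1:] not in classes:          # .name -> one of the classes
                return False
        elif token.lower() != tag.lower():        # bare word -> tag name
            return False
    return True


element = ("div", {"class": "container-fluid"})
print(matches("div.container-fluid", *element))  # True
print(matches("#container-fluid", *element))     # False: no id="container-fluid"
```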

Thanks, that kind of worked. I had to remove #container-fluid and start at div.container-fluid, but the scraped value is “No Battery”. In the Sources tab of the Chrome inspector, “No Battery” is what appears, but in Elements I do see the percentage.
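This looks like the same situation as earlier in the thread. The raw HTML holds the placeholder “No Battery”, and the percentage is apparently filled in later by the browser. The sensor’s `value_template` (`value.rstrip("%") | float`) therefore receives text that is not a number. Below is a minimal Python sketch of that conversion step. `parse_percent` is a hypothetical helper that returns `None` instead of failing.

```python
def parse_percent(text: str):
    """Turn scraped text like '87%' into 87.0; return None for non-numeric text.

    str.rstrip("%") removes trailing '%' characters, as in the value_template
    above; it treats its argument as a set of characters, not a suffix.
    """
    cleaned = text.strip().rstrip("%").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


print(parse_percent("87%"))         # 87.0
print(parse_percent("No Battery"))  # None
```

In Home Assistant itself, a `float` filter given non-numeric text either falls back to a default or raises an error, depending on the version and on whether a default is supplied, e.g. `| float(0)`. Either way, no selector change will produce a percentage that is not in the downloaded HTML.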

Add this: > title text-success >

Is there a way to send the full HTML, as a file or something, so I can check it for you?

Send me the exact selector you’re using right now that gives you “No Battery”.

Or change div:nth-child(3) to div:nth-child(2).