Scrape sensor improved - scraping multiple values

Did you change the 123456 from the url?
I’m not at home for the next few days, so I have limited access to my config.

I sure did, but I realised what my issue is. I’m on a peak/off-peak plan, so there are four daily rates, not one. That means your select won’t work for me, as my table is a little more complex. I’ll post a screenshot here and see if anyone can assist with the right select.

It would be more helpful to post logs and maybe the “JS path” - my two cents

Will do. Here are the JS paths, in case they help anyone define the correct select.

Off-peak Rate (both can be found in two locations)

document.querySelector("#main_container > div > div.row > div.rates-table-container.base-rates-table > table > tbody > tr:nth-child(1) > td.base-rates.current > span.rate.gst_inclusive > div")

document.querySelector("#unit-balance-container > div.tou-consumer-detail.white-box > section.bin-times > section > section > div.bin-detail-bar > span.bin-detail.bin-width-14 > span.expanded > span.rate")

Peak Rate

document.querySelector("#main_container > div > div.row > div.rates-table-container.base-rates-table > table > tbody > tr:nth-child(2) > td.base-rates.current > span.rate.gst_inclusive > div")

document.querySelector("#unit-balance-container > div.tou-consumer-detail.white-box > section.bin-times > section > section > div.bin-detail-bar > span:nth-child(2) > span.expanded > span.rate")

Try to perform those steps and post page_soup.txt if you are still stuck.


Cracked it! For those wanting to scrape their pricing (standard rates) if they have a peak and off-peak plan with Powershop in NZ, here is the full working code.

[EDIT] 9th November 2022. Adjusted code so that the peak and off-peak dollar values can be used in the HA Energy dashboard.

- resource: 'https://secure.powershop.co.nz/rates'
  name: Powershop
  log_response: true
  scan_interval: 43200 #every 12hrs
  form_submit:
    submit_once: true
    resource: 'https://secure.powershop.co.nz'
    select: ".content > form"
    input:
      email: !secret powershop_user
      password: !secret powershop_pass
  sensor:
    - unique_id: powershop_offpeak
      name: Powershop Off Peak
      select: "#main_container > div > div.row > div.rates-table-container.base-rates-table > table > tbody > tr:nth-child(1) > td.base-rates.current > span.rate.gst_inclusive"
      unit_of_measurement: "NZD/kWh"
      value_template: '{{ ((value | int) / 100) | round(3) }}'
      device_class: monetary     
    - unique_id: powershop_peak
      name: Powershop Peak
      select: "#main_container > div > div.row > div.rates-table-container.base-rates-table > table > tbody > tr:nth-child(2) > td.base-rates.current > span.rate.gst_inclusive"      
      unit_of_measurement: "NZD/kWh"
      value_template: '{{ ((value | int) / 100) | round(3) }}'
      device_class: monetary   
    - unique_id: powershop_daily_charge
      name: Powershop Daily Charge
      select: "#main_container > div > div.row > div.rates-table-container.base-rates-table > table > tbody > tr:nth-child(4) > td.base-rates.current > span.rate.gst_inclusive"      
      unit_of_measurement: "NZD/kWh"
      value_template: '{{ ((value | int) / 100) | round(3) }}'
      device_class: monetary      
      on_error:
        log: warning
        value: last
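
A note on value_template arithmetic: in Jinja, a filter binds more tightly than `/`, so filters written after a divisor act on the divisor alone unless the division is parenthesised. A minimal sketch of the difference follows. The sensor name and selector are hypothetical and not part of the config above, and it assumes the scraped text is a number in cents that may carry decimals, hence `float` rather than `int`.

```yaml
sensor:
  - unique_id: example_rate        # hypothetical, for illustration only
    name: Example Rate             # hypothetical
    select: "span.rate"            # hypothetical selector
    unit_of_measurement: "NZD/kWh"
    # Without parentheses this parses as
    #   (value | int) / (100 | float | round(3))
    # so only the literal 100 is converted and rounded:
    # value_template: "{{ value | int / 100 | float | round(3) }}"
    # With parentheses, round(3) applies to the result of the division:
    value_template: "{{ ((value | float) / 100) | round(3) }}"
```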


Just spent multiple hours scanning this thread. First, this is a super cool add-in, but it is fraught with issues getting it to work because so much is happening under the hood.

I am not able to get past the form_submit and suspect the issue is with my select: parameter.
The error is:

Scraper_noname_0 # Exception in form-submit feature. Will continue trying to scrape target page. Could not find form

So what is wrong with my form_submit code?

multiscrape:
  - resource: "https://eyeonwater.ca"
    scan_interval: 3600
    log_response: true
    form_submit:
      submit_once: true
      resource: "https://eyeonwater.ca/signon"
      select: "signin_account"     # or "#signin_account" ???
      input:
        email: !secret eyeonwater_username
        password: !secret eyeonwater_password
      #  extra: optional-extra-fields-to-submit
    sensor:
      - unique_id: water_consumption_daily
        name: Daily Water Consumption
        select: "#your-barnacle > div > dl > dd:nth-child(4) > div.compound-read-low > ul"
        on_error:
          log: warning
          value: last

logger:
  default: info
  logs:
    custom_components.multiscrape: debug

A few comments / questions for @danieldotnl:

  1. Many posts are related to getting past a website login. For me, the documentation is unclear about how to set this up properly, specifically what to enter for the form_submit select:. In my case above, the form is ‘signin_account’. Someone on Discord said I need to include the hashtag (‘#signin_account’). It would also be helpful to clarify how the user/password data fields under form_submit are mapped. If the form names the field as an email, do we use email: as opposed to username:? I didn’t find this distinction in the documentation.
  2. Is there any difference between wrapping text entries in double quotes, wrapping them in single quotes, or leaving the quotes out entirely? I have seen every combination reading through this thread.
  3. One of my use cases is to scrape my investment account, but they periodically put up a service message after login that requires a mouse click to get to the main account page. Is there any way to get through this? FYI, I am currently scraping this account using the free version of Microsoft Power Automate, which doesn’t allow scheduling of the automation, so I run it manually every day. It’s definitely a lot easier to debug in a WYSIWYG environment. IMO web scraping is so fundamental to HA that perhaps we may evolve to a similar tool to generate and debug the YAML code…

I am having issues getting through the login to my water company. I took a look at the Powershop website and am curious how you came up with
select: ".content > form"

I am trying to figure out the correct entry to login to eyeonwater.ca

I just copied what a fellow Kiwi had done, as I had no clue: Scrape sensor improved - scraping multiple values - #282 by joem

I’m glad that you’ve got it working @xbmcnut
Question: did you try to trim the select content to something like this: dd:nth-child(4) > div.compound-read-low > ul? Just to make the code shorter.

@KevinE try using fieldset in the select

Thanks for your comments. Would be great if you can help to improve the documentation! (either by a PR on github or by sending me your suggested changes)

Here are my answers to your questions:

  1. To find the form on the HTML page, multiscrape needs a CSS selector. CSS selectors that refer to the id of an element, always require a hashtag.
    For retrieving the input fields of the form, the name is being used, as this is also what is submitted.
    So in your case it means:
    form_submit:
      submit_once: true
      resource: "https://eyeonwater.ca/signon"
      select: "#signin_account"
      input:
        username: !secret eyeonwater_username
        password: !secret eyeonwater_password
  2. https://stackoverflow.com/questions/19109912/yaml-do-i-need-quotes-for-strings-in-yaml
  3. Try to find out if it loads a new page after the mouse click (then use that one for scraping) or check (in your browser developer tools) if the values you want to scrape are already retrieved from the server. In that case, multiscrape is not bothered by the mouse click but just continues scraping.
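
The selector, field-name, and quoting answers above can be sketched in one fragment. This is only an illustration: it assumes the login form has the id `signin_account` and that its inputs carry the name attributes `email` and `password`. Those field names are assumptions and should be checked against the page in the browser's developer tools.

```yaml
form_submit:
  submit_once: true
  resource: "https://eyeonwater.ca/signon"
  # An id selector needs the leading "#", and the value must be quoted.
  # Unquoted, YAML treats a "#" that follows whitespace as the start of
  # a comment, which would leave select empty.
  # Single quotes ('#signin_account') work equally well here; the two
  # quote styles differ only in how escape sequences are handled.
  select: "#signin_account"
  input:
    # Each key is matched against the name attribute of a form input,
    # e.g. <input name="email">. "email" is an assumed field name.
    email: !secret eyeonwater_username
    password: !secret eyeonwater_password
```

If the form's input is actually named `username`, the key would be `username:` instead; the key follows the markup, not the kind of value being submitted.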

Hi @danieldotnl

Thanks for a great home assistant add-on.
Can you please assist? I have been battling to get the right selector for my scrape, but it keeps giving me the same error at the same position: the ID=4. It doesn’t like the “4”, and if I use “#4” it doesn’t like that either. What is the right selector for this scrape, please? Below is the page; the yellow highlight marks what I am looking for at ID=4, which was “86.0%”.


Here is one of the many selectors I tried, in yellow:
Battery Percent 3
And here is one of the logs pointing at the “4” as the problem, which I have seen with plenty of different selector orders.

I hope this is enough information to assist me.
Thanks

Thanks for the response Daniel.

Don’t know if you looked at the logon page for eyeonwater.ca but there is no “username” field (just email address) so how are you mapping my email address to the correct field?

Still can’t get it to work. I get one error and one warning in my system log. I think I am not getting authenticated, but I’m not sure (a debug message saying whether authentication passed or failed would be good to log). The same error below is logged when I use an incorrect password.

Logger: custom_components.multiscrape.coordinator
Source: custom_components/multiscrape/coordinator.py:62
Integration: Multiscrape scraping component ([documentation](https://github.com/danieldotnl/ha-multiscrape), [issues](https://github.com/danieldotnl/ha-multiscrape/issues))
First occurred: 9:19:51 AM (1 occurrences)
Last logged: 9:19:51 AM

Scraper_noname_0 # Exception in form-submit feature. Will continue trying to scrape target page. Could not find form

And this warning:

Logger: custom_components.multiscrape.sensor
Source: custom_components/multiscrape/sensor.py:163
Integration: Multiscrape scraping component ([documentation](https://github.com/danieldotnl/ha-multiscrape), [issues](https://github.com/danieldotnl/ha-multiscrape/issues))
First occurred: 9:29:00 AM (1 occurrences)
Last logged: 9:29:00 AM

Scraper_noname_0 # Daily Water Consumption # Unable to scrape data: Could not find a tag for given selector. Consider using debug logging and log_response for further investigation.

I tried commenting out the sensor portion of my code to try and isolate an authentication issue and no errors showed in the log. Is this expected behaviour?

Cheers

Perhaps the ‘4’ is interpreted as a number instead of a string. I don’t know how to get around this, however. You might try writing a basic Python script with bs4 to test the selector outside Home Assistant?

Something like I did:

And @KevinE. Thanks for the tips but both of those suggestions are above my pay grade :grimacing:

Do you get the page_response_body.txt or page_soup.txt generated?

Yes, not sure what I do with these. I have scanned them both but there is no rendered data in them.

Well, an id attribute of “4” is illegal in HTML 4 and XML, where an ID follows the rules for a name token and must begin with a letter. See Basic HTML data types.
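
Whatever the markup rules say, a CSS identifier cannot begin with a digit, which would explain why both `4` and `#4` are rejected as selectors. One possible workaround, sketched here with a hypothetical sensor name and untested against the page in question, is an attribute selector, which compares the id as a plain string:

```yaml
sensor:
  - unique_id: battery_percent     # hypothetical, for illustration only
    name: Battery Percent          # hypothetical
    # [id="4"] matches the element whose id attribute equals "4".
    # It avoids the "#4" form, which is not a valid CSS identifier.
    # Single quotes outside let the double quotes stay unescaped.
    select: '[id="4"]'
```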

Guys, hoping someone can help me out here.

I am trying to get the realtime electricity price from here: Price Information

The table moves every half an hour and shows the past (real price), present and future (forecasted price). The price that I want is the immediate past, which is always the second row, fifth column (USEP ($/MWh)).

From Chrome Inspect, I get

#realtimeWindow > div > div.tabberlive > div:nth-child(2) > div > div > div.realtimeTableContainer > table > tbody > tr:nth-child(2) > td:nth-child(5)

Here is my configuration.yaml

multiscrape:
  - resource: https://www.emcsg.com/marketdata/priceinformation
    scan_interval: 30
    sensor:
      - unique_id: electricity_usep_price
        name: Electricity USEP Price
        select: "#realtimeWindow > div > div.tabberlive > div:nth-child(2) > div > div > div.realtimeTableContainer > table > tbody > tr:nth-child(2) > td:nth-child(5)"
        #value_template: '{{ (value.split(":")[1]) }}'

I tried both WITH and WITHOUT tbody but got the same error in the log. My log is already set to DEBUG mode.

This error originated from a custom integration.

Logger: custom_components.multiscrape.sensor
Source: custom_components/multiscrape/sensor.py:163
Integration: Multiscrape scraping component (documentation, issues)
First occurred: 17:01:39 (9 occurrences)
Last logged: 17:05:41

Scraper_noname_0 # Electricity USEP Price # Unable to scrape data: Could not find a tag for given selector. Consider using debug logging and log_response for further investigation.

What am I missing?

@Dex
Try this:
select: ".realtimePriceTable tr:nth-child(2) td:nth-child(5)"
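
In context, that suggestion might look like the sketch below. It reuses the sensor from the question and assumes the table carries the class `realtimePriceTable`, as the suggested selector implies. `log_response: true` is added so that files such as page_soup.txt are produced for inspection if the selector still finds nothing.

```yaml
multiscrape:
  - resource: https://www.emcsg.com/marketdata/priceinformation
    scan_interval: 30
    log_response: true   # produce the response files for debugging
    sensor:
      - unique_id: electricity_usep_price
        name: Electricity USEP Price
        # Spaces (descendant combinators) instead of ">" mean the match
        # does not depend on every intermediate element, such as tbody,
        # being present in the fetched HTML.
        select: ".realtimePriceTable tr:nth-child(2) td:nth-child(5)"
```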