Scrape sensor improved - scraping multiple values

The prelogin feature is not included yet, but this release prepares for it. I’ll be asking for beta testers soon!

Heyho,

First of all thank you very much for this component ^^
I just wanted to ask if you could implement an option to dismiss those error messages?

They occur occasionally, I don’t know why, and then the sensor value becomes Unknown. Could you catch this error and keep the old value until it works again? The device I scrape sometimes has downtime, and I don’t want the sensor to have no value.
Thank you :smiley:

First of all, could you please upgrade to the latest release?

Then the fundamental question is whether you really want to see the old value when the site you are scraping is down. Unless you frequently check the error logs, you might not notice that something is wrong and trust an outdated value… Is that really what you want?

It’s a beautiful day! :partying_face:
The much requested and long-desired form-submit (aka prelogin) functionality is now available for testing!
This functionality was inspired by @drogfild (thank you for your work!) but has been fully redesigned. It can be used for any form, not just for login, and it works with sessions, so in many cases you only need to submit the form (log in) once after a restart of Home Assistant.

An explanation of the form-submit functionality can be found on the wiki.

If you want to test this (and I hope many do), please download the files from the form-submit branch and copy them manually into your custom_components/multiscrape folder.

Please post general questions and comments in this forum thread and issues on GitHub.

Okay, I have updated to the latest version. I’m not sure whether the error will occur again, but it probably will, because it is a timeout exception and is related to the website.
In that sense, of course, you’re right, but I’m just monitoring my local weather station.
Since it doesn’t provide a REST API but only an HTTP website with live updates, I query the values every 15 seconds (especially because of wind gusts). Maybe a longer update interval would be enough. I mainly use it to retract the blinds outside, so the values are simply checked on each query. Since the weather station is actually always available, I wonder why a timeout occurs at all.
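For readers who do want to hold the last good value, one possible workaround outside the component is a trigger-based template sensor in Home Assistant that only updates when the scraped sensor reports a real value. This is a minimal sketch, not part of multiscrape; the entity id `sensor.weather_station_wind_gust` and the sensor name are hypothetical, and the `not_to` option assumes a reasonably recent Home Assistant release.

```yaml
template:
  - trigger:
      # Fire on state changes of the scraped sensor (hypothetical entity id),
      # but ignore changes to unknown/unavailable caused by a failed scrape.
      - platform: state
        entity_id: sensor.weather_station_wind_gust
        not_to:
          - unknown
          - unavailable
    sensor:
      # This sensor keeps its previous state until the next valid value arrives.
      - name: "Wind gust (last known)"
        state: "{{ trigger.to_state.state }}"
```

The caveat raised above still applies: automations reading this sensor cannot tell a fresh value from a stale one, so for something like retracting blinds it may be worth also checking how old the value is.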

Looking for some help… Is it possible to change the URL to scrape from based on an Input Text box?
I’ve opened a thread here

Basically, I want a text box where I can type something in, and that text becomes my custom search.
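A rough sketch of how this could look, assuming your multiscrape version offers a templated `resource_template` option (please check the options table in the readme for your release). The helper `input_text.search_term`, the URL, and the selector are all made up for illustration.

```yaml
input_text:
  search_term:                 # hypothetical helper holding the search text
    name: Search term

multiscrape:
  # Assumption: resource_template is rendered on every update,
  # so the URL follows whatever is typed into the text box.
  - resource_template: "https://example.com/search?q={{ states('input_text.search_term') | urlencode }}"
    scan_interval: 3600
    sensor:
      - name: search-result
        select: ".result"      # placeholder selector
```

The `urlencode` filter is there so that spaces and special characters in the typed text do not break the URL.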

Hi Daniel…

I’ve tried using this with your default template and getting these errors…

Logger: homeassistant.components.sensor
Source: custom_components/multiscrape/sensor.py:41
Integration: Sensor (documentation, issues)
First occurred: 16:50:02 (2 occurrences)
Last logged: 16:50:02

Error while setting up multiscrape platform for sensor
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 205, in _async_setup_platform
    await asyncio.shield(task)
  File "/config/custom_components/multiscrape/sensor.py", line 41, in async_setup_platform
    if rest.last_exception:
AttributeError: 'ScrapedRestData' object has no attribute 'last_exception'

Logger: homeassistant.components.binary_sensor
Source: custom_components/multiscrape/binary_sensor.py:40
Integration: Binary Sensor (documentation, issues)
First occurred: 16:50:02 (1 occurrences)
Last logged: 16:50:02

Error while setting up multiscrape platform for binary_sensor
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 205, in _async_setup_platform
    await asyncio.shield(task)
  File "/config/custom_components/multiscrape/binary_sensor.py", line 40, in async_setup_platform
    if rest.last_exception:
AttributeError: 'ScrapedRestData' object has no attribute 'last_exception'

Please share your full multiscrape configuration.

Release 4.1.3

I released version 4.1.3, which now includes the option to specify attributes with scraped values on sensors. It is now also possible to specify a unique_id, which will be used as the entity_id and allows you to change sensor properties in the UI.
Last but not least, icon templates are supported, so you no longer need to add a template sensor just to set an icon!

Besides this, I added an extensive options table on GitHub, explaining all possible configuration options.
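As an illustration of how these three options might be combined on one sensor, here is a minimal sketch. The URL, selectors, names, and the 25 degree threshold are placeholders, and it assumes the icon template receives the scraped value as `value`; the options table remains the authoritative reference.

```yaml
multiscrape:
  - resource: https://example.com/status      # placeholder page
    scan_interval: 3600
    sensor:
      - unique_id: example_temperature        # lets you edit the entity in the UI
        name: Example temperature
        select: ".temperature"                # placeholder selector
        # Icon chosen from the scraped value (assumed to be passed in as `value`)
        icon: >-
          {% if value | float(0) > 25 %}
            mdi:thermometer-high
          {% else %}
            mdi:thermometer
          {% endif %}
        # Extra scraped values exposed as attributes of the same sensor
        attributes:
          - name: Last changed
            select: ".last-changed"           # placeholder selector
```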

Next to do:
Merge and release the form-submit functionality!

Hi @danieldotnl,

I’m trying to upgrade from ‘hass-multiscrape (drogfild) pre-login’ to your latest version 4.1.3 ‘ha-multiscrape (danieldotnl)’ and I can’t figure out (also not after reading your wiki) how to move everything into one new multiscrape: (scrape and sensors) action.

Can you perhaps help me?

Below are the pre-login configuration and (part of) my sensors.

  - platform: multiscrape
    name: osc scraper
    resource: https://portal.xxx.com/gateways/xxxxx
    verify_ssl: false
    prelogin:
      preloginpage: https://portal.xxx.com/u/sign_in
      preloginform: 'loginForm'
      username_field: 'user[email]'
      password_field: 'user[password]'
      username: !secret xxx_username
      password: !secret xxx_password
    scan_interval: 00:15:00 # Request every 15 min
    selectors:
      levering_dag:
        name: Levering dag
        select: "div.col-lg-4:nth-child(4) > div:nth-child(1) > div:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(2) > a:nth-child(1)"
        value_template: "{{value[:-4] | replace('.', '') | float | round(0)}}"
      levering_nacht:
        name: Levering nacht
        select: "div.col-lg-4:nth-child(4) > div:nth-child(1) > div:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(2) > a:nth-child(1)"
        value_template: "{{value[:-4] | replace('.', '') | float | round(0)}}"

#######################
  - platform: template
    sensors:
      osc_afnemen_dag:
        friendly_name: OSC afnemen dag
        icon_template: "mdi:thermometer"
        unit_of_measurement: "kWh"
        value_template: "{{ state_attr('sensor.osc_scraper', 'Levering dag') }}"
        attribute_templates:
          updated: >
            {{ as_timestamp(states.sensor.osc_scraper.last_updated) | timestamp_custom('%Y-%m-%d %H:%M', true) }}

  - platform: template
    sensors:
      osc_afnemen_nacht:
        friendly_name: OSC afnemen nacht 
        icon_template: "mdi:thermometer"
        unit_of_measurement: "kWh"
        value_template: "{{ state_attr('sensor.osc_scraper', 'Levering nacht') }}"
        attribute_templates:
          updated: >
            {{ as_timestamp(states.sensor.osc_scraper.last_updated) | timestamp_custom('%Y-%m-%d %H:%M', true) }}

multiscrape:
  - resource: 'https://thepagewiththedatathatyouwant.com'
    scan_interval: 30
    form_submit:
      submit_once: True
      resource: 'https://thesitewiththeform.com'
      select: ".unique-css-selector-for-the-form"
      input:
        username: [email protected]
        password: '12345678'
        extra: field
    sensor:
      - select: 'td.mydata:nth-child(1) > a:nth-child(1)'
        name: scraped-value-after-form-submit

Thanks a lot for your help! :wink:

Erik, you are a bit early. The form-submit functionality has not been released yet. I hope to release it this week or over the weekend.
Could you indicate what’s not clear on the wiki, so I can improve it? Thanks!

All right, thanks for the info. I’ll wait for the release, give it a try, and let you know. :wink:

Hi…
I’m using your example from your config.

Release v5.0.0 with the form-submit functionality is out there! The options are described in the readme and the wiki.
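For anyone migrating from the old prelogin platform config posted earlier in this thread, the mapping to the new format could look roughly like this. It is an untested sketch built from the wiki example and that config: it assumes `loginForm` is the id of the form (hence `#loginForm`), that the keys under `input` are the form field names, and the `unique_id` values are invented.

```yaml
multiscrape:
  - resource: https://portal.xxx.com/gateways/xxxxx
    scan_interval: 900                  # 15 minutes, in seconds
    verify_ssl: false
    form_submit:
      submit_once: True                 # rely on the session after the first login
      resource: https://portal.xxx.com/u/sign_in
      select: "#loginForm"              # assumption: 'loginForm' is the form's id
      input:
        'user[email]': !secret xxx_username
        'user[password]': !secret xxx_password
    sensor:
      - unique_id: osc_levering_dag     # hypothetical id
        name: Levering dag
        unit_of_measurement: kWh
        select: "div.col-lg-4:nth-child(4) > div:nth-child(1) > div:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(2) > a:nth-child(1)"
        value_template: "{{ value[:-4] | replace('.', '') | float | round(0) }}"
      - unique_id: osc_levering_nacht   # hypothetical id
        name: Levering nacht
        unit_of_measurement: kWh
        select: "div.col-lg-4:nth-child(4) > div:nth-child(1) > div:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(2) > a:nth-child(1)"
        value_template: "{{ value[:-4] | replace('.', '') | float | round(0) }}"
```

Because each selector now becomes its own sensor with its own unit, the separate template sensors that read attributes from `sensor.osc_scraper` should no longer be needed.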

Great sensor, I used it for this:

Hmmm, with this page: Usługa ePPK Pekao TFI – serwis PPK dla pracowników
what would be the names of the input fields in the configuration? I can’t get this idea of merging…

If I have a string ASCII: 1111111100000000 in a P tag
How can I retrieve only one character at a time from that string?

For example I only want the first 1, I tried this but it doesn’t work…

- platform: scrape
    resource: http://192.168.1.#/####/##
    name: Irrig State3
    select: "p"
    value_template: "{{ value.text[8][1] | }}"

Also tried value_template: "{{ value.text[8:1] | }}" but it doesn’t work…
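A minimal sketch of plain indexing, assuming `value` in the template is already the text of the matched tag (so no `.text` is needed) and that the text starts directly with the digits. The trailing `|` with no filter after it is a template syntax error, so it is dropped here.

```yaml
- platform: scrape
  resource: http://192.168.1.#/####/##
  name: Irrig State3
  select: "p"
  # Jinja indexes strings like Python: value[0] is the first character,
  # value[3] the fourth, and value[0:8] the first eight characters.
  value_template: "{{ value[0] }}"
```

If the text literally begins with the prefix "ASCII: " (seven characters including the space), the first digit would sit at index 7 instead, so `value[7]` would be the one to try.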

In your browser just right-click on the textboxes and inspect. You’ll see the id ‘app-input-username’ for the username and ‘app-input-password’ for the password field.

Edit: your input fields don’t have a name, see reply below.

I managed to do it with this code:

value_template: >-
      {% if (value | regex_findall_index("(\d)", index=0)) == "0" %}
        on
      {% else %}
        off
      {% endif %}

So it is not the name but the id of the element, right?