Scrape sensor improved - scraping multiple values

Great component, thanks for that!
I am having trouble using a dynamic select. I want to use something like:

select: >
  {% set wn = now().isocalendar()[1] %}
  {% set wd = now().weekday() %}
  tr[data-weeknumber="{{ wn }}"][data-dayofweek="{{ wd }}"] td:nth-child(2)

But I guess because the select is only a string, it does not support templates. Do you know if something like that is possible or could be made possible?
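
For anyone following along, here is a rough Python sketch of what the template above is meant to render. It is only an illustration: `build_selector` is a made-up helper name, not part of the component. Note that `weekday()` counts Monday as 0, while `isoweekday()` counts Monday as 1; which of the two the site's `data-dayofweek` attribute uses is an assumption you would need to check against the page.

```python
from datetime import date


def build_selector(day: date) -> str:
    """Build the CSS selector the template would produce for `day`.

    Hypothetical helper for illustration only.
    """
    week_number = day.isocalendar()[1]  # ISO week number (1..53)
    day_of_week = day.weekday()         # Monday == 0 ... Sunday == 6
    return (
        f'tr[data-weeknumber="{week_number}"]'
        f'[data-dayofweek="{day_of_week}"] td:nth-child(2)'
    )


# Monday, 4 January 2021 falls in ISO week 1.
print(build_selector(date(2021, 1, 4)))
# tr[data-weeknumber="1"][data-dayofweek="0"] td:nth-child(2)
```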

Support for templates would be a nice feature! I’ll try to find time to look into it.
I also still want to publish the latest pre-release as an official release.

Hi all! Sorry for not being able to update my code lately. I am trying to find time for this project as well.

@swifty Your config looks valid to me. Can you test whether it is possible to log in to that page with JavaScript disabled in your browser? Would I be able to create an account on that site?

I just released the latest pre-release as an official release and created a new pre-release which supports templates in the select! @Roemer: Please give it a try!

Sorry for the late reply, it was a busy weekend!
I think the site probably needs JavaScript, but I got it going in the end.
I use Node-RED for all my automations, so I used a Selenium Docker container controlled by Node-RED to scrape the information I needed from the site.

Wow the template seems to work great! Now I only have the issue that
tags are removed and just replaced by spaces, so I cannot format the text nicely.

The latest Multiscrape component is working fine for me. It logs in (username/password) and scrapes multiple values, which are available as attributes on the sensor.

My question is: is it possible to get the value of ‘totaal’ as the entity state value? (The current entity state is none.)

    selectors:
      levering_dag:
        name: Levering dag
        select: "div.col-lg-4:nth-child(4) > div:nth-child(1) > div:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(2) > a:nth-child(1)"
        unit_of_measurement: "kWh"
      levering_nacht:
        name: Levering nacht
        select: "div.col-lg-4:nth-child(4) > div:nth-child(1) > div:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(2) > a:nth-child(1)"
        unit_of_measurement: "kWh"
      totaal:
        name: Totaal (levering – teruglevering)
        select: "div.col-lg-4:nth-child(4) > div:nth-child(1) > div:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(5) > td:nth-child(2) > a:nth-child(1)"
        unit_of_measurement: "kWh"

@drogfild how can I point to the login form, since it does not have a name? My page looks like this:
https://ebok.mpwik.lublin.pl/login

Hi @majkers! Good question, and thanks for giving an example. After @BrianHanifin’s commit, the “prelogin” script also looks for the attributes name, id, class, or action. So in your case you should be able to use its class

      preloginform: 'form-horizontal'

or even its action

      preloginform: '/login'

Unfortunately, my script hasn’t been updated with the latest multiscrape improvements and is based on quite an old version of it. I haven’t got my head around async requests yet :expressionless:
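
To picture the matching described above (name, id, class or action), here is a minimal standard-library sketch. It is not the fork's actual code: `FormFinder`, `find_form` and the `SAMPLE` markup are hypothetical and only show the idea of comparing one configured value against several form attributes.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Remember the attributes of the first <form> matching `needle`."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.match = None

    def handle_starttag(self, tag, attrs):
        if tag != "form" or self.match is not None:
            return
        attributes = dict(attrs)
        # A class attribute may hold several space-separated class names.
        classes = (attributes.get("class") or "").split()
        candidates = [
            attributes.get("name"),
            attributes.get("id"),
            attributes.get("action"),
            *classes,
        ]
        if self.needle in candidates:
            self.match = attributes


def find_form(html, needle):
    """Return the matching form's attributes as a dict, or None."""
    finder = FormFinder(needle)
    finder.feed(html)
    return finder.match


# Made-up markup resembling a nameless login form.
SAMPLE = (
    '<form class="form-horizontal" action="/login" method="post">'
    '<input name="user"></form>'
)

print(find_form(SAMPLE, "form-horizontal"))  # matched via class
print(find_form(SAMPLE, "/login"))           # matched via action
```

A value that matches none of the four attributes gives `None`, which is a quick way to check a `preloginform` value against a saved copy of the login page.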

Hi,
I’m trying to get today’s exchange rates from the national bank’s page, but it seems I’m not using the proper syntax.
Once someone can enlighten me, I would like to get the exchange rates for 4 currencies with a single request. (I tried openexchange, but with the free API the result is only for USD, and I want the base currency to be “RON”.)
Thank you

 - platform: scrape
   resource: https://bnr.ro/Cursul-de-schimb-524.aspx
   select: "chf"
   name: leutu
   value_template: '{{ (value | int) / 10 }}'

I am trying to use the fork by @drogfild https://github.com/drogfild/hass-multiscrape to scrape data from my heat pump (thread https://community.home-assistant.io/t/is-there-any-interest-in-a-stiebel-eltron-climate-platform), but could not get the login working. A GET request with curl, a browser, or the Python requests module returns the expected content of the login page, but this module always gets a 400 Bad Request page. I also tried adding a User-Agent to the headers, with no luck.

Any idea what I am missing?

That’s weird. I haven’t experienced that error myself. Have you been able to verify whether you get that error on the first page load or after the login attempt?

It most probably doesn’t affect this problem, but I have a beta of a new version of my fork. It’s quite up to date with the original. You can find it on the dev branch. The config should be identical.

@danieldotnl Will you update the integration to solve the following warning/requirement?

No 'version' key in the manifest file for custom integration 'multiscrape'. This will not be allowed in a future version of Home Assistant. Please report this to the maintainer of 'multiscrape'

Thank you. :slight_smile:

Hello,

A little question.
Is it possible to keep an HTTP live stream of a website open in order to retrieve the data live and keep it updated?
My weather station offers a website where real-time data is played back, so it would be cool if I could use this as well.

I updated the opening post of this thread with an update on the new repository that’s now in the default HACS store. Please read it if you are using the multiscrape custom component!

Somehow I keep missing notifications from this thread, but the version has been added!

So you still do not plan to add scraping after logging in, where required?

I’m actually looking into that, by popular request :grinning:

Merging from my dev branch should not be that bad. It’s not up to date, but async is working just fine.

The biggest problem I have with it is that it doesn’t allow logging in to the same page with multiple different credentials, i.e. having multiple sensors for the same page but with different credentials. It uses the same session, so it’s already logged in.

Hi,

I need help with two sensors that are not working; the value shows up as empty.

select: ‘#acc’ is not working; the value ends up empty

<span id="acc">ON</span>

select: ‘#gpsSpeed’ works

<span id="gpsSpeed">0</span>

select: ‘#driverName’ works

<div class="col-sm-12 c-009934 fw-b" id="driverName">Uplander LT</div>

select: ‘#coordinate’ is not working; the value ends up empty

<span id="coordinate">-73.63061 / 45.52565</span>
<input type="hidden" id="c_latitude" value="45.52565">
<input type="hidden" id="c_longitude" value="-73.63061">
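
One way to narrow this down outside Home Assistant is to run the same lookups against the markup as posted. The sketch below uses only the Python standard library. `IdTextExtractor`, `extract` and `SNIPPET` are names made up for this example, and the parser is deliberately simple (it only handles flat elements like the ones above). With the posted markup, both `#acc` and `#coordinate` do contain text, so if the sensor still comes back empty, one possible explanation (an assumption, not something verified against the site) is that the page fills these elements with JavaScript and the raw response the scraper receives has them empty.

```python
from html.parser import HTMLParser

# The markup exactly as posted above.
SNIPPET = """
<span id="acc">ON</span>
<span id="gpsSpeed">0</span>
<span id="coordinate">-73.63061 / 45.52565</span>
<input type="hidden" id="c_latitude" value="45.52565">
<input type="hidden" id="c_longitude" value="-73.63061">
"""


class IdTextExtractor(HTMLParser):
    """Collect element text and <input> values, keyed by the id attribute."""

    def __init__(self):
        super().__init__()
        self.text_by_id = {}
        self.value_by_id = {}
        self._open_id = None  # id of the element whose text is being read

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        element_id = attributes.get("id")
        if element_id is None:
            return
        if tag == "input":
            # Inputs carry their data in the value attribute, not as text.
            self.value_by_id[element_id] = attributes.get("value")
        else:
            self._open_id = element_id
            self.text_by_id[element_id] = ""

    def handle_data(self, data):
        if self._open_id is not None:
            self.text_by_id[self._open_id] += data

    def handle_endtag(self, tag):
        # Good enough for flat markup; nested elements would need a stack.
        self._open_id = None


def extract(html):
    """Return (text_by_id, value_by_id) for the given HTML string."""
    parser = IdTextExtractor()
    parser.feed(html)
    return parser.text_by_id, parser.value_by_id


texts, values = extract(SNIPPET)
print(texts["acc"], "|", texts["coordinate"])
print(values["c_latitude"], values["c_longitude"])
```

Running the same function on the HTML that `curl` returns for the page (rather than on what the browser inspector shows) would tell you whether the text is really in the response. The hidden `c_latitude` and `c_longitude` inputs could also be worth a try, since their data sits in the `value` attribute rather than in the element text.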