The new way to SCRAPE

That could be it! Thanks a lot for the clarification! Awesome work man …!

Well, new website, new luck: is it possible that some websites can’t be scraped, for whatever reason?

On the Fitness Express Renningen site I tried to get their “Auslastung” (occupancy) image
but couldn’t get it to work. The selector is, e.g.:

`body > div.tpl-main > section.is-relative.tpl-section.ex-section-dzGVcD8bf1 > div > div > div > div > div > clubapp-checkin-count > div > div > div > div.ldBar > div`

It doesn’t seem to work… Why is that? Every day I wonder why the data can’t be read out when it is displayed right in front of me … :open_mouth: Thanks a lot in advance!

I want the “propaan” price from Huidige maximumprijzen | BRAFCO (and, in a later phase, to save the previous one). How do I start?

I downloaded and installed it from HACS and added

scrape:
  - resource: https://www.brafco.be/nl/huidige-maximumprijzen  
    sensor:
      - name: propaan
        select: .view-brandstofprijzen

but the select is probably wrong. What would the correct code be? Or is there a site where I can find a tutorial?
I don’t understand it.

@Faecon, which price would you like to scrape?

CSS selectors for scraping:

Propaan >= 2000:
#block-views-block-brandstofprijzen-block > div > div > div:nth-child(5) > div:nth-child(6) > div > span > div > div

This gives data:
Vanaf 29/03/2023 0.7521 €/L incl. btw 0.6216 €/L excl. btw Prijsevolutie

Propaan < 2000:
#block-views-block-brandstofprijzen-block > div > div > div:nth-child(5) > div:nth-child(7) > div > span > div > div

This gives data:
Vanaf 29/03/2023 0.8317 €/L incl. btw 0.6874 €/L excl. btw Prijsevolutie

With value templating you can cut or regex it to the data you need.
You can’t scrape the price alone as it has no CSS selector itself…
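
For anyone who wants the "cut" variant in YAML, here is a minimal sketch using the two selectors above. It assumes the scraped text always has the layout shown (so the price incl. btw is the third whitespace-separated token); the sensor names and the unit are my own placeholders, not something from the site.

```yaml
# Sketch only: sensor names and unit are illustrative assumptions.
scrape:
  - resource: https://www.brafco.be/nl/huidige-maximumprijzen
    sensor:
      - name: Propaan vanaf 2000 L
        # Quote the selector: an unquoted "#" would start a YAML comment.
        select: "#block-views-block-brandstofprijzen-block > div > div > div:nth-child(5) > div:nth-child(6) > div > span > div > div"
        # "Vanaf 29/03/2023 0.7521 €/L incl. btw 0.6216 €/L excl. btw ..."
        # split on whitespace: token 2 = price incl. btw, token 6 = price excl. btw
        value_template: "{{ value.split()[2] }}"
        unit_of_measurement: "€/L"
      - name: Propaan minder dan 2000 L
        select: "#block-views-block-brandstofprijzen-block > div > div > div:nth-child(5) > div:nth-child(7) > div > span > div > div"
        value_template: "{{ value.split()[2] }}"
        unit_of_measurement: "€/L"
```

If the site ever changes the order of the words, the token index will point at the wrong value, so a regex is probably the more robust choice.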

It’s best to use the Scrape integration.
I can scrape the price (incl. taxes) for Propane >=2000 like this:

URL: https://www.brafco.be/nl/huidige-maximumprijzen (Huidige maximumprijzen | BRAFCO)

Select (Selecteer): div:nth-of-type(6) .promoted > div

Template: {{ value|regex_findall_index(find='([0-9]+\.[0-9]+)',index=0, ignorecase=False) }}
For the price excluding btw (VAT), use:
{{ value|regex_findall_index(find=’([0-9]+\.[0-9]+)’,index=1, ignorecase=False) }}

This gives me the following sensor:

[image: the resulting sensor]

I have an error :frowning:

User input malformed: invalid template (TemplateSyntaxError: unexpected char ‘’’ at 34) for dictionary value @ data[‘value_template’]

Your value_template is wrong somewhere; the quotation marks (" ") are probably off.
What did you paste in the Waardesjabloon / template field (exactly)?

{{ value|regex_findall_index(find=’([0-9]+.[0-9]+)’,index=0, ignorecase=False) }}

Strange, as it seems correct and works perfectly fine over here. Try and replace it with {{value}} just for testing.

Then I get an entity with the value “Vanaf 29/03/2023 0.7521 €/L incl. btw 0.6216 €/L excl. btw Prijsevolutie”

Correct. Try again with this as template:

{{ value|regex_findall_index(find="([0-9]+\.[0-9]+)",index=0, ignorecase=False) }}

’ should’ve been " I guess…

EDIT: now it’s correct. Please use the template above; a backslash was missing before the period.
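
For those configuring this in YAML instead of the UI, a minimal sketch of the same setup follows. The sensor names and unit are placeholders of mine; the selector and regex are the ones from this thread. Putting the template in a block scalar (`>-`) keeps the backslash and the straight quotes exactly as typed.

```yaml
# Sketch only: sensor names and unit are illustrative assumptions.
scrape:
  - resource: https://www.brafco.be/nl/huidige-maximumprijzen
    sensor:
      - name: Propaan incl. btw
        select: "div:nth-of-type(6) .promoted > div"
        # Use straight quotes (" or '), never typographic ones (’ or “).
        # "\." matches a literal period. Without the backslash, "." matches
        # any character, so the first match in "Vanaf 29/03/2023 0.7521 ..."
        # would be "29/03" instead of the price.
        value_template: >-
          {{ value | regex_findall_index(find="([0-9]+\.[0-9]+)", index=0) }}
        unit_of_measurement: "€/L"
      - name: Propaan excl. btw
        select: "div:nth-of-type(6) .promoted > div"
        # index=1 picks the second decimal number in the text.
        value_template: >-
          {{ value | regex_findall_index(find="([0-9]+\.[0-9]+)", index=1) }}
        unit_of_measurement: "€/L"
```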

Sammy, you are a legend :wink:

{{ value|regex_findall_index(find="([0-9]+\.[0-9]+)",index=0, ignorecase=False) }}

should be 100% correct, sorry, I’m doing this in between my normal work, so too much distraction…

I’m on mobile now and will check later this evening. You can try it yourself: right-click the desired item in your browser and select Inspect. The developer tools will open. From there, right-click the highlighted element and choose Copy > selector, if I recall correctly :grinning_face_with_smiling_eyes:

Hi there, when I checked the integration GUI I could not find any way to set a POST payload for the scrape request. Is this simply not exposed via the GUI, or am I blind?

Is it not via the first dropdown on the add integration popup? Or did I misunderstand your question?

The Method is there, but where can I input the request body (a.k.a. payload)?

That is a very good point! (Never tried it before). Looks like a bug honestly.
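
Until the GUI exposes it, a YAML configuration may be a workaround. As far as I know, the Scrape YAML options include `method` and `payload`, but please check this against the documentation for your Home Assistant version. The URL, payload, header and selector below are made-up placeholders.

```yaml
# Sketch only: URL, payload, header and selector are hypothetical.
scrape:
  - resource: https://example.com/search
    method: POST
    # Request body sent with the POST; single quotes keep the JSON intact.
    payload: '{"query": "propaan"}'
    headers:
      Content-Type: application/json
    sensor:
      - name: Example POST result
        select: ".result"
```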

Hi all,

I also have a problem getting the value from

https://mygridbox.viessmann.com/live-view

I opened the console and copied the selector of the value:

#text-soc > tspan

So far I have tried every combination (POST/GET, basic/digest). I’m not sure whether the authentication works at all?!

If we could solve the authentication problem with the Viessmann gridbox website, I guess a lot of people using Viessmann equipment would be very happy. They block most of the data that we could use for energy metering; you have to pay approx. €120 per year to get it. So, any update here?

BR, Timo
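
For reference, this is roughly what the basic/digest attempt looks like in YAML. It is a sketch under assumptions: the `!secret` names and the sensor name are placeholders, and I have not verified how the Viessmann site actually authenticates.

```yaml
# Sketch only: secret names and sensor name are hypothetical.
scrape:
  - resource: https://mygridbox.viessmann.com/live-view
    authentication: digest   # or: basic
    username: !secret gridbox_username
    password: !secret gridbox_password
    sensor:
      - name: Gridbox SoC
        select: "#text-soc > tspan"
```

Keep in mind that `authentication: basic` and `digest` only send HTTP authentication headers; they do not fill in a web login form. Scrape also only parses the HTML the server returns, so a value that the page draws later with JavaScript will not be there to select. If either applies to this site, no combination of these options would be enough.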