The new way to SCRAPE

Looks like the scan_interval bug (when using configuration.yaml) has been fixed!

Thanks devs :+1: :+1: :+1:

Hi!
I’ve tried to follow your example, but I think I’ve got something wrong.
I assume all of this has to be added to our configuration.yaml, is that right?

Old situation:

- platform: scrape
  resource: http://www.aemet.es/xml/municipios/localidad_28026.xml
  select: 'prediccion dia:nth-of-type(2)'
  attribute: "fecha"
  name: Fecha manana
  scan_interval: 18000

New situation:

scrape:
  - resource: http://www.aemet.es/xml/municipios/localidad_28026.xml
    scan_interval: 18000
    sensor:
      - name: fecha_manana
        select: prediccion dia:nth-of-type
        index: 2
        attribute: fecha

This configuration returns “unknown” as the result :frowning:

Thank you very much and happy holidays!

You forgot to put quotes (“…”) around the values after select: and attribute:
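
For reference, a minimal sketch of how the old sensor could be carried over with the values quoted. It assumes the original selector is kept in one piece instead of being split into `select` plus `index`: `:nth-of-type` needs its argument in parentheses, and `index` is a zero-based position in the list of matches, so `index: 2` does not mean the same thing as `:nth-of-type(2)`. Whether this returns a value for this XML feed was not confirmed in the thread.

```yaml
scrape:
  - resource: http://www.aemet.es/xml/municipios/localidad_28026.xml
    scan_interval: 18000
    sensor:
      - name: fecha_manana
        # Same selector as in the old platform-style config, quoted as a whole.
        select: "prediccion dia:nth-of-type(2)"
        # Read the "fecha" attribute of the matched element instead of its text.
        attribute: "fecha"
        # index is omitted on purpose: it defaults to 0, the first match.
```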

Hi all,

I have been trying to fix my issue for weeks now and have read through forum posts etc., but I can’t figure out how to do the CSS select, what to paste in, or what to use for authentication. The website I want to scrape is:

https://www.sallys-shop.de/sallys-induktionsmatte-eckig

There I want to get the stock status (currently “Nicht lieferbar”, i.e. not available for delivery). When I inspect the “Nicht lieferbar” element and copy the CSS selector, I get:

div.ml-1:nth-child(2) > div:nth-child(1) > div:nth-child(2)

And I paste that into Home Assistant:

That does not work. Then I tried adding the “leading-tight” class, which is where I found the availability text:

But it still shows “unknown” or nothing at all.

I could really use some help here, please… sorry :frowning:

Like this? It does not work :pensive:

scrape:
  - resource: http://www.aemet.es/xml/municipios/localidad_28026.xml
    scan_interval: 180
    sensor:
      - name: fecha_manana2
        select: "fecha"
        value_template: '(prediccion_dia:nth-of-type)'
        index: 2

I tried your use case and it doesn’t work - I guess some things can’t be scraped :slight_smile:

On the same website, if I try to scrape the price, it works as advertised, so it is not the resource that is blocking the scrape.

Hi Zepol,
Can you take a screenshot of the website and highlight exactly what you are trying to scrape?

Hi, thanks a lot in advance! Could you quickly show me your scrape of the price? That would be awesome! It is mysterious that scraping the “Lieferbar” or “Nicht lieferbar” text does not work, isn’t it? :open_mouth:





Nice, thanks a lot! Where did you get the #content> div.sticky.... from? I copied the CSS selector, which was div.pr-2, and this worked as well. But as you said, somehow the delivery status is not reachable… mysterious… :frowning: Thanks anyway!

  • Highlight the item I want to scrape
  • Right mouse click → “Untersuchen” (Inspect)
  • Go to the “Elemente” (Elements) box at the top right and right mouse click on the highlighted area
  • “Kopieren” (Copy)
  • “Selector kopieren” (Copy selector)

Result:
#content > div.sticky-footer.print:hidden > div > div > div.w-full.md:w-auto.flex.items-center.justify-end.space-x-4 > div.flex.flex-col.items-end.justify-center > div.text-secondary.text-2xl.font-bold.leading-tight
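
One general CSS caveat about a selector copied this way, not something confirmed in this thread: class names that contain a colon, such as print:hidden or md:w-auto, need a backslash before the colon inside a selector, otherwise `:hidden` is read as a pseudo-class. Browsers typically emit the backslash when copying, so it may simply have been lost when the selector was posted. A minimal sketch, assuming the price sensor is defined in YAML; the sensor name is hypothetical, and the shortened selector assumes the remaining classes still identify the price element:

```yaml
scrape:
  - resource: https://www.sallys-shop.de/sallys-induktionsmatte-eckig
    sensor:
      - name: sallys_price  # hypothetical sensor name
        # Single quotes keep the backslash literal in YAML.
        # Shortened from the copied selector (assumption about the page structure).
        select: '#content > div.sticky-footer.print\:hidden div.text-secondary.text-2xl.font-bold.leading-tight'
```

A shorter selector that avoids the colon-containing classes entirely is usually less fragile if the page layout changes.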

Thanks! I have now solved my issue by scraping the “Menge:” (quantity) text, which is not visible when the item is out of stock. This works great :wink: Thanks for the help!

Hi Zepol,
I tried your parameters and I am not sure if this is the data you want to scrape?


or in a different card:

Sorry, one more question: how can I change the update interval of the sensor? Nothing has happened for 3 hours. And which automation do I have to use to update the sensor? Thanks loads! I checked the update_entity service and disabled the auto update in the GUI, but nothing happens :frowning:

So I tried a few things with auto update disabled, and even added scrapers via YAML. Neither the scan interval nor any of the automations are working. Is this possible?

  • Disable auto update for the sensor in the GUI
  • Create an automation that calls the service “homeassistant.update_entity” to update your scrape sensors at the desired intervals.

Here is an example that scrapes every 15 minutes on workdays between 09:15 and 19:00:
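
The screenshot of that automation is not reproduced here. A minimal sketch of what such an automation could look like, assuming a time-pattern trigger combined with a time condition; the alias and the entity ID are hypothetical placeholders and need to be replaced with your own:

```yaml
automation:
  - alias: "Update scrape sensor on workdays"  # hypothetical alias
    trigger:
      # Fires whenever the minute is divisible by 15 (:00, :15, :30, :45).
      - platform: time_pattern
        minutes: "/15"
    condition:
      # Only continue on Monday to Friday within the given time window.
      - condition: time
        after: "09:15:00"
        before: "19:00:00"
        weekday:
          - mon
          - tue
          - wed
          - thu
          - fri
    action:
      # Ask Home Assistant to refresh the scrape sensor now.
      - service: homeassistant.update_entity
        target:
          entity_id: sensor.menge_vorhanden  # hypothetical entity ID
```

The automation trace shows whether and when each run was triggered, which helps separate "the automation did not run" from "the scraped value did not change".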

Thanks for the reply! Much appreciated.

I did that and it looks like this:

Only a restart of Home Assistant triggers the reload.

Any other ideas? :open_mouth:

I guess you think there was no update because of the line “Menge Vorhanden - Vor 3 Stunden” (“3 hours ago”)?

My understanding is that the status “Menge Vorhanden” has not changed since you first started HA, which was probably 3 hours ago?

So while the automation updated the status every 15 minutes, the displayed time (“Vor xxx h”) only changes when the value itself changes. I am afraid that is a little bug in HA.

I just tested this with my scrape sensors and as you will see in the screenshot, although both values were scraped and updated at the same time, one shows 4 minutes ago (because the exchange rate is still updated at 23:45) and the other one 4h ago (because the stock exchange closed at that time and no changes occurred anymore).

Did you check the traces of your automation?
You should be able to see if and when the automation was executed.

Yeah, the website is this:

http://www.aemet.es/xml/municipios/localidad_28026.xml

And the points to scrape:

Thank you very much!

Sorry mate, but I can’t scrape it from that XML either.

But I did try their website, and there it works - maybe give it a try there?