Scrape/Multiscrape can't find tag on website?

Hi, has anyone been able to scrape/multiscrape the prices from the Oasen tariffs page (Tarieven | Oasen)?

I’ve tried numerous different selectors, but every sensor shows as ‘Unavailable’ in HA.

This is the yaml:

- resource: https://www.oasen.nl/zelf-regelen/alles-over-de-drinkwaterfactuur/tarieven/
  name: Oasen scraper
  scan_interval: 36000
  headers:
    User-Agent: Mozilla/5.0
  sensor:
    - unique_id: leidingwater_price_variable
      name: Leidingwater prijs
      select: ".border-primary-table > tr:nth-child(1) > td:nth-child(2) > div > p"
      icon: mdi:cash
      value_template: '{{ (value.split("€")[1]).replace(",", ".") }}'
      device_class: monetary
      unit_of_measurement: "€/m³"
    - unique_id: leidingwater_belasting
      name: Leidingwater belasting
      select: ".border-primary-table > tr:nth-child(2) > td:nth-child(2) > div > p"
      icon: mdi:cash
      value_template: '{{ (value.split("€")[1]).replace(",", ".") }}'
      device_class: monetary
      unit_of_measurement: "€/m³"
    - unique_id: leidingwater_vastrecht
      name: Leidingwater Vastrecht
      select: ".border-primary-table > tr:nth-child(3) > td:nth-child(2) > div > p"
      icon: mdi:cash
      value_template: '{{ (value.split("€")[1]).replace(",", ".") }}'
      device_class: monetary
      unit_of_measurement: EUR
  log_response: true

Can someone please point me in the right direction?

Welcome to HA, and thank you for a really well-structured first question.

If you View Source, you’ll see that the data isn’t in the HTML as originally received, so the Scrape integration won’t work.

If you then watch the network responses when you load the page and do a bit of digging, you’ll find the data coming in from:

https://www.oasen.nl/api/content/zelf-regelen/alles-over-de-drinkwaterfactuur/tarieven

You can then use the RESTful integration. Here’s the first sensor:

rest:
  - resource: https://www.oasen.nl/api/content/zelf-regelen/alles-over-de-drinkwaterfactuur/tarieven
    scan_interval: 36000
    sensor:
      - name: "Variabel tarief"
        value_template: >
          {{ (value_json['data']
                        ['model']
                        ['contentBlocks']
                        [3]
                        ['model']
                        ['tBody']
                        [0]
                        ['model']
                        ['tableBodyItems']
                        [1]
                        ['model']
                        ['value'])
              |replace(',','.')
              |select('in','.-0123456789')
              |join }}
        unit_of_measurement: "€/m³"

That template could be written on a single line: I have just spread it out for readability on the forum. It’s a horrible data structure from that URL: basically HTML-in-JSON, and the sensor will break if they change anything about the layout.
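For comparison, here is a sketch of the same sensor with the template collapsed onto one line. It is an alternative to the spread-out version, not an addition to it, and it assumes the response layout is unchanged:

```yaml
rest:
  - resource: https://www.oasen.nl/api/content/zelf-regelen/alles-over-de-drinkwaterfactuur/tarieven
    scan_interval: 36000
    sensor:
      - name: "Variabel tarief"
        # Same lookup and cleanup as the multi-line template, just on one line.
        value_template: "{{ value_json['data']['model']['contentBlocks'][3]['model']['tBody'][0]['model']['tableBodyItems'][1]['model']['value'] | replace(',','.') | select('in','.-0123456789') | join }}"
        unit_of_measurement: "€/m³"
```

The outer double quotes are needed here because the template itself uses single quotes.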

The last bit of the template takes the result from the lookup, which is <p>€ 1,15</p>, swaps the decimal comma for a decimal point, and discards anything that’s not part of a number.
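To see that cleanup step in isolation, here is a minimal sketch of a throwaway template sensor that hard-codes the string returned by the lookup. The sensor name is made up for illustration; the same expression can also be pasted into Developer Tools > Template.

```yaml
template:
  - sensor:
      - name: "Tarief cleanup test"  # hypothetical name, only for this demonstration
        state: >
          {# The raw value as it arrives from the API lookup #}
          {% set raw = '<p>€ 1,15</p>' %}
          {# 1. decimal comma -> decimal point
             2. keep only characters that can be part of a number
             3. join the surviving characters back into one string #}
          {{ raw | replace(',', '.') | select('in', '.-0123456789') | join }}
```

This renders as 1.15. Note that the filter would not cope with a thousands separator: a value such as 1.234,56 would come out as 1.234.56, so it is only safe while the prices stay small.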


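Paste the response from the URL into an online JSON viewer (JSON Viewer Online Best and Free) and click “JSON Viewer” to find the paths for the other values you want. For the value above, the path in the tool’s tree is data > model > contentBlocks > 3 > model > tBody > 0 > model > tableBodyItems > 1 > model > value, which maps one-to-one onto the JSON lookup in the template.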
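Once you have the other paths, all three sensors can sit under the same resource so the URL is only fetched once per scan. The sketch below shows the shape. The `tBody` indices `[1]` and `[2]` for the second and third sensors are placeholders guessed from the row order in your scrape selectors, not checked against the response, so confirm them in the JSON viewer first. Sensor names and units are taken from your original config.

```yaml
rest:
  - resource: https://www.oasen.nl/api/content/zelf-regelen/alles-over-de-drinkwaterfactuur/tarieven
    scan_interval: 36000
    sensor:
      - name: "Variabel tarief"
        value_template: >
          {{ value_json['data']['model']['contentBlocks'][3]['model']['tBody'][0]['model']['tableBodyItems'][1]['model']['value']
             | replace(',','.') | select('in','.-0123456789') | join }}
        unit_of_measurement: "€/m³"
      - name: "Leidingwater belasting"
        # ASSUMPTION: tBody[1] is a placeholder for the second table row; verify the path.
        value_template: >
          {{ value_json['data']['model']['contentBlocks'][3]['model']['tBody'][1]['model']['tableBodyItems'][1]['model']['value']
             | replace(',','.') | select('in','.-0123456789') | join }}
        unit_of_measurement: "€/m³"
      - name: "Leidingwater Vastrecht"
        # ASSUMPTION: tBody[2] is a placeholder for the third table row; verify the path.
        value_template: >
          {{ value_json['data']['model']['contentBlocks'][3]['model']['tBody'][2]['model']['tableBodyItems'][1]['model']['value']
             | replace(',','.') | select('in','.-0123456789') | join }}
        unit_of_measurement: EUR
```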


Thanks @Troon, that really helped me set up the first rest sensor.
I’ll get to the other two later this afternoon. With your explanation to hand, I don’t expect any problems.
