[SOLVED] Need help to scrape data from website

hi,

i would like to scrape data from this website, specifically the table that shows the name of each lake and the corresponding temperature:

http://www.wassertemperatur.org/oesterreich/

could you please give me a hint how to define the scrape function to achieve this?

i tried it with a couple of html tags. with td it shows at least the name of the first lake, but with tbody the entity is no longer available to add to lovelace.

Does that site have an api?

no, it doesn’t.

if it had one and provided the data as json, how could i use that data with a custom integration? scrape would not work for it, would it?

Do this.

Go to the page, right-click the temperature you want as a sensor, and choose Inspect Element.

Then, on the highlighted element, copy its CSS Selector.

Make a sensor in Home Assistant and paste the copied selector into it, in configuration.yaml under
sensor:

  - platform: scrape
    name: Ausee Temperature Test
    resource: http://www.wassertemperatur.org/oesterreich/
    select: ".entry-content > table:nth-child(4) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(2) > span:nth-child(1)"

Result: [screenshot]

more info on the link I gave before:

thanks, it worked perfectly!

i read the scrape documentation before, but i just could not figure out how to define “select”.

thanks!
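
For anyone else unsure what "select" does: it takes a CSS selector that picks one element out of the page's HTML, here the span inside the second cell of a table row. The sketch below is not how the scrape integration is implemented. It only illustrates, using the Python standard library, the idea of pulling name and temperature pairs out of table rows. The HTML snippet, the lake names and the values are made up, and the real page's markup may differ.

```python
from html.parser import HTMLParser

# Made-up sample markup (assumption); the real page may be structured differently.
SAMPLE_HTML = """
<div class="entry-content">
  <table>
    <tbody>
      <tr><th>See</th><th>Temperatur</th></tr>
      <tr><td>Lake A</td><td><span>21</span> °C</td></tr>
      <tr><td>Lake B</td><td><span>19</span> °C</td></tr>
    </tbody>
  </table>
</div>
"""


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td> (including nested tags like <span>).
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows with only <th> cells stay empty and are skipped
                self.rows.append(self._row)
            self._row = None


def lake_temperatures(html):
    """Map the first cell of each row (name) to the second cell (temperature)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) >= 2}


temps = lake_temperatures(SAMPLE_HTML)
print(temps)
```

In Home Assistant itself none of this is needed: the selector in "select" does the picking, one sensor per value.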

Please mark the topic as “solution” :slight_smile:

Hi, I am trying to do the same for learning purposes, but neither my Chrome nor my Edge browser has the copy CSS option. Which browser are you using?

You have these options in Firefox.

Edge also (Inspect)

I tried Edge too, but I didn't find the CSS option there.

@sender @Makis

when i tried to apply this procedure on other sites, unfortunately i was not able to scrape the data, although the fields were properly identified when choosing inspect element.

on

http://laendris.donaustationen.at/index.php?module_id=6
http://laendris.donaustationen.at/index.php?module_id=6&action=details&pegelstelle_id=13&Tage=1&l_laende_id=33

i could not load the values on either the overview or the details page.

on

https://www.noel.gv.at/wasserstand/#/de/Messstellen/Details/207373/Wasserstand/3Tage

i also could not load the values. see corresponding screenshots 1-3.

do you have an idea what the problem is, and could you tell me when the above procedure works and when it does not? i have tried this on a couple of sites now; it works perfectly on the one mentioned above but not on the others.

thanks in advance!

Hi all, I would like to get some information from the following website:

http://www.climaat.angra.uac.pt/boias/index2.htm

For example, how can I get the temperature shown in the last row of the table?
“Temperatura da água à superficie: 23 ºC” (Water surface temperature)
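
One possible approach, sketched under assumptions: once the text of that row is available, a regular expression can pull out the number. The sample string is the one quoted above; whether the live page serves this text as plain HTML (rather than building it with JavaScript) has not been verified here, and the function name is only illustrative.

```python
import re

# Text quoted in the post above; the live page's exact wording is an assumption.
LINE = "Temperatura da água à superficie: 23 ºC"

# Label, optional whitespace, a number with optional decimals, then the unit.
PATTERN = re.compile(
    r"Temperatura da água à superficie:\s*(-?\d+(?:[.,]\d+)?)\s*ºC"
)


def water_temperature(text):
    """Return the temperature as a float, or None if the label is not found."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Accept a decimal comma as well as a decimal point.
    return float(match.group(1).replace(",", "."))


temperature = water_temperature(LINE)
print(temperature)
```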

Hi folks. Can someone please help me get the KP-index from this site? I’ve tried and tried, but I can’t get my head around the formula. If it’s at all possible, that is. I would really appreciate it, thank you!
https://www.spaceweatherlive.com/en/auroral-activity.html

If you read the site FAQ, you are advised not to do that.

Really cool - thanks for your detailed explanation! I am now tracking the water level of my river nearby :partying_face:

I am also interested in getting the current gas level for Europe from this site: https://agsi.gie.eu/
I have copied the selector path as you described, but the sensor in Home Assistant stays unknown

div.tabulator-row:nth-child(1) > div:nth-child(3)

Maybe someone could help me to get the data into Home Assistant - that would be great!

This scrape method will not work here. All of the data on that web page is generated by JavaScript calling an API. Check the page source to see whether the data is actually there; if it is not, conventional scraping will not work. I looked at that page and found the API call, but without headers the call did not work well. It seems the API needs the exact headers. With the headers it returns JSON output that still needs to be parsed, so this works for me:

sensor:
  #Aggregated Gas Storage Inventory in the EU
  - platform: rest
    name: AGSI EU Gas Storage Full
    value_template: "{{ value_json['data'][0]['full'] }}"
    unit_of_measurement: "%"
    scan_interval: 3600
    resource: https://agsi.gie.eu/api?date=today
    headers:
      Content-Type: application/json
      User-Agent: 'Mozilla/5.0 (iPad; CPU OS 15_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/104.0.5112.99 Mobile/15E148 Safari/604.1'
      Upgrade-Insecure-Requests: 1
      Referer: 'https://agsi.gie.eu/'

Also, a command line sensor can be built from this Python script:

import requests

# Browser-like headers; the API did not respond properly without them.
headers = {
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 15_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/104.0.5112.99 Mobile/15E148 Safari/604.1',
    'referer': 'https://agsi.gie.eu/',
}

resp = requests.get('https://agsi.gie.eu/api?date=today', headers=headers)
# Parse the JSON body and print the fill level of the first entry.
data = resp.json()
print(data['data'][0]['full'])
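
The "check the page source" advice above can also be sketched in code. This is a rough illustration using only the Python standard library, not part of Home Assistant: it looks for an expected value in the text of the raw HTML while ignoring script and style blocks. Both sample pages and the function name are made up. In practice the HTML string would be the page source as delivered by the server, before any JavaScript runs.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that is not inside <script> or <style> elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def value_in_static_html(html, expected):
    """True if `expected` appears in the page text without running JavaScript."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return expected in " ".join(parser.parts)


# Made-up examples: the first page ships the value in its HTML,
# the second only ships an empty container plus a script that fills it later.
STATIC_PAGE = "<table><tr><td>Lake A</td><td><span>21</span></td></tr></table>"
JS_PAGE = (
    '<div id="app"></div>'
    '<script>fetch("/api").then(r => r.json()).then(render); // shows 21 later</script>'
)

static_ok = value_in_static_html(STATIC_PAGE, "21")
js_ok = value_in_static_html(JS_PAGE, "21")
print(static_ok, js_ok)
```

If the value is missing from the static HTML, as on the second sample page, a selector copied from the browser's inspector will find nothing, and looking for the underlying API call is the better route.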

Thanks for your help. Someone on Facebook helped me with the rest sensor.

sensor rest:
  - platform: rest
    scan_interval: 3600
    name: Gas Speicher De
    resource: https://agsi.gie.eu/api?country=DE
    headers:
      content-type: "application/json"
      x-key: !secret gas_token  # Enter the API key here!
    json_attributes_path: "$.data[0].['.']"
    json_attributes:
      - name
      - code
      - url
      - gasDayStart
      - gasInStorage
      - consumption
      - consumptionFull
      - injection
      - withdrawl
      - workingGasVolume
      - injectionCapacity
      - status
      - trend
      - full
      - info
    value_template: >-
      {{ value_json.message }}
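
To make the JSON handling in these rest sensors easier to follow, here is a small Python sketch of the same idea: take the first entry under "data", use one field as the state (as value_json['data'][0]['full'] does in the earlier config) and copy a list of named fields as attributes. The payload below is made up; only the field names come from the configs in this thread, and the real API response may look different.

```python
import json

# Made-up payload (assumption); values are illustrative only.
SAMPLE = """
{"data": [{"name": "Germany", "code": "DE", "full": "85.1",
           "gasInStorage": "200.5", "trend": "0.3"}]}
"""

# A subset of the attribute names listed under json_attributes.
WANTED = ["name", "code", "full", "gasInStorage", "trend"]


def state_and_attributes(payload, wanted):
    """Return (state, attributes) taken from the first entry under "data"."""
    value_json = json.loads(payload)
    first = value_json["data"][0]
    state = first["full"]
    # Keep only the requested keys that are actually present.
    attributes = {key: first[key] for key in wanted if key in first}
    return state, attributes


state, attributes = state_and_attributes(SAMPLE, WANTED)
print(state)
print(attributes)
```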

Here is the yaml for the template sensor.

template agsi:
  - sensor:
      - name: Füllstand Deutschland Total
        icon: mdi:gas-burner
        unit_of_measurement: "%"
        state: "{{ state_attr('sensor.gas_speicher_de', 'full') }}"
        device_class: "gas"
      - name: Gas im Speicher Deutschland
        icon: mdi:storage-tank
        unit_of_measurement: "TWh"
        state: "{{ state_attr('sensor.gas_speicher_de', 'gasInStorage') }}"
        #device_class: "gas"
      - name: Trend Gas Speicher Deutschland
        icon: mdi:storage-tank
        unit_of_measurement: "%"
        state: "{{ state_attr('sensor.gas_speicher_de', 'trend') }}"
        #device_class: "gas"
      - name: Gasverbrauch Deutschland
        icon: mdi:gas-burner
        unit_of_measurement: "TWh"
        state: "{{ state_attr('sensor.gas_speicher_de', 'consumption') }}"
      - name: Gasverbrauch Total Deutschland
        icon: mdi:gas-burner
        unit_of_measurement: "%"
        state: "{{ state_attr('sensor.gas_speicher_de', 'consumptionFull') }}"

Awesome! Thanks for sharing.

Just in case someone runs into this thread: my solution does not require API registration. It mimics a website visit, so there is no need for an API key.
