Need help with Scrape sensor for dynamic JavaScript webpage

Hey guys,

I tried my best, but unfortunately I am unable to scrape a simple table on the website of my local transportation service.

sensor:
  - platform: scrape
    name: "Abfahrt Nußbaumerstraße Linie"
    resource: "https://www.kvb.koeln/qr/246/"
    select: "table#qr_ergebnis tbody tr td"
    index: 0
    headers:
      User-Agent: Mozilla/5.0

My goal is to get the information on the next train departing from this station. After some research, I think I am stuck because this is a dynamic webpage and the big table is loaded afterwards. Is there any way to handle this kind of page?

Hopefully someone can help. I tried the selector at https://try.jsoup.org/ and it seems to be right, so I don't know what's wrong :confused:


Not sure what you mean by dynamic, but it’s definitely not a page that pulls the data via JavaScript: just display the page source and you’ll find the table there.
I guess you want to retrieve the line number. This can be achieved with:

  - platform: scrape
    name: "Abfahrt Nußbaumerstraße Linie"
    resource: "https://www.kvb.koeln/qr/246/"
    select: "td"
    index: 4
    headers:
      User-Agent: Mozilla/5.0

Wow, thank you!

My error came from the “tbody” element in the select statement: the page source does not contain a tbody element. Now this works like a charm:

sensor:
  - platform: scrape
    name: "Abfahrt Nußbaumerstraße Info"
    resource: "https://www.kvb.koeln/qr/246/"
    select: ".qr_td"
    index: 0
    headers:
      User-Agent: Mozilla/5.0
  - platform: scrape
    name: "Abfahrt Nußbaumerstraße 1 Linie"
    resource: "https://www.kvb.koeln/qr/246/"
    select: "table#qr_ergebnis tr td"
    index: 0
    headers:
      User-Agent: Mozilla/5.0
  - platform: scrape
    name: "Abfahrt Nußbaumerstraße 1 Ziel"
    resource: "https://www.kvb.koeln/qr/246/"
    select: "table#qr_ergebnis tr td"
    index: 1
    headers:
      User-Agent: Mozilla/5.0
  - platform: scrape
    name: "Abfahrt Nußbaumerstraße 1 Abfahrt"
    resource: "https://www.kvb.koeln/qr/246/"
    select: "table#qr_ergebnis tr td"
    index: 2
    headers:
      User-Agent: Mozilla/5.0
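
The configuration above only covers the first departure, while the card further down also uses sensors for departures 2 and 3. A minimal sketch for the second row could look like this, assuming every row of the table has exactly three cells (line, destination, departure) and no extra header cells, so that the second row starts at index 3. The sensor names are chosen to match the entity IDs used in the card; under the same assumption, the third row would use indices 6 to 8.

```yaml
# Sketch only: indices 3-5 assume three <td> cells per table row.
# These entries would be appended to the existing sensor: list.
sensor:
  - platform: scrape
    name: "Abfahrt Nußbaumerstraße 2 Linie"
    resource: "https://www.kvb.koeln/qr/246/"
    select: "table#qr_ergebnis tr td"
    index: 3   # first cell of the second row (assumed)
    headers:
      User-Agent: Mozilla/5.0
  - platform: scrape
    name: "Abfahrt Nußbaumerstraße 2 Ziel"
    resource: "https://www.kvb.koeln/qr/246/"
    select: "table#qr_ergebnis tr td"
    index: 4   # second cell of the second row (assumed)
    headers:
      User-Agent: Mozilla/5.0
  - platform: scrape
    name: "Abfahrt Nußbaumerstraße 2 Abfahrt"
    resource: "https://www.kvb.koeln/qr/246/"
    select: "table#qr_ergebnis tr td"
    index: 5   # third cell of the second row (assumed)
    headers:
      User-Agent: Mozilla/5.0
```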

This is what my Lovelace card looks like:

Here is the Lovelace code:

card:
  entities:
    - entity: sensor.abfahrt_nussbaumerstrasse_1_abfahrt
      name: '${vars[0] + "\xa0".repeat(5) + vars[1]}'
      type: 'custom:multiple-entity-row'
    - entity: sensor.abfahrt_nussbaumerstrasse_2_abfahrt
      name: '${vars[2] + "\xa0".repeat(5) + vars[3]}'
      type: 'custom:multiple-entity-row'
    - entity: sensor.abfahrt_nussbaumerstrasse_3_abfahrt
      name: '${vars[4] + "\xa0".repeat(5) + vars[5]}'
      type: 'custom:multiple-entity-row'
  show_header_toggle: false
  title: Abfahrt Nußbaumerstraße
  type: entities
entities:
  - sensor.abfahrt_nussbaumerstrasse_1_abfahrt
  - sensor.abfahrt_nussbaumerstrasse_2_abfahrt
  - sensor.abfahrt_nussbaumerstrasse_3_abfahrt
type: 'custom:config-template-card'
variables:
  - 'states[''sensor.abfahrt_nussbaumerstrasse_1_linie''].state'
  - 'states[''sensor.abfahrt_nussbaumerstrasse_1_ziel''].state'
  - 'states[''sensor.abfahrt_nussbaumerstrasse_2_linie''].state'
  - 'states[''sensor.abfahrt_nussbaumerstrasse_2_ziel''].state'
  - 'states[''sensor.abfahrt_nussbaumerstrasse_3_linie''].state'
  - 'states[''sensor.abfahrt_nussbaumerstrasse_3_ziel''].state'

Perhaps I should spend my time developing an integration for Cologne's transportation service, the ‘KVB’ … :slight_smile:


Quick question: is this solution still valid, or has the underlying source system changed so that this approach no longer works?

Hello LagaV,

This version is actually outdated: the KVB website is now dynamic and very difficult to scrape, and I have not been able to do it yet.

But: I looked into it over the last few days and found out that the Verkehrsverbund Rhein-Sieg (VRS) provides the data as JSON over HTTP :smiley:

transport.yaml

rest:
  - resource: https://www.vrs.de/index.php?eID=tx_vrsinfo_departuremonitor&i=ccf18ded5585169c2bc1ac8d07693055
    scan_interval: 120
    timeout: 20
    headers:
      User-Agent: Mozilla/5.0
    sensor:
      - name: "KVB Abfahrt 1 Line"
        value_template: '{{ value_json.events[0].line.number }}'
        json_attributes_path: $.events[0].line
        json_attributes:
          - direction
          - product
      - name: "KVB Abfahrt 1 Departure"
        value_template: '{{ iif(value_json.events[0].departure.estimate is defined, value_json.events[0].departure.estimate, value_json.events[0].departure.timetable) }}'
        json_attributes_path: $.events[0].departure
        json_attributes:
          - timetable
          - timestamp
          - estimate
          - delayed
          - day
      - name: "KVB Abfahrt 2 Line"
        value_template: '{{ value_json.events[1].line.number }}'
        json_attributes_path: $.events[1].line
        json_attributes:
          - direction
          - product
      - name: "KVB Abfahrt 2 Departure"
        value_template: '{{ iif(value_json.events[1].departure.estimate is defined, value_json.events[1].departure.estimate, value_json.events[1].departure.timetable) }}'
        json_attributes_path: $.events[1].departure
        json_attributes:
          - timetable
          - timestamp
          - estimate
          - delayed
          - day
      - name: "KVB Abfahrt 3 Line"
        value_template: '{{ value_json.events[2].line.number }}'
        json_attributes_path: $.events[2].line
        json_attributes:
          - direction
          - product
      - name: "KVB Abfahrt 3 Departure"
        value_template: '{{ iif(value_json.events[2].departure.estimate is defined, value_json.events[2].departure.estimate, value_json.events[2].departure.timetable) }}'
        json_attributes_path: $.events[2].departure
        json_attributes:
          - timetable
          - timestamp
          - estimate
          - delayed
          - day

homeassistant:
  customize_glob:
    "sensor.kvb_abfahrt_*_line":
      icon: mdi:train
    "sensor.kvb_abfahrt_*_departure":
      icon: mdi:clock-outline
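
The departure templates above fall back to the timetable value when no estimate is present. The same fallback can probably be written a bit shorter with Jinja's `default` filter, which substitutes its argument when the value is undefined. This is a sketch of one sensor entry under that assumption, not a tested replacement:

```yaml
      # One entry of the sensor: list above, rewritten with the default filter.
      # Falls back to departure.timetable when departure.estimate is undefined.
      - name: "KVB Abfahrt 1 Departure"
        value_template: "{{ value_json.events[0].departure.estimate | default(value_json.events[0].departure.timetable) }}"
        json_attributes_path: $.events[0].departure
        json_attributes:
          - timetable
          - timestamp
          - estimate
          - delayed
          - day
```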

The first thing you need to do is to create a personal request here:

Put the generated ID (https://www.vrs.de/am/s/<ID>) into the URL (https://www.vrs.de/index.php?eID=tx_vrsinfo_departuremonitor&i=<ID>). In my example, the query is for the stop Chlodwigplatz in Südstadt.

Lovelace:

cards:
  - type: custom:config-template-card
    variables:
      - states['sensor.kvb_abfahrt_1_line'].state
      - states['sensor.kvb_abfahrt_1_line'].attributes.direction
      - states['sensor.kvb_abfahrt_2_line'].state
      - states['sensor.kvb_abfahrt_2_line'].attributes.direction
      - states['sensor.kvb_abfahrt_3_line'].state
      - states['sensor.kvb_abfahrt_3_line'].attributes.direction
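    entities:
      # Watch entity IDs (not state values) so the card re-renders on changes
      - sensor.kvb_abfahrt_1_line
      - sensor.kvb_abfahrt_2_line
      - sensor.kvb_abfahrt_3_line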
    card:
      type: entities
      entities:
        - entity: sensor.kvb_abfahrt_1_departure
          icon: mdi:train
          name: ${vars[0] + "\xa0".repeat(5) + vars[1]}
          type: custom:multiple-entity-row
        - entity: sensor.kvb_abfahrt_2_departure
          icon: mdi:train
          name: ${vars[2] + "\xa0".repeat(5) + vars[3]}
          type: custom:multiple-entity-row
        - entity: sensor.kvb_abfahrt_3_departure
          icon: mdi:train
          name: ${vars[4] + "\xa0".repeat(5) + vars[5]}
          type: custom:multiple-entity-row
gridcol: 1
gridrow: 1
type: vertical-stack
title: Abfahrt Chlodwigplatz


Please let me know if the instructions were comprehensible and work!

Many greetings from the Südstadt!


Hi blackmesa,

thanks for providing this solution. Works as expected!

I struggled a bit with transport.yaml and finally put all the configuration in configuration.yaml, as my attempts to !include the additional YAML file failed.

I also got errors about missing cards, so in my case I needed to install multiple-entity-row.

Greetings back to Südstadt from a borough nearby.

Thanks,
LagaV

@blackmesa: Did the VRS site change a few days ago so that the results are no longer available, or is there a problem with my system?

It is not working for me anymore either...

Looks like the source system is broken:


When trying to generate a new ID, I see that plenty of stations (Cologne (KVB), but possibly the complete VRS) are no longer offered for selection.

The URL changed from

https://www.vrs.de/index.php?eID=tx_vrsinfo_ass2_departuremonitor&i=ccf18ded5585169c2bc1ac8d07693055

to

https://www.vrs.de/index.php?eID=tx_vrsinfo_departuremonitor&i=ccf18ded5585169c2bc1ac8d07693055

I edited the post above :slight_smile:


Thanks! Works as expected again.

Hey :wink:

same here! Did you ever figure out how to !include the transport.yaml?
I got it working the way you did, but it is a bit too messy for my taste.

Greetings from Poll :wink:

It’s easy :slight_smile:

I split my YAML config into shorter YAML files in a “packages” folder inside the config folder. Then just add this to your configuration.yaml:

homeassistant:
  packages: !include_dir_named packages

https://www.home-assistant.io/docs/configuration/packages/

Otherwise, just copy all the code from transport.yaml into your configuration.yaml :slight_smile:
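
As a sketch of how the two files fit together in this setup: with `!include_dir_named`, each file in the packages folder becomes one package named after the file, and the file itself starts with the integration key (here `rest:`). The file name and location below are assumptions based on the posts above, `<ID>` is a placeholder for your own generated ID, and the sensor list is shortened.

```yaml
# configuration.yaml
homeassistant:
  packages: !include_dir_named packages

# packages/transport.yaml (a separate file, shown here only for illustration)
rest:
  - resource: "https://www.vrs.de/index.php?eID=tx_vrsinfo_departuremonitor&i=<ID>"
    scan_interval: 120
    timeout: 20
    headers:
      User-Agent: Mozilla/5.0
    sensor:
      - name: "KVB Abfahrt 1 Line"
        value_template: '{{ value_json.events[0].line.number }}'
```

Because a package file carries its own top-level keys, it is merged into the main configuration as a whole rather than being included under a single key.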
