Help to understand Scrape Sensor

Hi,

I would like to scrape data from this web page: http://opole.kiedyprzyjedzie.pl/#/stops/122/departures

I would like to get this data:

When I right-click the line in Firefox and choose Copy > CSS Selector, I get:
tr.stop-departures__line:nth-child(1) > td:nth-child(2)

How do I write the correct configuration (select)? This one doesn’t work for me…

sensor:
  - platform: scrape
    resource: http://opole.kiedyprzyjedzie.pl/#/stops/122/departures
    name: bus
    select: "tr.stop-departures__line:nth-child(1) > td:nth-child(2)"

Well, your problem is that those tabs are generated by JavaScript, which Python’s requests library won’t “trigger”: it only downloads the raw HTML and never executes scripts.

Take a look at the actual page source rather than inspecting the element and you’ll see that what you are looking for does not exist.

view-source:http://opole.kiedyprzyjedzie.pl/#/stops/122/departures

Because the scrape sensor backend uses the ‘requests’ library, it just won’t work.
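
One more detail, as a minimal sketch: everything after the `#` in that URL is a fragment, which the client keeps to itself and never sends to the server. So requests only ever asks for the bare page, and the `/stops/122/departures` part is interpreted later by the page’s JavaScript. The standard-library `urllib.parse` module shows the split:

```python
from urllib.parse import urlsplit

url = "http://opole.kiedyprzyjedzie.pl/#/stops/122/departures"
parts = urlsplit(url)

# The path is all the server ever sees; the fragment stays in the browser.
print("path sent to server:", parts.path)
print("fragment kept client-side:", parts.fragment)
```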

Thank you @jocnnor, I understand.
Any idea how to solve it a different way (when the requested data is “behind” JavaScript)?

You could request the JSON directly without the need for a scraper! http://opole.kiedyprzyjedzie.pl/api/departures/122

Then, instead of the scraper, use the RESTful integration:

(code is totally untested)

sensor:
  - platform: rest
    name: Platform 122
    resource: http://opole.kiedyprzyjedzie.pl/api/departures/122
    json_attributes_path: $.rows.[0]
    json_attributes:
      - time

EDIT: Oh, yeah… paul’s idea is way better. Just use that, lol. The solution below works for me (in Python), but it isn’t lightweight at all: it requires a few packages to be installed and uses Chromium in the background to render the HTML.


This library (requests_html) has a built-in HTML scraper and also handles the JavaScript. You could write a Python script that parses the page and use a command_line sensor to run it every so often (the default is every 60 seconds).

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('http://opole.kiedyprzyjedzie.pl/#/stops/122/departures')
r.html.render()  # this call executes the js in the page

output = r.html.find('.line-no', first=True)
print(output.text)

Thank you! It seems to be the easiest way to get the requested data.
I will test it later on (currently I have a high CPU load problem and I am trying to figure it out…).

BTW, how did you figure out that the address should change from http://opole.kiedyprzyjedzie.pl/#/stops/122/departures to http://opole.kiedyprzyjedzie.pl/api/departures/122 ??

This is the key to this particular problem…
How did you know that you should add “api” and adjust the address accordingly?
I must admit that I tried to find API access on the site (http://kiedyprzyjedzie.pl/), but with no result…

I used Chrome’s network inspector to look at what requests were being made when viewing the original link that you posted.
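
Once the network inspector has revealed an endpoint like that, it can be checked outside the browser. Here is a rough, untested-against-the-live-site sketch using only the standard library; the function name `fetch_departures` and the injectable `opener` argument are made up for illustration, so the parsing can be tried without a network connection:

```python
import json
from urllib.request import urlopen

# Endpoint pattern taken from the URLs quoted in this thread.
API_URL = "http://opole.kiedyprzyjedzie.pl/api/departures/{stop_id}"


def fetch_departures(stop_id, opener=urlopen):
    """Return the decoded JSON document for one stop.

    `opener` defaults to urllib's urlopen; pass a stand-in to try it offline.
    """
    with opener(API_URL.format(stop_id=stop_id)) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs network access):
#   data = fetch_departures(122)
#   print(sorted(data.keys()))
```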

Got it! Thanks, yes, now it sounds obvious…

Unfortunately, this doesn’t work:

sensor:
  - platform: rest
    name: Platform 122
    resource: http://opole.kiedyprzyjedzie.pl/api/departures/122
    json_attributes_path: $.rows.[0]
    json_attributes:
      - time

I do not see any sensor “Platform 122” under dev tools/states…

Thank you for your help guys anyway!

The API response currently doesn’t have anything in the rows array, so I’m guessing it’s got an error trying to create it. Can you enable logging and see if there are any errors?

When I use

http://opole.kiedyprzyjedzie.pl/api/departures/122

now (after midnight), when no buses are running for the next 4 hours, I get this as the Platform 122 state:

{"timestamp": 1608766765, "only_disembarking": false, "rows": [], "departure_time_limit": 14400, "directions": {}, "deviations": {}} 

But when I use a different bus stop:

http://opole.kiedyprzyjedzie.pl/api/departures/2

Then I get an error in the log (Supervisor / System / Log provider -> Core):

homeassistant.exceptions.InvalidStateError: Invalid state encountered for entity id: sensor.platform_2. State max length is 255 characters.
2020-12-24 00:26:11 ERROR (MainThread) [homeassistant.components.sensor] Error while setting up rest platform for sensor

As I understand it, the data should be templated somehow…

Oh, I’m sorry! Reading the RESTful docs again, we need to use value_template to get the value out. The json_attributes* settings only set the sensor’s attributes, not its value!
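
To make the state/attributes split concrete, here is a rough Python sketch of what the two settings pull out of the response. The sample payload is invented for illustration (only the key names come from this thread), and this mimics the idea, not Home Assistant’s actual implementation:

```python
import json

# Hypothetical sample response: values are invented, key names come from the thread.
payload = json.loads("""
{
  "timestamp": 1608766765,
  "rows": [
    {"line_name": "15", "time": "04:35", "is_estimated": true,
     "at_stop": false, "cancelled": false}
  ]
}
""")

MAX_STATE_LENGTH = 255  # limit quoted in the InvalidStateError above

first_row = payload["rows"][0]

# Rough equivalent of value_template: '{{ value_json.rows[0].time }}'
state = str(first_row["time"])

# Rough equivalent of json_attributes_path: $.rows.[0] plus the json_attributes list
wanted = ["is_estimated", "at_stop", "cancelled"]
attributes = {key: first_row[key] for key in wanted if key in first_row}

# A short extracted value fits the limit; the whole JSON body often would not.
print(state, attributes, len(state) <= MAX_STATE_LENGTH)
```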

Can you try this:

sensor:
  - platform: rest
    name: Platform 122
    resource: http://opole.kiedyprzyjedzie.pl/api/departures/122
    value_template: '{{ value_json.rows[0].time }}'
    json_attributes_path: $.rows.[0]
    json_attributes:
      - is_estimated
      - at_stop
      - cancelled

It’s alive! :grinning: :muscle: :muscle:

I am just wondering: if I would like to have, let’s say, the next 3 buses visible (line number and time), should I use this kind of sensor code?

- platform: rest
  name: Platform 122_0_time
  resource: http://opole.kiedyprzyjedzie.pl/api/departures/122
  value_template: "{{ value_json.rows[0].time }}"
  json_attributes_path: $.rows.[0]
  json_attributes:
    - is_estimated
    - at_stop
    - cancelled

- platform: rest
  name: Platform 122_0_line
  resource: http://opole.kiedyprzyjedzie.pl/api/departures/122
  value_template: "{{ value_json.rows[0].line_name }}"
  json_attributes_path: $.rows.[0]
  json_attributes:
    - is_estimated
    - at_stop
    - cancelled

- platform: rest
  name: Platform 122_1_time
  resource: http://opole.kiedyprzyjedzie.pl/api/departures/122
  value_template: "{{ value_json.rows[1].time }}"
  json_attributes_path: $.rows.[1]
  json_attributes:
    - is_estimated
    - at_stop
    - cancelled

- platform: rest
  name: Platform 122_1_line
  resource: http://opole.kiedyprzyjedzie.pl/api/departures/122
  value_template: "{{ value_json.rows[1].line_name }}"
  json_attributes_path: $.rows.[1]
  json_attributes:
    - is_estimated
    - at_stop
    - cancelled

- platform: rest
  name: Platform 122_2_time
  resource: http://opole.kiedyprzyjedzie.pl/api/departures/122
  value_template: "{{ value_json.rows[2].time }}"
  json_attributes_path: $.rows.[2]
  json_attributes:
    - is_estimated
    - at_stop
    - cancelled

- platform: rest
  name: Platform 122_2_line
  resource: http://opole.kiedyprzyjedzie.pl/api/departures/122
  value_template: "{{ value_json.rows[2].line_name }}"
  json_attributes_path: $.rows.[2]
  json_attributes:
    - is_estimated
    - at_stop
    - cancelled

This is working of course:

But:

  1. What if there are no rows [1] and [2] (there are no further buses)?
  2. The “code” looks dirty, sensor by sensor… I am quite sure there has to be a way to make it much cleaner (maybe a loop or something). Unfortunately, my experience with JSON, REST, etc. is almost zero… Do you think there is a better way to code it, or is this the right way?

Ok, I am going back to home duties… Thank you so much for your help!

Merry Christmas! :evergreen_tree: :evergreen_tree: :evergreen_tree:

You could combine them and have them be attributes of a single sensor to make it look cleaner, but that makes it a little more work to pull the data back out (not much). If it works for you, keep it. I would just create a duplicate sensor to play with for testing. The only thing I would do is use customize.yaml to give friendly names and icons like:


https://cdn.materialdesignicons.com/5.3.45/
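
For the “combine them into attributes of one sensor” idea, one possible shape is a small script that collapses the next few departures into a single JSON object. This is only a sketch: the function name, the output keys (`state`, `departures`) and the "none" fallback are assumptions made for illustration, and the sample payload is invented.

```python
import json


def summarize_departures(payload, count=3):
    """Collapse the next `count` departures into one state plus a list."""
    rows = payload.get("rows") or []
    upcoming = [
        {"line": row.get("line_name"), "time": row.get("time")}
        for row in rows[:count]  # slicing never raises, even on an empty list
    ]
    # Fall back to a placeholder when no buses are left.
    state = upcoming[0]["time"] if upcoming else "none"
    return {"state": state, "departures": upcoming}


# Hypothetical payload with only two departures left for the day.
sample = {
    "rows": [
        {"line_name": "15", "time": "04:35"},
        {"line_name": "9", "time": "04:50"},
    ]
}
print(json.dumps(summarize_departures(sample)))
print(json.dumps(summarize_departures({"rows": []})))
```

A script along these lines could be run by a command_line sensor, as suggested earlier in the thread, so that one entity carries all upcoming buses instead of six separate REST sensors.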

As I thought, when there are no more buses, I get a lot of warnings in the log:

WARNING (MainThread) [homeassistant.components.rest.sensor] JSON result was not a dictionary or list with 0th element a dictionary

Could someone help me avoid it?
Maybe by adding some kind of if (if there is no value, then use e.g. “xxx”)?

- platform: rest
  name: Platform 122_0_time
  resource: http://opole.kiedyprzyjedzie.pl/api/departures/122
  value_template: "{{ value_json.rows[0].time }}"  # <-- I believe that I need to add something here...
  json_attributes_path: $.rows.[0]
  json_attributes:
    - is_estimated
    - at_stop
    - cancelled
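
The missing piece is a check that the row exists before reading it. Here is that logic as a small Python sketch; the helper name `row_field` is hypothetical and the "xxx" placeholder is the one from the question. The same test-then-fallback idea would apply inside a value_template.

```python
def row_field(payload, index, field, default="xxx"):
    """Return payload["rows"][index][field], or `default` if it is missing."""
    rows = payload.get("rows") or []
    # Guard against an empty or too-short list before indexing into it.
    if index < 0 or index >= len(rows):
        return default
    return rows[index].get(field, default)


# Hypothetical payloads: one with a departure, one from the quiet hours.
busy = {"rows": [{"line_name": "15", "time": "04:35"}]}
quiet = {"rows": []}

print(row_field(busy, 0, "time"))
print(row_field(busy, 1, "time"))
print(row_field(quiet, 0, "time"))
```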

Please, can somebody help me?
I need to parse prices from this JSON: https://www.ote-cr.cz/cs/kratkodobe-trhy/elektrina/denni-trh/@@chart-data
Ideally, all listed prices would go into the attributes of one entity. Is that possible?

@Zordrac - were you able to figure this out in the end?

Unfortunately, I eventually abandoned the idea…