Scraper for Transport

I’m new to HA and would like to implement a scraper. I’ve had a look at the uk_transport sensor, and that’s structured very much around the next arrival of a bus/train/tube. It is also throttled unless you pay.

Transport for London however has a rather nice REST interface documented here.

My idea is to convert a python scraper of mine to use the API to get information like this.

[Image: Screenshot 2021-09-19 181552]

This scrapes fast so once a minute is possible.

The configuration I have is simple. In the UK there is a national unique identifier for stops, covering tube, rail and buses.

I have a time to walk to the stop. If it takes two minutes then you can filter out buses due in the next two minutes.

I have a max count and a max time filter as well.

You can select from a list of lines [bus numbers or tube lines], and you can specify a direction.
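The filters described above could be sketched roughly as follows. This is only an illustration: the `Arrival` class, its field names and `filter_arrivals` are hypothetical and are not taken from the TfL API or from Home Assistant.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Arrival:
    # Hypothetical record; field names are assumptions, not TfL's schema.
    line: str          # bus number or tube line name
    direction: str     # e.g. "inbound" or "outbound"
    due_in_min: float  # minutes until the vehicle reaches the stop


def filter_arrivals(
    arrivals: Iterable[Arrival],
    walk_min: float = 0,
    max_min: Optional[float] = None,
    max_count: Optional[int] = None,
    lines: Optional[Set[str]] = None,
    direction: Optional[str] = None,
) -> List[Arrival]:
    """Apply the walk-time, max-time, line, direction and max-count filters."""
    keep = [
        a for a in arrivals
        if a.due_in_min >= walk_min                       # cannot reach the stop sooner
        and (max_min is None or a.due_in_min <= max_min)  # too far ahead to matter
        and (lines is None or a.line in lines)
        and (direction is None or a.direction == direction)
    ]
    keep.sort(key=lambda a: a.due_in_min)  # soonest first
    return keep if max_count is None else keep[:max_count]
```

With a two-minute walk, `filter_arrivals(arrivals, walk_min=2, max_count=4)` would drop anything due in under two minutes and keep the next four.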

So can anyone give me some pointers?

The output isn’t a single value; in effect it’s some JSON with a template to generate HTML.

No interest in keeping historical values.

Can someone give me some pointers on where to look in the documentation?

Thanks. N

The first pointer is that scraping once a minute is 1,440 scrapes a day, which could be considered an abuse of their service. Just because you can scrape that fast does not mean you should.

It sounds like this isn’t scraping but using an API. It comes down to the terms of use. For targeted API calls for a single user it may not be an issue, but then again it may.

I’m not sure what you need help with. Do you need help with getting the data result, or implementing in HA?

I’ve been thinking some more.

For Tom

In practice, what I’m talking about is a timetable in general. One option is to download the timetable, say once per day, assume that it’s correct, and compute the time until arrival/departure live from a real-time clock.

You then have “real time timetables” such as TFL’s API, which is just updating the time table more dynamically.

That deals with most of the scrapes-per-day concern, and since you know when, say, buses are active, it also cuts down dramatically on polling.

For Matthew

It’s help on the HA side that I need; the API bit I’ve got going.

So the first part is getting back a list/array.

Second, the display part is to put up the list and incorporate a real-time clock, so that expected time of arrival - current time = time to arrival, and the time to arrival is live.
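As a rough sketch of that subtraction, assuming the expected arrival is available as a timezone-aware `datetime` (the function name `minutes_to_arrival` is made up for illustration):

```python
from datetime import datetime, timezone
from typing import Optional


def minutes_to_arrival(expected: datetime, now: Optional[datetime] = None) -> int:
    """Whole minutes from `now` until `expected`, never negative."""
    now = now or datetime.now(timezone.utc)
    seconds = (expected - now).total_seconds()
    # Round down to whole minutes; a vehicle that has already passed shows 0.
    return max(0, int(seconds // 60))
```

Recomputing this against the current clock whenever the display refreshes keeps the countdown live without fetching the data again.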

Does that make sense?

I would look into creating multiple sensors whose value is a timestamp and the device class is timestamp. See https://developers.home-assistant.io/docs/core/entity/sensor. Each sensor would represent one time.

I have a sensor with a list of attributes holding various values for the next 4 arriving transports in the schedule for my stop.

Then on a corresponding card I display the time of the earliest bus (3’ means bus [160] arrives at the stop ‘Burger King’ in 3 minutes) and the 3 following buses…
[Image: ha_bkk_card]

The sensor is updated every 5 min. But on a panel installed next to my entrance door, I have a button to update the schedule and get the latest information.

I do not like even this 5 min update, because at night I do not need it at all. I would prefer to run updates not on a fixed schedule but somehow flexibly. During a long lockdown I did not look at this info at all, but it kept running… not nice though.
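One way to make the update flexible might be to decide the polling interval from the time of day and whether anyone is around to look. The sketch below is only an assumption-laden illustration: the service window, the one-minute interval and the `someone_home` flag are all hypothetical, and wiring it into HA is left out.

```python
from datetime import time, timedelta
from typing import Optional

# Hypothetical values; adjust to the stop and household in question.
FIRST_SERVICE = time(5, 0)
LAST_SERVICE = time(0, 30)  # the window wraps past midnight
ACTIVE_INTERVAL = timedelta(minutes=1)


def in_service(now: time, start: time = FIRST_SERVICE, end: time = LAST_SERVICE) -> bool:
    """True if `now` falls inside the service window, which may wrap midnight."""
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end


def poll_interval(now: time, someone_home: bool = True) -> Optional[timedelta]:
    """Return how often to poll, or None to skip polling entirely."""
    if someone_home and in_service(now):
        return ACTIVE_INTERVAL
    return None
```

A manual refresh button, like the one on the door panel, would still cover the cases where polling is switched off.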