Because SQLite’s write performance isn’t great, the initial database building time is rather slow and the schedule queries themselves are also pretty dismal. Here’s a workaround to alleviate the long SQLite database build process and speed up the schedule queries:
Background:
Converting my local transit company’s GTFS files takes around 10 minutes on HASS.io running on my Raspberry Pi 3 with its Samsung EVO+ SD card, and then the schedule queries freeze access to the card every time they run. Argh!
Doing the conversion using a Docker instance of HA running on Macbook Pro 2015 with its SSD still takes around 4 minutes yet the schedule queries take about 5 seconds from the command line. Bummer.
The component’s schedule query is pretty beefy with 5 JOINs on large tables:
SELECT trip.trip_id,
trip.route_id,
time(origin_stop_time.departure_time),
time(destination_stop_time.arrival_time),
time(origin_stop_time.arrival_time),
time(origin_stop_time.departure_time),
origin_stop_time.drop_off_type,
origin_stop_time.pickup_type,
origin_stop_time.shape_dist_traveled,
origin_stop_time.stop_headsign,
origin_stop_time.stop_sequence,
time(destination_stop_time.arrival_time),
time(destination_stop_time.departure_time),
destination_stop_time.drop_off_type,
destination_stop_time.pickup_type,
destination_stop_time.shape_dist_traveled,
destination_stop_time.stop_headsign,
destination_stop_time.stop_sequence
FROM trips trip
INNER JOIN calendar calendar
ON trip.service_id = calendar.service_id
INNER JOIN stop_times origin_stop_time
ON trip.trip_id = origin_stop_time.trip_id
INNER JOIN stops start_station
ON origin_stop_time.stop_id = start_station.stop_id
INNER JOIN stop_times destination_stop_time
ON trip.trip_id = destination_stop_time.trip_id
INNER JOIN stops end_station
ON destination_stop_time.stop_id = end_station.stop_id
WHERE calendar.tuesday = 1
AND time(origin_stop_time.departure_time) > time('14:00:00')
AND start_station.stop_id = 3333
AND end_station.stop_id = 5555
AND origin_stop_time.stop_sequence < destination_stop_time.stop_sequence
AND calendar.start_date <= '2018-01-02'
AND calendar.end_date >= '2018-01-02'
ORDER BY origin_stop_time.departure_time
LIMIT 1;
Initial solution:
Adding the following missing indexes for the large table stop_times didn’t help speed anything much since we’re doing a non-deterministic function operation time()
on the column departure_time:
CREATE INDEX idx_departure_time ON stop_times(departure_time);
CREATE INDEX idx_stop_sequence ON stop_times(stop_sequence);
Better solution:
To speed things up, I still build the database on my laptop with its much faster storage device before trimming the (~90MB) database to the only stops that I care about. I then reduce its file size to a smaller footprint (~10MB) before moving it to the Raspberry Pi where my production instance of HASS.io runs. Alternatively, these steps could also be done directly on the Raspberry Pi’s SD card …if you like watching grass grow.
In SQLite:
.backup gtfs-backup.sqlite
DELETE FROM stop_times WHERE stop_id NOT IN (3333, 5555);
DELETE FROM stops WHERE stop_id NOT IN (3333, 5555);
VACUUM;
.backup gtfs-small.sqlite
At the command prompt:
cp ./gtfs-small.sqlite /Volumes/config/gtfs/NAME_OF_DATA_SOURCE.sqlite
Then restart HA. The above query then runs almost instantly with no freezing of the SD card.