So I have a scraper that writes well-formed JSON into a local directory. I'd like to read the values from that JSON and base some automations on it. My options, as far as I understand them, are:
read the file using a command line sensor - cat //local/local.json - and template the results
trigger the original Scrapy crawler via a command line sensor and read the response directly
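For option 1, I'm imagining something like this (untested sketch - the sensor name and the template indexing are just placeholders, and I gather sensor states are capped at 255 characters, so per-train sensors seem safer than one big blob):

```yaml
sensor:
  - platform: command_line
    name: train_0705
    command: "cat //local/local.json"
    # value_json is the command output parsed as JSON
    value_template: "{{ value_json[0]['07:05'] }}"
    scan_interval: 300
```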
Anything I've missed? Should I be using JSON at all? Thanks for your thoughts, as ever…
Yeah - thanks fabaff. I tried the scrape sensor and it didn't quite give me enough targeted control, although that might be my failure to get the most out of BeautifulSoup. Basically I'm scraping some train times and their status - 7 mins late, on time, etc. The idea is to have a panel displaying the status of my regular commuter trains, so that I don't run for a train that's late or cancelled. As a first cut, I have my JSON in this format:
[
{"07:05": "5 mins late"},
{"07:37": "5 mins late"},
{"08:04": null},
{"08:19": "5 mins late"},
{"08:26": "7 mins late"}
]
which I'm grabbing via a command line sensor. Perhaps I'll run the Scrapy command via cron at 5-minute intervals from 6:30 - 8, so the JSON stays up to date. The next job is to get that data into something I can query from an automation, and perhaps add an action like an email or tweet to alert me. Anyway, first steps…
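One thing I've noticed: a list of single-key objects is fiddly to template, so an idea is to flatten it into one {time: status} dict first, either in the spider's pipeline or as a small post-processing step. A rough sketch (the flatten helper is my own, not anything from Scrapy):

```python
import json

# Sample scraper output in the format from the post above;
# null means no status was reported for that train.
raw = """
[
{"07:05": "5 mins late"},
{"07:37": "5 mins late"},
{"08:04": null},
{"08:19": "5 mins late"},
{"08:26": "7 mins late"}
]
"""

def flatten(items):
    """Merge a list of single-key dicts into one {time: status} dict."""
    merged = {}
    for item in items:
        merged.update(item)
    return merged

trains = flatten(json.loads(raw))
print(trains["07:05"])  # -> 5 mins late
print(trains["08:04"])  # -> None
```

With that shape, a template can just do value_json['07:05'] instead of hunting through list indexes.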
I know that! The JSON is the output of my Scrapy spider parsing the HTML. I couldn't get the scrape sensor (via BeautifulSoup) to pull out the sets of data I needed (almost certainly my ignorance), so I went for the super powers of Scrapy, which grabbed the train time and its status from a specific train search. So my data is good - now I just need to get it into some sort of HA state.
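And once the status is in a sensor, I'm thinking the alert could look something like this (untested sketch - the entity ID and notify service are placeholders for whatever I end up configuring):

```yaml
automation:
  - alias: "Warn when the 07:05 is late"
    trigger:
      # fires on any state change of the train sensor
      - platform: state
        entity_id: sensor.train_0705
    condition:
      - condition: template
        value_template: "{{ 'late' in (states('sensor.train_0705') or '') }}"
    action:
      - service: notify.notify
        data:
          message: "07:05 train: {{ states('sensor.train_0705') }}"
```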