So I have a scraper that writes well-formed JSON into a local directory. I'd like to read the values from that JSON and base some automations on it. My options, as far as I understand them, are:
read the file using a command line sensor - cat //local/local.json - and template the results
trigger the original Scrapy crawler via a command line sensor and read the response directly
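For option 1, I'm imagining something like this (untested sketch - the sensor name and the template indexing are just placeholders, and I gather sensor states are capped at 255 characters, so per-train sensors seem safer than one big blob):

```yaml
sensor:
  - platform: command_line
    name: train_0705
    command: "cat //local/local.json"
    # value_json is the command output parsed as JSON
    value_template: "{{ value_json[0]['07:05'] }}"
    scan_interval: 300
```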
Anything I've missed? Should I be using JSON at all? Thanks for your thoughts, as ever…
Yeah - thanks fabaff. I tried the scrape sensor and it didn't quite give me enough targeted control, although that might be my failure to get the most out of BeautifulSoup. Basically I'm scraping some train times and their status - 7 mins late, on time, etc. The idea is to have a panel displaying the status of my regular commuter trains, so that I don't run for a train that's late or cancelled. As a first cut, I have my JSON in this format:
[
{"07:05": "5 mins late"},
{"07:37": "5 mins late"},
{"08:04": null},
{"08:19": "5 mins late"},
{"08:26": "7 mins late"}
]
which I'm grabbing via a command line sensor. Perhaps I'll run the Scrapy command via cron at 5-minute intervals from 6:30 - 8, so the JSON stays up to date. The next job is to get that data into something I can query from an automation, and perhaps add an action like an email or tweet to alert me. Anyway, first steps…
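One thing I've noticed: a list of single-key objects is fiddly to template, so an idea is to flatten it into one {time: status} dict first, either in the spider's pipeline or as a small post-processing step. A rough sketch (the flatten helper is my own, not anything from Scrapy):

```python
import json

# Sample scraper output in the format from the post above;
# null means no status was reported for that train.
raw = """
[
{"07:05": "5 mins late"},
{"07:37": "5 mins late"},
{"08:04": null},
{"08:19": "5 mins late"},
{"08:26": "7 mins late"}
]
"""

def flatten(items):
    """Merge a list of single-key dicts into one {time: status} dict."""
    merged = {}
    for item in items:
        merged.update(item)
    return merged

trains = flatten(json.loads(raw))
print(trains["07:05"])  # -> 5 mins late
print(trains["08:04"])  # -> None
```

With that shape, a template can just do value_json['07:05'] instead of hunting through list indexes.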
I know that! The JSON is the output of my Scrapy spider parsing the HTML. I couldn't get the scrape sensor (via BeautifulSoup) to pull out the sets of data I needed (almost certainly my ignorance), so I went for the super powers of Scrapy, which grabbed the train time and its status from a specific train search. So my data is good - now I just need to get it into some sort of HA state.
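And once the status is in a sensor, I'm thinking the alert could look something like this (untested sketch - the entity ID and notify service are placeholders for whatever I end up configuring):

```yaml
automation:
  - alias: "Warn when the 07:05 is late"
    trigger:
      # fires on any state change of the train sensor
      - platform: state
        entity_id: sensor.train_0705
    condition:
      - condition: template
        value_template: "{{ 'late' in (states('sensor.train_0705') or '') }}"
    action:
      - service: notify.notify
        data:
          message: "07:05 train: {{ states('sensor.train_0705') }}"
```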