Guidance for First pyscript

I’d like to configure HA to trigger every 5 minutes and scrape a website of police/EMS dispatches. A pyscript should interpret that data and update a hass-helper, as well as determine whether to notify me (the notification service is already set up).

I already have a Python script working by itself, but I could not get the pyscript Jupyter kernel to work, so I am looking for some guidance on how to integrate this with HA.

What I have so far:

def scrape():
    # DONE: hits the url
    # DONE: parses html
    # DONE: returns the datetimes, and dispatches

def keep(cutoff_datetime, keywords, payload):
    # the payload is the dispatches, plus datetimes

    # DONE: current_last_known is from the payload
    # DONE: filters the results to only dispatches that I want (local, with keywords, since prior_last_known)
    # DONE: returns the dispatches I would care about (if any)

def main():
    # DONE: payload = scrape()

    # TBD: retrieve hass-helper prior_last_known?
    # DONE: calculate current_last_known from the data
    
    # DONE: keep(prior_last_known, keywords, payload)

    # DONE: check if the results are empty
        # TBD: if NOT, call a service to notify (this is already configured) 
    # TBD: update hass-helper prior_last_known to the current_last_known

A few questions:
(a) Are there any glaring issues with my logic here?
(b) Any pointers or examples of similar pyscripts that I can use to complete this?
(c) Will the timing/trigger have to be in this script, or can I do a time trigger in a different automation and then call the service (main()) as the action?

This seems like it can be done with the scrape integration, a simple Jinja template, and an automation that notifies you when the sensor changes state.
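
Something along these lines, where the URL and the CSS selector are just placeholders:

scrape:
  - resource: "https://example.com/dispatches"   # placeholder URL
    sensor:
      - name: "Latest dispatch"
        select: "table tr:nth-of-type(2) td"     # placeholder selector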

I tried that, but unfortunately the webpage objects are too complex for the scrape integration, so I had to do it myself in Python to get the data I needed. The integration’s overview page does also say it will most likely only work with simple web pages, and that it can be time-consuming to get the right section.

In case someone comes across this post someday and finds it helpful, I’ll post what I have working now.

It’s not great, but I did get it working at a bare minimum.

Step 1: Install pyscript; don’t forget to turn on the service and edit the config file so that your Jupyter kernel can talk to it.
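
For reference, the pyscript entry in configuration.yaml is roughly this (allow_all_imports is what lets the script import packages like bs4):

pyscript:
  allow_all_imports: true
  hass_is_global: true

The Jupyter kernel also has its own config file on the client side, which needs your HA URL and a long-lived access token so it can connect.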

Step 2: Get it working in Jupyter first; this is a real time saver for debugging.

Step 3: Add the code in the config/pyscript folder. If you tag your functions with @service, they show up under Services, so you can call them from automations!

In my case, I am simply running this on a time pattern.
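
The automation is just a time_pattern trigger that calls the pyscript service, roughly:

automation:
  - alias: "Scan dispatches"
    trigger:
      - platform: time_pattern
        minutes: "/5"
    action:
      - service: pyscript.scan_dispatch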

import aiohttp
from datetime import datetime
from bs4 import BeautifulSoup

def scrape():
    '''Scrapes the dispatch calls from a website, and returns the datetimes and the calls'''
    
    # start by grabbing the latest data
    URL = "XXXXXXXXXXXXXX"
    async with aiohttp.ClientSession() as session:
        async with session.get(URL) as resp:
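            # note: pyscript compiles functions as async and awaits coroutines
            # automatically, so no explicit await is needed here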
            page = resp.text()
            soup = BeautifulSoup(page, "html.parser")
    results = soup.find_all("table")
    
###  --- 8< ----------- 8< ---
# 
# Snip: a bunch of custom stuff, to get the data back properly 
#
###  --- 8< ----------- 8< ---
        
    return datetimes, calls
    
def keep(cutoff, keywords, payload):
    '''Filters the calls for "new" ones, and only matching specific keywords'''
    
    # separate calls and datetimes
    datetimes = payload[0]
    calls = payload[1]
    
    # only keep calls that are new and that match at least one keyword
    # (kept avoids shadowing the function name, and range(len(calls))
    # rather than range(len(calls)-1) so the last call is not skipped)
    kept = []
    for i in range(len(calls)):
        
        # new ones based on time
        if datetimes[i] > cutoff:
            
            # presence of any of the keywords
            if any(k in calls[i] for k in keywords):
                kept.append(i)
    
    return [datetimes[i] for i in kept], [calls[i] for i in kept]

@service
def scan_dispatch():
    '''Main wrapper for the entire process'''
    
    # TODO: how to make this service accept a list of keywords as a parameter? (one approach is sketched below the script)
    keywords = ['MyStreet','FireBox 99', 'MyParentsStreet','MyWork']
    
    # get the data from the website
    datetimes, calls = scrape()
    
    # calculate time helpers
    current_last_known = max(datetimes)
    prior_last_known = datetime.strptime(input_text.dispatch_last_known_datetime, "%Y-%m-%d %H:%M:%S")

    # filter for ones I care about
    kept_datetimes, kept_calls = keep(prior_last_known, keywords, [datetimes, calls])
        
    # TODO: managing the notification in the code sucks, because automations are much more powerful and easier to configure
        
    # only send a notification if anything matched the filter
    # (checking the filtered results, not the full unfiltered calls list)
    if len(kept_calls) > 0:
        service.call("notify", 
                     "mobile_app", 
                     blocking=True, 
                     return_response=False, 
                     message='Something happened you might want to be aware of', 
                     title='Dispatch Alert', 
                     data={'clickAction':"URL"})
    

    # update a counter so I can easily skip notification if there are 0
    input_number.recent_dispatch_count.set_value(len(kept_calls))
    
    # update the last known time so subsequent calls will filter to only new ones
    # (formatted to match the strptime pattern used above)
    input_text.dispatch_last_known_datetime.set_value(current_last_known.strftime("%Y-%m-%d %H:%M:%S"))
    
    return True
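
On the keywords TODO: pyscript passes the service-call data to the function as keyword arguments, so (untested sketch) the service could take the list directly and fall back to defaults:

@service
def scan_dispatch(keywords=None):
    '''Same service, but keywords can come from the service call data'''
    if keywords is None:
        # fall back to the hard-coded defaults when called with no data
        keywords = ['MyStreet', 'FireBox 99', 'MyParentsStreet', 'MyWork']
    ...

and the automation would pass them in the action:

    action:
      - service: pyscript.scan_dispatch
        data:
          keywords:
            - MyStreet
            - FireBox 99

On the notification TODO, one option is to fire a custom event from the script, e.g. event.fire("dispatch_alert", count=len(kept_calls)), and handle the actual notification in a normal automation with an event trigger, which keeps the flexible part in the UI.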