Query on scraping websites

denver · November 14, 2019, 10:07pm

I've just created my first Python script and placed it in the python_scripts folder.

The script scrapes a council website for rubbish day information, and I have a sensor set up as:

- platform: command_line
  name: Rubbish
  command: "python3 /config/python_scripts/rubbish1.py"

I then have a sensor card in Lovelace that displays the text printed by the Python script. It's all very basic, I know, but I’m new to this and have only dabbled in Python for the last couple of months.

My question is: how often is this Python script run? In other words, what defines how frequently the sensor runs the script? I don’t want to be scraping the website more than once a day.

The script is:

import datetime
import requests
from bs4 import BeautifulSoup
from math import floor
from datetime import date


# Returns the week of the month for the specified date.
def week_of_month_floor(dt):
    first_day = dt.replace(day=1)
    dom = dt.day
    adjusted_dom = dom + first_day.weekday()

    return int(floor(adjusted_dom / 7.0))


def get_sack_colour(val):
    for sack_col, rubbish_day in rubbish.items():
        if val == rubbish_day:
            sack_col = sack_col.split(" ")
            return sack_col[0]


# Scrapes rubbish collection dates
URL = "https://apps.castlepoint.gov.uk/cpapps/index.cfm?roadID=2757&fa=wastecalendar.displayDetails"
raw_html = requests.get(URL).text
data = BeautifulSoup(raw_html, "html.parser")
todays_date = datetime.date.today()
print(todays_date.day)
weeks_number_f = week_of_month_floor(datetime.datetime.now())
#####
rubbish = {}
for idx in range(4):
    rubbish["black " + str(idx)] = int(data.select(".normal")[idx].text)
    rubbish["pink " + str(idx)] = int(data.select(".pink")[idx].text)
#####
print(f"Selected scrap {rubbish}")
# Cycle through rubbish.items() to see whether today is a collection day, and look up the sack colour with get_sack_colour
for sack_col, rubbish_day in rubbish.items():
    if (todays_date.day) == rubbish_day:
        print(f"Sack colour today is {get_sack_colour(todays_date.day)}")

# Look ahead up to six days, nearest day first, and report the first collection found
for day in range(1, 7):
    tdelta = datetime.timedelta(days=day)
    next_rubbish_day = (todays_date + tdelta).day
    if next_rubbish_day in rubbish.values():
        print(
            f"Sack colour next week is {get_sack_colour(next_rubbish_day)} on the {next_rubbish_day}"
        )
        break
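
One way to keep the scraping to once a day, however often the sensor ends up running the script, would be to cache the downloaded page on disk and only hit the website when the cached copy is from an earlier date. As far as I understand, a command_line sensor re-runs its command on a polling interval (the scan_interval option), so a guard like this is a belt-and-braces measure rather than a replacement for setting that interval. The sketch below is only an illustration under assumptions: `cached_daily` and the cache file layout are hypothetical names of my own, not part of Home Assistant or of the script above.

```python
import datetime
import json
import os
import tempfile


def cached_daily(cache_path, fetch, today=None):
    """Return fetch()'s result, calling fetch at most once per calendar day.

    cache_path: file used to remember the last result (hypothetical location).
    fetch: zero-argument callable that returns the page text.
    today: date to treat as "today"; defaults to the real current date.
    """
    today = today or datetime.date.today()
    try:
        with open(cache_path, encoding="utf-8") as fh:
            cached = json.load(fh)
        if cached["date"] == today.isoformat():
            return cached["text"]  # Already fetched today: no network request.
    except (OSError, ValueError, KeyError, TypeError):
        pass  # No cache yet, or it is unreadable: fall through and fetch.

    text = fetch()
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump({"date": today.isoformat(), "text": text}, fh)
    return text


if __name__ == "__main__":
    # Demo with a stand-in fetcher so nothing is downloaded here.
    calls = []

    def fake_fetch():
        calls.append(1)
        return "<html>example page</html>"

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "rubbish_cache.json")
        cached_daily(path, fake_fetch)
        cached_daily(path, fake_fetch)
        print(f"fetch was called {len(calls)} time(s) for two runs")
```

In the script above, the idea would be to replace `raw_html = requests.get(URL).text` with something like `raw_html = cached_daily("/config/python_scripts/rubbish_cache.json", lambda: requests.get(URL).text)`, where the cache path is just an example location.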

tom_l · November 14, 2019, 10:11pm