Long post incoming!
Having struggled with this for a few days, I will post my experience/solutions here in case it helps someone else.
I live in Bucks, using the Chiltern waste calendar. The website uses a multistage process to identify the correct address and show the data, with a cookie passed from one page to the next. The only way I could find to scrape the data, therefore, is to use a headless browser to navigate between pages. This requires a remote webdriver, and I couldn’t find any good documentation on how to set one up. I also hit problems with the HA integration for Bucks: I couldn’t get it to create any entities.
I use Proxmox on my home server environment, so my solution is as follows:
- Set up an Ubuntu LXC container, install Docker, run a headless Selenium webdriver
- Use the excellent script from @Robbrad to get the JSON data from the Chiltern site
- Serve this data via NGINX on my local network
- Create REST sensors in HA to parse and display the information
Setting up the webdriver
After trying a few ways of doing this, I settled on a docker container as the easiest and quickest solution.
Install docker:
apt install docker.io
Install standalone Chrome. The following command pulls the image, runs it in the background, restarts it automatically (including on boot), and publishes port 4444 on the host:
docker run -d -p 4444:4444 --restart=always selenium/standalone-chrome
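Before moving on, it can help to confirm the webdriver is actually accepting sessions. Here is a minimal sketch, assuming the container exposes Selenium's standard `/status` endpoint on port 4444. The function names `is_ready` and `check_webdriver` are made up for illustration and are not part of any library:

```python
import json
from urllib.request import urlopen


def is_ready(status: dict) -> bool:
    """Return True if a Selenium /status payload reports the server as ready."""
    return bool(status.get("value", {}).get("ready", False))


def check_webdriver(base_url: str = "http://localhost:4444") -> bool:
    """Fetch /status from the Selenium server and report whether it is ready."""
    # Assumption: the server is reachable at base_url, as published by the
    # docker command above.
    with urlopen(f"{base_url}/status", timeout=10) as response:
        return is_ready(json.load(response))
```

Calling `check_webdriver()` from a Python prompt on the container should return True once the container has finished starting.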
Get bin data in JSON format and save it to the NGINX webserver directory
Install NGINX:
apt install nginx
Visit the IP of your container in a browser to check it is up and running.
Next, clone the Git repository to your home folder:
mkdir bins
cd bins
git init
git clone https://github.com/robbrad/UKBinCollectionData
Install python dependencies:
apt install pip
pip install pandas
pip install selenium
pip install uk_bin_collection
Navigate to the directory containing the collect_data.py script, and test it out:
python3 collect_data.py BuckinghamshireCouncil https://chiltern.gov.uk/collection-dates -s -p "HP11 XXX" -n "4 STREET NAME, HIGH WYCOMBE" -w http://localhost:4444/wd/hub
You should see some JSON returned:
{
"bins": [
{
"type": "Domestic Garden Collection",
"collectionDate": "27/11/2023"
},
{
"type": "Domestic Paper/Card Collection",
"collectionDate": "27/11/2023"
},
{
"type": "Domestic Refuse Collection",
"collectionDate": "20/11/2023"
},
{
"type": "Domestic Mixed Dry Recycling Collection",
"collectionDate": "27/11/2023"
},
{
"type": "Domestic Food Collection",
"collectionDate": "20/11/2023"
}
]
}
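The dates come back as DD/MM/YYYY strings, so they need parsing before you can sort or compare them. Here is a rough sketch of how that might look in Python, using a trimmed copy of the output above. The `next_collections` helper is my own illustration, not part of the UKBinCollectionData script:

```python
import json
from datetime import date, datetime

# Trimmed copy of the sample output shown above.
SAMPLE = """
{"bins": [
  {"type": "Domestic Garden Collection", "collectionDate": "27/11/2023"},
  {"type": "Domestic Refuse Collection", "collectionDate": "20/11/2023"},
  {"type": "Domestic Food Collection", "collectionDate": "20/11/2023"}
]}
"""


def next_collections(raw: str, today: date) -> list[tuple[str, date, int]]:
    """Return (type, date, days_until) tuples sorted by soonest collection."""
    rows = []
    for entry in json.loads(raw)["bins"]:
        # The script emits day/month/year, so parse with an explicit format.
        when = datetime.strptime(entry["collectionDate"], "%d/%m/%Y").date()
        rows.append((entry["type"], when, (when - today).days))
    return sorted(rows, key=lambda row: row[1])
```

Passing `date.today()` as the second argument gives the number of days until each collection.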
Once you are happy it is working, modify the collect_data.py script to save the output to the webserver directory. Replace the last line of the file:
print(data)
with:
f = open('/var/www/html/bins.json', 'w')
f.write(data)
f.close()
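One thing to be aware of: if the file is requested while it is being written, the browser or HA could see an empty or partial bins.json. A possible refinement (not what the steps above do, just a sketch) is to write to a temporary file and swap it into place. The `save_json` name is made up for illustration, and it assumes `data` is the JSON string the script would otherwise print:

```python
import os
import tempfile


def save_json(data: str, path: str = "/var/www/html/bins.json") -> None:
    """Write data to path via a temporary file so readers never see a partial file."""
    directory = os.path.dirname(path) or "."
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
        # mkstemp creates the file as 0600; the web server needs read access.
        os.chmod(tmp_path, 0o644)
        # os.replace swaps the new file in as a single rename.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the swap.
        os.unlink(tmp_path)
        raise
```

With this in place, the last line of the script would become `save_json(data)` instead of the three lines above.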
Now if you navigate to http://xxx.xxx.xxx.xxx/bins.json, you should see the JSON returned in the browser.
You’ll also need to add the script to a cron job to update regularly:
crontab -e
And add the line:
0 */6 * * * python3 /home/simon/bins/UKBinCollectionData/uk_bin_collection/uk_bin_collection/collect_data.py BuckinghamshireCouncil https://chiltern.gov.uk/collection-dates -s -p "HP11 XXX" -n "4 STREET NAME, HIGH WYCOMBE" -w http://localhost:4444/wd/hub
This updates the JSON file every 6 hours.
Sensor config in HA
This just requires some REST sensors to be defined:
- platform: rest
  name: General Refuse Collection
  resource: http://192.168.0.179/bins.json
  value_template: "{{ value_json['bins'] | selectattr('type', 'match', 'Domestic Refuse Collection') | map(attribute='collectionDate') | list | first }} "
  icon: mdi:trash-can
- platform: rest
  name: Garden Waste Collection
  resource: http://192.168.0.179/bins.json
  value_template: "{{ value_json['bins'] | selectattr('type', 'match', 'Domestic Garden Collection') | map(attribute='collectionDate') | list | first }} "
  icon: mdi:sprout
- platform: rest
  name: Paper Recycling Collection
  resource: http://192.168.0.179/bins.json
  value_template: "{{ value_json['bins'] | selectattr('type', 'match', 'Domestic Paper/Card Collection') | map(attribute='collectionDate') | list | first }} "
  icon: mdi:newspaper-variant
- platform: rest
  name: Mixed Dry Recycling Collection
  resource: http://192.168.0.179/bins.json
  value_template: "{{ value_json['bins'] | selectattr('type', 'match', 'Domestic Mixed Dry Recycling Collection') | map(attribute='collectionDate') | list | first }} "
  icon: mdi:recycle
- platform: rest
  name: Food Waste Collection
  resource: http://192.168.0.179/bins.json
  icon: mdi:food
  value_template: "{{ value_json['bins'] | selectattr('type', 'match', 'Domestic Food Collection') | map(attribute='collectionDate') | list | first }} "
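If the Jinja filter chain looks opaque, it is roughly equivalent to the Python below. This is only an illustration of the logic, not something HA runs, and `first_collection_date` is a hypothetical name. It assumes the `match` test behaves like a regex match anchored at the start of the string:

```python
import re


def first_collection_date(value_json: dict, bin_type: str) -> str:
    """Mimic selectattr('type', 'match', ...) | map(attribute='collectionDate') | list | first."""
    # selectattr with the 'match' test keeps entries whose type matches
    # the pattern at the start of the string.
    selected = [b for b in value_json["bins"] if re.match(bin_type, b["type"])]
    # map(attribute=...) pulls out a single field from each remaining entry.
    dates = [b["collectionDate"] for b in selected]
    # 'first' takes the first item; this sketch raises IndexError if nothing
    # matched, where the template would give an undefined value instead.
    return dates[0]
```

For example, `first_collection_date(data, "Domestic Refuse Collection")` on the sample JSON from earlier gives the refuse collection date string.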
The value_template finds each bin type and extracts the next collection date. Then put it all in a card.
Hope someone finds this useful!