Bin / Waste Collection

I have been trying this with the Chiltern District council site, and have hit a problem with session cookies. If I search at https://isa.chiltern.gov.uk/jointwastecalendar/ for a postcode, it returns a list of addresses and then I can select mine to get the bin data.

If I sniff the requests, I can see that I can get directly to the bin page e.g. https://isa.chiltern.gov.uk/jointwastecalendar/calendar.asp?uprn=100080536405

However, if I try this without visiting the first page and creating a session cookie, I get a 500 internal server error.

Is there any way to create a scrape sensor that fires off two requests, separated by a pause? Or perhaps two scrape sensors that don’t update automatically, but only update manually with an automation to fire them in order once a day?
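One idea I haven't tested: a command_line sensor that fires both requests via curl with a shared cookie jar, so the second call carries the session cookie from the first. This is just a sketch (the uprn value is a placeholder):

```yaml
# Untested sketch: the first curl creates the session cookie,
# the second reuses it via the shared cookie jar in /tmp.
- platform: command_line
  name: bin_calendar_raw
  command: >
    curl -s -c /tmp/bins_cookie.txt https://isa.chiltern.gov.uk/jointwastecalendar/ > /dev/null
    && sleep 2
    && curl -s -b /tmp/bins_cookie.txt 'https://isa.chiltern.gov.uk/jointwastecalendar/calendar.asp?uprn=xxxxx'
  scan_interval: 86400
```

A value_template would still be needed to cut the returned page down to something under the 255-character state limit.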

I’m not sure how you’d do that in HA, but in node-red, you could create a flow that performs:

first web-scrape -> wait for reply -> delay XX seconds after the reply -> perform second web-scrape -> send result to HA (via MQTT in my case)

Thanks. Not currently using node red, but will look into it.

Managed to get this working using my own bodge, with a little help from https://www.experts-exchange.com/questions/26271031/PHP-screen-scraping-from-page-that-requires-you-to-be-loged-in.html.

In short, I have written a PHP script hosted on my own server that uses cURL to fire off the two requests, one after the other. The first request sets the session cookie, and the second collects the data for my address. I then use the PHP Simple HTML DOM parser to extract the information that I need, build an array, and output it as JSON:

<?php
include('simple_html_dom.php');

error_reporting(E_ALL);

// READ THE FIRST PAGE TO SET THE COOKIE
$baseurl = 'https://isa.chiltern.gov.uk/jointwastecalendar/';

// GET THE ACTUAL DATA
$nexturl = 'https://isa.chiltern.gov.uk/jointwastecalendar/calendar.asp?uprn=xxxxx';

// SET UP OUR CURL ENVIRONMENT
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $baseurl);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR,  'cookie.txt');
curl_setopt($ch, CURLOPT_FAILONERROR, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);

// CALL THE FIRST PAGE
$htm = curl_exec($ch);
$err = curl_errno($ch);
$inf = curl_getinfo($ch);
if ($htm === FALSE)
{
    echo "\nCURL GET FAIL: $baseurl CURL_ERRNO=$err ";
    var_dump($inf);
    die();
}

// WAIT A RESPECTABLE PERIOD OF TIME
sleep(1);

// NOW ON TO THE NEXT PAGE
curl_setopt($ch, CURLOPT_URL, $nexturl);
curl_setopt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_POSTFIELDS, '');

$xyz = curl_exec($ch);
$err = curl_errno($ch);
$inf = curl_getinfo($ch);
if ($xyz === FALSE)
{
    echo "\nCURL 2ND GET FAIL: $nexturl CURL_ERRNO=$err ";
    var_dump($inf);
    die();
}

// PARSE DATA
$html = str_get_html($xyz);

$bins = [
    'rubbish'   => $html->find('td', 5)->plaintext,
    'recycling' => $html->find('td', 8)->plaintext,
    'paper'     => $html->find('td', 11)->plaintext,
    'food'      => $html->find('td', 14)->plaintext,
    'garden'    => $html->find('td', 17)->plaintext
];

header('Content-Type: application/json');
echo json_encode($bins);

?>

The data are then pulled into Home Assistant using a REST sensor:

- platform: rest
  resource: 'https://mywebhost/scrape.php'
  name: 'Garden Waste'
  value_template: '{{ value_json.garden }}'
  scan_interval: 3600

I am sure I could achieve the same using NodeRed or a python script in HA itself, but as I am more familiar with PHP this works for me.


A few changes to make it more user- and server-friendly: I updated the PHP script to calculate the number of days to the next collection:

<?php
include('simple_html_dom.php');

error_reporting(E_ALL);

// READ THE FIRST PAGE TO SET THE COOKIE
$baseurl = 'https://isa.chiltern.gov.uk/jointwastecalendar/';

// GET THE ACTUAL DATA
$nexturl = 'https://isa.chiltern.gov.uk/jointwastecalendar/calendar.asp?uprn=12345678';

// SET UP OUR CURL ENVIRONMENT
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $baseurl);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR,  'cookie.txt');
curl_setopt($ch, CURLOPT_FAILONERROR, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);

// CALL THE FIRST PAGE
$htm = curl_exec($ch);
$err = curl_errno($ch);
$inf = curl_getinfo($ch);
if ($htm === FALSE)
{
    echo "\nCURL GET FAIL: $baseurl CURL_ERRNO=$err ";
    var_dump($inf);
    die();
}

// WAIT A RESPECTABLE PERIOD OF TIME
sleep(1);

// NOW ON TO THE NEXT PAGE
curl_setopt($ch, CURLOPT_URL, $nexturl);
curl_setopt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_POSTFIELDS, '');

$xyz = curl_exec($ch);
$err = curl_errno($ch);
$inf = curl_getinfo($ch);
if ($xyz === FALSE)
{
    echo "\nCURL 2ND GET FAIL: $nexturl CURL_ERRNO=$err ";
    var_dump($inf);
    die();
}

// PARSE DATA
$html = str_get_html($xyz);

$bins = [
    'rubbish'   => $html->find('td', 5)->plaintext,
    'recycling' => $html->find('td', 8)->plaintext,
    'paper'     => $html->find('td', 11)->plaintext,
    'food'      => $html->find('td', 14)->plaintext,
    'garden'    => $html->find('td', 17)->plaintext
];

date_default_timezone_set ('Europe/London');
$now = new DateTime();

foreach ($bins as $key => &$field) {
    $parts = explode(' ', $field);   // the date is the last word, e.g. "Monday 22/04/2019"
    $field = end($parts);            // two steps avoid the "only variables by reference" notice
    $field = \DateTime::createFromFormat('d/m/Y', $field);
    $field = "In " . $field->diff($now)->format("%a") . " days";
    if ($field == "In 0 days") {
        $field = "Today";
    }
    if ($field == "In 1 days") {
        $field = "Tomorrow";
    }
}
unset($field);   // break the reference left over from the loop
	
header('Content-Type: application/json');
echo json_encode($bins);

?>

I then used a template sensor so that all bin dates can be parsed from a single server call:

- platform: rest
  name: bins
  resource: 'my-server-script.php'
  value_template: 'OK'
  json_attributes:
    - rubbish
    - recycling
    - food
    - garden
    - paper
- platform: template
  sensors:
    rubbish:
      friendly_name: 'General Rubbish'
      icon_template: 'mdi:trash-can'
      value_template: '{{ states.sensor.bins.attributes["rubbish"] }}'
    garden:
      friendly_name: 'Garden Waste'
      icon_template: 'mdi:tree'
      value_template: '{{ states.sensor.bins.attributes["garden"] }}'
    recycling:
      friendly_name: 'Recycling'
      icon_template: 'mdi:recycle'
      value_template: '{{ states.sensor.bins.attributes["recycling"] }}'
    paper:
      friendly_name: 'Paper and Cardboard'
      icon_template: 'mdi:package-variant'
      value_template: '{{ states.sensor.bins.attributes["paper"] }}'
    food:
      friendly_name: 'Food Waste'
      icon_template: 'mdi:food'
      value_template: '{{ states.sensor.bins.attributes["food"] }}'

This results in:


With much help from this post, from @lolouk44 and The EPIC Time Conversion and Manipulation Thread!

This is the working sensors setup if anyone else finds it useful (postcode changed)

It is for East Riding Council:
https://binscollections.eastriding.gov.uk/Output/Results?searchString=XXXX+XXXt&ButtonSearchTrial=Search


- platform: scrape
  resource: https://binscollections.eastriding.gov.uk/Output/Results?searchString=XXXX+XXXt&ButtonSearchTrial=Search
  select: ".er-bin-item-row-wrapper > div:nth-of-type(77) > div:nth-of-type(1) > div:nth-of-type(3) > div:nth-of-type(1)"
  name: EYRC Next Green Bin
  scan_interval: 3600

- platform: scrape
  resource: https://binscollections.eastriding.gov.uk/Output/Results?searchString=XXXX+XXX&ButtonSearchTrial=Search
  select: ".er-bin-item-row-wrapper > div:nth-of-type(77) > div:nth-of-type(1) > div:nth-of-type(4) > div:nth-of-type(1)"
  name: EYRC Next Blue Bin
  scan_interval: 3600

- platform: scrape
  resource: https://binscollections.eastriding.gov.uk/Output/Results?searchString=XXXX+XXX&ButtonSearchTrial=Search
  select: ".er-bin-item-row-wrapper > div:nth-of-type(77) > div:nth-of-type(1) > div:nth-of-type(5) > div:nth-of-type(1)"
  name: EYRC Next Brown Bin
  scan_interval: 3600

Also used this to create a sensor to display the number of days until each one is collected; one of these is needed for each bin. The date comes from the scraper as “Mon, 19 April 2019” so I had to do a bit of conversion (thanks again @lolouk44):

- platform: template
  sensors:
    green_bin_days:
      value_template: >-
        {% set date_in = states.sensor.eyrc_next_green_bin.state|replace('\n', '') %}
        {% set bin = strptime((date_in), "%a, %d %B %Y") %}
        {% set diff = as_timestamp(bin) - as_timestamp(now()) %}
        {% set days = (diff /86400) | int %}
        {% if days == 0 %}
          Today
        {% elif days == 1 %}
          Tomorrow
        {% else %}
          {{ days }} days
        {% endif %}

Nice work :+1:

What are you using for the bin colours? I was using Custom UI but it’s broken.

Using these images from google, re-coloured and sized on a glance card:

(attached images: blue_bin2, brown_bin2, green_bin2)


Custom UI has been fixed again :slight_smile:


My bins just come every Monday for the general waste, and then on alternating weeks for green waste and recycling. Has anyone done anything simple like that just using dates? I hadn’t thought of it until I saw this thread, so I’m looking to piggyback off existing effort!

I did, but then I was able to set up a scrape from my local council website.
You can just use week numbers if you want something simple - then determine if the week is odd or even.
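The odd/even week idea can be sketched as a template sensor (untested; it assumes recycling falls on even-numbered weeks for your round):

```yaml
- platform: template
  sensors:
    alternating_bin:
      friendly_name: "This Week's Bin"
      value_template: >-
        {% if (now().strftime('%W') | int) % 2 == 0 %}
          Recycling
        {% else %}
          Green waste
        {% endif %}
```

Note that `%W` counts weeks from the first Monday of the year, so check which parity matches your actual calendar before relying on it.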


Pretty much what I did.

Used @Rookeh’s script from Here

and for days left I added This

Cheers

mb


Thanks! Perfect!

My little Raspberry Pi must be working hard for some reason, as those scripts take some time to execute.


sorry all, been playing around for a while and cannot work out how to scrape info from my council (if that’s the best approach here?)
note: they don’t publish a calendar :frowning:


(screenshot of the council results page for a fake address)

If you right click on the web page and select View Source, it will break down the page format for you.
Had a quick look and the title is held in an H3 element.
So if you scrape "h3:nth-of-type(1)" it should return "Your next bin collections for BN14 9EL".
Then just work out what other fields you want after that, so your next scrape would be
"h3:nth-of-type(1) td:nth-of-type(3)", which would give your grey bin collection date.
Trouble with scrape is that you can only scrape one field at a time,
so to get 4 sensors you would have to do 4 scrapes.
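Putting those selectors into config, each of the four scrapes would look something like this (untested; the resource URL is a placeholder for your council's results page):

```yaml
- platform: scrape
  resource: 'https://your-council-site/results?postcode=BN14+9EL'
  name: Grey Bin Collection
  select: 'h3:nth-of-type(1) td:nth-of-type(3)'
  scan_interval: 3600
```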


Thanks!! I was looking at the “inspect” page.
Follow-ups please:

  1. How do you test this without updating your config.yaml and rebooting each time? Or is it painful?
  2. Is there a better way to scrape the info that you can recommend, sorry?

Sorry - it’s painful: update the sensor, then reboot to retry.
People have used node-red to scrape but I haven’t played with that.


thanks, the first one worked (h3:nth-of-type(1)), however the value for this one is always “unknown”:

select: "h3:nth-of-type(1) td:nth-of-type(4)"

Thought I would share my setup for this. Fortunately a group from one of the tech schools in my city created an API for waste collection. I wanted to take it a bit further by making a readable version for my wife that displays “Next Thursday’s pickup is trash only.” or “Tomorrow’s pickup includes recycling.”. This template will include “Next” if the day is 7 days away or greater, and “Today” or “Tomorrow” when necessary.


The sensors used that I made from my city’s API:

  • states.sensor.trash.state returns the trash pickup date as month-day-year, e.g. 1-6-2019
  • states.sensor.recycling.state returns the recycling pickup date as month-day-year, e.g. 1-6-2019
  • states.sensor.trash_day.state returns the pickup’s day of the week, e.g. Monday

- platform: template
  sensors:
    trash_day:
      icon_template: >-
        {% if states.sensor.trash.state == states.sensor.recycling.state %}
          mdi:recycle
        {% else %}
          mdi:delete
        {% endif %}
      value_template: >-
        {% set between = strptime(states.sensor.trash.state, '%m-%d-%Y') - strptime((''~now())[:10], '%Y-%m-%d') %}
        {% set between = (''~between)[:2] %}
        {% if ':' in between %}{# less than a day: the timedelta renders as 'H:MM:SS' #}
          {% set between = 0 %}
        {% else %}
          {% set between = between | float %}
        {% endif %}
        {% set prefix = '' %}
        {% set day = states.sensor.trash_day.state %}
        {% if between > 6 %}
          {% set prefix = 'Next' %}
        {% elif between < 1 %}
          {% set day = 'Today' %}
        {% elif between < 2 %}
          {% set day = 'Tomorrow' %}
        {% endif %}
        {% if states.sensor.trash.state == states.sensor.recycling.state %}
          {{prefix}} {{day}}'s pickup includes recycling.
        {% else %}
          {{prefix}} {{day}}'s pickup is trash only.
        {% endif %}

Love this idea… Can’t get it to work for me, but love the idea.