Managed to get this working with my own bodge, with a little help from https://www.experts-exchange.com/questions/26271031/PHP-screen-scraping-from-page-that-requires-you-to-be-loged-in.html.
In short, I have written a PHP script, hosted on my own server, that uses cURL to fire off the two requests one after the other. The first request sets the session cookie, and the second collects the data for my address. I then use the PHP Simple HTML DOM parser to extract the information I need, build an array, and format it as JSON:
<?php
include('simple_html_dom.php');
error_reporting(E_ALL);
// READ THE FIRST PAGE TO SET THE COOKIE
$baseurl = 'https://isa.chiltern.gov.uk/jointwastecalendar/';
// GET THE ACTUAL DATA
$nexturl = 'https://isa.chiltern.gov.uk/jointwastecalendar/calendar.asp?uprn=xxxxx';
// SET UP OUR CURL ENVIRONMENT
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $baseurl);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_FAILONERROR, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
// CALL THE FIRST PAGE
$htm = curl_exec($ch);
$err = curl_errno($ch);
$inf = curl_getinfo($ch);
if ($htm === FALSE)
{
echo "\nCURL GET FAIL: $baseurl CURL_ERRNO=$err ";
var_dump($inf);
die();
}
// WAIT A RESPECTABLE PERIOD OF TIME
sleep(1);
// NOW ON TO THE NEXT PAGE
curl_setopt($ch, CURLOPT_URL, $nexturl);
curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
$xyz = curl_exec($ch);
$err = curl_errno($ch);
$inf = curl_getinfo($ch);
if ($xyz === FALSE)
{
echo "\nCURL 2ND GET FAIL: $nexturl CURL_ERRNO=$err ";
var_dump($inf);
die();
}
// PARSE DATA
$html = str_get_html($xyz);
if ($html === FALSE)
{
echo "\nDOM PARSE FAIL";
die();
}
$bins = [
'rubbish'   => $html->find('td', 5)->plaintext,
'recycling' => $html->find('td', 8)->plaintext,
'paper'     => $html->find('td', 11)->plaintext,
'food'      => $html->find('td', 14)->plaintext,
'garden'    => $html->find('td', 17)->plaintext
];
header('Content-Type: application/json');
echo json_encode($bins);
?>
The data are then pulled into Home Assistant using a REST sensor:
- platform: rest
  resource: 'https://mywebhost/scrape.php'
  name: 'Garden Waste'
  value_template: '{{ value_json.garden }}'
  scan_interval: 3600
I am sure I could achieve the same with Node-RED or a Python script inside HA itself, but as I am more familiar with PHP, this works for me.
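For anyone who would rather do the parsing half in Python, it translates fairly directly using only the standard library. This is just a sketch, not tested against the real site: the sample HTML is a made-up stand-in for the council's actual markup, and a real version would still need the two cookie-carrying HTTP requests first. The cell indexes mirror the find('td', n) calls in the PHP version.

```python
import json
from html.parser import HTMLParser

class TdCollector(HTMLParser):
    """Collect the plain text of every <td> cell, in document order."""
    def __init__(self):
        super().__init__()
        self.cells = []      # text of each <td>, in order
        self._in_td = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
            self.cells.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._in_td:
            self._buf.append(data)

# Stand-in for the calendar page; the real markup will differ.
sample = "<table><tr>" + "".join(f"<td>cell {i}</td>" for i in range(18)) + "</tr></table>"

parser = TdCollector()
parser.feed(sample)

# Same zero-based cell indexes as the PHP version's find('td', n) calls.
bins = {
    "rubbish":   parser.cells[5],
    "recycling": parser.cells[8],
    "paper":     parser.cells[11],
    "food":      parser.cells[14],
    "garden":    parser.cells[17],
}
print(json.dumps(bins))
```

The same REST sensor would work unchanged against this, since the JSON keys are identical.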