I’m trying to use the built-in scrape sensor to grab weather alerts from Environment Canada. Ideally, I want to grab two tags and join them without having to create two separate sensors. I’ve read through the Beautiful Soup documentation and tried a few things, but no luck so far.
So I’m trying to grab both the “title” and “summary” from under the “entry” tag. I can easily get either, but not both. This current setup only gives me the title.
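For reference, joining the two tags in plain Beautiful Soup is straightforward; here’s a minimal sketch using a trimmed-down stand-in for the feed (the tag structure is my approximation of the real Atom entries, not actual feed data):

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the Environment Canada alert feed; the real
# feed has more structure, but the entry/title/summary shape is the same.
feed = """
<feed>
  <entry>
    <title>SNOW SQUALL WARNING in effect</title>
    <summary>Local snow squalls expected tonight.</summary>
  </entry>
</feed>
"""

soup = BeautifulSoup(feed, "html.parser")

# Scope both CSS selectors to the entry tag, take the first match of
# each, and join their text into a single value.
value = (soup.select("entry title")[0].text + " " +
         soup.select("entry summary")[0].text)
print(value)  # -> SNOW SQUALL WARNING in effect Local snow squalls expected tonight.
```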
Solved it by hacking a custom version of the scrape sensor.
If anyone cares, I added an optional second “select” field that, if it exists, just gets joined to the data gathered by the first “select” field.
config\custom_components\sensor\scrape.py
"""
Support for getting data from websites with scraping.

For more details about this platform, please refer to the documentation at
https://home-assistant.io/components/sensor.scrape/
"""
import logging

import voluptuous as vol
from requests.auth import HTTPBasicAuth, HTTPDigestAuth

from homeassistant.components.sensor import PLATFORM_SCHEMA
from homeassistant.components.sensor.rest import RestData
from homeassistant.const import (
    CONF_NAME, CONF_RESOURCE, CONF_UNIT_OF_MEASUREMENT, STATE_UNKNOWN,
    CONF_VALUE_TEMPLATE, CONF_VERIFY_SSL, CONF_USERNAME, CONF_HEADERS,
    CONF_PASSWORD, CONF_AUTHENTICATION, HTTP_BASIC_AUTHENTICATION,
    HTTP_DIGEST_AUTHENTICATION)
from homeassistant.helpers.entity import Entity
import homeassistant.helpers.config_validation as cv

REQUIREMENTS = ['beautifulsoup4==4.6.3']

_LOGGER = logging.getLogger(__name__)

CONF_ATTR = 'attribute'
CONF_SELECT = 'select'
CONF_SELECT2 = 'select2'

DEFAULT_NAME = 'Web scrape'
DEFAULT_VERIFY_SSL = True

PLATFORM_SCHEMA = PLATFORM_SCHEMA.extend({
    vol.Required(CONF_RESOURCE): cv.string,
    vol.Required(CONF_SELECT): cv.string,
    vol.Optional(CONF_SELECT2): cv.string,
    vol.Optional(CONF_ATTR): cv.string,
    vol.Optional(CONF_AUTHENTICATION):
        vol.In([HTTP_BASIC_AUTHENTICATION, HTTP_DIGEST_AUTHENTICATION]),
    vol.Optional(CONF_HEADERS): vol.Schema({cv.string: cv.string}),
    vol.Optional(CONF_NAME, default=DEFAULT_NAME): cv.string,
    vol.Optional(CONF_PASSWORD): cv.string,
    vol.Optional(CONF_UNIT_OF_MEASUREMENT): cv.string,
    vol.Optional(CONF_USERNAME): cv.string,
    vol.Optional(CONF_VALUE_TEMPLATE): cv.template,
    vol.Optional(CONF_VERIFY_SSL, default=DEFAULT_VERIFY_SSL): cv.boolean,
})


def setup_platform(hass, config, add_entities, discovery_info=None):
    """Set up the Web scrape sensor."""
    name = config.get(CONF_NAME)
    resource = config.get(CONF_RESOURCE)
    method = 'GET'
    payload = None
    headers = config.get(CONF_HEADERS)
    verify_ssl = config.get(CONF_VERIFY_SSL)
    select = config.get(CONF_SELECT)
    select2 = config.get(CONF_SELECT2)
    attr = config.get(CONF_ATTR)
    unit = config.get(CONF_UNIT_OF_MEASUREMENT)
    username = config.get(CONF_USERNAME)
    password = config.get(CONF_PASSWORD)
    value_template = config.get(CONF_VALUE_TEMPLATE)
    if value_template is not None:
        value_template.hass = hass

    if username and password:
        if config.get(CONF_AUTHENTICATION) == HTTP_DIGEST_AUTHENTICATION:
            auth = HTTPDigestAuth(username, password)
        else:
            auth = HTTPBasicAuth(username, password)
    else:
        auth = None

    rest = RestData(method, resource, auth, headers, payload, verify_ssl)
    rest.update()

    if rest.data is None:
        _LOGGER.error("Unable to fetch data from %s", resource)
        return False

    add_entities([ScrapeSensor(
        rest, name, select, select2, attr, value_template, unit)], True)


class ScrapeSensor(Entity):
    """Representation of a web scrape sensor."""

    def __init__(self, rest, name, select, select2, attr, value_template,
                 unit):
        """Initialize a web scrape sensor."""
        self.rest = rest
        self._name = name
        self._state = STATE_UNKNOWN
        self._select = select
        self._select2 = select2
        self._attr = attr
        self._value_template = value_template
        self._unit_of_measurement = unit

    @property
    def name(self):
        """Return the name of the sensor."""
        return self._name

    @property
    def unit_of_measurement(self):
        """Return the unit the value is expressed in."""
        return self._unit_of_measurement

    @property
    def state(self):
        """Return the state of the device."""
        return self._state

    def update(self):
        """Get the latest data from the source and updates the state."""
        self.rest.update()

        from bs4 import BeautifulSoup
        raw_data = BeautifulSoup(self.rest.data, 'html.parser')
        _LOGGER.debug(raw_data)

        try:
            if self._attr is not None:
                value = raw_data.select(self._select)[0][self._attr]
            else:
                if self._select2 is not None:
                    value = (raw_data.select(self._select)[0].text + " " +
                             raw_data.select(self._select2)[0].text)
                else:
                    value = raw_data.select(self._select)[0].text
            _LOGGER.debug(value)
        except IndexError:
            _LOGGER.error("Unable to extract data from HTML")
            return

        if self._value_template is not None:
            self._state = self._value_template.render_with_possible_json_value(
                value, STATE_UNKNOWN)
        else:
            self._state = value
I implemented the scrape sensor a few days ago, but didn’t know what the alert actually showed up as (and didn’t care to look for a place with an alert at the time). I set a notification to appear when there was an alert, and went to work formatting. And here we are!
The style of that screenshot looks great; I might steal it for displaying the alert in an HA card.
FWIW, I created an Environment Canada driver for my HA system about nine years ago. It gets its data from EnviroCan’s XML feeds (not RSS). For example, here’s the URL for your neck of the woods:
The XML contains a warnings node. Here’s what it says for Collingwood right now. The important stuff is in the first two lines (high priority warning, snow squall warning):
My driver polls their site every 30 minutes and gets the latest weather data. It also checks if warnings contains data and then announces it in my home:
“Attention! There is a high priority weather warning in effect. Snow Squall Warning.”
It also displays the warning on the thermostat and changes the thermostat’s backlight color to red (warning=red, watch=yellow, end of warning/watch=green).
The one thing EnviroCan has neglected to include in the XML data (for more than nine years) is the weather warning’s descriptive text! My driver follows the URL in the warnings node and then scrapes that web page for the text. Over the years they’ve changed the HTML formatting, so every few years I have to tweak the code to adapt to their modifications.
Whereas the RSS link provides a summary of the weather warning (i.e. the image in my previous post), EnviroCan often has a lot more to say on its warnings web page: https://weather.gc.ca/warnings/report_e.html?on18
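The warnings-node check and announcement described above can be sketched like this (the element and attribute names here are my approximations of the citypage XML, not the real schema):

```python
import xml.etree.ElementTree as ET

# Approximation of the warnings node in EnviroCan's city XML; the real
# citypage schema differs, this just illustrates the announcement logic.
citypage = """
<siteData>
  <warnings url="https://weather.gc.ca/warnings/report_e.html?on18">
    <event type="warning" priority="high" description="SNOW SQUALL WARNING"/>
  </warnings>
</siteData>
"""

root = ET.fromstring(citypage)
warnings_node = root.find("warnings")
events = warnings_node.findall("event")

announcement = None
if events:
    # Build the spoken announcement from the first event in the node.
    announcement = (
        "Attention! There is a {} priority weather warning in effect. {}."
        .format(events[0].get("priority"),
                events[0].get("description").title()))
    # The descriptive text isn't in the XML; it has to be scraped from
    # the page at warnings_node.get("url").
```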