Problem with command_line or scrape sensor when scraping text from a very simple webpage

Hello,

I am trying to scrape a simple text from a very simple webpage. The command line below is far from optimal, but it works inside the HA Docker container - I wanted to try it out before tweaking the command…
When loading it in HA, however, the sensor value is empty. Please note that I integrate my sensors via the following line in configuration.yaml:

sensor: !include_dir_list sensors

and each sensor has its own file, whose content is shown below. That is why e.g. "platform" has no indent or leading hyphen.

platform: command_line
command: "curl 'https://opendata.dwd.de/weather/text_forecasts/html/VHDL50_DWPG_LATEST_html'  2>&1 | awk '/<pre/,/<\/pre>/' | sed 's/^<[\/a-z\"-:= ]*>//g' | tr -d '\n' | tr -d '\r'"
name: "Wetter heute Berlin"
scan_interval: 900
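
(For comparison, the same sensor defined inline in configuration.yaml instead of in its own file would be the usual list entry with indent and hyphen - roughly like this, with the command abbreviated:)

sensor:
  - platform: command_line
    command: "curl … (same command as above)"
    name: "Wetter heute Berlin"
    scan_interval: 900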

The intention is to read the summary or the full weather text. I will most likely end up doing everything in awk, but I don't know the full awk syntax yet, hence the weird combination of command-line tools :-)…
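
Something like this single awk call might eventually replace the sed/tr part - an untested sketch that assumes the text stays inside one <pre> block and strips any remaining tags per line:

curl -s 'https://opendata.dwd.de/weather/text_forecasts/html/VHDL50_DWPG_LATEST_html' | awk '/<pre/,/<\/pre>/ { sub(/\r$/, ""); gsub(/<[^>]*>/, ""); printf "%s ", $0 }'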

I tried to use the scrape sensor as well:

platform: scrape
resource: https://opendata.dwd.de/weather/text_forecasts/html/VHDL50_DWPG_LATEST_html
name: "Wetter heute Berlin"
select: "pre"

but this doesn't work (with select: "strong" it works, at least for the summary). According to my reading of the Beautiful Soup documentation it should work, but the output is "unknown"… This is why I tried the command_line sensor above as well.

Thanks, regards

Take a look at your logs.
I'm pretty sure your problem is that a state cannot be more than 255 characters.
That would also explain why it works with "strong".

Thanks, this explains my issues with the “scrape” sensor. I will probably need to use a value_template and split it up…
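
For a first test, a value_template that simply truncates the selected text to the allowed length might be enough - a rough sketch, not a real split into several sensors:

platform: scrape
resource: https://opendata.dwd.de/weather/text_forecasts/html/VHDL50_DWPG_LATEST_html
name: "Wetter heute Berlin"
select: "pre"
value_template: "{{ value[:255] }}"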

But the command_line sensors are not working at all. I don't get any error messages from them in the log. They don't work even for the text between the "strong" elements… Any idea why this is?