Scrape simple (?) select question

The part of the webpage I'm interested in looks like this:

<div class="minutecast-dial content-module non-ad">
	<p class="title">No precipitation for at least 120 min</p>

I am only trying to get the “No precipitation for at least 120 min” text, so I tried
select: ".minutecast-dial.title p"

but with no luck. The log still shows:
[homeassistant.components.scrape.sensor] Unable to extract data from HTML

I also tried with and without “p”, “.”, “title”, “minutecast-dial”, “content-module.non-ad.title”…

What should the select be in this case?

You want to select ".minutecast-dial p.title".

Your original looks for a <p> descendant of a single element that has both the minutecast-dial and title classes, and no such element exists.
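To see the difference concretely, here is a minimal sketch that uses only Python's standard library. DescendantFinder and select_text are hypothetical helpers written for this illustration. They are not part of Home Assistant or any library, and they handle only this one selector shape: an ancestor's classes followed by a descendant tag. The closing </div> is added so the snippet is complete.

```python
from html.parser import HTMLParser

HTML = """
<div class="minutecast-dial content-module non-ad">
    <p class="title">No precipitation for at least 120 min</p>
</div>
"""


class DescendantFinder(HTMLParser):
    """Collect the text of <tag> elements that carry tag_classes and sit
    inside an ancestor carrying all of ancestor_classes.

    Simplification: void elements such as <br> are not handled.
    """

    def __init__(self, ancestor_classes, tag, tag_classes=()):
        super().__init__()
        self.ancestor_classes = set(ancestor_classes)
        self.tag = tag
        self.tag_classes = set(tag_classes)
        self.stack = []          # class sets of the currently open elements
        self.capturing = False
        self.matches = []

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        # Does any open ancestor carry every required class?
        inside = any(self.ancestor_classes <= c for c in self.stack)
        if tag == self.tag and self.tag_classes <= classes and inside:
            self.capturing = True
            self.matches.append("")
        self.stack.append(classes)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()
        if tag == self.tag:
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.matches[-1] += data


def select_text(html, ancestor_classes, tag, tag_classes=()):
    finder = DescendantFinder(ancestor_classes, tag, tag_classes)
    finder.feed(html)
    return [match.strip() for match in finder.matches]


# Equivalent of ".minutecast-dial p.title"
good = select_text(HTML, {"minutecast-dial"}, "p", {"title"})
# Equivalent of ".minutecast-dial.title p"
bad = select_text(HTML, {"minutecast-dial", "title"}, "p")

print(good)  # ['No precipitation for at least 120 min']
print(bad)   # []
```

The second call finds nothing because the title class is on the <p>, not on the <div>. A real CSS selector engine handles far more syntax than this, but the matching rule for these two selectors is the same.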


Unfortunately, @Troon, the entity still has the state “unknown” and I get the same error.

Please post your sensor configuration. Can’t help further with only the information provided.

Of course:

- platform: scrape
  name: Gdansk MinuteCast
  resource: https://www.accuweather.com/en/pl/gdansk/80-180/minute-weather-forecast/346299_pc
  select: ".minutecast-dial p.title"
  scan_interval: 60

There is also another sensor, which does not work either:

- platform: scrape
  name: Gdansk MinuteCast updated
  resource: https://www.accuweather.com/en/pl/gdansk/80-180/minute-weather-forecast/346299_pc
  select: ".minutecast-dial p.update"

Might be worth asking here if they can add it to their integration?

Hmm, same result here. Log:

2020-06-23 13:27:04 ERROR (SyncWorker_2) [homeassistant.components.scrape.sensor] Unable to extract data from HTML

So I fired up Python:

In [5]: import requests
In [6]: url = 'https://www.accuweather.com/en/pl/gdansk/80-180/minute-weather-forecast/346299_pc'
In [7]: page = requests.get(url)
In [8]: page.status_code
Out[8]: 403

and I’m getting a 403 Forbidden as a response, although I can view the web page just fine in a browser. Looks like the site is doing some browser-sniffing to repel scrapers. Let’s do a bit of lying in the user-agent string:

In [14]: page = requests.get(url, headers={'user-agent': 'Mozilla/5.0'})
In [15]: page.status_code
Out[15]: 200

A 200 OK response. So:

- platform: scrape
  name: scrapetest1
  resource: https://www.accuweather.com/en/pl/gdansk/80-180/minute-weather-forecast/346299_pc
  select: ".minutecast-dial p.title"
  headers:
    User-Agent: Mozilla/5.0
  scan_interval: 60
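For anyone without requests installed, the same check can be sketched with only the standard library. This is an illustration rather than what the scrape integration does internally. build_request and status_of are hypothetical helper names, and the site may respond differently today than it did when this was posted. urllib also sends a different default User-Agent than requests, so the result without the header may not match the session above.

```python
import urllib.error
import urllib.request

URL = "https://www.accuweather.com/en/pl/gdansk/80-180/minute-weather-forecast/346299_pc"


def build_request(url, user_agent=None):
    """Build a GET request, optionally with a custom User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def status_of(request, timeout=10):
    """Return the HTTP status code, including error codes such as 403."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still the answer we want.
        return err.code
```

Calling status_of(build_request(URL)) and then status_of(build_request(URL, "Mozilla/5.0")) lets you compare the two responses. If the second one returns 200 where the first does not, the headers: block in the sensor configuration is what makes the difference.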

…and I’m officially a genius: the sensor now works.


@Holdestmade thanks, but I know that project quite well ;) Unfortunately, the company has not released MinuteCast for the self-serve API yet. It is expected next month, but no date has been specified.

@Troon, you deserve a separate comment of your own.
I solemnly and publicly testify that you ARE a genius. My favourite one, at least in that part of Home Assistant coding.
Thank you very much!
