Scrape issue, help with “select”

CaptainSweatpants · June 26, 2023, 5:11am

I’m trying to scrape the next 120 min of AccuWeather’s MinuteCast data. At the time of writing, the value is “No precipitation for at least 120 min”, and that is the value I’m trying to grab. Unfortunately, the sensor is not obtaining any data.


  - platform: scrape
    resource: https://www.accuweather.com/en/ca/hamilton/l8p/minute-weather-forecast/55490
    name: Accuweather MinuteCast Rain Forcast
    select: ".minute-cast-chart .current-summary .summary" 
    scan_interval: 627
    headers:
      User-Agent: Mozilla/5.0

The section of HTML I’m looking at on that page is at line 2178: “No precipitation for at least 120 min”

<div class="minute-cast-chart">
    <div class="current-summary">
        <div class="summary">
            **No precipitation for at least 120 min**
        </div>
        <div class="conditions">
            <div class="conditions-icon">
                <img class="icon" src="/images/weathericons/38.svg" width="64" height="64">
                <div>
                    <p class="time">12:34 AM</p>
                    <p class="icon-phrase">No Precipitation</p>
                </div>
            </div>
            <div class="temps" style="display: block;">
                <div>
                    <span class="current-temp">22°</span>
                    <span class="current-temp-unit">C</span>
                </div>
                <div class="realfeel-temp">
                    <span class="realfeel-temp__label">RealFeel®</span>
                    <span class="value">22°</span>
                </div>
            </div>
        </div>
    </div>

Any help targeting that value would be greatly appreciated. Thanks in advance.

vingerha · June 26, 2023, 6:07am

Use this in the “select” of the scrape sensor:

body > div > div.two-column-page-content > div.page-column-1 > div.page-content.content-module > div.minute-cast-chart > div.current-summary > div.summary

Troon · June 26, 2023, 6:51am

<avril> Tell me, why d’you have go and make things so complicated? </avril>

There’s only one element of class summary on that page, so the full selector tree isn’t necessary. The issue is that the web site is checking the user-agent string.

By using

User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"

in the header field, and with a select of div.summary (and, importantly, no device class, state class or UoM), we get the summary text from the page. [Screenshots of the configuration and the resulting sensor not preserved.]

Note that the UA string I used was just copy/pasted from my browser for ease. It’s probably only checking for “Mozilla” or something.
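
For anyone following along, here is a rough sketch of what the two changes might look like when applied to the sensor from the first post. It reuses that post's resource, name and scan_interval unchanged and keeps the same YAML shape; the layout is an assumption, since the exact configuration used here is not shown in text form, so adjust it to match your own setup.

```yaml
# Sketch only: same shape as the sensor in the first post,
# with the full browser User-Agent and the shorter selector.
- platform: scrape
  resource: https://www.accuweather.com/en/ca/hamilton/l8p/minute-weather-forecast/55490
  name: Accuweather MinuteCast Rain Forcast
  select: "div.summary"
  scan_interval: 627
  headers:
    User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
  # Deliberately no device class, state class or unit of measurement:
  # the scraped value is a text phrase, not a number.
```

Quoting the User-Agent value keeps the parentheses, semicolons and commas in the string from being misread by the YAML parser.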

vingerha · June 26, 2023, 7:25am

Yeah, forgot that … I just right-click the element and use the path without much thinking.
As an aside, it is a bit odd that the AccuWeather integration does not provide this. It has a plethora of entities but not this one, and this one is (imo) of high interest.

CaptainSweatpants · June 26, 2023, 3:50pm

Well, that did the trick; it was my select statement. I did previously have that full User-Agent string, but I was troubleshooting the issue and removed some of it before posting.

Also, I haven’t used Scrape much; I had to recreate some old sensors I lost. I didn’t know there was a Scrape integration nowadays, that sure is helpful. I last created these sensors 5 years ago.

Thanks again, worked like a charm!