Dynamic Scrape/Multiscrape

Hello HA Community!

I’ve been having a hard time scraping a minimalist, dynamic, constantly updated webpage. Our local school closings are posted and changed very frequently. The page is basically one large table with no CSS classes or IDs to target. Because of the nature of the data, a given school's position in the list (its nth value) changes as the list grows or shrinks. Using just a plain “tr” or “td” under “select” in Scrape, I have only been able to pull the first value of the list. I can get the correct value if I know the right nth value, but that defeats the purpose.

How would I search/scrape this page to see if a particular school is on this list?

Here’s the page: https://ftp2.wjrt.com/school_closings/wjrtclosings.html

Thank you in advance for taking the time to read my question.

You could use tr:contains('School Name') > td as the selector (with the school you are looking for in place of School Name), then choose index 0, 1, or 2 for the name, status, or reason, as you want.

At least, based on the current content, it worked for me. I am assuming each table row always has 3 columns.

@imthefrizzlefry Thanks for your reply! The list is back (there were no snow days between posts, so I couldn't work on this until today). How do I define indexes? I had limited success getting a select_list with Multiscrape, but ran into the 255-character limit on state. I'm not sure how to set or poll an attribute. I'm just learning templating, and it's more error than success right now.

It turns out I just needed to make a binary sensor for each school, and it worked! Then I set up an automation that activates a scene when the binary sensor changes from Unavailable to Off. Thanks again! I’d still like to learn how to set the indexes, though, if you have a moment.
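For anyone following along, here is one possible way to build a per-school binary sensor plus an automation, sketched as a template binary sensor layered on a Scrape status sensor. This is not necessarily how it was done above. It assumes a Scrape sensor named "School Name Closing Status" (entity sensor.school_name_closing_status) that becomes unavailable or unknown when the school is not on the page. It also assumes a hypothetical scene.school_closed and a Home Assistant release recent enough to have the has_value() template function. If your configuration.yaml already contains `automation: !include automations.yaml`, put the automation entry in that file (or build it in the UI) rather than adding a second `automation:` key.

```yaml
template:
  - binary_sensor:
      - name: "School Name Closed"
        # on while the scrape sensor has a real value (the school's row was found),
        # off while it is unavailable/unknown (the school is not listed)
        state: "{{ has_value('sensor.school_name_closing_status') }}"

automation:
  - alias: "School Name closing alert"
    trigger:
      - platform: state
        entity_id: binary_sensor.school_name_closed
        # only fire on a real off -> on change, not when HA restarts
        from: "off"
        to: "on"
    action:
      - service: scene.turn_on
        target:
          entity_id: scene.school_closed  # hypothetical scene name
```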

Sorry I missed your messages. I see the index as a field in the Scrape integration's UI:


The screenshot above is the add sensor dialog in the UI for the Scrape integration.

I don’t know if you are setting it up in the config file, but there is a pretty nice UI that makes it easy to define a resource (i.e., the page you are connecting to) and a sensor.

Looking at the documentation for the Scrape integration, I can see there is an index option in the sensor configuration, so the YAML would look something like this:

scrape:
  - resource: https://ftp2.wjrt.com/school_closings/wjrtclosings.html
    sensor:
      - name: "School Name Closing Status"
        select: "tr:contains('School Name') > td"
        index: 1
      - name: "School Name Closing Reason"
        select: "tr:contains('School Name') > td"
        index: 2

So, that would give you sensors for the status and reason values (I assume you wouldn’t care for the school name…
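Because the closings list changes often, it may also help to control how often the page is fetched and to clean up the scraped text. Below is a minimal sketch that assumes the Scrape YAML's per-resource scan_interval option (in seconds) and its per-sensor value_template. "School Name" is a placeholder, and 300 seconds is an arbitrary choice. Merge these lines into the existing resource entry rather than adding a second `scrape:` key.

```yaml
scrape:
  - resource: https://ftp2.wjrt.com/school_closings/wjrtclosings.html
    # Re-fetch the page every 300 seconds (5 minutes); adjust to taste
    scan_interval: 300
    sensor:
      - name: "School Name Closing Status"
        # Every <td> cell in the row(s) whose text contains 'School Name'
        select: "tr:contains('School Name') > td"
        # 0 = name, 1 = status, 2 = reason (assuming 3 columns per row)
        index: 1
        # Strip stray whitespace from the scraped cell text
        value_template: "{{ value | trim }}"
```

One caveat with this selector: if the name you search for also appears in another row (for example, a short name that is part of a longer district name), more than one row matches. Their cells are then numbered in one combined list, so index 1 may no longer be the status of the school you meant. Using the most specific name that appears on the page reduces that risk.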