Scrape a website with tables and changing rows

controlc · February 27, 2024, 3:55pm

Hello,
I would like to scrape a website. It is a website of my child’s school, on which the substitution schedule for the current or next day is displayed. On this website, however, I only want to extract the information relating to my child’s class. I’m also not sure how to save and display the filtered information. Basically it is a very simple HTML page without many parts. I already know that it is the third table on the page. But there is no fixed row that I have to scrape, but ultimately each row of the third table has to be analyzed to see if the first column contains the class “8-2”. Because only these rows are needed. Then the other cells in this row have to be saved individually so that I can display them as text in the frontend. The number of rows to be filtered as well as the number of filtered rows can certainly change from day to day.
Is this somehow feasible for HA? And if so, how could I go about it?
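The filtering step described above can be sketched in Python, assuming the rows of the third table have already been extracted as lists of cell strings. `filter_class_rows` and `format_for_display` are hypothetical helper names, the English display format is an arbitrary choice, and the second sample row is invented for illustration; the column headers are the ones used on the school’s page.

```python
from typing import Dict, List

# Column headers as they appear in the third table of the school's page.
HEADERS = ["Klasse/Kurs", "Stunde", "Fach", "Lehrer", "Raum", "Info"]


def filter_class_rows(rows: List[List[str]], class_name: str) -> List[Dict[str, str]]:
    """Keep only rows whose first cell equals class_name and map each cell to its header."""
    matches: List[Dict[str, str]] = []
    for row in rows:
        # Skip empty rows and rows that belong to other classes.
        if row and row[0].strip() == class_name:
            matches.append(dict(zip(HEADERS, (cell.strip() for cell in row))))
    return matches


def format_for_display(entries: List[Dict[str, str]]) -> str:
    """Render matching rows as plain text, one line per changed lesson."""
    if not entries:
        return "No changes for this class."
    lines = []
    for e in entries:
        line = (
            f"Period {e.get('Stunde', '?')}: {e.get('Fach', '?')}, "
            f"teacher {e.get('Lehrer', '?')}, room {e.get('Raum', '?')}"
        )
        info = e.get("Info", "")
        if info:
            line += f" ({info})"
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical rows; the second one is an invented example of another class.
    sample = [
        ["8-2", "5", "DA", "ABC", "112", "EN KST fällt aus"],
        ["7-1", "3", "MA", "XYZ", "204", ""],
    ]
    print(format_for_display(filter_class_rows(sample, "8-2")))
```

Because every row is checked and the result is a list, it does not matter how many rows the table has on a given day or how many of them match; an empty list simply produces the "No changes" text.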

The page’s HTML looks like this:

<!DOCTYPE html>
<html>
  <head>
    <title>Vertretungsplan</title>
    <meta charset="iso-8859-1" />
    <link href="vp.css" type="text/css" rel="stylesheet"/>
  </head>
  <body>
    [...]
    <br/>
    <span class="ueberschrift">Geänderte Unterrichtsstunden:</span>
    <br/>
    <br/>
    <table class="tablekopf" border="2">
      <tr>
        <th class="thplanklasse">Klasse/Kurs</th>
        <th class="thplanstunde">Stunde</th>
        <th class="thplanfach">Fach</th>
        <th class="thplanlehrer">Lehrer</th>
        <th class="thplanraum">Raum</th>
        <th class="thplaninfo">Info</th>
      </tr>
      <tr>
        <td class="tdaktionen">8-2</td>
        <td class="tdaktionen">5</td>
        <td class="tdaktionenneu">DA</td>
        <td class="tdaktionenneu">ABC</td>
        <td class="tdaktionen">112</td>
        <td class="tdaktionen">EN KST fällt aus</td>
      </tr>
      [...]
    </table>
    <br/>
  </body>
</html>
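Given markup like the above, extracting the rows of the third table could be sketched with Python’s standard-library `html.parser`. This assumes the third `<table>` in document order is the one wanted (index 2, counting from 0) and that `<td>` cells are explicitly closed as in the snippet. `TableRowParser`, `rows_for_class` and `fetch_page` are hypothetical names, and the page URL is not known here, so `fetch_page` is only defined, not called. The decoding uses the `iso-8859-1` charset the page declares in its `<meta>` tag. This runs as a standalone script; how its output is passed into Home Assistant is not covered.

```python
import urllib.request
from html.parser import HTMLParser
from typing import List, Optional


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, row by row, from the N-th <table> (0-based)."""

    def __init__(self, table_index: int) -> None:
        super().__init__()  # convert_charrefs=True turns entities such as &auml; into text
        self.table_index = table_index
        self.tables_seen = 0
        self.in_target = False
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Only the table whose position matches table_index is collected.
            self.in_target = self.tables_seen == self.table_index
            self.tables_seen += 1
        elif self.in_target and tag == "tr":
            self._row = []
        elif self.in_target and tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if not self.in_target:
            return
        if tag == "td" and self._row is not None and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # The header row only has <th> cells, so it stays empty and is skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self.in_target = False

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def rows_for_class(html_text: str, class_name: str, table_index: int = 2) -> List[List[str]]:
    """Return the rows of the chosen table whose first cell equals class_name."""
    parser = TableRowParser(table_index)
    parser.feed(html_text)
    parser.close()
    return [row for row in parser.rows if row[0] == class_name]


def fetch_page(url: str) -> str:
    """Download the page and decode it with the charset declared in its <meta> tag."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("iso-8859-1")


if __name__ == "__main__":
    # Invented sample with three tables; only the third one is searched.
    sample = (
        "<table><tr><td>a</td></tr></table>"
        "<table><tr><td>b</td></tr></table>"
        '<table class="tablekopf"><tr><th>Klasse/Kurs</th><th>Stunde</th></tr>'
        "<tr><td>8-2</td><td>5</td></tr><tr><td>7-1</td><td>3</td></tr></table>"
    )
    print(rows_for_class(sample, "8-2"))  # [['8-2', '5']]
```

Each returned row keeps its cells as separate strings in column order (class, period, subject, teacher, room, info), so they can be stored or shown individually.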

Thanks,
controlc