Multiscrape and multi entity help

Hi, I’m still very new to HA. I’m trying to put something on my wife’s dashboard; if I keep her happy, I get more HA time.

I am using multiscrape to get info from a plain HTML webpage and then display it on her dashboard. I feel there must be a more efficient way. The webpage is plain HTML with a table of 11 rows and 5 columns. (I control the webpage, so if there is a better way to put that up, let me know that also.) I feel my “scrape” is too basic and therefore crazy busy.
Right now (testing) I’m only pulling a little data: basically a unique_id, name, and select statement for each cell in the table. Is there a more efficient route?


sensor:
    - unique_id: r1c1
      name: r1c1
      select: "table > tr > td"
    - unique_id: r1c2
      name: r1c2
      select: "table > tr > td:nth-child(2)"
    - unique_id: r1c3
      name: r1c3
      select: "table > tr > td:nth-child(3)"
    - unique_id: r1c4
      name: r1c4
      select: "table > tr > td:nth-child(4)"
    - unique_id: r2c1
      name: r2c1
      select: "table > tr:nth-child(2) > td"
    - unique_id: r2c2
      name: r2c2
      select: "table > tr:nth-child(2) > td:nth-child(2)"
    - unique_id: r2c3
      name: r2c3
      select: "table > tr:nth-child(2) > td:nth-child(3)"
    - unique_id: r2c4
      name: r2c4
      select: "table > tr:nth-child(2) > td:nth-child(4)"

etc…
Thanks

Yes. Output something machine-readable like JSON instead of / as well as the HTML table.

Alternatively, give each cell an HTML id like this:

<table>
<tbody>
<tr>
<td id="r1c1">First</td>
<td id="r1c2">Second</td>
</tr>
</tbody>
</table>

and then your select can be simply "td#r1c1", for example.
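As a rough sketch of what that might look like on the multiscrape side, assuming the page is served at a hypothetical address (http://example.local/table.html) and the cells carry ids like the ones above. The scan_interval value is just a placeholder:

```yaml
multiscrape:
  - resource: http://example.local/table.html  # hypothetical URL, replace with your page
    scan_interval: 3600                        # placeholder: poll once an hour
    sensor:
      - unique_id: r1c1
        name: r1c1
        select: "td#r1c1"   # matches the cell with id="r1c1"
      - unique_id: r1c2
        name: r1c2
        select: "td#r1c2"   # matches the cell with id="r1c2"
```

You still need one sensor per cell, but each selector no longer depends on where the cell sits in the table, so reordering rows or columns should not break anything.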


Can you point me to some instructions on using the JSON method?

And calling the cell by name is way simpler also.
Thanks

Depends what the data is and how you are generating the web page.

JSON spec is here but that’ll be baffling to a beginner.

It might be something like this, for a bin collection calendar:

{"black_bin": "2024-01-24", "blue_bin": "2024-02-03"}

You can then pull that in with a rest sensor and do something like:

rest:
  resource: URL
  sensor:
    - name: "Black bin date"
      value_template: "{{ value_json.get('black_bin', 'unavailable') }}"
    - name: "Blue bin date"
      value_template: "{{ value_json.get('blue_bin', 'unavailable') }}"
    - name: "Brown bin date"
      value_template: "{{ value_json.get('brown_bin', 'unavailable') }}"

Doesn’t matter which way around the bin dates are, or how many of them there are: it’ll pick up the appropriate one provided it’s in the data, and this example will return unavailable if not. That’s a trivial example: you can get much more complex data structures yet they’re easy to work with.
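For a table like the one in the original question, the same idea might look something like the sketch below. The JSON shape and the key names (r1, c1 and so on) are assumptions made up for illustration; use whatever structure suits the way the page is generated:

```yaml
# Assumed JSON served by the page (hypothetical shape and keys):
# {"r1": {"c1": "First", "c2": "Second"}, "r2": {"c1": "Third", "c2": "Fourth"}}
rest:
  resource: URL
  sensor:
    - name: "r1c1"
      # Row r1, column c1 of the assumed structure
      value_template: "{{ value_json['r1']['c1'] }}"
    - name: "r2c2"
      # Row r2, column c2 of the assumed structure
      value_template: "{{ value_json['r2']['c2'] }}"
```

Unlike the bin example, this sketch does not guard against missing keys, so a cell that is absent from the data will not fall back to a default value.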

In comparison, the scrape sensor is like trying to use character recognition to read off a screenshot — particularly on public sites, where any change of styling and layout can screw up your sensors. It should be a last resort where there is no other method available.

I’m going to be a super-pedant and note that HTML name attribute is not the same as the id attribute I suggested, nor is it valid on a <td> element: