Scrape actual HTML

I’d like to display a table from a webpage in my Home Assistant dashboard. I expected this to be as easy as scraping the table with the Scrape integration and putting the value into a Markdown card.

However, the Scrape integration appears to only return the text content of the elements I select. For example, if the table looks like this:

| Company | Contact | Country |
|---|---|---|
| Alfreds Futterkiste | Maria Anders | Germany |
| Centro comercial Moctezuma | Francisco Chang | Mexico |

I would like the sensor to contain the string:

<table>\n <tr>\n <th>Company</th>\n <th>Contact</th>\n <th>Country</th>\n </tr>\n <tr>\n <td>Alfreds Futterkiste</td>\n <td>Maria Anders</td>\n <td>Germany</td>\n </tr>\n <tr>\n <td>Centro comercial Moctezuma</td>\n <td>Francisco Chang</td>\n <td>Mexico</td>\n </tr>\n</table>

So far I’ve tried setting the attribute to “innerHTML”. Not only does that not work, but apparently it’s not possible to remove an attribute once you have configured it: you can change the attribute field to other text, but if you set it to blank it keeps the old value. You have to delete the sensor and create a new one.

Has anybody else been able to do this? I doubt it is possible, because setting the value template to {{ value is string }} sets the state to True, so it seems the integration extracts the text content and throws away the structure.

Just rebuild the structure again.

What exactly do you want to do with it? How would you display the html on your dashboard?
With multiscrape you could use select_list to retrieve a column. You could do this for each column, but it won’t provide you with the raw html.

Unfortunately, nearly every cell in this table may contain spaces, so I can’t split the text on whitespace. For a table with r rows and c columns, I would have to make r × c scrape entities, each retrieving one specific cell of the table.

I do at least know the number of rows/columns, but if I grab the whole table, I have no way of parsing the content to replicate the original structure.

In the example I provided, the text returned from grabbing the whole table would look like this:

Company Contact Country — — — Alfreds Futterkiste Maria Anders Germany Centro comercial Moctezuma Francisco Chang Mexico
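
To make the problem concrete, here is a small Python sketch (plain Python, not Home Assistant code). It assumes the text-only result behaves roughly like joining every cell with a single space, and it uses the cell values from the example table. Splitting on whitespace produces more tokens than there are cells, and nothing in the string marks where one cell ends and the next begins.

```python
# Example table from the post, as rows of cells.
rows = [
    ["Company", "Contact", "Country"],
    ["Alfreds Futterkiste", "Maria Anders", "Germany"],
    ["Centro comercial Moctezuma", "Francisco Chang", "Mexico"],
]

# Assumption: text-only scraping is roughly equivalent to joining
# every cell with a single space, which drops the cell boundaries.
cells = [cell for row in rows for cell in row]
flattened = " ".join(cells)

# Splitting on whitespace cannot recover the original cells.
tokens = flattened.split()

print(len(cells))   # 9 real cells
print(len(tokens))  # 14 whitespace-separated tokens
```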

The Markdown card supports HTML tags (with exceptions). If I could grab the innerHTML of the table, I could just put {{ states('sensor.that_table') }} in a Markdown card.

I was hoping it was possible to accomplish this using the built-in integration. Sounds like I’ll have to look at multiscrape.

You should be able to do a select on td, on table, or maybe on tr.
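
If a select on the cells gives you their texts as a list in document order, the known column count is enough to rebuild the markup. The following is a minimal Python sketch of that idea only. `build_table` is a hypothetical helper, not part of Scrape or multiscrape, and inside Home Assistant the same logic would presumably have to be written as a template. It also assumes the first row is the header row.

```python
from html import escape


def build_table(cells, columns, header=True):
    """Rebuild an HTML table from a flat list of cell texts (hypothetical helper)."""
    if columns <= 0 or len(cells) % columns != 0:
        raise ValueError("cell count must be a multiple of the column count")

    lines = ["<table>"]
    for start in range(0, len(cells), columns):
        # Assumption: the first row holds the column headers.
        tag = "th" if header and start == 0 else "td"
        lines.append("<tr>")
        for text in cells[start:start + columns]:
            # Escape the text so characters like < or & stay literal.
            lines.append(f"<{tag}>{escape(text)}</{tag}>")
        lines.append("</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Cell texts from the example table, in document order.
cells = [
    "Company", "Contact", "Country",
    "Alfreds Futterkiste", "Maria Anders", "Germany",
    "Centro comercial Moctezuma", "Francisco Chang", "Mexico",
]

html_table = build_table(cells, columns=3)
print(html_table)
```

The resulting string has the same structure as the table markup asked for in the first post, so it could be placed in a Markdown card in the same way.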