Scrape component and working with multiple tables

I am attempting to scrape a web page from my OilPal modem and capture a specific value.

The web page section I’m trying to parse is built with two tables.

Table 1 contains various modem info, and Table 2 contains the sensor data, which is where the value I need is.

I’m having issues capturing the value I need.

Below are the tables I’m working with.

I’ve tried several different “select:” parameters, e.g. tr:nth-of-type(5), but none of them capture the correct result.

The value I’m trying to capture is 69 00 7 [07 00 45 ] from the first data row of the second table.

Any help with the correct CSS selector would be great. Thanks in advance.

<div id="content">

<h1>TEK608 Diagnostics</h1>
<p>This page displays the TEK608 Diagnostic Information.</p>

<table style="padding-left: 10px;">
<tr><td><b>Stack Version:</b></td><td>&nbsp;</td><td>v5.42</td></tr>
<tr><td><b>Build Date:</b></td><td>&nbsp;</td><td>Dec 11 2017 12:51:46</td></tr>
<tr><td><b>System RTC:</b></td><td>&nbsp;</td><td>00:54 (248)</td></tr>
<tr><td><b>Firmware:</b></td><td>&nbsp;</td><td>1.8</td></tr>
</table>

<BR><BR>

<table border="1">
<tr>
<th>Device #</th>
<th>RF Address</th>
<th>#Rx</th>
<th>Rx Time (h)</th>
<th>Data Aux Bat [Cache]</th>
<th>[FL Hi Lo Dif]</th>

</tr>

<tr><td>1</td><td>0x00000ea0</td><td> 10894</td><td>00:52(0)</td><td>69 00 7 [07 00 45 ]</td><td>00 00 00 10</td></tr>
<tr><td>2</td><td>0xffffffff</td><td> 0</td><td>00:00(0)</td><td>No Data</td><td>00 00 00 00</td></tr>
<tr><td>3</td><td>0xffffffff</td><td> 0</td><td>00:00(0)</td><td>No Data</td><td>00 00 00 00</td></tr>
<tr><td>4</td><td>0xffffffff</td><td> 0</td><td>00:00(0)</td><td>No Data</td><td>00 00 00 00</td></tr>
<tr><td>5</td><td>0xffffffff</td><td> 0</td><td>00:00(0)</td><td>No Data</td><td>00 00 00 00</td></tr>
<tr><td>6</td><td>0xffffffff</td><td> 0</td><td>00:00(0)</td><td>No Data</td><td>00 00 00 00</td></tr>
<tr><td>7</td><td>0xffffffff</td><td> 0</td><td>00:00(0)</td><td>No Data</td><td>00 00 00 00</td></tr>
<tr><td>8</td><td>0xffffffff</td><td> 0</td><td>00:00(0)</td><td>No Data</td><td>00 00 00 00</td></tr>

</table>

<br><br>

</div>

Off the top of my head, you could try table:nth-of-type(2) td:nth-of-type(5), but I’m at work so I can’t test it for you.
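If you want to check the logic offline, here is a minimal Python sketch that roughly emulates table:nth-of-type(2) td:nth-of-type(5) on a trimmed copy of the HTML above. It uses only the standard library's html.parser rather than a real CSS engine, and it assumes the tables are not nested and sit under the same parent (true for this page), so counting tables in document order matches nth-of-type. The helper names are hypothetical. Like the scrape sensor's default index of 0, it returns the first match. It also hints at why a plain tr selector didn't help: the sensor returns the text of the matched element, so matching a row gives you the whole row's text instead of the single cell.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table and by row.

    Only <td> cells are recorded, so a header row made of <th> cells
    becomes an empty list. Nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []  # list of tables -> list of rows -> list of cell texts
        self._in_td = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag == "td" and self.tables and self.tables[-1]:
            self._in_td = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.tables[-1][-1].append("".join(self._buf).strip())
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._buf.append(data)


def select_like_scrape(html, table_pos, td_pos, index=0):
    """Rough stand-in for 'table:nth-of-type(T) td:nth-of-type(N)' plus index.

    Positions are 1-based like CSS; index is 0-based like the scrape sensor.
    """
    parser = TableCellCollector()
    parser.feed(html)
    parser.close()
    table = parser.tables[table_pos - 1]
    # A row matches when it has at least td_pos <td> cells.
    matches = [row[td_pos - 1] for row in table if len(row) >= td_pos]
    return matches[index]


# Trimmed copy of the modem page from the question.
PAGE = """
<div id="content">
<table style="padding-left: 10px;">
<tr><td><b>Stack Version:</b></td><td>&nbsp;</td><td>v5.42</td></tr>
<tr><td><b>Firmware:</b></td><td>&nbsp;</td><td>1.8</td></tr>
</table>
<BR><BR>
<table border="1">
<tr><th>Device #</th><th>RF Address</th><th>#Rx</th><th>Rx Time (h)</th>
<th>Data Aux Bat [Cache]</th><th>[FL Hi Lo Dif]</th></tr>
<tr><td>1</td><td>0x00000ea0</td><td> 10894</td><td>00:52(0)</td><td>69 00 7 [07 00 45 ]</td><td>00 00 00 10</td></tr>
<tr><td>2</td><td>0xffffffff</td><td> 0</td><td>00:00(0)</td><td>No Data</td><td>00 00 00 00</td></tr>
</table>
</div>
"""

print(select_like_scrape(PAGE, 2, 5))  # 69 00 7 [07 00 45 ]
```

Passing index=1 returns "No Data" from device 2, which is how the scrape sensor's index option would reach the other rows.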

@lolouk44 Thanks for this, it was exactly what I needed. I had been selecting on tr, which wasn’t returning what I wanted.


For anyone else having similar trouble, I used this helper website:
https://try.jsoup.org/
It can fetch your page and lets you test your selector against it.


Hi c0rnflake,

I have almost the same issue with scraping a value out of a table.

It’s one table with two columns and several rows; I’d like to get the value in the second column of the 7th row. Using the browser’s Inspect and Copy selector gives me this:
```
#showdata > table > tbody > tr:nth-child(7) > td:nth-child(2) > span
```

Unfortunately I can’t share the website because it’s only on my local network.

Could you please tell me how to get the value, since you already managed to get your own value out of a table?

Might be a little late, but this should do the trick:
tr:nth-child(7) td:nth-child(2)

I also need a little help here.
On the page http://vreme-podljubelj.zevs.si/, I would like to get the temperature.

My code is:

- platform: scrape
  resource: http://vreme-podljubelj.zevs.si
  name: Vreme Podljubelj C
  select: "body > table.MsoNormalTable > tbody > tr:nth-child(10) > td:nth-child(3) > p > span:nth-child(2)"

But my result is “unknown”, even though the same selector returns a value when I check it on https://try.jsoup.org/.

Any ideas?

Completely untested, but when I select the temperature I get this selector:
body > table.MsoNormalTable > tbody > tr:nth-child(5) > td:nth-child(3) > p > span:nth-child(3) > nobr

Or…

http://vreme-podljubelj.zevs.si/realtime.txt

The data is actually here.

So you would just use a REST sensor and split the response to get the value.

- platform: rest
  resource: http://vreme-podljubelj.zevs.si/realtime.txt
  value_template: "{{ value.split(' ')[2] | float(0) }}"
  name: Vreme Podljubelj C
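To see what that value_template does, here is a small Python sketch of the same logic: split the body on single spaces, take the third field (index 2), and fall back to 0 when it isn't a number, as float(0) does. The sample line is made up; I’m assuming realtime.txt starts with date, time and outdoor temperature, so check the real file before relying on index 2. The function name is hypothetical.

```python
def parse_realtime(body: str, field: int = 2, default: float = 0.0) -> float:
    """Mimic "{{ value.split(' ')[2] | float(0) }}" for a realtime.txt body."""
    parts = body.split(" ")  # single-space split, like the Jinja template
    try:
        return float(parts[field])
    except (IndexError, ValueError):
        # float(0) returns 0 for non-numeric input; the missing-field
        # (IndexError) case is caught here only to keep the sketch simple.
        return default


# Hypothetical realtime.txt content (date, time, outdoor temp, humidity, dew point).
SAMPLE = "19/08/09 16:03:45 8.4 84 5.8"

print(parse_realtime(SAMPLE))                       # 8.4
print(parse_realtime("19/08/09  16:03:45 8.4 84"))  # 0.0, the double space shifts the fields
```

Because split(' ') splits on every single space, a double space anywhere before the temperature moves it to a different index, as the second call shows. Python’s split() with no argument collapses runs of whitespace instead, if that turns out to matter for the real file.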

There are several temperatures on the page; that’s why you got a slightly different selector. I also tested it with " > nobr" appended, but HA still doesn’t output anything, while https://try.jsoup.org/ gives the correct result.

mobile.andrew.jones’s solution also works, but why doesn’t the scrape sensor?
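One possible cause, not confirmed for this page: browsers and jsoup follow the HTML5 parsing rules and insert a tbody element into any table whose source doesn’t declare one, so selectors copied from the browser or tested in jsoup contain "tbody". The scrape sensor parses the page with BeautifulSoup, and the parser backends it has used (html.parser, and lxml in later versions) do not, as far as I know, add a missing tbody. If the page source has no tbody, a selector containing "> tbody >" would then match nothing. The sketch below uses only the standard library’s html.parser on hypothetical markup to show that no tbody appears in the parsed tag stream; the helper names are made up for illustration.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Record start tags in document order, exactly as the parser sees them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(html: str) -> list:
    recorder = TagRecorder()
    recorder.feed(html)
    recorder.close()
    return recorder.tags


# Hypothetical markup in the style of the page: the source has no <tbody>.
SOURCE = '<table class="MsoNormalTable"><tr><td><p><span>8.4</span></p></td></tr></table>'

tags = start_tags(SOURCE)
print(tags)              # ['table', 'tr', 'td', 'p', 'span']
print("tbody" in tags)   # False: html.parser does not invent a tbody
```

If that is the issue, dropping "tbody >" from the selector, e.g. body > table.MsoNormalTable > tr:nth-child(10) > td:nth-child(3) > p > span:nth-child(2), may be worth a try.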

I’ve managed to set up a bunch of scrapers successfully, but there is one website giving me hell, and out of stubbornness I’ve refused to give up for a while… I just can’t work it out. It shouldn’t be this difficult.

It’s MSI Global - The Leading Brand in High-end Gaming & Professional Creation, and I’m just trying to scrape the latest version and release date.


My sensors look like this. I’ve tried what seems like 50 different iterations, but this one has me stumped. I’d appreciate it if any genius has an idea for me.