nat
(phoniclynx)
February 26, 2025, 12:53am
1
How can I grab a value out of an HTML page that is just a table? The instructions talk about using a CSS selector, but the page I want to scrape has just a simple table. I want to poll that page every now and then and read the value of the cell (td) that is associated with a given row (tr).
I will show the table below:
<div class="tables-container">
<div class="table-wrapper">
<div class="table-container">
<h2>Decentralized Client Stats</h2>
<table>
<tr>
<td class="label">Shares Accepted:</td>
<td>0 (0 diff)</td>
</tr>
<tr>
<td class="label">Shares Rejected:</td>
<td>0 (0 diff)</td>
</tr>
<tr>
<td class="label">Status:</td>
<td><svg viewBox='0 0 100 100' role='img' style='width:1em;height:1em'><circle cx='50' cy='60' r='35' style='fill:lime' /></svg> Non-Pooled Mode</td>
</tr>
<tr>
<td class="label">Pool Host:</td>
<td>N/A</td>
</tr>
<tr>
<td class="label">Pool Tag:</td>
<td>"DATUM"</td>
</tr>
<tr>
<td class="label">Secondary/Miner Tag:</td>
<td>"DATUM"</td>
</tr>
<tr>
<td class="label">Pool Current MinDiff:</td>
<td>0</td>
</tr>
<tr>
<td class="label">Pool Pubkey:</td>
<td class="fixed-width">123456</td>
</tr>
</table>
</div>
etc etc etc
I want to monitor the “0” next to the Shares Accepted: field above so I can see when it changes. But I need to get it into HA first somehow?
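To make the goal concrete, here is a minimal Python sketch of the lookup being asked for: find the row whose label is "Shares Accepted:" and read the leading number from the cell next to it. It uses only the standard library. The names `LabelValueParser`, `shares_accepted`, and `SAMPLE_HTML` are made up for illustration, and it assumes every row is a simple two-cell label/value pair as in the table above. It is not how the Scrape integration works internally.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample, trimmed from the table posted above.
SAMPLE_HTML = """
<table>
<tr><td class="label">Shares Accepted:</td><td>0 (0 diff)</td></tr>
<tr><td class="label">Shares Rejected:</td><td>0 (0 diff)</td></tr>
</table>
"""

class LabelValueParser(HTMLParser):
    """Collect {label: value} from rows shaped like <tr><td>label</td><td>value</td></tr>."""

    def __init__(self):
        super().__init__()
        self.rows = {}
        self._cells = None    # cell texts of the row being read, None outside a <tr>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._cells is not None:
            if len(self._cells) >= 2:
                self.rows[self._cells[0].strip()] = self._cells[1].strip()
            self._cells = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_td and self._cells:
            self._cells[-1] += data

def shares_accepted(html):
    """Return the leading integer of the 'Shares Accepted:' cell, or None if it has no digits."""
    parser = LabelValueParser()
    parser.feed(html)
    text = parser.rows["Shares Accepted:"]   # raises KeyError if the label is missing
    match = re.match(r"\d+", text)
    return int(match.group()) if match else None

print(shares_accepted(SAMPLE_HTML))
```

Looking the row up by its label rather than by position means the result does not shift if rows are added or reordered, at the cost of depending on the exact label text.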
I suggest sharing the link if possible. If the page is built by JavaScript in the background, then scraping will not work, AFAIK.
nat
(phoniclynx)
February 26, 2025, 9:45am
3
I can’t send the link because it’s an internal server… Having said that, it is just simple HTML with a head, body and table, tr and td as the tags… the above is all there is in the page, basically… it’s not JS, it’s an application front end that creates a simple HTML file
What I do is open the browser's developer tools (F12), select the element I need, and right-click it to get a menu that copies its path.
An example for you to test:
Add a Scrape sensor using this URL: Planetary Fact Sheet
Then set the select field to: tr:nth-child(3) > td:nth-child(6)
Which will get you 6792
nat
(phoniclynx)
February 26, 2025, 10:12am
7
I don’t really understand how to do this? (kinda feel stoopid haha).
Is this the tool I’m looking for: Scrape - Home Assistant?
You do not know how to add a scrape sensor?
It is under Devices > Add integration > Scrape (this one is easier to update on the fly than the YAML one)
Scrape - Home Assistant
nat
(phoniclynx)
February 26, 2025, 10:30am
9
OK, so I learnt something, I didn’t know there was a non-YAML version.
I set up what I thought was the one in the GUI and I got “unavailable”.
I ‘think’ there is a password it is looking for, but I can’t tell whether there is. I’ll post the full HTML here too, just so you can see it:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Gateway Status</title>
<link rel="icon" type="image/x-icon" href="/assets/icons/favicon.ico">
<link rel="stylesheet" type="text/css" href="./assets/style.css">
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
.table-wrapper {
justify-content: center;
}
.table-container {
max-width: 800px;
}
table {
border-collapse: collapse;
}
td {
word-break: break-all;
}
.leading_zeros {
color: gray;
}
</style>
</head>
<body>
<div class="container">
<div class="header">
<h1><img src="/assets/icons/ki.svg" alt="(DATUM Logo)" style="vertical-align: text-top" width="28" height="33"> <span>GATEWAY</span></h1>
</div>
<div class="menu-container">
<a href="/" style="background-color: darkslategrey;">Status</a>
<a href="/clients">Clients</a>
<a href="/threads">Threads</a>
<a href="/coinbaser">Coinbaser</a>
</div>
</div>
<div class="tables-container">
<div class="table-wrapper">
<div class="table-container">
<h2>Client Stats</h2>
<table>
<tr>
<td class="label">Shares Accepted:</td>
<td>0 (0 diff)</td>
</tr>
<tr>
<td class="label">Shares Rejected:</td>
<td>0 (0 diff)</td>
</tr>
<tr>
<td class="label">Status:</td>
<td><svg viewBox='0 0 100 100' role='img' style='width:1em;height:1em'><circle cx='50' cy='60' r='35' style='fill:lime' /></svg> Non-Pooled Mode</td>
</tr>
<tr>
<td class="label">Pool Host:</td>
<td>N/A</td>
</tr>
<tr>
<td class="label">Pool Tag:</td>
<td>" on "</td>
</tr>
<tr>
<td class="label">Secondary/Miner Tag:</td>
<td>" User"</td>
</tr>
<tr>
<td class="label">Pool Current MinDiff:</td>
<td>0</td>
</tr>
<tr>
<td class="label">Pool Pubkey:</td>
<td class="fixed-width">f12345</td>
</tr>
</table>
</div>
<div class="table-container">
<h2> Server Info</h2>
<table>
<tr>
<td class="label">Active Threads:</td>
<td>1</td>
</tr>
<tr>
<td class="label">Total Connections:</td>
<td>1</td>
</tr>
<tr>
<td class="label">Total Work Subscriptions:</td>
<td>1</td>
</tr>
<tr>
<td class="label">Estimated Hashrate:</td>
<td>1 Th/sec</td>
</tr>
</table>
</div>
</div>
<div class="table-wrapper">
<div class="table-container" style="min-width: fit-content;">
<h2>Current Job</h2>
<table>
<tr>
<td class="label">Job ID:</td>
<td>5 (165) @ 5</td> <!-- Job ID (Index) @ Timestamp -->
</tr>
<tr>
<td class="label">Block Height:</td>
<td>1</td>
</tr>
<tr>
<td class="label">Block Value:</td>
<td>3.19542906</td>
</tr>
<tr>
<td class="label">Previous Block:</td>
<td class="fixed-width"><span class='leading_zeros'>0000000000000000000</span>11</td>
</tr>
<tr>
<td class="label">Block Target:</td>
<td class="fixed-width"><span class='leading_zeros'>0000000000000000000</span>11</td>
</tr>
<tr>
<td class="label">Witness Commitment:</td>
<td class="fixed-width">12345</td>
</tr>
<tr>
<td class="label">Block Difficulty:</td>
<td>5</td>
</tr>
<tr>
<td class="label">Version:</td>
<td>20000000 (5)</td>
</tr>
<tr>
<td class="label">Bits:</td>
<td>6</td>
</tr>
<tr>
<td class="label">Time:</td>
<td>Current: 5 / Min: 5</td>
</tr>
<tr>
<td class="label">Limits:</td>
<td>Size: 4, Weight: 4, SigOps: 80000</td>
</tr>
<tr>
<td class="label">Size:</td>
<td>4</td>
</tr>
<tr>
<td class="label">Weight:</td>
<td>4</td>
</tr>
<tr>
<td class="label">Sigops:</td>
<td>4</td>
</tr>
<tr>
<td class="label">Txn Count:</td>
<td>4</td>
</tr>
</table>
</div>
</div>
</div>
<p class="note">Note: This page does not automatically refresh</p>
</body>
</html>
First things first, did you get my example working? I of course have no clue how you get to your data with or without a user, password, headers, etc.
From your picture
right-click on the 0 and then choose Copy > Selector… this will give you a path. If it is a simple table then you can skip most of it (compare with my example, where the selector shows
body > p:nth-child(5) > table > tbody > tr:nth-child(3) > td:nth-child(6)
but only the last two parts are needed to do the trick).
EDIT: in your case this is likely: tr:nth-child(1) > td:nth-child(2)
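One thing worth checking with a short selector like this: the page posted above contains several tables, so `tr:nth-child(1) > td:nth-child(2)` can match one cell per table. As far as I know, the Scrape sensor's `index` option (default 0) picks which match is used, so the first table would win here. The Python sketch below only approximates that selector with the standard library, treating every child of a table as a `tr` and every child of a row as a `td`, and ignoring nested tables. `CellGrid`, `select_nth`, and `PAGE` are hypothetical names for illustration.

```python
from html.parser import HTMLParser

# Hypothetical two-table sample, trimmed from the page posted above.
PAGE = """
<table>
<tr><td class="label">Shares Accepted:</td><td>0 (0 diff)</td></tr>
<tr><td class="label">Shares Rejected:</td><td>0 (0 diff)</td></tr>
</table>
<table>
<tr><td class="label">Active Threads:</td><td>1</td></tr>
</table>
"""

class CellGrid(HTMLParser):
    """Store the page as tables -> rows -> cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag == "td" and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.tables[-1][-1][-1] += data

def select_nth(html, row, col):
    """Roughly 'tr:nth-child(row) > td:nth-child(col)': one match per table, 1-based."""
    grid = CellGrid()
    grid.feed(html)
    matches = []
    for table in grid.tables:
        if len(table) >= row and len(table[row - 1]) >= col:
            matches.append(table[row - 1][col - 1].strip())
    return matches

print(select_nth(PAGE, 1, 2))
```

If a later table is the one you need, a longer selector that names the right container (for example one starting from `div.table-container`) narrows the matches instead of relying on the match order.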
nat
(phoniclynx)
February 26, 2025, 10:42am
11
Yes I got your example to work, sorry I didn’t let you know.
When I select the selector I get:
body > div.tables-container > div:nth-child(1) > div:nth-child(1) > table > tbody > tr:nth-child(1)
So I opened the page in a different browser, and I got the login page. This page is a login page by default and it looks like it runs JavaScript. This computer runs a stack of different related apps and the main page is password-protected. I logged into the main page a while ago and then installed and opened the app, so I was never asked for the password. But in Firefox I get the password request.
<!doctype html>
<html class="h-full min-h-full">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1" />
<meta name="theme-color" content="#000000" />
<meta name="robots" content="noindex, nofollow" />
<meta name="referrer" content="no-referrer" />
<link rel="apple-touch-icon" sizes="180x180" href="/favicon/apple-touch-icon.png">
<link rel="icon" type="image/png" sizes="32x32" href="/favicon/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="/favicon/favicon-16x16.png">
<link rel="manifest" href="/site.webmanifest">
<title>Umbrel</title>
<script type="module" crossorigin src="/assets/index-05a380ad.js"></script>
<link rel="stylesheet" href="/assets/index-cb25441d.css">
</head>
<body style="background: black; color: white" class="h-full min-h-full">
<noscript>
<h1>umbrelOS</h1>
<p>You need to enable JavaScript to run this app.</p>
</noscript>
<div id="root" class="h-full min-h-full"></div>
</body>
</html>
I have no idea how to see whether that’s what the scrape plugin is getting or whether I’ve simply not configured it correctly.
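One way to tell the two cases apart is to save whatever a non-browser client receives from the URL and check which of the two pages shown above it resembles. The Python sketch below does only that check on a string of HTML. `classify_page` is a hypothetical helper, and the markers it looks for (the "Shares Accepted:" label, and the `id="root"` element plus `<noscript>` block of the login page) are taken from the two pages posted in this thread, so they are assumptions about this particular setup.

```python
def classify_page(html):
    """Guess which of the two pages from this thread a response body is."""
    if "Shares Accepted:" in html:
        # The plain HTML status table: a scraper can read this directly.
        return "stats"
    if 'id="root"' in html and "<noscript>" in html.lower():
        # An empty shell that JavaScript fills in, like the login page above.
        return "js-app-shell"
    return "unknown"

# Usage, assuming the response body was saved to a file named page.html:
# with open("page.html", encoding="utf-8") as f:
#     print(classify_page(f.read()))
```

If the result is "js-app-shell", the request is being sent to the login page before it reaches the status page, so no CSS selector will find the table until the authentication step is solved.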
This goes beyond what I know or can support. The scrape sensor allows a user/password to be added, but I have no clue how it handles that; it may also be a header that is needed.
Can you get the page content via curl?
EDIT: try to use the authentication method in the scrape with user/pwd
nat
(phoniclynx)
February 26, 2025, 10:49am
13
Curl gives me:
curl http://umbrel.b.com:21000/
Found. Redirecting to http://umbrel.b.com:2000/?origin=host&app=datum&path=%2F%
and I don’t know how to get CURL to show me the next page
EDIT: I tried putting the password in (it doesn’t use a username), tried both basic and digest, and it didn’t change anything
Sorry, no clue how to get further, maybe someone else can chip in. Try to find a curl that works, this may help construct the scrape
I have no clue what umbrel does but can’t it possibly supply an output somewhere that you can query?
nat
(phoniclynx)
February 26, 2025, 11:26am
16
Not as far as I can see… whenever I go to that port it tries to authenticate you