Getting Data from HTML Website

dietlman · February 25, 2022, 7:56pm

Dear Community!

I need some help getting data from my temperature/humidity device, which displays its readings on a web page. I thought it would be possible using scrape, but I have no clue where or how to start.
Maybe somebody can help me with this. Below is a screenshot of the HTML content. I would like to get temperature and humidity; air pressure is not needed.
I would be happy for a nice solution.

Thanks a lot!

AdmiralStipe · February 26, 2022, 7:51am

Not sure if this will work, but try right-clicking on the row where your temperature value is (13.9°C), then select Copy → Copy selector.
Paste that value into the sensor under “select” and that should do it. Repeat for humidity.
Something like this:

sensor:
  - platform: scrape
    name: Cellar Temperature 
    resource: http://77.119.243.51:86/smart
    select: "#mv0"
  - platform: scrape
    name: Cellar Humidity 
    resource: http://77.119.243.51:86/smart
    select: "#mv1"

Beware that this will be a string, not a number, because the selected value (13.9°C) includes the unit. You won’t be able to do further calculations with it unless you strip the unit first, for example with a value_template.
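
The unit can be stripped so that only the number remains. In Home Assistant this would normally go into the sensor's value_template; the snippet below is just a minimal Python sketch of the same idea, not something taken from the scrape integration. `parse_reading` is a hypothetical helper name, and the decimal-comma handling is an assumption about what the device might send.

```python
import re

# First signed number in the text, with "." or "," as the decimal separator.
_NUMBER = re.compile(r"[-+]?\d+(?:[.,]\d+)?")


def parse_reading(text: str) -> float:
    """Return the first number found in a reading such as '13.9°C'."""
    match = _NUMBER.search(text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    # Normalise a decimal comma to a dot before converting.
    return float(match.group(0).replace(",", "."))
```

With the value from the post, `parse_reading("13.9°C")` gives the float 13.9, which can then be used in calculations.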

dietlman · February 26, 2022, 11:05am

Thanks for your answer. I have tried your config, but I am getting an error:
Logger: homeassistant.components.rest.data
Source: components/rest/data.py:74
Integration: RESTful (documentation, issues)
First occurred: 12:02:33 (6 occurrences)
Last logged: 12:02:54

Error fetching data: http://192.168.1.1:86/smart failed with
Error fetching data: http://192.168.1.1:86 failed with illegal chunk header: bytearray(b'F9 \r\n')

Not sure what it means, but the sensor is not showing up.

AdmiralStipe · February 26, 2022, 12:02pm

Sorry, no idea. My understanding is that your server isn’t creating a “valid” header inside the HTML code, but I don’t know how to deal with it.
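
For context, as far as I can tell the message points at the HTTP layer rather than the HTML. With chunked transfer encoding, each chunk starts with a hexadecimal size followed by CRLF, and the logged header `F9 \r\n` has a space after the size, which a strict parser can reject. Below is a minimal Python sketch of a decoder that tolerates this, assuming stray whitespace is the only deviation. `decode_chunked_lenient` is a hypothetical helper, not part of Home Assistant.

```python
def decode_chunked_lenient(raw: bytes) -> bytes:
    """Decode an HTTP chunked body, tolerating whitespace around chunk sizes."""
    body = bytearray()
    pos = 0
    while True:
        # The chunk header runs up to the next CRLF.
        line_end = raw.index(b"\r\n", pos)
        # Drop any chunk extension after ';' and strip stray spaces.
        size_field = raw[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-size chunk ends the body
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)
```

This only shows why the header trips a strict parser; it does not plug into the scrape or rest integrations.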

dietlman · February 27, 2022, 7:06am

OK, thanks, no worries. Maybe someone else can help with this error.

dietlman · November 7, 2022, 5:47pm

Maybe someone else has an idea how to fix this?
Thanks

Hellis81 · November 7, 2022, 5:56pm

Did you actually do what admiral said or did you just copy the config he posted as a pseudo example?

dietlman · November 7, 2022, 6:01pm

I have done what admiral posted above, but I am getting an error, so I thought this is probably the wrong approach. I am still not even sure if scrape is the proper way to get what I want.
I have also tried this config, but it didn’t work either:

- platform: scrape
  resource: "http://192.168.1.1:86"
  name: Humidity Cellar
  select: "td"
  index: 8 # experiment with this number to find the correct value
  value_template: '{{ ((value.split(" ")[0]) | replace(",", ".")) }}'
  unit_of_measurement: "%"
- platform: scrape
  resource: "http://192.168.1.1:86"
  name: Air Pressure Cellar
  select: "td"
  index: 8 # experiment with this number to find the correct value
  value_template: '{{ ((value.split(" ")[0]) | replace(",", ".")) }}'

Hellis81 · November 7, 2022, 6:06pm

Perhaps you can use regex to capture the values?

dietlman · November 7, 2022, 6:08pm

Do you think you could give me an example of how to use regex in my case above?
thx

Hellis81 · November 7, 2022, 6:11pm

Sure, but not from an image of HTML.

dietlman · November 7, 2022, 6:13pm

I have set up port forwarding, so you should be able to connect to the sensor using this link:
http://192.168.1.1:86/smart
you could also use this link:

Hellis81 · November 7, 2022, 6:14pm

Dude… no, delete that!

Just paste the HTML here in a code block.

Although most people won’t do anything, posting links to your home IP with open ports is not a good idea.

Oh… I see now that it is the same URL you posted earlier. Still, just paste the HTML here.

dietlman · November 7, 2022, 6:27pm

OK, got it, thanks.
I tried to copy the HTML code but can’t select and copy in the window you see below. Sorry, but I don’t know how to copy that content as text, as I am not even able to select it in this window.


if (self!= top)
{	alert('Um diese Seite auf Ihrem Smartphone zu laden \ngeben Sie als URL http://ip-adresse/smart ein \n\nTo display this page on your Smart Phone,\nplease enter as URL http://ip-address/smart');
}
var password = '';
var interval = 5000;
var CommandString = new Array();
mval = ['','','','','','','',''];
mtag = ['<a href="info.htm?Ili=0&amp;Ref=smart.htm?Vpos=39167&amp;DTb=0&amp;">Temperature Cellar</a>','<a href="info.htm?Ili=1&amp;Ref=smart.htm?Vpos=39167&amp;DTb=0&amp;">Humidity Cellar</a>','<a href="info.htm?Ili=2&amp;Ref=smart.htm?Vpos=39167&amp;DTb=0&amp;">Airpressure Cellar</a>','','','','',''];
var maxm = 0;

function Commandloop()
{	DataRequest(CommandString[0]);
	maintimer = setTimeout("Commandloop()", interval);
}

function DataRequest(sendstring)
{	var xmlHttp = window.ActiveXObject ? new ActiveXObject("Microsoft.XMLHTTP") : xmlHttp = new XMLHttpRequest();
	if (xmlHttp)
	{	xmlHttp.onreadystatechange = function()
		{	if (xmlHttp.readyState == 4)
			{	if (xmlHttp.status == 200)
				{	updateDisplay(xmlHttp.responseText)
					xmlHttp=null;
				}
			}
		}
		xmlHttp.open("GET", sendstring, true);
		xmlHttp.setRequestHeader("Connection", "close");
		xmlHttp.setRequestHeader("If-Modified-Since", "Thu, 1 Jan 1970 00:00:00 GMT");
		xmlHttp.send(null);
	}
}

function updateDisplay(ReceiveString)
{	var ReceiveData = ReceiveString.split(";");
	for (i=0;i<maxm;i++)
    { document.getElementById('mv'+i).innerHTML = ReceiveData[ReceiveData.length-maxm+i];

    }
}

function createDisplay()
{	for (i=0;i<mtag.length;i++)
	{	if (mtag[i]!='')
		{	maxm++;
		}
	}
	for (var i = 0; i < maxm; i++)
	{	var singletag = mtag[i].split('>');
		mtag[i] = singletag[1].substring(0, singletag[1].length - 3)
	}
	document.body.innerHTML+='<div class="ds" id="dn"> Cellar </div>';
	for (i = 0; i < maxm; i++)
	{	document.body.innerHTML += '<div class="ds">'+mtag[i]+'</div>';
		document.body.innerHTML += '<div id="mv'+i+'" class="val iov">'+mval[i]+'</div>';
	}
	CommandString = ['single'];
	Commandloop();
}

Hellis81 · November 7, 2022, 7:28pm

If that code block is the page’s code, then that is the issue.
When a web page fills in its values with JavaScript, a scrape tool won’t work: the values aren’t in the HTML that gets downloaded but are inserted afterwards by the script, and scrape tools don’t run scripts.

dietlman · November 7, 2022, 7:31pm

OK, that makes sense, because I am quite sure that there is JavaScript in that web page. Does that mean there is no way to read data from that site?

Hellis81 · November 7, 2022, 7:38pm

There are ways, but none that I have ever managed to get working in any programming language that I know.
You need a headless scraper.

They probably do work, but I just never tried hard enough or spent enough time on it.
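
Going by the script pasted above, a headless browser might not be strictly necessary. The page itself only requests the relative URL `single` every 5 seconds, splits the reply on `;`, and shows the last few fields (one per non-empty label in `mtag`, three here). Below is a minimal Python sketch of doing the same, assuming the device answers a plain GET at `http://<device>:86/single` and that the reply is text. `extract_values` and `fetch_values` are hypothetical names, and whether the request gets past the chunk header error reported earlier is untested.

```python
from urllib.request import urlopen


def extract_values(response_text: str, count: int) -> list[str]:
    """Return the last `count` ';'-separated fields, as the page's updateDisplay does."""
    fields = response_text.split(";")
    if count > len(fields):
        raise ValueError(f"expected at least {count} fields, got {len(fields)}")
    return fields[len(fields) - count:]


def fetch_values(url: str, count: int = 3, timeout: float = 10.0) -> list[str]:
    """Fetch the data URL and return its last `count` fields as strings."""
    # Assumption: the reply decodes as UTF-8; undecodable bytes are replaced.
    with urlopen(url, timeout=timeout) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return extract_values(text, count)
```

If the device responds as assumed, `fetch_values("http://<device>:86/single")` should return the temperature, humidity and air pressure fields as strings, in the order of the labels in `mtag`.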

dietlman · November 7, 2022, 7:40pm

OK, but thanks for trying and for your time. I guess I will just buy a new sensor for that purpose.