erik7
(Erik)
March 8, 2021, 10:51am
44
@danieldotnl Will you update the integration to solve the following warning/requirement?
No 'version' key in the manifest file for custom integration 'multiscrape'. This will not be allowed in a future version of Home Assistant. Please report this to the maintainer of 'multiscrape'
Thank you.
Hello,
A little question.
Is it possible to maintain a HTTP livestream of a website in order to retrieve the data live and also update it?
Since my weather station offers a website where real-time data is played back, it would be cool if I could also use this.
I updated the opening post of this thread with an update on the new repository that’s now in the default HACS store. Please read it if you are using the multiscrape custom component!
Somehow I keep missing notifications from this thread, but the version has been added!
So you still don’t plan to add scraping of pages that require logging in first?
I’m actually looking into that, by popular request
drogfild
(Drogfild)
May 21, 2021, 10:58am
50
Merging from my dev branch should not be that bad. It’s not up to date, but async is working just fine.
The biggest problem I have with it is that it doesn’t allow logging in to the same page with multiple different credentials, i.e. having multiple sensors for the same page but with different credentials. It uses the same session, so it’s already logged in.
manjotsc
(Manjot Singh)
May 21, 2021, 9:59pm
51
Hi,
I need help with two sensors that are not working; the value shows up as empty.
select: ‘#acc ’ is not working, value ends up empty
<span id="acc">ON</span>
select: ‘#gpsSpeed ’ works
<span id="gpsSpeed">0</span>
select: ‘#driverName ’ works
<div class="col-sm-12 c-009934 fw-b" id="driverName">Uplander LT</div>
select: ‘#coordinate ’ is not working, value ends up empty
<span id="coordinate">-73.63061 / 45.52565</span>
<input type="hidden" id="c_latitude" value="45.52565">
<input type="hidden" id="c_longitude" value="-73.63061">
Does it work in chrome? You can try out the css selectors in the chrome console like:
$$("#acc")
manjotsc
(Manjot Singh)
May 23, 2021, 4:11am
54
This is how my config looks,
select: “#coordinate ” not working
select: “#gpsSpeed ” working
select: “#acc ” not working
select: “#gpsTime ” not working
select: “#driverName ” working
- platform: multiscrape
  resource: https://example.com/
  name: Uplander LT
  scan_interval: 10
  headers:
    User-Agent: Mozilla/5.0
  selectors:
    uplander_lt_location:
      name: Uplander LT Location
      select: "#coordinate"
    uplander_lt_speed:
      name: Uplander LT Speed
      select: "#gpsSpeed"
    uplander_lt_status:
      name: Uplander LT Status
      select: "#acc"
    uplander_lt_time:
      name: Uplander LT Update Time
      select: "#gpsTime"
    uplander_lt_drivername:
      name: Uplander LT Driver Name
      select: "#driverName"
I need help here as well.
I thought I had it, but I’ve been struggling for weeks now; I guess there are a few links missing in my brain.
I have a local gateway with live weather data.
How the heck do I get those plain HTML values out of this website?
Please help me out, I’m totally stuck.
For example, if I want to scrape the wind speed with the name avgwind, I’m totally lost on how to write the correct scrape configuration.
This is a part of the livedata.htm page:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>LiveData</title>
<link href="axcss0.css" rel="stylesheet" type="text/css" />
</head>
<body>
<table width="800" border="0" align="center" cellpadding="0" cellspacing="0">
<tr>
<td colspan="2" align="right" bgcolor="#0088F7"> </td>
</tr>
<tr>
<td colspan="2" bgcolor="#FFFFFF"><table border="0" cellpadding="0" cellspacing="0">
<tr>
<td width="20" height = "80"> </td>
<td ><img src="img/1.jpg" width="74" height="80" ></td>
<td width="10"> </td>
<td class="txtstyle_1" >AmbientWeather 4.5.8</td>
</tr>
</table></td>
</tr>
<tr>
<td colspan="2" align="right" bgcolor="#60B7FF"> </td>
</tr>
<tr>
<td colspan="2" align="left" bgcolor="#C0C0C0">
<table width="20" border="0" cellpadding="0" cellspacing="0">
<tr>
<td bgcolor="#C0C0C0"><div class="menuitem_1"><a href="bscsetting.htm">Local Network</a></div></td>
<td bgcolor="#C0C0C0"><div class="menuitem_1"><a href="weather.htm">Weather Network</a></div></td>
<td bgcolor="#C0C0C0"><div class="menuitem_1"><a href="station.htm">Station Settings</a></div></td>
<td bgcolor="#EDEFEF"><div class="menuitem_1"><a href="livedata.htm">Live Data</a></div></td>
<td bgcolor="#C0C0C0"><div class="menuitem_1"><a href="correction.htm">Calibration</a></div></td>
</tr>
</table>
</td>
</tr>
<form name="livedata" method="POST" onsubmit="return chkForm(0);">
<tr>
<td colspan="2" bgcolor="#EDEFEF"> </td>
</tr>
<tr>
<td colspan="2" bgcolor="#EDEFEF"><div class="subitem_1">Live Data</div></td>
</tr>
<tr>
<td width="448" bgcolor="#EDEFEF"><div class="item_1">Receiver Time:</div></td>
<td width="352" bgcolor="#EDEFEF">
<input name="CurrTime" disabled="disabled" type="text" class="item_2" style="WIDTH: 120px" value="19:07 5/27/2021" maxlength="16"/></td>
</tr>
<tr>
<td width="448" bgcolor="#EDEFEF"><div class="item_1">Indoor Sensor ID and Battery </div></td>
<td width="352" bgcolor="#EDEFEF">
<input name="IndoorID" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="0x--" maxlength="5" />
<input name="inBattSta" disabled="disabled" type="text" class="item_2" style="WIDTH: 100px" value="- -" maxlength="12" />
</td>
</tr>
<tr>
<td bgcolor="#EDEFEF"><div class="item_1">Outdoor Sensor ID and Battery</div></td>
<td bgcolor="#EDEFEF">
<input name="Outdoor1ID" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="0xcb" maxlength="5" />
<input name="outBattSta1" disabled="disabled" type="text" class="item_2" style="WIDTH: 100px" value="Normal" maxlength="12" />
</td>
</tr>
<tr>
<td bgcolor="#EDEFEF"><div class="item_1">Outdoor2 Sensor ID and Battery</div></td>
<td bgcolor="#EDEFEF">
<input name="Outdoor2ID" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="0x--" maxlength="5" />
<input name="outBattSta2" disabled="disabled" type="text" class="item_2" style="WIDTH: 100px" value="- -" maxlength="12" />
</td>
</tr>
<tr>
<td bgcolor="#EDEFEF"><div class="item_1">Indoor Temperature</div></td>
<td bgcolor="#EDEFEF"><input name="inTemp" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="--.-" maxlength="5" /></td>
</tr>
<tr>
<td bgcolor="#EDEFEF"><div class="item_1">Indoor Humidity</div></td>
<td bgcolor="#EDEFEF"><input name="inHumi" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="--" maxlength="3" /></td>
</tr>
<tr>
<td bgcolor="#EDEFEF"><div class="item_1">Absolute Pressure </div></td>
<td bgcolor="#EDEFEF"><input name="AbsPress" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="----" maxlength="6" /></td>
</tr>
<tr>
<td bgcolor="#EDEFEF"><div class="item_1">Relative Pressure </div></td>
<td bgcolor="#EDEFEF"><input name="RelPress" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="----" maxlength="6" /></td>
</tr>
<tr>
<td bgcolor="#EDEFEF"><div class="item_1">Outdoor Temperature</div></td>
<td bgcolor="#EDEFEF"><input name="outTemp" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="13.9" maxlength="5" /></td>
</tr>
<tr>
<td bgcolor="#EDEFEF"><div class="item_1">Outdoor Humidity </div></td>
<td bgcolor="#EDEFEF"><input name="outHumi" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="55" maxlength="3" /></td>
</tr>
<tr>
<td bgcolor="#EDEFEF"><div class="item_1">Wind Direction </div></td>
<td bgcolor="#EDEFEF"><input name="windir" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="255" maxlength="5" /></td>
</tr>
<tr>
<td bgcolor="#EDEFEF"><div class="item_1">Wind Speed </div></td>
<td bgcolor="#EDEFEF"><input name="avgwind" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="11.2" maxlength="5" /></td>
</tr>
Small scraping guide:
Load the page in Chrome
Right-click the value you want to scrape
Choose ‘Inspect’
Right-click on the selected line in the html
Copy → Copy Selector
This is the value you should paste in the ‘select’ field of the sensor.
Note that Chrome sometimes adds a ‘tbody’ element to tables in the html, which might not be in the original site’s html and is therefore not retrieved by multiscrape. In that case, you need to remove those elements from the select field.
E.g.: table > tbody > tr:nth-child(7) > td:nth-child(2) > font > b
becomes:
table > tr:nth-child(7) > td:nth-child(2) > font > b
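For the livedata.htm excerpt posted earlier in the thread, note that the readings are not element text: each one sits in the value attribute of an input element identified by its name (for example avgwind). A selector along the lines of input[name="avgwind"] therefore has to be combined with reading the value attribute; whether and how your multiscrape version exposes that is something to check in its README. The following is only a small Python sketch (standard library, not multiscrape code) to confirm where the number lives; the names InputValueParser and input_values are made up for this illustration.

```python
from html.parser import HTMLParser


class InputValueParser(HTMLParser):
    """Collect the value attribute of every named <input> element."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> are routed here as well.
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            self.values[name] = attributes.get("value")


def input_values(html):
    """Return a dict mapping input names to their value attributes."""
    parser = InputValueParser()
    parser.feed(html)
    parser.close()
    return parser.values


# Two rows copied from the livedata.htm excerpt in this thread.
SAMPLE = """
<tr>
<td bgcolor="#EDEFEF"><div class="item_1">Wind Direction </div></td>
<td bgcolor="#EDEFEF"><input name="windir" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="255" maxlength="5" /></td>
</tr>
<tr>
<td bgcolor="#EDEFEF"><div class="item_1">Wind Speed </div></td>
<td bgcolor="#EDEFEF"><input name="avgwind" disabled="disabled" type="text" class="item_2" style="WIDTH: 80px" value="11.2" maxlength="5" /></td>
</tr>
"""

values = input_values(SAMPLE)
print(values["avgwind"])  # 11.2
```

If the value shows up here but the sensor stays empty, the selector is probably returning the element's (empty) text instead of its value attribute.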
Troon
(Troon)
May 28, 2021, 12:58pm
57
They might be in the original, if the site was properly-written. You just have to check manually.
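One way to do that manual check is to look at the raw page source (for example via "View page source", or by downloading the page) rather than at the Inspect panel, which shows the DOM after the browser has repaired it. A rough Python sketch of the same idea, standard library only: the helper names has_tbody, strip_tbody and adapt_selector are invented for this example, and strip_tbody only understands plain "a > b > c" chains.

```python
from html.parser import HTMLParser


class TbodyDetector(HTMLParser):
    """Sets found=True when the parsed markup contains a <tbody> start tag."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.found = True


def has_tbody(raw_html):
    """True if the raw (unrendered) HTML itself contains a tbody element."""
    detector = TbodyDetector()
    detector.feed(raw_html)
    detector.close()
    return detector.found


def strip_tbody(selector):
    """Drop every 'tbody' step from a child-combinator selector chain."""
    steps = [step.strip() for step in selector.split(">")]
    return " > ".join(step for step in steps if step != "tbody")


def adapt_selector(selector, raw_html):
    """Keep the copied selector if the source has tbody, else strip it."""
    return selector if has_tbody(raw_html) else strip_tbody(selector)


# Hypothetical raw page source without any <tbody> element.
RAW = "<table><tr><td>1</td></tr></table>"
COPIED = "table > tbody > tr:nth-child(7) > td:nth-child(2) > font > b"
print(adapt_selector(COPIED, RAW))
```

This is a coarse check: a page with several tables may contain tbody in some of them and not in others, so the result still deserves a look by eye.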
Yup thanks, slightly updated the guide.
@everybody : feel free to contribute with some nice screencasts
Has anyone been testing pre-release 4.0.0? I would like to receive some feedback before releasing it as a stable release.
kongo09
(kongo09)
June 6, 2021, 10:29pm
60
I’m a bit lost with the configuration of this integration.
Goal is to create 1 sensor with four attributes capturing the toner level of my printer. I tried:
# Dell Printer
multiscrape:
  - resource: "http://printer/status.asp"
    scan_interval: 600
    sensor:
      - name: DELL Printer Toner Level
        selectors:
          cyan:
            name: Cyan
            select: 'body > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table > tbody > tr:nth-child(2) > td:nth-child(1) > b'
            value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
          magenta:
            name: Magenta
            select: 'body > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table > tbody > tr:nth-child(4) > td:nth-child(1) > b'
            value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
          yellow:
            name: Yellow
            select: 'body > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table > tbody > tr:nth-child(6) > td:nth-child(1) > b'
            value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
          black:
            name: Black
            select: 'body > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table > tbody > tr:nth-child(8)) > td:nth-child(1) > b'
            value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
but this only results in:
Error while setting up multiscrape platform for sensor
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 250, in _async_setup_platform
await asyncio.shield(task)
File "/config/custom_components/multiscrape/sensor.py", line 34, in async_setup_platform
if rest.data is None:
UnboundLocalError: local variable 'rest' referenced before assignment
How can I do that?
Hi, did you check the upgrade notes?
It should be like this (it will create different sensors though instead of attributes, that’s the goal of this component):
multiscrape:
  - resource: "http://printer/status.asp"
    scan_interval: 600
    sensor:
      - name: cyan
        select: 'body > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table > tbody > tr:nth-child(2) > td:nth-child(1) > b'
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
      - name: magenta
        select: 'body > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table > tbody > tr:nth-child(4) > td:nth-child(1) > b'
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
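For anyone unsure what the value_template in these sensors does: regex_findall_index runs the pattern over the scraped text and returns the match at the given index, so (\d+\%) with index 0 keeps the first percentage it finds. The Python below is only an approximation of that behaviour for experimenting outside Home Assistant, not the filter's actual source, and the sample toner text is invented.

```python
import re


def regex_findall_index(value, find, index=0, ignorecase=False):
    """Return the match at position `index` among all matches of `find`."""
    flags = re.IGNORECASE if ignorecase else 0
    return re.findall(find, value, flags)[index]


# Hypothetical text as it might be scraped from the printer status page.
sample = "Cyan Cartridge ~ 85%"
print(regex_findall_index(sample, find=r"(\d+\%)"))  # 85%
```

If the scraped text contains no percentage at all, the lookup raises an IndexError here; in Home Assistant the template would fail to render in that case, so an empty sensor is usually worth checking against the selector first.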
kongo09
(kongo09)
June 7, 2021, 9:11pm
63
Thanks for your support. Here is the working code:
# Dell Printer
multiscrape:
  - resource: "http://192.168.0.20/status.asp"
    scan_interval: 10
    sensor:
      - name: Cyan
        select: "body > table > tr > td > table:nth-child(6) > tr > td > table > tr:nth-child(2) > td:nth-child(1) > b"
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
      - name: Magenta
        select: "body > table > tr > td > table:nth-child(6) > tr > td > table > tr:nth-child(4) > td:nth-child(1) > b"
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
      - name: Yellow
        select: "body > table > tr > td > table:nth-child(6) > tr > td > table > tr:nth-child(6) > td:nth-child(1) > b"
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
      - name: Black
        select: "body > table > tr > td > table:nth-child(6) > tr > td > table > tr:nth-child(8) > td:nth-child(1) > b"
        value_template: '{{ value|regex_findall_index(find="(\d+\%)", index=0, ignorecase=False) }}'
However, this still does not do what I want. It now creates four separate sensors but I was under the impression that I can use the component to make the values available as attributes on one sensor instead of having four sensors. Or did I misunderstand?
Troon
(Troon)
June 8, 2021, 6:19am
64
The main aim of multiscrape is to make multiple sensors available from a single REST call, rather than having to make multiple requests. If you want a single sensor with attributes, you can create a template sensor from your four separate ones, but that seems a bit pointless.
template:
  - sensor:
      - name: Dell printer inks
        state: 'dummy state'
        attributes:
          cyan: "{{ states('sensor.cyan') }}"
          magenta: "{{ states('sensor.magenta') }}"
          yellow: "{{ states('sensor.yellow') }}"
          black: "{{ states('sensor.black') }}"
Why would you prefer a single sensor with attributes over four individual ones, though, which can then have unit_of_measurement and are easily put into Lovelace?
kongo09
(kongo09)
June 8, 2021, 6:23am
65
Understood, thanks for clarifying. It might help to make the description clearer; I, at least, was confused. If you like, I can open a pull request.
I was looking for a combined sensor to keep things that belong together in one place and to avoid flooding my entity space.