The file link is https://aquarea-service.panasonic.com/installer/api/function/status
I added this code to the YAML configuration, but it does not work:
rest:
  - authentication: basic
    username: 'dimare.gabriele'
    password: 'xxxx'
    scan_interval: 30
    resource: 'https://aquarea-service.panasonic.com/installer/api/function/status'
    sensor:
      - name: Panasonic Compressor
        value_template: "{{ value_json['statusDataInfo']['function-status-text-060']['value'] }}"
Troon
(Troon)
February 22, 2023, 2:19pm
396
OK: what's shown in the logs? If there's nothing obvious there, replace the value_template line with:
value_template: "{{ value[:100] }}"
which will give you the first 100 characters of the response, which might be helpful in debugging what to do.
Did you reload? Developer Tools / YAML / REST ENTITIES AND NOTIFY SERVICES.
The sensor returns this error:
{"errorCode":4194816,"message":[{"errorMessage":"System error occurred","errorCode":"9999-9999"}]}
This is the same error you get if you open the link directly: https://aquarea-service.panasonic.com/installer/api/function/status
I think the login is not working.
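A quick way to see why the original value_template has nothing to work with here is to parse the response outside Home Assistant. This is only a sketch in Python: summarise_response is a made-up helper name, and the error body is the one quoted above. It assumes a successful response would contain the statusDataInfo key that the template indexes into.

```python
import json

# Body returned by the endpoint, copied from the post above.
ERROR_BODY = (
    '{"errorCode":4194816,"message":[{"errorMessage":"System error occurred",'
    '"errorCode":"9999-9999"}]}'
)


def summarise_response(body: str) -> str:
    """Say whether a response body holds status data or an error payload."""
    data = json.loads(body)
    if "statusDataInfo" in data:
        return "status data present"
    # The error payload has no statusDataInfo key, so a template that
    # indexes value_json['statusDataInfo'] has nothing to read.
    messages = data.get("message", [])
    if messages:
        first = messages[0]
        return f"error {first.get('errorCode')}: {first.get('errorMessage')}"
    return "unexpected response"


print(summarise_response(ERROR_BODY))
```

If the summary reports an error like this one, the problem is on the request side (login or session), not in the template.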
Hi web scrapers
Does anyone know if I can scrape the data from here:
https://oxontime.com/departure-single-page/1636
I can see the selectors in my browser's developer tools, but multiscrape records errors in the log saying the selectors are missing.
Is there another way to get this data into HA?
Thanks in advance
Troon
(Troon)
March 8, 2023, 3:50pm
399
The page is making a request to this URL, which returns JSON. Use a rest sensor to pull in and manipulate that:
https://oxontime.com/pwi/departureBoard/340000093BP3
Thanks @Troon, I'll give it a go.
Hoberg
April 2, 2023, 12:49am
402
Hi Scrapers
I'm pretty new to HA, as well as to YAML and anything to do with the smart home community.
I've now tried for multiple hours to get this to work, but I just can't.
My problem is with authentication in the scraper.
After many hours of trial and error, I've found out that the website doesn't give me a session ID cookie, but instead incorporates it into the URL as a query parameter, like this:
http://192.168.1.74/cgi-bin/overview.tcl?sid=xxxxxx7129387
Is there any way to bypass this or do anything about it?
If I skip the form_submit and just enter the full URL with the session ID, it has no problem scraping.
- resource: http://192.168.1.74/cgi-bin/overview.tcl
  scan_interval: 60
  log_response: true
  form_submit:
    submit_once: True
    resource: http://192.168.1.74/cgi-bin/login_page.tcl
    select: "#loginForm"
    input:
      user: !secret solar_user
      pw: !secret solar_password
  sensor:
    - unique_id: solceller_prod_live
      name: Solcelle produktion Live
      select: "#curr_power"
    - unique_id: solceller_prod_idag
      name: Solcelle produktion i dag
      select: "#prod_today"
      on_error:
        log: warning
Thanks in advance
You'd be partly out of luck, because:
The site will probably set a cookie in the session. The sessions are reused between all scraping requests.
From Form submit functionality · danieldotnl/ha-multiscrape Wiki · GitHub.
But what you can do is use RESTful Sensor - Home Assistant or RESTful - Home Assistant, with the former being slightly the better option.
First, you'll need to figure out the details of the form submit. If it's a POST to a URL, then that's what you'll use to configure it, but you also need to check what response you receive. Hopefully, your session token will be easily accessible. It will then be your sensor's value.
If it's more complicated, then write a Bash script using curl and use it in a Command Line - Home Assistant sensor.
At this point you should have the token in a sensor, which means you'll drop the auth stuff from the scraper's config and use the token sensor in a template in the request URL:
resource_template: "http://192.168.1.74/cgi-bin/overview.tcl?sid={{ states('sensor.my_token_sensor') }}"
Note that anybody who has access to your HA will also have access to that token.
Lastly, you need to figure out how long a session token is valid and set the sensor's scan interval appropriately, or write an automation using homeassistant.update_entity to update it on a specific schedule.
EDIT: Unless you're using it for other cases where you want to retrieve multiple values in one go, you can actually just stick with the built-in Scrape - Home Assistant.
Hoberg
April 2, 2023, 8:21pm
404
Thank you so much for the detailed answer!
I will try it as soon as possible!
EDIT:
I have now tried a POST request in Postman and got it working, and I get the data I want. However, I get the sid in an HTML body, and I can't get that to work in the RESTful sensor.
Is there any way to get this into RESTful, multiscrape or any other system so that I can extract that information, put it into a URL and scrape the data afterwards?
Here is the return value from Postman:
<html>
<head></head>
<body onLoad="top.window.location='/cgi-bin/frameset.tcl?sid=8351692669043402850'">Logger ind...</body>
</html>
Shadorlo
(Shadorlo)
April 2, 2023, 10:15pm
405
Hello, I am looking to retrieve an energy value from an EDF ENR panel.
Link to the login page: https://espaceclient.edfenr.com/
Source code of the text I want to retrieve:
<div class="truncate text-sm font-bold leading-4 mb-px">--</div>
Photo of the panel
(It's marked "--" because it's 00:00 and therefore there is no sun, hehe)
For information, when I enter my login I am redirected straight to the right page to see the consumption.
Troon
(Troon)
April 3, 2023, 6:31am
406
What does http://192.168.1.74/cgi-bin/frameset.tcl?sid=8351692669043402850 return? Does that sid change each time or is it constant?
Hoberg
April 3, 2023, 12:58pm
407
Hi Troon
Thanks for your response!
The frameset.tcl returns the default entry page. Inside the entry page is an iframe view of the overview.tcl page that I want to scrape.
I can only get a sid once. If I try again it gets denied, unless it hasn't been used for 10 minutes or 24 hours have passed since it was issued.
It's a bit hard to simulate this since I don't have access to that endpoint of yours, but I'll try to describe a few ideas.
Option 1:
I've saved your HTML snippet to a file called test.html. If you want to go the command line sensor route, you can create a curl command and process the output, e.g. like this:
cat test.html | sed -n "s/.*sid=\([0-9]*\).*/\1/p"
The part before the pipe | will be the curl command.
Option 2:
This should in principle also work with a REST sensor: instead of using value_json in your template, just use value, which is the raw output. You can then use HA's regex functions on it.
Something like this:
{% set value = "<html>
<head></head>
<body onLoad=\"top.window.location='/cgi-bin/frameset.tcl?sid=8351692669043402850'\">Logger ind...</body>
</html>" %}
{{ (value | regex_findall_index(find="sid=[0-9]+", index=0, ignorecase=True))[4:] }}
Option 3:
You might indeed be able to do this with scrape or multiscrape. You'll need to use your browser's dev tools to determine the CSS selector and then set the attribute (onLoad). You will again have access to a value in your template (for the scrape sensor's value_template) and should be able to do something similar with a regex as above.
I would personally go for option 2.
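To try the regular expression from Option 2 without touching Home Assistant, the same steps can be reproduced in plain Python. This is a rough sketch, not HA's own code: extract_sid and LOGIN_RESPONSE are made-up names, and it assumes that regex_findall_index behaves like Python's re.findall followed by picking one index.

```python
import re
from typing import Optional

# HTML body returned by the login request, copied from the post above.
LOGIN_RESPONSE = """<html>
<head></head>
<body onLoad="top.window.location='/cgi-bin/frameset.tcl?sid=8351692669043402850'">Logger ind...</body>
</html>"""


def extract_sid(html: str) -> Optional[str]:
    """Return the digits that follow 'sid=' in the response, or None if absent."""
    matches = re.findall(r"sid=[0-9]+", html, flags=re.IGNORECASE)
    if not matches:
        return None
    # Drop the 4-character "sid=" prefix, like the [4:] slice in the template.
    return matches[0][4:]


print(extract_sid(LOGIN_RESPONSE))
```

The None branch is worth keeping in mind: if the login is refused, the response will most likely not contain a sid at all, and the template needs to cope with that case too.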
Hoberg
April 6, 2023, 11:10am
409
So after talking back and forth with @parautenbach, we finally got the solution.
First of all, we tried to do it with curl, but for some odd reason it just wouldn't give me any output. After about an hour of me ripping my hair out, I decided to go back to REST and try with a GET method. Somehow it now responded; I think that was because scan_interval was now set (earlier it had just come up with an error, probably because it was already overwriting it).
The code was now:
sensor:
  - platform: rest
    name: SID
    resource: http://192.168.1.74/cgi-bin/handle_login.tcl?user=HA&pw=XXXXXXXX&submit=Log+p%C3%A5&sid=
    method: GET
    value_template: '{{ (value | regex_findall_index(find="sid=[0-9]+", index=0, ignorecase=True))[4:] }}'
    scan_interval: 86400 # 24 hours = 60 * 60 * 24
And it responded with only the SID (session ID).
Now on to multiscrape.
It now needed the variable in the resource template, and after @parautenbach helped me out yet again, it worked as well.
- resource_template: "http://192.168.1.74/cgi-bin/overview.tcl?sid={{ states('sensor.sid') }}"
  scan_interval: 60
  name: Standard
  log_response: true
  sensor:
    - unique_id: solceller_prod_live2
      name: Solcelle produktion Live2
      select: "#curr_power"
    - unique_id: solceller_prod_idag2
      name: Solcelle produktion i dag2
      select: "#prod_today"
      on_error:
        log: warning
A huge thanks to @parautenbach who was just amazing and dedicated time to help me solve my problem.
What an amazing community to enter, for a newcomer who just installed HA 10 days ago.
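One detail of the resource_template above is what happens before the SID sensor has a value: Home Assistant reports such a sensor's state as "unknown" or "unavailable", and that text would end up in the URL. The Python sketch below only illustrates that check; build_overview_url is a made-up helper, and treating every non-numeric state as unusable is an assumption based on the sid values shown in this thread.

```python
from typing import Optional

BASE_URL = "http://192.168.1.74/cgi-bin/overview.tcl"


def build_overview_url(sid_state: str) -> Optional[str]:
    """Return the overview URL for a usable session ID, else None."""
    # States such as "unknown" or "unavailable" are not digits, so they
    # are rejected instead of being pasted into the query string.
    if not sid_state.isdigit():
        return None
    return f"{BASE_URL}?sid={sid_state}"


print(build_overview_url("8351692669043402850"))
print(build_overview_url("unknown"))
```

In the multiscrape config, a failed request during that window should simply show up as a logged warning via on_error.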
jayjay
(JJ)
April 17, 2023, 11:47pm
410
I lost all my values and am totally stuck. My old SMA inverter is only reachable via scrape, as no other SMA integration works, and now the scrape has stopped working too. What am I missing? How do I transform this old scrape configuration to the new format?
Web scraping for SMA inverter values:
- name: pv_einheit
platform: scrape
resource: "https://www.sunnyportal.com/Templates/PublicPage.aspx?page=my page"
select: "#ctl00_ContentPlaceHolder1_PublicPagePlaceholder1_PageUserControl_ctl00_PublicPageLoadFixPage_energyYieldWidget_energyYieldUnit"
- name: pv_periode
platform: scrape
resource: "https://www.sunnyportal.com/Templates/PublicPage.aspx?page=my page"
select: "#ctl00_ContentPlaceHolder1_PublicPagePlaceholder1_PageUserControl_ctl00_PublicPageLoadFixPage_energyYieldWidget_energyYieldPeriodTitle"
- name: pv_wert
platform: scrape
resource: "https://www.sunnyportal.com/Templates/PublicPage.aspx?page=my page"
select: "#ctl00_ContentPlaceHolder1_PublicPagePlaceholder1_PageUserControl_ctl00_PublicPageLoadFixPage_energyYieldWidget_energyYieldValue"
value_template: '{% if is_state("sensor.pv_einheit", "Wh") %}{{ value | float / 1000 }}{% else %}{{ value | float }}{% endif %}'
unit_of_measurement: 'kW/h'
Troon
(Troon)
April 18, 2023, 6:23am
411
Assuming the web page hasn't changed and the old selects worked, then according to the docs this should work:
scrape:
  - resource: "https://www.sunnyportal.com/Templates/PublicPage.aspx?page=my page"
    sensor:
      - name: pv_einheit
        select: "#ctl00_ContentPlaceHolder1_PublicPagePlaceholder1_PageUserControl_ctl00_PublicPageLoadFixPage_energyYieldWidget_energyYieldUnit"
      - name: pv_periode
        select: "#ctl00_ContentPlaceHolder1_PublicPagePlaceholder1_PageUserControl_ctl00_PublicPageLoadFixPage_energyYieldWidget_energyYieldPeriodTitle"
      - name: pv_wert
        select: "#ctl00_ContentPlaceHolder1_PublicPagePlaceholder1_PageUserControl_ctl00_PublicPageLoadFixPage_energyYieldWidget_energyYieldValue"
        value_template: '{% if is_state("sensor.pv_einheit", "Wh") %}{{ value|float(0)/1000 }}{% else %}{{ value|float(0) }}{% endif %}'
        unit_of_measurement: 'kWh'
I've corrected your (incorrectly-indented) unit_of_measurement to 'kWh' as 'kW/h' is meaningless.
There's a potential race condition here: the pv_wert template assumes that the pv_einheit scrape is evaluated first. That assumption might need testing.
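The logic of the pv_wert template, and what the race condition would do to it, can be sketched in plain Python. This is an illustration only: to_kwh is a made-up function, and the fallback to 0.0 is meant to mirror the float(0) default in the template.

```python
def to_kwh(raw_value: str, unit: str) -> float:
    """Convert a scraped yield reading to kWh, given the scraped unit."""
    try:
        number = float(raw_value)
    except ValueError:
        number = 0.0  # same idea as the float(0) default in the template
    # Only a unit of exactly "Wh" triggers the division; anything else,
    # including a unit sensor that is not populated yet, passes through.
    return number / 1000 if unit == "Wh" else number


print(to_kwh("1500", "Wh"))
print(to_kwh("1500", "unknown"))
```

The second call shows the effect of the race: if the unit sensor has not updated yet, a Wh reading is passed through undivided and ends up 1000 times too large.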
jayjay
(JJ)
April 18, 2023, 8:54am
412
Thank you, I'll give it a try!
dimmuboy
(Dimmu Boy)
May 11, 2023, 9:19pm
413
Guys, do you know if it is possible to get all div siblings, or at least the first 10, from a site with a structure like this?
div.container-table.hlasenie-block > .table-row
I have several similar websites that show warnings, and the content is dynamic, so I don't know how many will be shown. I really don't know how to handle this type of scraping. Should I try adding the first 10 selectors as attributes of a sensor, or do I need to do that via a Python script?
jaymu
May 12, 2023, 2:43pm
414
I have the exact same page as you for my solar panels. This is my page_soup.txt:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<style type="text/css">
.in_body
{
margin-top:0px;
margin-left:0px;
margin-right:0px;
margin-bottom:0px;
background-color:transparent;
}
.div_c
{
margin-left:50px;
margin-right:50px;
margin-top:50px;
margin-bottom:50px;
}
.cu
{
cursor:pointer;
}
.b
{
font-weight:bold;
}
.lab_5
{
font-size:16px;
color:#666666;
margin-left:-20px;
}
.lab_l2
{
float:left;
width:32%;
color:#666666;
margin-bottom:-2px;
font-size:14px;
}
.lab_r2
{
float:left;
width:68%;
color:#666666;
text-align:right;
font-size:14px;
}
.cl
{
clear:left;
}
.line
{
height:1px;
background-color:#666666;
width:100%;
margin-top:5px;
margin-bottom:5px;
}
.sp_5
{
height:5px;
width:500px;
}
.sp_20
{
height:20px;
width:500px;
}
.label
{
float:left;
width:50%;
color:#666666;
margin-bottom:-2px;
font-size:14px;
}
.lab_r
{
float:left;
width:50%;
color:#666666;
text-align:right;
font-size:14px;
}
.lab_l
{
float:left;
width:40%;
color:#666666;
margin-bottom:-2px;
margin-left:10%;
font-size:14px;
}
.line_l
{
height:1px;
background-color:#666666;
width:450px;
margin-top:5px;
margin-bottom:5px;
margin-left:50px;
}
.sub
{
display:inline-block;
width:16px;
text-align:center;
}
</style>
<script type="text/javascript">
var height=0;function fileText(id,value){if(document.getElementById(id)){document.getElementById(id).innerHTML=value}}function changeFont(){reCon("main_div").style.fontFamily=window.parent.reFont()}function child_getH(){var nh=document.body.offsetHeight+100;if(nh<500||nh==null){nh=500}if(height!=nh){height=nh;window.parent.child_height(height)}}function reCon(id){return document.getElementById(id)}function ready(){try{window.parent.show_ifr()}catch(e){}child_getH()}function show(v){var c=document.getElementById(v);if(c!=null){c.style.display=""}}function hide(v){var c=document.getElementById(v);if(c!=null){c.style.display="none"}};
</script>
<script type="text/javascript">
var webdata_sn = "";
var webdata_msvn = "V1.18.00";
var webdata_ssvn = "V1.18.00";
var webdata_pv_type = "";
var webdata_rate_p = "";
var webdata_now_p = "1441";
var webdata_today_e = "6.72";
var webdata_total_e = "10851.4";
var webdata_alarm = "";
var webdata_utime = "0";
var cover_mid = "";
var cover_ver = "";
var cover_wmode = "APSTA";
var cover_ap_ssid = "";
var cover_ap_ip = "";
var cover_ap_mac = "";
var cover_sta_ssid = "";
var cover_sta_rssi = "70%";
var cover_sta_ip = "";
var cover_sta_mac = "";
var status_a = "1";
var status_b = "0";
var status_c = "0";
function initPageText(){var list=window.parent.reList("status");fileText("st1",list["t1"]);fileText("st2",list["t2"]);fileText("st3",list["t3"]);for(var i=1;i<=27;i++){if(i!=14){fileText("tx"+i,list[i])}}changeFont();child_getH()}function upfold(v){if(document.getElementById("up_"+v+"_div").style.display=="none"){show("up_"+v+"_div");reCon("p_"+v).innerHTML="-"}else{hide("up_"+v+"_div");reCon("p_"+v).innerHTML="+"}}function init_main_page(){var on=window.parent.reTip("1");var off=window.parent.reTip("2");document.getElementById("cover_mid").innerHTML=cover_mid;document.getElementById("cover_ver").innerHTML=cover_ver;document.getElementById("cover_ap_status").innerHTML=off;document.getElementById("cover_sta_status").innerHTML=off;if(cover_wmode!="STA"){document.getElementById("cover_ap_status").innerHTML=on;document.getElementById("cover_ap_ssid").innerHTML=cover_ap_ssid;document.getElementById("cover_ap_ip").innerHTML=cover_ap_ip;document.getElementById("cover_ap_mac").innerHTML=cover_ap_mac}if(cover_wmode!="AP"){document.getElementById("cover_sta_status").innerHTML=on;document.getElementById("cover_sta_ssid").innerHTML=cover_sta_ssid;document.getElementById("cover_sta_rssi").innerHTML=cover_sta_rssi;document.getElementById("cover_sta_ip").innerHTML=cover_sta_ip;document.getElementById("cover_sta_mac").innerHTML=cover_sta_mac}if(webdata_sn==""){webdata_sn="---"}fileText("webdata_sn",webdata_sn);if(webdata_msvn==""){webdata_msvn="---"}fileText("webdata_msvn",webdata_msvn);if(webdata_ssvn==""){webdata_ssvn="---"}fileText("webdata_ssvn",webdata_ssvn);if(webdata_pv_type==""){webdata_pv_type="---"}fileText("webdata_pv_type",webdata_pv_type);if(webdata_rate_p==""){webdata_rate_p="---"}fileText("webdata_rate_p",webdata_rate_p+" W");if(webdata_now_p==""||webdata_now_p==0){webdata_now_p="---"}fileText("webdata_now_p",webdata_now_p+" W");if(webdata_today_e==""){webdata_today_e="---"}fileText("webdata_today_e",webdata_today_e+" 
kWh");if(webdata_total_e==""){webdata_total_e="---"}fileText("webdata_total_e",webdata_total_e+" kWh");if(webdata_alarm==""){webdata_alarm="---"}fileText("webdata_alarm",webdata_alarm);if(webdata_utime==""){if(document.getElementById("webdata_sn").innerHTML=="---"){webdata_utime="---"}else{webdata_utime=value+window.parent.reTip("5")}}fileText("webdata_utime",webdata_utime);var st_en=window.parent.reTip("3");var st_dis=window.parent.reTip("4");var st_un=window.parent.reTip("41");if(status_a=="1"){document.getElementById("cover_remote_status_a").innerHTML=st_en}else{if(status_a=="0"){document.getElementById("cover_remote_status_a").innerHTML=st_dis}else{document.getElementById("cover_remote_status_a").innerHTML=st_un}}if(status_b=="1"){document.getElementById("cover_remote_status_b").innerHTML=st_en}else{if(status_b=="0"){document.getElementById("cover_remote_status_b").innerHTML=st_dis}else{document.getElementById("cover_remote_status_b").innerHTML=st_un}}};
</script>
</head>
<body class="in_body" onload="init_main_page();">
<div class="div_c" id="main_div">
<div class="lab_5 cu b" onclick="upfold(1);child_getH();"><span class="sub" id="p_1">-</span><span id="st1" style="margin-left:3px"></span></div>
<div class="sp_5"></div>
<div id="up_1_div">
<div class="lab_l2" id="tx1"></div>
<div class="lab_r2" id="webdata_sn"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="lab_l2" id="tx2"></div>
<div class="lab_r2" id="webdata_msvn"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="lab_l2" id="tx3"></div>
<div class="lab_r2" id="webdata_ssvn"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="lab_l2" id="tx4"></div>
<div class="lab_r2" id="webdata_pv_type"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="lab_l2" id="tx5"></div>
<div class="lab_r2" id="webdata_rate_p"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="lab_l2" id="tx6" style="color:#666666;font-weight:bold;"></div>
<div class="lab_r2" id="webdata_now_p" style="color:#666666;font-weight:bold;"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="lab_l2" id="tx7" style="color:#666666;font-weight:bold;"></div>
<div class="lab_r2" id="webdata_today_e" style="color:#666666;font-weight:bold;"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="lab_l2" id="tx8" style="color:#666666;font-weight:bold;"></div>
<div class="lab_r2" id="webdata_total_e" style="color:#666666;font-weight:bold;"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="lab_l2" id="tx9" style="color:#666666;font-weight:bold;"></div>
<div class="lab_r2" id="webdata_alarm" style="color:#666666;font-weight:bold;"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="lab_l2" id="tx10" style="color:#666666;font-weight:bold;"></div>
<div class="lab_r2" id="webdata_utime" style="color:#666666;font-weight:bold;"></div>
<div class="cl"></div>
<div class="line"></div>
</div>
<div class="sp_20"></div>
<div class="lab_5 cu b" onclick="upfold(2);child_getH();"><span class="sub" id="p_2">+</span><span id="st2" style="margin-left:3px"></span></div>
<div class="sp_5"></div>
<div id="up_2_div" style="display:none">
<div class="label" id="tx11"></div>
<div class="lab_r" id="cover_mid"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="label" id="tx12"></div>
<div class="lab_r" id="cover_ver"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="label" id="tx13"></div>
<div class="lab_r" id="cover_ap_status" style="color:#666666;font-weight:bold;"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="lab_l" id="ap_ssid">SSID</div>
<div class="lab_r" id="cover_ap_ssid"></div>
<div class="cl"></div>
<div class="line_l"></div>
<div class="lab_l" id="tx15"></div>
<div class="lab_r" id="cover_ap_ip"></div>
<div class="cl"></div>
<div class="line_l"></div>
<div class="lab_l" id="tx16"></div>
<div class="lab_r" id="cover_ap_mac"></div>
<div class="cl"></div>
<div class="line_l"></div>
<div class="label" id="tx17"></div>
<div class="lab_r" id="cover_sta_status" style="color:#666666;font-weight:bold;"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="lab_l" id="tx18"></div>
<div class="lab_r" id="cover_sta_ssid"></div>
<div class="cl"></div>
<div class="line_l"></div>
<div class="lab_l" id="tx19"></div>
<div class="lab_r" id="cover_sta_rssi"></div>
<div class="cl"></div>
<div class="line_l"></div>
<div class="lab_l" id="tx20"></div>
<div class="lab_r" id="cover_sta_ip"></div>
<div class="cl"></div>
<div class="line_l"></div>
<div class="lab_l" id="tx21"></div>
<div class="lab_r" id="cover_sta_mac"></div>
<div class="cl"></div>
<div class="line_l"></div>
</div>
<div class="sp_20"></div>
<div class="lab_5 cu b" onclick="upfold(3);child_getH();"><span class="sub" id="p_3">+</span><span id="st3" style="margin-left:3px"></span></div>
<div class="sp_5"></div>
<div id="up_3_div" style="display:none">
<div class="label" id="tx25"></div>
<div class="lab_r" id="cover_remote_status_a"></div>
<div class="cl"></div>
<div class="line"></div>
<div class="label" id="tx26"></div>
<div class="lab_r" id="cover_remote_status_b"></div>
<div class="cl"></div>
<div class="line"></div>
</div>
</div>
<script type="text/javascript">
initPageText();
ready();
</script>
</body>
</html>
And I'm using your sensor config, but I'm still getting:
Scraper_noname_3 # Today Solar Generation # Unable to scrape data: Could not find a tag for given selector Consider using debug logging and log_response for further investigation.
Any pointers what is going on here or which selector I should be using?
Troon
(Troon)
May 12, 2023, 2:50pm
415
You only have one <script> element ahead of the data, so use:
select: "body > script:nth-child(2)"
instead.
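Once a selector returns the script text, the number still has to be pulled out of the JavaScript, since the page fills in the divs at load time and the raw HTML only has the values as variables. Below is a small Python sketch of that extraction step, handy for testing a pattern before putting it in a value_template. read_js_var is a made-up helper, and SCRIPT_TEXT is a shortened copy of the variables from the page dump above.

```python
import re
from typing import Optional

# A few of the variable declarations from the posted page source.
SCRIPT_TEXT = '''
var webdata_now_p = "1441";
var webdata_today_e = "6.72";
var webdata_total_e = "10851.4";
'''


def read_js_var(script: str, name: str) -> Optional[str]:
    """Return the quoted value assigned to a JavaScript variable, or None."""
    pattern = r'var\s+' + re.escape(name) + r'\s*=\s*"([^"]*)"'
    match = re.search(pattern, script)
    return match.group(1) if match else None


print(read_js_var(SCRIPT_TEXT, "webdata_today_e"))
```

If this returns None against the text that log_response writes out, the selector is picking up a different element than the one holding the variables.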