Scrape from iframe or Javascript - solved

Hi,
I have a solar panel with a built-in web interface. Within index.html there is an iframe that shows the current power in a div called #webdata_now_p (e.g. 164W).

When I open the iframe page (status.html), the lines are empty but you can recognize the structure:

There is a JavaScript block that contains the values, e.g.
var webdata_now_p ="164"

The value is accurate. The question is: how do I get it into Home Assistant?

The scrape and multiscrape sensors cannot fetch the value with a CSS selector from index.html, because it is in an iframe.
And they also cannot fetch it from status.html, because there it is only a JavaScript variable.

Do you see this value when you open the source of the page also?

yes, just within the iframe tag:

But the scrape integration does not find it on the index.html. I also unsuccessfully tried

#childpage:iframe #webdata_now_p

Any other ideas? I managed to scrape the Webversion from index.html, but that’s not within the iframe.

That value is probably set by the JavaScript; your previous image showed it under head → javascript. Does the value exist there in the source? Make sure you are reading the page source and not the developer tools, because the developer tools can show the “live” page whereas the source is what was actually served.

If that code you posted is in the HTML of status.html rather than dynamically-generated by the browser from scripts, you should be able to get it:

# my_domain/test.html:

<html>
<head>
<script type="text/javascript">
var dummy = "";
</script>
<script type="text/javascript">
var webdata_sn = "";
var webdata_now_p = "164";
var webdata_alarm = "";
</script>
</head>
<body>
<p>I rock!</p>
</body>
</html>

# my scrape sensor config
- resource: https://my_domain/test.html
  sensor:
    - name: "Scrape test"
      unique_id: 894cdf58-39d8-4a77-a9bb-ffeb535d7cb0
      select: "head script"
      index: 1
      unit_of_measurement: 'W'
      device_class: power
      value_template: >-
        {% set js = value.replace('\n','') %}
        {{ js|regex_replace('^.*now_p\ = \"','')|regex_replace('\".*$','') }}

EDIT: a slightly more compact value_template, assuming it’s always an integer number, would be:

      value_template: >-
        {{ value|regex_findall('webdata_now_p = "(\d+)"')|first }}
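
For readings that may carry a decimal point, such as the "23096.0" total quoted later in this thread, a variant along the following lines might work. It is an untested sketch: the `\s*` around the equals sign and the `[0-9.]+` character class are assumptions about how the firmware formats the line, not something confirmed above.

```yaml
# Fragment of a scrape sensor definition (sketch, not verified on a device).
# \s* tolerates optional spaces around "=", [0-9.]+ accepts integers and decimals.
value_template: >-
  {{ value | regex_findall('webdata_now_p\s*=\s*"([0-9.]+)"') | first }}
```

Using a block scalar (`>-`) keeps the backslashes out of a double-quoted YAML string, where `\d` or `\s` would not be valid escape sequences.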

So on the status.html source, the value exists only as a variable in the header’s javascript, but not in a div within the body tag.

On the index.html, the value does not exist in the page source. Only the code to load the child page.

if(child_page){child_page.window.initPageText()}}var sel=1;function opt_sel(v){getCon("op_"+sel).className="opt_no";getCon("op_"+v).className="opt_sel";sel=v;initDiv(v);show_help(v)}function super_opt(v,t){hide("help_res");var child=document.getElementById("child_page");child.style.display="none";if(t==5){show("help_5");for(var i=1;i<=4;i++){if(i==v){show("div_5_"+i);var cont=document.getElementById("op_5_"+v);if(cont){cont.className="opt_sel2"}}else{hide("div_5_"+i);var cont=document.getElementById("op_5_"+i);if(cont){cont.className="opt_no2"}}}switch(v){case 1:child.src=getUrl(51);break;case 2:child.src=getUrl(52);break;case 3:child.src=getUrl(53);break;case 4:child.src=getUrl(54);break}div_5_help(v)}else{show("help_7");for(var i=1;i<=2;i++){if(i==v){show("div_7_"+i);var cont=document.getElementById("op_7_"+v);
if(cont){cont.className="opt_sel2"}}else{hide("div_7_"+i);var cont=document.getElementById("op_7_"+i);if(cont){cont.className="opt_no2"}}}switch(v){case 1:child.src=getUrl(71);break;case 2:child.src=getUrl(72);break}div_7_help(v)}}function initDiv(v){hide("help_res");var child=document.getElementById("child_page");child.style.display="none";child.src=getUrl(v);if(v==2){div_2_help(1)}if(v==3){div_3_help(1)}if(v==5){div_5_help(2);show("op_5_sel");super_opt(2,5)}else{hide("op_5_sel")}if(v==7){div_7_help(1);show("op_7_sel");super_opt(1,7)}else{hide("op_7_sel")}}function getUrl(v){switch(v){case 1:return"status.html";break;case 2:return"wizard.html";break;case 3:return"wireless.html";break;case 4:return"cable.html";break;case 6:return"account.html";break;case 7:return"update.html";break;case 8:return"restart.html";break;case 9:return"reset.html";break;case 51:return"select.html";break;case 52:return"remote.html";break;case 53:return"port.html";break;case 54:return"wirepoint.html";break;case 71:return"update.html";break;case 72:return"invupdate.html";break}}function div_2_help(v){for(var i=1;i<=7;i++){if(i==v){show("help_2_"+i)}else{hide("help_2_"+i)}}}function div_3_help(v){for(var i=1;i<=2;i++){if(i==v){show("help_3_"+i)}else{hide("help_3_"+i)}}}function div_5_help(v){for(var i=1;i<=4;i++){if(i==v){show("help_5_"+i)}else{hide("help_5_"+i)}}}function div_7_help(v){for(var i=1;i<=2;i++){if(i==v){show("help_7_"+i)}else{hide("help_7_"+i)}}}function show(v){var c=document.getElementById(v);if(c!=null){c.style.display=""}}function hide(v){var c=document.getElementById(v);if(c!=null){c.style.display="none"}}function show_help(v){for(var i=1;i<=9;i++){var c=document.getElementById("help_"+i);if(c!=null){if(i==v){c.style.display=""}else{c.style.display="none"}}}}function child_height(v){var a=document.getElementById("child_page");if(a!=null){a.style.height=v+"px"}var b=document.getElementById("back_div");if(b!=null){b.style.height=v+"px"}var 
c=document.getElementById("menu_div");if(c!=null){c.style.height=v+"px"}var d=document.getElementById("help");if(d!=null){d.style.height=v+"px"}var e=document.getElementById("help_msg_div");if(e!=null){e.style.height=(v-39)+"px"}}function show_ifr(){document.getElementById("child_page").style.display=""}function rt(page,num){switch(slanV){case"CN":return main_cn[page][num];break;case"EN":return main_en[page][num];break}return main_en[page][num]}function reBtn(num){switch(slanV){case"CN":return btn_cn[num].t;break;case"EN":return btn_en[num].t;break}return btn_en[num].t}function reList(page){switch(slanV){case"CN":return main_cn[page];break;case"EN":return main_en[page];break}return main_en[page]}function reTip(num){switch(slanV){case"CN":return tip_cn[num];break;case"EN":return tip_en[num];break}return tip_en[num]}function alertTip(num){switch(slanV){case"CN":alert(tip_cn[num]);return;break;case"EN":alert(tip_en[num]);return;break}alert(tip_en[num])}function showtip2(c,d){var server_list=servertip_en;switch(slanV){case"CN":server_list=servertip_cn;break;case"EN":server_list=servertip_en;break}switch(c){case 3:alert(server_list["1"]+server_list["h"+(d+1)]+server_list["2"]);break;case 4:alert(server_list["3"]+server_list["h"+(d+1)]+server_list["4"]);break;case 5:alert(server_list["5"]+server_list["h"+(d+1)]+server_list["6"]);break;case 6:alert(server_list["7"]+server_list["h"+d]+server_list["8"]);break;case 7:alert(server_list["9"]);break}}function init_page(){document.getElementById("child_page").src="status.html"};
</script>

will try that, thanks!

So long as it is statically served, you can read it. The scrape integration doesn’t treat the <head> and the <body> differently — as you can see from my prior post.

OMG it works! You are a genius! Thanks so much Troon!!!

Last question on that topic: how do I get the scraper integration to scrape every 60 seconds?

Set the scan_interval option; it is described in the Scrape integration documentation on the Home Assistant site.
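
As a rough illustration, a YAML-configured scrape sensor polling every 60 seconds could look like the sketch below. The resource address and sensor name are placeholders, and `index: 1` simply mirrors the test page earlier in the thread; on a real device the right script block has to be checked in the page source.

```yaml
# Sketch only: URL, name and index are assumptions, not taken from a real device.
scrape:
  - resource: http://192.0.2.10/status.html   # placeholder address
    scan_interval: 60                          # seconds between polls
    sensor:
      - name: "Solar power now"                # hypothetical name
        select: "head script"
        index: 1                               # which <script> block to read
        unit_of_measurement: "W"
        device_class: power
        value_template: >-
          {{ value | regex_findall('webdata_now_p\s*=\s*"(\d+)"') | first }}
```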

Alright, I’ll have to look into this. It’s not available in the GUI, so I will probably have to recreate the integration in configuration.yaml.

Your other option is to create an automation that calls the homeassistant.update_entity service every minute:

service: homeassistant.update_entity
target:
  entity_id: sensor.whatever
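
Wrapped in a complete automation, that service call might look like the sketch below. The alias is made up for illustration, and `sensor.whatever` remains a placeholder for the real entity ID.

```yaml
# Sketch of an automation that refreshes one entity every minute.
automation:
  - alias: "Refresh scrape sensor every minute"   # hypothetical alias
    trigger:
      - platform: time_pattern
        minutes: "/1"                              # fire on every minute
    action:
      - service: homeassistant.update_entity
        target:
          entity_id: sensor.whatever               # placeholder entity
```

This keeps the integration configured through the GUI while still controlling how often the sensor updates.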

I have exactly the same issue here.
I have exactly the same web interface, running on my local network. This time though, the inverter is branded “Sofar”.

I need to scrape the line:

var webdata_total_e = "23096.0";

The difference with my requirement is that it’s April 2023, and if I understand things properly, the way that Sensors are defined has changed.

Other than installing Home Assistant onto my RPi 4 yesterday, and trying for most of yesterday and today to get this, I’m a complete HA noob.

I thought that what I needed to add to my configuration.yaml was:

scrape:
  - resource: http://10.1.1.62/index_cn.html
    username: admin
    password: admin
    timeout: 20
    scan_interval: 30
    sensor:
      - select: script:contains('webdata_today_e')
        value_template: "{{ value.split('=')[1].strip().replace(';', '') }}"
        name: "Sofar Energy Today (kWh)"
        unit_of_measurement: "kWh"
        state_class: total
        icon: mdi:solar-power

NOTE: As it’s only accessible via my home network, I have zero concern that the username and password are open for all to see.

So, the process I am using to set this is:
Modify configuration.yaml
Developer Tools → Check Configuration [OK] → RESTART → Quick reload [Reloading]
Settings → Devices & Services → Entities
Type “sofar” into “Search Entities”, click on “Sofar Energy Today (kWh)”
The value flicks between “Unknown” and “Unavailable”

The logfile shows the following:
2023-04-21 16:19:56.476 WARNING (MainThread) [homeassistant.components.scrape.sensor] Index '0' not found in sensor.sofar_energy_today_kwh

2023-04-21 16:19:56.479 ERROR (MainThread) [homeassistant.helpers.template] Template variable error: 'None' has no attribute 'split' when rendering '{{ value.split('=')[1].strip().replace(';', '') }}'

What am I doing wrong?
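
The "Index '0' not found" warning suggests the selector matched nothing on that page, so `value` was None by the time the template ran. One thing to try, following the approach that worked earlier in this thread, is sketched below. It assumes the variables are served from status.html rather than index_cn.html, and that the variable sits in the second script block of the head. Neither is confirmed for this inverter, and the sensor name is hypothetical.

```yaml
# Untested sketch: resource page, index and sensor name are assumptions.
scrape:
  - resource: http://10.1.1.62/status.html   # assumed page; check the page source
    username: admin
    password: admin
    timeout: 20
    scan_interval: 30
    sensor:
      - name: "Sofar Energy Total"            # hypothetical name
        select: "head script"
        index: 1                              # adjust to the block holding the variable
        value_template: >-
          {{ value | regex_findall('webdata_total_e\s*=\s*"([0-9.]+)"') | first }}
        unit_of_measurement: "kWh"
        state_class: total
        icon: mdi:solar-power
```

To read today's energy instead, swap `webdata_total_e` for `webdata_today_e` in the pattern.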