Sensor Scrape: Scraping the status webpage of a local power inverter

Hi!
I am trying to scrape the values of my power inverter (SofarSolar 8.8KTL-X), because the add-on I found is no longer maintained.
I can access an HTML status page: http://192.168.1.120/status.html
Using a basic-auth username and password, I get an HTML page with JavaScript variables:

<script type="text/javascript">
var webdata_sn = "SF4ES008XXXXXX  ";
var webdata_msvn = "V250";
var webdata_ssvn = "";
var webdata_pv_type = "SF4ES008";
var webdata_rate_p = "";
var webdata_now_p = "3640";
var webdata_today_e = "35.82";
var webdata_total_e = "376.0";
var webdata_alarm = "";
var webdata_utime = "0";
var cover_mid = "1747221067";
var cover_ver = "LSW3_14_FFFF_1.0.47";
var cover_wmode = "APSTA";
var cover_ap_ssid = "AP_1747221067";
var cover_ap_ip = "10.10.100.254";
var cover_ap_mac = "30:EA:E7:34:XXXX";
var cover_sta_ssid = "FRITZ!Box 6490 Cable";
var cover_sta_rssi = "70%";
var cover_sta_ip = "192.168.1.120";
var cover_sta_mac = "34:EA:E7:34:XXXX";
var status_a = "1";
var status_b = "0";
var status_c = "0";

Using scrape I can fetch the first values and throw away unneeded parts.

sensor:
  - platform: scrape
    resource: http://192.168.1.120/status.html
    name: wechselrichter_today
    authentication: basic
    username: XXX
    password: XXX
    select: "script"
    index: 1
    value_template: '{{ (( value.split(";")[6] ) | replace ("var webdata_today_e = ",""))  }}'

Delivers:

The only thing I am struggling with now is removing the " from the values.

Tested:

    value_template: '{{ (( value.split(";")[6] ) | replace ("var webdata_today_e = ","")) | replace (""", "") }}'

which of course does not work…
I tried swapping the " and ' quotes:

    value_template: "{{ (( value.split(';')[6] ) | replace ('var webdata_today_e = ','')) | replace ('"', '') }}" 

but this gives a bad configuration…

Can anyone help me remove the " ?

You’re very close: you need to escape the double quote in your replace, otherwise it is parsed as the end of the double-quoted string that holds your template. You can add a |float to the end to cast it to a number:

    value_template: "{{ (value.split(';')[6])|replace('var webdata_today_e = ','')|replace('\"', '')|float }}"
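
To see why the chain works, here is a minimal Python sketch of the same steps outside Home Assistant. The `script_text` sample is abridged from the page posted above, and both function names are made up for illustration. The second function is a hypothetical alternative, not something suggested in this thread: it looks a variable up by name, so it would not depend on the variable's position in the script.

```python
import re

# Abridged copy of the second <script> block from status.html (see above).
script_text = '''
var webdata_sn = "SF4ES008XXXXXX  ";
var webdata_msvn = "V250";
var webdata_ssvn = "";
var webdata_pv_type = "SF4ES008";
var webdata_rate_p = "";
var webdata_now_p = "3640";
var webdata_today_e = "35.82";
var webdata_total_e = "376.0";
'''


def today_energy_by_position(text):
    """Mirror the template: split on ';', take item 6, strip prefix and quotes."""
    item = text.split(";")[6]
    item = item.replace("var webdata_today_e = ", "")
    item = item.replace('"', "")
    # float() ignores the leading newline left over from the split.
    return float(item)


def value_by_name(text, name):
    """Hypothetical alternative: find the variable by name instead of position."""
    match = re.search(r"var " + re.escape(name) + r' = "([^"]*)"', text)
    return match.group(1) if match else None


print(today_energy_by_position(script_text))        # 35.82
print(value_by_name(script_text, "webdata_now_p"))  # 3640
```

The position-based version breaks if the firmware ever adds or reorders variables, which is the main reason one might prefer matching on the name.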

I am also trying to integrate a SOFAR inverter using scrape. My code is below, but I cannot see the entity. How should “power” become visible in HA?

#==================================
#   SOFAR
#==================================
sensor:
  - platform: scrape
    resource: http://192.168.1.xx/status.html
    name: power
    authentication: basic
    username: admin
    password: admin
    select: "script"
    index: 1
    value_template: "{{ (( value.split(';')[5] ) | replace ('var webdata_now_p =','' ) |replace('\"', '') |float) }}"
    scan_interval: 30
    unit_of_measurement: "W"
    

Please do a View Source on your page and save it to Pastebin, then paste the link here.

Hi folks,
I’m trying to scrape the current AC power (P AC) of my old-fashioned Steca inverter, which provides its data on a web page as follows:

document.write("

Inverter

Name  Value   Unit
P DC   63.99  W
U DC  342.50  V
I DC    0.19  A
U AC  232.32  V
I AC    0.35  A
F AC   50.02  Hz
P AC   65.94  W
");

My sensor request in config.yaml looks like this:
scrape:

As a result, I only get “unknown”.

Could anybody please give me some hints to scrape this data successfully?

Many thanks in advance,
RaikertHA

document.write means that the page’s HTML does not contain the data itself: the web browser runs the JavaScript to add the table when the page is displayed. You can’t scrape the table, because the table doesn’t exist until the JavaScript has run.

You can point a REST sensor at the JavaScript file, though. Something like:

rest:
  - resource: http://STECA-Inverter-IP/gen.measurements.table.js
    sensor:
      - name: Steca P DC
        value_template: "{{ value|regex_findall('<td>P DC[^\-0-9]*([\-0-9\.]*)<')|first }}"
      - name: Steca U DC
        value_template: "{{ value|regex_findall('<td>U DC[^\-0-9]*([\-0-9\.]*)<')|first }}"

…and so on. The value_template says:

  • <td>P DC: start by finding <td>P DC (or any of the other labels)
  • [^\-0-9]*: look past any number of characters that are not a digit or a minus sign
  • ([\-0-9\.]*): look for and record any run of digits, decimal points or minus signs.
  • <: stop at the first <.

Add unit_of_measurement and device_class to each sensor as needed.
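
The pattern described above can be tried outside Home Assistant with Python's `re.findall`, which is a close stand-in for the `regex_findall` filter. This is only a sketch: the `value` string is a made-up excerpt of what gen.measurements.table.js might contain (the real markup may well differ), and `measurement` is a hypothetical helper name.

```python
import re

# Hypothetical excerpt of gen.measurements.table.js; the real markup may differ.
value = (
    'document.write("<table>'
    "<tr><td>P DC</td><td align='right'>   63.99</td><td>W</td></tr>"
    "<tr><td>U DC</td><td align='right'>  342.50</td><td>V</td></tr>"
    "<tr><td>P AC</td><td align='right'>   65.94</td><td>W</td></tr>"
    '</table>");'
)


def measurement(label, text):
    """Return the first number that follows '<td>label', or None if absent."""
    # Same structure as the template: label, skip non-numeric text,
    # capture digits / dots / minus signs, stop at the next '<'.
    pattern = r"<td>" + re.escape(label) + r"[^\-0-9]*([\-0-9\.]*)<"
    found = re.findall(pattern, text)
    return float(found[0]) if found else None


print(measurement("P DC", value))  # 63.99
print(measurement("P AC", value))  # 65.94
```

Unlike `|first` in the template, the helper returns None when the label is missing, which makes a failed match easier to spot while testing.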

Hi Troon,

thanks for your quick support. I added your string for the P AC sensor as stated, but unfortunately there is an error message when I check my configuration.

snip from config.yaml:

rest:
#STECA-PV
  - resource: http://STECA-IP/gen.measurements.table.js
    sensor:
    - name: "Steca P AC"
      value_template: "{{ value|regex_findall('<td>P AC[^\-0-9]*([\-0-9\.]*)<')|first }}"
      unit_of_measurement: "W" 
      device_class: "energy"

error message:
Error loading /config/configuration.yaml: while scanning a double-quoted scalar
in "/config/configuration.yaml", line 21, column 23
found unknown escape character '-'
in "/config/configuration.yaml", line 21, column 58

So it does. Works fine in the template editor. It passes the configuration check when written like this, avoiding the quoting problem:

rest:
  - resource: http://STECA-Inverter-IP/gen.measurements.table.js
    sensor:
      - name: Steca P DC
        value_template: >
          {{ value|regex_findall("<td>P DC[^\-0-9]*([\-0-9\.]*)<")|first }}
      - name: Steca U DC
        value_template: >
          {{ value|regex_findall("<td>U DC[^\-0-9]*([\-0-9\.]*)<")|first }}
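
For background: inside a double-quoted YAML string a backslash starts an escape sequence, and `\-` is not a valid one, whereas a block scalar (`>`) passes backslashes through untouched. Another possible way around the problem, not taken from this thread, is to drop the backslashes altogether: a hyphen at the start of a character class and a dot inside one are already literal. The Python sketch below checks that both spellings of the pattern behave the same on a few made-up sample strings.

```python
import re

escaped = r"<td>P AC[^\-0-9]*([\-0-9\.]*)<"  # pattern as posted in the thread
plain = r"<td>P AC[^-0-9]*([-0-9.]*)<"       # same character classes, no backslashes

# Made-up sample rows; the real inverter output may look different.
samples = [
    "<td>P AC</td><td>   65.94</td><td>W</td>",
    "<td>P AC</td><td>-12.5</td><td>W</td>",
    "<td>P AC</td><td>n/a</td><td>W</td>",
    "<td>U AC</td><td>232.32</td><td>V</td>",
]

for sample in samples:
    # Both patterns should return identical match lists for every sample.
    assert re.findall(escaped, sample) == re.findall(plain, sample)

print(re.findall(plain, samples[0]))  # ['65.94']
```

Since the backslash-free pattern contains nothing YAML treats specially, it should also be usable inside a double-quoted value_template, although that was not tested here.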

When you are posting code, please format it properly.

Hey Troon,

thank you so much for your work…it works perfectly!

This problem has been solved.

May I ask one (hopefully) final question: how can I extract the daily production from this resource (document.getElementById("labelValueId").innerHTML = "   0.009kWh 17.01.2024";)?

var chartData =
{
"labels":
[
"00:00", "", "", "", "", "",
"01:00", "", "", "", "", "",
"02:00", "", "", "", "", "",
"03:00", "", "", "", "", "",
"04:00", "", "", "", "", "",
"05:00", "", "", "", "", "",
"06:00", "", "", "", "", "",
"07:00", "", "", "", "", "",
"08:00", "", "", "", "", "",
"09:00", "", "", "", "", "",
"10:00", "", "", "", "", "",
"11:00", "", "", "", "", "",
"12:00", "", "", "", "", "",
"13:00", "", "", "", "", "",
"14:00", "", "", "", "", "",
"15:00", "", "", "", "", "",
"16:00", "", "", "", "", "",
"17:00", "", "", "", "", "",
"18:00", "", "", "", "", "",
"19:00", "", "", "", "", "",
"20:00", "", "", "", "", "",
"21:00", "", "", "", "", "",
"22:00", "", "", "", "", "",
"23:00", "", "", "", "", ""
],
"datasets":
[
{
"strokeColor":" rgba(64,178,83,1.0)",
"data": [
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0]
}
]
}
var max = 3750;
var steps = 15;
var input = document.getElementById("inputId");
input.setAttribute("min",   "2023-12-18");
input.setAttribute("max",   "2024-01-17");
input.setAttribute("value", "2024-01-17");
document.getElementById("labelValueId").innerHTML = "   0.009kWh 17.01.2024";
document.getElementById("buttonPrevId").disabled  = false;
document.getElementById("buttonNextId").disabled  = true;
var myLine = new Chart(document.getElementById("canvasId")
.getContext("2d"))
.Line(chartData,
{
"pointDot": false,
"datasetFill": false,
"scaleOverride": true,
"scaleLabel": "<%=value%> W",
"scaleSteps": steps,
"scaleStartValue": 0,
"scaleStepWidth": Math.ceil(max / steps),
"scaleLineColor":" rgba(170,170,170,1.0)",
"scaleFontColor":" rgba(170,170,170,1.0)",
"scaleGridLineColor":" rgba(68,68,68,1.0)"});

Exactly the same principle:

{{ value|regex_findall("innerHTML[^\-0-9]*([\-0-9\.]*)k")|first }}

You can try these things out in Developer Tools / Template: you have to fake the value variable. I’ve cut chunks out just to keep the screenshot to a manageable size.
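
The same "fake the value variable" approach also works in plain Python, as a rough equivalent of the template editor test. In this sketch `value` holds only the one relevant line from the page above, and `daily_yield_kwh` is a hypothetical helper name.

```python
import re

# Only the relevant line of the resource, copied from the post above.
value = 'document.getElementById("labelValueId").innerHTML = "   0.009kWh 17.01.2024";'


def daily_yield_kwh(text):
    """Return the number between 'innerHTML' and the 'k' of kWh, or None."""
    found = re.findall(r"innerHTML[^\-0-9]*([\-0-9\.]*)k", text)
    return float(found[0]) if found else None


print(daily_yield_kwh(value))  # 0.009
```

Anchoring on `innerHTML` and the trailing `k` is what keeps the pattern from picking up the date or any of the chart data elsewhere in the file.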

Troon, you’re the best…works as required!

And thanks for the advice concerning the template testing functionality.

Best wishes,
RaikertHA