Help with Scrape Website

I need some help with the Scrape integration.
I am trying to get values from a website that shows my solar energy statistics.
Here is the public link to the website:
https://www.solarweb.com/Home/GuestLogOn?pvSystemId=3a6f1c9d-41ac-4f87-bff2-32fb87d701d4

I am trying to get the 4 different values shown on this site:
image

I know there is an integration for Fronius, and I already have it in my HA, but it does not really work for me: I have two inverters (different models) and I am unable to calculate proper data with that integration.
That is why I would like to get the data from that website using Scrape.
I am not even sure this is possible with Scrape, as the content of the site seems to be a bit complex.
Maybe someone could have a look and let me know how to get started.

Have a look here: GitHub - drc38/Fronius_solarweb: Home Assistant integration for cloud-based Fronius Solar.web API

That’s not a scrape job, as the webpage is dynamically-generated. The data is coming across as a JSON response from this URL:

https://www.solarweb.com/ActualData/GetCompareDataForPvSystem?pvSystemId=b16d8d79-566d-449d-b809-a5ee0164faf9&_=[UNIX_TIMESTAMP]

returning this to the web page:

{
  "IsOnline": true,
  "AllOnline": true,
  "P_Grid": -2.71,
  "P_Load": -1150.6899999999996,
  "P_Akku": -2397.7000000000003,
  "P_PV": 3551.1,
  "SOC": 29,
  "BatMode": 1,
  "Ohmpilots": [],
  "Wattpilots": [],
  "Consumers": [],
  "Generators": []
}

However, there appears to be some sort of authentication going on: I can’t access that URL directly, as it takes me to a login page.

I suggest you investigate setting up a REST sensor, with the [UNIX_TIMESTAMP] replaced with {{ now()|as_timestamp|int(0) }} in the resource_template definition.

@farmio, this integration looks great, but I don’t have an API key. Fronius no longer provides one to private customers, so I have no clue where I could get one from.

This looks good: I get all the values with that command. I have already added it to my configuration.yaml like this:

#Solarweb Rest

I think this is OK, but I have no clue how to create templates to get the values for “P_Grid”, “P_Load”, “P_Akku”, “P_PV” and “SOC”.
Maybe you could help with that as well.
Please see the screenshot of the output I get below:

- platform: rest
  resource_template: https://www.solarweb.com/ActualData/GetCompareDataForPvSystem?pvSystemId=b16d8d79-566d-449d-b809-a5ee0164faf9&_={{ now()|as_timestamp|int(0) }}
  value_template: "{{ value_json['SOC'] }}"
  json_attributes:
    - P_Grid
    - P_Load
    - P_Akku
    - P_PV

should give you a sensor with the SOC as the state, and the other values as attributes.

Thanks, Troon. I have tried that, but I cannot see a sensor called SOC after the reboot.
I guess I am not smart enough to do this.
I would like to get a Sensor for each item:
P_Grid
P_Load
P_Akku
P_PV

I am completely confused, sorry for that.
This is what I see when I check the states:

Apologies: I missed the name property off the definition. It will be there somewhere (Developer Tools / States, search the Attribute column for Akku).
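
For reference, here is a sketch of the earlier sensor definition with the missing name property added, plus one template sensor that exposes an attribute as its own entity. The names "Solarweb" and "PV power" (and therefore the entity ID sensor.solarweb) are assumptions chosen for illustration, and the sketch still presumes the endpoint answers without a login.

```yaml
sensor:
  - platform: rest
    name: "Solarweb"  # hypothetical name; should yield sensor.solarweb
    resource_template: https://www.solarweb.com/ActualData/GetCompareDataForPvSystem?pvSystemId=b16d8d79-566d-449d-b809-a5ee0164faf9&_={{ now()|as_timestamp|int(0) }}
    value_template: "{{ value_json['SOC'] }}"
    json_attributes:
      - P_Grid
      - P_Load
      - P_Akku
      - P_PV

template:
  - sensor:
      - name: "PV power"  # hypothetical name
        # Reads the P_PV attribute stored on the REST sensor above
        state: "{{ state_attr('sensor.solarweb', 'P_PV') }}"
        unit_of_measurement: "W"
```

The same template pattern could be repeated for P_Grid, P_Load and P_Akku.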

Use the RESTful integration then:

rest:
  - resource_template: https://www.solarweb.com/ActualData/GetCompareDataForPvSystem?pvSystemId=b16d8d79-566d-449d-b809-a5ee0164faf9&_={{ now()|as_timestamp|int(0) }}
    sensor:
      - name: "SOC"
        value_template: "{{ value_json['SOC'] }}"
        unit_of_measurement: "%"
        device_class: battery
      - name: "Grid power"
        value_template: "{{ value_json['P_Grid'] }}"
        unit_of_measurement: "W"

I’ve done the first two for you. That should give you sensor.soc and sensor.grid_power.
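
Following the same pattern, the remaining three values might look like the sketch below. The sensor names are assumptions, as is device_class: power; the entries would sit in the same sensor: list as the two above.

```yaml
rest:
  - resource_template: https://www.solarweb.com/ActualData/GetCompareDataForPvSystem?pvSystemId=b16d8d79-566d-449d-b809-a5ee0164faf9&_={{ now()|as_timestamp|int(0) }}
    sensor:
      # The "SOC" and "Grid power" entries from above go here as well.
      - name: "Load power"  # hypothetical name
        value_template: "{{ value_json['P_Load'] }}"
        unit_of_measurement: "W"
        device_class: power
      - name: "Battery power"  # hypothetical name
        value_template: "{{ value_json['P_Akku'] }}"
        unit_of_measurement: "W"
        device_class: power
      - name: "PV power"  # hypothetical name
        value_template: "{{ value_json['P_PV'] }}"
        unit_of_measurement: "W"
        device_class: power
```

With these names, the entities should come out as sensor.load_power, sensor.battery_power and sensor.pv_power.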

Ok, have added this to my YAML:

#Solarweb Rest 

  - platform: rest
    resource_template: https://www.solarweb.com/ActualData/GetCompareDataForPvSystem?pvSystemId=b16d8d79-566d-449d-b809-a5ee0164faf9&_={{ now()|as_timestamp|int(0) }}
    name: "Grid Power"
    value_template: "{{ value_json['P_Grid'] }}"
    unit_of_measurement: "W"

The sensor "Grid Power" is created, but its value is "unknown".
![image|561x500](upload://nEk4DbGnJd6CDRUGzvLD7avfDsN.jpeg)

Try this: at least then you’ll know whether the data is coming in correctly. Make sure you put the YAML under the correct headings: there must only be one sensor: and one template: at the top level:

sensor:
  - platform: rest
    resource_template: https://www.solarweb.com/ActualData/GetCompareDataForPvSystem?pvSystemId=b16d8d79-566d-449d-b809-a5ee0164faf9&_={{ now()|as_timestamp|int(0) }}
    name: "Solarweb response"

template:
  - sensor:
      - name: "Grid Power"
        state: "{{ (states('sensor.solarweb_response')|from_json)['P_Grid'] }}"
        unit_of_measurement: "W"

Once that’s working, sensor.solarweb_response should have the full JSON (provided it’s still under 255 characters), and sensor.grid_power should have the grid number in it.

If not, post a screenshot of the states page for each.

OK, I have done this now.
This is what I have added to the template section:

template:

# Solarweb

  - sensor:
      - name: "Grid Power"
        state: "{{ (states('sensor.solarweb_response')|from_json)['P_Grid'] }}"
        unit_of_measurement: "W"


And this is what I have added to the sensor section:

# Solarweb Sensor

  - platform: rest
    resource_template: https://www.solarweb.com/ActualData/GetCompareDataForPvSystem?pvSystemId=b16d8d79-566d-449d-b809-a5ee0164faf9&_={{ now()|as_timestamp|int(0) }}
    name: "Solarweb response"



I made sure that there is only one sensor: and one template: heading. I do of course have other sensors and templates in those sections, but I assume that is OK.

The “Grid Power” sensor still shows “unavailable”. Its state attributes are shown, so the sensor was created, but the state is “unavailable”.

The Solarweb_response sensor can’t be found in the system at all.
I am not sure what to do, but when I put this link in my browser:
https://www.solarweb.com/ActualData/GetCompareDataForPvSystem?pvSystemId=b16d8d79-566d-449d-b809-a5ee0164faf9&_=%7B%7B%20now()%7Cas_timestamp%7Cint(0)%20%7D%7D
I still get all the results.
So it really behaves a bit strangely…

At a guess, there’s some sort of authentication between your browser and the server: likely a cookie. The request from HA doesn’t have that. That’s going to be very hard for me to diagnose “remotely” I’m afraid.

All I can suggest is to have a look in the HA logs; and to use your browser’s Inspect tools to look at the request headers and response.
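
If the browser’s request turns out to carry a session cookie, one thing to try is passing it through the headers option of the RESTful integration. This is only a sketch under that assumption: the cookie name and value below are placeholders to be copied from the request headers shown in the browser’s Inspect tools, and a session cookie will probably expire, after which it would have to be replaced by hand.

```yaml
rest:
  - resource_template: https://www.solarweb.com/ActualData/GetCompareDataForPvSystem?pvSystemId=b16d8d79-566d-449d-b809-a5ee0164faf9&_={{ now()|as_timestamp|int(0) }}
    headers:
      # Placeholder only: replace with the Cookie header your browser actually sends
      Cookie: "COOKIE_NAME=COOKIE_VALUE"
    sensor:
      - name: "Grid power"
        value_template: "{{ value_json['P_Grid'] }}"
        unit_of_measurement: "W"
```

If the sensor still reports unknown with the cookie in place, the HA logs should show whether the server returned the login page instead of JSON.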

I understand this is very difficult for you. I could offer you a guest account with a login to Solarweb: if you let me know your email address, I could send you an invitation. I appreciate your effort.

Sorry, I don’t have time to go into that level of depth. Time for you to learn about HTTP headers and cookies :wink: .

That is fair enough; I will try to dig into the topic of HTTP headers and cookies.
Thanks for your help anyway.
