Thanks, that’s an idea
I have been using HA Multiscrape for quite some time now, but now I am getting this error:
Read from Hobolink # Updating failed with exception: Client error '403 Forbidden' for url 'https://www.hobolink.com/p/64a1f43abba604a54534b301a2663722' For more information check: https://httpstatuses.com/403
Is the website blocking my scraping attempts? If I just open the site in a browser, it loads fine.
Try adding a User-Agent to the request headers, like this:
headers:
  User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0"
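For context, here is a minimal sketch of where that headers block sits in a Multiscrape entry. It belongs at the resource level, next to resource and scan_interval, not under an individual sensor. The resource URL is the Hobolink page from the error above; the scan interval, sensor name, unique_id and selector are hypothetical placeholders to replace with your own.

```yaml
multiscrape:
  - resource: https://www.hobolink.com/p/64a1f43abba604a54534b301a2663722
    scan_interval: 300  # hypothetical: pick an interval that suits you
    headers:
      # Send a browser-like User-Agent; the site apparently answers 403 to the default client
      User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0"
    sensor:
      - unique_id: hobolink_example  # hypothetical
        name: "Hobolink example"     # hypothetical
        select: "#some-element"      # hypothetical selector
```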
Works like a charm, thanks!
This seems like a relatively common issue; it might be worth mentioning in the wiki.
Hi Matteo,
if You need…
I just came across this custom component and wondered whether it could help me scrape data from the EnPhase server. I’m not very code-savvy, so I’m not sure whether this component would make it easier. A few weeks ago I asked for help with the cURL request and how to pull the data into HA, but never got a reply from anyone.
Here is my earlier request for help, with examples of how far I got. Do you think Multiscrape could get the data into sensors?
This is my first attempt at the config, but I’m really just guessing and reusing snippets from other cURL setups that I have:
multiscrape:
  - resource: https://api.enphaseenergy.com/api/v4/systems/3793806/summary?key=51[redacted]
    scan_interval: 3600 # EnPhase only gets data every 60 mins.
    headers:
      Authorization: Bearer 1NiJ9[redacted]
    sensor:
      - name: "Ustou EnPhase system_id"
        value_template: >
          {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "system_id" -%}
          {{ dict_item.value }}
          {%- endif -%}
          {%- endfor -%}
      - name: "Ustou EnPhase Current Consumption Wh"
        value_template: >
          {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "current_power" -%}
          {{ dict_item.value }}
          {%- endif -%}
          {%- endfor -%}
        unit_of_measurement: 'Wh'
        device_class: energy
      - name: "Ustou EnPhase Energy Produced lifetime Wh"
        value_template: >
          {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "energy_lifetime" -%}
          {{ dict_item.value }}
          {%- endif -%}
          {%- endfor -%}
        unit_of_measurement: 'Wh'
        device_class: energy
      - name: "Ustou EnPhase Solar Production Today Wh"
        value_template: >
          {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "energy_today" -%}
          {{ dict_item.value }}
          {%- endif -%}
          {%- endfor -%}
        unit_of_measurement: 'kWh'
        device_class: energy
      - name: "Ustou EnPhase Number of Inverters"
        value_template: >
          {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "modules" -%}
          {{ dict_item.value }}
          {%- endif -%}
          {%- endfor -%}
      - name: "Ustou EnPhase Total Array Size (W)"
        value_template: >
          {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "size_w" -%}
          {{ dict_item.value }}
          {%- endif -%}
          {%- endfor -%}
      - name: "Ustou EnPhase System Status"
        value_template: >
          {%- for dict_item in value_json.dataList -%}
          {%- if dict_item.key == "status" -%}
          {{ dict_item.value }}
          {%- endif -%}
          {%- endfor -%}
I can’t look into your question in depth right now, but you need to redact your key and token ASAP.
Post edited to remove the key and bearer token.
Thanks for your concern. I had already removed parts of both the token and the key, so there’s no risk. But thank you anyway.
Maybe I need to be more specific about my Multiscrape question. Can it handle the data I get back from my cURL request, and if so, how would I construct the sensors in the Multiscrape configuration? (The system ID has been changed for privacy.)
{
  "system_id": 3793806,
  "current_power": 0,
  "energy_lifetime": 943,
  "energy_today": 8,
  "last_interval_end_at": 1672206300,
  "last_report_at": 1672207041,
  "modules": 0,
  "operational_at": 1671720300,
  "size_w": 0,
  "source": "meter",
  "status": "normal",
  "summary_date": "2022-12-28"
}
I don’t think you need this integration at all. If you’re getting a JSON response, just use a REST sensor. It can do what you want. Perhaps start a new post. You’re welcome to tag me for help, if needed, but I think you can just follow the examples in the docs.
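Following that suggestion, a minimal sketch of the RESTful integration handling this response, with one request feeding several sensors. The key and token are placeholders, only a subset of the sensors above is shown, and the units are assumptions (current_power in W, energy values in Wh), so check them against Enphase’s API documentation.

```yaml
rest:
  - resource: https://api.enphaseenergy.com/api/v4/systems/3793806/summary?key=YOUR_API_KEY
    scan_interval: 3600  # the data only changes every 60 minutes
    headers:
      Authorization: "Bearer YOUR_TOKEN"  # placeholder, not a real token
    sensor:
      # The response is a flat JSON object, so each value is read directly by its key
      - name: "Ustou EnPhase Current Power"
        value_template: "{{ value_json['current_power'] }}"
        unit_of_measurement: "W"  # assumed unit
        device_class: power
      - name: "Ustou EnPhase Energy Produced lifetime Wh"
        value_template: "{{ value_json['energy_lifetime'] }}"
        unit_of_measurement: "Wh"
        device_class: energy
      - name: "Ustou EnPhase Solar Production Today Wh"
        value_template: "{{ value_json['energy_today'] }}"
        unit_of_measurement: "Wh"  # assumed unit
        device_class: energy
      - name: "Ustou EnPhase System Status"
        value_template: "{{ value_json['status'] }}"
```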
Looks like your value_templates are overly complicated. Doesn’t this work?
value_template: "{{ value_json['system_id'] }}"
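Applied to the Multiscrape config above, that suggestion would look roughly like this. It is a sketch that assumes Multiscrape exposes the parsed JSON as value_json when a sensor has no select, which is what the reply implies. Only two sensors are shown; the others follow the same pattern with their own keys, and the key and token are placeholders.

```yaml
multiscrape:
  - resource: https://api.enphaseenergy.com/api/v4/systems/3793806/summary?key=YOUR_API_KEY
    scan_interval: 3600
    headers:
      Authorization: "Bearer YOUR_TOKEN"  # placeholder
    sensor:
      # The response has no dataList key; the values sit at the top level of the JSON
      - name: "Ustou EnPhase system_id"
        value_template: "{{ value_json['system_id'] }}"
      - name: "Ustou EnPhase Solar Production Today Wh"
        value_template: "{{ value_json['energy_today'] }}"
        unit_of_measurement: "Wh"  # assumed unit
        device_class: energy
```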
Also note that the bearer token will probably expire, so you’ll need to update your configuration with a new token each time that happens.
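One way to make swapping the token less painful, sketched as an assumption about your setup: keep the whole header value in secrets.yaml and reference it with !secret, so only secrets.yaml changes (followed by a reload or restart) when a new token is issued. This does not stop the token from expiring. The key name enphase_authorization is hypothetical.

```yaml
# configuration.yaml (excerpt): the header value is read from secrets.yaml
headers:
  Authorization: !secret enphase_authorization

# secrets.yaml: the stored value must include the "Bearer " prefix
enphase_authorization: "Bearer YOUR_TOKEN"
```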
Hi all,
Can someone help me get the right select?
My config looks like this:
- resource: https://mijn.essent.nl/mijn-contract/tarieven?contract=000250578938&address=0700596204&contract_id=4002108673
  authentication: basic
  username: !secret username
  password: !secret password
  scan_interval: 86400
  log_response: true
  sensor:
    - unique_id: essent_kwh_prijs
      name: "kWh prijs"
      # select: "div > div:nth-child(2)"
      select: "body > app-root > wl-jss-route > sc-placeholder > wl-row-container > div > section > wl-colset-mijn-363 > wl-experience-editor > article > div.col-12.col-lg-6.order-lg-1 > div > wl-show-tariffs-container > wl-experience-editor > wl-show-tariffs > wl-experience-editor > wl-show-tariffs-totals > wl-experience-editor > div > div > div.d-flex.justify-content-between.mb-1 > div.text-right.text-dark.text-nowrap"
      unit_of_measurement: €
      value_template: '{{ value.split("€ ")[1].split("/")[0]}}'
This is what I get when I use Copy → Copy selector in the browser’s developer tools:
body > app-root > wl-jss-route > sc-placeholder > wl-row-container > div > section > wl-colset-mijn-363 > wl-experience-editor > article > div.col-12.col-lg-6.order-lg-1 > div > wl-show-tariffs-container > wl-experience-editor > wl-show-tariffs > wl-experience-editor > wl-show-tariffs-totals > wl-experience-editor > div > div > div.d-flex.justify-content-between.mb-1 > div.text-right.text-dark.text-nowrap
Hope that someone can help me.
Thanks in advance!
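A possible direction, untested: a selector copied from the browser encodes the whole element tree, so any layout change breaks it, and a shorter selector built from the last steps of the chain may be more robust. Note that the app-root and wl-* tags suggest a page rendered by JavaScript in the browser; if the files written by log_response don’t contain the price, no selector will find it. The sketch assumes the price element carries those classes in the raw HTML and that the selector is unique on the page; the decimal-comma replacement is also an assumption about the price format.

```yaml
- resource: https://mijn.essent.nl/mijn-contract/tarieven?contract=000250578938&address=0700596204&contract_id=4002108673
  authentication: basic
  username: !secret username
  password: !secret password
  scan_interval: 86400
  log_response: true  # check the saved response to confirm the element is present
  sensor:
    - unique_id: essent_kwh_prijs
      name: "kWh prijs"
      # Only the last two steps of the copied selector (assumed to be unique on the page)
      select: "div.d-flex.justify-content-between.mb-1 > div.text-right.text-dark.text-nowrap"
      unit_of_measurement: "€"
      # Take the text between "€ " and "/", swap a decimal comma for a dot, trim spaces
      value_template: '{{ value.split("€ ")[1].split("/")[0] | replace(",", ".") | trim }}'
```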
Hi,
I am trying to scrape this page:
https://www.in-pocasi.cz/predpoved-pocasi/cz/praha/praha-324/
There are two tables/charts at the bottom of the page, both with tabs along the top. In the first one (“Předpověď na další dny”, i.e. “Forecast for the next days”), I can get different data by clicking the day-of-the-week tabs.
Scraping the data from the first tab works fine; for example, I can use the CSS selector #h3_0_t to get the first temperature. But if I click on the second tab, the selector for the corresponding data is the same.
I’ve looked in the Multiscrape log, and all the data is present there. It’s not loaded dynamically, but it is located in different parts of a table. Is there some way of getting at it with Multiscrape?
Is there any way to use index: in multiscrape (like in regular scrape)?
Best regards,
Michał
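An untested sketch, assuming your Multiscrape version supports select_list (which collects every element matching the selector and joins their texts with a comma) and that the same id really appears once per tab in the downloaded HTML. Picking the n-th item in value_template then plays the role of index:. The sensor name, unique_id and scan interval are hypothetical, and if a value itself contains a comma (for example a Czech decimal comma), the split will cut it apart.

```yaml
multiscrape:
  - resource: https://www.in-pocasi.cz/predpoved-pocasi/cz/praha/praha-324/
    scan_interval: 3600  # hypothetical
    sensor:
      - unique_id: praha_temperature_day_2  # hypothetical
        name: "Praha temperature day 2"     # hypothetical
        # Every element matching the selector, joined with commas (assumed behavior)
        select_list: "#h3_0_t"
        # Item 0 is the first match (first tab), item 1 the second, and so on
        value_template: "{{ value.split(',')[1] }}"
```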
Hi Fellows!
I want to make a sensor from NOAA data. I’ve found the CSS element I need. It contains a letter and a number: the letter appears alone when there has been no solar or geomagnetic event, and a number (from 1 to 5) appears when something has happened in the last 24 hours.
But I can only scrape the letter; the number is out of reach for me. Here is what I’ve done:
- name: SWPC Radio
  resource: https://www.swpc.noaa.gov
  scan_interval: 300
  headers:
    User-Agent: Firefox/10
  sensor:
    - unique_id: swpc-radio
      name: SWPC Radio
      select: "div.noaa_scale_bg_1:nth-child(2) > div:nth-child(1)"
I believe I need a value_template, but after several hours of reading and testing what I found in this topic and in Home Assistant’s template helper, I can’t figure it out.
I’m not a programmer, but I think the CSS changes when the data changes.
I’d appreciate your help.
Please post a screenshot marking up, or a description of, the value you’re trying to read.
I think that the data you want is being dynamically pulled in and rendered from this URL:
https://services.swpc.noaa.gov/products/noaa-scales.json
and that this JavaScript is updating the page and (as you suspect) modifying CSS classes:
https://www.swpc.noaa.gov/sites/all/modules/custom/swx_noaa_scales/swx_noaa_scales.js
If I’m right, this is a job for a REST sensor processing that JSON response.
Thank you for your fast answer.
You are right, this is exactly what I’d like to scrape. I tried REST sensors too, but they didn’t work: the attribute I need is ‘-1’, and I can’t get YAML to work with this negative number. I have no idea why.
You probably need to use ['-1'] notation.
So you want those three values (letter+number, but they hide any 0s) as individual sensors?
The value can be R, R1, R2, R3, R4 or R5, where R means no event. In the JSON, though, R is 0, R1 is 1, and so on.
I’ve tried ['-1'], but it’s not working:
Invalid config for [sensor.rest]: invalid template (TemplateSyntaxError: expected name or number) for dictionary value @ data['value_template']. Got '{{ (value_json.[-1].R.Scale }}'. (See ?, line ?).
The JSON looks like this:
"-1":{"DateStamp":"2023-02-16","TimeStamp":"18:34:00","R":{"Scale":"0","Text":"none","MinorProb":null,"MajorProb":null},"S":{"Scale":"0","Text":"none","Prob":null},"G":{"Scale":"0","Text":"none"}}}
value_json['-1']['R']['Scale']
No dot before the brackets, quotes around the -1. Always safer to use bracket notation rather than dot notation.
Also, the template in the error message has a ( without a matching ).
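Putting the thread’s answer together, a minimal sketch of REST sensors for all three NOAA scales (R for radio blackouts, S for solar radiation storms, G for geomagnetic storms), assuming the '-1' entry holds the values you want, as in the excerpt above. Each template turns a Scale of "0" (or a missing value) back into the bare letter and otherwise appends the number, so the state reads R, R1, R2 and so on, matching the website. The names and unique_ids are hypothetical.

```yaml
rest:
  - resource: https://services.swpc.noaa.gov/products/noaa-scales.json
    scan_interval: 300
    sensor:
      # Bracket notation with a quoted key is needed because "-1" is not a valid name
      - unique_id: swpc_radio
        name: "SWPC Radio"
        value_template: >-
          {%- set s = value_json['-1']['R']['Scale'] -%}
          R{{ s if s and s != '0' else '' }}
      - unique_id: swpc_solar_radiation
        name: "SWPC Solar Radiation"
        value_template: >-
          {%- set s = value_json['-1']['S']['Scale'] -%}
          S{{ s if s and s != '0' else '' }}
      - unique_id: swpc_geomagnetic
        name: "SWPC Geomagnetic"
        value_template: >-
          {%- set s = value_json['-1']['G']['Scale'] -%}
          G{{ s if s and s != '0' else '' }}
```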