I just copied what a fellow kiwi had done as I had no clue: Scrape sensor improved - scraping multiple values - #282 by joem
I’m glad that you’ve got it working @xbmcnut
Question: did you try to trim the select content to something like this: dd:nth-child(4) > div.compound-read-low > ul? Just to make the code shorter.
@KevinE try using fieldset in the select.
Thanks for your comments. Would be great if you can help to improve the documentation! (either by a PR on github or by sending me your suggested changes)
Here are my answers to your questions:
- To find the form on the HTML page, multiscrape needs a CSS selector. CSS selectors that refer to the id of an element always require a hash sign (#).
For retrieving the input fields of the form, the name attribute is used, as this is also what is submitted.
So in your case it means:
form_submit:
  submit_once: true
  resource: "https://eyeonwater.ca/signon"
  select: "#signin_account"
  input:
    username: !secret eyeonwater_username
    password: !secret eyeonwater_password
- https://stackoverflow.com/questions/19109912/yaml-do-i-need-quotes-for-strings-in-yaml
- Try to find out if it loads a new page after the mouse click (then use that one for scraping) or check (in your browser developer tools) if the values you want to scrape are already retrieved from the server. In that case, multiscrape is not bothered by the mouse click but just continues scraping.
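To illustrate the first answer, here is a minimal sketch of how a login form might map onto a complete multiscrape entry. The HTML form in the comments, the example.com URLs, the secret names and the sensor are all hypothetical placeholders, not taken from the real site; only the option names come from the config above.

```yaml
# Hypothetical login form, for illustration only:
#   <form id="signin_account" action="/signon" method="post">
#     <input name="username" type="email">
#     <input name="password" type="password">
#   </form>
multiscrape:
  - resource: "https://example.com/dashboard"   # placeholder: page to scrape after login
    form_submit:
      submit_once: true
      resource: "https://example.com/signon"    # placeholder: page that contains the form
      select: "#signin_account"                 # "#" because the form is located by its id
      input:
        # Keys are the name attributes of the <input> fields, not their ids or labels
        username: !secret example_username
        password: !secret example_password
    sensor:
      - unique_id: example_value                # hypothetical sensor
        name: Example value
        select: ".some-class"                   # placeholder selector
```

Note that a field labelled "Email address" on the page can still have name="username" in the HTML, so it is worth checking the form source in the browser developer tools.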
Hi @danieldotnl
Thanks for a great Home Assistant add-on.
Can you please assist? I have been battling to get the right selector for my scrape, but it keeps giving me the same error at the same position, the ID=4: it doesn’t like the “4”, and if I use “#4” it doesn’t like that either. What is the right selector for this scrape, please? Below is the page; the yellow highlight is what I am looking for at ID=4, which was “86.0%”.
Here is one of the many selections I tried, highlighted in yellow:
And here is one of the logs pointing at the “4” being the problem, which I have seen in plenty of different variations:
Hope this is enough information to assist me.
Thanks
Thanks for the response Daniel.
Don’t know if you looked at the logon page for eyeonwater.ca but there is no “username” field (just email address) so how are you mapping my email address to the correct field?
Still can’t get it to work. I get one error and one warning in my system log. I think I am not getting authenticated, but I’m not sure (a debug message saying that authentication passed or failed would be a good thing to log). The same error below is logged when I use an incorrect password.
Logger: custom_components.multiscrape.coordinator
Source: custom_components/multiscrape/coordinator.py:62
Integration: Multiscrape scraping component ([documentation](https://github.com/danieldotnl/ha-multiscrape), [issues](https://github.com/danieldotnl/ha-multiscrape/issues))
First occurred: 9:19:51 AM (1 occurrences)
Last logged: 9:19:51 AM
Scraper_noname_0 # Exception in form-submit feature. Will continue trying to scrape target page. Could not find form
And this warning:
Logger: custom_components.multiscrape.sensor
Source: custom_components/multiscrape/sensor.py:163
Integration: Multiscrape scraping component ([documentation](https://github.com/danieldotnl/ha-multiscrape), [issues](https://github.com/danieldotnl/ha-multiscrape/issues))
First occurred: 9:29:00 AM (1 occurrences)
Last logged: 9:29:00 AM
Scraper_noname_0 # Daily Water Consumption # Unable to scrape data: Could not find a tag for given selector. Consider using debug logging and log_response for further investigation.
I tried commenting out the sensor portion of my code to try and isolate an authentication issue and no errors showed in the log. Is this expected behaviour?
Cheers
Perhaps the ‘4’ is interpreted as a number instead of a string. I don’t know how to get around this, however. You might try writing a basic Python script with bs4 to test the selector outside Home Assistant?
Something like I did:
Do you get the page_response_body.txt or page_soup.txt generated?
Yes, not sure what I do with these. I have scanned them both but there is no rendered data in them.
Well, an ID attribute of “4” is illegal in HTML 4 or XML: an ID follows the same rules as a name token and must begin with a letter. See Basic HTML data types.
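A possible workaround for the numeric id, as an untested sketch: in CSS an ID selector cannot start with a digit, so “#4” is rejected, but an attribute selector on id should match the same element. This assumes the element really carries id="4" in the HTML that multiscrape receives; the sensor name and unique_id below are hypothetical.

```yaml
sensor:
  - unique_id: example_level      # hypothetical
    name: Example level           # hypothetical
    # "#4" is not a valid CSS ID selector because it starts with a digit;
    # the attribute selector below targets the element whose id equals "4".
    select: '[id="4"]'
```

The single quotes around the whole selector keep YAML from tripping over the inner double quotes.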
Guys, hoping someone can help me out here.
I am trying to get the realtime electricity price from here: Price Information
The table moves every half an hour and shows the past (real price), present and future (forecasted price). The price that I want is the immediate past, which is always second row, fifth column (USEP ($/MWh)).
From Chrome Inspect, I get
#realtimeWindow > div > div.tabberlive > div:nth-child(2) > div > div > div.realtimeTableContainer > table > tbody > tr:nth-child(2) > td:nth-child(5)
Here is my configuration.yaml
multiscrape:
  - resource: https://www.emcsg.com/marketdata/priceinformation
    scan_interval: 30
    sensor:
      - unique_id: electricity_usep_price
        name: Electricity USEP Price
        select: "#realtimeWindow > div > div.tabberlive > div:nth-child(2) > div > div > div.realtimeTableContainer > table > tbody > tr:nth-child(2) > td:nth-child(5)"
        #value_template: '{{ (value.split(":")[1]) }}'
I tried both WITH and WITHOUT tbody but got the same error in the log. My log is already set to DEBUG mode.
This error originated from a custom integration.
Logger: custom_components.multiscrape.sensor
Source: custom_components/multiscrape/sensor.py:163
Integration: Multiscrape scraping component (documentation, issues)
First occurred: 17:01:39 (9 occurrences)
Last logged: 17:05:41
Scraper_noname_0 # Electricity USEP Price # Unable to scrape data: Could not find a tag for given selector. Consider using debug logging and log_response for further investigation.
What am I missing?
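One way to narrow this down, as a sketch rather than a confirmed fix: Chrome Inspect shows the page after JavaScript has run, while multiscrape only sees the HTML the server returns, so a selector copied from Inspect may not exist in what gets scraped. Turning on log_response (as the error message suggests) writes the fetched page to files such as page_soup.txt, where the selector can be checked by hand. The shortened selector below is an assumption that the table is present in the raw HTML; it is untested against this site.

```yaml
multiscrape:
  - resource: https://www.emcsg.com/marketdata/priceinformation
    scan_interval: 30
    log_response: true    # dump the fetched page to files for inspection
    sensor:
      - unique_id: electricity_usep_price
        name: Electricity USEP Price
        # Shorter selector with fewer intermediate steps that can break;
        # assumes the table is part of the server response, not added by JavaScript
        select: "div.realtimeTableContainer table tr:nth-child(2) > td:nth-child(5)"
```

If the table is missing from page_soup.txt, the data is probably loaded by JavaScript from another URL, and that URL (visible in the browser's network tab) would be the one to scrape.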
There is a username:
There should be something in your logs like: Form seems to be submitted succesfully
Don’t think I can help further without credentials.
Hi everyone,
I hope someone can help me!
I’m trying to read some energy values but I have a problem.
The website requires logging in with a password and then entering a date.
I can log in with my credentials, but when I read the date the value is 01/01/2015!
This is very strange because I see the correct date when I use Chrome:
This is my yaml file:
How can I insert the current date before scraping?
Thank you
Hi Daniel,
I tried using both username: and email: with no success. There are no messages in the log that say I passed or failed authentication, even if I put an incorrect password in. I think it’s worth cleaning this part of the code up to help in debugging authentication issues.
Cheers,
Hi, I have a question.
I’m not sure if I’m properly authenticated.
How can I check this using the logger?
My yaml:
This folder has been created automatically:
Which file should I check?
From reading this community, I understand that I should see a message like this in my log: The form appears to have been submitted successfully.
Where is it?
Kevin, did you enable logging in your configuration.yaml like this?
logger:
  default: info
  logs:
    custom_components.multiscrape: debug
When I run this with your config, it tells me pretty clearly why your config fails:
The form is hidden within a <script> tag and shown with JavaScript. This makes it a complicated case; I’ll try to take a better look later.
I’m not sure if I can make this more clear, as in the end, it is a form-submit feature, and not just for authentication. So I cannot assume that authentication failed, it can be any kind of form. E.g. your address for retrieving a garbage collection schedule.
See answer above to Kevin:
Add this to your configuration.yaml:
logger:
  default: info
  logs:
    custom_components.multiscrape: debug
Hi all, my problem is configuring a weather station in HA with an XML file and multiscrape. This is the page:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<maintag>
<script/>
<misc>
<data misc="refresh_time">2022.10.12. 230831</data>
</misc>
<data realtime="temp">22.11111111111111</data>
</realtime>
And here is the config in multiscrape.yaml:
multiscrape:
  - resource: https://192.168.1.39/realtime.xml
    scan_interval: 30
    sensor:
      - unique_id: temp_out_weather
        name: "TEMP"
        select: realtime > data realtime="temp":nth-child(1)"
        value_template: '{{ (value.split("")[1]) }}'
Developer Tools / YAML shows me this:
Invalid config for [multiscrape]: [multiscrape] is an invalid option for [multiscrape]. Check: multiscrape->multiscrape->0->multiscrape. (See /config/configuration.yaml, line 11).
Line 11 in configuration.yaml is:
# Text to speech
tts:
  - platform: google_translate
automation: !include automations.yaml
script: !include scripts.yaml
scene: !include scenes.yaml <------------------line--11------------
multiscrape: !include multiscrape.yaml
Thanks
You should not repeat the integration name when you include files. Try this in multiscrape.yaml:
- resource: https://192.168.1.39/realtime.xml
  scan_interval: 30
  sensor:
    - unique_id: temp_out_weather
      name: "TEMP"
      select: 'realtime > data realtime="temp":nth-child(1)'
      value_template: '{{ (value.split("")[1]) }}'
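Once the include works, the selector and template will probably need attention too. In CSS, matching on an attribute uses square brackets, and value.split("") fails because Python does not accept an empty separator. A possible version of multiscrape.yaml, as an untested sketch that assumes the XML shown in the question and that rounding to one decimal is what is wanted:

```yaml
- resource: https://192.168.1.39/realtime.xml
  scan_interval: 30
  sensor:
    - unique_id: temp_out_weather
      name: "TEMP"
      # Attribute selector: the <data> element whose realtime attribute is "temp"
      select: 'data[realtime="temp"]'
      # Convert the raw text (e.g. 22.11111111111111) to a number, rounded to one decimal
      value_template: "{{ value | float | round(1) }}"
```

If the raw value is fine as it is, the value_template line can simply be left out.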