Look here: when I manually enter the battery percentage, the last selector I sent you should work.
Maybe try restarting the device that provides the data, as here I can scrape that 96%.
Yeah, I've never actually reset the Generac system. I'll restart it and see if that helps. Thank you so much for bearing with me.
No problem at all, just let me know. I'm glad to help.
Can someone help me figure out why I get "unknown" for this Tankpercentage sensor? Here is my YAML config… The tank percentage is in the form of "62%".
multiscrape:
  - resource: AmeriGas Login
    scan_interval: 3600
    headers:
      User-Agent: Mozilla/5.0
    form_submit:
      submit_once: True
      select: "form-control-valid"
      input:
        email: myusername
        password: mypassword
    sensor:
      - select: "#layoutDiv > main > div.container.pl-0.pr-0.pl-xl-3.pr-xl-3.pl-lg-3.pr-lg-3.pl-md-3.pr-md-3.pl-sm-0.pr-sm-0 > div:nth-child(2) > div.col-12.col-xl-6.col-lg-6.col-md-12.col-sm-12.pl-0.pr-0.pr-xl-3.pr-lg-3.pr-md-0.pr-sm-0 > div.col-12.bg-white.tankanddeliveries-padding.top-margin > div:nth-child(3) > div.col-12.col-xl-4.col-lg-4.col-md-12.col-sm-12.p-0.mt-3.EstimatedTankDiv > div > div.col-12.p-0.lblvalue-Estimatedtank"
        name: Tankpercentage
      - unique_id: Tank_percentage
Here is the error from the log:
Logger: custom_components.multiscrape.sensor
Source: custom_components/multiscrape/sensor.py:139
Integration: Multiscrape scraping component
First occurred: December 15, 2021, 10:20:05 PM (19 occurrences)
Last logged: 4:20:05 PM
Sensor Tankpercentage was unable to extract data from HTML
Thanks in advance
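The selector in this config was copied from the browser's inspector and chains a long list of layout classes, so even a small change to the page layout is likely to break it. A common remedy is to anchor on the one class that actually identifies the value. The stdlib-only Python sketch below illustrates that idea outside Home Assistant. It assumes the value sits in an element with class lblvalue-Estimatedtank (the last class in the selector above) and that its text looks like 62%; the names ClassTextFinder and tank_percentage are hypothetical. Inside multiscrape the analogous change would be a much shorter select such as ".lblvalue-Estimatedtank", provided that class is unique on the page. If the page fills in the value with JavaScript, no selector will find it in the raw HTML.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Elements without a closing tag must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextFinder(HTMLParser):
    """Collect the text of the first element whose class list contains css_class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.depth = 0        # > 0 while inside the matching element
        self.done = False     # stop after the first match
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def tank_percentage(html: str, css_class: str = "lblvalue-Estimatedtank") -> Optional[float]:
    """Return e.g. 62.0 for text '62%', or None if no usable percentage was found."""
    finder = ClassTextFinder(css_class)
    finder.feed(html)
    finder.close()
    text = "".join(finder.parts).strip()
    if not text.endswith("%"):
        return None
    try:
        return float(text[:-1].strip())
    except ValueError:
        return None
```

tank_percentage(page_html) returns a float such as 62.0, or None when the class is missing or its text is not a percentage, which roughly corresponds to the "unknown" state in Home Assistant.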
As of release v5.7.0, Multiscrape can also be used as an improved REST component. It now supports JSON in value_templates, so you can use the same syntax as the RESTful sensors together with all the extras of Multiscrape, e.g. form-submit, entity pictures, icon templates, etc.!
How does it work? I don't understand… can you give some examples?
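In Home Assistant's RESTful sensors, a JSON response is parsed and handed to value_template as value_json, so a template such as {{ value_json.battery.level }} walks down nested keys; according to the announcement above, Multiscrape accepts the same syntax. The Python sketch below only mirrors that lookup for illustration. The response body, the keys battery, level and devices, and the helper name lookup are all hypothetical.

```python
import json
from typing import Any


def lookup(payload: str, path: str) -> Any:
    """Walk a dotted path such as 'battery.level' or 'devices.0.name' through parsed JSON.

    Illustrates what a template like {{ value_json.battery.level }} does;
    it is not Multiscrape or Home Assistant code.
    """
    node: Any = json.loads(payload)
    for part in path.split("."):
        # Lists are indexed by position, objects by key.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node
```

With a hypothetical response '{"battery": {"level": 96}}', lookup(response, "battery.level") returns 96, which is the value the equivalent template would hand to the sensor.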
I'm trying to get access to my electricity data from powernet.nz.
The login page is https://secure.powershop.co.nz
To access it, I'm using:
select: "#Container2"
input:
  username: "user"
  password: "password"
The error Iām getting is WARNING (MainThread) [custom_components.multiscrape.sensor] Sensor Powershop daily consumption was unable to extract data from HTML
Either I am using the wrong select, or there is something deeper hidden in the webpage that won't let me log in.
In the inspector I notice a token, that changes with every page reload: <input type="hidden" name="authenticity_token" value="xxxxxxxxxtNom93qzs2QsUFLwYswaz9uWG5PczZzzrJNBvXB78pnrKQUH9ss4vybfoxdaZiL9Bg==">
Could someone help me out?
It is very difficult to help without the username/password. Those hidden input fields are taken into account (submitted) by the form-submit feature, though.
Could you try out pre-release 6.0.0? I released it yesterday and it's stuffed with extra logging and debug information! Also check out the updated troubleshooting part in the readme.
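Submitting hidden inputs such as authenticity_token generally works like this: read every named input of the form from the freshly fetched page, then overlay the values the user supplies, so a token that changes on every reload is always the current one. Such tokens are usually tied to the session cookie, so the page fetch and the POST should share one session. A minimal stdlib sketch of that general technique, assuming the first form on the page is the login form; FirstFormInputs and build_login_payload are hypothetical names, and this is not Multiscrape's actual code.

```python
from html.parser import HTMLParser
from typing import Dict


class FirstFormInputs(HTMLParser):
    """Record name -> value for every named <input> inside the first <form>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_form = False
        self.form_seen = False
        self.fields: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form" and not self.form_seen:
            self.in_form = self.form_seen = True
        elif tag == "input" and self.in_form and attributes.get("name"):
            # Hidden fields keep the value the server put in the page.
            self.fields[attributes["name"]] = attributes.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_payload(html: str, user_input: Dict[str, str]) -> Dict[str, str]:
    """Page defaults (including hidden tokens) first, user-supplied values on top."""
    parser = FirstFormInputs()
    parser.feed(html)
    parser.close()
    payload = dict(parser.fields)
    payload.update(user_input)
    return payload
```

The result would then be sent as the form body of the POST; because it is rebuilt from each new page load, the authenticity_token never goes stale.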
Installed 6.0.0. The log, amongst other things, returns the following:
2022-01-14 22:32:11 DEBUG (MainThread) [custom_components.multiscrape] # Start loading multiscrape
2022-01-14 22:32:11 DEBUG (MainThread) [custom_components.multiscrape] # Reload service registered
2022-01-14 22:32:11 DEBUG (MainThread) [custom_components.multiscrape] # Start processing config from configuration.yaml
2022-01-14 22:32:11 DEBUG (MainThread) [custom_components.multiscrape] # Found no name for scraper, generated a unique name: Scraper_noname_0
2022-01-14 22:32:11 DEBUG (MainThread) [custom_components.multiscrape] Scraper_noname_0 # Setting up multiscrape with config:
OrderedDict([('resource', 'https://secure.powershop.co.nz/customers/REDACTED/balance'), ('scan_interval', datetime.timedelta(seconds=3600)), ('form_submit', OrderedDict([('submit_once', True), ('resource', 'https://secure.powershop.co.nz/customers/REDACTED/balance'), ('select', '#Container2'), ('input', OrderedDict([('username', '[email protected]'), ('password', 'REDACTED')])), ('resubmit_on_error', True)])), ('sensor', [OrderedDict([('unique_id', 'powershop_daily_consumption'), ('name', 'Powershop daily consumption'), ('select', Template("#unit-balance-container > div.estimated-cost.white-box > p > span")), ('on_error', OrderedDict([('log', 'warning'), ('value', 'last')])), ('force_update', False)])]), ('timeout', 10), ('log_response', False), ('parser', 'lxml'), ('verify_ssl', True), ('method', 'GET')])
2022-01-14 22:32:11 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Initializing scraper
2022-01-14 22:32:11 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Found form-submit config
2022-01-14 22:32:11 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Refresh triggered
2022-01-14 22:32:11 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Continue with form-submit
2022-01-14 22:32:11 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Requesting page with form from: https://secure.powershop.co.nz/customers/REDACTED/balance
2022-01-14 22:32:11 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Executing form_page-request with a GET to url: https://secure.powershop.co.nz/customers/REDACTED/balance.
2022-01-14 22:32:13 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Response status code received: 302
2022-01-14 22:32:13 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Start trying to capture the form in the page
2022-01-14 22:32:13 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Parse HTML with BeautifulSoup parser lxml
2022-01-14 22:32:13 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Try to find form with selector #Container2
2022-01-14 22:32:13 INFO (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Unable to extract form data from.
2022-01-14 22:32:13 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Exception extracing form data: list index out of range
2022-01-14 22:32:13 ERROR (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Exception in form-submit feature. Will continue trying to scrape target page
2022-01-14 22:32:13 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Updating data from https://secure.powershop.co.nz/customers/REDACTED/balance
2022-01-14 22:32:13 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Executing page-request with a get to url: https://secure.powershop.co.nz/customers/REDACTED/balance.
2022-01-14 22:32:13 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Response status code received: 302
2022-01-14 22:32:13 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Data succesfully refreshed. Sensors will now start scraping to update.
2022-01-14 22:32:13 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Start loading the response in BeautifulSoup.
2022-01-14 22:32:13 DEBUG (MainThread) [custom_components.multiscrape] Finished fetching scraper data data in 2.041 seconds (success: True)
2022-01-14 22:32:26 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Powershop daily consumption # Setting up sensor
2022-01-14 22:32:26 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Powershop daily consumption # Start scraping to update sensor
2022-01-14 22:32:26 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Powershop daily consumption # Select selected tag: None
2022-01-14 22:32:26 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Exception occurred while scraping, will try to resubmit the form next interval.
2022-01-14 22:32:26 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Powershop daily consumption # Exception selecting sensor data: 'NoneType' object has no attribute 'name'
HINT: Use debug logging and log_response for further investigation!
2022-01-14 22:32:26 WARNING (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Powershop daily consumption # Unable to extract data
2022-01-14 22:32:26 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Powershop daily consumption # On-error, keep old value: None
2022-01-14 22:32:26 DEBUG (MainThread) [custom_components.multiscrape.entity] Scraper_noname_0 # Powershop daily consumption # Updated sensor and attributes, now adding to HA
So it's stuck finding the form because #Container2 is pointing to a div and not to the form. Also, the input field has id email instead of username.
Try this:
select: "#Container2 > div > form"
input:
  email: "user"
  password: "password"
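The reply above points out that #Container2 is a div with the form nested inside it, and that the credential field is email rather than username. When guessing selectors and input names like this, it can help to list the forms a login page actually contains. The stdlib-only Python sketch below does that for a saved copy of the page; list_forms is a hypothetical helper, and the HTML in the test is a simplified stand-in, not Powershop's real markup. The name attribute it reports is what a browser posts, so that is presumably the key that belongs under input:.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class FormLister(HTMLParser):
    """List every <form> on a page with its id, action and input names."""

    def __init__(self) -> None:
        super().__init__()
        self.forms: List[Dict] = []
        self._current: Optional[Dict] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self._current = {"id": attributes.get("id"),
                             "action": attributes.get("action"),
                             "inputs": []}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attributes.get("name"):
            self._current["inputs"].append(attributes["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html: str) -> List[Dict]:
    """Return one dict per form, e.g. {'id': None, 'action': '/session', 'inputs': [...]}."""
    lister = FormLister()
    lister.feed(html)
    lister.close()
    return lister.forms
```

An empty result for a page where you expected a login form is a useful hint in itself: the fetched page may be a redirect or be rendered by JavaScript.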
Anyone else having trouble with debugging?
I'm trying to find out why scraping doesn't work, but even though I've enabled debugging as per the user guide, when I add log_response: file, I get a problem when validating the config:
Invalid config for [multiscrape]: [log_response] is an invalid option for [multiscrape]. Check: multiscrape->multiscrape->0->log_response. (See /config/configuration.yaml, line 114).
I even get the same issue with adding log_response to the sample config:
multiscrape:
  - resource: https://www.home-assistant.io
    scan_interval: 3600
    log_response: file
    sensor:
      - unique_id: ha_latest_version
        name: Latest version
        select: ".current-version > h1:nth-child(1)"
        value_template: '{{ (value.split(":")[1]) }}'
      - unique_id: ha_release_date
        icon: >-
          {% if is_state('binary_sensor.ha_version_check', 'on') %}
            mdi:alarm-light
          {% else %}
            mdi:bat
          {% endif %}
        name: Release date
        select: ".release-date"
This option has been added in the latest pre-release (6.0.0). So you need to either enable the pre-release in HACS or wait for the final release.
Never mind, I see I forgot to update the README. The responses are always written to files, so the value should just be "True" instead of "file".
Update: the readme on github has been updated.
It works with 6.0.0 and True. I managed to figure out the problem: the URL contained special UTF characters. Thanks for the quick help!
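Special characters in a URL are typically only safe once they are percent-encoded as UTF-8 bytes. The Python sketch below shows that encoding step with urllib.parse; it is an assumption that a pre-encoded URL would have worked in the resource option, since the thread only reports that the problem came from the special characters. encode_url is a hypothetical helper name.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query of a URL.

    '%' is kept as a safe character so existing escapes are not encoded twice.
    The host name is left untouched; internationalised domains need IDNA instead.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

For example, encode_url("https://example.com/straße?q=ä") gives "https://example.com/stra%C3%9Fe?q=%C3%A4", while an already encoded URL such as "https://example.com/a%20b?x=1&y=2" passes through unchanged.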
Need help with a multiscrape sensor.
multiscrape:
  - resource: "https://www.amaysim.com.au/my-account/my-amaysim/products"
    name: Amaysim
    scan_interval: 30
    log_response: true
    method: GET
    form_submit:
      submit_once: true
      resubmit_on_error: false
      resource: "https://accounts.amaysim.com.au/identity/login"
      select: "#new_session"
      input:
        username: !secret amaysim_username
        password: !secret amaysim_password
    sensor:
      - select: "#outer_wrap > div.inner-wrap > div.page-container > div:nth-child(2) > div.row.margin-bottom > div.small-12.medium-6.columns > div > div > div:nth-child(2) > div:nth-child(2)"
        name: amaysim_remaining_data
        value_template: "{{ value }}"
I can see from the logs that after successfully submitting the form, it says that it is getting the data from the resource url, with a response code of 200.
I have pasted the contents of the log_response file page_soup.txt below:
<html><body><p>/**/('OK')</p></body></html>
Below is the content of form_submit_response_body.txt:
<html><body>You are being <a href="https://accounts.amaysim.com.au/identity">redirected</a>.</body></html>
It seems that after submitting the form, the sensor is scraping data from the intermediate page and not from the resource url.
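The saved form_submit_response_body.txt is a stock "You are being redirected" body pointing to https://accounts.amaysim.com.au/identity, not to the products page, which seems consistent with that reading. When going through several saved responses, a tiny helper can flag such redirect stubs automatically. This is a hypothetical triage sketch, not part of Multiscrape, and the regular expression only covers the exact wording shown above.

```python
import re
from typing import Optional

# Matches bodies like: You are being <a href="https://...">redirected</a>.
REDIRECT_STUB = re.compile(
    r'You are being\s*<a href="([^"]+)">\s*redirected\s*</a>', re.IGNORECASE)


def redirect_target(body: str) -> Optional[str]:
    """Return the link target if body is a 'You are being redirected' stub, else None."""
    match = REDIRECT_STUB.search(body)
    return match.group(1) if match else None
```

Run on the two files above, it returns the /identity URL for form_submit_response_body.txt and None for page_soup.txt.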
@danieldotnl First, let me thank you for this great component! I'm using form submit and need to extract an XSRF-TOKEN and a cookie, which show up in "page_response_headers.txt", into a sensor (or sensor attribute). Is there an easy way? With those I would then call the endpoint address (JSON API) to grab the information I need, since the page content is created with JavaScript. Can you help?
Thank you!
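Outside Multiscrape, pulling those values out of a saved header dump is straightforward with Python's standard library. This sketch assumes page_response_headers.txt holds one "Name: value" header per line (the actual dump format may differ) and parses each Set-Cookie header with http.cookies.SimpleCookie; cookies_from_headers is a hypothetical helper. Whether Multiscrape itself can expose these values as a sensor or attribute is a separate question the sketch does not answer.

```python
from http.cookies import SimpleCookie
from typing import Dict


def cookies_from_headers(header_text: str) -> Dict[str, str]:
    """Collect cookie name -> value from 'Set-Cookie: ...' lines in a header dump.

    Assumes one 'Name: value' header per line; other headers are ignored.
    """
    jar = SimpleCookie()
    for line in header_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "set-cookie":
            # load() understands attributes such as Path and flags such as HttpOnly.
            jar.load(value.strip())
    return {key: morsel.value for key, morsel in jar.items()}
```

cookies_from_headers(text)["XSRF-TOKEN"] would then give the token to send along with the follow-up API request.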
I am trying to scrape the XML file from my HP Envy printer, but every element name carries a namespace prefix (dd:, dd2:, pudyn:). This is a small sample of what I get from the printer:
<?xml version="1.0" encoding="UTF-8"?>
<!--THIS DATA SUBJECT TO DISCLAIMER(S) INCLUDED WITH THE PRODUCT OF ORIGIN.-->
<pudyn:ProductUsageDyn xsi:schemaLocation="http://www.hp.com/schemas/imaging/con/ledm/productusagedyn/2007/12/11 ../schemas/ProductUsageDyn.xsd" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/" xmlns:dd2="http://www.hp.com/schemas/imaging/con/dictionaries/2008/10/10" xmlns:pudyn="http://www.hp.com/schemas/imaging/con/ledm/productusagedyn/2007/12/11" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dd:Version>
<dd:Revision>SVN-IPG-LEDM.119</dd:Revision>
<dd:Date>2010-08-31</dd:Date>
</dd:Version>
<pudyn:PrinterSubunit>
<dd:TotalImpressions PEID="5082">369</dd:TotalImpressions>
<dd:MonochromeImpressions>0</dd:MonochromeImpressions>
<dd:ColorImpressions>135</dd:ColorImpressions>
<dd:A4EquivalentImpressions>
<dd:TotalImpressions PEID="5082">369</dd:TotalImpressions>
<dd:MonochromeImpressions>0</dd:MonochromeImpressions>
</dd:A4EquivalentImpressions>
<dd:SimplexSheets>64</dd:SimplexSheets>
<dd:DuplexSheets PEID="5088">143</dd:DuplexSheets>
<dd:JamEvents PEID="16076">3</dd:JamEvents>
<dd:MispickEvents>2</dd:MispickEvents>
<dd:TotalFrontPanelCancelPresses PEID="30033">4</dd:TotalFrontPanelCancelPresses>
<pudyn:UsageByMarkingAgent>
<dd2:CumulativeMarkingAgentUsed PEID="64100">
<dd:ValueFloat>12</dd:ValueFloat>
<dd:Unit>milliliters</dd:Unit>
</dd2:CumulativeMarkingAgentUsed>
<dd2:CumulativeHPMarkingAgentUsed PEID="64101">
<dd:ValueFloat>12</dd:ValueFloat>
<dd:Unit>milliliters</dd:Unit>
</dd2:CumulativeHPMarkingAgentUsed>
<dd:CumulativeHPMarkingAgentInserted PEID="64001">
<dd:ValueFloat>14</dd:ValueFloat>
<dd:Unit>milliliters</dd:Unit>
</dd:CumulativeHPMarkingAgentInserted>
</pudyn:UsageByMarkingAgent>
</pudyn:PrinterSubunit>
</pudyn:ProductUsageDyn>
I have created the following sensor:
- resource: http://10.0.0.97/DevMgmt/ProductUsageDyn.xml
  scan_interval: 10
  method: get
  sensor:
    - name: HP Printer Total Impressions
      unique_id: hp_printer_total_impressions
      select: "TotalImpressions"
But I get an error:
2022-02-11 15:41:46 DEBUG (MainThread) [custom_components.multiscrape.scraper] Updating from http://10.0.0.97/DevMgmt/ProductUsageDyn.xml
2022-02-11 15:41:47 DEBUG (MainThread) [custom_components.multiscrape] Finished fetching scraper data data in 1.857 seconds (success: True)
2022-02-11 15:41:47 DEBUG (MainThread) [custom_components.multiscrape.sensor] Exception selecting sensor data: list index out of range
2022-02-11 15:41:47 ERROR (MainThread) [custom_components.multiscrape.sensor] Sensor HP Printer Total Impressions was unable to extract data from HTML
How do I select elements within XML?
I have found a HACS HP Printer integration that parses the XML, but I would still like to know how to parse the above XML myself, because my printer includes details on what type of paper has been used, and the HACS HP Printer integration does not export those values.
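Because every tag in this XML is namespaced, a bare TotalImpressions does not match the element dd:TotalImpressions. The standalone Python sketch below reads the sample with xml.etree.ElementTree and a prefix-to-URI map copied from the xmlns declarations; it shows XML parsing in general, not a Multiscrape selector, and how Multiscrape's HTML parser treats the prefixes is not covered here. The function and key names are hypothetical, and the paper-type elements are not in the sample, so they are not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Prefix -> namespace URI, copied from the xmlns declarations in the printer's XML.
NS = {
    "dd": "http://www.hp.com/schemas/imaging/con/dictionaries/1.0/",
    "dd2": "http://www.hp.com/schemas/imaging/con/dictionaries/2008/10/10",
    "pudyn": "http://www.hp.com/schemas/imaging/con/ledm/productusagedyn/2007/12/11",
}


def _number(parent: ET.Element, path: str) -> float:
    """Return the text at a namespaced path as a float, or raise if it is missing."""
    text = parent.findtext(path, namespaces=NS)
    if text is None:
        raise ValueError(f"element not found: {path}")
    return float(text)


def read_usage(xml_text: str) -> Dict[str, float]:
    # Encode first so the declaration's encoding="UTF-8" matches the bytes parsed.
    root = ET.fromstring(xml_text.encode("utf-8"))
    unit = root.find("pudyn:PrinterSubunit", NS)
    if unit is None:
        raise ValueError("pudyn:PrinterSubunit not found")
    return {
        # Direct child only, so the copy under A4EquivalentImpressions is skipped.
        "total_impressions": _number(unit, "dd:TotalImpressions"),
        "color_impressions": _number(unit, "dd:ColorImpressions"),
        "duplex_sheets": _number(unit, "dd:DuplexSheets"),
        "ink_used_ml": _number(
            unit, "pudyn:UsageByMarkingAgent/dd2:CumulativeMarkingAgentUsed/dd:ValueFloat"),
    }
```

For the sample above this gives 369.0 total impressions and 12.0 ml of ink used; an attribute such as PEID can be read with element.get("PEID") once the element is found.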