Multiscrape / Scrape from VRM Victron

Hey Guys,
I need some help with Scrape. I’ve been trying to solve this problem for days.

I want to scrape the value “22” from Victron’s VRM site.
The selector is:
dashboard-wrapper > div.container.vrm-dashboard > div > div.col-xs-12.col-md-4 > div > div:nth-child(2) > div > div > div.summary

My code (not working) is:

  - resource: https://vrm.victronenergy.com/installation/**XXXXXXXXX**/overview
    scan_interval: 200
    form_submit:
      submit_once: True
      resource: https://vrm.victronenergy.com/login
      select: "#login-form"
      input:
        Email: "**Mail**"
        Password: "**Password**"
        extra: field
    sensor:
      - unique_id: victron_test
        select: "div > div.col-xs-12.col-md-4 > div > div:nth-child(2) > div > div > div.summary"
        name: Victron Test 

Please help me :joy: I’m really desperate

I forgot to say: I have to log in. Attached is the login window.

Could you post your multiscrape logging?

Hey Daniel,
this is what I get from the log:
(I had to modify the links/addresses in this post, otherwise I couldn’t reply to you: htXXtps = https)
.
.
.
.
2023-01-10 21:46:35.853 INFO (MainThread) [homeassistant.setup] Setting up multiscrape
2023-01-10 21:46:35.854 DEBUG (MainThread) [custom_components.multiscrape] # Start loading multiscrape
2023-01-10 21:46:35.855 DEBUG (MainThread) [custom_components.multiscrape] # Reload service registered
2023-01-10 21:46:35.855 DEBUG (MainThread) [custom_components.multiscrape] # Start processing config from configuration.yaml
2023-01-10 21:46:35.855 DEBUG (MainThread) [custom_components.multiscrape] # Found no name for scraper, generated a unique name: Scraper_noname_0
2023-01-10 21:46:35.855 DEBUG (MainThread) [custom_components.multiscrape] Scraper_noname_0 # Setting up multiscrape with config:
OrderedDict([(‘resource’, ‘htXXtps://vrm.victronenergy.com/installation/XXXXXXX/overview’), (‘scan_interval’, datetime.timedelta(seconds=200)), (‘form_submit’, OrderedDict([(‘submit_once’, True), (‘resource’, ‘htXXtps://vrm.victronenergy.com/login’), (‘select’, ‘#login-form’), (‘input’, OrderedDict([(‘Email’, ‘XXX’), (‘Password’, ‘XX’), (‘extra’, ‘field’)])), (‘resubmit_on_error’, True), (‘input_filter’, )])), (‘sensor’, [OrderedDict([(‘unique_id’, ‘victron_test’), (‘select’, Template(“div > div.col-xs-12.col-md-4 > div > div:nth-child(2) > div > div > div.summary”)), (‘name’, ‘Victron Test’), (‘force_update’, False)])]), (‘timeout’, 10), (‘parser’, ‘lxml’), (‘method’, ‘GET’), (‘list_separator’, ‘,’), (‘verify_ssl’, True), (‘log_response’, False)])
2023-01-10 21:46:35.885 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Initializing http wrapper
2023-01-10 21:46:35.886 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Initializing form submitter
2023-01-10 21:46:35.886 DEBUG (MainThread) [custom_components.multiscrape] Scraper_noname_0 # Initializing scraper
2023-01-10 21:46:35.886 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Initializing scraper
2023-01-10 21:46:35.886 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Initializing http wrapper
2023-01-10 21:46:35.886 DEBUG (MainThread) [custom_components.multiscrape] Scraper_noname_0 # Initializing coordinator
2023-01-10 21:46:35.901 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # New run: start (re)loading data from resource
2023-01-10 21:46:35.901 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Rendered resource template into: htXXtps://vrm.victronenergy.com/installation/xxxx/overview
2023-01-10 21:46:35.901 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Starting with form-submit
2023-01-10 21:46:35.901 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Requesting page with form from: htXXtps://vrm.victronenergy.com/login
2023-01-10 21:46:35.902 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing form_page-request with a GET to url: htXXtps://vrm.victronenergy.com/login.

2023-01-10 21:46:48.019 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Response status code received: 200
2023-01-10 21:46:48.020 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Parse page with form with BeautifulSoup parser lxml
2023-01-10 21:46:48.032 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Try to find form with selector #login-form
2023-01-10 21:46:48.035 ERROR (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Exception in form-submit feature. Will continue trying to scrape target page.
Could not find form
2023-01-10 21:46:48.038 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Request data from htXXtps://vrm.victronenergy.com/installation/XXXX/overview
2023-01-10 21:46:48.038 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing page-request with a get to url: htXXtps://vrm.victronenergy.com/installation/xxxx/overview.

It’s not finding the form because that selector is not valid. Try this instead:
select: form.ng-pristine

Hello Daniel,
I tried your suggestion: (Links changed https–>httXXps)


  - resource: httXXps://vrm.victronenergy.com/installation/XX/overview
    scan_interval: 200
    form_submit:
      submit_once: True
      resource: httXXps://vrm.victronenergy.com/login
      select: form.ng-pristine
      input:
        Email: "XXXX"
        Password: "XXX"
        extra: field
    sensor:
      - unique_id: victron_test
        select: "div > div.col-xs-12.col-md-4 > div > div:nth-child(2) > div > div > div.summary"
        name: Victron Test 

Unfortunately, still no success:

2023-01-11 22:18:03.019 INFO (MainThread) [homeassistant.setup] Setting up multiscrape
2023-01-11 22:18:03.019 DEBUG (MainThread) [custom_components.multiscrape] # Start loading multiscrape
2023-01-11 22:18:03.020 DEBUG (MainThread) [custom_components.multiscrape] # Reload service registered
2023-01-11 22:18:03.020 DEBUG (MainThread) [custom_components.multiscrape] # Start processing config from configuration.yaml
2023-01-11 22:18:03.020 DEBUG (MainThread) [custom_components.multiscrape] # Found no name for scraper, generated a unique name: Scraper_noname_0
2023-01-11 22:18:03.020 DEBUG (MainThread) [custom_components.multiscrape] Scraper_noname_0 # Setting up multiscrape with config:
OrderedDict([(‘resource’, ‘htXXtps://vrm.victronenergy.com/installation/XXXXXX/overview’), (‘scan_interval’, datetime.timedelta(seconds=200)), (‘form_submit’, OrderedDict([(‘submit_once’, True), (‘resource’, ‘htXXtps://vrm.victronenergy.com/login’), (‘select’, ‘form.ng-pristine’), (‘input’, OrderedDict([(‘Email’, ‘XXXXX’), (‘Password’, ‘XXXXX’), (‘extra’, ‘field’)])), (‘resubmit_on_error’, True), (‘input_filter’, [])])), (‘sensor’, [OrderedDict([(‘unique_id’, ‘victron_test’), (‘select’, Template(“div > div.col-xs-12.col-md-4 > div > div:nth-child(2) > div > div > div.summary”)), (‘name’, ‘Victron Test’), (‘force_update’, False)])]), (‘timeout’, 10), (‘parser’, ‘lxml’), (‘list_separator’, ‘,’), (‘verify_ssl’, True), (‘log_response’, False), (‘method’, ‘GET’)])
2023-01-11 22:18:03.048 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Initializing http wrapper
2023-01-11 22:18:03.048 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Initializing form submitter
2023-01-11 22:18:03.048 DEBUG (MainThread) [custom_components.multiscrape] Scraper_noname_0 # Initializing scraper
2023-01-11 22:18:03.048 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Initializing scraper
2023-01-11 22:18:03.049 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Initializing http wrapper
2023-01-11 22:18:03.049 DEBUG (MainThread) [custom_components.multiscrape] Scraper_noname_0 # Initializing coordinator
2023-01-11 22:18:03.053 INFO (MainThread) [homeassistant.setup] Setting up media_player
2023-01-11 22:18:03.070 INFO (MainThread) [homeassistant.setup] Setup of domain media_player took 0.0 seconds
2023-01-11 22:18:03.086 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # New run: start (re)loading data from resource
2023-01-11 22:18:03.086 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Rendered resource template into: httXXps://vrm.victronenergy.com/installation/XXXXX/overview
2023-01-11 22:18:03.086 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Starting with form-submit
2023-01-11 22:18:03.086 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Requesting page with form from: httXXps://vrm.victronenergy.com/login
2023-01-11 22:18:03.086 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing form_page-request with a GET to url: httXXps://vrm.victronenergy.com/login.

.
.
.
.
.

.
2023-01-11 22:18:18.163 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Victron Test # Start scraping to update sensor
2023-01-11 22:18:18.166 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Victron Test # Tag selected: None
2023-01-11 22:18:18.166 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Exception occurred while scraping, will try to resubmit the form next interval.
2023-01-11 22:18:18.167 ERROR (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Victron Test # Unable to scrape data: Could not find a tag for given selector
Consider using debug logging and log_response for further investigation.
2023-01-11 22:18:18.174 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Victron Test # On-error, set value to None
2023-01-11 22:18:18.174 DEBUG (MainThread) [custom_components.multiscrape.entity] Scraper_noname_0 # Victron Test # Updated sensor and attributes, now adding to HA
.
.
.
Do you have any other ideas?
Thanks

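It looks like the form is loaded dynamically, so it is not in the HTML that multiscrape fetches. Leave out the form selector, pass all the input fields in the config, and see what this does: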

    form_submit:
      submit_once: True
      resource: https://vrm.victronenergy.com/login
      input:
        password: "XXX"
        remember_me: "false"
        sms_token: "null"
        username: "XXXX"
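
To check whether a page is built by JavaScript, you can look for a `<form>` tag in the raw HTML the server returns. The snippet below is only a rough sketch using Python's standard library. The two HTML strings are made-up examples, not the real VRM markup, and `find_forms` is a helper name I chose for illustration. If the list comes back empty for the real login page, no CSS selector will find the form, because the form only exists after the browser has run the page's scripts.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class FormFinder(HTMLParser):
    """Collects the attributes of every <form> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.forms: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "form":
            self.forms.append(dict(attrs))


def find_forms(html: str) -> List[Dict[str, Optional[str]]]:
    """Return one attribute dict per <form> found in the given HTML."""
    finder = FormFinder()
    finder.feed(html)
    finder.close()
    return finder.forms


# Hypothetical responses for illustration only.
static_page = (
    '<html><body><form id="login-form" class="ng-pristine">'
    '<input name="Email"></form></body></html>'
)
js_page = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)

print(find_forms(static_page))  # a form is present in the served HTML
print(find_forms(js_page))      # no form: it would be rendered client-side
```

To try it on the real page, save the page source (what the server sends, not the rendered DOM from the browser's inspector) and pass it to `find_forms`.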

Hey Daniel
I tried your suggestion. I think the error has changed now. Could you check the logs?
.
.
.

2023-01-12 22:57:19.167 INFO (MainThread) [homeassistant.setup] Setting up multiscrape
2023-01-12 22:57:19.168 DEBUG (MainThread) [custom_components.multiscrape] # Start loading multiscrape
2023-01-12 22:57:19.168 DEBUG (MainThread) [custom_components.multiscrape] # Reload service registered
2023-01-12 22:57:19.168 DEBUG (MainThread) [custom_components.multiscrape] # Start processing config from configuration.yaml
2023-01-12 22:57:19.168 DEBUG (MainThread) [custom_components.multiscrape] # Found no name for scraper, generated a unique name: Scraper_noname_0
2023-01-12 22:57:19.169 DEBUG (MainThread) [custom_components.multiscrape] Scraper_noname_0 # Setting up multiscrape with config:
OrderedDict([(‘resource’, ‘httXXps://vrm.victronenergy.com/installation/XXX/overview’), (‘scan_interval’, datetime.timedelta(seconds=200)), (‘form_submit’, OrderedDict([(‘submit_once’, True), (‘resource’, ‘httXXps://vrm.victronenergy.com/login’), (‘input’, OrderedDict([(‘password’, ‘XXX’), (‘remember_me’, ‘false’), (‘sms_token’, ‘null’), (‘username’, ‘XXX’)])), (‘resubmit_on_error’, True), (‘input_filter’, )])), (‘sensor’, [OrderedDict([(‘unique_id’, ‘victron_test’), (‘select’, Template(“div > div.col-xs-12.col-md-4 > div > div:nth-child(2) > div > div > div.summary”)), (‘name’, ‘Victron Test’), (‘force_update’, False)])]), (‘timeout’, 10), (‘list_separator’, ‘,’), (‘verify_ssl’, True), (‘parser’, ‘lxml’), (‘log_response’, False), (‘method’, ‘GET’)])
2023-01-12 22:57:19.194 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Initializing http wrapper
2023-01-12 22:57:19.195 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Initializing form submitter
2023-01-12 22:57:19.195 DEBUG (MainThread) [custom_components.multiscrape] Scraper_noname_0 # Initializing scraper
2023-01-12 22:57:19.195 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Initializing scraper
2023-01-12 22:57:19.195 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Initializing http wrapper
2023-01-12 22:57:19.195 DEBUG (MainThread) [custom_components.multiscrape] Scraper_noname_0 # Initializing coordinator
2023-01-12 22:57:19.292 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # New run: start (re)loading data from resource
2023-01-12 22:57:19.292 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Rendered resource template into: httXXps://vrm.victronenergy.com/installation/XXXX/overview
2023-01-12 22:57:19.292 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Starting with form-submit
2023-01-12 22:57:19.292 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Skip scraping form, assuming all input is given in config.
2023-01-12 22:57:19.292 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Merged input fields with input data in config. Result: {‘password’: ‘XXXX’, ‘remember_me’: ‘false’, ‘sms_token’: ‘null’, ‘username’: ‘XXXX’}
2023-01-12 22:57:19.292 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Determined the url to submit the form to: httXXps://vrm.victronenergy.com/login
2023-01-12 22:57:19.292 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Submitting the form
2023-01-12 22:57:19.293 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing form_submit-request with a POST to url: httXXps://vrm.victronenergy.com/login.
.
.
.
.
2023-01-12 22:57:22.428 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Response status code received: 405
2023-01-12 22:57:22.428 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Error executing POST request to url: httXXps://vrm.victronenergy.com/login.
Error message:
HTTPStatusError(“Client error ‘405 Not Allowed’ for url ‘httXXps://vrm.victronenergy.com/login’\nFor more information check: httXXps://httpstatuses.com/405”)
2023-01-12 22:57:22.428 ERROR (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Exception in form-submit feature. Will continue trying to scrape target page.
Client error ‘405 Not Allowed’ for url ‘httXXps://vrm.victronenergy.com/login’
For more information check: httXXps://httpstatuses.com/405
2023-01-12 22:57:22.434 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Request data from httXXps://vrm.victronenergy.com/installation/XX/overview
2023-01-12 22:57:22.434 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing page-request with a get to url: httXXps://vrm.victronenergy.com/installation/XX/overview.
.
.
.
.
2023-01-12 22:57:24.527 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Victron Test # Start scraping to update sensor
2023-01-12 22:57:24.531 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Victron Test # Tag selected: None
2023-01-12 22:57:24.531 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Exception occurred while scraping, will try to resubmit the form next interval.
2023-01-12 22:57:24.534 ERROR (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Victron Test # Unable to scrape data: Could not find a tag for given selector
Consider using debug logging and log_response for further investigation.
2023-01-12 22:57:24.538 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Victron Test # On-error, set value to None
2023-01-12 22:57:24.538 DEBUG (MainThread) [custom_components.multiscrape.entity] Scraper_noname_0 # Victron Test # Updated sensor and attributes, now adding to HA

Too bad :frowning: but I’m afraid that unless you can provide me with valid credentials, I can’t help you any further…

Hey Daniel
I created a test account at VRM:
https://vrm.victronenergy.com/login

Username: multiscrape.test(at)gmail.com
PW: Homeassistant1!

Maybe we can approach the problem step by step and solve the login issue with this account first.
It’s still an empty account, so maybe we can scrape the header “My installations”:

div.container > div.row.vrm-installation-header-row > div.col-sm-7 > page-title > h1 > span:nth-child(2)

Once more, thanks a lot for the support :pray:

Daniel

Hello Daniel,
Did you have a chance to look at the login scrape?
Best regards

I had a more in-depth look and found that the site uses OAuth authentication. That means you first need to request a token and then send that token in the header of every subsequent request. The token also needs to be refreshed once it expires.
That’s pretty complicated and a big effort, and I’m not at all sure that if I build this for this site, it will work for all sites using OAuth.
So I’m sorry, but this one is a ‘nice for when I’m bored’ :slight_smile:
Of course, if you are able to, feel free to develop it and create a PR.
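
For anyone who wants to pick this up, the token handling described above could be sketched roughly like this. It is not multiscrape code and not the VRM API. `TokenCache`, `fetch_token` and the `margin` parameter are names I made up for illustration, and the `Authorization: Bearer ...` header is just the common convention, so the actual site may expect something different. The real login request is left out on purpose and is passed in as a function.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TokenCache:
    """Caches a token and requests a new one shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
        margin: float = 30.0,
    ) -> None:
        # fetch_token is a hypothetical callable that performs the login
        # request and returns (token, lifetime_in_seconds).
        self._fetch_token = fetch_token
        self._clock = clock
        self._margin = margin  # refresh this many seconds before expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, fetching a new one when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token

    def auth_headers(self) -> Dict[str, str]:
        """Headers to add to each request (assumed Bearer scheme)."""
        return {"Authorization": f"Bearer {self.get()}"}


if __name__ == "__main__":
    # Stand-in for a real login call, so the sketch runs without network.
    def fake_login() -> Tuple[str, float]:
        return ("example-token", 3600.0)

    cache = TokenCache(fake_login)
    print(cache.auth_headers())
```

Passing the login step in as a function keeps the caching and refresh logic independent of any one site, which is the part that would have to be generic for this to work in a scraper.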

Hey Daniel,
I have a second project running, also without success. Could you check my code?

Thanks and greetings