Issue with Multiscrape / Scrape

Hi,

I am trying to scrape a few values from this website: LOGIN - Remocon NET

As you can see this requires a login which I think I have figured out with this configuration:

# Elco Remocon
multiscrape:
  - resource: 'https://www.remocon-net.remotethermo.com/BsbPlantDashboard/Index/F0AD4E0B7C60'
    scan_interval: 5
    form_submit:
      submit_once: True
      resource: 'https://www.remocon-net.remotethermo.com/Account/Login'
      select: "#login-form"
      input:
        Email: myUser
        Password: 'myPassword'
        extra: field
    sensor:
      - select: '#contentWrapper > div.container-fluid > table > tr > td:nth-child(1) > h5:nth-child(2)'
      # select: '#content > div:nth-child(2) > div > table > tr:nth-child(3) > td:nth-child(2) > div > table > tr > td:nth-child(2) > div > div > div'
        name: myvalue

This provides a value back as expected. But the issue is with some other values on the site which I can’t get displayed. The value I want to get is the temperature value as highlighted below:

If I use Copy selector in the browser developer tools I get:

#content > div:nth-child(2) > div > table > tbody > tr:nth-child(3) > td:nth-child(2) > div > table > tbody > tr > td:nth-child(2) > div > div > div

Trying this in the config gives me the following error in the logs (which I assume could be from this

2021-08-19 14:57:22 ERROR (MainThread) [custom_components.multiscrape.sensor] Sensor myvalue was unable to extract data from HTML
2021-08-19 14:57:22 ERROR (MainThread) [aiohttp.server] Unhandled exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1183, in _sendfile_fallback
    read = await self.run_in_executor(None, file.readinto, view)
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py", line 485, in start
    resp, reset = await task
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py", line 440, in _handle_request
    reset = await self.finish_response(request, resp, start_time)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py", line 591, in finish_response
    await prepare_meth(request)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_fileresponse.py", line 241, in prepare
    return await self._sendfile(request, fobj, offset, count)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_fileresponse.py", line 96, in _sendfile
    await loop.sendfile(transport, fobj, offset, count)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1162, in sendfile
    return await self._sendfile_fallback(transport, file,
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1192, in _sendfile_fallback
    await proto.restore()
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 263, in restore
    self._transport.resume_reading()
  File "/usr/local/lib/python3.9/asyncio/sslproto.py", line 343, in resume_reading
    self._ssl_protocol._transport.resume_reading()
AttributeError: 'NoneType' object has no attribute 'resume_reading'
2021-08-19 14:57:29 ERROR (MainThread) [custom_components.multiscrape.sensor] Sensor myvalue was unable to extract data from HTML

If I try the config without the two tbody elements, like this:

- select: '#content > div:nth-child(2) > div > table > tr:nth-child(3) > td:nth-child(2) > div > table > tr > td:nth-child(2) > div > div > div'

I don’t get an error in the logs but my value is empty. Any help or hint would be very much appreciated.

Thanks,
Thomas

I think the selector without the tbody elements is correct, but the value of the field is probably fetched by the browser after the initial request, through a separate JavaScript request. You might be able to confirm this by monitoring the requests in the developer tools of your browser.
If this is the case, you cannot scrape it :slightly_frowning_face:
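
On the tbody point: browsers insert tbody elements into the DOM when they parse a table, so a selector copied from the developer tools can contain steps that are missing from the raw HTML a scraper receives. A minimal sketch of cleaning such a selector, assuming it only uses the child combinator (`>`); the helper name `strip_tbody` is made up for illustration and is not part of multiscrape:

```python
def strip_tbody(selector: str) -> str:
    """Drop bare 'tbody' steps from a CSS selector built with '>' combinators.

    Assumption: the selector is a plain chain like 'a > b > c'. Selectors
    with other combinators or with '>' inside attribute values are not handled.
    """
    steps = [step.strip() for step in selector.split(">")]
    kept = [step for step in steps if step != "tbody"]
    return " > ".join(kept)


copied = (
    "#content > div:nth-child(2) > div > table > tbody > tr:nth-child(3) > "
    "td:nth-child(2) > div > table > tbody > tr > td:nth-child(2) > div > div > div"
)
cleaned = strip_tbody(copied)
print(cleaned)
```

Whether the cleaned selector then returns a value still depends on the value being present in the initial HTML at all.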

Hi Daniel,

Thanks. I think you are right that this is dynamic content which is loaded after a while. Is there no possibility of adding a delay or something similar, so that it waits until the page is completely loaded?

Thanks,
Thomas

I’m afraid it’s not that easy. It would mean that we need to interpret and execute JavaScript, which basically means building a black box browser under the hood…

I am considering adding rest sensor capabilities. This would mean that you could use the form-submit functionality to login, and then use that session to do the same request that’s normally done from Javascript.
Were you able to find that request?

Thanks. I would love to find out but I am not that skilled with the developer tools. In case you could provide some guidance I would be more than curious to find out.

OK, it seems I found the request:

image

The result is shown as:

{outsideTemp: 15, hasOutsideTempProbe: true,…}
dhwComfortTemp: {min: 44, max: 55, value: 48, prevValue: 0, step: 1}
dhwEnabled: true
dhwMode: 1
dhwStorageTemp: 49.2
dhwStorageTempError: false
flameSensor: false
hasDhwStorageProbe: true
hasOutsideTempProbe: true
heatPumpOn: false
maxDateAsText: "8/23/2022"
outsideTemp: 15
outsideTempError: false
todayAsMilliseconds: 1629669600000
todayAsText: "8/23/2021"
utcOffset: 120
zone: {holidays: [,…], mode: {allowedOptions: [0, 1, 2, 3], value: 1}, isHeatingActive: true,…}

Would that kind of work with the solution you have in mind? At least this response shows me all the values I am missing at the moment :slight_smile:
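
A rough sketch of how such a response could be turned into individual readings, assuming the raw response body is ordinary JSON with the keys shown in the preview above. The sample string only repeats a few of those values, and `flatten` is a hypothetical helper, not something multiscrape or Home Assistant provides:

```python
import json

# A few values copied from the response preview, written as strict JSON.
SAMPLE = """
{
  "outsideTemp": 15,
  "dhwStorageTemp": 49.2,
  "dhwComfortTemp": {"min": 44, "max": 55, "value": 48, "prevValue": 0, "step": 1},
  "heatPumpOn": false
}
"""


def flatten(obj: dict, prefix: str = "") -> dict:
    """Turn nested dicts into a flat dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


readings = flatten(json.loads(SAMPLE))
for name, value in readings.items():
    print(name, value)
```

Each dotted key would correspond to one candidate sensor, for example `outsideTemp` or `dhwComfortTemp.value`.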

This is excellent! And this URL is not directly accessible right? Does it give an authentication error?
I’m currently on vacation so it will take a while before I can work on it.

Thanks for your reply and enjoy your vacation. Correct, the URL

https://www.remocon-net.remotethermo.com/BsbPlantDashboard/GetPlantData/F0AD4E0B7C60?zoneNum=1&isFirstRoundTrip=true&rnd=1629792069874&_=1629792068973

is not directly accessible. If you enter the URL in a new browser tab you will be redirected to the login page.

It would be great if we could get this working, as I can imagine it would be helpful for many other people as well. Happy to test, debug, etc.
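
To make the idea concrete, here is an untested sketch of the "log in, then reuse the session" approach outside of Home Assistant, using only the Python standard library. Several things are assumptions: that the login form accepts a plain POST of the `Email` and `Password` fields from the config above (the site may also require a hidden anti-forgery token), and that `rnd` and `_` are just cache-busting timestamps. All function names are made up for illustration:

```python
import json
import time
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

BASE = "https://www.remocon-net.remotethermo.com"


def build_plant_data_url(plant_id, zone_num=1, now_ms=None):
    """Build the GetPlantData URL; rnd and _ are assumed to be cache busters."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    query = urllib.parse.urlencode(
        {"zoneNum": zone_num, "isFirstRoundTrip": "true", "rnd": now_ms, "_": now_ms}
    )
    return f"{BASE}/BsbPlantDashboard/GetPlantData/{plant_id}?{query}"


def make_session():
    """Return an opener that keeps cookies between requests."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))


def login(opener, email, password):
    """POST the login form (assumption: no extra hidden fields are needed)."""
    form = urllib.parse.urlencode({"Email": email, "Password": password}).encode()
    with opener.open(f"{BASE}/Account/Login", data=form, timeout=30) as resp:
        return resp.status


def fetch_plant_data(opener, plant_id):
    """Fetch the JSON payload, failing loudly if we were sent back to the login page."""
    with opener.open(build_plant_data_url(plant_id), timeout=30) as resp:
        if "/Account/Login" in resp.geturl():
            raise PermissionError("redirected to the login page; session not authenticated")
        return json.load(resp)
```

Usage would be along the lines of `session = make_session()`, then `login(session, "myUser", "myPassword")`, then `fetch_plant_data(session, "F0AD4E0B7C60")`.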

Hi Daniel,

I hope you had a great vacation. May I ask if you already had the chance to look into this?

Thanks.

Didn’t really have time yet, but it’s not off the radar!

Hi Daniel,

Any chance you will be able to have a look at this?

Thomas

Hi Thomas,

Did you find a solution? I can also request the values via the URL, but I have trouble creating sensors from the result.

Best,
Robert

No. No further success on my side. But how do you get the values via URL? I would be very interested in learning more about that. Do you have an example you can share? Maybe I can help with the sensors?

Hi Thomas

I did the following:

homeassistant:
  allowlist_external_dirs:
    - '/config'
sensor:
  - platform: file
    name: 'Temperature'
    file_path: /config/Elco/Elco.json
    value_template: '{{ value_json.outsideTemp }}'
    unit_of_measurement: "°C"

Note: This example reads “outsideTemp”. You can change it to any other key in the URL output.

My problem now is: how do I set up an automation so that the content from the URL is automatically downloaded to my Elco.json file :smirk:

Best Robi
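
For the "write it to a file" half of that question, a small sketch assuming the payload has already been fetched somehow and is available as a Python dict. It writes the JSON on a single line, since as far as I know the file sensor only reads the last line of the file, and it replaces the file in one step so the sensor never sees a half-written file. The function name is made up for illustration:

```python
import json
import os
import tempfile


def save_json_atomically(payload: dict, path: str) -> None:
    """Write payload as single-line JSON, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)  # no indent, so the output stays on one line
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Something like `save_json_atomically(data, "/config/Elco/Elco.json")` could then be run on a schedule by whatever tool does the download.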

Why don’t you scrape it directly with multiscrape?

As this is a dynamically generated website Multiscrape doesn’t work. It only works for static webpages.

@Robi07 Very interesting approach. May I ask how you got the elco.json file initially? What manual step do you do to have it created?

@Robi07 Depending on how you get the JSON file… the rest sensor should work if you get the JSON result from a URL. My question would be: what is the URL you get the result from?

I did a rest sensor in the past for an IP camera which looked like this - just an example:

  - platform: rest
    resource: http://192.168.178.XX/cgi-bin/api.cgi?cmd=GetMdState&channel=3&rs=%3Crandom_string_of_characters_and_numbers%3E&user=MYUSER&password=MYPASSWORD
    name: motion_garden
    scan_interval: 1
    value_template: '{{ value_json[0].value.state }}'
    json_attributes:
      - 0

Hi Thomas,

The problem is that I created the JSON file on my own, i.e. I copied the output from the URL and created a file with the .json extension in Home Assistant. My initial idea was to download the data from the URL with Sitesucker and then synchronize it with File Sync. The problem is that I can open the URL with the Sitesucker web view option, but the download button is then disabled.

I tried several things (normal scrape and rest sensor) but they did not work out. I think it’s about the form_submit function that only multiscrape offers.

Do you know a way to download the URL content with another app/software (including time scheduling options)? If so, you could set up a sync function (with File Sync) on your PC and then it should work.

The problem is that the output is shown as plain text, so the copy-selector function does not help (I only get “body > pre”). Do you know a way, with value_template or something else, to extract something out of the following picture, e.g. “outsideTemp” at the beginning?