REST Sensor nearly there but yet so far

Hi there

This is my first time using the REST sensor. I am trying to get information from a website that requires a username and password. Here is my configuration:

sensor:
  - platform: rest
    name: petrol
    resource: https://app.petrolprices.com/map?fuelType=2&brandType=0&resultLimit=0&offset=0&sortType=price&lat=51.597033&lng=-0.349733&z=11&d=2
    username: XXXX
    password: XXXX
    json_attributes_path: "$.next.*.properties"
    value_template: "OK"
    json_attributes:
      - "name"
      - "price"
    headers:
      Content-Type: application/json

When I open the resource URL in Chrome on my Windows 10 laptop after logging in, it returns the following data:

{"error":false,"limitExceed":false,"data":{"type":"FeatureCollection","features":[{"type":"Feature","geometry":{"type":"Point","coordinates":[-0.317438,51.60131]},"properties":{"price":1587,"fuel_type":2,"user_id":1153383,"recorded_time":"2022-09-27T12:38:05.000Z","user_name":"nhroach1","idstation":1884,"fuel_brand":16,"fuel_brand_name":"ASDA","name":"BELMONT SERVICE STATION","address1":"BELMONT CIRCLE","address2":"BELMONT","town":"HARROW","county":"OUTER LONDON","postcode":"HA3 8SF","google":null,"phone":null,"open_hours":null,"Monday":null,"Tuesday":null,"Wednesday":null,"Thursday":null,"Friday":null,"Saturday":null,"Sunday":null,"distance_in_miles_from_given_coords":1.42,"reviews":{"count":2,"avg_rating":5}}},{"type":"Feature","geometry":{"type":"Point","coordinates":[-0.393835,51.597948]},"properties":{"price":1599,"fuel_type":2,"user_id":1,"recorded_time":"2022-09-21T19:38:00.000Z","user_name":"news","idstation":606,"fuel_brand":2,"fuel_brand_name":"SHELL","name":"SHELL PINNER (SHELL PINNER)","address1":"PINNER GREEN","address2":"","town":"PINNER","county":"OUTER LONDON","postcode":"HA5 2AF","google":null,"phone":null,"open_hours":null,"Monday":null,"Tuesday":null,"Wednesday":null,"Thursday":null,"Friday":null,"Saturday":null,"Sunday":null,"distance_in_miles_from_given_coords":1.89,"reviews":{"count":11,"avg_rating":3.7273}}},{"type":"Feature","geometry":{"type":"Point","coordinates":[-0.360183,51.587255]},"properties":{"price":1619,"fuel_type":2,"user_id":1,"recorded_time":"2022-09-21T05:38:00.000Z","user_name":"news","idstation":1658,"fuel_brand":3,"fuel_brand_name":"ESSO","name":"ESSO STATION ROAD (MFG TEN PIN)","address1":"STATION ROAD","address2":"NORTH HARROW","town":"HARROW","county":"OUTER LONDON","postcode":"HA2 6AE","google":null,"phone":null,"open_hours":null,"Monday":null,"Tuesday":null,"Wednesday":null,"Thursday":null,"Friday":null,"Saturday":null,"Sunday":null,"distance_in_miles_from_given_coords":0.81,"reviews":{"count":5,"avg_rating":3.8}}},{"type":"Feature","geometry":{"type":"Point","coordinates":[-0.347973,51.581133]},"properties":{"price":1619,"fuel_type":2,"user_id":1,"recorded_time":"2022-09-21T12:30:00.000Z","user_name":"news","idstation":1655,"fuel_brand":2,"fuel_brand_name":"SHELL","name":"SHELL PINNER ROAD (MFG HARROW)","address1":"PINNER ROAD","address2":"WEST HARROW","town":"HARROW","county":"MIDDLESEX","postcode":"HA1 4EU","google":null,"phone":null,"open_hours":null,"Monday":null,"Tuesday":null,"Wednesday":null,"Thursday":null,"Friday":null,"Saturday":null,"Sunday":null,"distance_in_miles_from_given_coords":1.1,"reviews":{"count":4,"avg_rating":3}}},{"type":"Feature","geometry":{"type":"Point","coordinates":[-0.337569,51.599686]},"properties":{"price":1629,"fuel_type":2,"user_id":1,"recorded_time":"2022-09-21T04:58:00.000Z","user_name":"news","idstation":1650,"fuel_brand":3,"fuel_brand_name":"ESSO","name":"ESSO HIGH STREET (MFG HIGH WEALD)","address1":"HIGH STREET","address2":"WEALDSTONE","town":"HARROW","county":"OUTER LONDON","postcode":"HA3 5EA","google":null,"phone":null,"open_hours":null,"Monday":null,"Tuesday":null,"Wednesday":null,"Thursday":null,"Friday":null,"Saturday":null,"Sunday":null,"distance_in_miles_from_given_coords":0.55,"reviews":{"count":3,"avg_rating":2.3333}}},{"type":"Feature","geometry":{"type":"Point","coordinates":[-0.341778,51.579235]},"properties":{"price":1629,"fuel_type":2,"user_id":1,"recorded_time":"2022-09-20T19:13:00.000Z","user_name":"news","idstation":1881,"fuel_brand":1,"fuel_brand_name":"BP","name":"BP HARROW (BESSBOROUGH SF CONNECT)","address1":"BESSBOROUGH ROAD","address2":"","town":"HARROW","county":"MIDDLESEX","postcode":"HA1 3BS","google":null,"phone":null,"open_hours":null,"Monday":null,"Tuesday":null,"Wednesday":null,"Thursday":null,"Friday":null,"Saturday":null,"Sunday":null,"distance_in_miles_from_given_coords":1.28,"reviews":{"count":4,"avg_rating":4}}}]},"search_id":99796546,"message":"Petrol Station Fuel List Listed by Coordinates"}

In HA, the sensor fetches the resource, but it does not log in, so it never receives the data above. I have double-checked that my username and password are correct, but it still doesn't work.

I enabled debug logging in my configuration.yaml like this:

logger:
  default: info
  logs:
    homeassistant.components.rest: debug

The log is huge, and it shows the HTML of the resource's page as it looks before logging in. I also see the following in my log:

2022-09-28 15:24:46.052 WARNING (MainThread) [homeassistant.components.rest.sensor] REST result could not be parsed as JSON
2022-09-28 15:24:46.058 DEBUG (MainThread) [homeassistant.components.rest.sensor] Erroneous JSON: <!DOCTYPE html>

Can anyone point out what I am doing wrong, and why the REST sensor in HA cannot log in to this URL? Thank you.

Can anyone help?

@drogfild, could you please take a look at the HTML above? I looked at the changes you made to the scrape custom component.

Can you please help me?

You probably need to add a user agent. The fact that Home Assistant reports a document starting with <!DOCTYPE html> means the server is returning HTML, not JSON.

But also: if I open that link, it does indeed return an HTML document. The JSON is actually returned by:

https://app.petrolprices.com/geojson/2/0/0/0/price/2?lat=51.597033&lng=-0.349733

according to the Network tab in Chrome's developer tools.
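Putting these two suggestions together, a REST sensor pointed at the JSON endpoint might look like the sketch below. It also corrects `json_attributes_path`: in the sample response posted above, the station details sit under `data.features[...].properties`, not under `next`. This is a sketch under assumptions: the `User-Agent` value is only an example, `[0]` picks the first station (the cheapest, as long as the results stay sorted by price), and it can only work if the endpoint returns JSON to Home Assistant without a browser login.

```yaml
sensor:
  - platform: rest
    name: petrol
    # JSON endpoint seen in Chrome's Network tab (the /map URL returns HTML)
    resource: https://app.petrolprices.com/geojson/2/0/0/0/price/2?lat=51.597033&lng=-0.349733
    headers:
      # Example browser-like user agent string
      User-Agent: Mozilla/5.0
    # Sensor state: raw price of the first station in the list
    value_template: "{{ value_json.data.features[0].properties.price }}"
    # Station details live under data.features[].properties in the response
    json_attributes_path: "$.data.features[0].properties"
    json_attributes:
      - name
      - price
```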

So all I need to do is add a user agent such as Mozilla? And do I also need to change my resource?

So I added the User-Agent as Mozilla/5.0, but it still fails and I get the following errors:

2022-09-29 09:11:49.620 ERROR (MainThread) [homeassistant.helpers.template] Template variable error: 'value_json' is undefined when rendering '{{ value_json is not none and value_json.state == "running" }}'
2022-09-29 09:11:49.626 ERROR (MainThread) [homeassistant.components.rest.switch] Got non-ok response from resource: 404
2022-09-29 09:11:49.754 WARNING (MainThread) [homeassistant.components.rest.sensor] REST result could not be parsed as JSON

As I understand it, the REST sensor only supports HTTP authentication (basic or digest). If you open your resource in a private browsing window, it shows you a login screen. That indicates the site does not use HTTP basic authentication but a login form.
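To make the distinction concrete: the only login the REST sensor performs comes from its `authentication`, `username` and `password` options, and those credentials are sent as an HTTP `Authorization` header. A login page like this one expects them to be POSTed as form fields instead, so the site most likely ignores the header and keeps serving the login HTML. A sketch with placeholder credentials:

```yaml
sensor:
  - platform: rest
    name: petrol
    resource: https://app.petrolprices.com/map?fuelType=2&brandType=0&resultLimit=0&offset=0&sortType=price&lat=51.597033&lng=-0.349733&z=11&d=2
    # basic (used unless digest is chosen) or digest; sent as an HTTP header,
    # never typed into the site's HTML login form
    authentication: basic
    username: XXXX
    password: XXXX
```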

If you need to handle a login screen, please see ha-multiscrape on GitHub (danieldotnl/ha-multiscrape), a Home Assistant custom component for scraping multiple values (HTML, XML or JSON) from a single HTTP request, with a separate sensor or attribute for each value and support for (login) form-submit functionality. The "Form submit functionality" page in its wiki describes that capability.

OK, let me have a go at it and see if I can get the values I need. Thanks.

So I tried the multiscrape sensor. Here is my config:

multiscrape:
  - resource: 'https://app.petrolprices.com/geojson/2/0/0/0/price/2?lat=51.59703&lng=-0.349756'
    scan_interval: 30000
    form_submit:
      submit_once: True
      resource: 'https://app.petrolprices.com/login'
      select: ".login-box"
      input:
        username: xxx
        password: 'xxx'
        extra: field
    sensor:
      - select: 'body > pre'
        name: petrolname

The sensor sensor.petrolname shows as unavailable.

My HA log shows the following:

2022-09-29 12:41:53.776 ERROR (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # petrolname # Unable to scrape data: Could not find a tag for given selector.
Consider using debug logging and log_response for further investigation.

What am I doing wrong? I am pulling my hair out :frowning:

Here is my error log:

2022-09-29 12:48:32.310 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Error executing POST request to url: https://app.petrolprices.com/login.
 Error message:
 ReadTimeout('')
2022-09-29 12:48:32.310 ERROR (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Exception in form-submit feature. Will continue trying to scrape target page.

2022-09-29 12:48:32.314 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Request data from https://app.petrolprices.com/map?fuelType=2&brandType=0&resultLimit=0&offset=0&sortType=price&lat=51.597033&lng=-0.349733&z=11&d=2
2022-09-29 12:48:32.314 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing page-request with a get to url: https://app.petrolprices.com/map?fuelType=2&brandType=0&resultLimit=0&offset=0&sortType=price&lat=51.597033&lng=-0.349733&z=11&d=2.
2022-09-29 12:48:32.726 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Response status code received: 200
2022-09-29 12:48:32.726 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Loading the content in BeautifulSoup.
2022-09-29 12:48:32.745 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Data succesfully refreshed. Sensors will now start scraping to update.
2022-09-29 12:48:32.746 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 13.859 seconds (success: True)

I cannot tell whether multiscrape is actually logging in.

At least your ‘select’ seems to be wrong.

select = CSS selector used for selecting the form in the html

You are using a selector for the DIV, but you should be using a selector for the FORM.

Try this in form_submit:
select: "#form-login"

Thanks for that. I changed the select and it seems to be doing something. Here is my log:

2022-09-29 13:06:33.362 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Response status code received: 200
2022-09-29 13:06:33.362 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Parse page with form with BeautifulSoup parser lxml
2022-09-29 13:06:33.379 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Try to find form with selector #form-login
2022-09-29 13:06:33.381 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Found the form, now finding all input fields
2022-09-29 13:06:33.381 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Form looks like this:
<form action="/login" class="ts-form" id="form-login" method="post">
<div class="form-group">
<input class="form-control" name="email" placeholder="Email address" type="text" value=""/>
</div>
<div class="form-group">
<input class="form-control" id="passwordInput" name="password" placeholder="Password" type="password" value=""/>
<span class="icon icon-eye-slash-regular" id="eyeSelector"></span>
</div>
<input name="searchedPath" type="hidden" value="/"/>
<div class="form-group">
<button class="btn btn-primary" id="account-submit" type="submit">Sign in</button>
</div>
<div class="form-group pb-0">
<a class="forgot-pass" href="/forgot">Forgotten your password?</a>
</div>
<div class="form-group pb-0">
<a class="forgot-pass" href="/signup">Not a member yet? Signup</a>
</div>
</form>
2022-09-29 13:06:33.382 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Found the following input fields: {'email': '', 'password': '', 'searchedPath': '/'}
2022-09-29 13:06:33.383 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Merged input fields with input data in config. Result: {'email': '', 'password': 'xx', 'searchedPath': '/', 'username': 'xxx', 'extra': 'field'}
2022-09-29 13:06:33.383 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Found form action /login and method post
2022-09-29 13:06:33.383 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Determined the url to submit the form to: https://app.petrolprices.com/login
2022-09-29 13:06:33.383 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Submitting the form
2022-09-29 13:06:33.383 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing form_submit-request with a post to url: https://app.petrolprices.com/login.

I believe there are still some errors, as I cannot verify that multiscrape has logged in. Here is the rest of my log:

2022-09-29 13:06:43.763 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Error executing post request to url: https://app.petrolprices.com/login.
 Error message:
 ReadTimeout('')
2022-09-29 13:06:43.763 ERROR (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Exception in form-submit feature. Will continue trying to scrape target page.

2022-09-29 13:06:43.766 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Request data from https://app.petrolprices.com/map?fuelType=2&brandType=0&resultLimit=0&offset=0&sortType=price&lat=51.597033&lng=-0.349733&z=11&d=2
2022-09-29 13:06:43.766 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing page-request with a get to url: https://app.petrolprices.com/map?fuelType=2&brandType=0&resultLimit=0&offset=0&sortType=price&lat=51.597033&lng=-0.349733&z=11&d=2.
2022-09-29 13:06:43.883 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Response status code received: 200
2022-09-29 13:06:43.883 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Loading the content in BeautifulSoup.
2022-09-29 13:06:43.900 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Data succesfully refreshed. Sensors will now start scraping to update.
2022-09-29 13:06:43.901 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 13.592 seconds (success: True)
2022-09-29 13:06:43.907 INFO (MainThread) [homeassistant.setup] Setup of domain multiscrape took 13.6 seconds
2022-09-29 13:06:43.911 INFO (MainThread) [homeassistant.components.sensor] Setting up sensor.multiscrape
2022-09-29 13:06:43.914 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # petrolname # Setting up sensor
2022-09-29 13:06:43.915 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # petrolname # Start scraping to update sensor
2022-09-29 13:06:43.918 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # petrolname # Tag selected: None
2022-09-29 13:06:43.918 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Exception occurred while scraping, will try to resubmit the form next interval.
2022-09-29 13:06:43.918 ERROR (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # petrolname # Unable to scrape data: Could not find a tag for given selector.
Consider using debug logging and log_response for further investigation.
2022-09-29 13:06:43.924 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # petrolname # On-error, set value to None
2022-09-29 13:06:43.924 DEBUG (MainThread) [custom_components.multiscrape.entity] Scraper_noname_0 # petrolname # Updated sensor and attributes, now adding to HA
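One observation on this log: the form is submitted at 13:06:33.383 and the POST fails with ReadTimeout('') at 13:06:43.763, about 10 seconds later, so the login request seems to be timing out rather than being rejected. The sketch below shows two settings that may help. It assumes your multiscrape version supports the scraper-level `timeout` option (in seconds; I believe the default is 10) and `log_response`, which the error message itself recommends; check the multiscrape README for your version. With `log_response` enabled, multiscrape should save the responses it receives to files (as far as I know, under a multiscrape folder in your config directory), so you can see whether the login page or the logged-in page came back.

```yaml
multiscrape:
  - resource: 'https://app.petrolprices.com/map?fuelType=2&brandType=0&resultLimit=0&offset=0&sortType=price&lat=51.597033&lng=-0.349733&z=11&d=2'
    scan_interval: 30000
    # Assumption: HTTP timeout in seconds, raised above the ~10 s at which the login POST failed
    timeout: 30
    # Debugging only: save the raw responses so you can inspect what the site returned
    log_response: True
    # Keep your existing form_submit and sensor sections here unchanged
```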

Dude, please remove your credentials from the logs before posting. You might also consider changing your password, as it's now public. :slight_smile:

Done, I didn't realise :frowning:

I am getting the following in my log:

2022-09-29 14:48:49.560 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Response status code received: 200
2022-09-29 14:48:49.560 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Loading the content in BeautifulSoup.
2022-09-29 14:48:49.577 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Data succesfully refreshed. Sensors will now start scraping to update.
2022-09-29 14:48:49.578 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 14.231 seconds (success: True)
2022-09-29 14:48:49.590 INFO (MainThread) [homeassistant.setup] Setting up script
2022-09-29 14:48:49.649 INFO (MainThread) [homeassistant.setup] Setup of domain multiscrape took 14.3 seconds
2022-09-29 14:48:49.658 INFO (SyncWorker_7) [homeassistant.loader] Loaded plex from homeassistant.components.plex
2022-09-29 14:48:49.667 INFO (MainThread) [homeassistant.components.sensor] Setting up sensor.multiscrape
2022-09-29 14:48:49.679 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # petrolname # Setting up sensor
2022-09-29 14:48:49.701 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # petrolname # Start scraping to update sensor
2022-09-29 14:48:49.703 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Exception occurred while scraping, will try to resubmit the form next interval.
2022-09-29 14:48:49.703 ERROR (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # petrolname # Unable to scrape data: Invalid character '=' position 6
  line 1:
div id="result-output"
      ^.
Consider using debug logging and log_response for further investigation.
2022-09-29 14:48:49.707 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # petrolname # On-error, set value to None
2022-09-29 14:48:49.707 DEBUG (MainThread) [custom_components.multiscrape.entity] Scraper_noname_0 # petrolname # Updated sensor and attributes, now adding to HA

This is my config:

multiscrape:
  - resource: 'https://app.petrolprices.com/map?fuelType=2&brandType=0&resultLimit=0&offset=0&sortType=price&lat=51.597033&lng=-0.349733&z=11&d=2'
    scan_interval: 30000
    form_submit:
      submit_once: True
      resource: 'https://app.petrolprices.com/login'
      select: "#form-login"
      input:
        username: xx
        password: 'xx'
        extra: field
    sensor:
      - select: 'div id="result-output"'
        name: petrolname

Found the following input fields: {'email': '', 'password': '', 'searchedPath': '/'}
Merged input fields with input data in config. Result: {'email': '', 'password': 'xx', 'searchedPath': '/', 'username': 'xxx', 'extra': 'field'}

Take a look at those fields. It found the fields 'email' and 'password' on the HTML page, but you are filling in 'username' and 'password'. All the fields in your input section must match the field names used by that login page. There is no AI, or even advanced logic, to work out what to fill in where; you need to do that work.

Also, the select definition in your sensor section is in the wrong format. Please check how CSS selectors work. Pro tip: in the browser's developer tools, Inspect → find the element you want → right-click → Copy → Selector. It's not always perfect, but it often points you in the right direction.
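Applying both points to the config above gives something like the sketch below. The input keys come from the field names multiscrape logged (`email` and `password`; judging by the merged result, the hidden `searchedPath` is picked up from the page automatically), and the sensor selector uses CSS syntax, where `#result-output` means "the element whose id is result-output". The credentials are placeholders, and dropping `extra: field` is my assumption, since the login form has no input by that name.

```yaml
multiscrape:
  - resource: 'https://app.petrolprices.com/map?fuelType=2&brandType=0&resultLimit=0&offset=0&sortType=price&lat=51.597033&lng=-0.349733&z=11&d=2'
    scan_interval: 30000
    form_submit:
      submit_once: True
      resource: 'https://app.petrolprices.com/login'
      # Selects the <form id="form-login"> element itself, not a surrounding div
      select: "#form-login"
      input:
        # Keys must match the name="..." attributes of the form's inputs
        email: 'you@example.com'
        password: 'your-password'
    sensor:
      # CSS id selector; 'div#result-output' and 'div[id="result-output"]' match the same element
      - select: '#result-output'
        name: petrolname
```

If the sensor is still empty after a successful login, one possible reason (a guess, not confirmed) is that #result-output is filled in by JavaScript in the browser, which multiscrape does not run.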

So when I copy the selector, I get the following:

#result-output

Is that what goes in select under sensor:?

Most probably it is. People in the forums are usually really happy to help, but please do your homework too. I already suggested that you learn what selectors are; had you done so, you would probably have this working already.

A couple of good resources:

https://www.w3schools.com/cssref/trysel.asp

So I did my homework and now understand how selectors work. My conclusion is that I am not filling in the correct input fields, i.e.:

Found the following input fields: {'email': '', 'password': '', 'searchedPath': '/'}
Merged input fields with input data in config. Result: {'email': '', 'password': 'xx', 'searchedPath': '/', 'username': 'xxx', 'extra': 'field'}

Yes, I changed username to email, but no joy. I conclude that multiscrape is not logging in to the site, which is why the sensor shows as unavailable.

As for the sensor, I found the correct div class, but that doesn't matter: since it never gets past the login page, it isn't looking at the page that contains the div I am interested in.

Posting your new config and log would help us understand what happens when it tries to log in.
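Once the login works, another option may be to point multiscrape at the JSON endpoint found earlier in the thread instead of the map page, and read the values with value_template rather than a CSS select. This sketch rests on three assumptions: that a multiscrape sensor with a value_template and no select renders the template against the whole response, with value_json available for JSON (my reading of its README); that the session cookie from the form submit is reused for this request; and that the response matches the sample JSON posted at the top of the thread. The log_response output should confirm or rule out each one.

```yaml
multiscrape:
  - resource: 'https://app.petrolprices.com/geojson/2/0/0/0/price/2?lat=51.597033&lng=-0.349733'
    scan_interval: 30000
    # Keep your working form_submit section here
    sensor:
      # Assumption: no select needed; value_json holds the parsed JSON response
      - name: cheapest_petrol_price
        value_template: "{{ value_json.data.features[0].properties.price }}"
      - name: cheapest_petrol_station
        value_template: "{{ value_json.data.features[0].properties.name }}"
```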