Help with web scraping for a smart oil sensor

Hey guys, I feel like I'm really close to getting this to work, but I'm missing something. Here is my code using multiscrape, because it supports logging in to a page beforehand.

multiscrape:
  - resource: 'https://app.smartoilgauge.com/app.php'
    scan_interval: 30
    form_submit:
      submit_once: True
      resource: 'https://app.smartoilgauge.com/login.php?logout_first=true'
      select: "#inputUsername"
      input:
        username: "my username here"
        password: "My password here"
        extra: field
    sensor:
      - select: '#tankSummary > div.ts_row > div.ts_col.ts_level > div.ts_col_hdr'
        name: Oil Test Name

I get an error in my log saying "Unable to scrape data: Could not find a tag for given selector". I don't know if this is an issue with the login part, so that it never gets to the correct page to find the selector, or a problem with the selector itself. Any advice would be helpful. Thanks!

Please enable and post logging here.
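
In case it helps, debug logging for just this integration can be switched on with Home Assistant's `logger` integration. A minimal sketch, assuming the component logs under `custom_components.multiscrape` and that you don't already have a `logger:` section in configuration.yaml (if you do, merge the `logs:` entry into it):

```yaml
logger:
  default: warning          # keep everything else quiet
  logs:
    # debug output for the multiscrape custom component only
    custom_components.multiscrape: debug
```

After a restart, the relevant lines should show up in home-assistant.log.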

Sorry for the delay, I will post the logging here in a minute. I'm going to sort through it and only pull out the info for the scraper to save time.

Here is the scraper stuff from the log file. I changed my username and password for security reasons.

2023-01-06 19:04:01.551 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # New run: start (re)loading data from resource
2023-01-06 19:04:01.552 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Rendered resource template into: https://app.smartoilgauge.com/app.php
2023-01-06 19:04:01.552 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Starting with form-submit
2023-01-06 19:04:01.553 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Requesting page with form from: Login - Smart Oil Gauge™
2023-01-06 19:04:01.553 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing form_page-request with a GET to url: Login - Smart Oil Gauge™.
2023-01-06 19:04:11.083 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Response status code received: 200
2023-01-06 19:04:11.084 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Parse page with form with BeautifulSoup parser lxml
2023-01-06 19:04:11.164 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Try to find form with selector #inputUsername
2023-01-06 19:04:11.168 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Form looks like this:

2023-01-06 19:04:11.169 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Finding all input fields in form
2023-01-06 19:04:11.170 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Found the following input fields: {}
2023-01-06 19:04:11.171 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Found form action None and method None
2023-01-06 19:04:11.171 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Merged input fields with input data in config. Result: {'username': 'ThisIsMyUsername', 'password': 'ThisIsMyPassword', 'extra': 'field'}
2023-01-06 19:04:11.172 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Determined the url to submit the form to: Login - Smart Oil Gauge™
2023-01-06 19:04:11.172 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Submitting the form
2023-01-06 19:04:11.172 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing form_submit-request with a POST to url: Login - Smart Oil Gauge™.
2023-01-06 19:04:17.152 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Response status code received: 200
2023-01-06 19:04:17.152 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Form seems to be submitted succesfully (to be sure, use log_response and check file). Now continuing to retrieve target page.
2023-01-06 19:04:17.153 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Request data from https://app.smartoilgauge.com/app.php
2023-01-06 19:04:17.153 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing page-request with a get to url: https://app.smartoilgauge.com/app.php.
2023-01-06 19:04:20.988 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Error executing get request to url: https://app.smartoilgauge.com/app.php.
Error message:
RemoteProtocolError('Server disconnected without sending a response.')
2023-01-06 19:04:20.988 ERROR (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Updating failed with exception: Server disconnected without sending a response.
2023-01-06 19:04:21.005 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 19.454 seconds (success: True)
BUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # New run: start (re)loading data from resource
2023-01-06 19:04:23.746 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Rendered resource template into: https://app.smartoilgauge.com/app.php
2023-01-06 19:04:23.747 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Request data from https://app.smartoilgauge.com/app.php
2023-01-06 19:04:23.747 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing page-request with a get to url: https://app.smartoilgauge.com/app.php.
2023-01-06 19:04:35.700 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Error executing get request to url: https://app.smartoilgauge.com/app.php.
Error message:
RemoteProtocolError('Server disconnected without sending a response.')
2023-01-06 19:04:35.700 ERROR (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Updating failed with exception: Server disconnected without sending a response.

And I think it loops from there.

Try .content-container for the select of the form.
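
The log line `Found the following input fields: {}` (with form action and method `None`) suggests that `#inputUsername` matched the input element itself rather than the form around it. A rough sketch of the change, not a tested config: the `log_response` option is the one the log message itself mentions, and I'm assuming it belongs at the scraper level, so check the multiscrape README before relying on it.

```yaml
multiscrape:
  - resource: 'https://app.smartoilgauge.com/app.php'
    scan_interval: 30
    log_response: true            # assumption: writes the responses to files for inspection
    form_submit:
      submit_once: True
      resource: 'https://app.smartoilgauge.com/login.php?logout_first=true'
      select: ".content-container"   # element that contains the login form fields
      input:
        # Each key should match the name attribute of a field in the form;
        # check the page source for the exact names.
        username: "my username here"
        password: "My password here"
```

If the selector matches, the "Found the following input fields" log line should no longer be empty.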

Hey Daniel, thanks for your help. After changing the form select to that and ALSO changing the input name for the password to match that of the form, it worked. I do have one other issue now: I can grab some of the values on the page, but not anything in a paragraph tag. Can you help with that?

Edit: I found that the content I want is fetched after the initial page has loaded, so I get back an empty tag… Is there a way to wait or delay the read from the site using multiscrape?

Happy to hear you got further! Delaying is not an option; that would require web browser capabilities (executing JavaScript). Instead, you need to dig into your browser's developer tools and find out which request retrieves the data you need. Then try to use that URL in multiscrape.

This should work, as it loads the API data instead:

multiscrape:
  - resource: 'https://app.smartoilgauge.com/ajax/main_ajax.php'
    method: POST
    payload: 'action=get_tanks_list&tank_id=0'
    headers:
      X-Requested-With: XMLHttpRequest
      Content-Type: application/x-www-form-urlencoded
    scan_interval: 3600
    form_submit:
      submit_once: False
      resource: 'https://app.smartoilgauge.com/login.php'
      select: ".content-container"
      input:
        username: 'your-email@example.com'
        user_pass: 'YourPassword'
    sensor:
      - name: smartoiltank
        unique_id: smartoiltank
        value_template: '{{ value_json.tanks[0].sensor_gallons }}'
        unit_of_measurement: "gal"
        attributes:
          - name: tank_name
            value_template: '{{ value_json.tanks[0].tank_name }}'
          - name: last_updated_time
            value_template: '{{ value_json.tanks[0].sensor_rt }}'
          - name: last_updated_timestamp
            value_template: '{{ value_json.tanks[0].last_read }}'
          - name: battery
            value_template: '{{ value_json.tanks[0].battery }}'
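
If you would rather have the battery level as its own entity instead of an attribute, one more entry in the same `sensor:` list should do it. A minimal sketch: the `smartoiltank_battery` name and unique_id are made up for illustration, and it assumes the `battery` field comes back in the same JSON response as the fields above.

```yaml
    sensor:
      # ... the smartoiltank sensor from above stays as the first entry ...
      - name: smartoiltank_battery        # hypothetical name
        unique_id: smartoiltank_battery   # hypothetical id
        value_template: '{{ value_json.tanks[0].battery }}'
```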

Wow, you are a god! I've been trying so hard, looking for a simple answer like this. I set up a whole MQTT broker and updated through that with a Python script. That was so annoying. Thank you for posting this!!! Amazing.