Extract data from an HTTP website

Hello everybody!

I’ve got an idea: I want to show on my frontend the time remaining until my next tram.
There is a website that provides this information through an HTTP request. But how can I get this information back as a number?

To explain: with the form parameters I can choose the tram station and the tram line, which gives me this URL:

https://mobi.filbleu.fr/horaires-et-trajet/horaires-temps-reel?view=tempsreel&id_ligne=A&id_arret=HETR-2T&ordering=1&submit_bt=recherche&user=0

So my question is: how can I display this value (in red) in my frontend?

Thank you in advance!

You could use the scrape component. It takes a bit of work to find the right CSS selector to use in the select option, but it has worked well for me. The link in the documentation to BeautifulSoup’s CSS selector documentation is very helpful.

https://home-assistant.io/components/sensor.scrape/

It works !!! Thank you so much !!!

I would like to refresh this older topic.

I need to extract data from a web page that requires a login (email and password).
The HTTP request is:
http://xxxxxxxxx.yy/en/login/login?continue=%2Fen%2Fpool%2Fgetmainvalues%3Fid%3D6666%26hasPH%3Dtrue%26hasRX%3Dfalse%26hasCL%3Dfalse%26hasCD%3Dfalse%26config%3D0%26hasHidro%3Dtrue%26hasLight%3Dtrue%26hasRelays%3Dtrue%26numRelays%3D1%252C2%252C3%26hasFiltration%3Dtrue%26hasBackwash%3Dfalse%26hasIO%3Dfalse%26hasUV%3Dfalse%26needsTimeBesgoRemaining%3Dfalse

Then a login page opens, and after that the response is:

{"temp":"26.8\u00baC","local_time":"07:28","lightStat":{"status":{"type":"MAN","status":"OFF"}},"filtration_stat":"ON","filtration_mode":"HEATING","filtration_time_remaining":0,"PH":"6.6","PH_status":{"alarm":"","type":"ACID","hi_value":"7.3","status":0,"color":{"class":"orange","hex":"#ff8800"}}}

The data is in a friendly format, but how do I get past the login?
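Once the response is reachable, turning it into numbers is the easy part. Here is a minimal Python sketch, assuming the payload keeps the shape shown above. The sample is trimmed to the fields used, and the helper names leading_number and parse_pool_values are made up for illustration.

```python
import json
import re

# Sample payload copied from the response above, trimmed to the fields used here.
SAMPLE = r'{"temp":"26.8\u00baC","local_time":"07:28","filtration_stat":"ON","PH":"6.6"}'


def leading_number(text):
    """Return the first decimal number found in text as a float, or None."""
    match = re.search(r"[-+]?\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def parse_pool_values(payload):
    """Decode the JSON string and convert the interesting fields to plain types."""
    data = json.loads(payload)
    return {
        "temperature": leading_number(data["temp"]),  # strips the trailing unit
        "ph": leading_number(data["PH"]),
        "filtration_on": data["filtration_stat"] == "ON",
    }


print(parse_pool_values(SAMPLE))
```

The regular expression is there because the temperature arrives as a string with its unit attached, so a plain float() call on it would fail.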

It’s in the docs;

First I need to get past the login, and I have not succeeded yet :frowning:
I tried user/password, username/password and user_email/password; none of them works.
This is the login page: Login

I have the following config:

Error log:

2018-07-22 13:32:36 ERROR (MainThread) [homeassistant.components.sensor] scrape: Error on device update!
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/entity_platform.py", line 248, in _async_add_entity
    await entity.async_device_update(warning=False)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/entity.py", line 319, in async_device_update
    yield from self.hass.async_add_job(self.update)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/sensor/scrape.py", line 120, in update
    value = raw_data.select(self._select)[0].text
IndexError: list index out of range

You probably need to specify an authentication type

Could you point me in the right direction?
It is a plain login, i.e. you enter the user email and password.

I’m not certain what’s required; I’m just speculating. See the following element info.

User Email:
<input type="text" name="user" id="user" value="" class="form-control " maxlength="255" data-validation-engine="validate[required,minSize[0],maxSize[255],custom[email]]">

Password:
<input type="password" name="pass" id="pass" value="" autocomplete="off" class="form-control " data-validation-engine="validate[required,minSize[6],maxSize[16]]">

The site also doesn’t specify an Authentication Type:

<?xml version="1.0" encoding="utf-8"?>
<WebTestRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Url>http://vistapool.es/en/login/login/id/xxxx?continue=%2Fen%2Fpool%2Fboard%2Fid%2Fxxxx</Url>
  <HttpResult>200 OK</HttpResult>
  <RequestDate>2018-07-22T11:00:17.7465896-04:00</RequestDate>
  <AuthorizationType>None</AuthorizationType>
  <RequestHeaders>
    <string>Host: vistapool.es</string>
    <string>Cache-Control: no-store,no-cache</string>
    <string>Pragma: no-cache</string>
    <string>Connection: Keep-Alive</string>
  </RequestHeaders>
  <ResponseHeaders>
    <string>Pragma: no-cache</string>
    <string>Vary: Accept-Encoding</string>
    <string>Connection: close</string>
    <string>Transfer-Encoding: chunked</string>
    <string>Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0</string>
    <string>Content-Type: text/html; charset=UTF-8</string>
    <string>Date: Sun, 22 Jul 2018 15:00:18 GMT</string>
    <string>Expires: Thu, 19 Nov 1981 08:52:00 GMT</string>
    <string>Set-Cookie: PHPSESSID=et5s9j6i3k5f30jbmsnnvurqf7; path=/</string>
    <string>Server: Apache/2.2.15 (CentOS)</string>
    <string>X-Powered-By: PHP/5.3.3</string>
  </ResponseHeaders>
</WebTestRequest>

Did you try with:
username:
password:

I tried user/password, username/password and user_email/password; none of them works. :frowning:

Maybe @fabaff and @DarkFox can provide some more advanced assistance.
The sensor is built to use username/password or none.

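# Excerpt from the scrape sensor; the auth classes come from the requests library.
from requests.auth import HTTPBasicAuth, HTTPDigestAuth

if username and password:
    if config.get(CONF_AUTHENTICATION) == HTTP_DIGEST_AUTHENTICATION:
        auth = HTTPDigestAuth(username, password)
    else:
        auth = HTTPBasicAuth(username, password)
else:
    auth = None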
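For context, both branches above send the credentials in an HTTP Authorization header. A site that logs in through an HTML form, as this one appears to, would likely ignore that header and serve the login page anyway. Here is a minimal sketch of what Basic auth puts in the header. The helper name basic_auth_header is mine and not part of Home Assistant.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header that HTTP Basic auth sends with a request."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("user@example.com", "secret"))
```

Nothing here touches the form fields on the login page, which would explain why changing the username key in the sensor config makes no difference.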

You’ll need to provide the username and password as parameters in the URL that the login form submits to. This is assuming the site will accept it as a GET request. If the endpoint only accepts POST requests, I’m afraid you will not be able to use the scrape component for this.

I haven’t run into this situation myself yet, so I’m not sure what the easiest way to scrape a page with a login screen is.
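If the endpoint only accepts POST, one possible route outside the scrape component is a small script that submits the form and keeps the session cookie. The sketch below is untested against the real site. It assumes the form posts the fields user and pass (the input names from the element info earlier in the thread) to the login URL, and that the session cookie alone is enough for the follow-up request. The function names and the URLs you pass in are placeholders.

```python
import http.cookiejar
import json
import urllib.parse
import urllib.request


def build_login_request(login_url, email, password):
    """Build a POST request carrying the form fields seen in the page source."""
    # Assumption: the form submits "user" and "pass" as URL-encoded fields.
    body = urllib.parse.urlencode({"user": email, "pass": password}).encode("ascii")
    return urllib.request.Request(login_url, data=body, method="POST")


def fetch_json_after_login(login_url, data_url, email, password):
    """Log in, keep the session cookie, then fetch and decode the JSON page."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # The cookie set by the login response is stored in the jar automatically.
    opener.open(build_login_request(login_url, email, password), timeout=10).close()
    with opener.open(data_url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

A script like this could then feed Home Assistant through some other sensor type. Whether the site accepts such a login at all is something only a test against the real account can tell.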

Hm, I tried opening the page with the credentials embedded in the URL, but it does not go through; the login page popped up instead.
It looks like a dead end.

Hi, did you ever get this to work? I’m also interested in reading the values from Vistapool into Home Assistant…

Hi, none of the Home Assistant sensors (scrape, etc.) works with the Vistapool website, and Vistapool itself is not communicative at all.
There is another option: Modbus. But I am not skilled enough to do it :roll_eyes: (you need a converter and Modbus knowledge).
So far I use a second thermometer and a pH probe to display these basic values, and some additional relays to control the lights, filtration and countercurrent.

Thanks for your answer. I also contacted VistaPool with no response at all. What pool thermometer and pH probe are you using to incorporate the value to Home Assistant?

Thermometer: DS18B20 (for example here)
pH probe (example)
Both are connected to a NodeMCU (or Wemos D1 mini) running Tasmota firmware, which publishes the values over MQTT.
The rest is done in Home Assistant :slight_smile:

Thanks. And just in case you want to take a look, these guys seem to have been able to log into Vistapool and scrape the data.

https://www.symcon.de/forum/threads/35166-Vistapool-Pool-Steuerung-über-IPS

Hm, it looks interesting, but it still needs to interact with the Vistapool cloud.
And furthermore it is a bit “pricey” :nauseated_face:

I found a workaround to integrate Vistapool data into HA: