Extract data from an HTTP website

I would like to refresh this older topic.

I need to extract data from a web page that requires a login (email and password).
The HTTP request is:
http://xxxxxxxxx.yy/en/login/login?continue=%2Fen%2Fpool%2Fgetmainvalues%3Fid%3D6666%26hasPH%3Dtrue%26hasRX%3Dfalse%26hasCL%3Dfalse%26hasCD%3Dfalse%26config%3D0%26hasHidro%3Dtrue%26hasLight%3Dtrue%26hasRelays%3Dtrue%26numRelays%3D1%252C2%252C3%26hasFiltration%3Dtrue%26hasBackwash%3Dfalse%26hasIO%3Dfalse%26hasUV%3Dfalse%26needsTimeBesgoRemaining%3Dfalse
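For anyone reading along: the `continue` parameter in that URL is just the percent-encoded address of the page the site redirects to after login. Here is a small sketch that unpacks it with the Python standard library. I shortened the URL for readability (most of the has* flags are left out), and the function name is only my own choice.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened version of the login URL quoted above (most has* flags omitted).
login_url = (
    "http://xxxxxxxxx.yy/en/login/login"
    "?continue=%2Fen%2Fpool%2Fgetmainvalues%3Fid%3D6666"
    "%26hasPH%3Dtrue%26numRelays%3D1%252C2%252C3"
)


def decode_continue(url):
    """Return (path, params) of the page hidden in the 'continue' parameter."""
    outer = parse_qs(urlsplit(url).query)
    target = outer["continue"][0]  # parse_qs decodes the first encoding layer
    inner = urlsplit(target)
    # The inner query string is decoded once more, which turns %2C into ","
    params = {key: values[0] for key, values in parse_qs(inner.query).items()}
    return inner.path, params


path, params = decode_continue(login_url)
print(path)
print(params)
```

So the page that actually carries the data is `/en/pool/getmainvalues` with the pool id and flags as ordinary query parameters; the login URL only wraps it.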

Then a login page opens, and after logging in the response is:

{"temp":"26.8\u00baC","local_time":"07:28","lightStat":{"status":{"type":"MAN","status":"OFF"}},"filtration_stat":"ON","filtration_mode":"HEATING","filtration_time_remaining":0,"PH":"6.6","PH_status":{"alarm":"","type":"ACID","hi_value":"7.3","status":0,"color":{"class":"orange","hex":"#ff8800"}}}
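Once the response is available, turning it into sensor values is the easy part. A minimal sketch, assuming the field names stay exactly as in the response above; the payload here is a trimmed copy of it, and the output key names are my own.

```python
import json

# Trimmed copy of the response quoted above (raw string keeps the \u00ba escape).
raw = r'{"temp":"26.8\u00baC","local_time":"07:28","filtration_stat":"ON","PH":"6.6"}'


def to_sensor_values(payload):
    """Convert the JSON text into plain numbers and booleans."""
    data = json.loads(payload)
    return {
        # "26.8ºC" -> 26.8: strip the trailing degree sign (U+00BA) and "C"
        "temperature_c": float(data["temp"].rstrip("\u00baC")),
        "ph": float(data["PH"]),
        "filtration_on": data["filtration_stat"] == "ON",
    }


values = to_sensor_values(raw)
print(values)
```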

The data is in a friendly format, but how do I get through the login?

It’s in the docs;

First I need to get past the login, and I have not succeeded yet :frowning:
I tried user/password, username/password and user_email/password; none of them works.
This is the login page: Login

I have the following config:

Error log:

2018-07-22 13:32:36 ERROR (MainThread) [homeassistant.components.sensor] scrape: Error on device update!
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/entity_platform.py", line 248, in _async_add_entity
    await entity.async_device_update(warning=False)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/entity.py", line 319, in async_device_update
    yield from self.hass.async_add_job(self.update)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/sensor/scrape.py", line 120, in update
    value = raw_data.select(self._select)[0].text
IndexError: list index out of range

You probably need to specify an authentication type

Could you point me in the right direction?
It is a plain login, i.e. you enter the user email and password.

I’m not certain what’s required, just speculating, see the following element info.

User Email:
<input type="text" name="user" id="user" value="" class="form-control " maxlength="255" data-validation-engine="validate[required,minSize[0],maxSize[255],custom[email]]">

Password:
<input type="password" name="pass" id="pass" value="" autocomplete="off" class="form-control " data-validation-engine="validate[required,minSize[6],maxSize[16]]">
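To double-check which field names the form really submits, the two input elements above can be run through a small parser. This is only a sketch using the standard-library HTML parser; the class name is made up, and a real login form may contain further (hidden) fields that are not shown in this thread.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect the name and type of every <input> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            if "name" in attributes:
                self.fields[attributes["name"]] = attributes.get("type", "text")


# The two elements quoted above, copied as-is.
html = '''
<input type="text" name="user" id="user" value="" class="form-control " maxlength="255" data-validation-engine="validate[required,minSize[0],maxSize[255],custom[email]]">
<input type="password" name="pass" id="pass" value="" autocomplete="off" class="form-control " data-validation-engine="validate[required,minSize[6],maxSize[16]]">
'''

collector = InputNameCollector()
collector.feed(html)
print(collector.fields)
```

So the form expects `user` and `pass`, which would explain why names like `username` or `user_email` have no effect.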

The site also doesn’t specify an Authentication Type:

<?xml version="1.0" encoding="utf-8"?>
<WebTestRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Url>http://vistapool.es/en/login/login/id/xxxx?continue=%2Fen%2Fpool%2Fboard%2Fid%2Fxxxx</Url>
  <HttpResult>200 OK</HttpResult>
  <RequestDate>2018-07-22T11:00:17.7465896-04:00</RequestDate>
  <AuthorizationType>None</AuthorizationType>
  <RequestHeaders>
    <string>Host: vistapool.es</string>
    <string>Cache-Control: no-store,no-cache</string>
    <string>Pragma: no-cache</string>
    <string>Connection: Keep-Alive</string>
  </RequestHeaders>
  <ResponseHeaders>
    <string>Pragma: no-cache</string>
    <string>Vary: Accept-Encoding</string>
    <string>Connection: close</string>
    <string>Transfer-Encoding: chunked</string>
    <string>Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0</string>
    <string>Content-Type: text/html; charset=UTF-8</string>
    <string>Date: Sun, 22 Jul 2018 15:00:18 GMT</string>
    <string>Expires: Thu, 19 Nov 1981 08:52:00 GMT</string>
    <string>Set-Cookie: PHPSESSID=et5s9j6i3k5f30jbmsnnvurqf7; path=/</string>
    <string>Server: Apache/2.2.15 (CentOS)</string>
    <string>X-Powered-By: PHP/5.3.3</string>
  </ResponseHeaders>
</WebTestRequest>

Did you try with:
username:
password:

I tried user/password, username/password and user_email/password; none of them works. :frowning:

Maybe @fabaff and @DarkFox can provide some more advanced assistance.
The sensor is built to use username/password or none.

if username and password:
    if config.get(CONF_AUTHENTICATION) == HTTP_DIGEST_AUTHENTICATION:
        auth = HTTPDigestAuth(username, password)
    else:
        auth = HTTPBasicAuth(username, password)
else:
    auth = None

You’ll need to provide the username and password as parameters in the URL that the login form submits to. This is assuming the site will accept it as a GET request. If the endpoint only accepts POST requests, I’m afraid you will not be able to use the scrape component for this.
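To make the difference concrete: HTTP Basic auth puts the credentials into an `Authorization` header, while an HTML login form sends them as form fields in the request body. A rough sketch of both follows. The credentials are placeholders, the function names are mine, and the field names `user` and `pass` are taken from the form elements posted earlier in this thread.

```python
import base64
from urllib.parse import urlencode


def basic_auth_header(username, password):
    """What HTTP Basic auth adds to a request: one header, no form fields."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


def login_form_body(username, password):
    """What a browser sends when the login form is submitted."""
    return urlencode({"user": username, "pass": password})


header = basic_auth_header("me@example.com", "secret")
body = login_form_body("me@example.com", "secret")
print(header)
print(body)
```

A site with a PHP session login normally ignores the `Authorization` header, which would fit with the login page being served again no matter which username key is configured.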

I haven’t run into this situation myself yet, so I’m not sure what the easiest way to scrape a page with a login screen is.

Hm, I tried to open the page with the credentials embedded in the URL, but it does not go through. The login page popped up instead.
It looks like a dead end.

Hi, did you ever get this to work? I’m also interested in reading the values from Vistapool into Home Assistant…

Hi, none of the Home Assistant sensors (scrape, etc.) works with the Vistapool website, and Vistapool itself is not talkative at all.
There is another option: MODBUS. But I am not skilled enough to do it :roll_eyes: (you need a converter and MODBUS knowledge).
So far I use a second thermometer and a pH probe to display these basic values, and some additional relays to control the lights, filtration and countercurrent.

Thanks for your answer. I also contacted VistaPool and got no response at all. Which pool thermometer and pH probe are you using to get the values into Home Assistant?

Thermometer: DS18B20
pH probe
Both are connected to a NodeMCU (or Wemos D1 mini) running Tasmota firmware, which provides the readings over MQTT.
The rest is up to Home Assistant :slight_smile:

Thanks. And just in case you want to take a look, these guys seem to have been able to log into Vistapool and scrape the data.

https://www.symcon.de/forum/threads/35166-Vistapool-Pool-Steuerung-über-IPS

Hm, it looks interesting, but it still needs to interact with the Vistapool cloud.
And furthermore it is a bit pricey :nauseated_face:

I found a workaround to integrate Vistapool data into HA:

Do you run the script directly from Hassio?
Is there a way to do it?

It cannot be run directly from Hassio because it’s PHP, not Python. I wish I knew how to port it to Python.

I am not sure, but it seems that a way to do it is described here.

The following part of the script reads the whole package of available data in JSON format:

function Vistapool_ReadData()
{
Global $poolID;
Global $password;
Global $username;
Global $ServerOS;
Global $WebsiteURL;

// Try to log in to the website with Cookie
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_HEADER, 0); 
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded')); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
if (strtolower($ServerOS) == "windows") 
{ 
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'C:\Windows\Temp\vistapool_cookie'); 
} 
else 
{ 
    curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/vistapool_cookie'); 
} 
curl_setopt($ch, CURLOPT_POST, 0); 
curl_setopt($ch, CURLOPT_POSTFIELDS, ""); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_URL, $WebsiteURL.'/de/pool/list'); 
$curlData = curl_exec($ch); 
 
 
// Check if the login with cookie was successful, if not, then a normal login will be done
preg_match("|(Passwort wiederherstellen)|", $curlData, $LoginFALSE); 
if ($LoginFALSE) 
{ 
    echo "Anmeldung mit Cookie war nicht erfolgreich - normaler Login wird durchgeführt!".PHP_EOL; 
     
    $fields = array( 
        'user'=>urlencode($username), 
        'pass'=>urlencode($password), 
        'remember_password'=>'0', 
        'entrar'=>'Eingabe', 
    ); 
      
    $fields_string = ''; 
    foreach($fields as $key=>$value) {  
        $fields_string .= $key.'='.$value.'&';  
    } 
      
    $fields_string = substr($fields_string,0,-1); 
    $postfields = $fields_string; 
 
    ini_set("max_execution_time", 60);     
    curl_setopt($ch, CURLOPT_HEADER, 0); 
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded')); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); 
    if (strtolower($ServerOS) == "windows") 
    { 
        curl_setopt($ch, CURLOPT_COOKIEFILE, 'C:\Windows\Temp\vistapool_cookie'); 
        curl_setopt($ch, CURLOPT_COOKIEJAR, 'C:\Windows\Temp\vistapool_cookie'); 
    } 
    else 
    { 
        curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/vistapool_cookie'); 
        curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/vistapool_cookie'); 
    } 
    curl_setopt($ch, CURLOPT_COOKIE, session_name().'='.session_id()); 
    curl_setopt($ch, CURLOPT_COOKIESESSION, true); 
    curl_setopt($ch, CURLOPT_URL, $WebsiteURL.'/de/login/login'); 
    curl_setopt($ch, CURLOPT_POST, 1); 
    curl_setopt($ch, CURLOPT_POSTFIELDS, "$postfields"); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
    $curlData = curl_exec($ch); 
} 
 
 
// Querying the pool data 
curl_setopt($ch, CURLOPT_POST, 0); 
curl_setopt($ch, CURLOPT_POSTFIELDS, ""); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_URL, $WebsiteURL.'/de/pool/getmainvalues?id='.$poolID.'&hasPH=true&hasRX=true&hasCL=true&hasCD=true&config=1&hasHidro=true&hasLight=true&hasRelays=true&numRelays=1%2C2%2C3%2C4&hasFiltration=true&hasBackwash=true&hasIO=true&hasUV=true&needsTimeBesgoRemaining=false'); 
$curlData = curl_exec($ch); 
 
 
// Close the cURL session before returning so the handle is always released
curl_close($ch);

// Output of read data
if ($curlData == '"not connected"')
{
    // Pool control is offline - data can not be read
    echo "POOLSTEUERUNG IST NICHT VERBUNDEN!".PHP_EOL;
    IPS_LogMessage("VISTAPOOL", "POOLSTEUERUNG IST NICHT VERBUNDEN!");
    return null;
}

// Pool control is online - data can be read
$PoolData_Array = json_decode($curlData, true);
return $PoolData_Array;
}
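For those who would rather have this in Python: below is an untested sketch of the same flow using only the standard library. The URL paths and the form fields (`user`, `pass`, `remember_password`, `entrar`) are copied from the PHP script; the base URL comes from the addresses quoted in this thread; the function names and the reduced set of query flags are my own assumptions. I have not run it against the real site.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://vistapool.es"  # from the URLs quoted in this thread


def make_opener():
    """Opener that keeps the session cookie (PHPSESSID) between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login_request(username, password):
    """POST request with the same form fields the PHP script sends."""
    body = urlencode({
        "user": username,
        "pass": password,
        "remember_password": "0",
        "entrar": "Eingabe",
    }).encode("ascii")
    return urllib.request.Request(BASE_URL + "/de/login/login", data=body, method="POST")


def values_request(pool_id):
    """GET request for the JSON values; only a few of the has* flags are set here."""
    query = urlencode({"id": pool_id, "hasPH": "true", "hasLight": "true",
                       "hasFiltration": "true"})
    return urllib.request.Request(BASE_URL + "/de/pool/getmainvalues?" + query)


def read_pool_text(username, password, pool_id):
    """Log in, then fetch the values page with the same cookie jar."""
    opener = make_opener()
    with opener.open(login_request(username, password), timeout=30) as response:
        response.read()  # the body is not needed, only the session cookie
    with opener.open(values_request(pool_id), timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `read_pool_text("me@example.com", "secret", "6666")` (placeholder values) should return the JSON text if the site behaves the way the PHP script assumes. Unlike the PHP version, this sketch logs in on every call instead of reusing a cookie file.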

Technically, the PHP script goes through the login and then opens the following page:

http://vistapool.es/en/pool/getmainvalues?id=poolID

where the content of that page is:

{"temp":"29.7\u00baC","local_time":"17:49","lightStat":{"status":{"type":"MAN","status":"OFF"}},"filtration_stat":"OFF","filtration_mode":"HEATING","filtration_time_remaining":0,"PH":"7.0","PH_status":{"alarm":"","type":"ACID","hi_value":"7.3","status":0,"color":{"class":"grey","hex":"#dddddd"}},"RX":0,"RX1":700,"RXColor":{"class":"grey","hex":"#dddddd"},"RX_status":{"status":"","current":"","hidro":""},"CL":"0.00","CL1":"1.00","CLColor":{"class":"grey","hex":"#dddddd"},"CL_status":{"hidro":"FL2","status":"","current":""},"CD":0,"CD1":5000,"CDColor":{"class":"grey","hex":"#dddddd"},"CD_status":"OFF"}
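Whatever fetches that page also has to cope with the cases where it does not contain data. Going by the PHP script, the body is the JSON string "not connected" when the pool controller is offline, and my assumption is that a failed login returns the HTML login page instead of JSON. A small sketch that tells the three cases apart (the function name and the labels are made up):

```python
import json


def classify_response(text):
    """Return ('data', dict), ('offline', None) or ('login_required', None)."""
    try:
        decoded = json.loads(text)
    except ValueError:
        # Not JSON at all: most likely the HTML login page came back.
        return "login_required", None
    if decoded == "not connected":
        # Same check the PHP script does for an offline pool controller.
        return "offline", None
    return "data", decoded


print(classify_response('"not connected"'))
print(classify_response("<html><body>Login</body></html>"))
print(classify_response('{"PH":"7.0","CD_status":"OFF"}'))
```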

If anybody is able to do this directly from Hassio, that would solve it :slight_smile: