Pulling Text using Scrape sensor

Shaneo234 · January 30, 2020, 2:18pm

Hi all,

I have been bashing my head a little trying to figure this out, but I’m having trouble understanding the docs provided for the scrape sensor. In a nutshell, I am trying to pull the version number shown in the bottom right of the following page:

site: https://throwaway-test-site.oak.com

My sensor is currently configured as follows:

sensor:
  - platform: scrape
    resource: https://throwaway-test-site.oak.com
    name: version
    select: 'div[class="c-login-version"]'

I have tried all sorts of values for the select, such as select: ".c-login-version"

Am I missing something here in a value template?

Any help would be appreciated, and apologies if this is a very simple task. I’m a bit of a newbie to web scraping…

Shaneo234 · January 30, 2020, 3:06pm

Just noticed I did not include the site… added it to the main description of the issue.

jocnnor · January 30, 2020, 4:16pm

Close!

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors

You can pick your favorite among the CSS selectors there. But keep in mind that it will, by default, only search for top-level tags. You’ll have to specify that it’s a nested element by doing the following:

select: '.s-login .u-when-large .c-login-version div'

This should follow the tree and get only the one you want (assuming there is only one on the page). I’m not 100% sure if you need to put body before .s-login or not.

value_template is not needed if you want everything inside of the tag. In this case, I’m assuming you want the entire contents of that div.

Honestly, the easiest way to debug this would be to use Python and BeautifulSoup yourself.

Doing so, I found that the site you linked redirects to a different URL. The HTML at the original address contains essentially nothing. You’ll need to use the redirected URL, https://throwaway-test-site.oak.com/Account/Login#/

On Windows, install Python, then run pip install bs4. Then you can try to parse the HTML using only the select field (it’s what we get in Home Assistant).

ACTUALLY:

In doing this, I noticed the entire webpage is actually <script></script> tags… which BeautifulSoup’s select can’t seem to dive into. We need the dang script to execute to generate the HTML…

I might be wrong about the top level tags thing. It could just be because of all the script tags and no actual HTML.
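A minimal sketch of how this could be checked locally, assuming Python with bs4 installed. The two HTML snippets are made-up stand-ins, not the real page: the class names and the #RootComponentTemplate id are borrowed from the selectors in this thread, and the version string is invented. It suggests that select() does find nested elements in ordinary HTML, but finds nothing when the same markup is only text inside a script tag.

```python
from bs4 import BeautifulSoup  # third-party: pip install bs4

# Hypothetical markup (NOT the real page): the same div, once as real HTML
# and once as plain text inside a <script> template.
PLAIN_HTML = """
<body>
  <div class="s-login">
    <div class="u-when-large">
      <div class="c-login-version">v1.2.3</div>
    </div>
  </div>
</body>
"""

SCRIPT_HTML = """
<body>
  <script type="text/template" id="RootComponentTemplate">
    <div class="c-login-version">v1.2.3</div>
  </script>
</body>
"""


def select_text(html, selector):
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


def script_body(html, selector):
    """Return the raw string inside the first matching tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(selector)
    return tag.string if tag is not None else None


if __name__ == "__main__":
    # Nested element in real HTML: the class selector matches at any depth.
    print(select_text(PLAIN_HTML, ".c-login-version"))
    # Same markup inside <script>: it is only text, so nothing matches.
    print(select_text(SCRIPT_HTML, ".c-login-version"))
    # Selecting the script tag itself hands back its contents as one string.
    print(script_body(SCRIPT_HTML, "#RootComponentTemplate"))
```

To try it against the real page, the HTML string could be replaced with the downloaded page source; whether the real markup matches this stand-in is an assumption.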

Two options. Use AppDaemon: convert the Python script you just wrote and tested into an AppDaemon app and use that to extract the values. There you can use far more features than just .select().

Or, it looks like the version id is in the script. So we’ll just get the entire script as text, and use value_template to extract the value.

select: '#RootComponentTemplate'
value_template: >-
  {% set a = value.find("c-login-version\">") %}
  {% set b = value.find("<", a) %}
  {{ value[a:b].split(">")[1] }}

This might work. It just uses Python string manipulation to extract the value from the dumb select output.
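The same slicing can be tried in plain Python before putting it in the template. This is only a sketch: extract_version is a hypothetical helper name, the sample string is invented, and the "not found" handling plus the final strip() are additions that the template above does not have.

```python
def extract_version(value, marker='c-login-version">'):
    """Mimic the value_template: slice from the marker to the next '<'."""
    a = value.find(marker)      # start of the class attribute's tail
    if a == -1:
        return None             # marker missing (not handled in the template)
    b = value.find("<", a)      # start of the closing tag after the text
    if b == -1:
        b = len(value)          # no closing tag: take the rest of the string
    # value[a:b] looks like 'c-login-version">v1.2.3'; keep what follows '>'
    return value[a:b].split(">")[1].strip()


if __name__ == "__main__":
    # Invented sample of what the script text might contain.
    sample = 'stuff <div class="c-login-version">v1.2.3</div> more stuff'
    print(extract_version(sample))
```

If the marker never appears, find() returns -1 and the template version would slice from the end of the string, so a check like the one above is worth keeping in mind when the sensor shows an odd value.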

Man, that took way longer than I thought it would…

Shaneo234 · January 30, 2020, 7:27pm

Thank you very much @jocnnor, this is exactly what I was looking for. I struggled a lot with the value templates, and it took me a while just to figure out calculations, haha! I think I’ll go through your suggestion, install Python directly on Windows, and mess around with BeautifulSoup.

With the above, I can go places now.

Thanks again!