Scrape sensor improved - scraping multiple values

Seems simple enough, but I’m having no luck.
I’ve used console to copy the selector path.
Aaaand nothing.
The page: Park City Weather | Park City Mountain Resort

The first bit of data I’m trying to grab is the 24-hour snowfall, so the console gave me this:
#snow_report_1 > div.snow_report__content.row > ul > li:nth-child(2) > div > h5
It seems to make sense, but doesn’t work.

Any ideas?

I’m trying to scrape a temperature measurement from a website. Measurements are appended to a string every hour. So far I can retrieve the entire string of measurements after ‘var query_temp’, but I’m not experienced enough to extract the last measurement (it always occupies positions -5 to -1 from the end of the string, as indicated in the figure below). Could anyone point me in the right direction?

In future, please help us by posting relevant data as text: I’ve had to re-type all this for testing.

value_template: >
  {{ value|regex_findall("\s(\-?[0-9\.]*)\s")|last }}

regex_findall is returning a list of all numbers that are surrounded by whitespace:

  • \s — whitespace character before
  • ( — start remembering
  • \-? — optional minus sign
  • [0-9\.]* — any sequence of digits and points
  • ) — stop remembering
  • \s — whitespace character after
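
For anyone wanting to try the pattern outside Home Assistant, here is a minimal Python sketch. It assumes the regex_findall filter behaves like Python's re.findall; the sample string and the helper name last_reading are made up for illustration.

```python
import re

# Same pattern as in the value_template above.
PATTERN = r"\s(\-?[0-9\.]*)\s"

# Made-up sample: readings separated by two spaces, padded with a space at each end.
SAMPLE = " 44.8  44.9  -1.5  58.4 "


def last_reading(text):
    """Return the last capture of PATTERN in text, or None if nothing matches."""
    matches = re.findall(PATTERN, text)
    return matches[-1] if matches else None


print(re.findall(PATTERN, SAMPLE))
print(last_reading(SAMPLE))
```

One thing to watch: each match consumes the whitespace on both sides, so if the values were separated by a single space, every other value would be skipped and the last one could be missed (for example, " 1 2 " yields only "1").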

Thank you! Still I’m having trouble → The data are here:

<script type="text/javascript">
      
      var query_labels = " 01  02  03  04  05  06  07  08  09  10  11  12  13  14  15 ";
      var query_temp = " 44.8  44.9  44.9  44.8  44.7  44.5  49.8  60.8  60.6  60.4  60.2  59.9  59.6  59.3  58.4 ";
      var query_elec = " 279.45808708333334  400.80427425  0.0  0.0  0.0  0.0  2158.4836078611106  29.757760666666666  251.44402029444447  0.0  0.0  0.0  353.20271759999997  0.0  552.133760033333 ";
      var query_heat = " 946.9248917628065  1501.3182562778238  0.0  0.0  0.0  0.0  2366.505316186263  29.757760666666666  808.0376479104308  0.0  0.0  0.0  1304.7204898726695  0.0  2045.6582218857018 "
      var total_heat = "9.0";
      var total_electricity = "4.0";
      var month = "";

And I have used the following code:

value_template: "{{(value.split('var')[2])| replace('query_temp = \"', '')| replace('\";','')| regex_findall("\s(\-?[0-9\.]*)\s")| last| float}}"

But I’m getting a new error if I include this line:

Error loading /config/configuration.yaml: while parsing a block mapping
  in "/config/configuration.yaml", line 795, column 9
expected <block end>, but found '<scalar>'
  in "/config/configuration.yaml", line 798, column 119

Nested quotes (" within "). Use this instead:

value_template: >
  {{ value|regex_findall("query_temp[^;]*")|first|regex_findall("\s(\-?[0-9\.]+)[\"\s]")|last }}

EDIT: updated again: this version looks for query_temp rather than assuming it’s going to be on the third line.

First section returns the first instance of query_temp up to the next semicolon; second section pulls out the final number from it.
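
The two-stage extraction described above can be sketched in plain Python for testing. This assumes regex_findall behaves like Python's re.findall; the page text is a trimmed, illustrative copy of the source posted earlier, and last_temp is a hypothetical helper name.

```python
import re

# Trimmed, illustrative copy of the page source posted above.
PAGE = '''
      var query_labels = " 01  02  03 ";
      var query_temp = " 44.8  44.9  58.4 ";
      var query_elec = " 279.45  0.0  552.13 ";
'''


def last_temp(page):
    """Return the final number in the first query_temp assignment, or None."""
    # Stage 1: everything from the first "query_temp" up to (not including) the next semicolon.
    assignments = re.findall(r"query_temp[^;]*", page)
    if not assignments:
        return None
    # Stage 2: numbers preceded by whitespace and followed by whitespace or a double quote.
    numbers = re.findall(r'\s(\-?[0-9\.]+)["\s]', assignments[0])
    return numbers[-1] if numbers else None


print(last_temp(PAGE))
```

Using + instead of * in the second pattern means empty matches are never returned, and anchoring on query_temp keeps numbers from the other variables out of the result.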

That did it! Thanks so much!

Hi, can someone help me? I’m trying to get that value from a website.

thanks in advance!

Hi. I’d like to scrape a status indicator. The problem is that the element contains no data; the only thing that changes is the colour defined in its style attribute.

<div _ngcontent-pdh-c159="" style="display: flex; margin-left: auto;" title="AVAILABLE"><span _ngcontent-pdh-c159="" class="charger-status-dot" style="background-color: rgb(93, 199, 22); height: 34px; width: 34px;"></span></div>

Alternatively there is another element which has the title “AVAILABLE”

Is it possible to create a binary sensor which would be true if the colour/title match the above? I guess this is called extracting tag attributes?

We’d need the URL or the full HTML (pastebin?), and confirmation that the data you’re after is in the HTML as originally downloaded (View Source rather than F12 DevTools).

Could be as simple as select: div.buy-value.

If that colour definition is in the original HTML as fetched (i.e. not dynamically loaded afterwards), a RESTful binary sensor should do it, provided that colour isn’t used anywhere else in the document and the page isn’t too long:

binary_sensor:
  - platform: rest
    resource: URL
    value_template: "{{ 'rgb(93, 199, 22);' in value }}"

Lots of "if"s there, but without a URL or the HTML to go off, I have to make assumptions.
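
To make those assumptions concrete, here is a small Python sketch of the same substring test. The sample is a trimmed copy of the snippet posted above, and is_available is a hypothetical helper, not part of any integration.

```python
# Trimmed copy of the snippet posted above (Angular attributes removed).
SAMPLE = (
    '<div style="display: flex; margin-left: auto;" title="AVAILABLE">'
    '<span class="charger-status-dot" '
    'style="background-color: rgb(93, 199, 22); height: 34px; width: 34px;">'
    '</span></div>'
)

MARKER = "rgb(93, 199, 22);"


def is_available(body, marker=MARKER):
    """True when the marker text appears anywhere in the fetched body."""
    return marker in body


print(is_available(SAMPLE))
```

The check is an exact text match, so it stops working if the page ever writes the colour with different spacing (rgb(93,199,22), say). Matching on title="AVAILABLE" instead would be another option, under the same "not used elsewhere on the page" assumption.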

I’ve put the page_soup.txt generated by multiscrape here:

https://pastebin.com/4EuREMjh

Since posting I’ve realised that multiscrape has an attribute key which should be able to return the tag attributes but somehow it does not work for this particular element. I’m experimenting with something like:

    - name: O-Life Home Charger status
      unique_id: o_life_home_charger_status
      select: ".spot-list-item div:nth-child(1) div div .charger-status-dot"
      attribute: "class"
      value_template: "{{value}}"

which I believe should return “charger-status-dot”, but it fails. It seems to work fine with other selectors that I am already getting from this page. Is it because the div is actually empty?
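
As a sanity check on the "empty element" theory, here is a sketch using only Python's standard-library HTML parser. It shows that, in a generic parser, an element with no content still exposes its attributes. It says nothing about how multiscrape itself handles this case; DotFinder and dot_attributes are hypothetical names, and the sample is a trimmed copy of the span posted earlier.

```python
from html.parser import HTMLParser

# Trimmed copy of the span posted earlier (Angular attributes removed).
SAMPLE = (
    '<div title="AVAILABLE">'
    '<span class="charger-status-dot" '
    'style="background-color: rgb(93, 199, 22); height: 34px; width: 34px;">'
    '</span></div>'
)


class DotFinder(HTMLParser):
    """Collect the attributes of every span carrying the charger-status-dot class."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "span" and "charger-status-dot" in classes:
            self.found.append(attr_map)


def dot_attributes(html):
    """Return a list of attribute dicts, one per matching span."""
    finder = DotFinder()
    finder.feed(html)
    finder.close()
    return finder.found


print(dot_attributes(SAMPLE))
```

If the attributes come through here but not in multiscrape, the more likely suspects are the selector path or the element being added by JavaScript after the page is fetched, though that is a guess without seeing the fetched HTML.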

Can anyone help me? I keep getting this error:

Unable to scrape data: Could not find a tag for given selector Consider using debug logging and log_response for further investigation.

I’m just trying to extract a simple weather description from:

multiscrape:
  - name: DMI Weather
    resource: "https://www.dmi.dk/lokation/show/DK/2624652/Aarhus/"
    scan_interval: 3600
    sensor:
      - unique_id: dmi_weather_text
        name: DMI Weather Text
        select: "#textForecast > div:nth-child(2) > div > div.weather-forecast"

I have tried adding:

logger:
  default: info
  logs:
    custom_components.multiscrape: debug

But I keep getting the same message:

Consider using debug logging and log_response

Just checking if you ever made any further progress with Flo-Gas. Just had my telemetry unit fitted.

Cheers.

No unfortunately, I didn’t manage to get multi scrape going with it :pensive:

I am not certain if this is the right place for this, but does anyone have any tips for creating a scrape sensor that would contain the statement from this alert?

I have tried a few things but honestly don’t understand the “selector” portion of this well enough to get what I would like.
Any tips or help is appreciated.

I am trying to get the ISS pass value.
The original API stopped working, and I am trying to find another source.

Found one source on https://www.heavens-above.com/

I am trying to get the information in the below table:
[image: the ISS pass table]

Following the thread above, I am just working on retrieving one piece of data, then I will expand.
When I inspect, I get the following selector:

#aspnetForm > table > tbody > tr:nth-child(3) > td:nth-child(1) > table.standardTable > tbody > tr:nth-child(1) > td:nth-child(1) > a

In my code, I have the following:

#ISS Visible Path
- name: ISS Visible Passes
  resource: !secret iss_pass_api
  scan_interval: 7200
  sensor:

    - unique_id: iss_first_pass
      name: ISS First Pass
      select: "#aspnetForm > table > tbody > tr:nth-child(3) > td:nth-child(1) > table.standardTable > tbody > tr:nth-child(1) > td:nth-child(1) > a"

I always get unavailable as the entity.
Where am I going wrong?
All I need is to combine the date and the highest-point time into a timestamp so I can create a reminder in HA.

Try removing tbody from the selector (see multiscrape wiki).
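
Some background on why: browsers add a tbody element to the DOM when the page source leaves it out, and DevTools copies the selector from the DOM, so the copied path can contain steps that the downloaded HTML may not have. A minimal sketch for cleaning up a copied selector (strip_tbody is a hypothetical helper, and it only handles the "> tbody" child-combinator form):

```python
import re


def strip_tbody(selector):
    """Remove '> tbody' steps from a CSS selector copied from browser DevTools."""
    return re.sub(r"\s*>\s*tbody(?![\w-])", "", selector)


# The selector as copied from the browser in the post above.
COPIED = (
    "#aspnetForm > table > tbody > tr:nth-child(3) > td:nth-child(1) > "
    "table.standardTable > tbody > tr:nth-child(1) > td:nth-child(1) > a"
)

print(strip_tbody(COPIED))
```

If the page source does literally contain tbody tags, removing them from a selector that uses > will break it, so check View Source first.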

Still unavailable.
Just to verify: I don’t need a restart, do I? I am reloading multiscrape from the YAML configuration reloading section rather than doing a full restart. That should apply the update, right?

EDIT:
This is the HTML if it helps:

What a horrible website design. Whatever happened to semantic HTML?

Anyway, this will get you “08 Mar” in a simple scrape sensor:

[image: screenshot]

Is it possible to add a delay before the scrape starts? I have a page, Parcel Locker LUB91M Jarzębinowa 2 Lublin | InPost,
that loads some parts of the page after a while…

Can you help me translate this into YAML code, please?