Scrape sensor improved - scraping multiple values

Hello,

I am running into a similar situation and believe I need to use a rest sensor (brand new to me). Right-clicking and choosing “View Page Source” gives this: https://pastebin.com/iVEVcqfZ

When I press F12 and go to the Network tab, I am not sure which entry I need in order to get the JavaScript URL for the rest sensor. Here are the results under the “Network” tab when I reload the page. Any help would be appreciated, thank you!

12D0EXN41br.js?_nc_x=Ij3Wp8lg5Kz
14.cache.js
4.cache.js
54.cache.js
api.js
AuthService
B9B219D8501B6CF3D9BA2603E04CE480.cache.js
BillingService
BootstrapService
button.e7f9415a2e000feaab02c86dd5802747.js
close.gif
collect?v=2&tid=G-QZN5THYDN6&gtm=45je38u0&_p=41996….coop%2Fprovider&dt=Loading%20Application...&_s=1
collect?v=2&tid=G-XKLPC78ZDJ&gtm=45je38u0&_p=41996….coop%2Fprovider&dt=Loading%20Application...&_s=1
common.js
consumer.nocache.js
data:image/png;base…
data:image/png;base…
data:image/png;base…
data:image/png;base…
data:image/png;base…
data:image/png;base…
data:image/png;base…
data:image/svg+xml,…
embeds?l=%7B%22widget_origin%22%3A%22https%3A%2F%2…ssion_id=9fe6c274a1071d779cf3ca20d4df24baded01412
facebook.png
FEppCFCt76d.png
fontawesome-webfont.woff2?v=4.7.0
gen_204?csp_test=true
google-maps-api-script
green-DownloadText128.png
green_button_small.png
GWT.rpc
GWT.rpc
GWT.rpc
help.png
js
js?id=G-QZN5THYDN6&l=dataLayer&cx=c
js?id=G-XKLPC78ZDJ&l=dataLayer&cx=c
like.php?href=https://mvec.smarthub.coop/UsageAnal…alse&action=like&colorscheme=light&font&height=21
loading.gif
log.js
log?hasfast=true
MessengerServiceV4
MessengerServiceV4
MessengerServiceV4
MiscellaneousReceivableService
mvec.smarthub.coop
NewServiceConnectService
NewServiceConnectService
PaymentService
postMessage.js
PreferencesRPCService
ProviderBillingService
ProviderService
RateDataService
ReadingsService
ReadingsService
recaptcha__en.js
resources?resource=2021/MVEC_Logo-FullColor_Web%20_%20500.jpg
SecuredSettingsService
settings?session_id=9fe6c274a1071d779cf3ca20d4df24baded01412
simplePagerFastForward.png
siteName
titlegradient.png
tweet_button.2b2d73daf636805223fb11d48f3e94f7.en.html
twitter.png
UserProfileService
UserProfileService
util.js
VersionService
VersionService
WeatherDataService
widget_iframe.2b2d73daf636805223fb11d48f3e94f7.htmlorigin=https%3A%2F%2Fmvec.smarthub.coop
widgets.j

Hi, perhaps someone can give me a hint. I want to scrape a schedule from “Belegungsplan - ESC Geretsried”.
The page shows the current week, and scraping the current week works for me. Adding, for example, “?2023-W38” to the URL in the browser shows the schedule for that specific week.
But when I put the URL “Belegungsplan - ESC Geretsried” as the resource, multiscrape only returns the schedule of the current week.
Any advice would be appreciated. Thank you.
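
In case it helps with the week suffix: a minimal Python sketch of how a “?YYYY-Www” suffix could be computed from a date, assuming the site uses ISO week numbers (which “?2023-W38” suggests but which is not confirmed here). The function name week_query is made up for illustration. It only shows the calculation, not how multiscrape treats the query string.

```python
from datetime import date, timedelta

def week_query(day: date) -> str:
    """Return the '?YYYY-Www' suffix for the ISO week containing `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"?{iso_year}-W{iso_week:02d}"

# Example: a Monday in September 2023 and the week after it.
monday = date(2023, 9, 18)
print(week_query(monday))                        # ?2023-W38
print(week_query(monday + timedelta(weeks=1)))   # ?2023-W39
```

Using the ISO year together with the ISO week keeps the suffix consistent around New Year, when the calendar year and the week's year can differ.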

Hi All,

I am trying to fetch data from my heat pump @ mydewarmte.nl.

However, after the form submit I receive a 403 Forbidden response. I already added a User-Agent to the headers, but the problem seems to be with a CSRF token.

My configuration:

  - resource: https://mydewarmte.com/status
    log_response: true
    scan_interval: 300
    headers:
      User-Agent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36'
    form_submit:
      #submit_once: True
      resource: https://mydewarmte.com/
      select: "body > div > div.wrapper > div.form-block > form"
      input:
        username: ####
        password: ####
    sensor:
      - select: "#supply_temp"
        name: scrapemydewarmte-water_supplytemp

My log:

2023-09-18 13:53:15.597 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_1 # response_body written to file: form_submit_response_body.txt
2023-09-18 13:53:15.598 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_1 # Error executing post request to url: https://mydewarmte.com/.
 Error message:
 HTTPStatusError("Client error '403 Forbidden' for url 'https://mydewarmte.com/'\nFor more information check: https://httpstatuses.com/403")

with the error description from the “form_submit_response_body_error.txt”:

<div id="summary">
  <h1>Forbidden <span>(403)</span></h1>
  <p>CSRF verification failed. Request aborted.</p>

  <p>You are seeing this message because this HTTPS site requires a “Referer header” to be sent by your web browser, but none was sent. This header is required for security reasons, to ensure that your browser is not being hijacked by third parties.</p>
  <p>If you have configured your browser to disable “Referer” headers, please re-enable them, at least for this site, or for HTTPS connections, or for “same-origin” requests.</p>
  <p>If you are using the &lt;meta name=&quot;referrer&quot; content=&quot;no-referrer&quot;&gt; tag or including the “Referrer-Policy: no-referrer” header, please remove them. The CSRF protection requires the “Referer” header to do strict referer checking. If you’re concerned about privacy, use alternatives like &lt;a rel=&quot;noreferrer&quot; …&gt; for links to third-party sites.</p>


</div>

When I record a session in the browser, I see a CSRF token is supplied in the Set-Cookie which is given back in the Payload of the form submit POST. Can I somehow manage to do this in the multiscrape configuration?

To answer my own question: it seems that ha_multiscrape did include the CSRF token in the form_submit. The problem was that the website needed ‘referer’, ‘origin’ and ‘host’ headers before it would return a 200, so I included them in the headers.
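
For anyone hitting the same “Referer header” error on another site, here is a small Python sketch of how the three header values relate to the page URL. It is only an illustration: the helper name same_site_headers is hypothetical, and whether a given site checks all three headers is something you would have to test.

```python
from urllib.parse import urlsplit

def same_site_headers(page_url: str) -> dict:
    """Build Referer, Origin and Host headers that match `page_url`."""
    parts = urlsplit(page_url)
    origin = f"{parts.scheme}://{parts.netloc}"  # scheme + host, no path
    return {
        "Referer": page_url,
        "Origin": origin,
        "Host": parts.netloc,
    }

headers = same_site_headers("https://www.mydewarmte.com/")
for name, value in headers.items():
    print(f"{name}: {value}")
```

The Referer keeps the full URL including the path, while Origin is only the scheme and host, and Host is the host name alone.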

Now I receive the correct values. For anyone interested, here is the final setup:

multiscrape:
  - resource: https://mydewarmte.com/status
    scan_interval: 300
    headers:
      User-Agent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36'   
      referer: 'https://www.mydewarmte.com/'
      origin: 'https://www.mydewarmte.com'
      host: 'www.mydewarmte.com'
    form_submit:
      submit_once: True
      resource: https://mydewarmte.com/
      select: "body > div > div.wrapper > div.form-block > form"
      input:
        username: PASTE_YOUR_EMAIL
        password: PASTE_YOUR_PASSWORD
    sensor:
      - name: PompAO Water Flow
        unique_id: pompaowaterflow
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[0])|replace('\r\n      var WaterFlow = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "l/min"
      - name: PompAO Supply Temperature
        unique_id: pompao_supplytemp
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[1])|replace('\r\n      var SupplyTemp = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "°C"
      - name: PompAO Outside Temperature
        unique_id: pompao_outsidetemp
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[2])|replace('\r\n      var OutSideTemp = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "°C"
      - name: PompAO Heat Input
        unique_id: pompao_heatinput
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[3])|replace('\r\n      var HeatInput = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "kW"
      - name: PompAO Return Temperature
        unique_id: pompao_returntemp
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[4])|replace('\r\n      var ReturnTemp = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "°C"
      - name: PompAO Electrical Consumption
        unique_id: pompao_electricalcons
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[5])|replace('\r\n      var ElecConsump = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "kW"
      - name: PompAO PompAo Status
        unique_id: pompao_on_off_status
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[6])|replace('\r\n      var PompAoOnOff = ', '')|replace('\"','')|bool }}"
      - name: PompAO Heat Output
        unique_id: pompao_heatoutput
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[7])|replace('\r\n      var HeatOutPut = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "kW"
      - name: PompAO Boiler Status
        unique_id: pompao_boiler_on_off_status
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[8])|replace('\r\n      var BoilerOnOff = ', '')|replace('\"','')|bool }}"
      - name: PompAO Thermostat Status
        unique_id: pompao_thermostat_status
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[9])|replace('\r\n      var ThermostatOnOff = ', '')|replace('\"','')|bool }}"
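
The split-and-replace templates above depend on the exact order and whitespace of the script. As a rough alternative idea, here is a Python sketch that looks each variable up by name with a regular expression instead. The sample script text is invented to resemble what the templates imply (lines like var WaterFlow = "..."), so the real page may differ, and parse_js_vars is a hypothetical helper, not part of multiscrape.

```python
import re

# Hypothetical sample; the real script content on the page may differ.
SCRIPT = (
    '\r\n      var WaterFlow = "12.5";'
    '\r\n      var SupplyTemp = "35.1";'
    '\r\n      var PompAoOnOff = "1";'
)

# Matches: var <Name> = "<value>";  (quotes around the value are optional)
VAR_PATTERN = re.compile(r'var\s+(\w+)\s*=\s*"?([^";]*)"?\s*;')

def parse_js_vars(script: str) -> dict:
    """Return a {name: value} dict for simple 'var X = "..."' assignments."""
    return {name: value.strip() for name, value in VAR_PATTERN.findall(script)}

values = parse_js_vars(SCRIPT)
print(float(values["WaterFlow"]))
print(values["PompAoOnOff"] == "1")
```

Looking values up by name means a reordered or newly added variable in the script does not shift every sensor by one position.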


Hi,
I am trying to get the timetable from my daughter’s school.
I got it to work, but the login drives me crazy.
When I log in with the browser, I get a login link containing a UUID or something similar. When I put this link in the resource it works, but I don’t want to do that manually each day.
So I tried to find out how to automate this.
I noticed that the link is in “form_page_response_body.txt”.
Can I use a link from this txt file as the resource for the form?
It looks like this:
<link rel="canonical" href="xxx" /><title>
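
To illustrate the extraction step only: a minimal Python sketch that pulls the href out of a link rel="canonical" tag using the standard library. The sample HTML and URL are placeholders, and find_canonical is a hypothetical helper. Whether multiscrape can feed such a value back into the form resource is not something this sketch answers.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == "canonical" and self.href is None:
            self.href = attr.get("href")

def find_canonical(html: str):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.href

# Placeholder markup shaped like the snippet above.
sample = '<head><link rel="canonical" href="https://example.org/login?id=abc" /><title>Login</title></head>'
print(find_canonical(sample))
```

In CSS-selector terms this corresponds to selecting link[rel=canonical] and reading its href attribute.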

Could you create a github issue for this? I’ll be happy to look into this with you, once I return from vacation.

This is golden, thanks @dutchace

Hi, my scraper has suddenly stopped working. I set the logs to debug and enabled log_response (both are empty), and the debug log shows an SSL error:

2023-09-27 15:01:28.596 INFO (MainThread) [custom_components.multiscrape] Multiscrape triggered by service: <ServiceCall multiscrape.trigger_acl_fuel_scraper (c:01HBBBNTMKP5WBFTPCNZ77JZC2)>
2023-09-27 15:01:28.596 DEBUG (MainThread) [custom_components.multiscrape.coordinator] ACL Fuel Scraper # New run: start (re)loading data from resource
2023-09-27 15:01:28.596 DEBUG (MainThread) [custom_components.multiscrape.coordinator] ACL Fuel Scraper # Deleting logging files from previous run
2023-09-27 15:01:28.598 DEBUG (MainThread) [custom_components.multiscrape.coordinator] ACL Fuel Scraper # Rendered resource template into: https://www.acl.lu/en-us/mobilite-et-tourisme/service-tourisme/prix-des-carburants
2023-09-27 15:01:28.598 DEBUG (MainThread) [custom_components.multiscrape.coordinator] ACL Fuel Scraper # Request data from https://www.acl.lu/en-us/mobilite-et-tourisme/service-tourisme/prix-des-carburants
2023-09-27 15:01:28.598 DEBUG (MainThread) [custom_components.multiscrape.http] ACL Fuel Scraper # Executing page-request with a get to url: https://www.acl.lu/en-us/mobilite-et-tourisme/service-tourisme/prix-des-carburants.
2023-09-27 15:01:28.600 DEBUG (MainThread) [custom_components.multiscrape.http] ACL Fuel Scraper # request_headers written to file: page_request_headers.txt
2023-09-27 15:01:28.602 DEBUG (MainThread) [custom_components.multiscrape.http] ACL Fuel Scraper # request_body written to file: page_request_body.txt
2023-09-27 15:01:28.672 DEBUG (MainThread) [custom_components.multiscrape.http] ACL Fuel Scraper # Error executing get request to url: https://www.acl.lu/en-us/mobilite-et-tourisme/service-tourisme/prix-des-carburants.
 Error message:
 SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)')
2023-09-27 15:01:28.672 DEBUG (MainThread) [custom_components.multiscrape.http] ACL Fuel Scraper # Unable to write headers and body to files during handling of exception.
 Error message:
 AttributeError("'NoneType' object has no attribute 'headers'")
2023-09-27 15:01:28.672 ERROR (MainThread) [custom_components.multiscrape.coordinator] ACL Fuel Scraper # Updating failed with exception: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 0.077 seconds (success: True)
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.sensor] ACL Fuel Scraper # 95 Fuel Price # Start scraping to update sensor
2023-09-27 15:01:28.673 ERROR (MainThread) [custom_components.multiscrape.sensor] ACL Fuel Scraper # 95 Fuel Price # Unable to scrape data: Skipped scraping because data couldn't be updated 
Consider using debug logging and log_response for further investigation.
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.sensor] ACL Fuel Scraper # 95 Fuel Price # On-error, set value to None
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.entity] ACL Fuel Scraper # 95 Fuel Price # Icon template rendered and set to: mdi:gas-station
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.entity] ACL Fuel Scraper # 95 Fuel Price # Sensor updated and state written to HA
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.sensor] ACL Fuel Scraper # 98 Fuel Price # Start scraping to update sensor
2023-09-27 15:01:28.673 ERROR (MainThread) [custom_components.multiscrape.sensor] ACL Fuel Scraper # 98 Fuel Price # Unable to scrape data: Skipped scraping because data couldn't be updated 
Consider using debug logging and log_response for further investigation.
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.sensor] ACL Fuel Scraper # 98 Fuel Price # On-error, set value to None
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.entity] ACL Fuel Scraper # 98 Fuel Price # Icon template rendered and set to: mdi:gas-station
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.entity] ACL Fuel Scraper # 98 Fuel Price # Sensor updated and state written to HA
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.sensor] ACL Fuel Scraper # Diesel Fuel Price # Start scraping to update sensor
2023-09-27 15:01:28.673 ERROR (MainThread) [custom_components.multiscrape.sensor] ACL Fuel Scraper # Diesel Fuel Price # Unable to scrape data: Skipped scraping because data couldn't be updated 
Consider using debug logging and log_response for further investigation.
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.sensor] ACL Fuel Scraper # Diesel Fuel Price # On-error, set value to None
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.entity] ACL Fuel Scraper # Diesel Fuel Price # Icon template rendered and set to: mdi:gas-station
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.entity] ACL Fuel Scraper # Diesel Fuel Price # Sensor updated and state written to HA
2023-09-27 15:01:28.673 DEBUG (MainThread) [custom_components.multiscrape.sensor] ACL Fuel Scraper # Last Updated # Start scraping to update sensor
2023-09-27 15:01:28.673 ERROR (MainThread) [custom_components.multiscrape.sensor] ACL Fuel Scraper # Last Updated # Unable to scrape data: Skipped scraping because data couldn't be updated 
Consider using debug logging and log_response for further investigation.
2023-09-27 15:01:28.674 DEBUG (MainThread) [custom_components.multiscrape.sensor] ACL Fuel Scraper # Last Updated # On-error, set value to None
2023-09-27 15:01:28.674 DEBUG (MainThread) [custom_components.multiscrape.entity] ACL Fuel Scraper # Last Updated # Icon template rendered and set to: mdi:calendar
2023-09-27 15:01:28.674 DEBUG (MainThread) [custom_components.multiscrape.entity] ACL Fuel Scraper # Last Updated # Sensor updated and state written to HA

I SSH’d into my HA VM and ran “curl -v” on the URL I connect to; it connects and displays the page just fine. Any ideas?

multiscrape:
  - name: ACL Fuel Scraper
    resource: https://www.acl.lu/en-us/mobilite-et-tourisme/service-tourisme/prix-des-carburants
    scan_interval: 900
    log_response: true
    sensor:
      - unique_id: acl_95_fuel_price
        name: 95 Fuel Price
        icon: mdi:gas-station
        select: "#form > div.main-wrapper > div.section-pt.section-pb-70.noPaddingBottom > div > div > div > table:nth-child(9) > tr:nth-child(2) > td:nth-child(4)"
        unit_of_measurement: '€'
      - unique_id: acl_98_fuel_price
        name: 98 Fuel Price
        icon: mdi:gas-station
        select: "#form > div.main-wrapper > div.section-pt.section-pb-70.noPaddingBottom > div > div > div > table:nth-child(9) > tr:nth-child(2) > td:nth-child(5)"
        unit_of_measurement: '€'
      - unique_id: acl_diesel_fuel_price
        name: Diesel Fuel Price
        icon: mdi:gas-station
        select: "#form > div.main-wrapper > div.section-pt.section-pb-70.noPaddingBottom > div > div > div > table:nth-child(9) > tr:nth-child(2) > td:nth-child(6)"
        unit_of_measurement: '€'
      - unique_id: acl_last_updated
        name: Last Updated
        icon: mdi:calendar
        select: "#form > div.main-wrapper > div.section-pt.section-pb-70.noPaddingBottom > div > div > div > table:nth-child(9) > tr:nth-child(2) > td:nth-child(7)"

Home Assistant 2023.9.3
Supervisor 2023.09.2
Operating System 10.5
Frontend 20230911.0 - latest
HACS v1.33
HA Multiscrape v6.7.3

I have the following HTML and would like to read the title attribute of every td. It can be one sensor that represents all titles, or multiple sensors for the first 5 td elements with a title attribute. I tried the select tbody > tr > td[class^='event'], but it returns only the first match, and it does not seem possible to specify an index. Please help.

<div class="datatable">
<table>
<tbody>
<tr>
<td></td>
<td class="event" title="TTT"></td>
<td>ll</td>
</tr>
<tr>
<td></td>
<td ></td>
<td class="event" title="bbb">ll</td>
</tr>
<tr>
<td class="event" title="bbb"></td>
<td ></td>
<td class="event" title="bbb">ll</td>
</tr>
</tbody>
</table>
</div>

Can anyone please suggest something for this?
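
Not a multiscrape answer as such, but to make the goal concrete, here is a Python sketch that collects the title attribute of every td with class "event" and optionally keeps only the first few. It uses the standard library parser on a cleaned-up copy of the table above; event_titles is a hypothetical helper name. It may be worth checking whether the integration offers a list-style select together with an attribute option, since that would be the equivalent of what this sketch does.

```python
from html.parser import HTMLParser

class EventTitleCollector(HTMLParser):
    """Collect the title attribute of each <td class="event">."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "td" and "event" in classes and attr.get("title"):
            self.titles.append(attr["title"])

def event_titles(html: str, limit=None):
    """Return all event titles, or only the first `limit` of them."""
    collector = EventTitleCollector()
    collector.feed(html)
    return collector.titles[:limit]

SAMPLE = """<table><tbody>
<tr><td></td><td class="event" title="TTT"></td><td>ll</td></tr>
<tr><td></td><td></td><td class="event" title="bbb">ll</td></tr>
<tr><td class="event" title="bbb"></td><td></td><td class="event" title="bbb">ll</td></tr>
</tbody></table>"""

print(",".join(event_titles(SAMPLE)))   # TTT,bbb,bbb,bbb
print(event_titles(SAMPLE, limit=2))    # ['TTT', 'bbb']
```

Joining the titles gives a single sensor value, while indexing into the list gives one value per sensor.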

Hi, did you get it working? Want to get values of elvaco also.

How can I scrape the “diesel” price from a Google search?

https://www.google.com/search?q=dats+lovendegem&sca_esv=579558902&sxsrf=AM9HkKljgSqSkSZ2JYEND259J3bQdHSOaQ%3A1699164186045&source=hp&ei=GjBHZdQwqr-L6A-_6bawBQ&iflsig=AO6bgOgAAAAAZUc-KiSHiUGDKbU8MfcVHQjZ7XnnZzi0&gs_ssp=eJzj4tFP1zc0TI-vyC03SDZgtFIxqDAxTzY2S0s2sjAytjRONrS0MqgwNk20SDVLNDA0SLVISjP04k9JLClWyMkvS81LSU1PzQUAo_QUgw&oq=dats+&gs_lp=Egdnd3Mtd2l6IgVkYXRzICoCCAAyDRAuGK8BGMcBGIoFGCcyERAuGIAEGLEDGIMBGMcBGNEDMgcQABiKBRhDMgsQLhivARjHARiABDILEC4YgAQYxwEYrwEyBRAAGIAEMgsQLhiABBjHARivATILEC4YgAQYxwEYrwEyCxAuGIAEGMcBGK8BMgsQLhiABBjHARivAUjQFVAAWPcFcAB4AJABAJgBfaABmQSqAQMxLjS4AQHIAQD4AQHCAgcQIxiKBRgnwgINEC4YigUYxwEY0QMYQ8ICDRAuGIoFGMcBGK8BGEPCAg4QLhiABBixAxjHARjRA8ICCxAuGIAEGLEDGIMBwgITEC4YgwEYxwEYsQMY0QMYigUYQ8ICCxAAGIoFGLEDGIMBwgIFEC4YgATCAggQABiABBixAw&sclient=gws-wiz

Hey @xbmcnut, I’ve been following along with this and have got as far as having the comma-separated display.
I’ve tried {{ states("sensor.next").split(",")[0] }}, {{ states("sensor.next").split(",")[1] }} and {{ states("sensor.next").split(",")[2] }}; however, it just creates an error. Any chance of pasting the whole code block?

The code behind that is:

type: custom:button-card
entity: sensor.mybuttoncardsensor
layout: icon_name_state2nd
show_label: true
label: |
  [[[
    var days_to = entity.state.split("|")[1]
    if (days_to == 0)
    { return "Today" }
    else if (days_to == 1)
    { return "Tomorrow" }
    else
    { return "in " + days_to + " days" }
  ]]]
show_name: true
name: |
  [[[
    return entity.state.split("|")[0]
  ]]]
state:
  - color: red
    operator: template
    value: '[[[ return entity.state.split("|")[1] == 0 ]]]'
  - color: orange
    operator: template
    value: '[[[ return entity.state.split("|")[1] == 1 ]]]'
  - value: default

Let me know if you want the code behind these buttons
Rubbish_Bins


Hi, maybe someone here can help me with this issue.
I’m scraping data from this website https://www.wunderground.com/dashboard/pws/IHAUTU4 and that works fine, but the data comes in imperial units and I want it in metric.
How can I make the tool first click the settings button at the top right and select °C before scraping the data?
I’ve been messing around with form submit, but unfortunately without success.
Thanks.

Don’t try to do that, unless the metric page has a different URL you can scrape.

Much easier to pull them in the default imperial and convert in the value_template of the sensor definition.

For example, if your select gets the temperature in °F, use this to convert to °C:

value_template: "{{ (value|float(0.0) - 32) * (5/9) }}"

THAAAAANKS! That totally worked.
I tried that before, but I guess I was missing the float(0.0) part.
Just one more thing: do you happen to know how to round it to one decimal? Because
"{{ (value|float(0.0) - 32) * (5/9) | round(1) }}"
is not it. Heh.

EDIT: Never mind. It decided to round it at the second restart, I guess. Thanks again.
EDIT2: Never mind that. It’s still not rounding; the last result I saw just happened to be a round number.
I’d much appreciate it if someone could tell me how to round it. lol.

{{ ((value|float(0.0) - 32) * (5/9)) | round(1) }}

It was just rounding 5/9. Brackets needed.
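
To spell out the difference, here is a small Python sketch of both versions. In the template, the round filter binds more tightly than the multiplication, so without the outer brackets only 5/9 gets rounded. The helper names are made up for illustration, and to_float only approximates what the float(0.0) filter does with unparseable input.

```python
def to_float(value, default=0.0):
    """Rough stand-in for the template's float(0.0): fall back to a default."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default

def f_to_c(value):
    """Convert °F to °C and round the whole result to one decimal."""
    return round((to_float(value) - 32) * 5 / 9, 1)

def f_to_c_misplaced_round(value):
    """Mirror of the first attempt: only 5/9 is rounded (to 0.6)."""
    return (to_float(value) - 32) * round(5 / 9, 1)

print(f_to_c("50"))                   # 10.0
print(f_to_c_misplaced_round("50"))   # roughly 10.8, since 18 * 0.6
```

One side effect of the 0.0 default is that an unreadable value turns into about -17.8 °C rather than an error, which is worth keeping in mind when a reading looks odd.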


I want to get the “diesel” price from gulf hansbeke - Google Search. How can I do this?

Hey guys,

Due to the massive snowfall in the Alps, I thought of adding some ski resort info.
My config looks like this:

multiscrape:
  - name: "OK Bergbahnen"
    resource: "https://www.ok-bergbahnen.com"
    log_response: true
    scan_interval: 3600
    sensor:
      - name: "Oberstdorf Snow height"
        select: "a.jahreszeit-winter > span:nth-child(1) > span > span.font-bold"
        unit_of_measurement: "cm"
      - name: "Oberstdorf open lifts"
        select: "a.jahreszeit-winter > span:nth-child(2) > span > span.font-bold"
      - name: "Oberstdorf open slopes"
        select: "a.jahreszeit-winter > span:nth-child(3) > span > span.font-bold"
        unit_of_measurement: "km"
      - name: "Oberstdorf open parking lots"
        select: "a.jahreszeit-winter > span:nth-child(4) > span > span.font-bold"

But with this config, I’m running into the following errors currently:

[custom_components.multiscrape.sensor] OK Bergbahnen # Oberstdorf Snow height # Unable to scrape data: Skipped scraping because data couldn't be updated
...

I tested the selects etc. in Python, and the test worked fine for me.
Do you have any idea why my config for (multi)scrape is not working in HA?

Cheers Tim

My test:

import requests
from bs4 import BeautifulSoup
import json
from datetime import timedelta

# JSON configuration from logs
config = {
    'name': 'OK Bergbahnen',
    'resource': 'https://www.ok-bergbahnen.com',
    'log_response': True,
    'scan_interval': timedelta(seconds=3600),
    'sensor': [
        {'name': 'Oberstdorf Snow height', 'select': 'a.jahreszeit-winter > span:nth-child(1) > span > span.font-bold', 'unit_of_measurement': 'cm', 'force_update': False},
        {'name': 'Oberstdorf open lifts', 'select': 'a.jahreszeit-winter > span:nth-child(2) > span > span.font-bold', 'force_update': False},
        {'name': 'Oberstdorf open slopes', 'select': 'a.jahreszeit-winter > span:nth-child(3) > span > span.font-bold', 'unit_of_measurement': 'km', 'force_update': False},
        {'name': 'Oberstdorf open parking lots', 'select': 'a.jahreszeit-winter > span:nth-child(4) > span > span.font-bold', 'force_update': False}
    ],
    'timeout': 10,
    'parser': 'lxml',
    'list_separator': ',',
    'method': 'GET',
    'verify_ssl': True
}

def scrape_data(config):
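    # Note: verify=False skips TLS certificate verification, unlike 'verify_ssl': True in the config above.
    response = requests.get(config['resource'], timeout=config['timeout'], verify=False)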
    
    if config['log_response']:
        print(response.text)

    soup = BeautifulSoup(response.text, config['parser'])

    values = {}
    for sensor in config['sensor']:
        selector = sensor['select']
        value_element = soup.select_one(selector)

        value = value_element.text.strip() if value_element else 'N/A'

        if 'unit_of_measurement' in sensor:
            value += ' ' + sensor['unit_of_measurement']

        values[sensor['name']] = value

    return values

result = scrape_data(config)
print(json.dumps(result, indent=2))

Result:

{
  "Oberstdorf Snow height": "260 cm",
  "Oberstdorf open lifts": "5",
  "Oberstdorf open slopes": "8 km",
  "Oberstdorf open parking lots": "14"
}

Many thanks! This is a great app, I’ve configured it for several use cases and it works like a charm!

I have, however, stumbled upon a challenge: I have sensors that pick up the daily lunch options from my favourite restaurants. They work, but in a couple of cases the sensor also picks up the price of the dish and places it in the middle of the sentence.
I am using split, and have now also tried replace, to format the output, but without success, since the price varies with the dish.

Is there a guide or a site outlining:

  1. available commands (e.g. split, trim etc.) for the value_template
  2. how these commands work and how to configure them

I’m not a programmer, but I really enjoy tinkering with this, so any hint on where I can learn more would be much appreciated!
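
On the dynamic price: value_template uses Jinja2 templating as extended by Home Assistant, so the Jinja2 filter list and the Home Assistant templating docs are the places to look. A plain replace cannot match a changing number, but a regular expression can. Below is a Python sketch of the idea. The price format (a number followed by “kr” or “€”) and the sample sentences are assumptions, and strip_price is a made-up name. Home Assistant templates have a regex_replace filter that could take a similar pattern, though this has not been tried against your sensors.

```python
import re

# Assumed price format: optional leading space, a number with an optional
# decimal part, then "kr" or "€". Adjust to what the restaurant pages show.
PRICE = re.compile(r"\s*\d+(?:[.,]\d+)?\s?(?:kr|€)")

def strip_price(text: str) -> str:
    """Remove anything that looks like a price from the sentence."""
    return PRICE.sub("", text).strip()

print(strip_price("Pasta carbonara 129 kr with salad"))
print(strip_price("Soup 12,50 € and bread"))
```

The pattern removes the whitespace in front of the price as well, so the remaining words do not end up with a double space.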