Scrape sensor improved - scraping multiple values

Using the browser tools, what is the selector path it gives you? Also, you need to check the actual soup file to see the HTML that the scraper sees. It will be in your HA config directory under multiscrape. I don’t think location-list is what you want, since that is on the individual elements. You want something with safebays-list.

I got the soup file and extracted the important part:

<div class="wpb_text_column wpb_content_element vc_custom_1655198915150">
	<div class="wpb_wrapper">
		<div class="safebays-home-subtitle">This list shows bays that are likely to be free from jellyfish, rough sea, debris etc... <br/> This does not mean that only these beaches are safe.</div>
		<div class="safebays-list">
			<div class="safebay-island">Malta</div>
				<div class="location-list">
					<div class="location-name-list">Marfa</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Ramla l-Bir</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Little Armier</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Armier Bay</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Mellieħa</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Imġiebaħ</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Mistra Bay</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">St. Paul's Bay</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Qawra</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Baħar iċ-Ċagħaq</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">St. George's Bay</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Sliema</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Rinella Bay</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Marsaskala</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">St. Thomas Bay</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Ħofriet</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Marsaxlokk</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Pretty Bay</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Wied iż-Żurrieq</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Għar Lapsi</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Fomm ir-Riħ</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Ġnejna Bay</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Għajn Tuffieħa</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Golden Bay</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Anchor Bay</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Paradise Bay</div>
				</div>
				<div class="clr"></div>
				<div class="safebay-island">Gozo</div>
				<div class="location-list">
					<div class="location-name-list">Wied il-Għasri</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Xwejni Bay</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Qbajjar</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Marsalforn</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Ramla l-Ħamra</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">San Blas Bay</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Daħlet Qorrot</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Ħondoq ir-Rummien</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Mġarr ix-Xini</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Xlendi</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Dwejra</div>
				</div>
				<div class="location-list">
					<div class="location-name-list">Dwejra Inland Sea</div>
			</div>
		</div>
	</div>
</div>

I tried
div.wpb_text_column:nth-child(2) > div:nth-child(1) and
.safebays-list

as the selector (_list); both give me this error in the HA log:

homeassistant.exceptions.InvalidStateError: Invalid state encountered for entity ID: sensor.malta_beaches. State max length is 255 characters.

Oh, yeah, it doesn’t take much to reach that limit. The workaround is to put the data in an attribute which doesn’t have that limit.
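
To illustrate the workaround, here is a minimal Python sketch (standard library only, not multiscrape itself) that pulls the bay names out of markup like the excerpt above and keeps the state short while the long list goes into an attribute. The class name BayNames, the helper split_state_and_attributes and the attribute key "bays" are made up for this example, and the HTML is a shortened copy of the soup excerpt.

```python
from html.parser import HTMLParser

MAX_STATE_LENGTH = 255  # limit quoted in the Home Assistant error above


class BayNames(HTMLParser):
    """Collect the text of every <div class="location-name-list">."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._capture = tag == "div" and "location-name-list" in classes

    def handle_data(self, data):
        if self._capture and data.strip():
            self.names.append(data.strip())

    def handle_endtag(self, tag):
        self._capture = False


def split_state_and_attributes(names):
    """Hypothetical helper: short state (a count), long list as an attribute."""
    return str(len(names)), {"bays": names}


# Shortened copy of the soup excerpt, for illustration only.
html = """
<div class="safebays-list">
  <div class="safebay-island">Malta</div>
  <div class="location-list"><div class="location-name-list">Marfa</div></div>
  <div class="location-list"><div class="location-name-list">Ramla l-Bir</div></div>
  <div class="location-list"><div class="location-name-list">Little Armier</div></div>
</div>
"""

parser = BayNames()
parser.feed(html)
state, attributes = split_state_and_attributes(parser.names)
print(state, attributes)
```

The state stays well under the 255-character limit no matter how many bays are listed, because the list itself lives in the attribute.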

Hello, I’m wondering if somebody can help or share their experience.
I’m trying to use the multiscrape integration to get energy cost values (PUN values).
The document to parse is XML. There are a few examples of how to parse values from an HTML document (also on the custom integration’s wiki), but I could find nothing for XML.
How do I identify the fields? I tried the node path, but no luck.
In particular, I’m interested in using the select_list feature to get all the hourly costs at once (this is what I tried: select_list: 'NewDataSet > Prezzi > PUN')

Below is an XML extract (you can have a look at the actual file here - you will get a landing page asking you to check a couple of boxes to accept the terms of use, and then you will be redirected to the file).

Thanks to anybody who can help!

<NewDataSet>
<xs:schema id="NewDataSet">
<xs:element name="NewDataSet" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
<xs:complexType>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="Prezzi">
<xs:complexType>
<xs:sequence>
<xs:element name="Data" type="xs:string" minOccurs="0"/>
<xs:element name="Mercato" type="xs:string" minOccurs="0"/>
<xs:element name="Ora" type="xs:string" minOccurs="0"/>
<xs:element name="PUN" type="xs:string" minOccurs="0"/>
<xs:element name="NAT" type="xs:string" minOccurs="0"/>
<xs:element name="CALA" type="xs:string" minOccurs="0"/>
<xs:element name="CNOR" type="xs:string" minOccurs="0"/>
<xs:element name="CSUD" type="xs:string" minOccurs="0"/>
<xs:element name="NORD" type="xs:string" minOccurs="0"/>
<xs:element name="SARD" type="xs:string" minOccurs="0"/>
<xs:element name="SICI" type="xs:string" minOccurs="0"/>
<xs:element name="SUD" type="xs:string" minOccurs="0"/>
<xs:element name="AUST" type="xs:string" minOccurs="0"/>
<xs:element name="COAC" type="xs:string" minOccurs="0"/>
<xs:element name="COUP" type="xs:string" minOccurs="0"/>
<xs:element name="CORS" type="xs:string" minOccurs="0"/>
<xs:element name="FRAN" type="xs:string" minOccurs="0"/>
<xs:element name="GREC" type="xs:string" minOccurs="0"/>
<xs:element name="SLOV" type="xs:string" minOccurs="0"/>
<xs:element name="SVIZ" type="xs:string" minOccurs="0"/>
<xs:element name="BSP" type="xs:string" minOccurs="0"/>
<xs:element name="MALT" type="xs:string" minOccurs="0"/>
<xs:element name="XAUS" type="xs:string" minOccurs="0"/>
<xs:element name="XFRA" type="xs:string" minOccurs="0"/>
<xs:element name="MONT" type="xs:string" minOccurs="0"/>
<xs:element name="XGRE" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
</xs:element>
</xs:schema>
<Prezzi>
<Data>20230822</Data>
<Mercato>MGP</Mercato>
<Ora>1</Ora>
<PUN>119,631120</PUN>
<NAT>121,230000</NAT>
<CALA>124,000000</CALA>
<CNOR>119,190000</CNOR>
<CSUD>119,190000</CSUD>
<NORD>119,190000</NORD>
<SARD>119,190000</SARD>
<SICI>124,000000</SICI>
<SUD>119,190000</SUD>
<AUST>119,190000</AUST>
<COAC>119,190000</COAC>
<COUP>119,190000</COUP>
<CORS>119,190000</CORS>
<FRAN>119,190000</FRAN>
<GREC>119,190000</GREC>
<SLOV>119,190000</SLOV>
<SVIZ>119,190000</SVIZ>
<BSP>119,190000</BSP>
<MALT>124,000000</MALT>
<XAUS>119,190000</XAUS>
<XFRA>119,190000</XFRA>
<MONT>119,190000</MONT>
<XGRE>119,190000</XGRE>
</Prezzi>
<Prezzi>
<Data>20230822</Data>
<Mercato>MGP</Mercato>
<Ora>2</Ora>
<PUN>124,000000</PUN>
<NAT>124,000000</NAT>
<CALA>124,000000</CALA>
<CNOR>124,000000</CNOR>
<CSUD>124,000000</CSUD>
<NORD>124,000000</NORD>
<SARD>124,000000</SARD>
<SICI>124,000000</SICI>
<SUD>124,000000</SUD>
<AUST>124,000000</AUST>
<COAC>124,000000</COAC>
<COUP>124,000000</COUP>
<CORS>124,000000</CORS>
<FRAN>124,000000</FRAN>
<GREC>124,000000</GREC>
<SLOV>124,000000</SLOV>
<SVIZ>124,000000</SVIZ>
<BSP>124,000000</BSP>
<MALT>124,000000</MALT>
<XAUS>124,000000</XAUS>
<XFRA>124,000000</XFRA>
<MONT>124,000000</MONT>
<XGRE>124,000000</XGRE>
</Prezzi>
<Prezzi>
<Data>20230822</Data>
<Mercato>MGP</Mercato>
<Ora>3</Ora>
<PUN>119,000000</PUN>
<NAT>116,270000</NAT>
<CALA>149,270000</CALA>
<CNOR>115,996050</CNOR>
<CSUD>115,996050</CSUD>
<NORD>115,996050</NORD>
<SARD>115,996050</SARD>
<SICI>149,270000</SICI>
<SUD>115,996050</SUD>
<AUST>115,996050</AUST>
<COAC>115,996050</COAC>
<COUP>115,996050</COUP>
<CORS>115,996050</CORS>
<FRAN>115,996050</FRAN>
<GREC>115,996050</GREC>
<SLOV>115,996050</SLOV>
<SVIZ>115,996050</SVIZ>
<BSP>115,996050</BSP>
<MALT>149,270000</MALT>
<XAUS>115,996050</XAUS>
<XFRA>115,996050</XFRA>
<MONT>115,996050</MONT>
<XGRE>115,996050</XGRE>
</Prezzi>
</NewDataSet>

If you have XML, use a REST sensor. This config:

rest:
 - resource: https://www.mercatoelettrico.org/It/WebServerDataStore/MGP_Prezzi/20230822MGPPrezzi.xml
   headers:
     Cookie: GmeItaliano=EE0F6794E46EDE29034439E15E1D231663A8B5A0CFBB7C7C736C056E1B9056C85552B35A81F83E92B373C3270FA4A70764B342F08066058F2B8AB53566B699465CD2E05DD90304ADD3309764F208E4A63BBF8E773D0EAAC4E4D0EB8BCB82382A6A51C93F01C7E1D44360E19DE2A93EA859CB9C56EDCD14FDAFEFA0664020B4E07152CF7361EE9E239E3E88B0828FF90AA0A964BDD6436ACF092716DEA58D300E
   sensor:
     - name: "PUN values"
       value_template: "{{ (value_json['NewDataSet']['Prezzi']|map(attribute='PUN')|map('replace',',','.')|map('round',2)|list)[:12] }}"

gives you the first 12 values as a string, rounded to 2 decimals, and with decimal points instead of commas.

Notes:

  • That cookie header might expire. You’ll have to work out a way around that.
  • Sensor state is a string and cannot exceed 255 characters, hence my rounding and limiting to 12.

You could create a separate sensor for each value, with date and hour as attributes.
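
For anyone who wants to check what that template does outside Home Assistant, here is a rough Python equivalent using only the standard library. It is a sketch, not the REST sensor's implementation: the function name pun_values is made up, and the sample XML is a trimmed copy of the extract above without the xs:schema block (the real file may need extra handling for that part).

```python
import xml.etree.ElementTree as ET

# Trimmed sample based on the extract above (schema block left out).
xml_text = """<NewDataSet>
  <Prezzi><Data>20230822</Data><Ora>1</Ora><PUN>119,631120</PUN></Prezzi>
  <Prezzi><Data>20230822</Data><Ora>2</Ora><PUN>124,000000</PUN></Prezzi>
  <Prezzi><Data>20230822</Data><Ora>3</Ora><PUN>119,000000</PUN></Prezzi>
</NewDataSet>"""


def pun_values(xml_source, limit=12):
    """Return up to `limit` PUN prices as floats rounded to 2 decimals."""
    root = ET.fromstring(xml_source)
    values = []
    for prezzi in root.findall("Prezzi"):
        raw = prezzi.findtext("PUN")
        if raw is None:
            continue  # skip rows without a PUN element
        # Decimal comma -> decimal point, then round like the template does.
        values.append(round(float(raw.replace(",", ".")), 2))
    return values[:limit]


prices = pun_values(xml_text)
state = str(prices)  # what would end up as the sensor state
print(state, len(state))
```

Printing the length of the resulting string is a quick way to see how close a given rounding and limit come to the 255-character cap.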

Thank you for your time and your suggestions.
I had a look and played with it a bit, but mainly because I would potentially have to deal with the cookie header, I decided to keep using multiscrape, which can manage this automatically.
Also, thank you for pointing out the limit on state length. I followed your suggestion and created one sensor for each value (I will concatenate them into a list later for my needs).
I got everything working while still relying on the CSS selector. I’m not sure this is the right approach for XML, but it works (of course, suggestions about this or how to improve it overall are always welcome; for example, see the warning below, even though I made the use of lxml explicit and it should be used by default).

I’m pasting my solution below in case anybody else from Italy lands on this page and needs it.

PS:
I’m using the XML approach as it allows me to build a template as a resource and potentially query any day with little effort - other options available from the website and relying on HTML tables are not so immediate.
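
As a small illustration of the "any day" idea, this Python sketch builds the same kind of URL for an arbitrary date. The function name prezzi_url is made up; the URL pattern is simply the one visible in the resource used earlier in this thread.

```python
from datetime import date

BASE = "https://www.mercatoelettrico.org/It/WebServerDataStore/MGP_Prezzi/"


def prezzi_url(day):
    """Build the MGP price file URL for the given date (YYYYMMDD in the name)."""
    return f"{BASE}{day.strftime('%Y%m%d')}MGPPrezzi.xml"


url = prezzi_url(date(2023, 8, 22))
print(url)
```

Passing date.today() (or tomorrow's date) gives the URL for that day, which is what a resource template achieves inside Home Assistant.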

multiscrape:
  - resource_template: 'https://www.mercatoelettrico.org/It/WebServerDataStore/MGP_Prezzi/{{ as_timestamp(now()) | timestamp_custom("%Y%m%d", True) }}MGPPrezzi.xml'
    scan_interval: 3600
    parser: 'lxml'
    name: 'PUN oggi'
    button:
      unique_id: 'aggiorna_misure_pun_oggi'
      name: 'Aggiorna misure PUN oggi'
    log_response: true
    form_submit:
      submit_once: false
      resource: 'https://www.mercatoelettrico.org/It/Tools/Accessodati.aspx'
      select: '#form1'
      input:
        'ctl00$ContentPlaceHolder1$CBAccetto1': 'on'
        'ctl00$ContentPlaceHolder1$CBAccetto2': 'on'
        'ctl00$ContentPlaceHolder1$Button1': 'Accetto'
      input_filter:
        - 'ctl00$ContentPlaceHolder1$Button2'
        - 'ctl00$vai'
        - 'ctl00$LinkButton2'
        - 'ctl00$LoginButton'
    sensor:
      - select: 'NewDataSet:nth-child(1) > Prezzi:nth-child(2) > PUN:nth-child(4)'
        name: 'PUN oggi 00'
        unique_id: 'pun_oggi_00'
        icon: 'mdi:currency-eur'
        unit_of_measurement: '€/kWh'
        value_template: '{{ value | replace (",", ".") |float | int / 1000}}'
      - select: 'NewDataSet:nth-child(1) > Prezzi:nth-child(3) > PUN:nth-child(4)'
        name: 'PUN oggi 01'
        unique_id: 'pun_oggi_01'
        icon: 'mdi:currency-eur'
        unit_of_measurement: '€/kWh'
        value_template: '{{ value | replace (",", ".") |float | int / 1000}}'
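
One thing worth double-checking in the value_template above: filters are applied before the division, so the value is converted to int first and the decimals of the original price are dropped. The Python sketch below mirrors that arithmetic and shows an alternative that divides first. Both function names are made up, and whether the truncation is intended is an assumption on my part.

```python
def pun_truncated(raw):
    """Mirror of: value | replace(",", ".") | float | int / 1000"""
    return int(float(raw.replace(",", "."))) / 1000


def pun_precise(raw, digits=5):
    """Divide first, then round, so the decimals survive."""
    return round(float(raw.replace(",", ".")) / 1000, digits)


raw = "119,631120"  # first PUN value from the XML extract
print(pun_truncated(raw), pun_precise(raw))
```

If the extra precision matters, the template equivalent would be to apply the division before any rounding instead of casting to int.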

Hi,
I am experiencing a problem with retrieving data from a web page.
Specifically I want to get the data from :
body > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(9) > td:nth-child(2) > font:nth-child(1) > strong:nth-child(1) > font:nth-child(1) > small:nth-child(1) retrieved from http://perachora-davis.meteoclub.gr/
I’ve tried everything but it doesn’t bring me any data.
I have tried with other fields and the result is the same.

Can you help me? What can go wrong?

Thanks

Your selector doesn’t match anything on the page: there is no <tbody> in the HTML. The tool you’re using to work out the selector is not giving a valid answer. The HTML on that page is also not valid, so some tools will struggle.

What value are you trying to retrieve? It’s much better to work these out manually and minimise the list. I think you’re trying to access the barometric pressure (9th tr with complications due to invalid code), which is:

tr:nth-child(11) > td:nth-child(2) > font:nth-child(1)

If you then want to extract the value, set the value_template to:

{{ value|select('in','.0123456789')|join }}

…which works so long as the description doesn’t contain a digit or a dot.
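
For reference, the same character filter in plain Python, including a case that shows the caveat. The sample strings are invented for illustration and are not taken from the weather page.

```python
def keep_number_chars(value, allowed=".0123456789"):
    """Keep only digits and dots, like select('in', '.0123456789') | join."""
    return "".join(ch for ch in value if ch in allowed)


reading = keep_number_chars("Barometer 1013.2 hPa")

# Caveat: a digit in the description ends up glued to the value.
polluted = keep_number_chars("Wind 10 min avg 5.4")

print(reading, polluted)
```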


I’d like to share a quick and simple recipe for those that often struggle to figure out the CSS selector path.

I have done a fair amount of web development, and it still catches me out from time to time. CSS remains tricky business for me.

People often get stuck where some parts of a page are loaded dynamically, so you can’t get access to the elements that you want. This is sometimes related to timing, because it’s not always trivial (if ever) to know when a page has fully loaded.

If you want a much better chance of finding your selector, simply do the following:

  1. Make a multiscrape sensor that just uses value_template: "{{ value }}" and select: "foo" (foo can be anything, as long as it’s an ASCII word; don’t make it just . (a dot/full stop), for example). What we’re trying to achieve here is to have no meaningful output, but a valid config that will produce a page_soup.txt file on your HA server.

  2. Copy the page_soup.txt to your local computer.

  3. Open this file in your browser. It will probably render badly, but that doesn’t matter: this is what the scraper sees. Find the element you’d like to scrape and right-click on it to bring up the code inspector for that piece of HTML. The HTML elements will (typically) be visible in another window. Highlight the element in that view and right-click again to choose “Copy > Copy selector”. Chances are that what you have at this point can be used as your select: option’s config. Some of you might want to clean this up a bit for aesthetic and readability reasons. Read on…

  4. You can test your selector with a bit of Python using the soup file and without having to iterate on your HA config. You must have BeautifulSoup installed, obviously, which is what this custom component uses (I’m not going to detail the installation of Python or any libraries here).

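from bs4 import BeautifulSoup

# Load the soup file copied from the HA server.
with open('page_soup.txt', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'html.parser')

# Prints the first matching element, or None if nothing matches.
print(soup.select_one("<YOUR_SELECTOR_PATH>"))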
  5. If you want to inspect pretty code in step 3 instead, you can prettify it. This can make it easier to debug and navigate, especially if you want to adjust the CSS selector path to be only as specific as it needs to be. A lot of tool-generated web code has tons of ugly IDs and many classes, where you only need one of them to uniquely identify an element for the purposes of scraping.
# Reuses the soup object from step 4.
pretty_html = soup.prettify()
with open('pretty.html', 'w', encoding='utf-8') as o:
    o.write(pretty_html)

I hope this helps!


I’d like to scrape the price from here: https://www.ditur.fi/garmin-epix-gen2-pro-47mm-sapphire-010-02803-11
but the inspector gives me the selector #product-price-109517 > span > span, which doesn’t work. Could someone point me in the right direction?

Did you read the post before yours?

Yes, I tried that, but maybe I’m doing something wrong, because I cannot get proper selectors from any page that way. I got body > pre as the selector.
My sensor is:

- resource: https://www.ditur.fi/garmin-epix-gen2-pro-47mm-sapphire-010-02803-11
  log_response: true
  sensor:
    select: "hhh"
    value_template: "{{ value }}"

I guess I should get the price 989 scraped from here, if it’s possible to get it from the span class?

</script>
<div class="price-box price-final_price flex items-center gap-2 lg:gap-5 flex-wrap lg:flex-nowrap" x-data="initPrice109517()" x-spread="eventListeners">
<template x-if="!activeProductsPriceData &amp;&amp; !isPriceHidden()">
<div class="price-container flex items-center gap-2 lg:gap-5 flex-wrap lg:flex-nowrap">
<div class="old-price mr-2 flex order-3 lg:order-2">
<span class="price-wrapper title-font font-light text-base lg:text-2xl line-through text-gray-900" id="product-price-109517">
<span class="price">
<span class="price">1 099 €</span> </span>
</span>
</div>
<div class="discount flex order-1 lg:order-3 w-full lg:w-auto">
<span class="bg-hot_sale-lighter py-1 px-2 leading-none">
<span class="text-white text-xs uppercase font-semibold">10%</span>
</span>
</div>
<div class="final-price inline-block order-2 text-hot_sale-lighter lg:order-1" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">
<span class="price-label hidden">
</span>
<span class="price-wrapper title-font text-xl lg:text-2xl lg:font-semibold" id="product-price-109517">
<span class="price">
<span class="price">989 €</span> </span>
</span>
<meta content="989" itemprop="price"/>
<meta content="EUR" itemprop="priceCurrency"/>
</div>
</div>
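
A side note on the excerpt above: the id product-price-109517 appears twice (once for the old price, once for the final price), while the final price is also present as a plain number in the meta tag with itemprop="price". The standard-library sketch below reads that meta tag from a shortened copy of the excerpt. The class name MetaPrice is made up, and whether multiscrape can reach this tag on the live page is untested, since the markup sits inside a template element.

```python
from html.parser import HTMLParser


class MetaPrice(HTMLParser):
    """Grab the content attribute of the first <meta itemprop="price">."""

    def __init__(self):
        super().__init__()
        self.price = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("itemprop") == "price" and self.price is None:
            self.price = attr.get("content")


# Shortened copy of the soup excerpt, for illustration only.
snippet = """
<div class="final-price" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">
<span class="price-wrapper" id="product-price-109517">
<span class="price"><span class="price">989 €</span></span>
</span>
<meta content="989" itemprop="price"/>
<meta content="EUR" itemprop="priceCurrency"/>
</div>
"""

parser = MetaPrice()
parser.feed(snippet)
price = parser.price
print(price)
```

In selector terms this corresponds to matching meta[itemprop="price"] and reading its content attribute rather than the element text.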

Is this HTML from the soup file?

Yes, from page_soup.txt

Describe your steps in more detail.

What did you do with that file?

How did you try to determine a selector?

Hello,

I am running into a similar situation and believe I need to invoke a rest sensor (brand new to me). The right-click and “View Page Source” provides this: https://pastebin.com/iVEVcqfZ

When I press F12 and go to the network tab, I am not sure what file I need to use to get the javascript URL for the rest sensor. Here are the results under the “Network” tab when I reload the page, any help would be appreciated - thank you!

12D0EXN41br.js?_nc_x=Ij3Wp8lg5Kz
14.cache.js
4.cache.js
54.cache.js
api.js
AuthService
B9B219D8501B6CF3D9BA2603E04CE480.cache.js
BillingService
BootstrapService
button.e7f9415a2e000feaab02c86dd5802747.js
close.gif
collect?v=2&tid=G-QZN5THYDN6&gtm=45je38u0&_p=41996….coop%2Fprovider&dt=Loading%20Application...&_s=1
collect?v=2&tid=G-XKLPC78ZDJ&gtm=45je38u0&_p=41996….coop%2Fprovider&dt=Loading%20Application...&_s=1
common.js
consumer.nocache.js
data:image/png;base…
data:image/png;base…
data:image/png;base…
data:image/png;base…
data:image/png;base…
data:image/png;base…
data:image/png;base…
data:image/svg+xml,…
embeds?l=%7B%22widget_origin%22%3A%22https%3A%2F%2…ssion_id=9fe6c274a1071d779cf3ca20d4df24baded01412
facebook.png
FEppCFCt76d.png
fontawesome-webfont.woff2?v=4.7.0
gen_204?csp_test=true
google-maps-api-script
green-DownloadText128.png
green_button_small.png
GWT.rpc
GWT.rpc
GWT.rpc
help.png
js
js?id=G-QZN5THYDN6&l=dataLayer&cx=c
js?id=G-XKLPC78ZDJ&l=dataLayer&cx=c
like.php?href=https://mvec.smarthub.coop/UsageAnal…alse&action=like&colorscheme=light&font&height=21
loading.gif
log.js
log?hasfast=true
MessengerServiceV4
MessengerServiceV4
MessengerServiceV4
MiscellaneousReceivableService
mvec.smarthub.coop
NewServiceConnectService
NewServiceConnectService
PaymentService
postMessage.js
PreferencesRPCService
ProviderBillingService
ProviderService
RateDataService
ReadingsService
ReadingsService
recaptcha__en.js
resources?resource=2021/MVEC_Logo-FullColor_Web%20_%20500.jpg
SecuredSettingsService
settings?session_id=9fe6c274a1071d779cf3ca20d4df24baded01412
simplePagerFastForward.png
siteName
titlegradient.png
tweet_button.2b2d73daf636805223fb11d48f3e94f7.en.html
twitter.png
UserProfileService
UserProfileService
util.js
VersionService
VersionService
WeatherDataService
widget_iframe.2b2d73daf636805223fb11d48f3e94f7.htmlorigin=https%3A%2F%2Fmvec.smarthub.coop
widgets.j

Hi, perhaps someone can give me a hint. I want to scrape a schedule from “Belegungsplan - ESC Geretsried”.
The page shows the current week. Scraping the current week works for me. When I add, for example, “?2023-W38” to the URL in the browser, it shows the schedule for that specific week.
When I put the URL “Belegungsplan - ESC Geretsried” as the resource, multiscrape returns only the schedule of the current week.
Please give me some advice. Thank you.

Hi All,

I am trying to fetch data from my heat pump at mydewarmte.nl.

However, after the form submit I receive a 403 Forbidden response. I already added a User-Agent to the headers, but the problem is with a CSRF token.

My configuration:

  - resource: https://mydewarmte.com/status
    log_response: true
    scan_interval: 300
    headers:
      User-Agent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36'
    form_submit:
      #submit_once: True
      resource: https://mydewarmte.com/
      select: "body > div > div.wrapper > div.form-block > form"
      input:
        username: ####
        password: ####
    sensor:
      - select: "#supply_temp"
        name: scrapemydewarmte-water_supplytemp

My log:

2023-09-18 13:53:15.597 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_1 # response_body written to file: form_submit_response_body.txt
2023-09-18 13:53:15.598 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_1 # Error executing post request to url: https://mydewarmte.com/.
 Error message:
 HTTPStatusError("Client error '403 Forbidden' for url 'https://mydewarmte.com/'\nFor more information check: https://httpstatuses.com/403")

with the error description from the “form_submit_response_body_error.txt”:

<div id="summary">
  <h1>Forbidden <span>(403)</span></h1>
  <p>CSRF verification failed. Request aborted.</p>

  <p>You are seeing this message because this HTTPS site requires a “Referer header” to be sent by your web browser, but none was sent. This header is required for security reasons, to ensure that your browser is not being hijacked by third parties.</p>
  <p>If you have configured your browser to disable “Referer” headers, please re-enable them, at least for this site, or for HTTPS connections, or for “same-origin” requests.</p>
  <p>If you are using the &lt;meta name=&quot;referrer&quot; content=&quot;no-referrer&quot;&gt; tag or including the “Referrer-Policy: no-referrer” header, please remove them. The CSRF protection requires the “Referer” header to do strict referer checking. If you’re concerned about privacy, use alternatives like &lt;a rel=&quot;noreferrer&quot; …&gt; for links to third-party sites.</p>


</div>

When I record a session in the browser, I see a CSRF token is supplied in the Set-Cookie which is given back in the Payload of the form submit POST. Can I somehow manage to do this in the multiscrape configuration?

To answer my own question: it seems that ha_multiscrape did include the CSRF token in the form_submit. The problem was that the website needed ‘referer’, ‘origin’ and ‘host’ headers to return a 200, so I included them in the headers.

Now I receive the correct values. For anyone interested, the final setup is below:

multiscrape:
  - resource: https://mydewarmte.com/status
    scan_interval: 300
    headers:
      User-Agent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36'   
      referer: 'https://www.mydewarmte.com/'
      origin: 'https://www.mydewarmte.com'
      host: 'www.mydewarmte.com'
    form_submit:
      submit_once: True
      resource: https://mydewarmte.com/
      select: "body > div > div.wrapper > div.form-block > form"
      input:
        username: PASTE_YOUR_EMAIL
        password: PASTE_YOUR_PASSWORD
    sensor:
      - name: PompAO Water Flow
        unique_id: pompaowaterflow
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[0])|replace('\r\n      var WaterFlow = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "l/min"
      - name: PompAO Supply Temperature
        unique_id: pompao_supplytemp
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[1])|replace('\r\n      var SupplyTemp = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "°C"
      - name: PompAO Outside Temperature
        unique_id: pompao_outsidetemp
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[2])|replace('\r\n      var OutSideTemp = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "°C"
      - name: PompAO Heat Input
        unique_id: pompao_heatinput
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[3])|replace('\r\n      var HeatInput = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "kW"
      - name: PompAO Return Temperature
        unique_id: pompao_returntemp
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[4])|replace('\r\n      var ReturnTemp = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "°C"
      - name: PompAO Electrical Consumption
        unique_id: pompao_electricalcons
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[5])|replace('\r\n      var ElecConsump = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "kW"
      - name: PompAO PompAo Status
        unique_id: pompao_on_off_status
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[6])|replace('\r\n      var PompAoOnOff = ', '')|replace('\"','')|bool }}"
      - name: PompAO Heat Output
        unique_id: pompao_heatoutput
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[7])|replace('\r\n      var HeatOutPut = ', '')|replace('\"','')|float }}"
        unit_of_measurement: "kW"
      - name: PompAO Boiler Status
        unique_id: pompao_boiler_on_off_status
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[8])|replace('\r\n      var BoilerOnOff = ', '')|replace('\"','')|bool }}"
      - name: PompAO Thermostat Status
        unique_id: pompao_thermostat_status
        select: "body > script:nth-child(1)"
        value_template: "{{ (value.split(';')[9])|replace('\r\n      var ThermostatOnOff = ', '')|replace('\"','')|bool }}"
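
The templates above rely on the position of each variable in the script (split on ';' and a fixed index). As a sketch of a less position-dependent idea, the Python below pulls the variables out by name with a regular expression. The variable names come from the config above, but the sample script text and its values are invented, and the assumption that values may or may not be wrapped in double quotes is mine.

```python
import re

# Invented sample in the shape implied by the templates above.
script_text = (
    '\r\n      var WaterFlow = "12.5";'
    '\r\n      var SupplyTemp = "35.1";'
    '\r\n      var PompAoOnOff = 1;'
)

# var <Name> = <value>; with optional double quotes around the value.
VAR_PATTERN = re.compile(r'var\s+(\w+)\s*=\s*"?([^";]*)"?')


def parse_script_vars(text):
    """Return a dict mapping each JavaScript variable name to its raw value."""
    return dict(VAR_PATTERN.findall(text))


values = parse_script_vars(script_text)
supply_temp = float(values["SupplyTemp"])
print(values, supply_temp)
```

Looking values up by name keeps working if the site reorders the variables or adds a new one in the middle, which a fixed split index would not survive.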


Hi,
I am trying to get the timetable from my daughter’s school.
I got it to work, but the login drives me crazy.
When I log in through the browser, I get a link to the login page with a UUID or something similar. When I put this in the resource it works, but I don’t want to do that manually each day.
So I tried to find out how to do this.
Now I saw that the link is in “form_page_response_body.txt”.
So can I use a link out of this txt file as the resource for the form?
It looks like this:
<link rel="canonical" href="xxx" /><title>