Scrape sensor improved - scraping multiple values

Hi Daniel,
I tried using both username: and email: with no success. There are no messages in the log that say I passed or failed authentication, even if I put an incorrect password in. I think it’s worth cleaning this part of the code up to help in debugging authentication issues.

Cheers,

Hi, I have a question.
I’m not sure whether I’m properly authenticated.
How can I check this using the logger?
My YAML:
[image]

This folder has been created automatically:
[image]

Which file should I check?
Reading this community, I saw that my log should contain a message like this: The form appears to have been submitted successfully.
Where can I find it?

Kevin, did you enable logging in your configuration.yaml like this?

logger:
  default: info
  logs:
    custom_components.multiscrape: debug

When I run this with your config, the log tells me pretty clearly why your config fails:

The form is hidden within a <script> tag and shown with JavaScript. This makes it a complicated case; I’ll try to take a better look later.

I’m not sure I can make this clearer: in the end, it is a form-submit feature, not just an authentication feature. So I cannot assume that authentication failed; it can be any kind of form, e.g. one where you enter your address to retrieve a garbage collection schedule.

See answer above to Kevin:
Add this to your configuration.yaml:

logger:
  default: info
  logs:
    custom_components.multiscrape: debug
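
For anyone wondering what this logger block does: Home Assistant's logger integration sets levels on ordinary Python loggers. The sketch below only mimics that idea with the standard logging module; the second logger name is made up for illustration.

```python
import logging

# Mirror "default: info" by setting the root logger to INFO.
logging.getLogger().setLevel(logging.INFO)

# Mirror the per-integration override from the "logs:" section.
multiscrape_log = logging.getLogger("custom_components.multiscrape")
multiscrape_log.setLevel(logging.DEBUG)

# Any other logger inherits the root level (this name is hypothetical).
other_log = logging.getLogger("some.other.component")

print(multiscrape_log.isEnabledFor(logging.DEBUG))  # True
print(other_log.isEnabledFor(logging.DEBUG))        # False
```

So debug messages from multiscrape end up in the log while everything else stays at info.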

Hi all, my problem is configuring a weather station in HA with an XML file and multiscrape. This is the page:

This XML file does not appear to have any style information associated with it. The document tree is shown below.

<maintag>

<script/>

<misc>

<data misc="refresh_time">2022.10.12. 230831</data>

</misc>

<data realtime="temp">22.11111111111111</data>

</realtime>

And here is the config in multiscrape.yaml:

multiscrape:
  -resource: https://192.168.1.39/realtime.xml
  scan_interval: 30
  sensor:
  -unique_id: temp_out_weather
  name: "TEMP"
  select: realtime > data realtime="temp":nth-child(1)" 
  value_template: '{{ (value.split("")[1]) }}'

Under Developer Tools / YAML, the configuration check shows me this:

Invalid config for [multiscrape]: [multiscrape] is an invalid option for [multiscrape]. Check: multiscrape->multiscrape->0->multiscrape. (See /config/configuration.yaml, line 11).

This is the area around line 11 of configuration.yaml:


# Text to speech
tts:
  - platform: google_translate

automation: !include automations.yaml
script: !include scripts.yaml
scene: !include scenes.yaml  <------------------line--11------------
multiscrape: !include multiscrape.yaml

Thanks

You should not repeat the integration name when you include files. Try this in multiscrape.yaml:

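- resource: https://192.168.1.39/realtime.xml
  scan_interval: 30
  sensor:
    - unique_id: temp_out_weather
      name: "TEMP"
      # CSS attribute selectors need square brackets
      select: 'realtime > data[realtime="temp"]'
      # split("") fails on an empty separator; rounding is an assumed intent
      value_template: '{{ value | float | round(1) }}'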
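
To check which element actually holds the temperature, independent of multiscrape, here is a small stdlib sketch. It runs on a reconstructed, well-formed version of the XML from the question: the opening <realtime> tag and the closing </maintag> are assumptions, since the paste is incomplete. In CSS terms the same element would be matched by an attribute selector such as data[realtime="temp"].

```python
import xml.etree.ElementTree as ET

# Reconstructed, well-formed version of the page from the question.
# The opening <realtime> tag and closing </maintag> are assumed.
XML = """<maintag>
  <script/>
  <misc>
    <data misc="refresh_time">2022.10.12. 230831</data>
  </misc>
  <realtime>
    <data realtime="temp">22.11111111111111</data>
  </realtime>
</maintag>"""

root = ET.fromstring(XML)

# Find the <data> child of <realtime> whose "realtime" attribute is "temp".
node = root.find("./realtime/data[@realtime='temp']")
temp = round(float(node.text), 1)
print(temp)  # 22.1
```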

Hi danieldotnl,
I tried your config, but it does not work:

Error loading /config/configuration.yaml: while parsing a block mapping
in "/config/configuration.yaml", line 2, column 1
expected <block end>, but found '-'
in "/config/configuration.yaml", line 104, column 1

I only got it working the following way. My weather station is a WH2900 connected to a PC running weewx, which generates the files realtime.xml and realtime.txt. It did not work for me with scrape, multiscrape, or rest.

Contents of realtime.txt:

17/10/22 09:45:21 16.6 66 10.2 12.8 11.9 77 0.0 0.0 1011.9 ENE 2 km/h C hPa mm 42.4 0.4 4.3 428.2 0.0 20.4 54 16.6 4.0 17.4 09:25 12.4 06:17 22.4 09:14 31.4 09:14 1012.6 08:39 1009.6 00:02 4.5.1 0 27.7 16.1 18.0 6 0.0 672 96 0.0 0 1 0 E 808 m 14.4 3.6 675.0 0

Field reference for decoding realtime.txt:


############################################################################
# Reference: Cumulus Format of realtime.txt file
############################################################################
#   Field       Pos     Example     Description
#   date        0       18/10/08    date (always dd/mm/yy)
#   time        1       16:03:45    time (always hh:mm:ss)
#   temp        2       8.4         outside temperature
#   hum         3       84          relative humidity
#   dew         4       5.8         dewpoint
#   wspeed      5       24.2        wind speed (average)
#   wgust       6       33.0        wind speed (gust)
#   avgbearing  7       261         wind bearing
#   rrate       8       0.0         current rain rate
#   rfall       9       1.0         rain today
#   press       10      999.7       barometer
#   wdir        11      W           wind direction
#   beaufort    12      6           wind speed (beaufort)
#   windunit    13      mph         wind units
#   tempunit    14      C           temperature units
#   pressunit   15      mb          pressure units
#   rainunit    16      mm          rain units
#   windrun     17      146.6       wind run (today)
#   pressrend   18      +0.1        pressure trend value
#   rmonth      19      85.2        monthly rain
#   ryear       20      588.4       yearly rain
#   rfallY      21      11.6        yesterday's rainfall
#   intemp      22      20.3        inside temperature
#   inhum       23      57          inside humidity
#   wchll       24      3.6         wind chill
#   temptrendval 25     -0.7        temperature trend value
#   tempTH      26      10.9        today's high temp
#   TtempTH     27      12:00       time of today's high temp (hh:mm)
#   tempTL      28      7.8         today's low temp
#   TtempTL     29      14:41       time of today's low temp (hh:mm)
#   windTM      30      37.4        today's high wind speed (average)
#   TwindTM     31      14:38       time of today's hi wind (avg) (hh:mm)
#   wgustTM     32      44.0        today's high wind gust
#   TwgustTM    33      14:28       time of today's high wind gust (hh:mm)
#   pressTH     34      999.8       today's high pressure
#   TpressTH    35      16:01       time of today's high pressure (hh:mm)
#   pressTL     36      998.4       today's low pressure
#   TpressTL    37      12:06       time of today's low pressure (hh:mm)
#   version     38      1.8.2       Cumulus version
#   build       39      459         Cumulus build no
#   rmaxgust    40      1.6         Recent Max Gust
#   heatindex   41      76.2        Heat Index
#   humidex     42      24.9        Humidex Index
#   uv          43      0.02        UV (if you have it)
#   et          44      0.6         ET
#   solar       45      220         Solar (if you have it)
############################################################################

Contents of configuration.yaml:

sensor:
  - platform: command_line
    name: outtemp
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 3
    unit_of_measurement: "°C"
    scan_interval: 5
 
  - platform: command_line
    name: outhum
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 4
    unit_of_measurement: "%"
    scan_interval: 5

  - platform: command_line
    name: intemp
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 23
    unit_of_measurement: "°C"
    scan_interval: 5
 
  - platform: command_line
    name: inhum
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 24
    unit_of_measurement: "%"
    scan_interval: 5

  - platform: command_line
    name: pressure
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 11
    unit_of_measurement: "hPa"
    scan_interval: 5

  - platform: command_line
    name: pressuretrend 
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 19
    unit_of_measurement: "tend"
    scan_interval: 5

  - platform: command_line
    name: windspeed
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 6
    unit_of_measurement: "km/h"
    scan_interval: 5
    
  - platform: command_line
    name: winddir
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 12
    unit_of_measurement: "Dir"
    scan_interval: 5    
   
  - platform: command_line
    name: rainfall
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 10
    unit_of_measurement: "mm"
    scan_interval: 5
   
  - platform: command_line
    name: solarrad
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 46
    unit_of_measurement: "W/m²"
    scan_interval: 3   

   
  - platform: command_line
    name: uvindex
    # uv is position 43 in the reference table, i.e. cut field 44 (field 45 is ET)
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 44
    unit_of_measurement: "uv"
    scan_interval: 3  

  - platform: command_line
    name: tempmax
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 27
    unit_of_measurement: "°C"
    scan_interval: 3  

  - platform: command_line
    name: tempmin
    command: curl http://192.168.1.39/realtime.txt |  cut  -d\  -f 29
    unit_of_measurement: "°C"
    scan_interval: 3  
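
One thing that is easy to trip over here: the reference table counts positions from 0, while cut -f counts fields from 1, so the cut field is always Pos + 1. A minimal sketch, using the sample line from this post and a handful of positions from the table (the dictionary names are just for illustration):

```python
# Sample line copied from the realtime.txt shown above.
LINE = ("17/10/22 09:45:21 16.6 66 10.2 12.8 11.9 77 0.0 0.0 1011.9 ENE 2 km/h "
        "C hPa mm 42.4 0.4 4.3 428.2 0.0 20.4 54 16.6 4.0 17.4 09:25 12.4 06:17 "
        "22.4 09:14 31.4 09:14 1012.6 08:39 1009.6 00:02 4.5.1 0 27.7 16.1 18.0 "
        "6 0.0 672 96 0.0 0 1 0 E 808 m 14.4 3.6 675.0 0")

# Zero-based positions taken from the Cumulus reference table.
POSITIONS = {"temp": 2, "hum": 3, "press": 10, "wdir": 11, "uv": 43, "solar": 45}

fields = LINE.split()

# Value of each field in the sample line.
values = {name: fields[pos] for name, pos in POSITIONS.items()}

# Matching 1-based field number for "cut -f".
cut_fields = {name: pos + 1 for name, pos in POSITIONS.items()}

for name in POSITIONS:
    print(f"{name:6} cut -f {cut_fields[name]:<3} -> {values[name]}")
```

According to the table, uv sits at position 43, which corresponds to cut -f 44.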

result:

greetings to all


Hi,
I have created a scraper to retrieve currency values. But somehow the history isn't presented as values in a curve, but as a line. Also, it is flat at 10 in the curve. I tried setting the unit to € or Kr, but the problem remains.
Do I need to use some kind of value template?

code
- unique_id: Euro_buy
  name: Euro-buy
  select: "#pair_61 > div.contentBox > div.innerContainerWrap.first > div.pid-61-bid.innerContainer"

Data

History

I’m trying to get my solar panel information from a username/password protected website (https://aeg.invertercontrol.com/). Here is a snippet of the page with the username and password boxes:

login page

In my config, I have enabled the following options:

multiscrape:
  - resource: 'https://aeg.invertercontrol.com/dashboard'
    scan_interval: 30
    log_response: true
    form_submit:
      submit_once: True
      resource: 'https://aeg.invertercontrol.com/'
      select: ".login-page"
      input:
        email: [email protected]
        password: '*****'

I have enabled the log_response option, and I see that the login page is scraped rather than the dashboard page that I actually need, so the login is unsuccessful.

Does anyone have any idea which CSS selector I should put here, and whether it’s even possible to log in via a form submit here?

Thanks.

You need to provide a select for both resource and input. You can select the correct selector by right clicking on the HTML element where you have it open in the browser’s dev tools.

Well, the selectors are #email and #password, but I have no idea where to place these selects, as the documentation is not clear on this.

Hi, I am trying to configure the username and password for this site.

How do I find the select portion?
Username should be: username-4, correct?
Password should be: user_password-4, correct?

<div class="um-form">

		<form method="post" action="" autocomplete="off">

			<div class="um-row _um_row_1 " style="margin: 0 0 30px 0;"><div class="um-col-1"><div id="um_field_4_username" class="um-field um-field-text  um-field-username um-field-text um-field-type_text" data-key="username"><div class="um-field-label"><label for="username-4">Benutzername oder E-Mail</label><div class="um-clear"></div></div><div class="um-field-area"><input autocomplete="off" class="um-form-field valid " type="text" name="username-4" id="username-4" value="" placeholder="" data-validate="unique_username_or_email" data-key="username">

						</div></div><div id="um_field_4_user_password" class="um-field um-field-password  um-field-user_password um-field-password um-field-type_password" data-key="user_password"><div class="um-field-label"><label for="user_password-4">Passwort</label><div class="um-clear"></div></div><div class="um-field-area"><input class="um-form-field valid " type="password" name="user_password-4" id="user_password-4" value="" placeholder="" data-validate="" data-key="user_password">

						</div></div></div></div>		<input type="hidden" name="form_id" id="form_id_4" value="4">
	
	<p class="um_request_name">
		<label for="um_request_4">Only fill in if you are not human</label>
		<input type="hidden" name="um_request" id="um_request_4" class="input" value="" size="25" autocomplete="off">
	</p>

	<input type="hidden" name="redirect_to" id="redirect_to" value="/meine-daten/#dlaStart"><input type="hidden" id="_wpnonce" name="_wpnonce" value="20e42a2f26"><input type="hidden" name="_wp_http_referer" value="/login/">
<div class="g-recaptcha" id="um-4" data-mode="login"></div>


	<div consent-by="services" consent-id="45" class="um-col-alt" consent-transaction-complete="1">

		

			<div class="um-field um-field-c">
				<div class="um-field-area">
					<label class="um-field-checkbox">
						<input type="checkbox" name="rememberme" value="1">
						<span class="um-field-checkbox-state"><i class="um-icon-android-checkbox-outline-blank"></i></span>
						<span class="um-field-checkbox-option"> Angemeldet bleiben</span>
					</label>
				</div>
			</div>

						<div class="um-clear"></div>
		
			<div class="um-left um-half">
				<input type="submit" value="Anmelden" class="um-button" id="um-submit-btn">
			</div>
			<div class="um-right um-half">
				<a href="https://muc.lebensmittelretter.org/register/" class="um-button um-alt">
					Registrieren				</a>
			</div>

		
		<div class="um-clear"></div>

	</div>

	
	<div consent-by="services" consent-id="45" class="um-col-alt-b" consent-transaction-complete="1">
		<a href="https://muc.lebensmittelretter.org/password-reset/" class="um-link-alt">
			Passwort vergessen?		</a>
	</div>

	
		</form>

	</div>

thanks in advance!

How do you use select_list in value_template?
I have multiple values to use in the logic for the value, but I could not find out how this can be done.

Literally 2 posts up:

You can select the correct selector by right clicking on the HTML element where you have it open in the browser’s dev tools.

thanks for pointing that out. When I select the username field it highlights this section of the html:

<input autocomplete="off" class="um-form-field valid " type="text" name="username-4" id="username-4" value="" placeholder="" data-validate="unique_username_or_email" data-key="username">

Which part of it is the right one to use as the selector?
And how do I proceed with the password field, which highlights this:

<input class="um-form-field valid " type="password" name="user_password-4" id="user_password-4" value="" placeholder="" data-validate="" data-key="user_password">

Do I need two select lines, one before username and one before password?

Those are the HTML elements.

  1. Open the developer tools in your browser.
  2. Find the HTML elements.
  3. Right click, and then something like: Copy > Copy Selector.
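
If it helps to see which fields the form would actually submit, here is a stdlib sketch that lists the name attribute of every input in a trimmed copy of the form posted above. My understanding is that the keys under input: in form_submit are matched against these field names, but that is an assumption worth checking against the multiscrape documentation. The class name InputNames is made up for this example.

```python
from html.parser import HTMLParser


class InputNames(HTMLParser):
    """Collect (name, type) for every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if "name" in a:
                self.inputs.append((a["name"], a.get("type", "text")))


# Trimmed copy of the login form posted above.
FORM = """
<form method="post" action="">
  <input type="text" name="username-4" id="username-4" value="">
  <input type="password" name="user_password-4" id="user_password-4" value="">
  <input type="hidden" name="form_id" id="form_id_4" value="4">
  <input type="hidden" id="_wpnonce" name="_wpnonce" value="20e42a2f26">
  <input type="submit" value="Anmelden" class="um-button" id="um-submit-btn">
</form>
"""

parser = InputNames()
parser.feed(FORM)
for name, kind in parser.inputs:
    print(f"{kind:8} {name}")
```

Note that the password field is named user_password-4 (with an underscore), and that the submit button has no name, so it is skipped.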

Thanks! I do not have a log entry about the form not being found, so I guess this works? But searching the website I found a different approach, without the login, which I prefer.
For that I need to scrape the id of the first article in the left-area (new articles with new ids will appear on top, and that is the new information I am looking for):

<div id="et-main-area">
	
<div id="main-content">
	<div class="container">
		<div id="content-area" class="clearfix">
			<div id="left-area">
		
					<article id="post-4334" class="et_pb_post post-4334 lmv_event type-lmv_event status-publish hentry">

				
															<h2 class="entry-title"><a href="https://muc.lebensmittelretter.org/lmv_event/lmv_event_4334/">lmv_event_4334</a></h2>
					
					<p class="post-meta"> von <span class="author vcard">Christina S.</span> | <span class="published">Nov 2, 2022</span></p><p>Gemischte Kisten von Fiona</p>
				
					</article>
			
					<article id="post-4332" class="et_pb_post post-4332 lmv_event type-lmv_event status-publish hentry">

				
															<h2 class="entry-title"><a href="https://muc.lebensmittelretter.org/lmv_event/lmv_event_4332/">lmv_event_4332</a></h2>
					
					<p class="post-meta"> von <span class="author vcard">Irene U.</span> | <span class="published">Nov 2, 2022</span></p><p>Gemüse und Obst</p>
<blockquote><p>Bitte seid pünktlich</p></blockquote>
<p>&nbsp;</p>
<p>&nbsp;</p>
				
					</article>

could this be done?
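
As a rough check that the id is reachable at all, here is a stdlib sketch that picks the id of the first article inside #left-area from a trimmed copy of the HTML above. In selector terms this would be something like #left-area > article, reading the id attribute instead of the text; whether multiscrape can return an attribute that way is an assumption to verify in its documentation. The class name FirstArticleId is made up for this example.

```python
from html.parser import HTMLParser


class FirstArticleId(HTMLParser):
    """Remember the id of the first <article> seen after div#left-area opens."""

    def __init__(self):
        super().__init__()
        self.in_left_area = False
        self.article_id = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "div" and a.get("id") == "left-area":
            self.in_left_area = True
        elif tag == "article" and self.in_left_area and self.article_id is None:
            self.article_id = a.get("id")


# Trimmed copy of the page structure posted above.
PAGE = """
<div id="content-area" class="clearfix">
  <div id="left-area">
    <article id="post-4334" class="et_pb_post post-4334"></article>
    <article id="post-4332" class="et_pb_post post-4332"></article>
  </div>
</div>
"""

finder = FirstArticleId()
finder.feed(PAGE)
print(finder.article_id)  # post-4334
```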

Hi,
Thanks for the code! I’m getting the information in the sensor, but I can’t manage to add it as Grid consumption in the Energy section of Home Assistant. The sensor is not displayed. Have you tried to add it there?

Never tried. Adding stuff to the energy monitoring is not the easiest thing, if it is not supported out of the box.
