Scrape configuration - basic YAML include setup question

Hi, All:

I’m using the simple scrape integration, documented here. What I originally thought was a scraper question is, I’m embarrassed to admit, now just a simple YAML question:

I have a simple (example) scraper configuration in my configuration.yaml file, like so:

# Example configuration.yaml entry
sensor:
  - platform: scrape
    resource: https://www.home-assistant.io
    select: ".current-version h1"
    name: "HA Version"

I want to move it (and all my future scrapers) to a separate file outside configuration.yaml, included like this:

scrape: !include scrape.yaml

Naively, I assumed if I put the same YAML from above in the scrape.yaml file, it would work. Not so.

I created scrape.yaml with the following contents:

# Example configuration.yaml entry
sensor:
  - platform: scrape
    resource: https://www.home-assistant.io
    select: ".current-version h1"
    name: "HA Version"

When my instance boots, I get an error stating:

2021-09-14 11:22:46 ERROR (MainThread) [homeassistant.setup] Setup failed for scrape: No setup or config entry setup function defined.

I’m pretty sure I’m just missing a space or colon or quote, but my background is not in YAML, so I am a rank newbie at the syntax.

Can anyone shed light on the proper configuration?

Thank you in advance.
-Jim

Scrape: is not a valid configuration key. Try sensor instead, because the scraper is a sensor platform:
In configuration.yaml:

sensor: !include sensors.yaml

In the file sensors.yaml in your config folder:
#sensor:

Note the file names carefully, and make sure "sensor" is defined only once. Note the "#" in the sensors file: the commented-out key is just a reminder of what the contents belong under (and why they are indented), but it is ignored, so sensor is not defined a second time.

Remove the sensor: line from that file, so that scrape.yaml contains only the list:

  - platform: scrape
    resource: https://www.home-assistant.io
    select: ".current-version h1"
    name: "HA Version"

EDIT: Also do what @Kdem said about changing the word scrape to sensor in configuration.yaml.
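Putting both answers together, a minimal sketch of the working two-file setup is shown here. The two files are separated by a --- line and labelled with comments only for display; in practice they are two separate files:

```yaml
# --- configuration.yaml ---
# The top-level key is sensor:, because scrape is a sensor platform.
sensor: !include scrape.yaml
---
# --- scrape.yaml ---
# No sensor: key here; the file holds only the list that sensor: points to.
- platform: scrape
  resource: https://www.home-assistant.io
  select: ".current-version h1"
  name: "HA Version"
```

Future scrapers can be appended to scrape.yaml as further list items. Because sensor should be defined only once in configuration.yaml, any other sensor platforms would have to move into the included file as well.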

Scrape: is not a valid configuration key!! Duh! Why didn’t that jump out at me? Oh, I know: I haven’t even started learning half of those keys! Thank you for pointing that out!

What about my case? I get the same warning as @jwhowa.

In my configuration file I have this:
sensor: !include_dir_merge_named sensor/

In the sensor folder I have two .yaml files:
1: sensor.yaml
2: scrape.yaml

In the scrape.yaml I have this:

- platform: scrape
  name: ROVA ophaal moment
  resource: http://afvalkalender.rova.nl/nl/8051BB/5
  select: ".firstDate"
  scan_interval: 43200
        
- platform: scrape
  name: ROVA ophaal type
  resource: http://afvalkalender.rova.nl/nl/8051DP/30
  select: ".firstWasteType"
  scan_interval: 43200

After restarting HASS, I get the following warning:

> Setup failed for scrape: No setup or config entry setup function defined.

What is going wrong?

Use !include_dir_merge_list instead.
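The sensor: key expects a list of platform entries. According to the Home Assistant docs on splitting up the configuration, !include_dir_merge_named expects each file to contain a mapping and merges those into one mapping, so files made of "- platform:" items don't fit it; !include_dir_merge_list expects each file to contain a list and joins them into one list. A minimal sketch of the list variant, reusing the first ROVA sensor from above (files separated by a --- line only for display):

```yaml
# --- configuration.yaml ---
# Loads every .yaml file in sensor/ and concatenates their lists.
sensor: !include_dir_merge_list sensor/
---
# --- sensor/scrape.yaml ---
# Top level is a list, so each sensor starts with "-".
# Every other file in sensor/ must be a list in the same way.
- platform: scrape
  name: ROVA ophaal moment
  resource: http://afvalkalender.rova.nl/nl/8051BB/5
  select: ".firstDate"
  scan_interval: 43200
```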

Not working with:

sensor: !include_dir_merge_list sensor/

If I set it as a list, do I need to edit sensor.yaml to use dashes (list items)?

# Energy calculation for the current period:
electra_per_uur:
  friendly_name: 'Electra per uur'
  unit_of_measurement: kWh
  value_template: "{{ sensor.hourly_energy_offpeak |float + sensor.hourly_energy_peak |float}}"

  electra_per_dag:
    friendly_name: 'Electra per dag'
    unit_of_measurement: kWh
    value_template: "{{ sensor.daily_energy_offpeak |float + sensor.daily_energy_peak |float }}"

  electra_per_week:
    friendly_name: 'Electra per week'
    unit_of_measurement: kWh
    value_template: "{{ states('sensor.weekly_energy_offpeak')|float}} + {{states('sensor.weekly_energy_peak')|float }}"

  electra_per_maand:
    friendly_name: 'Electra per maand'
    unit_of_measurement: kWh
    value_template: "{{ states('sensor.monthly_energy_offpeak')|float}} + {{states('sensor.monthly_energy_peak')|float }}"

That isn’t formatted properly; it seems you’re not sharing your full config. Those are template sensors using the legacy format, but they are missing the template platform, which is required.
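A sketch of how the first two of those sensors might look in the legacy format as a list-style file in the sensor/ folder (as used with !include_dir_merge_list), assuming the source sensor names from the question. Two further fixes are folded in: bare names like sensor.hourly_energy_offpeak are not defined inside templates, so states('sensor...') is used, and each sum is computed inside a single {{ }} block, because "{{ a }} + {{ b }}" renders two numbers with a literal plus sign between them:

```yaml
# sensor/sensor.yaml
# Legacy template format: one list item with platform: template,
# and the individual sensors nested under sensors:.
- platform: template
  sensors:
    electra_per_uur:
      friendly_name: 'Electra per uur'
      unit_of_measurement: kWh
      # states() returns the state as a string; float(0) converts it,
      # falling back to 0 if it can't be parsed (e.g. "unavailable").
      value_template: "{{ states('sensor.hourly_energy_offpeak')|float(0) + states('sensor.hourly_energy_peak')|float(0) }}"
    electra_per_dag:
      friendly_name: 'Electra per dag'
      unit_of_measurement: kWh
      value_template: "{{ states('sensor.daily_energy_offpeak')|float(0) + states('sensor.daily_energy_peak')|float(0) }}"
```

The weekly and monthly sensors follow the same pattern under sensors:.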

I would like to use the scrape function for a website, but I need your help because there are multiple blocks. Below is a snapshot of the HTML (www.actuelewind.nl):

<div class="ui-grid-b bar-gray">
	<div class="ui-block-a"><div class="ui-bar spotDetailBlock ui-bar-links ui-bar-top">
		<div class="spotDetailBlock">Windsnelheid <div id="spotInfoWindsnelheidMS" class="spotDetailBlockInfo">23.5</div> <span class="spotDetailFavWindEenheid">knopen</span></div>
	</div></div>
	<div class="ui-block-b"><div class="ui-bar spotDetailBlock ui-bar-midden ui-bar-top">
		<div class="spotDetailBlock">Windstoten <div id="spotInfoWindstotenMS" class="spotDetailBlockInfo">35.6</div> <span class="spotDetailFavWindEenheid">knopen</span></div>
	</div></div>
	<div class="ui-block-c"><div class="ui-bar spotDetailBlock ui-bar-rechts ui-bar-top">
		<div class="spotDetailBlock">Windrichting <div id="spotInfoWindrichting" class="spotDetailBlockInfo">ZW (227°)</div></div>
	</div></div>
	</div>

Is there a way to extract 23.5, 35.6 and 227 into separate sensors? Thanks for your support!

EDIT: The HTML you posted is not in the original response from the server. Stand by for more info…

OK, so that data is in this URL:

https://www.actuelewind.nl/getActualSpotData6.php?t=web&p=null&ss=1920&1710510951462

and you’ll need to use the RESTful integration to pull the data in.

Below is an example for IJmuiden / Wijk aan Zee (station 6225). Find your desired station ID on the home page by clicking the heading and reading it from the URL:

This goes into configuration.yaml.

  • If you already have a rest: heading, put it under that;
  • If you have a rest: !include ... line already, put it in the referenced file without the first line.

If you’ve added rest: to your config file, you’ll need a full restart to pull it in for the first time.

rest:
  - resource_template: https://www.actuelewind.nl/getActualSpotData6.php?t=web&p=null&ss=1920&{{ now()|as_timestamp|int }}
    scan_interval: 600
    sensor:
      - name: Windsnelheid
        value_template: "{{ value_json['wind']['6225']['winddata'][0]['windsnelheidMS'] }}"
        unit_of_measurement: "m/s"
        device_class: wind_speed
      - name: Windstoten
        value_template: "{{ value_json['wind']['6225']['winddata'][0]['windstotenMS'] }}"
        unit_of_measurement: "m/s"
        device_class: wind_speed
      - name: Windrichting
        value_template: "{{ value_json['wind']['6225']['winddata'][0]['windrichtingGR'] }}"
        unit_of_measurement: "°"
      - name: Stationnaam
        value_template: "{{ value_json['wind']['6225']['windspot']['stationnaam'] }}"
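For the second bullet above: if configuration.yaml already has a line such as rest: !include rest.yaml (the file name here is only an example), the included file starts directly at the list item, with no rest: line. A shortened sketch with just the first sensor (files separated by a --- line only for display):

```yaml
# --- configuration.yaml ---
rest: !include rest.yaml
---
# --- rest.yaml (example file name) ---
# No rest: key here; the file is the list that rest: points to.
- resource_template: https://www.actuelewind.nl/getActualSpotData6.php?t=web&p=null&ss=1920&{{ now()|as_timestamp|int }}
  scan_interval: 600
  sensor:
    - name: Windsnelheid
      value_template: "{{ value_json['wind']['6225']['winddata'][0]['windsnelheidMS'] }}"
      unit_of_measurement: "m/s"
      device_class: wind_speed
```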

Yeah, that works:

JSON gives the wind speeds in m/s; the website converts to knots (I assume that’s what “knopen” is, and the numbers seem right).
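If you would rather see knots like the website, one option is to convert in the template. A sketch, assuming "kn" as the unit string for knots and using 1 m/s = 3600/1852 ≈ 1.944 knots (a knot is one nautical mile, 1852 m, per hour); the sensor name is only an example:

```yaml
# Extra entry for the sensor: list of the rest: configuration above.
- name: Windsnelheid knopen
  # m/s to knots: multiply by 3600 s/h, divide by 1852 m per nautical mile.
  value_template: "{{ (value_json['wind']['6225']['winddata'][0]['windsnelheidMS'] | float(0) * 3600 / 1852) | round(1) }}"
  unit_of_measurement: "kn"
  device_class: wind_speed
```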

I’ve added Stationnaam just to help you check you have selected the right ID.


Hi Troon, wow, thanks for this amazingly detailed support. This is working perfectly and I have it fully implemented now.

Can you tell me how you found the URL needed for this? And what about the arguments used (e.g. ss=1920)? Also, should I add arguments to limit the PHP data output (it currently includes all stations and historic data)?

Big thanks!

I used F12 DevTools in the browser and just looked through the resources that the page loaded. No idea about the 1920: that’s just what the page asked for. I also don’t know if you can limit the output from the PHP script, but do have a look at what each station’s page loads if you refresh it; there might be a limited version.

I’ve seen this tool, but where do I find ‘resources’? I see Sources (and that seems to be mostly JS). Can you show me a screenshot? Or am I overlooking something? Thanks!

Use the Network tab and refresh the page. You’re usually looking for a response type of xhr (short for XMLHttpRequest, also referred to as “AJAX”) and the filename will often stand out as being “specific” as opposed to jquery responses or images.

Here, the getActualSpotData near the bottom ticks all those boxes:

and when you select it and click the Response tab in the right-hand window, you can see a line of JSON-formatted data:

You can see the ss=1920 in the URL, as well as the &1710746082403, which is a “cache-buster”: just the current time as a UNIX timestamp (in milliseconds), which I recreated (in seconds) in the resource_template of the sensor configuration.
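Any value that changes between requests defeats caching, so the seconds-based timestamp works. If you prefer to mirror the page's 13-digit millisecond value exactly, a sketch of the alternative line:

```yaml
# Drop-in replacement for the resource_template line in the rest: entry.
# as_timestamp gives float seconds; * 1000 then int gives milliseconds.
resource_template: https://www.actuelewind.nl/getActualSpotData6.php?t=web&p=null&ss=1920&{{ (now()|as_timestamp * 1000)|int }}
```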

I then copy-pasted the JSON into this tool to work out the path to the data you wanted (using the Format JSON button):
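For reference, the part of the response that those value_template paths walk through can be outlined as follows. This outline is reconstructed only from the paths in the configuration above and written as YAML for readability; the real response contains more stations and fields, and the values are placeholders, not real data:

```yaml
wind:
  "6225":                  # station ID: value_json['wind']['6225']
    windspot:
      stationnaam: "..."   # ['windspot']['stationnaam']
    winddata:              # a list, hence [0] (first entry) in the templates
      - windsnelheidMS: 0  # ['winddata'][0]['windsnelheidMS']
        windstotenMS: 0    # ['winddata'][0]['windstotenMS']
        windrichtingGR: 0  # ['winddata'][0]['windrichtingGR']
```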


Amazing, thanks for clarifying. Next time I should be able to find the source myself (and help others).
