Scrape Integration - Surf Report

Need help scraping a specific line of text from a website. The end goal is a scrape sensor that picks up this text so I can use it as a surf report (ocean conditions).

What would be the right “select” for this info?

URL: Salt Creek Surf Report & 15-day Forecast | South Orange County Surf Conditions

I can’t wait for some help! Thanks!

This might be a good start:

#RenderBodyContainer > div.row:nth-child(1) > div > div > div:nth-child(3) > div.panel > div.panel-body > span

Thanks for taking the time to help!

Buuuuuut… That didn’t work… haha

Here is my configuration:

# Surf Report

sensor:
  - platform: scrape
    resource: https://deepswell.com/surf-report/US/South-Orange-County/Salt-Creek/1026
    name: Surf Data 2
    select: "#RenderBodyContainer > div.row:nth-child(1) > div > div > div:nth-child(3) > div.panel > div.panel-body > span"

And I’m getting: “Entity not available: sensor.surf_data_2”

Any other suggestions?

Are there any error messages in your log?

No, I don’t see any.

If you are trying to configure an entity, but it isn’t added to the system, there is usually an error message in the logs.
Did you try restarting Home Assistant?
It could quite possibly be that my selector is incorrect. I only tested it in Firefox’s developer tools, I did not add a scrape sensor myself. However, if that’s the problem, there should definitely be an error message.
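
If nothing relevant shows up in the log, raising the log level for the scrape platform may surface more detail. This is only a sketch: it assumes the standard `logger` integration and that the platform logs under `homeassistant.components.scrape`, which is not confirmed anywhere in this thread.

```yaml
# Assumption: the scrape platform logs under homeassistant.components.scrape.
logger:
  default: warning
  logs:
    homeassistant.components.scrape: debug
```

Restart Home Assistant after changing this, then check the log again.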

Ah! Dummy me… I found the error message… Does this mean that the output is too long for HA and the scrape sensor?

Logger: homeassistant.components.sensor
Source: core.py:1047
Integration: Sensor (documentation, issues)
First occurred: 1:22:59 PM (2 occurrences)
Last logged: 1:22:59 PM

  • Error adding entities for domain sensor with platform scrape
  • Error while setting up scrape platform for sensor

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 382, in async_add_entities
    await asyncio.gather(*tasks)
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 614, in _async_add_entity
    await entity.add_to_platform_finish()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 799, in add_to_platform_finish
    self.async_write_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 532, in async_write_ha_state
    self._async_write_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 679, in _async_write_ha_state
    self.hass.states.async_set(
  File "/usr/src/homeassistant/homeassistant/core.py", line 1361, in async_set
    state = State(
  File "/usr/src/homeassistant/homeassistant/core.py", line 1047, in __init__
    raise InvalidStateError(
homeassistant.exceptions.InvalidStateError: Invalid state encountered for entity ID: sensor.surfdata. State max length is 255 characters.

Yes, that’s the problem.
States in Home Assistant cannot be longer than 255 characters. Attributes do not have this limitation, but I don’t see an easy way to provide the scrape sensor’s value as an attribute. Maybe the service you are scraping has an API? It is much easier to do this using a RESTful sensor.

Yeah… lack of an API is what started me down this path! Haha… Maybe I’ll just try to scrape a different part of the page with less text and the info I want.

How did you pull that selector? Clearly my HTML/CSS skills are pretty sad! haha

I really appreciate your help though! And if I get it figured out I’ll let you know.

When the text string is short enough this works great! Thanks for the help @ondras12345 I owe you one!

Try adding this to your sensor, it will trim off anything longer than 254 characters.

value_template: '{{ value[:254] }}'
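
For context, here is roughly how that line would sit in the sensor posted earlier in this thread. It is a sketch that reuses the original poster's resource and selector unchanged; only the `value_template` line is new.

```yaml
sensor:
  - platform: scrape
    resource: https://deepswell.com/surf-report/US/South-Orange-County/Salt-Creek/1026
    name: Surf Data 2
    select: "#RenderBodyContainer > div.row:nth-child(1) > div > div > div:nth-child(3) > div.panel > div.panel-body > span"
    # Keep only the first 254 characters so the state stays under
    # Home Assistant's 255-character limit.
    value_template: "{{ value[:254] }}"
```

The text is simply cut off at that point, so a long report will end mid-sentence.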

How is your surf reporting going?

I’m looking around to see what is available and what has been done…

If you want to use mine you can take it from my personal trainer.

Ahhh nice.

Surfforecast.com was one of the sources I had shortlisted since it looked pretty free and scrapeable (in the absence of free APIs).

I’m also familiar with multiscrape.

So looks like you’ve done the hard work.

I’ll put this on my to-do list. Thanks for bringing it to my attention.

Hello
Thank you for the interesting setup. Unfortunately, when I set up a surf forecast, the sensor shows as unavailable. Is this related to the scraping, or should I look for some other issue?

Keep in mind that not all websites allow you to scrape their info.

I always start by trying to get something simple, to verify whether I can get the data or have to switch to another source.

We have no idea what your problem is if you don’t tell us what site you’re looking at, what data you’re trying to get or how you set your sensor up.

I basically took everything from here: https://aguacatec.es/tarjeta-de-forecast-deportivo/

- name: Spot Los Locos
  resource: https://www.surf-forecast.com/breaks/Los-Locos/forecasts/latest
  scan_interval: 3600
  sensor:
    - unique_id: spot_loslocos
      name: Spot
      select: "#contdiv > section > div > div.break-header__content > h2 > b"
      attributes:
        - name: Hour 1
          select: "#forecast-table > div > table > tbody > tr.forecast-table__row.forecast-table-time > td:nth-child(2) > span:nth-child(1)"
        - name: Meridian 1
          select: "#forecast-table > div > table > tbody > tr.forecast-table__row.forecast-table-time > td:nth-child(2) > span:nth-child(2)"
        - name: Rating 1
          select: "#forecast-table > div > table > tbody > tr.forecast-table__row.forecast-table-rating > td:nth-child(2) > div > div"
        - name: Wave Height 1
          select: "#forecast-table > div > table > tbody > tr:nth-child(5) > td:nth-child(2) > div > svg > text"
        - name: Wave Period 1
          select: "#forecast-table > div > table > tbody > tr:nth-child(6) > td:nth-child(2) > strong"
        - name: Hour 2
          select: "#forecast-table > div > table > tbody > tr.forecast-table__row.forecast-table-time > td:nth-child(3) > span:nth-child(1)"
        - name: Meridian 2
          select: "#forecast-table > div > table > tbody > tr.forecast-table__row.forecast-table-time > td:nth-child(3) > span:nth-child(2)"
        - name: Rating 2
          select: "#forecast-table > div > table > tbody > tr.forecast-table__row.forecast-table-rating > td:nth-child(3) > div > div"
        - name: Wave Height 2
          select: "#forecast-table > div > table > tbody > tr:nth-child(5) > td:nth-child(3) > div > svg > text"
        - name: Wave Period 2
          select: "#forecast-table > div > table > tbody > tr:nth-child(6) > td:nth-child(3) > strong"
        - name: Hour 3
          select: "#forecast-table > div > table > tbody > tr.forecast-table__row.forecast-table-time > td:nth-child(4) > span:nth-child(1)"
        - name: Meridian 3
          select: "#forecast-table > div > table > tbody > tr.forecast-table__row.forecast-table-time > td:nth-child(4) > span:nth-child(2)"
        - name: Rating 3
          select: "#forecast-table > div > table > tbody > tr.forecast-table__row.forecast-table-rating > td:nth-child(4) > div > div"
        - name: Wave Height 3
          select: "#forecast-table > div > table > tbody > tr:nth-child(5) > td:nth-child(4) > div > svg > text"
        - name: Wave Period 3
          select: "#forecast-table > div > table > tbody > tr:nth-child(6) > td:nth-child(4) > strong"

The rest of the code continues with more attributes in the same way.
So the website that I am scraping is the Los Locos 48 hour detailed Surf Forecast, as provided in the example from the training assistant.

I did it exactly as provided in this guide: https://aguacatec.es/tarjeta-de-forecast-deportivo/
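
Following the earlier advice to start with something simple, one way to narrow this down might be to cut the config to a single selector and see whether the sensor gets a value at all. This is a sketch only: it assumes the config belongs under a top-level `multiscrape:` key (the multiscrape custom integration mentioned earlier in the thread) and reuses the resource and selector from the post above unchanged.

```yaml
# Assumption: this list lives under the multiscrape custom integration's key.
multiscrape:
  - name: Spot Los Locos
    resource: https://www.surf-forecast.com/breaks/Los-Locos/forecasts/latest
    scan_interval: 3600
    sensor:
      # Only the main state, no attributes, to test whether the page
      # can be scraped at all.
      - unique_id: spot_loslocos
        name: Spot
        select: "#contdiv > section > div > div.break-header__content > h2 > b"
```

If this reduced sensor is also unavailable, the problem is more likely the site or the request than one of the attribute selectors. If it works, the attributes can be added back a few at a time.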