Scraping a dynamic URL

I would like to scrape a URL that changes daily, because the URL contains a ‘date’ argument. Is there a way to set up a scrape sensor in which the date is a variable in the URL that HA fills in automatically based on the current date?

For example:

http://www.thisismyurl/index.html?date=20221109
Which works out to logic like:
http://www.thisismyurl/index.html?date=$date
$date = sensor.date

If somebody knows a good guide I would be very happy as well :).

Are you scraping or using a rest resource? Rest sensors use a templatable resource.
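For reference, a minimal sketch of what a REST sensor with a templated resource could look like. It uses the placeholder URL from the question; the sensor name and the `kwh` JSON field are hypothetical, and the sketch assumes the endpoint returns JSON rather than an HTML page.

```yaml
sensor:
  - platform: rest
    name: Clear Sky Forecast            # hypothetical name
    # The template is rendered on every update, so the date follows the current day
    resource_template: "http://www.thisismyurl/index.html?date={{ now().strftime('%Y%m%d') }}"
    # Hypothetical field; only applies if the endpoint returns JSON
    value_template: "{{ value_json.kwh }}"
    scan_interval: 3600
```

Here `now().strftime('%Y%m%d')` produces the same `20221109`-style value as the example URL, without needing a separate date sensor.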

Scraping :)

Scrape resources are not templatable. You will need a command line sensor. What is the output of the URL?
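A rough sketch of the command line approach, in the `platform: command_line` style; newer Home Assistant releases configure this under a top-level `command_line:` key instead. The URL is the placeholder from the question, and the `grep` pattern is purely hypothetical, since it depends on what the page actually returns. A sensor state is limited to 255 characters, so the command has to reduce the page to a single value.

```yaml
sensor:
  - platform: command_line
    name: Clear Sky Forecast            # hypothetical name
    # The command is a template, so the date is filled in on every run.
    # The grep pattern is a placeholder; adapt it to the real page content.
    command: >-
      curl -s "http://www.thisismyurl/index.html?date={{ now().strftime('%Y%m%d') }}"
      | grep -oE '[0-9.]+ kWh' | head -n 1
    scan_interval: 3600
```

This only helps when the value is present in the raw response; `curl` does not execute any JavaScript on the page.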

I just found this integration (Multiscrape); that might let me template my URL… but I am still looking into it.

What is the output of the URL → a webpage, from which I want to get the forecasted kWh for my solar arrays under a clear sky (so different from the existing forecast integrations, in which the weather plays a role).

Good find, it allows resource_template. From their docs:

resource_template: A template that will output an url after being rendered. Only required when resource is not provided. (required: True, type: template)

Templating the URL seems to work :D. However, can’t get the scraping part working yet. Sensor reports ‘unknown’

Can’t possibly help without seeing the config and the logging.

Will post some info today :). Was a little tired yesterday evening

It is possibly an issue for the custom integration you are using. But I am happy to try out the integration and your settings to see if it works for me. And y’know, thanks, because many people ignore posting properly and refuse to edit their posts. Good on you.

Seems like the CSS selector tag isn’t found. But that is what I get when I inspect the page using Chrome.

Log:

Logger: custom_components.multiscrape.sensor
Source: custom_components/multiscrape/sensor.py:163
Integration: Multiscrape scraping component (documentation, issues)
First occurred: 01:01:13 (401 occurrences)
Last logged: 07:41:13

Scraper_noname_0 # SolarEdge Clear Sky Predictions # Unable to scrape data: Could not find a tag for given selector Consider using debug logging and log_response for further investigation.

YAML:

sensor:
  # Provides sensor.date in YYYY-MM-DD form
  - platform: time_date
    display_options:
      - 'date'

template:
  - sensor:
      # Reformats sensor.date as YYYYMMDD for use in the URL
      - name: "Date Pvcalc"
        state: "{{ as_timestamp(states('sensor.date')) | timestamp_custom('%Y%m%d') }}"

multiscrape:
  - resource_template: "https://www.thomasberger.be/pv/pvcalc/index.html?date={{ states('sensor.date_pvcalc') }}&long=6.333&lat=53.068&az=185&roof=38&peakw=3465&temp_coeff=-0.35&az2=0&roof2=0&peakw2=0&temp_coeff2=0"
    scan_interval: 60
    sensor:
      - unique_id: solaredge_clearsky_predictions
        name: SolarEdge Clear Sky Predictions
        select: "#highcharts-0 > svg > text.highcharts-title > tspan:nth-child(3)"
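As a side note, the two helper sensors are probably not needed just to build the date. A sketch of the same entry with the date rendered inline, assuming the integration renders `resource_template` with the usual Home Assistant template functions such as `now()`:

```yaml
multiscrape:
  # now().strftime('%Y%m%d') yields the date as YYYYMMDD, e.g. 20221109
  - resource_template: "https://www.thomasberger.be/pv/pvcalc/index.html?date={{ now().strftime('%Y%m%d') }}&long=6.333&lat=53.068&az=185&roof=38&peakw=3465&temp_coeff=-0.35&az2=0&roof2=0&peakw2=0&temp_coeff2=0"
    scan_interval: 60
    sensor:
      - unique_id: solaredge_clearsky_predictions
        name: SolarEdge Clear Sky Predictions
        select: "#highcharts-0 > svg > text.highcharts-title > tspan:nth-child(3)"
```

This changes only how the URL is built; it does not affect whether the selector is found.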

Mm, the data I am trying to scrape seems to be injected by JavaScript on this page. That might be the issue, and I am not sure whether that is solvable…
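One way to confirm this would be to turn on the debug logging and `log_response` option that the error message mentions, then check whether the saved response contains the `highcharts-title` element at all. Chrome's inspector shows the DOM after scripts have run, while a scraper only sees the raw HTML. A sketch, with the remaining query parameters left out for brevity:

```yaml
logger:
  default: warning
  logs:
    # Logger name taken from the log entry above
    custom_components.multiscrape: debug

multiscrape:
  - resource_template: "https://www.thomasberger.be/pv/pvcalc/index.html?date={{ states('sensor.date_pvcalc') }}"  # other parameters omitted here
    log_response: true   # writes the fetched response to files; see the integration docs for the location
    scan_interval: 60
    sensor:
      - unique_id: solaredge_clearsky_predictions
        name: SolarEdge Clear Sky Predictions
        select: "#highcharts-0 > svg > text.highcharts-title > tspan:nth-child(3)"
```

If the element is missing from the logged response, no CSS selector will find it, and the value has to come from whatever data source the page's script reads.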

Surely this sort of data must be available in a proper api somewhere?

Unfortunately it is not; most APIs I’ve found include weather… which I do not want. I am looking for predicted clear-sky solar yield.

This? Forecast.Solar - Home Assistant

Exactly; weather included. No clear-sky option

I see. Sorry I haven’t made a study of this. Why not ask thomasberger.be where they get their data?

That’s exactly what I did yesterday evening; I am trying to convince him to recode it so that HASS can use it as an integration :)

He is making a JSON :D

Is there an easy way to trigger an update, not on a fixed scan_interval but on something such as another state changing?
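One possible approach, sketched here under assumptions: set a very long `scan_interval` on the sensor and let an automation request the refresh with the `homeassistant.update_entity` service. The entity ID below is a guess derived from the sensor name, and whether the custom integration honours `update_entity` is not confirmed; its documentation may describe a dedicated trigger service instead.

```yaml
automation:
  - alias: "Refresh clear-sky prediction when the date changes"
    trigger:
      # Fires whenever sensor.date changes state, i.e. once a day
      - platform: state
        entity_id: sensor.date
    action:
      - service: homeassistant.update_entity
        target:
          # Assumed entity ID; check the real one under Developer Tools > States
          entity_id: sensor.solaredge_clear_sky_predictions
```

Any other state trigger can be swapped in for `sensor.date`; the action stays the same.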