Scraping a dynamic URL

B1rdEater · November 9, 2022, 8:20am

I would like to scrape a URL that changes daily, because the URL contains a ‘date’ argument. Is there a way to set up a scrape sensor in which the date is a variable in the URL and is automatically filled in by HA based on the current date?

For example:

http://www.thisismyurl/index.html?date=20221109
Which would work out to logic like:
http://www.thisismyurl/index.html?date=$date
$date = sensor.date

If somebody knows a good guide I would be very happy as well :).

nickrout · November 9, 2022, 9:53am

Are you scraping or using a REST resource? REST sensors use a templatable resource.
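
For reference, a templated REST resource could look roughly like the sketch below. It assumes the endpoint returns something short such as JSON; the URL is the placeholder from the question, and the sensor name and the `forecast_kwh` key are hypothetical.

```yaml
# Sketch only: placeholder URL from the question; name and JSON key are hypothetical.
sensor:
  - platform: rest
    name: Dated resource
    # The template is rendered on every update, so the date follows the current day.
    resource_template: "http://www.thisismyurl/index.html?date={{ now().strftime('%Y%m%d') }}"
    # Assumes a JSON response with a top-level "forecast_kwh" key.
    value_template: "{{ value_json['forecast_kwh'] }}"
```

A sensor state is limited to 255 characters, so a full HTML page would not fit. That is one reason REST sensors suit JSON or short text responses rather than web pages.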

B1rdEater · November 9, 2022, 6:49pm

Scraping

nickrout · November 9, 2022, 7:06pm

Scrape resources are not templatable. You will need a command_line sensor. What is the output of the URL?
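
A command line sensor along these lines might do it, since its command accepts templates. This is only a sketch: the URL is the placeholder from the first post, the sensor name is made up, and the extraction step is left out because it depends on what the page returns. It uses the platform-style syntax in use at the time of this thread; later Home Assistant releases configure this under a top-level `command_line:` key.

```yaml
sensor:
  - platform: command_line
    name: Dated page  # hypothetical name
    # The arguments are rendered as a template before the command runs.
    # curl prints the whole page, so in practice the output has to be piped
    # through something like grep or sed to cut it down to the one value wanted.
    command: >-
      curl -s "http://www.thisismyurl/index.html?date={{ now().strftime('%Y%m%d') }}"
    scan_interval: 3600  # seconds
```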

B1rdEater · November 9, 2022, 7:51pm

I just found this integration; that might let me template my URL… but still looking into it.

What is the output of the URL → a webpage, from which I want to get the forecasted kWh for my solar arrays under a clear sky (so different from the existing forecast integrations, in which the weather plays a role).

nickrout · November 9, 2022, 9:04pm

Good find, it allows resource_template. From their docs:

resource_template: A template that will output a URL after being rendered. Only required when resource is not provided. Required: True. Type: template.

B1rdEater · November 9, 2022, 10:08pm

Templating the URL seems to work :D. However, can’t get the scraping part working yet. Sensor reports ‘unknown’

nickrout · November 9, 2022, 10:28pm

Can’t possibly help without seeing the config and the logging.

B1rdEater · November 10, 2022, 6:35am

Will post some info today :). Was a little tired yesterday evening

nickrout · November 10, 2022, 6:39am

It is possibly an issue for the custom integration you are using. But I am happy to try out the integration and your settings to see if it works for me. And y’know, thanks, because many people ignore posting properly and refuse to edit their posts. Good on you.

B1rdEater · November 10, 2022, 6:43am

Seems like the CSS selector tag isn’t found. But that is what I get when I inspect the page using Chrome.

Log:

Logger: custom_components.multiscrape.sensor
Source: custom_components/multiscrape/sensor.py:163
Integration: Multiscrape scraping component (documentation, issues)
First occurred: 01:01:13 (401 occurrences)
Last logged: 07:41:13

Scraper_noname_0 # SolarEdge Clear Sky Predictions # Unable to scrape data: Could not find a tag for given selector Consider using debug logging and log_response for further investigation.

YAML:

# Provides sensor.date, whose state is the current date as YYYY-MM-DD.
sensor:
  - platform: time_date
    display_options:
      - 'date'

# Reformats that date as YYYYMMDD for use in the URL.
template:
  - sensor:
      - name: "Date Pvcalc"
        state: "{{ as_timestamp(states('sensor.date')) | timestamp_custom('%Y%m%d') }}"

multiscrape:
  - resource_template: "https://www.thomasberger.be/pv/pvcalc/index.html?date={{ states('sensor.date_pvcalc') }}&long=6.333&lat=53.068&az=185&roof=38&peakw=3465&temp_coeff=-0.35&az2=0&roof2=0&peakw2=0&temp_coeff2=0"
    scan_interval: 60  # seconds
    sensor:
      - unique_id: solaredge_clearsky_predictions
        name: SolarEdge Clear Sky Predictions
        select: "#highcharts-0 > svg > text.highcharts-title > tspan:nth-child(3)"
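
The log message suggests two next steps: debug logging and log_response. A minimal sketch of enabling both follows. The logger name is taken from the log entry above; where multiscrape stores the saved responses is not shown in this thread, so check the integration's documentation for that.

```yaml
multiscrape:
  - resource_template: "https://www.thomasberger.be/pv/pvcalc/index.html?date={{ states('sensor.date_pvcalc') }}&long=6.333&lat=53.068&az=185&roof=38&peakw=3465&temp_coeff=-0.35&az2=0&roof2=0&peakw2=0&temp_coeff2=0"
    scan_interval: 60
    log_response: true  # keep the raw HTTP response so it can be inspected
    sensor:
      - unique_id: solaredge_clearsky_predictions
        name: SolarEdge Clear Sky Predictions
        select: "#highcharts-0 > svg > text.highcharts-title > tspan:nth-child(3)"

# Raise the log level for this integration only.
logger:
  default: warning
  logs:
    custom_components.multiscrape: debug
```

If the saved response does not contain the element the selector points at, that would suggest the value is added in the browser after the page loads, where a CSS selector applied to the raw HTML cannot reach it.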

B1rdEater · November 10, 2022, 9:10am

Hmm, the data I am trying to scrape seems to be injected by JavaScript running behind this page. That might be the issue, and I am not sure whether it is solvable…

nickrout · November 10, 2022, 8:52pm

Surely this sort of data must be available in a proper API somewhere?

B1rdEater · November 10, 2022, 9:58pm

Unfortunately it is not; most APIs I’ve found include weather… which I do not want. I am looking for predicted clear-sky solar yield.

nickrout · November 10, 2022, 10:07pm

This? Forecast.Solar - Home Assistant

B1rdEater · November 10, 2022, 10:08pm

Exactly; weather included. No clear-sky option

nickrout · November 10, 2022, 11:13pm

I see. Sorry I haven’t made a study of this. Why not ask thomasberger.be where they get their data?

B1rdEater · November 11, 2022, 9:51am

That’s exactly what I did yesterday evening: I am trying to convince him to recode it so that HASS can use it as an integration.

B1rdEater · November 13, 2022, 8:10am

He is making a JSON

justone · January 9, 2023, 7:39pm

Is there an easy way to trigger an update, not on a fixed scan_interval but when something else happens, such as another state changing?