HA multiscrape - no result getting value

Hello all

I’m trying to get the hourly energy prices from this site: https://nrgi.dk/privat/stroem/bliv-klogere-paa-stroem/foelg-timeprisen-paa-stroem/

I am new to scraping, but I believe my problem is that the actual values are loaded via APIs, JavaScript, or a combination of the two. I can scrape all sorts of static text, just not the variable parts of the page.

So, what am I after ?

Basically, I would like to be able to read out the values marked with red squares below.

However, I am only getting the static text. Here is my log from trying with the red box at the bottom left:

2022-10-29 19:53:13 ERROR (MainThread) [custom_components.multiscrape.sensor] strom_nrgi # strom_nrgi # Unable to scrape data: Could not find a tag for given selector

This is the code:

multiscrape:
  - name: strom_nrgi
    resource: https://nrgi.dk/privat/stroem/bliv-klogere-paa-stroem/foelg-timeprisen-paa-stroem/
    scan_interval: 3600
    sensor:
      - unique_id: strom_nrgi
        name: strom_nrgi
        select: "#main > div > section:nth-child(2) > div.hourly-rates.standard-content.theme-default.p-bottom > div.hourly-rate-cards > div > div.hourly-rate-card-item.is-cheapest.is-active"
        unit_of_measurement: DKK

If I go a couple of divs up, I can get the static text:

Code:

multiscrape:
  - name: strom_nrgi
    resource: https://nrgi.dk/privat/stroem/bliv-klogere-paa-stroem/foelg-timeprisen-paa-stroem/
    scan_interval: 3600
    sensor:
      - unique_id: strom_nrgi
        name: strom_nrgi
        select: "#main > div > section:nth-child(2) > div.hourly-rates.standard-content.theme-default.p-bottom > div.hourly-rate-cards"
        unit_of_measurement: DKK

My log shows static text, but no values:

Billigste time

-

-

Dyreste time

-

-

Gennemsnitspris

-

-

Is there any way to scrape these values? By the way, the problem seems to apply to all “variable” text on the pages I have tried.

Thanks for any help !

Michael

I took a look at the east section (DK2), and it is just the hourly prices from Nordpool with around 14 øre added to the price.
I am missing a few decimals on the webpage to calculate the precise value added to the Nordpool prices, but an estimate would be 11.5 øre excl. VAT, which with the 25% Danish VAT comes to 11.5 × 1.25 = 14.375 øre incl. VAT.

There is a Nordpool integration available, so I guess that is the better choice.
If you combine it with the eloverblik integration, then you can get transport expenses extracted too and that makes it possible to calculate the exact total price.
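
As a rough illustration of the estimate above, the markup could be added to a Nordpool price in a template sensor. This is only a sketch: the entity name sensor.nordpool_dk2 and the sensor name are hypothetical, and it assumes the Nordpool sensor reports DKK/kWh including VAT, so the estimated 14.375 øre becomes 0.14375 DKK.

```yaml
template:
  - sensor:
      - name: "nrgi_estimat_dk2"   # hypothetical name
        unit_of_measurement: "DKK/kWh"
        # Only report a value while the (hypothetical) Nordpool sensor is numeric
        availability: "{{ states('sensor.nordpool_dk2') | is_number }}"
        # Estimated markup from this post: 11.5 øre * 1.25 = 14.375 øre = 0.14375 DKK
        state: "{{ (states('sensor.nordpool_dk2') | float(0) + 0.14375) | round(4) }}"
```

Since the markup is itself an estimate, the result should be treated as approximate.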

You can get the data in JSON directly with

https://nrgi.dk/api/common/pricehistory?region=DK1&date=2022-10-30

Hello WallyR

Thanks for the tip. I successfully set up the Eloverblik integration and am now looking into the Nordpool one.

/Michael

Hello Koying

Thank you for this advice - I didn’t know that.

My initial question is still open, however, for cases where such an API is not accessible for a page…

Besides that, my limited experience with JSON is pulling some data from SONOFF devices, where I ask for the exact variable cmnd.

Are you suggesting that I query the URL you provided (which in fact gives me all the data I need) and do some templating to get the values I need into sensors?

/Michael

Hello again

I tried with this code:

  - platform: rest
    name: strom_nrgi_json_api_doegn
    resource: https://nrgi.dk/api/common/pricehistory?region=DK1&date=2022-10-30
    value_template: '1'
    json_attributes:
    - localTime
    - priceInclVat
    - isHighestPrice
    - isLowestPrice

If I remove the value_template: '1' line, I get an error because the state string is longer than 255 characters. If I leave it in, my sensor value is just “1” and no attributes are visible.

/Michael

Try this.
I have not tested it yet, so there might be some small errors. I am especially a bit uncertain about the date in the URL.
I have set the update interval to 900 seconds (15 min), since the data only really updates once a day after 15:00.

  - platform: rest
    name: nrgi_idag
    # resource_template (not resource) is needed for the Jinja date to be rendered
    resource_template: "https://nrgi.dk/api/common/pricehistory?region=DK1&date={{ now().strftime('%Y-%m-%d') }}"
    method: GET
    scan_interval: 900
    value_template: 'OK'
    json_attributes:
      - localTime
      - priceInclVat
      - isHighestPrice
      - isLowestPrice

Hello WallyR

Thanks so much for your help. Unfortunately, your suggested code just returns an “OK”.

If I query from a browser, I get the following result, hence my suggested attribute names. I don’t know if I got them wrong.

/Michael

I have to admit I am not good at doing this in HA, because I normally use Node-RED for this stuff.
But try this then.

  - platform: rest
    name: nrgi_idag
    # resource_template (not resource) is needed for the Jinja date to be rendered
    resource_template: "https://nrgi.dk/api/common/pricehistory?region=DK1&date={{ now().strftime('%Y-%m-%d') }}"
    method: GET
    scan_interval: 900
    value_template: "OK"
    json_attributes:
      - prices
      - region
      - date
      - currentPrice
      - averagePrice
      - lowestPrice
      - highestPrice

SPOT-ON WallyR !

That did it.

Now I just need to figure out how to split the attributes up per time slot so I can calculate with the hourly values, but I guess the fast and easy solution is to just create 24 hourly template sensors.
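
A minimal sketch of one such hourly template sensor follows. It assumes the REST sensor ends up as sensor.nrgi_idag, that each entry in its prices attribute has a priceInclVat field, and that the list is ordered by hour starting at 00:00. None of this has been checked against the API, and the sensor name is made up for illustration.

```yaml
template:
  - sensor:
      - name: "nrgi_pris_time_00"   # hypothetical name
        state: >-
          {# Fall back to an empty list until the attribute has been fetched #}
          {% set prices = state_attr('sensor.nrgi_idag', 'prices') or [] %}
          {# Index 0 is assumed to be the first hour of the day; use 1..23 for the others #}
          {{ prices[0].priceInclVat if prices | count > 0 else 'unknown' }}
```

The other hours would be copies of this sensor with a different index.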

Thanks a lot WallyR

/Michael

{{ state_attr('sensor.nrgi', 'prices')[0]['value'] }}

Hello WallyR

I couldn’t make your code work, but after a lot of tries, this very similar syntax worked :)

value_template:  "{{ state_attr('sensor.strom_nrgi_json_api_doegn', 'prices')[0].priceInclVat }}"
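
Building on the same attribute, the cheapest and most expensive hourly price could probably be derived without 24 separate sensors. This is a sketch only: the two sensor names are hypothetical, and it assumes every entry in prices carries a numeric priceInclVat.

```yaml
template:
  - sensor:
      - name: "strom_nrgi_billigste_timepris"   # hypothetical name
        state: >-
          {% set prices = state_attr('sensor.strom_nrgi_json_api_doegn', 'prices') or [] %}
          {# Collect all hourly prices and take the smallest one #}
          {{ prices | map(attribute='priceInclVat') | list | min if prices | count > 0 else 'unknown' }}
      - name: "strom_nrgi_dyreste_timepris"   # hypothetical name
        state: >-
          {% set prices = state_attr('sensor.strom_nrgi_json_api_doegn', 'prices') or [] %}
          {# Same list, largest value #}
          {{ prices | map(attribute='priceInclVat') | list | max if prices | count > 0 else 'unknown' }}
```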

So it was just the square brackets and apostrophes that confused me when I tried yesterday.

Thanks again WallyR

/Michael

Good that it worked in the end.
Happy fiddling with the rest. :)

I’m attempting to use multiscrape to get values from a couple of websites, but I’m struggling to get them working. I’ll post my issues as two separate posts. First, I’m trying to get my current electricity rates from my EMC. There will be multiple rates to support a tiered structure, but I’ll start by showing just the first value that I’m attempting to scrape. According to the log files, I’m getting Server Error ‘503 Service Temporarily Unavailable’.

Here’s a pic of the elements from chrome inspector:

Here’s my multiscrape settings in my config file. Let me know if anyone sees any obvious error…thanks.

  - resource: "https://www.jacksonemc.com/my-cooperative/rates/residential-rates/residential-service"
    scan_interval: 86400
    sensor:
      - unique_id: first_650kwh_winter
        name: First 650 kWh Winter
        select: "#content > div > div.row > div.col-xl-8.col-lg-8.col-md-12.col-sm-12 > div > div > table > tbody > tr:nth-child(3) > td:nth-child(2)"
        value_template: '{{ value.split("¢")[0] }}'
        unit_of_measurement: "¢ per kWh"
        icon: mdi:transmission-tower

The next value I would like to scrape is the water level of a nearby lake. There is nothing in the log files regarding any type of error with this website…

Here’s a pic of the elements from chrome inspector:

Here’s my multiscrape settings in my config file. Thanks in advance for any advice…

  - resource: https://lanier.uslakes.info/
    scan_interval: 86400
    sensor:
      - unique_id: lake_lanier_level
        name: Lake Lanier Level
        select: "#main-content > table:nth-child(3) > tr:nth-child(1) > td:nth-child(1) > div:nth-child(1) > div:nth-child(2)"

A 503 means the server is having issues, so you need to wait for the site to be fixed.
And you should really make a new post instead of replying to an old one. You will miss a lot of potential helpers by posting in an old thread instead of a new one, because many only look at the list of new posts.

Thanks Wally. But the server has no issues when I use my browser; I only get that error when attempting to scrape through HA.
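
One thing that may be worth trying in this situation is sending a browser-like User-Agent header, since some servers reject requests from non-browser clients. This is an untested assumption about the cause of the 503, and the header value below is only an illustrative string.

```yaml
multiscrape:
  - resource: "https://www.jacksonemc.com/my-cooperative/rates/residential-rates/residential-service"
    scan_interval: 86400
    # Assumption: the server refuses the default client User-Agent
    headers:
      User-Agent: "Mozilla/5.0"
    sensor:
      - unique_id: first_650kwh_winter
        name: First 650 kWh Winter
        select: "#content > div > div.row > div.col-xl-8.col-lg-8.col-md-12.col-sm-12 > div > div > table > tbody > tr:nth-child(3) > td:nth-child(2)"
        value_template: '{{ value.split("¢")[0] }}'
        unit_of_measurement: "¢ per kWh"
```

If the error persists with the header set, the cause is something else.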

Agree about reviving an old thread, but I gain a lot by reading old threads, so I thought I would just keep the topic going for future readers…

You are scraping other sites with other characteristics, so it really does not make much sense to continue this thread.
People looking for help with scraping the sites you mention might simply skip this thread because it appears to be about a different site.