Need help scraping data from a website

Hello,

I know there are hundreds of topics with the same title, but I can't get my scrape sensor to work.

I want to scrape the title of the currently playing song from 2 radio stations.

1.) 88.6 Hard Rock
2.) https://soundportal.at

I played around with some select options but never got the data.

My sensors look like this right now:

- platform: scrape
  resource: https://soundportal.at
  select: '.tab-content >.tab-pane fade show active > .tx-sendungen-pi1> .card card-onair> .tickerv-wrap> div'
  name: soundportalInterpret

- platform: scrape
  resource: https://radioplayer.radio886.at/88.6_Hard_Rock
  select: 'span[class="ipplaylist__interpret"]'
  name: 886Interpret

What am I doing wrong?

Hopefully someone can help me.

/roy

The first step is often to use your browser tools to check whether you can get the data more easily than by scraping the page.

E.g. https://meta.radio886.at/HardRock/0 will get you the current playlist in JSON for the first one, with "is_playing": true marking the track that is currently playing.
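To make the idea concrete, here is a minimal Python sketch of picking the current track out of such a playlist. Only the "is_playing" flag comes from the real response; the "data" wrapper, the "name" field and the sample entries are assumptions made for illustration, so check the actual JSON in your browser before relying on them.

```python
import json

# Hypothetical sample payload. Only "is_playing" is known from the real
# endpoint; the "data" list and the "name" key are assumed here.
SAMPLE = """
{"data": [
  {"name": "first artist", "is_playing": false},
  {"name": "second artist", "is_playing": true}
]}
"""


def current_track(payload: dict):
    """Return the first entry flagged as playing, or None if there is none."""
    for entry in payload.get("data", []):
        if entry.get("is_playing") is True:
            return entry
    return None


track = current_track(json.loads(SAMPLE))
print(track["name"].title() if track else "nothing playing")
```

Returning None instead of raising keeps the sketch safe for the moment between two songs, when no entry may be flagged.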

Thanks, that helped me a lot for the 88.6 radio station.

How did you get the URL?
I tried it with the browser's developer tools.

Exactly that.

Look first for the “json” data type, then for types which are not “js”, “css”, “html”, …

If the data is generated on the server (no JSON), look for the plain HTML containing the data, which is easier to parse than the “final” page, which is plastered with JavaScript-generated data.
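The search order above can be written down as a small Python sketch. The request list is made up, and the assumption is that you have copied (URL, content type) pairs out of the network tab by hand; nothing here talks to the browser.

```python
def priority(content_type: str) -> int:
    """Lower number = look at this request first."""
    ct = content_type.lower()
    if "json" in ct:
        return 0  # JSON responses are the easiest to consume
    if "javascript" in ct or "css" in ct:
        return 3  # scripts and stylesheets rarely hold the data itself
    if "html" in ct:
        return 2  # fallback: server-rendered HTML fragments
    return 1      # anything else (plain text, XML, ...) is worth a look


# Hypothetical entries, as they might appear in the network tab.
requests_seen = [
    ("https://example.invalid/app.js", "application/javascript"),
    ("https://example.invalid/index.html", "text/html"),
    ("https://example.invalid/style.css", "text/css"),
    ("https://example.invalid/now.json", "application/json"),
]

# sorted() is stable, so requests with equal priority keep their order.
for url, ctype in sorted(requests_seen, key=lambda e: priority(e[1])):
    print(priority(ctype), url)
```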

Servus,

Both websites require JavaScript to show the artist (Interpret), so scraping won’t work.

I’ve taken a look at the source code: the JavaScript is reaching out to an API which delivers those values in JSON format. You can also get the title information etc.
As koying pointed out, you can usually find those URIs in ‘Network Tools’ in your browser dev console.

This config should work:

- platform: rest
  name: "Radio886 Interpret"
  resource: "https://meta.radio886.at/HardRock"
  value_template: >
    {% set res = value_json['data'] | selectattr('is_playing','true') | list | first %}
    {{ res['name'] | title }}

- platform: rest
  name: "Soundportal Interpret"
  resource: "https://soundportal.at//typo3conf/ext/aba_nowonair/Resources/Public/Cache/now_on_air.html"
  value_template: >
    {{ value | regex_findall_index(find='interpret:(.*)<br\/>', index=0, ignorecase=True) }}
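If you want to try the regex from the second sensor outside Home Assistant, a rough Python equivalent looks like this. The sample body is hypothetical (I have not copied the real now_on_air.html here), and the strip() call is an addition that the template above does not make.

```python
import re

# Hypothetical response body; the real markup may differ.
SAMPLE = "Interpret: Some Artist<br/>\nTitel: Some Song<br/>\n"


def extract_interpret(body: str) -> str:
    """Return the first 'interpret:' value, matched case-insensitively."""
    matches = re.findall(r"interpret:(.*)<br\/>", body, flags=re.IGNORECASE)
    # Like index=0 in the template, this raises IndexError if nothing matches.
    return matches[0].strip()


print(extract_interpret(SAMPLE))
```

Note that (.*) is greedy and does not cross line breaks, so if several <br/> tags sit on the same line as the artist, the match runs up to the last one on that line. Using (.*?) instead would stop at the first.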

Thanks, that helped me.