Scrape Help Needed for Radio Station

I’ve been trying to nail down the correct select string to obtain the Song Name, Artist Name, and Image from this radio station’s web page: https://mykgbi.com/recently-played/

From the Firefox developer tools, I know the elements I want have the following CSS selectors and CSS paths:

Song title:
CSS Selector: div.currentsonginfo:nth-child(2) > h4:nth-child(1) > span:nth-child(1) > a:nth-child(1)
CSS Path: html.js.flexbox.canvas.canvastext.webgl.no-touch.geolocation.postmessage.no-websqldatabase.indexeddb.hashchange.history.draganddrop.websockets.rgba.hsla.multiplebgs.backgroundsize.borderimage.borderradius.boxshadow.textshadow.opacity.cssanimations.csscolumns.cssgradients.no-cssreflections.csstransforms.csstransforms3d.csstransitions.fontface.generatedcontent.video.audio.localstorage.sessionstorage.webworkers.applicationcache.svg.inlinesvg.smil.svgclippaths body.archive.post-type-archive.post-type-archive-recentlyplayedsongs.tribe-js.tribe-theme-parent-nwm.tribe-theme-child-kgbi.fixed-nav div.wrapper div#streamer-container.default-stream section.streamer.inverse div.row.align-items-center div.col.col-auto.now_playing div.row.align-items-center.now_playing_content div.col.currentsonginfo h4 span.songname a.songlink
Artist name:
CSS Selector: div.currentsonginfo:nth-child(2) > p:nth-child(2) > span:nth-child(1) > a:nth-child(1)
CSS Path: html.js.flexbox.canvas.canvastext.webgl.no-touch.geolocation.postmessage.no-websqldatabase.indexeddb.hashchange.history.draganddrop.websockets.rgba.hsla.multiplebgs.backgroundsize.borderimage.borderradius.boxshadow.textshadow.opacity.cssanimations.csscolumns.cssgradients.no-cssreflections.csstransforms.csstransforms3d.csstransitions.fontface.generatedcontent.video.audio.localstorage.sessionstorage.webworkers.applicationcache.svg.inlinesvg.smil.svgclippaths body.archive.post-type-archive.post-type-archive-recentlyplayedsongs.tribe-js.tribe-theme-parent-nwm.tribe-theme-child-kgbi.fixed-nav div.wrapper div#streamer-container.default-stream section.streamer.inverse div.row.align-items-center div.col.col-auto.now_playing div.row.align-items-center.now_playing_content div.col.currentsonginfo p span.artistname a.artistlink
Image:
CSS Selector: div.now_playing:nth-child(1) > div:nth-child(2) > div:nth-child(1) > img:nth-child(1)
CSS Path: html.js.flexbox.canvas.canvastext.webgl.no-touch.geolocation.postmessage.no-websqldatabase.indexeddb.hashchange.history.draganddrop.websockets.rgba.hsla.multiplebgs.backgroundsize.borderimage.borderradius.boxshadow.textshadow.opacity.cssanimations.csscolumns.cssgradients.no-cssreflections.csstransforms.csstransforms3d.csstransitions.fontface.generatedcontent.video.audio.localstorage.sessionstorage.webworkers.applicationcache.svg.inlinesvg.smil.svgclippaths body.archive.post-type-archive.post-type-archive-recentlyplayedsongs.tribe-js.tribe-theme-parent-nwm.tribe-theme-child-kgbi.fixed-nav div.wrapper div#streamer-container.default-stream section.streamer.inverse div.row.align-items-center div.col.col-auto.now_playing div.row.align-items-center.now_playing_content div.col.col-auto.current_photo img

Realizing I might need three different sensors for this, I’m focusing on just the song title for now. Here’s my sensor attempt:

  - platform: scrape   
    resource: https://mykgbi.com/recently-played/
#    select: 'div.currentsonginfo:nth-child(2) > h4:nth-child(1) > span:nth-child(1) > a:nth-child(1)'
#    select: "div.currentsonginfo:nth-child(2) > h4:nth-child(1) > span:nth-child(1) > a:nth-child(1)"
    select: 'h4 > span > a'
    name: KGBI RDS

The commented-out lines are two of the many other combinations I’ve tried. I noticed the documentation and examples sometimes use ’ and sometimes ", so I’ve tried both with each string just in case it makes a difference. I restart the server every time to reinitialize the sensor. I’ve also tried ".songlink", which, according to the Beautiful Soup CSS selector documentation, should match the element with class="songlink". I’ve tried too many other combinations to document.
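For what it’s worth, the quote style by itself shouldn’t change the selector: in YAML, a single-quoted and a double-quoted string with no escape sequences parse to exactly the same text, and an unquoted selector like this one does too. Quoting does matter if a selector contains a space followed by #, because unquoted, " #" starts a YAML comment. The keys below are made-up names purely to show the parsing, not Home Assistant options:

```yaml
# Illustration only: these hypothetical keys all hold the same string, h4 > span > a
single_quoted: 'h4 > span > a'
double_quoted: "h4 > span > a"
plain: h4 > span > a
# Unquoted, everything from " #" onward would be dropped as a comment,
# so a selector like this one needs quotes to survive intact.
needs_quotes: "section #streamer-container img"
```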

I have yet to retrieve any text. Sometimes the sensor is blank, sometimes it’s “Unknown,” and other times it shows a different color as if it were retrieving something, yet there’s no text. I also pasted in the documentation examples to be sure I could at least follow one, and based on those I should be seeing some text when I look at the sensor here:

[Screenshot of the sensor]

Can someone please help me figure out the correct select values here?

As is often the case, I found the answer after asking for help. Here’s what worked for me:

  - platform: scrape   
    resource: https://mykgbi.com/recently-played/
    select: ".songname a"
    name: KGBI RDS

Now I’m off to make some entities for the other elements and a mini-media-player card that will display the information.
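Assuming the artist link sits inside span.artistname the same way the song link sits inside span.songname (which is what the artist selector above suggests), the artist sensor would presumably follow the same pattern. This is a sketch; the sensor name is just an example:

```yaml
  - platform: scrape
    resource: https://mykgbi.com/recently-played/
    # Same idea as ".songname a": the link inside span.artistname
    select: ".artistname a"
    name: KGBI Artist
```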

I’m continuing to document my progress here in case it helps me or anyone else figure out the scrape syntax. The following config for the image returns a URL, but not the one I want:

  - platform: scrape   
    resource: https://mykgbi.com/recently-played/
    select: ".streamer.inverse .col.col-auto.current_photo img"
    attribute: src
    name: KGBI Image

The problem is that in the inspector, that element contains the image for the current song, for example https://m.media-amazon.com/images/I/51TCg7Ku3dL.jpg. When I view the page source, however, the same section has the standard logo, https://mykgbi.com/wp-content/themes/nwm/img/icons/gravatar-life.png. It’s been a while since I’ve worked with CSS on a page, but it seems the image is somehow being served without appearing in the HTML source. The relevant section of the source, where I’m grabbing information, is:

<div class="row align-items-center now_playing_content">
  <div class="col col-auto current_photo">
    <img src="https://mykgbi.com/wp-content/themes/nwm/img/icons/gravatar-life.png">
  </div>
  <div class="col currentsonginfo">
    <h4><span class="songname"><a class="songlink"></a></span></h4>
    <p><span class="artistname"><a class="artistlink"></a></span></p>
  </div>
</div>

This makes it look like the img src is always the standard icon (gravatar-life.png), but it also shows no artist or song info. Even so, my scrape sensors do retrieve the current artist and song title, while the image sensor only ever retrieves gravatar-life.png. I’m going to keep experimenting to see if I can figure out how to grab it.

It doesn’t help that about half of the songs don’t seem to display a logo in the streamer.inverse container, even though the same songs show a logo on the main recently played list. For now, I think I’ll have to be content with displaying just the artist and song title.

[Screenshot of the player showing the current song with its image]

For example, scraping when this is shown results in Artist: Tenth Avenue North, Song: Control, and Image: https://mykgbi.com/wp-content/themes/nwm/img/icons/gravatar-life.png

(I had the element selected, so it was highlighted, but I think it’s obvious a different image is displayed)

Any ideas on this? It’s frustrating to see the correct image in the inspector but nowhere in the HTML when I view the source. From my Google searches, it appears that scripts can change the DOM without a corresponding change to the HTML source. If that’s the case here, is there any hope of scraping the image when one is displayed in the player?


Did you solve your problem, friend?
I just want to capture the link in the “src” attribute. Do you have any idea how to do it?

Unfortunately, I didn’t. In my case, I think the image is inserted dynamically by a script. Since it isn’t in the HTML the scraper downloads, the scraper can’t capture it. I haven’t kept up with the scraper to see whether any updates might allow this case to work.


I got it. Sucks, man.