Multiscrape/scrape help

I’ve been using the multiscrape sensor for months to obtain the artist and song information from a radio station I’m streaming. Recently I noticed the sensor was incorporated into official HACS. I removed the old custom configuration, installed the HACS version, and re-created my sensor. Since the syntax seemed slightly different, I started over using the default documentation. Here is my attempt at retrieving the info, which was working up until the last few days:

multiscrape:
  - resource: https://mykgbi.com/recently-played/
    scan_interval: 30
    sensor:
      - unique_id: kgbi_artist
        name: KGBI Artist
        select: ".artistname a"
      - unique_id: kgbi_song
        name: KGBI Song
        select: ".songname a"

I’m having two issues:

  1. The song name isn’t being scraped, although this selection worked before with the same site. I’ve also tried "div.currentsonginfo:nth-child(2) > h4:nth-child(1) > span:nth-child(1) > a:nth-child(1)" without success. The song name is just blank.

  2. Despite the 30-second interval, the artist name stays stagnant for 10 minutes or more, during which time the artist name changes 3 or more times on the web page.

Can someone please help me figure out how to properly scrape this site for the information I need?

There’s been some progress, but since I didn’t change anything to fix it, I’m not sure why it changed. As of today, the song name is being scraped without any changes to the configuration. The artist information also seems to stay up-to-date.

Now I’ve started working on the Icon URL, which I’ve never successfully retrieved. Following my example above, I added the following to the sensor section in multiscrape:

- unique_id: kgbi_icon
  name: KGBI Icon
  select: ".col.col-auto.current_photo img"

When that didn’t work, I tried changing select to "div.now_playing:nth-child(1) > div:nth-child(2) > div:nth-child(1) > img:nth-child(1)"

When this still didn’t work, I read that I should be using an attribute since I wanted to capture the value of the URL. This is my current configuration:

      - unique_id: kgbi_icon
        name: KGBI Icon
        select: "div.now_playing:nth-child(1) > div:nth-child(2) > div:nth-child(1) > img:nth-child(1)"
        attribute: src

Finally, after months of trying off-and-on, I have a URL!

The problem is the HTML source always has the placeholder image in this location, https://mykgbi.com/wp-content/themes/nwm/img/icons/gravatar-life.png. This never changes even when the actual album icon properly displays on the website. For example, the current song playing as I type this has this image displayed on the page and seen in the inspector:

https://m.media-amazon.com/images/I/618bZYc88zL._SL500_.jpg

However, the page source and my scrape value both have the gravatar-life.png shown above. It seems like the website is using JavaScript to dynamically overwrite the placeholder icon with the correct icon. Even when no album cover exists, the site dynamically serves the placeholder from cloudfront.net and not the mykgbi.com URL seen in the page source.

Is there any way to scrape the dynamic URL instead of the static placeholder?
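One way to check what a static scraper can actually see is to list every attribute on the img tags in the raw HTML, meaning the response before any JavaScript runs. The sketch below is a minimal illustration using only the Python standard library. The sample markup and the data-cover attribute are made up for the example and are not taken from the real page. If the real cover URL shows up in some attribute, that attribute can be selected; if it appears nowhere in the raw HTML, it is most likely filled in by JavaScript and a static scrape will not see it.

```python
from html.parser import HTMLParser


class ImgAttributeLister(HTMLParser):
    """Collect the attributes of every <img> tag found in raw HTML."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "img":
            self.images.append(dict(attrs))


def list_img_attributes(html):
    """Return one dict of attributes per <img> tag, in document order."""
    parser = ImgAttributeLister()
    parser.feed(html)
    parser.close()
    return parser.images


# Hypothetical markup for illustration only (not the real page source).
SAMPLE_HTML = """
<div class="current_photo">
  <img class="cover" src="/img/placeholder.png"
       data-cover="https://example.com/cover.jpg">
</div>
"""

for attrs in list_img_attributes(SAMPLE_HTML):
    print(attrs)
```

To run this against a real page, the HTML string would come from saving the page source (not the inspector view) to a file and reading it in.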

Now the site has changed its URL and layout, and I’m no longer getting anything. They redirected to the old URL for quite some time, but lately everything is “unavailable - unavailable.” Can someone who understands CSS/Beautiful Soup/scraping better than I do please take a look at the site and give me an idea of how to select the artist and song info with the new URL? The site is Life 100.7 (https://www.lifeomaha.com/), and the song name is at

#recently-played-list-block_242c6b9556c81e6f220820d4edbacb2a > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(2) > p:nth-child(1) > a:nth-child(1)

or

html body.page-template-default.page.page-id-4755.wp-custom-logo.wp-embed-responsive.tribe-js.music.tribe-theme-gravity-global.gravity-block-navigation--initialized.gg-navigation--initialized div.wp-site-blocks header.wp-block-template-part div.wp-block-group.gg-navigation.has-global-padding.is-layout-constrained.at-the-top div.wp-block-group.alignwide.gg-navigation__sub-nav.is-layout-flow div.wp-block-cover.gg-py-24.gg-px-40 div.wp-block-cover__inner-container div.wp-block-columns.is-not-stacked-on-mobile.is-layout-flex.wp-container-47 div.wp-block-column.is-vertically-aligned-center.gg-navigation__player-group.flex-center-vertical.is-layout-flow div.wp-block-group.is-layout-flow div#recently-played-list-block_54575e17bcbb92c293b238136dea9bef.gravity-block-recently-played-list.is-style-recently-played-list-v4 div.gravity-block-recently-played-list__list.fade-in-list.show-list

depending on how you copy the CSS (Path vs. Selector).

After watching a helpful video on Beautiful Soup, I’ve gotten much closer:

multiscrape:
  - resource: https://www.lifeomaha.com/
    scan_interval: 30
    sensor:
      - unique_id: kgbi_artist
        name: KGBI Artist
        select: "p.gravity-block-recently-played-list__title"
      - unique_id: kgbi_song
        name: KGBI Song
        select: "p.gravity-block-recently-played-list__name"
      - unique_id: kgbi_icon
        name: KGBI Icon
        select: "img.gravity-block-recently-played-list__image"
        attribute: src

The song and artist names populate properly, but for the image I’m getting this data URI instead of a URL:

data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==

I’m searching for further information, but since selecting an attribute in HA isn’t the same as selecting one in Beautiful Soup, I’m having trouble finding the answer. Any ideas?
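The value above is a base64 data URI, so it can be decoded to see what the image really is. The sketch below uses only the Python standard library and reads the GIF header fields (signature, then width and height as little-endian 16-bit integers). The function name is just for illustration.

```python
import base64
import struct

# The exact value returned by the scrape for the src attribute.
DATA_URI = (
    "data:image/gif;base64,"
    "R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="
)


def describe_gif_data_uri(uri):
    """Decode a base64 GIF data URI and return (signature, width, height)."""
    _, _, payload = uri.partition(",")
    raw = base64.b64decode(payload)
    signature = raw[:6]
    # Bytes 6-9 of a GIF hold the canvas width and height, little-endian.
    width, height = struct.unpack("<HH", raw[6:10])
    return signature, width, height


signature, width, height = describe_gif_data_uri(DATA_URI)
print(signature, width, height)  # b'GIF89a' 1 1
```

A 1x1 GIF like this is the kind of stand-in image that lazy-loading pages put in src until a script swaps in the real picture, which suggests the real URL lives somewhere other than the src attribute.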

1 step forward, 2 steps back! I learned I needed to use “data-src” instead of “src” to obtain the URL. In the meantime I also decided to switch to the UI to see if that made it easier to manage the scrape in the future. To that end, I commented out the multiscrape sensors above, restarted HA, and added the Scrape UI integration. For the resource I chose https://www.lifeomaha.com and left the rest as default.

I then created three sensors for that resource using the selections I specified above, using “data-src” instead of “src” for the image. I also took the opportunity to rename them from KGBI to Life 100. At the end I had 3 sensors that seemed to match and seemed to update, until…

After about 10 minutes of updating more-or-less when the song changed, it stuck on one particular song and wouldn’t budge, even when the song changed on the website. The tag I chose appeared in multiple entries on the recently played page, so I started playing with the index number to see if that would help. I found I could move through the “recently played” list by increasing the index number, but reducing it to 0 returned to the song that was active 20-30 minutes ago and way down the “recently played” list.
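To make the index behaviour concrete: a class selector returns every matching element in document order, and the index just picks one entry from that list. The sketch below is a minimal stand-in using only the Python standard library, not the code the Scrape integration actually runs. The sample markup and its newest-first ordering are assumptions for illustration. Note that the two selectors quoted earlier reference two different recently-played block ids, so the same class may well appear in more than one list on the page, and index 0 is simply whichever match comes first in the HTML.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every <p> carrying a given class, in document order."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.matches = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and self.wanted_class in classes:
            self._capturing = True
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._capturing = False

    def handle_data(self, data):
        # Text inside nested tags (such as <a>) is still part of the <p>.
        if self._capturing:
            self.matches[-1] += data


def select_all_text(html, wanted_class):
    """Return the stripped text of each matching <p>, first match first."""
    parser = ClassTextCollector(wanted_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]


# Hypothetical markup for illustration only (not the real page source).
SAMPLE_HTML = """
<div class="list">
  <p class="gravity-block-recently-played-list__name"><a href="#">Song C</a></p>
  <p class="gravity-block-recently-played-list__name"><a href="#">Song B</a></p>
  <p class="gravity-block-recently-played-list__name"><a href="#">Song A</a></p>
</div>
"""

songs = select_all_text(SAMPLE_HTML, "gravity-block-recently-played-list__name")
for index, title in enumerate(songs):
    print(index, title)
```

Printing every match with its index against a saved copy of the real page source would show which list position actually holds the current song.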

I’ve tried restarting the Scrape integration and HA multiple times, experimented with different ways to select the element, and generally fooled around with changes I admit I don’t fully understand. Every once in a while the scrape entity has the correct data, but then it goes back to a previous song and sits there for 10, 20, 30 minutes or more.

As recommended in a different post, when I switched to the GUI I had disabled the GUI Scrape updating (which is fixed at 10 minutes) and instead created an automation that updated the three entities every 30 seconds. No dice; it still has the same issue. I tried manually updating one entity (artist name, for example) in the Developer Tools, but that didn’t update the entity to the correct value.

Finally I decided to go back to what had worked before the URL changed. I deleted the Scrape integration and re-enabled my multiscrape config, changing the names and “src” as shown below:

multiscrape:
  - resource: https://www.lifeomaha.com/
    scan_interval: 30
    sensor:
      - unique_id: life_100_artist
        name: Life 100 Artist
        select: "p.gravity-block-recently-played-list__title"
      - unique_id: life_100_song
        name: Life 100 Song
        select: "p.gravity-block-recently-played-list__name"
      - unique_id: life_100_icon
        name: Life 100 Icon
        select: "img.gravity-block-recently-played-list__image"
        attribute: data-src

This hasn’t helped either. I’m going to try to stop changing things for a while to see if anyone else sees this post and has an idea.