Using Scrape to extract Video from Website and cast it

Hello,

I’m trying to use the scrape sensor to extract the daily changing mp4 URL of a news show so I can automatically play it on my Chromecast.
What I want to extract is the part inside content="…":

<meta name="twitter:player:stream" content="https://media.tagesschau.de/video/2017/1214/TV-20171214-1742-0401.webm.h264.mp4">

What I have right now:

- platform: scrape
  resource: https://www.tagesschau.de/multimedia/video/video-356747~player.html
  name: TagesschauIn100Sekunden
  select: "meta[name=twitter:player:stream]"

What I want to do with it: use the URL stored in the sensor “TagesschauIn100Sekunden” as the media_content_id to cast the video to my TV:

- service: media_player.play_media
  entity_id: media_player.chromecast
  data:
    media_content_id: TagesschauIn100Sekunden
    media_content_type: video

But obviously it does not work this way.
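A likely reason is that media_content_id receives the literal text TagesschauIn100Sekunden rather than the sensor's state. A minimal sketch of passing the state through a template instead, assuming the scrape sensor gets the entity ID sensor.tagesschauin100sekunden (derived from its name) and that its state actually holds the video URL. On current Home Assistant releases `data` accepts templates directly; older releases needed `data_template` for this.

```yaml
# Sketch: play whatever URL the scrape sensor currently holds.
# Assumption: the sensor's entity ID is sensor.tagesschauin100sekunden.
- service: media_player.play_media
  target:
    entity_id: media_player.chromecast
  data:
    # The template is rendered when the service is called,
    # so the current state (the mp4 URL) is sent to the player.
    media_content_id: "{{ states('sensor.tagesschauin100sekunden') }}"
    media_content_type: video
```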

I already tried to find the right CSS selectors but did not succeed: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors

A little help would be appreciated :slight_smile:
Thank you!!

EDIT: I now know that I could select the meta tag with head > meta:nth-child(24), but that may not always work because it may not always be the 24th child. I also don’t know how to access the content="" part of the element.
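The scrape sensor has an `attribute` option that returns an attribute of the selected tag instead of its text, which avoids the positional nth-child selector. A sketch, assuming the page still contains the twitter:player:stream meta tag; quoting the attribute value is valid CSS and avoids trouble with the colons. This uses the sensor-platform syntax from the question; newer Home Assistant releases configure scrape differently, so check the current documentation.

```yaml
sensor:
  - platform: scrape
    resource: https://www.tagesschau.de/multimedia/video/video-356747~player.html
    name: TagesschauIn100Sekunden
    # Select the meta tag by its name attribute rather than by position
    select: 'meta[name="twitter:player:stream"]'
    # Return the value of content="…" instead of the tag's text
    attribute: content
```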

Have a look at media_extractor (https://home-assistant.io/components/media_extractor/). It supports tagesschau, so you don’t have to do any scraping yourself.


Works perfectly! Thank you very much for the hint. I didn’t know about the media_extractor. It works exactly the way I need it to work.


Sounds interesting! @rcma, would you share your config for this?
Thanks in advance.

Hi @rcma,

could you please share your configuration for media_extractor and how you have used it?

Thanks.

Sorry it took me so long to respond.
Here is my current automation using media_extractor:

- alias: "Tagesschau in 100 Sekunden"
  trigger:
    platform: state
    entity_id: device_tracker.Phone
    to: "home"
  action:
    - service: media_extractor.play_media
      entity_id: media_player.TV
      data:
        media_content_id: "http://www.tagesschau.de/100sekunden/index.html"
        media_content_type: video

- alias: "Tagesschau normal"
  trigger:
    platform: state
    entity_id: device_tracker.Phone
    to: "home"
  action:
    - service: media_extractor.play_media
      entity_id: media_player.TV
      data:
        media_content_id: "http://www.tagesschau.de/sendung/letzte-sendung/index.html"
        media_content_type: video

This is currently a very basic implementation that still needs more refined triggers. But the main goal, extracting the video link from the page URL and casting it, works perfectly fine.
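One possible refinement, as a sketch: keep the arrival trigger but add a time condition so the video only starts during a given window. The time window and the new alias are arbitrary examples; the entity IDs are taken from the automation above.

```yaml
- alias: "Tagesschau normal (evening only)"
  trigger:
    platform: state
    entity_id: device_tracker.Phone
    to: "home"
  # Only continue if the arrival happens inside this example window
  condition:
    condition: time
    after: "18:00:00"
    before: "23:00:00"
  action:
    - service: media_extractor.play_media
      entity_id: media_player.TV
      data:
        media_content_id: "http://www.tagesschau.de/sendung/letzte-sendung/index.html"
        media_content_type: video
```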


Thanks @rcma

Quick question: does the above automation download the video from the website to the system running HA and then play it straight to your TV?

I tried your code above to play the audio of a YouTube video on my Chromecast Audio, but it didn’t work for me.

Yes, it starts playing right on the TV. I don’t think media_extractor downloads the video; it passes the extracted video URL to the Chromecast, which then streams it.
Unfortunately, I can’t help you with the Chromecast Audio.


That’s exactly what I was looking for to watch Tagesschau :slight_smile: Thanks for sharing! Is it also possible to send YouTube playlists to my TV with media_extractor?

Are there more websites like Tagesschau from which the video can be extracted? And is it possible to play a playlist of these videos?

I had been using media_extractor for “Tagesschau in 100s” for a long time, but it stopped working a month ago and I can’t get it running again.

Using this in a script:

data:
  media_content_id: 'http://www.tagesschau.de/100sekunden/index.html'
  media_content_type: video
service: media_extractor.play_media
entity_id: media_player.philipstv

And this is the error from the logs:
[Screenshot of the log error]

I don’t know how to fix that :confused:

Edit: This is the URL: http://www.tagesschau.de/100sekunden/index.html
Edit 2: OK, there is an issue with youtube-dl: the problem is caused by the website’s new design…

Until youtube-dl is fixed, you can use the official video podcast URL in high quality:

service: media_extractor.play_media
data:
  media_content_id: https://www.tagesschau.de/export/video-podcast/webxl/tagesschau_https/
  media_content_type: video
target:
  entity_id: media_player.my_player

You can find the different versions/podcasts here.
