Scrape data from multiple sites, merge with CSV and display daily relevant details

Hello,
I am trying to find the easiest and most efficient way to fetch and display the details relevant to each day for my family, and I am looking for guidance on the right “architecture”.

My requirements are the following:

  1. Fetch and consolidate timetable information from multiple websites
    => The websites display their timetables in different formats:
    • Monday 08:00-18:00 for weekly schedule
    • Mo, 25.04.22 08:00-18:00 for daily schedule
  2. Consolidate the data from (1) with a CSV for non-fetchable information
    => I receive a yearly calendar as a PDF, and sometimes on paper, so I guess that creating a CSV once a year is the easiest.
  3. Display information based on the day of the week using a Lovelace card
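
To make the three requirements concrete, here is a minimal Python sketch of one possible approach; it is not a Home Assistant integration. It assumes the scraped strings look exactly like the two formats above, and that the hand-made CSV has the columns source,date,start,end with ISO dates. The function names (parse_entry, load_manual_csv, entries_for) and the CSV layout are made up for illustration.

```python
import csv
import re
from datetime import date, datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# "Monday 08:00-18:00" (weekly schedule)
WEEKLY_RE = re.compile(
    r"^(?P<day>[A-Za-z]+)\s+(?P<start>\d{2}:\d{2})-(?P<end>\d{2}:\d{2})$")
# "Mo, 25.04.22 08:00-18:00" (daily schedule)
DAILY_RE = re.compile(
    r"^[A-Za-z]{2},\s*(?P<date>\d{2}\.\d{2}\.\d{2})\s+"
    r"(?P<start>\d{2}:\d{2})-(?P<end>\d{2}:\d{2})$")


def parse_entry(source, text):
    """Normalise one scraped string into a dict with a common shape."""
    text = text.strip()
    m = DAILY_RE.match(text)
    if m:
        day = datetime.strptime(m["date"], "%d.%m.%y").date()
        return {"source": source, "date": day, "weekday": None,
                "start": m["start"], "end": m["end"]}
    m = WEEKLY_RE.match(text)
    if m and m["day"] in WEEKDAYS:
        return {"source": source, "date": None,
                "weekday": WEEKDAYS.index(m["day"]),  # Monday == 0
                "start": m["start"], "end": m["end"]}
    raise ValueError(f"unrecognised schedule format: {text!r}")


def load_manual_csv(stream):
    """Read the yearly hand-made CSV (assumed columns: source,date,start,end)."""
    return [{"source": row["source"],
             "date": date.fromisoformat(row["date"]),
             "weekday": None,
             "start": row["start"], "end": row["end"]}
            for row in csv.DictReader(stream)]


def entries_for(day, entries):
    """Keep entries dated `day`, plus weekly entries for that weekday."""
    return [e for e in entries
            if e["date"] == day
            or (e["date"] is None and e["weekday"] == day.weekday())]
```

Calling entries_for(date.today(), scraped + manual) would then give the rows to show for the current day; how that result reaches a Lovelace card (for example through a sensor) is left open here.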

I tried scrape and multiscrape, but I don’t see how I can fetch data from multiple URLs, merge it with a CSV, and then display only the information relevant to the current day.

Any guidance will be highly appreciated.

As an idea, one could scrape each site separately (which I would do anyhow, so that I can update one in case it changes). You can then concatenate their output using the command line sensor and remove the duplicate headers (there are several examples on the net). It is not beautiful, but it may work.
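
As a rough illustration of the concatenation step, a small Python script along these lines could be what the command line sensor runs. It assumes each per-site output is CSV text with the same header row; concat_csv is a hypothetical helper name, not something Home Assistant provides.

```python
import csv
import io


def concat_csv(chunks):
    """Join several CSV texts, keeping the header row only once."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for chunk in chunks:
        # Drop blank lines so trailing newlines do not create empty rows.
        rows = [row for row in csv.reader(io.StringIO(chunk)) if row]
        if not rows:
            continue
        if header is None:
            header = rows[0]
            writer.writerow(header)
        elif rows[0] != header:
            raise ValueError(f"unexpected header: {rows[0]!r}")
        writer.writerows(rows[1:])
    return out.getvalue()
```

Failing loudly on a mismatching header is a deliberate choice in this sketch: if one site changes its layout, the merge stops instead of silently mixing columns.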

Thanks. I have started exploring both ways:

  1. Use multiscrape to fetch the data daily.
  2. Use wget to scrape the page from HA and consolidate the data into a CSV.
    I still need to invest time to get it to work.
    Thanks for the command line sensor tip, which I didn’t know about.
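
For the second route, a sketch of the "page to CSV" step might look like the following, using only the Python standard library. It assumes the page has already been saved (for example by wget) and that the weekly slots appear in the page text as "Monday 08:00-18:00"; html_to_csv, TextCollector and the column names are hypothetical, and a real page may need a more specific selector.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Weekly slots such as "Monday 08:00-18:00" anywhere in the page text.
SLOT_RE = re.compile(
    r"(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
    r"\s+(\d{2}:\d{2})-(\d{2}:\d{2})")


class TextCollector(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_csv(source, html):
    """Turn the weekly slots found in `html` into CSV text for `source`."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.parts)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["source", "day", "start", "end"])
    for day, start, end in SLOT_RE.findall(text):
        writer.writerow([source, day, start, end])
    return out.getvalue()
```

The resulting CSV has one header row per site, so the per-site outputs still need their duplicate headers removed when they are concatenated.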