Web Scrape HTML generated by Javascript

I’m trying to scrape a date from a webpage, but it’s generated by JavaScript, so I’m only getting the placeholder as a result. HA is only returning “Month Day Year Time”, not the actual date/time I’m looking for. Is it possible to get this date into HA somehow?

  - platform: scrape
    resource: http://www.wheniscriticalrole.com
    name: Next Critical Role
    select: '#next-event'
    headers:
      User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36

I’ll be coming back to this but as you mentioned, the counter is 100% JS, so it’s not too sensible to pull it that way.

Open up the Developer Tools in whatever browser tickles your fancy and have a look through the network requests for something closer to an API call that retrieves the data you want without the HTML/CSS.

You should be able to find some other request you can poll against instead.
Just click around and obviously ignore the images. Click on “Response” to see what is returned.

Got this:

So if you switch up your request from a scrape to rest (probably?) then you can just hit:
https://www.wheniscriticalrole.com/javascript/serverunixtime-v2a?id=3349307248686709

Not sure about the relationship between id and the number there, and I haven’t been able to convert 1602218705057 using an epoch calculator yet.
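
For what it’s worth, a 13-digit value like 1602218705057 has the shape of a Unix timestamp in milliseconds rather than seconds, which is a common reason an epoch calculator chokes on it. A minimal sketch, assuming that is really what the endpoint returns (the "serverunixtime" name hints at it, but that is a guess):

```javascript
// Assumption: the value is a Unix timestamp in milliseconds (13 digits), not seconds.
const raw = 1602218705057;

// JavaScript's Date constructor takes milliseconds since 1970-01-01 UTC.
const asDate = new Date(raw);
console.log(asDate.toISOString()); // 2020-10-09T04:45:05.057Z

// A calculator that expects seconds needs the value divided by 1000 first.
const seconds = Math.floor(raw / 1000);
console.log(seconds); // 1602218705
```

If that reading is right, the number would be the server’s current time at the moment of the request, not the date of the next event.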

It’s another angle you can use to get what you want :slight_smile:


Edit 1

Found this function on line 20 of the prettified JS in file https://www.wheniscriticalrole.com/javascript/event-timer-v2a07.min.js:

  n.prototype.getTimeRemaining = function (t) {
    var e = this.stopTime.diff(t);
    return {
      total: e,
      seconds: Math.floor(e / 1000 % 60),
      minutes: Math.floor(e / 60000 % 60),
      hours: Math.floor(e / 3600000 % 24),
      days: Math.floor(e / 86400000)
    }
  },

Assuming that the variable t is what’s returned in that call.
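
To see what that function does with its input, here is a standalone sketch of the same arithmetic. It assumes `e` is a duration in milliseconds (the `stopTime.diff(t)` call looks like Moment.js, whose `diff` returns milliseconds by default, but that is an assumption); `breakDown` is a made-up name for illustration.

```javascript
// Standalone copy of the arithmetic in getTimeRemaining.
// Assumption: `e` is the remaining time in milliseconds.
function breakDown(e) {
  return {
    total: e,
    seconds: Math.floor(e / 1000 % 60),     // leftover seconds within the minute
    minutes: Math.floor(e / 60000 % 60),    // leftover minutes within the hour
    hours: Math.floor(e / 3600000 % 24),    // leftover hours within the day
    days: Math.floor(e / 86400000)          // whole days
  };
}

// 1 day + 1 hour + 1 minute + 1 second, expressed in milliseconds.
const oneOfEach = 86400000 + 3600000 + 60000 + 1000;
console.log(breakDown(oneOfEach));
// { total: 90061000, seconds: 1, minutes: 1, hours: 1, days: 1 }
```

So the function only formats a difference; the actual event time has to come from wherever `stopTime` is set.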


Edit 2
Giving up and calling it a red herring. It’s counting up, so I think it is returning some kind of counter from another server instead of time from the user :man_shrugging:

In the script tag at the bottom of the page there is an inline JSON object that has the actual next schedule. I gave up on the scrape sensor for this, since the data sits inside a script tag as JavaScript and is impossible to fetch with CSS selectors.

With the RESTful sensor, the idea was to take the entire page as the response, skip the first 4000 characters or something like that, and then decode the rest of the string as JSON, hoping they never change the number of characters on the page.
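
A less brittle alternative to skipping a fixed number of characters is to search the page text for the variable that holds the inline object and parse from its opening brace to the matching closing brace. A rough sketch of that idea follows; the variable name `schedule`, the sample HTML and its date are all hypothetical, so check the real page source for the actual name, and note that it only works if the object is strict JSON.

```javascript
// Pull a JSON object assigned to a variable out of an inline <script>.
// NOTE: `varName` is taken as-is into a regex; the name and sample HTML are hypothetical.
function extractInlineJson(html, varName) {
  const marker = new RegExp(varName + "\\s*=\\s*\\{");
  const match = marker.exec(html);
  if (!match) return null;

  // Start at the opening brace and walk forward until the braces balance,
  // ignoring any braces that appear inside double-quoted strings.
  const start = match.index + match[0].length - 1;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      // JSON.parse throws if the literal is not strict JSON.
      if (depth === 0) return JSON.parse(html.slice(start, i + 1));
    }
  }
  return null; // braces never balanced
}

// Hypothetical page fragment, not the real site's markup.
const sampleHtml = `
<html><body>
<div id="next-event">Month Day Year Time</div>
<script>var schedule = {"next": "2020-10-09T02:00:00Z", "note": "a } in a string"};</script>
</body></html>`;

const schedule = extractInlineJson(sampleHtml, "schedule");
console.log(schedule.next);
```

This still breaks if the site renames the variable, but it survives changes to the rest of the page.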

So if anyone finds a better way, or writes a JS-enabled add-on (Puppeteer, for instance) for scraping, let me know.