I get a response with the HTML of the page, but only the first 891 characters.
Is there anything I can do to get all of it? 891 characters is barely the header of most web pages.
Not sure. Are you running the Node-RED server as an add-on or as a standalone app? If you’re running it as an add-on, you might need to write to /local/test.txt.
I believe I have Node-RED running as an add-on.
Didn’t work either.
Maybe I should give up on it anyway. Reading further through the HTML, I just noticed a snippet saying JavaScript is required for this page, so I most likely won’t get the information I want anyway.
It’s my spouse’s work schedule.
The web page has a calendar built from an HTML table, and I was hoping to transform it into agenda-style text with date, time, and so on.
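If the table does turn out to be present in the HTML the server returns (the JavaScript warning suggests it may not be), flattening it into agenda lines is mostly a matter of walking the rows and cells. Below is a minimal sketch using only Python’s standard-library `html.parser`. It assumes a hypothetical layout where the first row holds column headers (say Date, Time, Shift) and each later row is one entry; a real calendar grid with one column per weekday would need a different mapping.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Assumes reasonably well-formed markup with explicit closing tags
    and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join text fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_agenda(html):
    """Use the first row as headers and turn every later row into one line."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    headers, *entries = parser.rows
    lines = []
    for row in entries:
        # Skip empty cells so days off don't produce "Time: " noise.
        lines.append(", ".join(f"{h}: {v}" for h, v in zip(headers, row) if v))
    return lines
```

Each returned string could then be pushed to a sensor attribute or a notification; the column names are whatever the page’s header row contains.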
Ahhhh, ok. When you decide to pick it back up, take a look at jsdom (https://github.com/jsdom/jsdom). It’s (basically) a headless web browser that can execute and parse pages like a browser would. The downside is that there isn’t a Node-RED palette node for it, but it is available as an npm package, which means you can load it into Node-RED and use it in the contrib Function node (https://flows.nodered.org/node/node-red-contrib-function-npm).
At the same time, I was looking at PHP to see if I could parse the data there and then use a simple scrape sensor.
But for some reason I could not log in using PHP cURL.
With follow redirects turned on, it returned a web page I have never seen before, which asked if I wanted to log out and had a login form (I can’t reproduce that in a normal browser).
With follow redirects turned off, I got a page saying it would redirect me soon, or that I could click the link. Clicking the link logged me in and everything was fine.
Perhaps it notices that it’s a scraper and logs you out automatically.
Most likely it’s looking for auth headers in the request. When you have the page open in your browser, look at the request headers, copy them into your cURL request, and see what happens.
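To make the header-copying experiment concrete, here is a minimal sketch in Python’s standard-library `urllib` that sends browser-like headers and deliberately does not follow redirects, so the intermediate “you will be redirected soon” response can be inspected. The header values are placeholders (paste the real ones from the browser’s developer tools), and `NoRedirect`, `build_request`, and `fetch_without_redirect` are hypothetical names. In PHP cURL the corresponding knobs are `CURLOPT_HTTPHEADER` and `CURLOPT_FOLLOWLOCATION`.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow 3xx responses so the redirect itself can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError carrying the 3xx response.
        return None


# Placeholder values: copy the real ones from the browser's request headers.
BROWSER_HEADERS = {
    "User-Agent": "<copy from browser>",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, extra_headers=None):
    """Build a GET request carrying the browser headers plus any extras."""
    headers = dict(BROWSER_HEADERS)
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers)


def fetch_without_redirect(url, extra_headers=None):
    """Return (status, Location header or None, body) without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url, extra_headers), timeout=10) as resp:
            return resp.status, None, resp.read()
    except urllib.error.HTTPError as err:
        # 3xx (and 4xx/5xx) responses end up here; the body is still readable.
        return err.code, err.headers.get("Location"), err.read()
```

Comparing the `Location` value and body from this call with what the browser receives at the same step should show which header or cookie the site is actually checking.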
The form data is just not there…
I don’t understand. Unless it’s sent with GET… I hope not…
But the first time I see a POST is about 500 ms into the page load. Before that, everything is GET.
Maybe it’s done in JavaScript.
You know, a JavaScript file on the server with all the user credentials, so that when a user wants to log in, the auth process can be done LOCALLY.
Isn’t that what we all want?
Well, the easy way to figure that out is to look at the source for the page and check the JS files.
What I would do is log out of the site, clear your cache, and then log back in again and see what it throws into the request. My guess is that they are using AJAX for logging in and out and storing the session in the cookie.
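If the session really does live in a cookie, a scraper has to keep every cookie the server sets (including those set on the redirect pages) and send them back on later requests. A minimal sketch with Python’s standard `http.cookiejar` follows; the login URL and the form field names `username`/`password` are hypothetical and should be replaced with whatever the browser’s login request actually posts. PHP cURL does the same job with `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE`.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies from every response and replays them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, fields):
    """Build a form-encoded POST; `fields` must mirror the browser's form data."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(login_url, data=body)


def login(opener, login_url, fields):
    """Send the login POST through the cookie-aware opener and return the status."""
    with opener.open(build_login_request(login_url, fields), timeout=10) as resp:
        return resp.status
```

After `login(...)`, any page fetched through the same `opener` carries the session cookie, and inspecting `jar` shows which cookies the server actually set.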
So naturally I went through the JavaScript in the HTML file and the 33 (!) files that are linked.
Some of it is plain JavaScript, some is Ajax.
About half of the files are obfuscated (beyond what JavaScript naturally is), and in those I did nothing more than search for “doPre”.
I quickly skimmed the other files and also searched them for “doPre”.
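Searching 33 files by hand is tedious, so a small script can do the “doPre” search across all of them once they are saved into one folder (the folder layout is an assumption; `find_in_scripts` is a hypothetical helper). It reports character offsets with a bit of surrounding text rather than line numbers, because obfuscated files are often a single very long line. One caveat: if the obfuscator builds the name at runtime (for example from string fragments or escape sequences), a plain substring search will not find it, which could explain an empty result.

```python
from pathlib import Path


def find_in_scripts(folder, needle, context=40):
    """Yield (file name, character offset, surrounding text) for every match.

    Walks all *.js files under `folder`; offsets are used instead of line
    numbers because minified/obfuscated files are often one long line.
    """
    for path in sorted(Path(folder).rglob("*.js")):
        text = path.read_text(encoding="utf-8", errors="replace")
        start = text.find(needle)
        while start != -1:
            snippet = text[max(0, start - context): start + len(needle) + context]
            yield path.name, start, snippet
            start = text.find(needle, start + 1)
```

Usage would be something like `for name, offset, snippet in find_in_scripts("saved_js", "doPre"): print(name, offset, snippet)`.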
Nothing…
I assume the function doPreLogin is in one of the obfuscated files.