How can I get larger responses from HTTP requests?

I get a response with the HTML of the page, but only the first 891 characters.
Is there anything I can do to get all of it? 891 characters is barely the header of most web pages.


As you can see, the response ends with “…”.
Is this a Node-RED limit or a limit of the HTTP request node?

That’s a limit of the UI itself. Try writing the response out to a file and the complete response should be there.
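If you want to try it, a minimal sketch would be a Function node like this wired to a core “write file” node (leave that node’s Filename field empty so it uses msg.filename; the path below is just a guess, pick somewhere Node-RED can write):

```javascript
// Function node between the HTTP request node and a "write file" node.
// The path is only a guess; adjust it to a location the Node-RED process can write to.
msg.filename = "/share/http_response.html";
return msg;   // msg.payload still holds the full response body
```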

I can’t seem to get that to work.
Does Node-RED need some permissions to write files?

I used this but nothing got created.

Not sure. Are you running the Node-RED server as an add-on or as a standalone app? If you’re running it as an add-on, you might need to write to /local/test.txt.

I have Node-RED as an add-on, I believe.
That didn’t work either.

Maybe I should just give it up anyway. Reading through more of the HTML, I noticed a snippet saying JavaScript is required for this page, so most likely I won’t get the information I want anyway.


What are you trying to scrape? There are LOTS of ways around the JS thing. :slight_smile:

It’s my spouse’s work schedule.
The web page has a calendar built with an HTML table, and I was hoping to somehow transform that into agenda-style text with date, time and so on.

I’ll put it on hold for a while at least.

I’m loading a 4 kB JSON file from a remote server without a problem.

Ahhhh, ok. When you decide to pick it back up, take a look at jsdom (https://github.com/jsdom/jsdom). It’s (basically) a headless web browser that can execute and parse pages like a browser would. The downside is that there isn’t a Node-RED palette node available for it, but it is available as an NPM package, which means you can load it into Node-RED and use it in the contrib Function node (https://flows.nodered.org/node/node-red-contrib-function-npm).
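Just as a very rough sketch of how that could look inside a function-npm node (assuming the node lets you require() the package, and guessing at the selector and timing for the calendar table):

```javascript
// Rough sketch inside a function-npm node; jsdom is pulled in as an npm package.
// The selector and the timeout are guesses and would need tuning for the real page.
const { JSDOM } = require('jsdom');

const dom = new JSDOM(msg.payload, {
    runScripts: "dangerously",   // let the page's own JS run
    resources: "usable"          // fetch the linked scripts
});

// Give the page's scripts a moment to build the calendar, then read the table rows.
setTimeout(() => {
    const rows = dom.window.document.querySelectorAll("table tr");
    msg.payload = Array.from(rows).map(r => r.textContent.trim());
    node.send(msg);
}, 2000);

return null;   // the output is sent asynchronously above
```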

I was looking at it simultaneously with PHP, to see if I could parse the data there and then have a simple scrape sensor.
But for some reason I could not log in using PHP cURL.

With follow-redirect on, it returned a web page I have never seen before, which asked if I wanted to log out and had a login form (I can’t reproduce that in a real browser).
With follow-redirect disabled, I got a page saying it would redirect me soon, or I could click a link. Clicking the link logged me in and everything was fine.

Perhaps it noticed it was a scraper and logged you out automatically :thinking:

Most likely it’s looking for auth headers in the request. When you have the page open in your browser, take a look at the headers and copy them over into your curl request and see what happens.
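The same trick works inside Node-RED too: set msg.headers in a Function node in front of the HTTP request node. All the values below are just placeholders for whatever your browser actually sends:

```javascript
// Function node in front of the HTTP request node; that node picks up msg.headers.
// Every value here is a placeholder copied from the browser's developer tools.
msg.headers = {
    "Cookie": "SESSIONID=paste-value-from-browser",   // placeholder
    "User-Agent": "Mozilla/5.0",                      // look like a normal browser
    "Referer": "https://example.com/login"            // only if the site seems to check it
};
return msg;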

I can’t find that.
I was looking for it earlier, but I can’t find the POST with the username and password.

Perhaps there is one more way… Let me see.

Nope…

The form data is just not there…
I don’t understand. Unless it’s sent with GET… I hope not…
But the first time I see a POST is about 500 ms into the page load. Before that, everything is GET.

Hmmmm, that is weird! If they are passing it via GET with query strings, that’s a HUGE red flag. LOL

What is showing in the request headers for the POST request?

The request headers are nothing special, really… Nothing that makes me think “that’s it!”.


I have just masked the URL of the page.

There is a request payload, but it is, well, nothing to see…

Maybe it’s done in JavaScript.
You know, a JavaScript file on the server with all the user credentials, so when the user wants to log in, the auth process can be done LOCALLY.
Isn’t that what we all want?

:smiley:

Well, the easy way to figure that out is to look at the source for the page and check the JS files.

What I would do is log out of the site, clear your cache, and then log back in again and see what it throws into the request. My guess is that they are using AJAX for logging in and out and storing the session in the cookie.
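If it does turn out to be an AJAX login, you could probably replay it from Node-RED with a POST of the same JSON body. A rough sketch, with the endpoint and field names being pure guesses until the real request shows up in dev tools:

```javascript
// Function node feeding an HTTP request node set to POST.
// URL, field names, and credentials are placeholders; copy the real ones from dev tools.
msg.url = "https://example.com/api/login";
msg.headers = { "Content-Type": "application/json" };
msg.payload = JSON.stringify({
    username: "USERNAME",
    password: "PASSWORD"
});
return msg;
```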

Obviously…
Facepalm, I didn’t use incognito mode, so most likely there is a cookie with a stored session.
That could explain things.

I’ll see what it does in incognito when kids are in bed.


This is impossible…

I used incognito mode and, as far as I can see, got the same responses in the developer tools.
No obvious POST data and nothing that looks correct.

So I looked at the HTML again and noticed the submit calls a JavaScript function.

<form method="POST" name="loginForm" id="loginForm" ng-hide="features.upgrade_lock" ng-submit="doPreLogin()" novalidate="">

So naturally I went through the JavaScript in the HTML file and the 33 files (!) that are linked.
Some are JavaScript, some are AJAX.
About half of them are obfuscated (more than JavaScript naturally is), and in those I did nothing more than search for “doPre”.
The other files I read through quickly and also searched for “doPre”.
Nothing…

I assume the function doPreLogin is in one of the obfuscated files.
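The ng-submit and ng-hide attributes suggest the page is an AngularJS app, so doPreLogin() is probably a controller function that posts the credentials as JSON in the background instead of submitting the form, which would explain why no form-encoded POST data ever shows up. Purely as a guess at its shape (the endpoint, field names and follow-up step are all invented):

```javascript
// Hypothetical reconstruction of what doPreLogin() might do in an AngularJS controller.
// Everything here is a guess; only the function name comes from the form's ng-submit.
$scope.doPreLogin = function () {
    $http.post('/api/prelogin', {
        username: $scope.username,
        password: $scope.password
    }).then(function (response) {
        // the session then lives in a cookie or token rather than a classic form POST
        $scope.doLogin(response.data);   // invented second step
    });
};
```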