Scraping a website behind a login

Hi Everyone,

I would like to scrape my power/hydro bill amount and due date from my local provider's website, using Node-RED if possible. I can easily scrape a site that isn't behind a login with the www-request node, but I haven't found a way to scrape one that requires you to log in first. As an alternative, I've installed the nbrowser node, which looks like it should work, but when I call it I see absolutely nothing: no errors, no results. The node has an option to show a browser window instance, but I can't get that to work either. Has anyone had any success running this node in Node-RED (the add-on from Frenck) on HassOS?

You should have a look at the request headers to find out how the site logs you in and how to make authenticated requests.

From my point of view, the best tool is to experiment with Postman and work out which headers/bearer token need to be added.
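To make the "look at the headers" advice concrete, here is a minimal Python sketch (standard library only) of the usual cookie-based pattern: POST the login form once, keep the session cookie the site sets, then request the bill page with the same cookie jar. The URLs, the form field names (`username`, `password`) and the extra headers are placeholders; copy the real ones from the login request you capture in Postman or your browser's developer tools.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical URLs: replace with the ones seen in the captured login request.
LOGIN_URL = "https://provider.example/login"
BILL_URL = "https://provider.example/account/bill"


def make_session() -> urllib.request.OpenerDirector:
    """Return an opener that stores and resends cookies, like a browser session."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build a form-encoded POST; the field names here are assumptions."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        LOGIN_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Some sites reject requests without a browser-like User-Agent.
            "User-Agent": "Mozilla/5.0",
        },
    )


def fetch_bill_page(username: str, password: str) -> str:
    """Log in, then fetch the protected page using the same cookie jar."""
    session = make_session()
    with session.open(build_login_request(username, password)) as resp:
        resp.read()  # the cookie processor captures any Set-Cookie headers here
    with session.open(BILL_URL) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

If the login form also carries a hidden CSRF token, you would first GET the login page with the same session and copy that field into the POST body.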

Thanks @tikismoke, I'll have a look at Postman.

@Schocker, did you manage to get anywhere with this? I'm facing exactly the same issue with my LPG tank data.

I thought I'd hit the jackpot when I stumbled on nbrowser, but it doesn't appear to work in the Hass.io Node-RED add-on by @frenck. I did a bit of investigation and it seems to require xvfb in the container :frowning:

Personally, I don't think Node-RED is the best tool for something like this. If you can code in Python (which is actually super easy to learn), you could use the BeautifulSoup (https://www.crummy.com/software/BeautifulSoup/bs4/doc/) library, which makes things like this really simple. If you're a fan of Node.js, take a look at SuperAgent (https://visionmedia.github.io/superagent/).
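For the parsing step, BeautifulSoup makes this a couple of lines, but to keep the sketch dependency-free it swaps in the standard library's `html.parser.HTMLParser` instead. It assumes, hypothetically, that the bill page marks the values with `id="bill-amount"` and `id="due-date"`; inspect the real page to find the actual ids or classes.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Hypothetical element ids; check the real page's markup.
WANTED_IDS = {"bill-amount", "due-date"}


class BillParser(HTMLParser):
    """Collects the text inside elements whose id is in WANTED_IDS.

    Assumes each value is plain text with no nested tags: collection
    stops at the next closing tag.
    """

    def __init__(self) -> None:
        super().__init__()
        self.values: dict[str, str] = {}
        self._current: str | None = None

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id in WANTED_IDS:
            self._current = element_id
            self.values[element_id] = ""

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._current is not None:
            self.values[self._current] += data


def extract_bill(html: str) -> dict[str, str]:
    """Return the wanted values with surrounding whitespace removed."""
    parser = BillParser()
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.values.items()}
```

In practice you would feed `extract_bill` the HTML returned after logging in; with BeautifulSoup the equivalent lookup would be `soup.find(id="bill-amount").get_text(strip=True)`.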

There's a good write-up on how to attempt this in Node-RED, though: https://discourse.nodered.org/t/is-there-any-way-to-automatically-authenticate-myself-in-a-site-with-a-node/12703/3

I look at it like this…

Pictures vs. text is like Node-RED vs. Python.

With Node-RED I don't really need to learn syntax and programming. With Python I do: variables, syntax, declarations…

And back at school I just didn't like reading :joy:, which is like not liking text :yum:

LOL, fair enough. Programming isn't everyone's forte, and I totally get that. But sometimes things are just easier with a few lines of code :wink: (which is how we even have projects like Node-RED and HA).

Thanks. I wouldn't initially have gone down the NR route, but the website I'm trying to scrape (https://my.flogas.co.uk) seems to use some kind of Ajax login form, which is beyond me :laughing:
Once I'm past that I can simply scrape the data I need from the page; the issue is logging in.
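An Ajax login form usually follows a recognisable pattern: the page's JavaScript sends the credentials to an endpoint (often as JSON), and the response either sets a session cookie or returns a token that later requests send back, typically as an `Authorization: Bearer <token>` header. The browser's developer tools (Network tab, filtered to XHR/Fetch) show that request. Below is a minimal sketch assuming a hypothetical JSON endpoint that returns `{"token": "<token>"}`; the endpoints, field names and token key are all placeholders and are not taken from the Flogas site.

```python
import json
import urllib.request

# Hypothetical endpoints and field names: copy the real ones from the
# XHR/Fetch request shown in the browser's developer tools.
LOGIN_API = "https://portal.example/api/login"
DATA_API = "https://portal.example/api/tank"


def build_json_login(email: str, password: str) -> urllib.request.Request:
    """Build the JSON POST that the page's JavaScript would normally send."""
    payload = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        LOGIN_API,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            # Some back ends only answer requests that look like XHR calls.
            "X-Requested-With": "XMLHttpRequest",
        },
    )


def token_from_response(body: bytes) -> str:
    """Pull the token out of the login response (the 'token' key is an assumption)."""
    return json.loads(body)["token"]


def build_authorised_get(url: str, token: str) -> urllib.request.Request:
    """Attach the token the way many Ajax front ends do: as a Bearer header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_with_login(email: str, password: str) -> dict:
    """Log in via the JSON endpoint, then fetch data with the returned token."""
    with urllib.request.urlopen(build_json_login(email, password)) as resp:
        token = token_from_response(resp.read())
    with urllib.request.urlopen(build_authorised_get(DATA_API, token)) as resp:
        return json.loads(resp.read())
```

If the endpoint sets a cookie instead of returning a token, keep the JSON POST but send it through a cookie-aware opener (`urllib.request.build_opener` with `HTTPCookieProcessor`) and reuse that opener for the data request.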

Unfortunately, after checking with frenck, it seems the NR add-on doesn't (and won't) include xvfb, so nbrowser is a no-go…

@swifty Unfortunately I couldn't get it to work, so I wrote a Python script that uses Selenium to scrape the data I was after. It works pretty well, and going this route gives you more options and control. Having said that, if nbrowser did work inside Node-RED, my use case would have been much easier to implement.
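Selenium's Python bindings (and webdriverio in Node-RED) drive the browser through the W3C WebDriver protocol, which is plain HTTP and JSON. To show what happens underneath, here is a minimal standard-library sketch that talks to a Selenium server, assumed to be a Selenium 4 container listening on `http://localhost:4444` (older Selenium 3 images expect a `/wd/hub` prefix). The login URL and CSS selectors in `read_value_after_login` are hypothetical; with the real Selenium bindings you would write `driver.find_element(By.CSS_SELECTOR, ...)` instead.

```python
import json
import urllib.request

# Fixed key the W3C WebDriver spec uses for element references.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


class WebDriver:
    """Tiny W3C WebDriver client covering only the calls used below."""

    def __init__(self, server: str = "http://localhost:4444") -> None:
        self.server = server.rstrip("/")
        self.session_id = None

    def _call(self, method: str, path: str, payload=None):
        data = None if payload is None else json.dumps(payload).encode("utf-8")
        req = urllib.request.Request(
            self.server + path, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["value"]

    def _session(self, suffix: str) -> str:
        return f"/session/{self.session_id}{suffix}"

    def start(self, browser: str = "chrome") -> None:
        caps = {"capabilities": {"alwaysMatch": {"browserName": browser}}}
        self.session_id = self._call("POST", "/session", caps)["sessionId"]

    def go(self, url: str) -> None:
        self._call("POST", self._session("/url"), {"url": url})

    def find(self, css: str) -> str:
        found = self._call("POST", self._session("/element"),
                           {"using": "css selector", "value": css})
        return found[ELEMENT_KEY]

    def type_into(self, css: str, text: str) -> None:
        self._call("POST", self._session(f"/element/{self.find(css)}/value"), {"text": text})

    def click(self, css: str) -> None:
        self._call("POST", self._session(f"/element/{self.find(css)}/click"), {})

    def text_of(self, css: str) -> str:
        return self._call("GET", self._session(f"/element/{self.find(css)}/text"))

    def quit(self) -> None:
        self._call("DELETE", self._session(""))


def read_value_after_login(login_url: str, user: str, password: str) -> str:
    """Hypothetical flow; every selector here is a placeholder for the real page."""
    driver = WebDriver()
    driver.start()
    try:
        driver.go(login_url)
        driver.type_into("#username", user)
        driver.type_into("#password", password)
        driver.click("button[type=submit]")
        # No waiting here: a real script should poll until the element appears
        # (the Selenium bindings provide WebDriverWait for this).
        return driver.text_of("#tank-level")
    finally:
        driver.quit()
```

Because the browser really runs the page's JavaScript, this approach copes with Ajax login forms without reverse-engineering the requests.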

I've just had to go the same route, but I used Node-RED to control a Selenium Docker container: https://flows.nodered.org/node/node-red-contrib-webdriverio

The flow then uses the usual Home Assistant nodes to create entities in HA for my LPG level.
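If the scraping runs as a standalone Python script rather than inside Node-RED, one way to get the value into HA without the Node-RED Home Assistant nodes is Home Assistant's REST API: a `POST` to `/api/states/<entity_id>` with a long-lived access token sets that entity's state. The host, token and `sensor.lpg_tank_level` entity id below are placeholders. A state set this way is not backed by an integration, so expect it to disappear after a Home Assistant restart until the script posts it again.

```python
import json
import urllib.request

# Placeholders: your HA address, a long-lived access token (created on the
# HA user profile page) and an entity id of your choosing.
HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "YOUR_LONG_LIVED_TOKEN"


def build_state_update(entity_id: str, state, attributes: dict) -> urllib.request.Request:
    """Build POST /api/states/<entity_id> for Home Assistant's REST API."""
    body = json.dumps({"state": str(state), "attributes": attributes}).encode("utf-8")
    return urllib.request.Request(
        f"{HA_URL}/api/states/{entity_id}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {HA_TOKEN}",
            "Content-Type": "application/json",
        },
    )


def push_state(entity_id: str, state, attributes: dict) -> int:
    """Send the update; HA answers 201 for a new entity and 200 for an existing one."""
    with urllib.request.urlopen(build_state_update(entity_id, state, attributes)) as resp:
        return resp.status
```

For example, `push_state("sensor.lpg_tank_level", 62, {"unit_of_measurement": "%", "friendly_name": "LPG tank"})` would create or update the sensor after each scrape.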

Interesting… I'll have to check out that node. Thanks for the link, @swifty.