[GUIDE] Scraping dynamic websites with browserless + multiscrape. v2 update

Do you have an ARM CPU? Apparently browserless does not release ARM Docker images very often, so that image gets updated only rarely. I do not understand whether that is an oversight or whether there is a fundamental problem with their ARM implementation.

My HA is running on an ARM chip, so I have the same problem. I ended up running the browserless Docker container on another (Intel) machine and having HA make its requests to that machine instead.
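In case it helps, here is roughly how that can look. This is only a sketch, assuming the public browserless/chrome image and its default port 3000; the host IP, token value and session limit are placeholders to adapt to your own setup.

# docker-compose.yml on the separate (Intel/amd64) machine
services:
  browserless:
    image: browserless/chrome:latest    # amd64 image, so no ARM build is needed on this host
    restart: unless-stopped
    ports:
      - "3000:3000"                     # HA then points its requests at http://<intel-machine-ip>:3000
    environment:
      - TOKEN=my-secret-token           # placeholder; send the same token with requests from HA
      - MAX_CONCURRENT_SESSIONS=2       # keep resource usage modest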

The changelog for the Browserless Chrome add-on now says not to update and to “switch to chromium” without any further explanation of what that might mean. Does anyone know what’s going on?

I asked whether alexbelgium could look at the ARM image.

:bug: [BROWSERLESS-CHROME] Add on on ARM stuck on old version 2.2.0-5 · Issue #1375 · alexbelgium/hassio-addons (github.com)

You can update at your own risk.

Hi, I did it exactly according to the description. The HTML file is created, but it is empty. Is that right? On the browserless.io site the script works great. Do I just copy the code from the script and save it as my_scraper.js? Thanks

Not sure it’s the same issue, but for me v2.8 of the browserless-chromium add-on worked flawlessly, while the newer 2.11 (I think) seems to return empty HTML.

I have a feeling that version v2 of browserless is not super stable; it’s a bit of a work in progress, but I have not had time to investigate the problem yet.

I get this error when I start the shell_command:

stdout: ""
stderr: >-
  ./scripts/browserless_scraper.sh: line 2: /config/www/browserless/: Is a
  directory
returncode: 1

I think /config/www/browserless/ should be a directory. What’s wrong here?

EDIT:
I forgot to add a payload to the shell_command.

With:

action: shell_command.browserless_scraper
data:
  function: "/cookidoo-scrape.js"
  output: "output.html"

everything works as desired.
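For reference, the shell_command side can be wired up roughly like this in configuration.yaml. This is only a sketch: the script path and argument order are assumptions based on the error output above, not the guide's exact wording, so adapt them to wherever your browserless_scraper.sh actually lives.

# Hypothetical configuration.yaml entry; the data fields from the action above
# are passed into the script as templated arguments.
shell_command:
  browserless_scraper: >-
    bash /config/scripts/browserless_scraper.sh "{{ function }}" "{{ output }}"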