Hi All,
I’m relatively new to multiscrape, so I have been searching for examples to build on. I’ve been able to scrape data from simple sites, but I now need some data from a marine weather website.
The basic HTML scraping isn’t working: the log_output shows that the HTML returned obfuscates the actual data I want, so it looks like the data is only going to be available by calling their endpoint: GET https://www.aucklandcoastguard.org.nz/api/forecast/weather
I’m at a loss as to where to begin though - without a huge number of examples of GET/POST methods within the community forums (or at least ones that I understand) I thought I’d try here.
My main questions are:
- What is the format required to call a GET endpoint like this (see headers below)?
- Do I need to split this into two parts, one to get a cookie and one to call the endpoint with said cookie, or does this all happen within multiscrape? (Note: no login is required.)
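To make the two questions concrete, here is a rough Python sketch (standard library only) of the request I'm trying to reproduce. This is not multiscrape config. The header values and the SITE_URL used for the optional cookie step are my own guesses, and I haven't confirmed whether the endpoint needs a cookie at all.

```python
import json
import urllib.request
from http.cookiejar import CookieJar

API_URL = "https://www.aucklandcoastguard.org.nz/api/forecast/weather"
# Assumption: the page that would set a cookie, if one is needed at all.
SITE_URL = "https://www.aucklandcoastguard.org.nz/"


def build_request(url: str) -> urllib.request.Request:
    """Build a plain GET request. The headers are guesses, not documented requirements."""
    return urllib.request.Request(
        url,
        headers={"Accept": "application/json", "User-Agent": "Mozilla/5.0"},
        method="GET",
    )


def fetch_forecast(with_cookie_step: bool = False) -> dict:
    """Call the endpoint, optionally visiting the site first to collect cookies."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    if with_cookie_step:
        # Part one: load the site so any cookie it sets is stored in the jar.
        opener.open(build_request(SITE_URL), timeout=30).close()
    # Part two: call the endpoint; stored cookies are sent automatically.
    with opener.open(build_request(API_URL), timeout=30) as response:
        return json.load(response)
```

Calling fetch_forecast() and then fetch_forecast(with_cookie_step=True) would show whether the cookie step makes any difference to the response.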
Headers/Endpoint info:
The values I’m trying to retrieve from the JSON response are marked with a * at the start of the line below. I’m relatively confident there’s enough information out there to do this part of the config myself, so I’m really just putting this here for context.
{
  "forecastAtTimes": [
    {
      "location": [
        {
          "id": "Abel",
          ...more data...
        }
      ],
      "at": 0.0,
      "from": 202202031246.0,
      "to": 202202050000.0
    },
    ... more locations...
    {
      "location": [
        {
          "id": "Auckland",
          "area": null,
          "warning": null,
          "forecast": null,
          "outlook": null,
          "swell": null,
*         "situation": "A couple of slow-moving, moisture-lade....",
          "tideHighData": null,
          "tideLowData": null
        }
      ],
      "at": 202202031108.0,
      "from": 0.0,
      "to": 0.0
    },
    {
      "location": [
        {
          "id": "Auckland",
          "area": "Manukau and Waitemata Harbours, Hauraki Gulf and Bream Head to Cape Colville.",
          "warning": "Nil",
*         "forecast": "For Manukau Harbour: Thursday: S...",
          "outlook": "See above",
          "swell": "See above",
          "situation": null,
          "tideHighData": null,
          "tideLowData": null
        }
      ],
      "at": 0.0,
      "from": 202202031108.0,
      "to": 202202050000.0
    }
  ]
}
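To show what I mean by the starred values, here is a small Python sketch of the lookup I'm after. The find_field helper is a name I made up for illustration, and the sample data is just a cut-down copy of the response above with the truncated strings left as they are. The "Auckland" id appears in more than one entry, so the helper returns the first non-null value it finds for the requested field.

```python
from typing import Any, Optional


def find_field(payload: dict, location_id: str, field: str) -> Optional[Any]:
    """Return the first non-null `field` for the given location id, or None."""
    for entry in payload.get("forecastAtTimes", []):
        for location in entry.get("location", []):
            if location.get("id") == location_id and location.get(field) is not None:
                return location[field]
    return None


# Cut-down copy of the response above (strings truncated as in the original).
sample = {
    "forecastAtTimes": [
        {"location": [{"id": "Abel"}], "at": 0.0},
        {
            "location": [
                {
                    "id": "Auckland",
                    "forecast": None,
                    "situation": "A couple of slow-moving, moisture-lade....",
                }
            ],
            "at": 202202031108.0,
        },
        {
            "location": [
                {
                    "id": "Auckland",
                    "forecast": "For Manukau Harbour: Thursday: S...",
                    "situation": None,
                }
            ],
            "at": 0.0,
        },
    ]
}

situation = find_field(sample, "Auckland", "situation")
forecast = find_field(sample, "Auckland", "forecast")
```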
Thanks in advance