Could someone help me with a scrape? I am still learning this and getting a bit stuck.
I used Node-RED and an http request h3 and I get the following, but I am just not able to get the dates.
Any help anyone could give would be great.
Hi
I live in Wakefield too. I don’t suppose you would share your work, please?
thanks
Martyn
I am on Ubuntu 18.04 server and have Node-RED running successfully.
Does anyone know how to install pup, as it won’t install with sudo apt-get install pup?
I live in Wakefield and have seen that one user on here also scrapes the council website, so if someone could give me some guidance, that would be appreciated.
thanks in advance
martyn
Wow, what a wealth of information here. I’ve managed to get so far but have not been able to extract any useful information on dates etc. yet. I was wondering if someone could help point me in the right direction.
My council site is https://apps.castlepoint.gov.uk/cpapps/index.cfm?fa=wastecalendar
Using Chrome’s F12 tools I’ve found that my road ID is 2757, and this URL takes us to its calendar, but I’ve not got any further: https://apps.castlepoint.gov.uk/cpapps/index.cfm?roadID=2757&fa=wastecalendar.displayDetails
Would be grateful for some advice.
That’s as far as I got. The scraping is the big bit.
What OS are you using, and have you tried what the original poster did to install pup?
Here’s scrape sensors to return the date for pink and grey days:
First “pink” day on page (don’t need an index for the first one, returns “10”):
- platform: scrape
  name: First Pink
  resource: https://apps.castlepoint.gov.uk/cpapps/index.cfm?roadID=2757&fa=wastecalendar.displayDetails
  select: ".pink"
Second “pink” day on page (indexes start at 0, returns “24”):
- platform: scrape
  name: Second Pink
  resource: https://apps.castlepoint.gov.uk/cpapps/index.cfm?roadID=2757&fa=wastecalendar.displayDetails
  select: ".pink"
  index: 1
And you can do the same for the grey days, but the class is .normal in this case (returns “3”).
- platform: scrape
  name: First Grey
  resource: https://apps.castlepoint.gov.uk/cpapps/index.cfm?roadID=2757&fa=wastecalendar.displayDetails
  select: ".normal"
The month header can be grabbed with the following (months are the second and third “h2” on the page, returns “June 2019”)
- platform: scrape
  name: First Month
  resource: https://apps.castlepoint.gov.uk/cpapps/index.cfm?roadID=2757&fa=wastecalendar.displayDetails
  select: "h2"
  index: 1
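The sensors above return the day ("10") and the month header ("June 2019") as separate strings. As a rough sketch of how the two could be joined into a real date outside Home Assistant, here is a small Python helper. The function name combine is made up for illustration, and it assumes the first pink day belongs to the month under the first month header, which the page layout may or may not guarantee.

```python
from datetime import date, datetime


def combine(day_text: str, month_text: str) -> date:
    """Join a scraped day ("10") and month header ("June 2019") into a date."""
    # %B expects the full English month name (assumes the default C/English locale).
    joined = f"{day_text.strip()} {month_text.strip()}"
    return datetime.strptime(joined, "%d %B %Y").date()


# Values taken from the sensor results quoted in this thread.
next_pink = combine("10", "June 2019")
print(next_pink.isoformat())  # 2019-06-10
```

Single-digit days such as "3" are accepted by %d as well, so the grey-day result can be passed in the same way.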
“.pink” is the CSS class used for the pink dates, “.normal” for the grey, and “h2” is used for the month headers. Then you just use index to select which occurrence of the element you’re looking for; index numbers start at zero and go up from there.
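To make the select plus index idea concrete, here is a minimal Python sketch that collects every match in page order and then picks one by its zero-based index. It is not how the scrape sensor is implemented; the sample markup, the Collector class and the select function are all hypothetical, and it only understands a bare tag name or a single class rather than full CSS selectors.

```python
from html.parser import HTMLParser

# Hypothetical markup, loosely modelled on the calendar described above.
SAMPLE_HTML = """
<h2>Waste calendar</h2>
<h2>June 2019</h2>
<table><tr>
  <td class="normal">3</td><td class="pink">10</td>
  <td class="normal">17</td><td class="pink">24</td>
</tr></table>
<h2>July 2019</h2>
"""


class Collector(HTMLParser):
    """Collect the text of every element matching a tag name or a class."""

    def __init__(self, tag=None, css_class=None):
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self.matches = []   # text of each matching element, in page order
        self._open = None   # tag name of the match currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._open is None and (tag == self.tag or self.css_class in classes):
            self._open = tag
            self.matches.append("")

    def handle_data(self, data):
        if self._open is not None:
            self.matches[-1] += data

    def handle_endtag(self, tag):
        # Simplification: a nested element with the same tag would end the match early.
        if tag == self._open:
            self._open = None


def select(html, index=0, tag=None, css_class=None):
    """Return the stripped text of the index-th match (index starts at 0)."""
    parser = Collector(tag=tag, css_class=css_class)
    parser.feed(html)
    return parser.matches[index].strip()


print(select(SAMPLE_HTML, css_class="pink"))           # 10
print(select(SAMPLE_HTML, css_class="pink", index=1))  # 24
print(select(SAMPLE_HTML, css_class="normal"))         # 3
print(select(SAMPLE_HTML, tag="h2", index=1))          # June 2019
```

Leaving index out is the same as index 0, which is why the first pink and first grey sensors do not need one.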
Hi
https://www.wakefield.gov.uk/site/Where-I-Live-Results?uprn=63121996
This is not my exact address, just a random one, but how would I scrape that, please?
There isn’t a calendar on the page, just text.
Thanks in advance
Martyn
So here are instructions on how to accomplish this with a scrape sensor, though I should say that I was unable to get a working sensor for it, as the Wakefield site takes far too long to load and times out.
select: field of a scrape sensor. For the Wakefield site, doing this on the date under “last collection” of “household waste” gives us:
#ctl00_PlaceHolderMain_Waste_output > div:nth-child(2) > div:nth-child(2) > div:nth-child(2)
Your sensor would look like this:
- platform: scrape
  resource: https://www.wakefield.gov.uk/site/Where-I-Live-Results?uprn=63121996
  select: "#ctl00_PlaceHolderMain_Waste_output > div:nth-child(2) > div:nth-child(2) > div:nth-child(2)"
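For anyone wondering how to read that selector: each "> div:nth-child(2)" step means "go to the direct child that is the second child of its parent, provided it is a div". Here is a small Python sketch of that walk. The markup is invented and heavily simplified (the real Wakefield page is not reproduced here), and nth_child is a made-up helper, so treat it as an illustration of the selector's meaning rather than a working scraper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; only the nesting pattern matters.
SAMPLE = """
<div id="ctl00_PlaceHolderMain_Waste_output">
  <h3>Household waste</h3>
  <div>
    <div>Last collection</div>
    <div>
      <div>label</div>
      <div>example date</div>
    </div>
  </div>
</div>
"""


def nth_child(parent, tag, n):
    """Emulate `parent > tag:nth-child(n)`: the n-th child (1-based) if it is a <tag>."""
    children = list(parent)
    if len(children) >= n and children[n - 1].tag == tag:
        return children[n - 1]
    return None


# The root here stands in for the element with id="ctl00_PlaceHolderMain_Waste_output".
node = ET.fromstring(SAMPLE.strip())
for _ in range(3):  # three "> div:nth-child(2)" steps
    node = nth_child(node, "div", 2)

print(node.text)  # example date
```

Note that nth-child counts all children, not just divs, which is why the h3 in the sample still counts as the first child.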
That’s amazing. It looks so easy, but I just couldn’t work it out. Thank you, mayker, for your time on this.
Is there a way to create and test scrapes without writing them in YAML and restarting Hassio each time?
You could scrape the data, set it to a variable in the template editor, and play with the template until you get the correct data; then make the sensor and restart. I do that all the time to check I am trying the right options.
Unfortunately not. There are browser plugins to help find the correct selectors, though. Search for a Google Chrome CSS selector extension.
Thanks David and Robbrad, I will look into both methods.
Thank you for explaining. That is possibly why I wasn’t getting anything, but I will keep trying lol
Martyn
Hi,
Sorry for the delay in replying. If you use Node-RED, the following flow will scrape the information and split it into Garden, Recycling, and General waste. Those are then sent via MQTT (which I “sense” in HA for display).
You will need to change the http request for the one you obtain when you enter your postcode, but it appears to be working for me.
[{"id":"4737b518.c5bcec","type":"inject","z":"3e0d8381.5dd29c","name":"","topic":"","payload":"","payloadType":"date","repeat":"7200","crontab":"","once":false,"onceDelay":0.1,"x":210,"y":100,"wires":[["686629f7.10a8d8"]]},{"id":"240a841d.08d0dc","type":"debug","z":"3e0d8381.5dd29c","name":"","active":false,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","x":570,"y":240,"wires":[]},{"id":"686629f7.10a8d8","type":"http request","z":"3e0d8381.5dd29c","name":"","method":"GET","ret":"txt","url":"","tls":"","x":350,"y":200,"wires":[["240a841d.08d0dc","f489f1db.0789d8","9109ca9c.5945c"]]},{"id":"f489f1db.0789d8","type":"html","z":"3e0d8381.5dd29c","name":"","property":"payload","outproperty":"payload","tag":"table[class=\"mb10 wilWasteContent gardenFutureData\"]","ret":"html","as":"single","x":390,"y":280,"wires":[["8836ed72.79a688"]]},{"id":"8836ed72.79a688","type":"debug","z":"3e0d8381.5dd29c","name":"","active":false,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","x":630,"y":340,"wires":[]},{"id":"9109ca9c.5945c","type":"html","z":"3e0d8381.5dd29c","name":"","property":"payload","outproperty":"payload","tag":"div[class=\"mb10 ind-waste-wrapper\"]","ret":"html","as":"single","x":370,"y":460,"wires":[["c19b6f46.cd128","e7255980.dc6a08","91d1edf5.05ad3","ae205d9f.5ab878"]]},{"id":"c19b6f46.cd128","type":"debug","z":"3e0d8381.5dd29c","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","x":710,"y":500,"wires":[]},{"id":"e7255980.dc6a08","type":"html","z":"3e0d8381.5dd29c","name":"","property":"payload[2]","outproperty":"payload","tag":"div","ret":"html","as":"single","x":810,"y":440,"wires":[["b8414891.1b716","ad2e3ccd.e71bf"]]},{"id":"b8414891.1b716","type":"debug","z":"3e0d8381.5dd29c","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload[6]","x":1000,"y":440,"wires":[]},{"id":"91d1edf5.05ad3","type":"html","z":"3e0d8381.5dd29c","name":"","property":"payload[1]","outproperty":"payload","tag":"div","ret":"html","as":"single","x":810,"y":380,"wires":[["4503744c.71f94c","54055bd3.a99f34"]]},{"id":"4503744c.71f94c","type":"debug","z":"3e0d8381.5dd29c","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload[6]","x":1000,"y":380,"wires":[]},{"id":"ae205d9f.5ab878","type":"html","z":"3e0d8381.5dd29c","name":"","property":"payload[0]","outproperty":"payload","tag":"div","ret":"html","as":"single","x":810,"y":320,"wires":[["26ffccf.40081b4","1bb9846e.98a20c"]]},{"id":"26ffccf.40081b4","type":"debug","z":"3e0d8381.5dd29c","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload[6]","x":1000,"y":320,"wires":[]},{"id":"87378101.fd7128","type":"mqtt out","z":"3e0d8381.5dd29c","name":"","topic":"recycling","qos":"2","retain":"true","broker":"db107b84.66e6","x":1180,"y":420,"wires":[]},{"id":"1cfdde38.c152e2","type":"mqtt out","z":"3e0d8381.5dd29c","name":"","topic":"garden","qos":"2","retain":"true","broker":"db107b84.66e6","x":1180,"y":480,"wires":[]},{"id":"1c5197ad.e595e","type":"mqtt out","z":"3e0d8381.5dd29c","name":"","topic":"waste","qos":"2","retain":"true","broker":"db107b84.66e6","x":1170,"y":360,"wires":[]},{"id":"1bb9846e.98a20c","type":"function","z":"3e0d8381.5dd29c","name":"","func":"msg.payload = msg.payload[6];\nreturn msg;","outputs":1,"noerr":0,"x":1010,"y":360,"wires":[["1c5197ad.e595e"]]},{"id":"54055bd3.a99f34","type":"function","z":"3e0d8381.5dd29c","name":"","func":"msg.payload = msg.payload[6];\nreturn msg;","outputs":1,"noerr":0,"x":1010,"y":420,"wires":[["87378101.fd7128"]]},{"id":"ad2e3ccd.e71bf","type":"function","z":"3e0d8381.5dd29c","name":"","func":"msg.payload = msg.payload[6];\nreturn msg;","outputs":1,"noerr":0,"x":1010,"y":480,"wires":[["1cfdde38.c152e2"]]},{"id":"db107b84.66e6","type":"mqtt-broker","z":"","name":"","broker":"localhost","port":"1883","clientid":"","usetls":false,"compatmode":true,"keepalive":"60","cleansession":true,"birthTopic":"","birthQos":"0","birthPayload":"","closeTopic":"","closeQos":"0","closePayload":"","willTopic":"","willQos":"0","willPayload":""}]
I’ve run ‘go get github.com/ericchiang/pup’ on my Ubuntu distribution and it has installed the ‘pup’ file in the /go/bin folder. Is there anything else I need to do? It keeps coming up with:
Command ‘pup’ not found, but there are 17 similar ones.
Does running /go/bin/pup work?
/go/bin/pup didn’t, but /root/go/bin/pup seems to do something: there is just a cursor waiting for something to be entered.
This is what I get until I press Ctrl+C to stop.
Sorry to be so vague but I’ve searched everywhere I could think of and couldn’t get any information hence posting here.
Node-RED may be the solution for some, but I could not copy that code into my Node-RED. I got a couple of errors, first with the " not being correct and then with mb10.
Not sure if it’s just me, but I could not import it via the clipboard to even have a look.
I just tried copying it in myself and can confirm I couldn’t copy it into my work machine’s Node-RED.
I’ll have a look into it when I get home tonight.
Sorry about that.