Hi, I am trying to get my head around the Scrape sensor. I am trying to pull the details of my oil tank from the supplier’s website (BoilerJuice in the UK). They provide a page where you can check your oil level, and I would like to get this into HA.
Here is the code from their site. I am trying to extract the part that states, in this example, ‘(693 litres)’, or the ‘hidden percentage’ value of ‘63’. Can anyone help me, as I am racking my brain trying to understand what to search for? Thanks
I think it would just be select: "p.hidden, p.percentage". Alternatively, I think you could also do something like select: "div.text > p.percentage". The second one might be a better choice, as the > combinator requires the <p> tag to be a direct child of the <div> tag (the first version, with the comma, matches any <p> that has either class).
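For reference, a minimal Scrape sensor entry using the second selector might look like the sketch below. The resource URL and sensor name are placeholders, not BoilerJuice’s real URL, so adjust them to the actual page:

```yaml
# Sketch of a Scrape sensor entry -- the resource URL below is a
# placeholder; point it at the page that contains the percentage value.
sensor:
  - platform: scrape
    name: Oil Tank Percentage
    resource: https://www.boilerjuice.com/your-tank-page
    select: "div.text > p.percentage"
    unit_of_measurement: "%"
```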
Another idea would be to use NodeRed with the Cheerio node and scrape it that way. I’ve had a lot of success with Cheerio over BeautifulSoup.
What were you using in the select: before? I’m thinking it’s not finding the div.text tag and that’s why you’re getting the error. Can you try the first example I sent?
Ok, that sounds like a different problem. Can you try changing select: to select: "h3"? That should grab the <h3>Oil Remaining</h3> tag and return the text.
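Assuming the rest of your configuration stays the same, the diagnostic change is just:

```yaml
# Temporary diagnostic selector: if the sensor comes back with
# "Oil Remaining", the page is loading and the selector was the problem;
# if it still errors, suspect authentication instead.
select: "h3"
```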
I have a feeling though that this might be an authentication issue rather than a select issue. But, the above test should reveal that.
Yup, I was just testing that. Rather than returning a 403 (Forbidden) error, the site is redirecting to a login page, which means that the Scrape sensor isn’t sending the credentials properly. You could try it with the authentication: property set to either basic or digest, but neither of those may work depending on how BoilerJuice has set up their authentication. Another idea is something like what @Valentino_Stillhardt suggested, but I would go for a curl route. Curl has a lot more authentication features that would allow you to tailor the request to the site.
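A sketch of what that would look like (the resource URL and credentials are placeholders, and whether basic or digest actually works depends entirely on how BoilerJuice’s login is built):

```yaml
sensor:
  - platform: scrape
    name: Oil Tank Percentage
    resource: https://www.boilerjuice.com/your-tank-page  # placeholder URL
    select: "p.percentage"
    authentication: basic       # try digest if basic is rejected
    username: your_login_email  # placeholder credentials
    password: !secret boilerjuice_password
```

If the site uses a form-based login rather than HTTP auth, neither option will help, which is why the curl route may be worth exploring.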
[EDIT]: Also, take a look at the headers that are sent when you open the oil level page in a browser. You may need to change the user agent, as a lot of sites check the User-Agent header and block unknown ones. You would do that in the headers: configuration of the Scrape sensor.
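Something like this, for example (the User-Agent string below is just a sample browser string; copy the exact one your own browser sends):

```yaml
sensor:
  - platform: scrape
    resource: https://www.boilerjuice.com/your-tank-page  # placeholder URL
    select: "p.percentage"
    headers:
      # Sample browser User-Agent -- replace with the one from your browser
      User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
```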
Grab the HTML from the URL you posted and paste it into XPather. Then, start writing your query at the top of the screen and you should be able to come up with a query for the select: property.
Don’t worry about the ID number; it’s a demo mode for testing.
I’ve tried XPather, but I don’t know if I’ve done it right. Is this how my YAML code should look?