Warning while using Scrape

templeton_nash · January 19, 2024, 2:29pm

I have the following in my log:

Logger: py.warnings
Source: /usr/local/lib/python3.11/warnings.py:109
First occurred: 02:01:26 (30 occurrences)
Last logged: 14:17:50

/usr/local/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
  warnings.warn(

It’s only a warning, but I’m always trying to clean up my log. I believe it comes from the scrape sensor that I use to get info from my printer, which is configured as follows:

scrape:
  - resource: http://192.168.0.169/DevMgmt/ProductUsageDyn.xml
    scan_interval: 60
    sensor:
    - name: "HP OfficeJet Pro 9020 Series Pages Printed"
      unique_id: hp_officejet_pro_9020_series_pages_printed
      icon: mdi:file-document
      select: 'dd\:TotalImpressions[PEID="5082"]'
      value_template: >-
        {% if value == "" %}
          0
        {% else %}
          {{ value }}
        {% endif %}
      unit_of_measurement: pages

Is there a way to stop this or should I stop worrying?

Troon · January 19, 2024, 2:52pm

Yeah, don’t use scrape, use rest. Here’s mine for my OfficeJet 8100 which works perfectly:

https://community.home-assistant.io/t/extracting-printer-data-from-complex-xml-with-rest-integration/
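The reason this helps is that the printer's document gets handled as XML instead of being pushed through an HTML parser. As a rough way to check what the counter looks like outside Home Assistant, here is a minimal Python sketch using only the standard library. The sample payload is an assumption: the namespace URIs, the element nesting, and the value 1234 are placeholders, not real printer output, and `pages_printed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of what ProductUsageDyn.xml might contain.
# Namespace URIs, nesting, and the value are placeholders, not real output.
SAMPLE_XML = """<pudyn:ProductUsageDyn xmlns:pudyn="http://example.invalid/pudyn"
                       xmlns:dd="http://example.invalid/dd">
  <pudyn:PrinterSubunit>
    <dd:TotalImpressions PEID="5082">1234</dd:TotalImpressions>
  </pudyn:PrinterSubunit>
</pudyn:ProductUsageDyn>
"""

# Prefix-to-URI map used by the search expression below (assumed URI).
NAMESPACES = {"dd": "http://example.invalid/dd"}


def pages_printed(xml_text: str) -> int:
    """Return the TotalImpressions counter with PEID 5082, or 0 if missing or empty."""
    root = ET.fromstring(xml_text)
    node = root.find(".//dd:TotalImpressions[@PEID='5082']", NAMESPACES)
    if node is None or not (node.text or "").strip():
        return 0  # same fallback as the value_template in the scrape config
    return int(node.text.strip())


print(pages_printed(SAMPLE_XML))
```

To try it against a real device, the sample string would be replaced with the body fetched from the printer, and the `dd` URI with whatever the actual document declares.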

templeton_nash · January 19, 2024, 3:22pm

Thanks @Troon, that fixed it!

WallyR · January 19, 2024, 3:48pm

Or use SNMP.

Troon · January 19, 2024, 3:53pm

For my printer at least, the SNMP values for remaining cartridge capacity were rounded down to the nearest 10%, and the value for one of the colours was unreliable, which is why I use the XML instead.

I realise that’s a shortcoming of the printer implementation, not the protocol.
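To show what that rounding does to the readings, here is a small Python sketch. `reported_level` is a hypothetical helper, and the step of 10 simply mirrors the behaviour described above; it is not taken from any SNMP specification.

```python
def reported_level(actual_percent: int, step: int = 10) -> int:
    """Round a percentage down to the nearest multiple of `step`."""
    return (actual_percent // step) * step


# Compare a few true levels with what a floor-rounding device would report.
for actual in (9, 19, 47, 100):
    print(actual, "->", reported_level(actual))
```

A cartridge that is really at 19% would show as 10%, and anything below 10% would show as empty.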

WallyR · January 19, 2024, 4:12pm

Yeah, that would make it rather unusable.