Extract information from an html address (xml page)

Hello all,

I would like to read information from an old automation card that controls four relays and reads the status of four doors. I have not found any existing HA integration for it (it uses a chip from the Microchip 18F family), so I think it is easier to read this information from the card's web interface, at
http://ip-address/status.xml, which gives me this result:

<response>
<led0>0</led0>
<led1>0</led1>
<led2>0</led2>
<led3>0</led3>
<btn0>dn</btn0>
<btn1>up</btn1>
<btn2>up</btn2>
<btn3>up</btn3>
<dbgS>-</dbgS>
<dbgC>-</dbgC>
</response>

The value that interests me is the one for btn0 (dn or up), but I really don’t know whether it is possible to extract it and turn it into a value inside HA. Could you help me do it?
Thank you.

use a command line sensor to read the webpage data then use a regex to extract the data you want.

you just have to set the scan interval correctly to catch the updated data.

here is an example of a sensor based on a sensor I use to extract my camera motion sensor status (http address and regex modified for yours) that updates the sensor every 3 seconds:

sensor:
  - platform: command_line  
    command: curl -k --silent "http://ip-address/status.xml"
    name: "Camera Motion"
    value_template: "{{ value | regex_findall_index('<btn0> (\S+) </btn0>')}}"
    scan_interval: 3

Hi @finity ,
thank you for your suggestion. I spent half a day just solving the first error reported by HA:

while scanning a double-quoted scalar in “/config/configuration.yaml”, line 39, column 21 found unknown escape character ‘S’ in “/config/configuration.yaml”, line 39, column 62

In practice it was enough to swap the single quotes and the double quotes, and the problem was solved:

value_template: '{{ value | regex_findall_index("<btn0>(\S+)</btn0>")}}'

Now I just can’t get the string I need (‘dn’ or ‘up’); the sensor always returns “unknown”. I also could not find a clear explanation of regex, so my trial-and-error attempts got nowhere. If you can fix the template or point me to a site with clear explanations of regex, I would be grateful.

I know finity pointed you towards command line, but the rest integration makes this simple.

rest:
  - resource: http://ip-address/status.xml
    sensor:
      - name: X
        value_template: "{{ value_json.response.led0 }}"
      - name: Y
        value_template: "{{ value_json.response.led1 }}"
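
As a rough illustration of what a path like value_json.response.led0 refers to, here is a Python sketch that turns the status.xml output from the first post into a nested dictionary. It uses only the standard library and is not the conversion code Home Assistant itself runs; the xml_to_nested_dict helper is a made-up name for this example.

```python
import xml.etree.ElementTree as ET

# Copy of the status.xml output shown in the first post.
STATUS_XML = """<response>
<led0>0</led0>
<led1>0</led1>
<led2>0</led2>
<led3>0</led3>
<btn0>dn</btn0>
<btn1>up</btn1>
<btn2>up</btn2>
<btn3>up</btn3>
<dbgS>-</dbgS>
<dbgC>-</dbgC>
</response>"""


def xml_to_nested_dict(xml_text: str) -> dict:
    """Hypothetical helper: map a flat XML document to {root: {child: text}}."""
    root = ET.fromstring(xml_text)
    return {root.tag: {child.tag: child.text for child in root}}


data = xml_to_nested_dict(STATUS_XML)

# Same lookup path as value_json.response.btn0 in a template.
print(data["response"]["btn0"])
print(data["response"]["led0"])
```

Under that assumption, the door state asked about in the first post would sit at response.btn0, next to the led0 and led1 values used in the config above.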

you missed one important thing in the template.

you have this:

value_template: '{{ value | regex_findall_index("<btn0>(\S+)</btn0>")}}'

I wrote this:

value_template: "{{ value | regex_findall_index('<btn0> (\S+) </btn0>')}}"

notice the spaces before and after (\S+)

regex is kind of tricky.

there is some explanation in the HA docs but there is also a site you can use to test the regex patterns called “regex101”. I would post a link but my google is acting up right now.
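
To see how the spaces affect matching, here is a small Python sketch. It assumes that regex_findall_index behaves like Python's re.findall followed by an index lookup, and it uses a shortened copy of the status.xml output from the first post, which shows no spaces between the tags and the value. The first_match helper and the tolerant third pattern are illustrative additions, not something from the thread.

```python
import re

# Shortened copy of the status.xml output as posted (no spaces inside the tags).
STATUS_XML = "<response>\n<led0>0</led0>\n<btn0>dn</btn0>\n<btn1>up</btn1>\n</response>"

WITH_SPACES = r"<btn0> (\S+) </btn0>"        # expects "<btn0> dn </btn0>"
WITHOUT_SPACES = r"<btn0>(\S+)</btn0>"       # expects "<btn0>dn</btn0>"
TOLERANT = r"<btn0>\s*(\S+?)\s*</btn0>"      # accepts either layout


def first_match(pattern: str, text: str):
    """Hypothetical helper: first captured group, or None if nothing matches."""
    matches = re.findall(pattern, text)
    return matches[0] if matches else None


print(first_match(WITH_SPACES, STATUS_XML))      # None
print(first_match(WITHOUT_SPACES, STATUS_XML))   # dn
print(first_match(TOLERANT, STATUS_XML))         # dn
print(first_match(TOLERANT, "<btn0> up </btn0>"))  # up
```

If the device's real output does have spaces around the value, the spaced pattern would be the one that matches; the \s* version is meant to work either way.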

FYI restful reads json or xml, much easier than commandline and regex.

may be true (I don’t use either of them very often so no real extensive experience to know definitively either way for my uses) but it is another tool in the tool box.

I figured I would give your method a go…but I’m not getting a valid state.

Here is the sensor:

sensor:
  - platform: rest
    resource: http://192.168.1.58:88/cgi-bin/CGIProxy.fcgi?cmd=getDevState&usr=<my_user>&pwd=<my_pass>
    name: "Kitchen Camera Motion REST"
    value_template: "{{ value_json.CGI_Result.motionDetectAlarm }}"
    scan_interval: 10

here is the result of the rest call in a browser:

<CGI_Result>
    <result>0</result>
    <IOAlarm>0</IOAlarm>
    <motionDetectAlarm>1</motionDetectAlarm>
    <soundAlarm>0</soundAlarm>
    <record>0</record>
    <sdState>0</sdState>
    <sdFreeSpace>0k</sdFreeSpace>
    <sdTotalSpace>0k</sdTotalSpace>
    <ntpState>1</ntpState>
    <ddnsState>0</ddnsState>
    <url></url>
    <upnpState>0</upnpState>
    <isWifiConnected>0</isWifiConnected>
    <wifiConnectedAP></wifiConnectedAP>
    <infraLedState>0</infraLedState>
    <humanDetectAlarmState>0</humanDetectAlarmState>
    <sdFormatError>0</sdFormatError>
</CGI_Result>

but the state of the sensor is “unknown”.

I tried sending the XML through an XML-to-JSON converter, and that result works fine in dev tools, so the template structure should be valid. I think.

this is the error I’m getting in the log:

2022-01-17 13:17:08 ERROR (MainThread) [homeassistant.helpers.template] Template variable error: 'value_json' is undefined when rendering '{{ value_json.converted.CGI_Result.motionDetectAlarm }}'

Any suggestions?

Seems straightforward but I’m missing something I guess.

I don’t see anything out of the ordinary, other than the template error doesn’t match your configured value_template field. {{ value_json.converted.CGI_Result.motionDetectAlarm }} vs {{ value_json.CGI_Result.motionDetectAlarm }}

try outputting the first 255 characters to see if it’s getting a response

{{ value[:255] }}

Oh, oops that was a mis-copied error from another try in the dev tools editor.

here is the real one (same but without the “converted”):

2022-01-17 17:13:42 ERROR (MainThread) [homeassistant.helpers.template] Template variable error: 'value_json' is undefined when rendering '{{ value_json.CGI_Result.motionDetectAlarm }}'

yes it is:

You might have to turn on debug. It should be turning it into json unless it’s incorrectly formatted XML, but I don’t see that in what you posted.

I corrected it, but it always gives me “unknown”.

rest:
  - resource: http://ip-address/status.xml
    sensor:
      - name: X
        value_template: "{{ value_json.response.led0 }}"
      - name: Y
        value_template: "{{ value_json.response.led1 }}"

works well. Thank you all

what’s the magic sauce to turn on debugging for templates?

I used this based on the error wording but it didn’t work:

logger:
  default: warning 
  logs:
    homeassistant.helpers.template: debug

Templates are helpers, so I would assume it’s

homeassistant.helpers.template

However, in your case, I would turn on debug for rest

logger:
  default: warning 
  logs:
    homeassistant.components.rest: debug

yeah, me too and that’s what I used and it didn’t work.

Did that.

Here is the debug entry for that plus the error for the template:

2022-01-18 14:45:04 DEBUG (MainThread) [homeassistant.components.rest.sensor] Data fetched from resource: <CGI_Result>
    <result>0</result>
    <IOAlarm>0</IOAlarm>
    <motionDetectAlarm>1</motionDetectAlarm>
    <soundAlarm>0</soundAlarm>
    <record>0</record>
    <sdState>0</sdState>
    <sdFreeSpace>0k</sdFreeSpace>
    <sdTotalSpace>0k</sdTotalSpace>
    <ntpState>1</ntpState>
    <ddnsState>0</ddnsState>
    <url></url>
    <upnpState>0</upnpState>
    <isWifiConnected>0</isWifiConnected>
    <wifiConnectedAP></wifiConnectedAP>
    <infraLedState>0</infraLedState>
    <humanDetectAlarmState>0</humanDetectAlarmState>
    <sdFormatError>0</sdFormatError>
</CGI_Result>

2022-01-18 14:45:04 ERROR (MainThread) [homeassistant.helpers.template] Template variable error: 'value_json' is undefined when rendering '{{ value_json.CGI_Result.motionDetectAlarm }}'

if I treat the response as a plain string and use the template below, it extracts the value fine and the result is ‘1’ as expected (a better template is probably possible, but I whipped this up quickly as a proof of concept):

{{ value.split('>')[6].split('<')[0] }}
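
The same split can be tried outside HA. The Python sketch below runs it on a shortened copy of the camera response from this thread and, for comparison, shows a lookup by tag name that does not depend on the element's position. The by_position and by_tag helpers are hypothetical names for this example.

```python
import re

# Shortened copy of the camera response posted above.
CAMERA_XML = """<CGI_Result>
    <result>0</result>
    <IOAlarm>0</IOAlarm>
    <motionDetectAlarm>1</motionDetectAlarm>
    <soundAlarm>0</soundAlarm>
</CGI_Result>"""


def by_position(text: str) -> str:
    """Same logic as the template: 7th chunk after splitting on '>', up to the next '<'."""
    return text.split(">")[6].split("<")[0]


def by_tag(text: str, tag: str):
    """Hypothetical alternative: find the text between <tag> and </tag>."""
    name = re.escape(tag)
    match = re.search(rf"<{name}>(.*?)</{name}>", text)
    return match.group(1) if match else None


print(by_position(CAMERA_XML))                 # 1
print(by_tag(CAMERA_XML, "motionDetectAlarm"))  # 1
print(by_tag(CAMERA_XML, "soundAlarm"))         # 0
```

The positional version would break if the camera ever added or reordered elements before motionDetectAlarm, which is the main reason to prefer a lookup by name.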

it doesn’t seem to be an issue with the rest sensor fetching the data per se; rather, the code that converts the XML to JSON seems to be failing for some reason.

As I said I put the result into an xml to json converter and it didn’t complain so it has to be valid xml.

bug?

:man_shrugging:

What device is it?

it’s a security camera.

I don’t think it’s a bug in the device, tho. it works fine as command_line sensor and the xml result is OK.

I think it’s a bug in the rest sensor conversion to json? I can’t explain it another way.

Maybe I’ll try using the rest integration to see if it works there.

I’m thinking this may be a hidden character deal tbh

wouldn’t that be caught by the online xml to json converter too and cause it to complain?

I copied it directly from the browser into the converter.

I’ll try to copy it from the result the rest sensor got and see what happens then.
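
One way to test the hidden-character idea is to scan the raw response for anything that is not printable ASCII or ordinary whitespace. The Python sketch below does that. The find_hidden_characters helper is a made-up name, and the sample with a leading byte-order mark is invented to show what a hit looks like; nothing in this thread confirms the camera actually sends one.

```python
import unicodedata


def find_hidden_characters(text: str) -> list:
    """Return (index, description) for each character that is not
    printable ASCII, newline, carriage return, or tab."""
    hidden = []
    for index, char in enumerate(text):
        if char in "\n\r\t" or " " <= char <= "~":
            continue
        name = unicodedata.name(char, "UNKNOWN")
        hidden.append((index, f"U+{ord(char):04X} {name}"))
    return hidden


# Made-up samples: a clean response and one with a hypothetical leading BOM.
clean = "<CGI_Result>\n    <result>0</result>\n</CGI_Result>"
with_bom = "\ufeff" + clean

print(find_hidden_characters(clean))     # []
print(find_hidden_characters(with_bom))
print(repr(with_bom[:20]))               # repr() also makes odd characters visible
```

Running it on the text copied from the rest sensor's debug log, rather than from the browser, would show whether the two sources differ.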