Support for XPath?

I have an XPath query to parse some HTML into JSON for a sensor. I know I could use a command_line sensor and curl + xmllint to do this, but I am hoping I can do it without external dependencies.

'{',
  string-join(
  //td[not(following::td[@class='today'])]//div[@class=('rubbish','recycling_box','recycling_bin')]
    /concat(
      '"',
      normalize-space(text()),
      '":"',
      xs:date(replace(replace(@id,<regex>), <regex>)) + xs:yearMonthDuration('P1M')
      ,'"'
      )
,',')
,'}'

It selects every div with one of the listed classes inside any td that has no td of class today after it (i.e. the today cell and every cell after it). The text of each div becomes a JSON key, and its id is regex-manipulated into a date, to which one month is added.
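For comparison, the same selection can be reproduced with nothing but the Python standard library. This is a minimal sketch, not the scrape sensor's API: it assumes no nested divs inside the wanted divs, a single today cell, and an id layout inferred from the regex later in this thread (`JSC<digits>_<day><0-based month><year>JSC<2 chars>`). Only the three class names come from the query above; `BinDayParser`, `id_to_date` and the sample HTML are hypothetical.

```python
import json
import re
from datetime import date
from html.parser import HTMLParser

# Classes taken from the XPath query; everything else here is an assumption.
WANTED = {"rubbish", "recycling_box", "recycling_bin"}

# Hypothetical id layout, inferred from the regex later in the thread:
# JSC<digits/underscores>_<day><0-based month><4-digit year>JSC<2 chars>
ID_RE = re.compile(r"JSC[0-9_]+_(3[01]|[12]?[0-9])(10|11|[0-9])(2[0-9]{3})JSC")


def id_to_date(div_id):
    """Turn an id of the assumed layout into an ISO date (0-based month + 1)."""
    m = ID_RE.search(div_id)
    if m is None:
        raise ValueError(f"unrecognised id: {div_id!r}")
    day, month0, year = (int(g) for g in m.groups())
    return date(year, month0 + 1, day).isoformat()


class BinDayParser(HTMLParser):
    """Rough stand-in for //td[not(following::td[@class='today'])]//div[@class=(...)]:
    collect wanted divs in the 'today' td and every td after it."""

    def __init__(self):
        super().__init__()
        self.seen_today = False  # flips to True at <td class="today">
        self.td_depth = 0        # > 0 while inside a td
        self.current_id = None   # id of the wanted div being read (no nesting assumed)
        self.chunks = []         # text pieces of that div
        self.result = {}         # div text -> ISO date

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "td":
            self.td_depth += 1
            if a.get("class") == "today":
                self.seen_today = True
        elif (tag == "div" and self.td_depth and self.seen_today
              and a.get("class") in WANTED):
            self.current_id = a.get("id") or ""
            self.chunks = []

    def handle_endtag(self, tag):
        if tag == "td":
            self.td_depth = max(0, self.td_depth - 1)
        elif tag == "div" and self.current_id is not None:
            key = " ".join("".join(self.chunks).split())  # like normalize-space()
            self.result[key] = id_to_date(self.current_id)
            self.current_id = None

    def handle_data(self, data):
        if self.current_id is not None:
            self.chunks.append(data)


SAMPLE = """<table><tr>
  <td><div id="JSC0_1_1502023JSC01" class="rubbish">Rubbish</div></td>
  <td class="today"><div id="JSC0_1_2252023JSC02" class="recycling_box">Recycling box</div></td>
  <td><div id="JSC0_1_8102023JSC03" class="recycling_bin"> Recycling
      bin </div><div class="other">ignored</div></td>
</tr></table>"""

parser = BinDayParser()
parser.feed(SAMPLE)
parser.close()
print(json.dumps(parser.result))
# {"Recycling box": "2023-06-22", "Recycling bin": "2023-11-08"}
```

Unlike XPath's `following::` axis, this relies purely on document order, so it only matches the query when there is one today cell and the tables are not nested.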

The only BeautifulSoup examples I can find are simple in comparison.
Is it possible to do this using something like the scrape sensor?

So it turns out that Python has no XPath 2.0 support, and given that the spec is 10 years old at this point, I don’t expect that to change.
XQilla is an app that will process XPath 2.0, but it isn’t available for my platform.
So instead I have created a regex to parse the HTML and do date addition. Behold!

class=\"today\".*?JSC[0-9_]+_(?P<d1>3[01]|[12]?[0-9])(?:(?P<m1_10>10)|(?P<m1_11>11)|(?P<m1_0>0)|(?P<m1_1>1)|(?P<m1_2>2)|(?P<m1_3>3)|(?P<m1_4>4)|(?P<m1_5>5)|(?P<m1_6>6)|(?P<m1_7>7)|(?P<m1_8>8)|(?P<m1_9>9))(?=.*?(?P<m1>(?(m1_0)1)(?(m1_1)2)(?(m1_2)3)(?(m1_3)4)(?(m1_4)5)(?(m1_5)6)(?(m1_6)7)(?(m1_7)8)(?(m1_8)9)(?(m1_9)10)(?(m1_10)11)(?(m1_11)12)))(?P<y1>2[0-9]{3})JSC..\" +class=\"(?P<t1>[^\"]+).*?JSC[0-9_]+_(?P<d2>3[01]|[12]?[0-9])(?:(?P<m2_10>10)|(?P<m2_11>11)|(?P<m2_0>0)|(?P<m2_1>1)|(?P<m2_2>2)|(?P<m2_3>3)|(?P<m2_4>4)|(?P<m2_5>5)|(?P<m2_6>6)|(?P<m2_7>7)|(?P<m2_8>8)|(?P<m2_9>9))(?=.*?(?P<m2>(?(m2_0)1)(?(m2_1)2)(?(m2_2)3)(?(m2_3)4)(?(m2_4)5)(?(m2_5)6)(?(m2_6)7)(?(m2_7)8)(?(m2_8)9)(?(m2_9)10)(?(m2_10)11)(?(m2_11)12)))(?P<y2>2[0-9]{3})JSC..\" +class=\"(?P<t2>[^\"]+).*?JSC[0-9_]+_(?P<d3>3[01]|[12]?[0-9])(?:(?P<m3_10>10)|(?P<m3_11>11)|(?P<m3_0>0)|(?P<m3_1>1)|(?P<m3_2>2)|(?P<m3_3>3)|(?P<m3_4>4)|(?P<m3_5>5)|(?P<m3_6>6)|(?P<m3_7>7)|(?P<m3_8>8)|(?P<m3_9>9))(?=.*?(?P<m3>(?(m3_0)1)(?(m3_1)2)(?(m3_2)3)(?(m3_3)4)(?(m3_4)5)(?(m3_5)6)(?(m3_6)7)(?(m3_7)8)(?(m3_8)9)(?(m3_9)10)(?(m3_10)11)(?(m3_11)12)))(?P<y3>2[0-9]{3})JSC..\" +class=\"(?P<t3>[^\"]+)
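Python's `re` module does support the `(?(name)...)` conditional groups used above, but when the matching runs in Python anyway, the month increment is simpler in code: capture the raw digits and add one. A minimal sketch, assuming the same layout the regex implies (0-based month, the `id` attribute directly followed by `class`); `ENTRY_RE`, `collection_dates` and the sample markup are hypothetical. Like the original regex, it keys each entry by the div's class rather than its text.

```python
import re
from datetime import date

# Same (inferred, hypothetical) layout the big regex assumes:
# id="JSC<digits/underscores>_<day><0-based month><year>JSC<2 chars>" class="<type>"
ENTRY_RE = re.compile(
    r'JSC[0-9_]+_(?P<d>3[01]|[12]?[0-9])(?P<m>10|11|[0-9])(?P<y>2[0-9]{3})'
    r'JSC.." +class="(?P<t>[^"]+)"'
)


def collection_dates(html):
    """Map each div class found after class="today" to an ISO date."""
    start = html.find('class="today"')
    if start == -1:
        return {}
    result = {}
    # Pattern.finditer(string, pos) starts scanning at the today cell.
    for m in ENTRY_RE.finditer(html, start):
        month = int(m["m"]) + 1  # 0-based -> calendar month; no conditional groups needed
        result[m["t"]] = date(int(m["y"]), month, int(m["d"])).isoformat()
    return result


html = (
    '<td><div id="JSC0_1_1502023JSC01" class="rubbish">Rubbish</div></td>'
    '<td class="today"><div id="JSC0_1_2252023JSC02" class="recycling_box">Box</div></td>'
    '<td><div id="JSC0_1_8102023JSC03" class="recycling_bin">Bin</div></td>'
)
print(collection_dates(html))
# {'recycling_box': '2023-06-22', 'recycling_bin': '2023-11-08'}
```

Because the day and month digits run together, some ids are ambiguous: `...3112023...` is split as day 31, month index 1. This sketch takes the same split the original regex would, and `date()` then raises `ValueError` rather than emitting an impossible date.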


I really like the fact that you mention the famous Stack Overflow comment about parsing HTML with regex yourself :wink:


OMG, I was laughing like a madman at that Stack Overflow comment. Not sure I’m convinced yet, though ;)