Problem extracting data in custom component Python

I am trying to modify my custom component to extract some extra data from the xml returned.

Here is the xml (part of it) as an example

<?xml version="1.0" encoding="UTF-8"?>
<product version="1.7" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.bom.gov.au/schema/v1.7/product.xsd">
    <amoc>
        <source>
            <sender>Australian Government Bureau of Meteorology</sender>
            <region>New South Wales</region>
            <office>NSWRO</office>
            <copyright>http://www.bom.gov.au/other/copyright.shtml</copyright>
            <disclaimer>http://www.bom.gov.au/other/disclaimer.shtml</disclaimer>
        </source>
        <identifier>IDN11052</identifier>
        <issue-time-utc>2019-11-10T17:51:59Z</issue-time-utc>
        <issue-time-local tz="EDT">2019-11-11T04:51:59+11:00</issue-time-local>
        <sent-time>2019-11-10T17:51:59Z</sent-time>
        <expiry-time>2019-11-11T17:51:59Z</expiry-time>
        <validity-bgn-time-local tz="EDT">2019-11-11T00:00:00+11:00</validity-bgn-time-local>
        <validity-end-time-local tz="EDT">2019-11-17T23:59:59+11:00</validity-end-time-local>
        <next-routine-issue-time-utc>2019-11-11T05:25:00Z</next-routine-issue-time-utc>
        <next-routine-issue-time-local tz="EDT">2019-11-11T16:25:00+11:00</next-routine-issue-time-local>
        <status>O</status>
        <service>WSP</service>
        <sub-service>FCT</sub-service>
        <product-type>F</product-type>
        <phase>NEW</phase>
    </amoc>
    <forecast>
        <area aac="NSW_FA001" description="New South Wales" type="region">
            <forecast-period start-time-local="2019-11-11T04:52:00+11:00" end-time-local="2019-11-11T04:52:00+11:00" start-time-utc="2019-11-10T17:52:00Z" end-time-utc="2019-11-10T17:52:00Z">
                <text type="warning_summary_footer">Details of warnings are available on the Bureau's website www.bom.gov.au, by telephone 1300-659-218* or through some TV and radio broadcasts.</text>
                <text type="product_footer">* Calls to 1300 numbers cost around 27.5c incl. GST, higher from mobiles or public phones.</text>
            </forecast-period>
        </area>
        <area aac="NSW_ME004" description="Central Coast" type="metropolitan" parent-aac="NSW_FA001">
            <forecast-period index="0" start-time-local="2019-11-11T00:00:00+11:00" end-time-local="2019-11-12T00:00:00+11:00" start-time-utc="2019-11-10T13:00:00Z" end-time-utc="2019-11-11T13:00:00Z">
                <text type="forecast">Mostly sunny. Light winds becoming northeasterly 30 to 45 km/h in the middle of the day then tending northerly 20 to 30 km/h in the late evening.</text>
                <text type="fire_danger">Very High</text>
                <text type="uv_alert">Sun protection 8:40am to 4:00pm, UV Index predicted to reach 10 [Very High]</text>
            </forecast-period>

I am trying to extract the fire_danger and uv_alert. I can extract the forecast line directly above but just can’t get the other 2.

The py is here

class BOMForecastData:
    """Get data from BOM."""

    def __init__(self, product_id):
        """Initialize the data object."""
        self._product_id = product_id

    def get_reading(self, condition, index):
        """Return the value for the given condition."""
        if condition == 'detailed_summary':
            if PRODUCT_ID_LAT_LON_LOCATION[self._product_id][3] == 'City':
                detailed_summary = self._data.find(_FIND_QUERY_2.format(index)).text
            else:
                detailed_summary = self._data.find(_FIND_QUERY.format(index, 'forecast')).text
            return (detailed_summary[:251] + '...') if len(detailed_summary) > 251 else detailed_summary
        
        find_query = (_FIND_QUERY.format(index, SENSOR_TYPES[condition][0]))
        state = self._data.find(find_query)

        if condition == 'icon':
            return ICON_MAPPING[state.text]

        # Non-working lines, commented out for now:
        # if condition == 'uv_alert':
        #     uv_alert = self._data.find(_FIND_QUERY.format(index, 'uv_alert')).text
        #     return uv_alert
        # if condition == 'fire_danger':
        #     fire_danger = self._data.find(_FIND_QUERY.format(index, 'fire_danger')).text
        #     return fire_danger

        if state is None:
            if condition == 'possible_rainfall':
                return '0 mm'
            return 'n/a'
        s = state.text
        return (s[:251] + '...') if len(s) > 251 else s

I’m commenting out the non-working lines…

The full component is here…

This forum brings together people from all around the world and introduces us to regulations, terminology, and practices employed elsewhere. When I read ‘fire_dander’ I approached it with an open mind and assumed those clever Aussies had invented a term for the particulates produced by wildfires. A quick scan through the XML revealed … ‘fire_danger’. So it was nothing more than just another occurrence of a cross-cultural typo. :slight_smile:

Just been listening to the news regarding Aussie fire danger. I’m sure I speak for everyone in wishing you good luck and best wishes @DavidFW1960 and the other Aussies on the forum.

Thanks Nick. It’s weird. It’s cool (23°C) today and tomorrow is supposed to be 37° and windy and they forecast ‘cataclysmic’ fire conditions. We have in the past (1994) been evacuated and a couple of other scares. Hard to know what will happen but it doesn’t really feel like much right now…

Where are you? We are mainly hearing about Sydney and also some fires in Queensland.

Just north of Sydney… 1/2 way between Syd & Newcastle…

I can see that your XPath starts with a dot which selects the “current node”. While this may work in some cases, I am not convinced that this approach works reliably. If the root element never changes you could just specify that at the beginning (/product/forecast/...).

You’re talking a foreign language here lol… I would be grateful for any help or a PR… I am putting my working version on GitHub here https://github.com/DavidFW1960/bom_forecast/blob/eb640fd2b6c45128141223bb8b48578e6bf7b10b/custom_components/bom_forecast/sensor.py.work

Sorry, let’s try this a different way :slight_smile:

Try changing this line from:

_FIND_QUERY = "./forecast/area[@type='location']/forecast-period[@index='{}']/*[@type='{}']" 

to:

_FIND_QUERY = "/product/forecast/area[@type='location']/forecast-period[@index='{}']/*[@type='{}']"

Thanks for that. I can make that change but that’s not going to help me extract the uv or fire danger is it? Do you have any suggestions to do that?

Ah, now I see that there are different types of forecast in the XML. You have been looking at an area with type “location”, but uv alert and fire danger are in an area with type “metropolitan”.

XPath for uv alert:

/product/forecast/area[@type='metropolitan']/forecast-period[@index='0']/*[@type='uv_alert']

XPath for fire danger:

/product/forecast/area[@type='metropolitan']/forecast-period[@index='0']/*[@type='fire_danger']

So, these two are almost like _FIND_QUERY_2 in your code except that you would need to vary the type, i.e. the last portion of that XPath with forecast, uv_alert or fire_danger.

I would recommend using an XPath tester where you can paste the whole XML and try which combination of path and fields works.
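
To make that concrete, here is a minimal sketch of one query with three placeholders (area type, period index, text type). It assumes the component parses the feed with xml.etree.ElementTree and calls find on the root element; the name AREA_QUERY and the trimmed sample XML are made up for illustration, and the path keeps the leading ./ that the component already uses.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the XML posted above.
SAMPLE_XML = """\
<product>
    <forecast>
        <area aac="NSW_ME004" description="Central Coast" type="metropolitan">
            <forecast-period index="0">
                <text type="forecast">Mostly sunny.</text>
                <text type="fire_danger">Very High</text>
                <text type="uv_alert">Sun protection 8:40am to 4:00pm, UV Index predicted to reach 10 [Very High]</text>
            </forecast-period>
        </area>
    </forecast>
</product>
"""

# Hypothetical query name; placeholders are area type, period index, text type.
AREA_QUERY = "./forecast/area[@type='{}']/forecast-period[@index='{}']/text[@type='{}']"

root = ET.fromstring(SAMPLE_XML)

for text_type in ("forecast", "fire_danger", "uv_alert"):
    element = root.find(AREA_QUERY.format("metropolitan", 0, text_type))
    # find() returns None when nothing matches, so guard before reading .text
    print(text_type, "->", element.text if element is not None else None)
```

Asking the same query for an area of type 'location' returns None with this sample, which matches what you were seeing.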

Couldn’t get any joy from the XPath tester but I made it work. Thanks!

actually seeing a bunch of errors…

Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 408, in _async_add_entity
    await entity.async_update_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 275, in async_update_ha_state
    self._async_write_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 309, in _async_write_ha_state
    state = self.state
  File "/config/custom_components/bom_forecast/sensor.py", line 290, in state
    self._condition, self._index)
  File "/config/custom_components/bom_forecast/sensor.py", line 401, in get_reading
    uv_alert = self._data.find(_FIND_QUERY_3.format(index, 'uv_alert')).text
AttributeError: 'NoneType' object has no attribute 'text'

Even though it’s working… hmm…

also with the path change I get this…
SyntaxError: cannot use absolute path on element

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 408, in _async_add_entity
    await entity.async_update_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 275, in async_update_ha_state
    self._async_write_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 309, in _async_write_ha_state
    state = self.state
  File "/config/custom_components/bom_forecast/sensor.py", line 290, in state
    self._condition, self._index)
  File "/config/custom_components/bom_forecast/sensor.py", line 401, in get_reading
    uv_alert = self._data.find(_FIND_QUERY_3.format(index, 'uv_alert')).text
AttributeError: 'NoneType' object has no attribute 'text'
2019-11-16 17:28:19 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 408, in _async_add_entity
    await entity.async_update_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 275, in async_update_ha_state
    self._async_write_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 309, in _async_write_ha_state
    state = self.state
  File "/config/custom_components/bom_forecast/sensor.py", line 290, in state
    self._condition, self._index)
  File "/config/custom_components/bom_forecast/sensor.py", line 404, in get_reading
    fire_danger =  self._data.find(_FIND_QUERY_4.format(index, 'fire_danger')).text
AttributeError: 'NoneType' object has no attribute 'text'
2019-11-16 17:28:19 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 408, in _async_add_entity
    await entity.async_update_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 275, in async_update_ha_state
    self._async_write_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 309, in _async_write_ha_state
    state = self.state
  File "/config/custom_components/bom_forecast/sensor.py", line 290, in state
    self._condition, self._index)
  File "/config/custom_components/bom_forecast/sensor.py", line 401, in get_reading
    uv_alert = self._data.find(_FIND_QUERY_3.format(index, 'uv_alert')).text
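
For what it's worth, "cannot use absolute path on element" is the message ElementTree gives when find is called on an element with a path that starts with a slash. Assuming self._data is the parsed root element (the component's parsing code is not shown here, so that is a guess), the relative ./ form is the one that works. A small sketch with a cut-down document; try_find is a made-up helper:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<product><forecast><area type='metropolitan'>"
    "<forecast-period index='0'><text type='uv_alert'>UV 10</text></forecast-period>"
    "</area></forecast></product>"
)

RELATIVE = "./forecast/area[@type='metropolitan']/forecast-period[@index='0']/*[@type='uv_alert']"
ABSOLUTE = "/product" + RELATIVE[1:]  # same path, but starting at /product


def try_find(element, path):
    """Return the matched text, or a short description of what went wrong."""
    try:
        match = element.find(path)
    except SyntaxError as err:
        return "SyntaxError: {}".format(err)
    return match.text if match is not None else "no match"


relative_result = try_find(root, RELATIVE)
absolute_result = try_find(root, ABSOLUTE)
print(relative_result)
print(absolute_result)
```

So the leading dot in the original _FIND_QUERY was fine; the AttributeError comes from the query not matching anything for that index, not from the dot.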

I think the index can only be 0 or 1 (depends on if peak UV has been reached for the day)

You have to cater for the case where the XPath does not yield any result under that index:

    uv_alert_data = self._data.find(_FIND_QUERY_3.format(index, 'uv_alert'))
    if uv_alert_data:
      return uv_alert_data.text

In general, I’d recommend to use more defensive programming with external sources because the format and details are out of your control, i.e. you cannot rely on the format to be always correct and always the same as it was on the day you developed your component.

So that stopped the error, but I am now getting 6 sensors for 6 days and the state for all of them is n/a, whereas before _1 was giving me the uv and fire for tomorrow… so the test is not quite right.

Can I use:

    uv_alert_data = self._data.find(_FIND_QUERY_3.format(index, 'uv_alert'))
    if uv_alert_data:
      uv_alert = self._data.find(_FIND_QUERY_3.format(index, 'uv_alert')).text
      return uv_alert

Seems not… still n/a for everything

Before the ‘if’ I was getting errors but it was only creating one uv and one fire sensor (and they were the correct string)… but for some reason the condition is returning n/a for everything.

So close I can taste it!
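
A likely reason everything falls through to n/a: with ElementTree, the truth value of an element depends on whether it has child elements, not on whether it was found. A text element that only contains a string has no children, so on the Python versions I know of a bare "if element:" treats it as false even though the element exists. A minimal sketch, assuming the component uses xml.etree.ElementTree (the sample XML is cut down from the feed):

```python
import warnings
import xml.etree.ElementTree as ET

period = ET.fromstring(
    "<forecast-period index='1'>"
    "<text type='uv_alert'>UV Index predicted to reach 8 [Very High]</text>"
    "</forecast-period>"
)

found = period.find("./text[@type='uv_alert']")       # exists, has text, no children
missing = period.find("./text[@type='fire_danger']")  # not in this sample, so None

with warnings.catch_warnings():
    # Newer Python versions warn when an Element is truth-tested; silence that here.
    warnings.simplefilter("ignore")
    truth_value = bool(found)

print("found is not None:", found is not None)
print("child elements in found:", len(found))
print("bool(found):", truth_value)
print("missing is None:", missing is None)
```

Comparing against None explicitly ("if element is not None:") avoids the problem and still catches the case where the query matches nothing.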

Tried this too…

        if condition == 'uv_alert':
            uv_alert_data = self._data.find(_FIND_QUERY_3.format(index, 'uv_alert'))
            if uv_alert_data:
                uv_alert = self._data.find(_FIND_QUERY_3.format(index, 'uv_alert')).text
            else:
                uv_alert = 'none'
            return uv_alert

        find_query = (_FIND_QUERY.format(index, SENSOR_TYPES[condition][0]))
        state = self._data.find(find_query)

So I moved this up above the last 2 lines shown…

Now all my sensors are showing ‘none’ as the state, so it’s not reading the data (as above, I guess). Not sure how to get back to it reading the data without the errors I was getting before.

OK, so I don’t exactly know how the sensor state should work in your component. But maybe it’s time to add some debug log statements into your code, for example in BOMForecastSensorFriendly#state or BOMForecastData#get_reading to better see what happens in your code.

So I have no idea now…

I have tried to emulate the forecast which looks the same in my eyes.
Error:

2019-11-19 17:38:49 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 408, in _async_add_entity
    await entity.async_update_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 275, in async_update_ha_state
    self._async_write_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 309, in _async_write_ha_state
    state = self.state
  File "/config/custom_components/bom_forecast/sensor.py", line 290, in state
    self._condition, self._index)
  File "/config/custom_components/bom_forecast/sensor.py", line 397, in get_reading
    uv_alert_data = self._data.find(_FIND_QUERY_3.format(index)).text
AttributeError: 'NoneType' object has no attribute 'text'

This log error just keeps on repeating.

It is producing the desired result…

Because it’s after 4:40pm it has switched from _0 to _1 and I only get that one alert.

In the py file I have:
lines 25++

_FIND_QUERY = "./forecast/area[@type='location']/forecast-period[@index='{}']/*[@type='{}']"
_FIND_QUERY_2 = "./forecast/area[@type='metropolitan']/forecast-period[@index='{}']/text[@type='forecast']"
_FIND_QUERY_3 = "./forecast/area[@type='metropolitan']/forecast-period[@index='{}']/*[@type='uv_alert']"
_FIND_QUERY_4 = "./forecast/area[@type='metropolitan']/forecast-period[@index='{}']/*[@type='fire_danger']"

I’m just concentrating on uv_alert for now.
Here is where I am trying to read it directly after the forecast.

    def get_reading(self, condition, index):
        """Return the value for the given condition."""
        if condition == 'detailed_summary':
            if PRODUCT_ID_LAT_LON_LOCATION[self._product_id][3] == 'City':
                detailed_summary = self._data.find(_FIND_QUERY_2.format(index)).text
            else:
                detailed_summary = self._data.find(_FIND_QUERY.format(index, 'forecast')).text
            return (detailed_summary[:251] + '...') if len(detailed_summary) > 251 else detailed_summary
        
        if condition == 'uv_alert':
            if PRODUCT_ID_LAT_LON_LOCATION[self._product_id][3] == 'City':
                uv_alert_data = self._data.find(_FIND_QUERY_3.format(index)).text
                if uv_alert_data:
                    uv_alert = self._data.find(_FIND_QUERY_3.format(index)).text
            else:
                uv_alert_data = self._data.find(_FIND_QUERY.format(index, 'uv_alert')).text
                if uv_alert_data:
                    uv_alert = self._data.find(_FIND_QUERY.format(index, 'uv_alert')).text
            return uv_alert

For day_1 the xml file looks like:

    <forecast>
        <area aac="NSW_FA001" description="New South Wales" type="region">
            <forecast-period start-time-local="2019-11-16T16:25:11+11:00" end-time-local="2019-11-16T16:25:11+11:00" start-time-utc="2019-11-16T05:25:11Z" end-time-utc="2019-11-16T05:25:11Z">
                <text type="warning_summary_footer">Details of warnings are available on the Bureau's website www.bom.gov.au, by telephone 1300-659-218* or through some TV and radio broadcasts.</text>
                <text type="product_footer">* Calls to 1300 numbers cost around 27.5c incl. GST, higher from mobiles or public phones.</text>
            </forecast-period>
        </area>
        <area aac="NSW_ME004" description="Central Coast" type="metropolitan" parent-aac="NSW_FA001">
            <forecast-period index="0" start-time-local="2019-11-16T17:00:00+11:00" end-time-local="2019-11-17T00:00:00+11:00" start-time-utc="2019-11-16T06:00:00Z" end-time-utc="2019-11-16T13:00:00Z">
                <text type="forecast">Partly cloudy. Slight (20%) chance of a shower later tonight. Winds easterly 20 to 25 km/h tending northeasterly 25 to 35 km/h in the evening.</text>
            </forecast-period>
            <forecast-period index="1" start-time-local="2019-11-17T00:00:00+11:00" end-time-local="2019-11-18T00:00:00+11:00" start-time-utc="2019-11-16T13:00:00Z" end-time-utc="2019-11-17T13:00:00Z">
                <text type="forecast">Partly cloudy. Medium (60%) chance of showers, most likely in the morning and afternoon. The chance of a thunderstorm in the morning and early afternoon. Winds northeasterly 15 to 20 km/h shifting southerly 25 to 35 km/h in the morning then tending south to southeasterly 30 to 45 km/h in the early afternoon.</text>
                <text type="fire_danger">Very High</text>
                <text type="uv_alert">Sun protection 8:50am to 3:50pm, UV Index predicted to reach 8 [Very High]</text>
            </forecast-period>

for day_0

    <forecast>
        <area aac="NSW_FA001" description="New South Wales" type="region">
            <forecast-period start-time-local="2019-06-24T04:45:16+10:00" end-time-local="2019-06-24T04:45:16+10:00" start-time-utc="2019-06-23T18:45:16Z" end-time-utc="2019-06-23T18:45:16Z">
                <text type="warning_summary_footer">Details of warnings are available on the Bureau's website www.bom.gov.au, by telephone 1300-659-218* or through some TV and radio broadcasts.</text>
                <text type="product_footer">* Calls to 1300 numbers cost around 27.5c incl. GST, higher from mobiles or public phones.</text>
            </forecast-period>
        </area>
        <area aac="NSW_ME004" description="Central Coast" type="metropolitan" parent-aac="NSW_FA001">
            <forecast-period index="0" start-time-local="2019-06-24T00:00:00+10:00" end-time-local="2019-06-25T00:00:00+10:00" start-time-utc="2019-06-23T14:00:00Z" end-time-utc="2019-06-24T14:00:00Z">
                <text type="forecast">Partly cloudy. Very high (95%) chance of showers, becoming less likely this evening. Winds south to southwesterly 20 to 30 km/h turning southeasterly 25 to 35 km/h in the middle of the day then decreasing to 15 to 25 km/h in the late afternoon.</text>
                <text type="uv_alert">Sun protection not recommended, UV Index predicted to reach 2 [Low]</text>
            </forecast-period>

The full py file is here:

Can anyone help with this?
@pnbruckner maybe or @exxamalte or anyone???
I’m just trying to hack this py component to get some extra info - I don’t know python and am struggling. When I forked that repo I made only minor cosmetic changes and this is just beyond my skills right now although to me it seems I’m just missing something simple.

First things first:
You have to cater for the case that the XPath yields an empty (=None) result, i.e. the XML you are looking for simply doesn’t exist. So you should first check if the XPath yields a result, i.e. remove the .text from the first line. Then in the if statement you check if there was a result, and if so, then you retrieve the text from that result.
The following is just one occurrence, and you should follow the same pattern in the rest of the code.

                uv_alert_data = self._data.find(_FIND_QUERY_3.format(index))
                # Compare with None explicitly: an element without child elements can test as false.
                if uv_alert_data is not None:
                    uv_alert = uv_alert_data.text