Calling all ISY994 users!

I’ve had my ISY994 for a few years with a pretty complex setup and I just started playing with Home Assistant integration a few days ago. Overall, I really like what I see, but I did run into a weird problem last night where the state of one of my ISY scenes wouldn’t change in HA to track the state in the ISY or of the devices in the scene.

I added the problematic scene entity (as a switch) to the “overview” page and monitored the state while turning the switch on and off. Here’s the problematic sequence for that scene over the course of several hours last night:

  1. HA State for the scene starts “on” based on the lightning icon, the toggle switch, and the states UI
  2. Click the toggle switch to turn it off and the icon toggles off
  3. DOF is sent to the ISY and a successful response is returned
  4. Lights in the scene on the ISY turn off
  5. “state_changed” event triggered in HA from “on” to “off”
  6. HA state stays “on”, as determined by the lightning icon and states UI
  7. Click the toggle icon to turn it back on and the icon toggles
  8. Nothing shows in the log, presumably because HA thinks the device is still on
  9. Repeat from step 1

While this was happening with this scene, other scenes worked just fine from HA. I could also control the scene just fine from the ISY admin console, but the state wouldn’t update in HA when I did so. It’s like any state updates were blocked for this specific scene/switch, whether they originated from HA or the ISY, but not for anything else.

I could manually change the state of the device in the HA “states” screen, but it would then remain stuck in the new state. Firing a “state_changed” event for that entity manually didn’t change the state.
Restarting the HA instance didn’t fix the problem with that scene, and neither did removing the entity manually from the entity registry and letting HA rediscover it.

All this happened last night, and this morning, having changed nothing, I’m now able to control that scene again from HA and see the state change as I expect.

I realize that this is a bit of a hodgepodge and I’m mainly interested in

  • Whether there’s a known issue in here somewhere
  • If someone else has seen something like this
  • If there’s anything specific that I can do in the future to debug if it happens again

I have lots of logs with “debug” enabled, but it’s a lot of data that’s hard to directly correlate to my attempts to turn the scene on and off. It worked fine for two straight days until it started acting up this last night, and I’m running a mostly-virgin hassio installation of 0.87.1 in a VirtualBox VM on a Mac Mini with only a handful of automations. I even tried commenting out everything that I’d added without improving the situation, so I don’t think it’s related to something that I’ve added.

After lots and lots of debugging and looking through logs, I think I finally figured out what’s going on.

I had an on/off button from an Insteon mini remote in the problematic scene along with a KeypadLinc, and that button reports “on” or “off” depending on what I last pressed. If I last used it to turn on the light in that scene, it’ll report “on” forever until I press “off”, when it’ll then report “off” forever. Rebooting the ISY appears to clear the status until you use that button on the remote.

I think the flaky behavior always happened when I last used the “on” button on the remote, and that sticky state caused the scene itself to remain on no matter what the light was doing. If I hit the “off” button on the remote, the state of the scene follows the light and everything makes sense. The problem comes when I try to turn the scene off from the HA Overview panel and the toggle changed, but the scene itself remained “on” no matter what. It was correct regarding the scene as a whole, but it was really the remote button that was keeping the scene on and HA can’t control that.

This helps explain why I could still control the lights in the scene with the mini remote and the ISY admin console, but not by toggling the scene switch in HA. If HA just blindly sent “on” or “off” without regard for what it thinks the current state is, it would have properly controlled the light in the scene even if the state was “wrong”. Instead, HA looked at the state and only reacted when I wanted to change the state. Since the remote was always on in these cases, HA would only ever send “off” and I could never get it to turn the light on. Man, that was frustrating and confounding.

I think that one way to solve this is to just use an HA group, but it would be great if I could just get that switch/scene to ignore the state and send the command anyway. I hoped that “assumed_state: false” would do that, but it doesn’t seem to help. I’ll continue to poke around.

Edit: Nevermind, had it backward. “assumed_state: true” works as expected, and I just can’t trigger on a state change of that scene reliably in an automation. Using the controlled KeypadLinc is working fine.

Thanks a bunch @rccoleman - this is excellent information.

It’s worth noting that you CAN blindly send “on” and “off” commands to a swtich/scene in HA by calling the service directly (from inside automations, scripts, or custom lovelace setups, etc.) But you’re right that if the interface thinks the switch is off, it’ll always send on when you click it.

I’ll look in to the root cause here, which must be inside the PyISY library.

Yes, now that I have a couple of weeks of Home Assistant boot camp under my belt, I see that there are lots of ways to approach this. I think one is to forcibly change the state via set_state() in HA in response to pressing the buttons, and I did that for a few motion sensors to solve another problem. I just thought of that and will give it a try when I get a chance.

I think that it’s actually working as-designed in Home Assistant because the ISY is reporting the “on” state for the remote button, and I can see it toggle in the admin console. It also explains why I would always see the scene as like “13% on” in Agave & Mobilinc, even if the lights were off. A workaround might be to ignore the state of the buttons on mini-remotes, but I don’t know if that’s possible. I don’t want to ignore the devices completely because I use the button events.

Ok, now I think I’m seeing a real issue. I might be slamming the ISY with REST calls because I can only get a sequence to work properly by inserting 1 second delays between calls. Here’s the action: part of the HA automation:

  action:
    - data:
        entity_id: switch.kit_lights_scene (ISY scene with Insteon devices)
      service: switch.turn_on
    - data:
        entity_id: switch.dnh_lights_scene (ISY scene with Insteon devices)
      service: switch.turn_on
    - data:
        entity_id: switch.laundryrm_light_scene (ISY scene with Insteon devices)
      service: switch.turn_on
    - data:
        entity_id: group.fr (HA group)
      service: homeassistant.turn_on

The first two clauses are scenes from the ISY, and the third is a group that looks like this:

  fr:
    name: Family Room
    entities:
      - switch.fr_reading_light (HA Z-Wave switch)
      - switch.fr_table_light (HA Z-Wave switch)
      - light.fr_torchiere_lights (ISY KeypadLinc)

When I trigger the action at the top, I see lots of traffic to the ISY and most of the lights come on, but the last item (the torchiere lights) never come on. I can trigger the automation over and over again and I get the same results every time.

On the ISY, I get the following in the error log:

Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36-5] 192.168.1.199:42164->80
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36]: GET-->/rest/nodes/47494/cmd/DON
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36] Socket error
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36] Closing socket w/failure
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36-5] 192.168.1.199:42172->80
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36]: GET-->/rest/nodes/36117/cmd/DON
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36] Socket error
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36] Closing socket w/failure
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36-5] 192.168.1.199:42174->80
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36]: GET-->/rest/nodes/20615/cmd/DON
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36] Socket error
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36] Closing socket w/failure
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36-5] 192.168.1.199:42176->80
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36]: GET-->/rest/nodes/3C%20EB%20EC%201
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36] Socket error
Wed 2019/02/27 08:53:58 PM System -170001 [HTTP:0-36] Closing socket w/failure

The last REST call is the one for the torchiere lights and I see no activity in the ISY activity monitor for it at all. The others all seem to work anyway, despite the failures. Here’s a snippet from the HA log with ISY994 debug turned on:

2019-02-27 21:09:46 INFO (MainThread) [homeassistant.helpers.script] Script Turn on lights garage open: Executing step call service
2019-02-27 21:09:46 INFO (SyncWorker_14) [homeassistant.components.isy994] ISY Request: http://192.168.1.218:80/rest/nodes/20615/cmd/DON
_50
_51
2019-02-27 21:09:46 INFO (SyncWorker_14) [homeassistant.components.isy994] ISY Response Recieved
2019-02-27 21:09:46 INFO (SyncWorker_14) [homeassistant.components.isy994] ISY turned on scene: 20615
2019-02-27 21:09:46 INFO (SyncWorker_0) [homeassistant.components.isy994] ISY Request: http://192.168.1.218:80/rest/nodes/3C%20EB%20EC%201/cmd/DON
_50
_51
_71[INST-TX-I1 ] 02 62 00 00 11 CF 11 00
_71[INST-ACK ] 02 62 00.00.11 CF 11 00 06 LTONRR (00)
2019-02-27 21:09:46 WARNING (SyncWorker_0) [homeassistant.components.isy994] Bad ISY Request: http://192.168.1.218:80/rest/nodes/3C%20EB%20EC%201/cmd/DON
2019-02-27 21:09:46 WARNING (SyncWorker_0) [homeassistant.components.isy994] ISY could not turn on node: 3C EB EC 1
2019-02-27 21:09:46 DEBUG (SyncWorker_0) [homeassistant.components.isy994.light] Unable to turn on light
_50

You can see the REST call for the problematic lights in the middle, the failure at the bottom, and no activity for that device in between.

Here’s what it looks like with a 1 second delay between each section of the action:

2019-02-27 21:13:30 INFO (SyncWorker_1) [homeassistant.components.isy994] ISY Request: http://192.168.1.218:80/rest/nodes/3C%20EB%20EC%201/cmd/DON
2019-02-27 21:13:30 INFO (MainThread) [homeassistant.components.automation] Executing FR Reading Light Status
2019-02-27 21:13:30 INFO (MainThread) [homeassistant.helpers.script] Script FR Reading Light Status: Running script
2019-02-27 21:13:30 INFO (MainThread) [homeassistant.helpers.script] Script FR Reading Light Status: Executing step call service
2019-02-27 21:13:30 INFO (MainThread) [homeassistant.components.automation] Executing Downstairs Lights On
2019-02-27 21:13:30 INFO (MainThread) [homeassistant.helpers.script] Script Downstairs Lights On: Running script
2019-02-27 21:13:30 INFO (MainThread) [homeassistant.helpers.script] Script Downstairs Lights On: Executing step call service
2019-02-27 21:13:30 INFO (MainThread) [homeassistant.helpers.script] Script Downstairs Lights On: Executing step delay 0:00:03
2019-02-27 21:13:30 INFO (SyncWorker_15) [homeassistant.components.switch.command_line] Running command: curl -X GET http://[email protected]/rest/vars/set/2/64/1
2019-02-27 21:13:30 DEBUG (Thread-2) [homeassistant.components.isy994] ISY Update Received:
_71[INST-ACK    ] 02 62 00.00.1D CF 11 00 06          LTONRR (00)
_51
_71[INST-TX-I1  ] 02 62 3C EB EC 0F 11 80
2019-02-27 21:13:30 INFO (MainThread) [homeassistant.components.automation] Executing FR Table Lamp Status
2019-02-27 21:13:30 INFO (MainThread) [homeassistant.helpers.script] Script FR Table Lamp Status: Running script
2019-02-27 21:13:30 INFO (MainThread) [homeassistant.helpers.script] Script FR Table Lamp Status: Executing step call service
2019-02-27 21:13:30 INFO (SyncWorker_14) [homeassistant.components.switch.command_line] Running command: curl -X GET http://[email protected]/rest/vars/set/2/65/1
_50
2019-02-27 21:13:30 INFO (SyncWorker_1) [homeassistant.components.isy994] ISY Response Recieved
2019-02-27 21:13:30 INFO (SyncWorker_1) [homeassistant.components.isy994] ISY turned on node: 3C EB EC 1
2019-02-27 21:13:30 INFO (SyncWorker_1) [homeassistant.components.isy994] ISY updated node: 3C EB EC 1
_71[INST-ACK    ] 02 62 3C.EB.EC 0F 11 80 06          LTONRR (80)
_160120190227 21:13:35
2019-02-27 21:13:30 INFO (Thread-2) [homeassistant.components.isy994] ISY Updated Variable: 65
_13[VAR  2   65 ]       1
_160120190227 21:13:35
2019-02-27 21:13:30 INFO (Thread-2) [homeassistant.components.isy994] ISY Updated Variable: 64
_13[VAR  2   64 ]       1
_71[INST-SRX    ] 02 50 3C.EB.EC 3D.95.02 2F 11 80    LTONRR (80)
_71ISY/PLM Group=0, Max Hops=3, Hops Left=3]]>
ST1283C EB EC 150%
2019-02-27 21:13:30 INFO (Thread-2) [homeassistant.components.isy994] ISY Updated Node: 3C EB EC 1
2019-02-27 21:13:30 INFO (Thread-2) [homeassistant.components.isy994] ISY Node Control Event: 3C EB EC 1 ST
_71[D2D EVENT   ] Event [3C EB EC 1] [ST] [128] uom=100 prec=0

Much better. I have several node servers on the ISY that hammer the ISY with REST requests without seeing these socket errors, so I’m not entirely convinced that that’s the problem.

Has anyone seen anything like this?

Edit: I think that this is exactly the same issue: https://forum.universal-devices.com/topic/24970-need-help-to-interpret-the-isy-log/. Michel says that the ISY will explicitly not send and will fail a command if it thinks that the Insteon network is “overwhelmed”, so maybe we need to detect that and resend the command in PyISY, or in the ISY994 component? The latter knows that the command failed.

Edit: This modification to PyISY seems to help:

    def request(self, url, retries=0, ok404=False):                            
    if self.parent.log is not None:                                        
        self.parent.log.info('ISY Request: ' + url)                        
                                                                           
    try:                                                                   
        r = requests.get(url, auth=(self._username, self._password),       
                timeout=10, verify=False)                                  
                                                                           
    except requests.ConnectionError as err:                                
        self.parent.log.error('ISY Could not recieve response '            
                              + 'from device because of a network '        
                              + 'issue.')                                  
        return None                                                        
                                                                           
    except requests.exceptions.Timeout:                                    
        self.parent.log.error('Timed out waiting for response from the '   
                              + 'ISY device.')                             
        return None                                                        
                                                                           
    if r.status_code == 200:                                               
        self.parent.log.info('ISY Response Recieved')                      
        # remove unicode from string in python 2.7, 3.2,                   
        # and 3.4 compatible way                                           
        xml = ''.join(char for char in r.text if ord(char) < 128)          
        return xml                                                         
    elif r.status_code == 404 and ok404:                                   
        self.parent.log.info('ISY Response Recieved')                      
        return ''                                                       
    else:                                                               
        self.parent.log.warning('Bad ISY Request: {} {}: retry #{}'.format(ur
                                r.status_code, retries))                     
        # sleep for half a second                                            
        time.sleep(.5)                                                       
                                                                             
        if retries < 3:                                                      
          return self.request(url, retries+1, ok404=False)                   
        else:                                                                
          return None         

Basically, add a 0.5s delay and retry, up to three. When I see a failure, I sometimes go through 2 retries before it finally succeeds. Increasing the delay would probably increase the chance of success. In the meantime, I’ll see if the PyISY dev has some thoughts, and I’m curious to hear from anyone here.

While I think the best place for this would be in PyISY, dependencies are a pain to update in hass.io. I made a similar change in the isy994 component and it’s working fine for me as a custom component. If anyone is interested in the changes, let me know.

If you make the changes in the PyISY library and submit a pull request, I can help ensure it gets merged and we can make the next version of HA support the changes.

I saw your issue over on PyISY and this is a phenomenal discovery! I thought all along that this was just collisions in the Insteon network itself, which meant there was not much we could do. But if we reliably get 404s from the ISY and can just retry the message, that’s great!

It seems to err on the side of caution in that sometimes the ISY still executes the command even if it returns an error, and there’s no way that I know of to detect that. In any case, Michel confirmed in the linked thread above that the ISY will return a 404 error when it thinks that the Insteon network is overwhelmed. I have to believe that there’s a better response than “404 Not Found” for situations like this, maybe “503 Service Unavailable”, but it is what it is and I don’t foresee it changing anytime soon. I increased the delay to 1s and the retries to 5 to improve reliability in my setup where I’m turning on a bunch of lights all at once.

I’m definitely not a Python developer (yet), so there may be a better way to do this. This was just the quickest with the fewest changes that I could think of. I’ll revisit the PyISY change and figure out how to submit a pull request :slight_smile:. Thanks for the offer of assistance.

Thanks for all of the research and info here!

One big suggestion I have for all Insteon and ISY users: For ANY automations or scripts you have that will change multiple Insteon lights at once, make a scene for it! This is one of the most powerful aspects of the Insteon system: you can make nearly unlimited number of scenes, so make use of them! For any automation where you want to change 2+ Insteon devices, make a scene for it and just toggle that “switch” in HA instead. This will not only make the reliability be nearly perfect, but also create a better user experience because all of the lights will change at the same time :slight_smile:

Agreed, and I’m doing that wherever I can. I’m still struggling with “fixing” the state of battery operated devices that are part of ISY scenes so that those scenes can be reliably used for status and control, but I think I’m just missing some corner cases. That’s more of a hassle than I first thought, but I’m getting there.

The retry mechanism that I’m using in my custom ISY994 component has been working great, and the only time when I’ve seen a failure is when I accidentally left the “entity_id: xx” out of an action and Hass tried to turn every single switch on at once. The logs were a bloodbath of failures and retries, made even more frustrating because I’d turn a light off only to have a subsequent retry turn it back on again :). I’m sure that the much simpler change to PyISY will work just as well and I’ll test and submit a pull request this weekend.

I can’t thank you enough for doing the legwork to get this up and running on Hass - it’s really awesome. 3 weeks later, I now have all my ISY programs migrated over to Hass, with the last ones for the leak sensors. I expected to have to write a similar suite of programs like the ones I use on the ISY, but it was so much easier on Hass because of the automatic heartbeat monitoring. Really great work!

The ISY is still acting as an Insteon scene manager and conduit to Alexa (much cheaper than Nabu Casa, and easier to use), but otherwise I’m a total Hass-convert now.

Pull request submitted: https://github.com/automicus/PyISY/pull/46

Let me know if I screwed anything up :slight_smile:.

@OverloadUT I think I’m finally making some progress getting the PyISY PR accepted, but it would be great if you could back up my testing with some of your own. Do you have some time to try it out?

Hey @OverloadUT, hope you have been doing fine. I finally had the time to look into the slowness problem between the ISY and HASS, and I’m happy to report that I found what the problem is and I already have a solution. The PyISY component reads the input buffer from the event subscription from the ISY using the readline() function, assuming that all the data elements come separated by a ‘\n’ (newline) character, which is correct for the HTTP header. The XML sent by the ISY is never terminated by a \n, so readline() returns until the ‘\n’ is received as part of the first HTTP header field in the next message. In the worst case scenario, an important message will stay unprocessed for up to 25 seconds, which is the frequency in which the ISY sends a Heartbeat message. I’m super happy to share the fix, just would like to quickly chat with you to understand the process by which I will submit the pull request.

6 Likes

MY GOD. That’s an amazing discovery, and a nasty one to find! I figure it had to be something pretty low-level otherwise someone would have figured it out by now.

Do you mind reaching out to me on Discord? Happy to have a text chat there or do a call if you’d like. I will expedite getting this tested and implemented ASAP. This will be a MASSIVE improvement to ISY support!

@altersis, great catch! I’d love to see your proposed fix for it. I’ve already been working on a cleanup effort here and here and would be happy to roll your changes in with that update. I already combined the separate methods for TLS and non-TLS connections, but the non-TLS still uses readline (I have a node server that updates every second, so I haven’t seen any major delays).

@OverloadUT, tried Discord this AM, left you a message. In the meantime, I’ll open the bugs in PyISY and Hass to start tracking this fix.

@shbatm I spoke to altersis this morning. Our plan of record is that we’ll get his fix in to PyISY ASAP so we can get that fix into Hass, possibly even in a patch release.

Once that’s merged, you’ll want to port his fix in to your refactor branch so it’ll be included with that behemoth update, which I am starting to slowly get through reviewing :slight_smile:

3 Likes

Sounds like a plan.

This is awesome news. Thanks, everyone, for enhancing this component.

:tada::tada::tada::tada::tada::tada:

Once this PR is merged, we should get the bug fix for the delayed events in ISY994! Finally we can write automations and fully expect them to trigger in less than 1 second!

Edit: It has already been released! The fix is in 0.98.3

1 Like

Has the issue with using an automation in more the 5 ISY devices been addressed? Or thought about? It seems when you send more the 5 actions the subsequent ones are dropped because they aren’t sent as a batch action.