Insteon plm - switches stop responding

I just recently started using the insteon plm component and have been having issues where random switches will stop working until I restart HA. So far it’s never been the same switch twice. I’ve turned on debug for the insteon plm component and I’m not seeing any errors, it just stops after a few lines. This is what I see in the log when I try to turn on/off a switch that has stopped responding …

2018-05-02 19:24:54 DEBUG (MainThread) [insteonplm.devices] Starting Device._send_msg
2018-05-02 19:24:54 DEBUG (MainThread) [insteonplm.devices] Ending Device._send_msg
2018-05-02 19:24:54 DEBUG (MainThread) [insteonplm.devices] Starting Device._process_send_queue

Normally I see something like this for a switch that is working …

2018-05-02 19:24:23 DEBUG (MainThread) [insteonplm.devices] Starting Device._send_msg
2018-05-02 19:24:23 DEBUG (MainThread) [insteonplm.devices] Ending Device._send_msg
2018-05-02 19:24:23 DEBUG (MainThread) [insteonplm.devices] Starting Device._process_send_queue
2018-05-02 19:24:23 DEBUG (MainThread) [insteonplm.devices] Attempt to acquire lock
2018-05-02 19:24:23 DEBUG (MainThread) [insteonplm.devices] Lock acquired
2018-05-02 19:24:23 DEBUG (MainThread) [insteonplm.plm] Starting: send_msg
2018-05-02 19:24:23 DEBUG (MainThread) [insteonplm.plm] Ending: send_msg
2018-05-02 19:24:23 DEBUG (MainThread) [insteonplm.devices] Ending Device._process_send_queue
2018-05-02 19:24:23 DEBUG (MainThread) [insteonplm.plm] Aquiring write lock
2018-05-02 19:24:23 DEBUG (MainThread) [insteonplm.plm] True
2018-05-02 19:24:23 DEBUG (MainThread) [insteonplm.plm] Writing message: {‘code’: 0x62, ‘address’: 37.66.E4, ‘flags’: 0x00, ‘cmd1’: 0x13, ‘cmd2’: 0x00, ‘acknak’: 0xNone}
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.plm] Starting: data_received
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.plm] Received 20 bytes from PLM: b’02623766e40013000602503766e449f958201300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.plm] Total buffer: b’5820130002623766e40013000602503766e449f958201300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.plm] Starting: _peel_messages_from_buffer
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’5820130002623766e40013000602503766e449f958201300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’20130002623766e40013000602503766e449f958201300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’130002623766e40013000602503766e449f958201300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’0002623766e40013000602503766e449f958201300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’0013000602503766e449f958201300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’13000602503766e449f958201300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’000602503766e449f958201300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’0602503766e449f958201300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’58201300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’201300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’1300’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’00’
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.plm] Finishing: _peel_messages_from_buffer
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.plm] Messages in queue: 2
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.plm] Processing message {‘code’: 0x62, ‘address’: 37.66.E4, ‘flags’: 0x00, ‘cmd1’: 0x13, ‘cmd2’: 0x00, ‘acknak’: 0x06}
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.plm] Processing message {‘code’: 0x50, ‘address’: 37.66.E4, ‘target’: 49.F9.58, ‘flags’: 0x20, ‘cmd1’: 0x13, ‘cmd2’: 0x00}
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.plm] Last item in self._recv_queue reached.
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.plm] Finishing: data_received
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.devices] Starting Device._receive_message
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.devices] Got Message ACK
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.devices] Look for direct ACK
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.devices] Ending Device._receive_message
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.devices] Starting Device._receive_message
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.devices] Got Direct ACK message
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.devices] Lock is locked
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.devices] Ending Device._receive_message
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.devices] Starting Device._wait_for_direct_ACK
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.devices] Releasing lock
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.devices] Ending Device._wait_for_direct_ACK
2018-05-02 19:24:24 DEBUG (MainThread) [insteonplm.plm] True
2018-05-02 19:24:26 DEBUG (MainThread) [insteonplm.plm] No new messages received.

What is your platform? Whenever you see “Trimming leading buffer garbage” it is usually an issue with the serial or USB port config.

It’s a Raspberry Pi. I had tried switching to a different USB port and still had the issue today.

hass.io or Hassbian?

Manual install of HA on Raspbian within a python 3.6 virtual env.

Right now I have two switches not working. All others work fine. Based on what I’ve experienced over the last few weeks those two switches will continue to not work and never automagically start working again until HA is restarted.

Thanks for your help.

If you change the switch manually, does HA recognize the change? How long have you had this setup? Did it break recently, is it new, or has it been broken for a few releases?

I’ve had this HA setup minus the insteon plm component for a long time. I started using the insteon plm about a month ago, starting with .67 I believe. It has behaved this way since I started using it.

Yes, HA registers the change in state if I change the switch manually. But HA still cannot control after changing it manually. Here is what shows in the logs when I change the switch manually …

2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.plm] Starting: data_received
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.plm] Received 11 bytes from PLM: b’025037399b49f958451101’
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.plm] Total buffer: b’f958251300025037399b49f958451101’
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.plm] Starting: _peel_messages_from_buffer
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’f958251300025037399b49f958451101’
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’58251300025037399b49f958451101’
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’251300025037399b49f958451101’
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’1300025037399b49f958451101’
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’00025037399b49f958451101’
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’f958451101’
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’58451101’
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’451101’
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’1101’
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Buffer content: b’01’
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.messages] Trimming leading buffer garbage
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.plm] Finishing: _peel_messages_from_buffer
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.plm] Messages in queue: 1
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.plm] Processing message {‘code’: 0x50, ‘address’: 37.39.9B, ‘target’: 49.F9.58, ‘flags’: 0x45, ‘cmd1’: 0x11, ‘cmd2’: 0x01}
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.plm] Last item in self._recv_queue reached.
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.plm] Finishing: data_received
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.devices] Starting Device._receive_message
2018-05-03 21:48:21 DEBUG (MainThread) [insteonplm.devices] Ending Device._receive_message

And then trying to control from HA again (no response and switch state changes back in HA to actual) …

2018-05-03 21:57:29 DEBUG (MainThread) [insteonplm.devices] Starting Device._send_msg
2018-05-03 21:57:29 DEBUG (MainThread) [insteonplm.devices] Ending Device._send_msg
2018-05-03 21:57:29 DEBUG (MainThread) [insteonplm.devices] Starting Device._process_send_queue

If I create test code are you able to install manually from a git repository? Is that something you are technically comfortable with?

It’s not something I’ve done before, but with a little help I could probably do it. I guess that’s different than just dropping in files?

If you can untar/unzip a file and run pip3 you should be good.

Good news is I see part of the problem on why these devices never recover. I can create a patch for that. The bad news is I still think there is an underlying issue with your USB setup that may be driver related. I have had two other people with similar issues (i.e. "Trimming leading buffer garbage’) and a reinstall of the OS was required. The thing is they were not on Raspbian so I cannot speak to how that works with USBs.

In your logs can you search for any message that says: 'No direct ACK messages received."?

I don’t have that in my logs. Do I need to change my logger settings?

logger:
default: warn
logs:
insteonplm: debug
homeassistant.components.insteon_plm: debug
homeassistant.components.light.insteon_plm: debug
homeassistant.components.switch.insteon_plm: debug

Your log settings are correct. I am having trouble recreating the issue. I know a way to fix it but I am concerned it will create other problems. I will keep trying to recreate it.

Thanks! Let me know if there is anything else I can provide.

OK, I think I was able to recreate the issue but it would help to confirm the situation. Can you run the following command at the Hassbian level?
cat home-assistant.log | grep -E '1A.2B.3C' | grep -E 'Writing message|Processing message'
Replace 1A.2B.3C with the INSTEON address of a device that is not working. Use upper case for any letters.

Awesome! I had an extended power outage yesterday and everything got reset and so all of the switches are working right now. It takes anywhere from a couple hours to a couple days before a switch stops responding to HA. I’ll respond back with the log entries once that happens. Thanks.

I have posted a modification to my personal repo. You can find the file here:
https://github.com/teharris1/python-insteonplm/archive/stop-responding.tar.gz

Copy the file above on to your RasperryPi and untar it via:
tar -xvf stop-responding.tar.gz
This will create a directory called python-insteonplm-stop-responding. Change directory to this new directory. In your virtual environment use:
pip uninstall insteonplm
pip install .
This will uninstall the current insteonplm and install the patched version. Run it for a few days and tell me if the issue reoccurs.

Keep in mind I am still concerned about the messages related to trimming leading buffer garbage.

What is happening is that HA is sending a message to the PLM to send to the device. The PLM is supposed to send a acknowledgement of that message but the acknowledgement is getting garbled. So HA thinks that the message has not sent. The device that the message is intended for cannot send another message because the last message is still out there in limbo. The patch forces the original message to be resent even though it may have been sent correctly to begin with. But until the message is confirmed to be sent, that device is stuck.

The root cause of your issue is related to a likely underlying issue with your USB devices. I would suggest reinstalling from the ground up unfortunately.

Hold off downloading that file. I did some more testing and found an issue. I will let you know when it is ready.

OK it is ready. By the way, I am assuming you are using HA 0.67.x or 0.68.x. If you are using something less than 0.67 or if you are using 0.69.0b0 let me know because I will have to change the version number to correspond with the expected version.