Insteon plm - switches stop responding

Your log settings are correct. I am having trouble recreating the issue. I know a way to fix it but I am concerned it will create other problems. I will keep trying to recreate it.

Thanks! Let me know if there is anything else I can provide.

OK, I think I was able to recreate the issue but it would help to confirm the situation. Can you run the following command at the Hassbian level?
cat home-assistant.log | grep -E '1A.2B.3C' | grep -E 'Writing message|Processing message'
Replace 1A.2B.3C with the INSTEON address of a device that is not working. Use upper case for any letters.
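If grep is not handy, the same filter can be sketched in Python. This is only an illustration of what the command above does; the function name filter_plm_log is made up for this example and is not part of insteonplm or Home Assistant.

```python
import re


def filter_plm_log(lines, address):
    """Return the log lines that mention `address` and a write/process event.

    `filter_plm_log` is a hypothetical helper for this example only.
    """
    # Upper-case the address, as requested above. The dots are left as regex
    # wildcards, which mirrors how grep -E treats '1A.2B.3C'.
    addr_re = re.compile(address.upper())
    event_re = re.compile(r"Writing message|Processing message")
    return [ln for ln in lines if addr_re.search(ln) and event_re.search(ln)]
```

To use it, open home-assistant.log, pass the file object and the device address to filter_plm_log, and print the lines it returns.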

Awesome! I had an extended power outage yesterday and everything got reset and so all of the switches are working right now. It takes anywhere from a couple hours to a couple days before a switch stops responding to HA. I’ll respond back with the log entries once that happens. Thanks.

I have posted a modification to my personal repo. You can find the file here:
https://github.com/teharris1/python-insteonplm/archive/stop-responding.tar.gz

Copy the file above onto your Raspberry Pi and untar it with:
tar -xvf stop-responding.tar.gz
This will create a directory called python-insteonplm-stop-responding. Change directory to this new directory. In your virtual environment use:
pip uninstall insteonplm
pip install .
This will uninstall the current insteonplm and install the patched version. Run it for a few days and tell me if the issue reoccurs.

Keep in mind I am still concerned about the messages related to trimming leading buffer garbage.

What is happening is that HA is sending a message to the PLM to pass on to the device. The PLM is supposed to send an acknowledgement of that message, but the acknowledgement is getting garbled, so HA thinks the message has not been sent. No further messages can go out to the device that the message was intended for, because the last message is still out there in limbo. The patch forces the original message to be resent even though it may have been sent correctly to begin with. But until the message is confirmed as sent, that device is stuck.
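A rough sketch of the acknowledge-and-resend idea described above, in Python. This is not the code from the patch. The names send_with_ack, write and read_ack are hypothetical stand-ins for the serial port, and the check assumes, as a simplification, that a good acknowledgement is the original message echoed back followed by the ACK byte 0x06.

```python
def send_with_ack(write, read_ack, message, max_attempts=3):
    """Write `message` and resend it until a clean acknowledgement arrives.

    `write` and `read_ack` are hypothetical callables standing in for the
    serial port. `read_ack` returns whatever bytes arrived during the wait
    window, which may be garbled or empty.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        write(message)
        reply = read_ack()
        # Assumption for this sketch: a good acknowledgement is the message
        # echoed back with a trailing 0x06. Anything else counts as garbled.
        if reply == message + b"\x06":
            return attempt
    # The message was never confirmed, so the device would stay "stuck".
    raise TimeoutError("no acknowledgement after %d attempts" % max_attempts)
```

The resend on a garbled reply is what frees up the device again, at the cost of sometimes sending a message that had already gone out correctly.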

The root cause of your issue is likely an underlying problem with your USB devices. Unfortunately, I would suggest reinstalling from the ground up.

Hold off downloading that file. I did some more testing and found an issue. I will let you know when it is ready.

OK, it is ready. By the way, I am assuming you are using HA 0.67.x or 0.68.x. If you are using anything older than 0.67, or if you are using 0.69.0b0, let me know because I will have to change the version number to match the expected version.

Yes I’m on .67. I’ll give it a go when I get home this evening. Thanks for your help on this.

I downloaded the files but haven’t applied them yet. Since my power outage I haven’t seen the trimming message in the logs. I’ll give it one more day running like this … I’m interested to see if the trimming messages come back at the same time the issue starts back up. I assume they will.

FYI, the trimming message is back and one of the switches has stopped responding. I’ll go ahead and apply your changes and see what happens.

@joshbish Any update?

So far so good. I haven’t had any switches stop responding to HA and I’ve not seen the trimming message back. Would you expect that I will at some point see the trimming in the logs again, or does your change prevent that?

The only adverse effect I’ve seen is that the response times are much longer. I have a scene at bedtime that turns off all the lights in the house, and while it usually takes a while, the time has almost doubled. I’m guessing that is because of messages being resent?

I’ve actually been curious about that … with the Insteon Hub Pro all of the switches would go on/off at the same time, instantly, but using the PLM with HA each light goes off one at a time and it takes a while if you have a lot of switches. Why is that? I assume it’s a limitation of the PLM only being able to send messages single-threaded.

I would still expect the ‘trimming’ message to come back at some point, since that is an issue at a lower level than the insteonplm library. As for the response time, I would not expect it to double, but it does not surprise me that it takes longer. The change waits for an acknowledgement of the sent message and, if it does not arrive, resends the message. That wait adds to the overall message handling time and increases the time between messages, but the result is increased reliability. That feels like a reasonable trade-off. Let me know if you disagree.
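To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. Every value fed into it is a made-up placeholder, not a measurement from the PLM, and scene_duration is a hypothetical helper. It only shows how a per-message wait and occasional resends add up when devices are commanded one at a time.

```python
def scene_duration(num_devices, send_time, ack_wait, resend_rate=0.0):
    """Estimate the seconds needed to command `num_devices` one at a time.

    All inputs are hypothetical: `send_time` and `ack_wait` are seconds per
    message, and `resend_rate` is the average number of extra sends per
    message (0.5 means every second message is sent twice).
    """
    per_message = send_time + ack_wait
    expected_sends = num_devices * (1 + resend_rate)
    return expected_sends * per_message
```

With placeholder values of 20 devices and 0.5 seconds per message, adding a 0.25 second wait takes the estimate from 10 to 15 seconds, and it grows further if some messages need to be resent.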

As for instant on/off of all lights, that is a feature in INSTEON that I have not enabled in the library. You are the second person to ask for that recently, so I will bump that up in priority.

Thinking about this more, it is possible that this change is helping to manage the ‘trimming’ issue, since there is now a pause while waiting for the ACK/NAK message, which would mean fewer conflicting read/write interactions with the USB port. This may result in fewer dropped bytes. I would not expect it to eliminate the issue, but it may be helping to minimize it.

I agree that it’s an acceptable trade off, just wanted you to be aware. I appreciate the support.

Still going good. Question, are these changes going to be merged into a HA release? In other words, will I lose this change if I upgrade HA in the future?

@joshbish, these are not in the latest release yet but now that I see your confirmation I will push it to the latest release. Thanks for confirming.

Just pushed a change to HA. It should be in before the next release.

FYI on a breaking change to both insteon_plm and insteonlocal. Please see this thread and give your input:

I’m fine with option 1.