PySerial threaded thread already stopped when trying to use in HA, but standalone works fine?

I’m working on a component for the Elk M1 Gold (and related) alarm / integration panels. I started with building a stand alone bit of Python which seemed to work reasonably well (at least with limited testing of processing Elk messages and so on, not everything implemented yet by far), and I used the serial.threaded version of PySerial, which worked fine - when new messages come in, they get parsed and put into a deque to be processed later whenever the main code gets around to it.

However, this breaks when I try to turn it into a HA component, and I’m wondering if it’s related to whatever special sauce HA is using with threading and coroutines and all that. https://home-assistant.io/developers/asyncio/ implies that you can still make non-async stuff, and that’s what I was going to do at first (since my existing code wasn’t using async / coroutines, and async PySerial is still experimental, apparently).

However, when I spin up the PySerial thread to make the connection to the Elk (in my case via TCP socket, but using PySerial means RS-232 can be supported with no changes other than setting the ‘host’ accordingly to a device), and call it’s connect method, I get an error that the thread has already stopped.

I’m not sure if this is specific to trying to use some other threaded thing from within HA, and there’s some other way of handling it without going to asyncio, or … ? I also don’t entirely understand how HA handles giving time to the various components to do things yet, so perhaps going serial.threaded is overkill and just plain serial would work (but this might preclude me from being able to use async methods elsewhere in the component/sensors/etc?).

This is my first time doing anything interesting in Python, I’ve dabbled in the past but nothing involving threads and such.

This is on the current (?) docker image from 9 days ago, image id b2eb6220af84

2017-05-17 19:59:18 ERROR (MainThread) [homeassistant.setup] Error during setup of component elkm1
Traceback (most recent call last):
  File "/usr/src/app/homeassistant/setup.py", line 188, in _async_setup_component
    None, component.setup, hass, processed_config)
  File "uvloop/future.pyx", line 230, in __iter__ (uvloop/loop.c:110600)
  File "uvloop/future.pyx", line 432, in uvloop.loop.BaseTask._fast_wakeup (uvloop/loop.c:113980)
  File "uvloop/future.pyx", line 101, in uvloop.loop.BaseFuture._result_impl (uvloop/loop.c:108900)
  File "/usr/local/lib/python3.5/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/elkm1.py", line 1275, in setup
    ELK = PyElk(address=host, usercode=code, log=_LOGGER)
  File "/config/custom_components/elkm1.py", line 1018, in __init__
    self.connect(address)
  File "/config/custom_components/elkm1.py", line 1050, in connect
    self._connectionTransport, self._connectionProtocol = self._connectionThread.connect()
  File "/usr/local/lib/python3.5/site-packages/serial/threaded/__init__.py", line 238, in connect
    raise RuntimeError('already stopped')
RuntimeError: already stopped

Here’s the relevant code that it’s erroring at:

1046     def connect(self, address):
1047         self._connection = serial.serial_for_url(address, timeout=1)
1048         self._connectionThread = serial.threaded.ReaderThread(self._connection, LineHandler) 
1049         self._connectionThread.start()
1050         self._connectionTransport, self._connectionProtocol = self._connectionThread.connect()

What does your protocol LineHandler look like? Does the connection_made method of your protocol get called successfully? Try adding some debug print statements in the run method of the serial ReaderThread class, to figure out at what point the thread is marked not alive.

1 Like
 904 class LineHandler(serial.threaded.LineReader):
 905     _pyelk = None
 906
 907     def set_pyelk(self, pyelk):
 908         self._pyelk = pyelk
 909
 910     # Implement Protocol class functions for Threaded Serial
 911     def connection_made(self, transport):
 912         super(LineHandler, self).connection_made(transport)
 913         self._pyelk._connected = True
 914
 915     def handle_line(self, data):
 916         # Validate event and add to incoming buffer
 917         self._pyelk.elk_event_enqueue(data)
 918
 919     def connection_lost(self, exc):
 920         if exc:
 921             traceback.print_exc(exc)
 922         self._pyelk._connected = False

I added some _LOGGER.error() calls in the LineHandler, and the connection_made method does get called, then seemingly immediately the connection_lost method is called.

The code works fine when run outside of HA (even inside HA’s docker, I tested to eliminate some problem with the docker container,just not in HA itself), but I can’t think of any reason HA should cause it to become disconnected?

I’m not sure how to go about trying to debug the PySerial code itself. If I was running in interactive mode I could use pdb, or even just write to stdout, but not sure how I can get any info out of it from inside HA. I tried using logging, but I didn’t seem to get any output, likely because there’s no way to pass in the same instance that HA is using.

Looking at serial.threaded, it would appear it immediately or almost immediately goes from being connected (line 185) to not connected (dropping out of the while loop and going disconnected).

I get logging output after super(LineHandler, self).connection_made(transport) in my LineHandler, so it should be connecting (rather than failing inside the super and throwing an exception there), so probably the first exception block from 186…190 doesn’t trigger. So somehow self.alive or self.serial.is_open must be changing to drop it out of the while loop?

https://github.com/pyserial/pyserial/blob/master/serial/threaded/init.py#L179

Here’s my current code, please excuse the badness etc :smiley:

I wasn’t planning to put any of it up until it was both functional and not full of ugly, but maybe it will help find my problem. I can just abuse git to retroactively make it look like my first commit was stellar :stuck_out_tongue:

It would go in config/custom_components/

Just add to configuration something like

elkm1:
  host: socket://1.2.3.4:2101
  code: 1234

Alternatively host: /dev/ttyUSB0 etc, though you may have a hard time testing it past making a connection without an actual Elk. I think it should just time out all the attempts to gather information from the Elk as long as it’s connected to something, though it would take a long time to finish starting. If you connect to something you can see the serial / TCP stream on, you’d at least see it’s attempts to communicate as proof it’s “working”.

Just converted my HA version back to standalone and discovered I’d introduced a “race condition” while doing some cleanup, but it wasn’t the cause of the real problem, fixing it had no effect. Updated github with the standalone test version - just run edit test_elkm1.py to set code and host and then run it with python. Only tested with python3.5.

Well, maybe that made a difference on HA side, after all, though it didn’t seem to matter on standalone. At least I don’t appear to have fiddle anything else, and now I got error about component failing to initialize instead (which is because my quick fix to my “race condition” meant that it wasn’t flagged as connected so it would return False in the HA component side of things … ). Commenting out the test for connected gets it to finish startup without further issues, so it looks like I may have figured out my problem … thanks for helping me figure it out :facepalm:

TL;DR : Attempting to set a property of a (not yet set) object from within LineHandler to set the connected flag so that in turn the HA component could see it was connected. I changed how I was doing this and broke it, while converting from standalone to component. At first fixing this didn’t appear to fix it, but maybe I just hadn’t correctly copied the updated version to custom_components/ … For whatever reason, the actual error didn’t appear in the log when running under HA, instead it just immediately closed the connection and fired the connection closed method and so on. Standalone, I got the expected trace dump and crash.

1 Like

You can add an __init__ method to your protocol where you add elk, and instantiate the protocol before passing it to the serial reader thread. You can use a lambda expression to pass the instance instead of a class to the reader, to workaround the reader calling it again.

I may look into that (lambdas and such), at least so I know how to do it, and perhaps make setting the Elk instance cleaner by passing it instead of setting it via a method, but for now I solved the immediate issue by just having Elk query serial’s connection state instead of having the LineHandler set the connection state in Elk in response to calls from serial. :slight_smile:

I’m now working on implementing sensor, binary_sensor, etc platforms.