APsystems APS ECU R local inverters data pull

What’s the checksum, and how do I calculate it?

Here are the issues in my log over the last day or so. While there are errors, they haven’t caused any stability issues with my HA instance. And since this happens with bad data, I’m not sure what other behavior you’d like. I can certainly catch these exceptions and maybe log them in a more useful format, but if I don’t get data from the ECU-R, I’m not sure what I should do.

2021-01-03 02:33:11 ERROR (MainThread) [custom_components.apsystems_ecur] Unexpected error fetching apsystems_ecur data: invalid literal for int() with base 16: b''
2021-01-03 02:34:11 ERROR (MainThread) [custom_components.apsystems_ecur] Unexpected error fetching apsystems_ecur data: invalid literal for int() with base 16: b''
2021-01-03 03:03:29 ERROR (MainThread) [custom_components.apsystems_ecur] Unexpected error fetching apsystems_ecur data: [Errno 113] Host is unreachable
2021-01-03 03:04:49 ERROR (MainThread) [custom_components.apsystems_ecur] Unexpected error fetching apsystems_ecur data: ECU query didn't return minimum 16 bytes, no inverters active.
2021-01-03 11:33:16 ERROR (MainThread) [custom_components.apsystems_ecur] Unexpected error fetching apsystems_ecur data: ECU query didn't return minimum 16 bytes, no inverters active.
2021-01-03 11:53:16 ERROR (MainThread) [custom_components.apsystems_ecur] Unexpected error fetching apsystems_ecur data: invalid literal for int() with base 16: b''
2021-01-03 13:28:16 ERROR (MainThread) [custom_components.apsystems_ecur] Unexpected error fetching apsystems_ecur data: invalid literal for int() with base 16: b''
2021-01-04 02:33:21 ERROR (MainThread) [custom_components.apsystems_ecur] Unexpected error fetching apsystems_ecur data: invalid literal for int() with base 16: b''
2021-01-04 02:34:21 ERROR (MainThread) [custom_components.apsystems_ecur] Unexpected error fetching apsystems_ecur data: invalid literal for int() with base 16: b''

I have some other, very similar errors from other “official” components, like the DenonAVR and the MyQ ones.

OK, the checksum is given in the received data string (bytes 6-9). It is simply the length the string should have if everything was received correctly (look at my datasheet in the previous post in this thread). For example, the string “APS1100160001END” has a length of 16, and bytes 6-9 read “0016”. So no difficult CRC or anything :slight_smile:

In pseudo code I would think of something like: if the length of the data string <> bytes 6-9, exit the function and wait for the next pull, which will hopefully be more successful.
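As a minimal Python sketch of that check: the function name `length_matches` is made up for illustration, and whether a trailing "\n" counts toward the declared length is an assumption, so it is stripped before comparing.

```python
def length_matches(data: bytes) -> bool:
    """Return True if bytes 6-9 of the response equal its actual length.

    Assumption: a trailing newline is not part of the declared length.
    """
    payload = data.rstrip(b"\n")
    if len(payload) < 9:
        # Too short to even contain the length field.
        return False
    try:
        declared = int(payload[5:9])  # bytes 6-9, counting from 1
    except ValueError:
        # The length field is not made of ASCII digits.
        return False
    return declared == len(payload)


if __name__ == "__main__":
    print(length_matches(b"APS1100160001END"))
```

A caller could skip parsing and keep the previous values whenever this returns False.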

If you happen to look within the same minute as the error occurred, there is no need to break a sweat: you’re just looking at slightly older data, not at an entity error. It saves log lines too :slight_smile:

So until now we’ve seen the following possible errors:
[Errno 113] Host is unreachable
[Errno 111] Connection refused
invalid literal for int() with base 16: b''
ECU query didn’t return minimum 16 bytes, no inverters active.

Ok, checking string length shouldn’t be too difficult - as long as I get that piece of data. I’ll work on that, but probably won’t get to it until this evening.

No problem, I’m so thankful for the things you’ve done already!

I wasn’t using service mode anymore; I have these issues in normal mode and overnight. I’m moving the device to a different Wi-Fi setup, as I have seen other devices that don’t deal well with a multi-AP Wi-Fi SSID. I’ll see if stability changes now.

@ksheumaker, also consider appending \n, as it is a common part of serial transport protocols. I’ve seen it after every END in the sniffing sessions. Maybe that reduces errors.

So it would make the query suffix “END\n” if I got it right? Btw., I didn’t detect \r (carriage return) during sniffing sessions, just did a double check.

Let’s see what \n does for this issue. You used it in the beginning, and what triggered me was that incorrectly formatted send strings (2 digits too many in the ECU-ID) are still interpreted, but with wrong responses (no closing END in the string). So I’d conclude that incorrectly formatted send strings can produce incorrect data responses. I have implemented the \n as suggested and HA still runs (fingers crossed).

Sure, good catch. I can add the \n. And yes, you are correct, you can change this:


        self.ecu_query = 'APS1100160001END'
        self.inverter_query_prefix = 'APS1100280002'
        self.inverter_query_suffix = 'END'

        self.inverter_signal_prefix = 'APS1100280030'
        self.inverter_signal_suffix = 'APS1100280030'

to

        self.ecu_query = 'APS1100160001END\n'
        self.inverter_query_prefix = 'APS1100280002'
        self.inverter_query_suffix = 'END\n'

        self.inverter_signal_prefix = 'APS1100280030'
        self.inverter_signal_suffix = 'END\n'

Looks like I had a bug in inverter_signal_suffix, where it wasn’t END but the same as the prefix. So that also might fix some things.

Wow, I hadn’t looked at the code in detail; good thing you bumped into it. This should bring an improvement. Nevertheless, it is good practice to include that checksum. And I think the suffix could also become a single global suffix variable. That saves copy and paste errors :wink:
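A shared suffix could look roughly like this. The names `QUERY_SUFFIX` and `build_query` are hypothetical and not taken from the integration, the ECU-ID below is a placeholder, and placing the ECU-ID between prefix and suffix is an assumption based on the length of 28 declared in the prefix.

```python
# One definition of the terminator, reused by every query.
QUERY_SUFFIX = "END\n"


def build_query(prefix: str, payload: str = "") -> str:
    """Join prefix, optional payload (e.g. the ECU-ID) and the shared suffix."""
    return f"{prefix}{payload}{QUERY_SUFFIX}"


if __name__ == "__main__":
    placeholder_ecu_id = "0" * 12  # placeholder, not a real ECU-ID
    print(repr(build_query("APS1100160001")))
    print(repr(build_query("APS1100280002", placeholder_ecu_id)))
    print(repr(build_query("APS1100280030", placeholder_ecu_id)))
```

With a single constant, a bug like the signal suffix accidentally holding the prefix cannot happen per query.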

OK, new version pushed to GitHub. I fixed the ‘END’ issue on the one query and added ‘\n’ to all queries to the ECU. I also implemented the “checksum” by comparing the length of the returned data. This version also attempts to be a little better with error handling, though more probably still needs to be done there. You should now see warning messages like this in your log: “Using cached data from last successful communication from ECU. Error: XXXX”. This covers checksum failures as well as connection-type issues.

Maybe we should eventually return unavailable if it fails X times in a row, or if the cached data is more than X minutes old? Otherwise you could think things are working correctly until you check the logs.
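One way the cached-data fallback with an "unavailable" cutoff could be sketched is below. The class `CachedQuery`, its parameters, and the default thresholds are all hypothetical; the real integration may structure this quite differently. In Home Assistant the final `raise` would presumably be translated into the coordinator's update-failed error so that entities show as unavailable.

```python
import time
from typing import Any, Callable, Optional


class CachedQuery:
    """Serve the last good result on failure, but only for a limited time."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        max_failures: int = 5,           # illustrative default
        max_age_seconds: float = 600.0,  # illustrative default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_failures = max_failures
        self._max_age = max_age_seconds
        self._clock = clock
        self._cached: Optional[Any] = None
        self._cached_at = 0.0
        self._failures = 0

    def update(self) -> Any:
        try:
            data = self._fetch()
        except Exception as err:
            self._failures += 1
            too_many = self._failures >= self._max_failures
            too_old = (
                self._cached is None
                or self._clock() - self._cached_at > self._max_age
            )
            if too_many or too_old:
                # Give up on the cache so the caller can report "unavailable".
                raise RuntimeError(f"ECU data unavailable: {err}") from err
            return self._cached
        # Success: reset the failure counter and refresh the cache.
        self._failures = 0
        self._cached = data
        self._cached_at = self._clock()
        return data
```

Passing the clock in as a parameter is only there to make the age limit easy to test.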

The bigger change is that querying the ECU is now done correctly (I think) with respect to the way Home Assistant does asynchronous I/O. In previous versions there was a chance that querying the ECU could block the main HA thread. I believe this has been fixed, but I’m pretty much a novice when it comes to writing HA integrations.
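The general idea of keeping a blocking socket query off the event loop can be sketched with plain asyncio as below. This is a stand-in, not the integration's actual code: Home Assistant has its own helper for running jobs in an executor, and `blocking_query` here only simulates a slow query.

```python
import asyncio
import time
from typing import Callable


def blocking_query() -> bytes:
    """Simulated blocking ECU query (a real one would use a socket)."""
    time.sleep(0.1)
    return b"APS1100160001END\n"


async def async_query(query: Callable[[], bytes] = blocking_query) -> bytes:
    """Run the blocking call in a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, query)


if __name__ == "__main__":
    print(asyncio.run(async_query()))
```

Calling the blocking function directly from a coroutine would stall everything else on the loop for as long as the ECU takes to answer.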

Anybody else getting some of these?
It doesn’t recover from this until I reset the ECU.

Logger: custom_components.apsystems_ecur
Source: custom_components/apsystems_ecur/APSystemsECUR.py:60
Integration: APSystems PV solar ECU-R ([documentation](https://github.com/ksheumaker/homeassistant-apsystems_ecur))
First occurred: 9:22:07 AM (56 occurrences)
Last logged: 10:17:07 AM

Unexpected error fetching apsystems_ecur data: [Errno 111] Connection refused

Traceback (most recent call last):
  File "/srv/hass/lib/python3.7/site-packages/homeassistant/helpers/update_coordinator.py", line 144, in async_refresh
    self.data = await self._async_update_data()
  File "/srv/hass/lib/python3.7/site-packages/homeassistant/helpers/update_coordinator.py", line 132, in _async_update_data
    return await self.update_method()
  File "/home/pi/PI/.homeassistant/custom_components/apsystems_ecur/__init__.py", line 47, in async_update_data
    return await ecu.async_query_ecu()
  File "/home/pi/PI/.homeassistant/custom_components/apsystems_ecur/APSystemsECUR.py", line 55, in async_query_ecu
    return self.query_ecu()
  File "/home/pi/PI/.homeassistant/custom_components/apsystems_ecur/APSystemsECUR.py", line 60, in query_ecu
    sock.connect((self.ipaddr,self.port))
ConnectionRefusedError: [Errno 111] Connection refused

Last night I only applied the \n, and this morning I’ve seen the same error (connection refused) once. Apart from the “invalid literal for int() with base 16: b''” errors (5 times), I haven’t had any issues. So I expect that the \n and/or the bugfix in the suffix took care of the “ECU query didn’t return minimum 16 bytes, no inverters active.” error. The 56 occurrences you have of connection refused are really a lot! Is your IP setting correct? Your router should be able to assign a fixed IP address based on the MAC address of the ECU-R.

There are best practices to be found for socket handling. Socket code needs to be robust, to minimize errors or to recover from them. I’m more used to event-driven receive methods and don’t know whether those are possible in Python, but I’ve seen a while loop used to wait for data reception to complete. Also, if sockets were not torn down correctly, a connection may be refused. Therefore the use of sock.shutdown versus sock.close needs investigating.

From what I read, calling socket close is not strictly required, but it may be good to call it anyway in case of exceptions. If a socket does remain open on the ECU, the connection refused error would indeed not clear automatically (as happened to me). I have installed the latest version now and expect improved stability :slight_smile:

I’ve been running my new version for about 16 hours, including overnight. Here’s what I got in the logs:

2021-01-05 02:32:44 WARNING (MainThread) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: [Errno 111] Connection refused)
2021-01-05 02:33:44 WARNING (MainThread) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: invalid literal for int() with base 16: b'')
2021-01-05 03:03:02 WARNING (MainThread) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: [Errno 113] Host is unreachable)
2021-01-05 08:23:06 WARNING (MainThread) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: invalid literal for int() with base 10: b'')

I can probably still fix the invalid literal int error, but it’s caused by bad data at some point, so I’ll have to look into it a little more. The connection refused error is, I think, something we will have to live with due to the design of the ECU-R device.
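For what it’s worth, Python produces exactly this message when int() is handed an empty bytes object, which is what a slice past the end of a too-short response gives. Below is a small sketch of a guard, assuming the parser hex-encodes slices of the raw response before converting them; `read_int` is a hypothetical helper name, not the integration's.

```python
import binascii


def read_int(data: bytes, start: int, length: int = 2) -> int:
    """Read `length` raw bytes at `start` as a big-endian integer."""
    chunk = data[start:start + length]
    if len(chunk) != length:
        # Without this guard, an empty chunk would surface as
        # "invalid literal for int() with base 16: b''".
        raise ValueError(
            f"response too short: wanted {length} bytes at offset {start}, "
            f"got {len(chunk)} (total length {len(data)})"
        )
    return int(binascii.b2a_hex(chunk), 16)


if __name__ == "__main__":
    print(read_int(b"\x01\x02", 0))
    try:
        read_int(b"\x01\x02", 2)
    except ValueError as err:
        print(err)
```

The error then says how short the response was, rather than only that an empty value could not be parsed.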

FYI the code is doing both a socket shutdown and close. This would not be an issue with the connection refused errors, since the socket didn’t even open. I do almost all of the querying of the ecu before parsing anything, so it should be closed almost every time, even if an exception is thrown. I can think of one corner case, that I’ll work on.
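A common pattern for making sure the socket is always released, even when an exception is thrown partway through, is a try/finally around the whole exchange. The sketch below is generic and not the integration's code; `query_once`, the timeout, and the buffer size are illustrative assumptions.

```python
import socket


def query_once(host: str, port: int, query: bytes,
               timeout: float = 5.0, bufsize: int = 1024) -> bytes:
    """Send one query and return the first chunk of the reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.connect((host, port))
        sock.sendall(query)
        return sock.recv(bufsize)
    finally:
        try:
            # Tell the peer we are done in both directions.
            sock.shutdown(socket.SHUT_RDWR)
        except OSError:
            # Never connected, or the peer already went away.
            pass
        # Always release the local socket, whatever happened above.
        sock.close()
```

If connect() itself fails, shutdown raises and is ignored, close still runs, and the original connection error reaches the caller.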

Good results at daylight today (installed the update before sunrise this morning).

2021-01-05 13:00:26 WARNING (MainThread) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: invalid literal for int() with base 10: b'')
2021-01-05 14:16:45 WARNING (MainThread) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: invalid literal for int() with base 10: b'')
2021-01-05 15:17:19 WARNING (MainThread) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: invalid literal for int() with base 10: b'')
2021-01-05 16:28:39 WARNING (MainThread) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: invalid literal for int() with base 10: b'')

I agree on connection refused; with these error numbers everything is within an acceptable range. It would be nice if the invalid literal error could be fixed, though. Maybe there is still data waiting to be read; have you tried enlarging the buffers? I’ve now set mine to 4K. Oh, and I haven’t seen any entity errors on the dashboard today!

Edit: The 4K buffer is a wild guess and works fine, but I’ve found this page: https://code.activestate.com/recipes/408859/ where I think they have methods to work around blocking I/O. The comments there also advise using a short delay between send statements and using “\n” as an end marker (which is present in the data from the ECU-R after END). Maybe this will help reduce errors? I would then use “END\n” as the end marker, to distinguish it uniquely from other data.
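A receive loop that keeps reading until the end marker arrives, instead of relying on a single large buffer, could be sketched as below. The name `recv_until` and the size limit are assumptions for illustration, and the caller is expected to have set a timeout on the socket so the loop cannot wait forever.

```python
import socket


def recv_until(sock: socket.socket, marker: bytes = b"END\n",
               bufsize: int = 4096, max_bytes: int = 65536) -> bytes:
    """Read from sock until the data received so far ends with marker."""
    received = bytearray()
    while not received.endswith(marker):
        chunk = sock.recv(bufsize)
        if not chunk:
            # The peer closed the connection before sending the marker.
            raise ConnectionError("connection closed before end marker")
        received.extend(chunk)
        if len(received) > max_bytes:
            raise ValueError("response exceeds size limit without end marker")
    return bytes(received)
```

Because binary payload bytes could by chance look like the marker, it seems sensible to combine this with the length check rather than trust the marker alone.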

I can add a delay between statements, and I’ll look into refactoring the send/recv parts of the code, as it’s certainly not the best.

I have some debugging code in my instance right now. I’m just waiting for the int error to happen to capture the debug output to see the full hex string of the data it’s parsing. Once I have that I should be able to fully understand what’s happening.

So far it hasn’t happened yet, but I’ll keep an eye on it overnight and into tomorrow.

I changed a bit more in my network and now have a better idea of where the issue is: probably in the cabling.

Knowing that, and seeing no more connection refused errors, I think the device being on a Wi-Fi connection makes it sometimes unreliable, and the process that serves our data pulls doesn’t handle connection hiccups very nicely. There is nothing we can do about that (as you both said already, so I’m just confirming).

Running overnight, it seems I only have 2 invalid literal errors and no others anymore. I think I will retry the service mode in a bit :slight_smile:

Only four “invalid literal” error messages this morning, no connection refused or unreachable host errors.

Good to hear you have the connection refused errors a bit under control, Sander. Zigbee is on the same frequency as Wi-Fi, so one might expect some interference now and then, but the error numbers you had are way above what you could normally expect.

Unfortunately I had a bug in my debug code, so when I finally got an invalid literal, I didn’t get the data I needed. I have that fixed and am waiting for it to happen again.

I did finally get my dashboard how I wanted it. I now have an accurate layout of the panels on the roof. I color coded them so that all the panels on the same inverter are the same color.

I used the bar-card Lovelace component to display the panels

I also used the browser_mod integration to override the action when you click on a panel, and it pulls up a better “more info” screen

Too bad it’s a gloomy, rainy day, so I can’t see its full potential :slight_smile:
