Tuya checksum errors after a while

I have issues with a Tuya device running ESPHome.
It is an SPM02-D2TW, the Wi-Fi model.
The device uses a BK72xx-based CB2S module.

I have no issues installing ESPHome on the device, and it reports correctly.
Until it randomly doesn’t.
After running for a while, sometimes up to two weeks without issues, it starts to log “invalid message checksum” warnings and nothing gets updated or reported to Home Assistant.

[08:57:12][W][tuya:119]: Tuya Received invalid message checksum 07!=B5
[08:57:12][W][tuya:119]: Tuya Received invalid message checksum 00!=26
[08:57:12][W][tuya:119]: Tuya Received invalid message checksum 07!=88
[08:57:12][W][tuya:119]: Tuya Received invalid message checksum FD!=EA

I have added a restart button to the ESPHome YAML so that I can restart the device remotely:

button:
  - platform: restart
    name: "Restart"

Restarting with this button “fixes” the issue until the checksum errors start again.
Also, after the reboot I get the total energy that the device logged while ESPHome was reporting checksum errors.
That tells me the Tuya MCU keeps doing its job in the meantime.
It does mess up my energy dashboard, though.
The total energy is correct, but it looks like all of that energy was used during the hour I restarted the device, which sucks.

At the moment I have just added an automation that restarts the device once a day to avoid this, but I would like a proper fix.
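For reference, a daily restart like this can also run on the device itself instead of in Home Assistant. This is only a minimal sketch: the post does not say where the automation lives, and the `ha_time` and `restart_button` ids and the 04:00 schedule are made up for illustration. The `on_time` trigger only fires once the device has received a valid time.

```yaml
# Sketch only: ids and schedule are illustrative, not from the original config.
time:
  - platform: homeassistant
    id: ha_time
    on_time:
      # Fire once a day at 04:00:00 local time.
      - seconds: 0
        minutes: 0
        hours: 4
        then:
          - button.press: restart_button

button:
  - platform: restart
    id: restart_button
    name: "Restart"
```

This is still a workaround. It hides the checksum errors rather than explaining them.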

I suspect this is some kind of memory leak in ESPHome.
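One way to test the memory theory is to expose the free heap as a sensor and watch whether it trends downward before the checksum errors start. A minimal sketch using ESPHome's `debug` component, assuming it reports these values on the BK72xx/LibreTiny platform (not verified on this device). The sensor names are made up, and the `platform: debug` entry would need to be merged into the existing `sensor:` list.

```yaml
# Sketch only: assumes the debug component works on this BK72xx module.
debug:
  update_interval: 60s

sensor:
  - platform: debug
    # Free heap in bytes; a steady decline over days would support the leak theory.
    free:
      name: "Heap Free"
    # Longest main-loop duration; a rising value could point to a blocked loop instead.
    loop_time:
      name: "Loop Time"
```

If the free heap stays flat right up to the point where the errors begin, a leak becomes less likely and the UART handling is the more interesting suspect.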

I tried some logger settings: I set the level to WARN and also disabled UART logging with baud_rate: 0, since the Tuya MCU uses the UART and I thought the problem might be there. Neither change helped.

bk72xx:
  board: cb2s

logger:
  baud_rate: 0
  level: WARN

captive_portal:

uart:
  tx_pin: P11
  rx_pin: P10
  baud_rate: 9600

button:
  - platform: restart
    name: "Restart"

tuya:

sensor:
  - platform: wifi_signal
    name: "Signal Strength"
    update_interval: 60s
    internal: true
  - platform: tuya
    name: "Total Energy"
    sensor_datapoint: 1
    unit_of_measurement: "kWh"
    device_class: "energy"
    state_class: "total"
    accuracy_decimals: 2
    filters:
      - multiply: 0.01
  - platform: tuya
    name: "Frequency"
    sensor_datapoint: 101
    unit_of_measurement: "Hz"
    device_class: "frequency"
    state_class: "measurement"
    accuracy_decimals: 2
    filters:
      - multiply: 0.01
  - platform: tuya
    name: "L1 Voltage"
    sensor_datapoint: 102
    unit_of_measurement: "V"
    device_class: "voltage"
    state_class: "measurement"
    accuracy_decimals: 1
    filters:
      - multiply: 0.1
  - platform: tuya
    name: "L2 Voltage"
    sensor_datapoint: 105
    unit_of_measurement: "V"
    device_class: "voltage"
    state_class: "measurement"
    accuracy_decimals: 1
    filters:
      - multiply: 0.1
  - platform: tuya
    name: "L3 Voltage"
    sensor_datapoint: 108
    unit_of_measurement: "V"
    device_class: "voltage"
    state_class: "measurement"
    accuracy_decimals: 1
    filters:
      - multiply: 0.1
  - platform: tuya
    name: "N Current"
    sensor_datapoint: 2
    unit_of_measurement: "A"
    device_class: "current"
    state_class: "measurement"
    accuracy_decimals: 2
  - platform: tuya
    # this is powerfactor ?
    name: "Power Factor"
    sensor_datapoint: 15
    device_class: "power_factor"
    state_class: "measurement"
    accuracy_decimals: 2
    filters:
      - multiply: 0.01
  - platform: tuya
    name: "L1 Current"
    sensor_datapoint: 103
    unit_of_measurement: "A"
    device_class: "current"
    state_class: "measurement"
    accuracy_decimals: 3
    filters:
      - multiply: 0.001
  - platform: tuya
    name: "L2 Current"
    sensor_datapoint: 106
    unit_of_measurement: "A"
    device_class: "current"
    state_class: "measurement"
    accuracy_decimals: 3
    filters:
      - multiply: 0.001
  - platform: tuya
    name: "L3 Current"
    sensor_datapoint: 109
    unit_of_measurement: "A"
    device_class: "current"
    state_class: "measurement"
    accuracy_decimals: 3
    filters:
      - multiply: 0.001
  - platform: tuya
    name: "L1 Power"
    sensor_datapoint: 104
    unit_of_measurement: "W"
    device_class: "power"
    state_class: "measurement"
  - platform: tuya
    name: "L2 Power"
    sensor_datapoint: 107
    unit_of_measurement: "W"
    device_class: "power"
    state_class: "measurement"
  - platform: tuya
    name: "L3 Power"
    sensor_datapoint: 110
    unit_of_measurement: "W"
    device_class: "power"
    state_class: "measurement"
  - platform: tuya
    name: "Total Power"
    sensor_datapoint: 111
    unit_of_measurement: "W"
    device_class: "power"
    state_class: "measurement"
    accuracy_decimals: 1

I have a similar problem with an at4pw meter: although the datapoints are correct, the power and power factor sensors do not work. Every now and then they transmit the correct data to Home Assistant and then go back to being unavailable.
In the logs I see errors like “Tuya Received invalid message checksum 08!=87”. Resetting doesn’t change things, and neither do UART speeds other than 115200 or different values of rx_buffer_size. It’s strange, because this problem didn’t exist with OpenBeken. I believe there is some problem with the Tuya MCU component.

You appear to have this issue more or less constantly, while for me it develops over time.
The “band-aid” of a restart automation appears to keep the power meter reporting data all the time, probably because it never reaches whatever excessive memory usage triggers the problem.

What I’m thinking is that maybe the Tuya MCU sends too many packets at once, ESPHome can’t tell the received packets apart and gets them out of order, and then the checksum check fails.
Or the received packets get fragmented in memory when the memory leak/usage gets too large…
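To tell these two theories apart, it would help to see the raw bytes arriving from the MCU when the checksum errors start. A minimal sketch using the `debug` option of the `uart` component, with the pins and baud rate taken from the config above. The `bytes` and `timeout` values are guesses, and the hex dump is logged at DEBUG level, so it will not show up while the logger level is set to WARN.

```yaml
# Sketch only: flush thresholds are guesses; requires logger level DEBUG or lower.
uart:
  tx_pin: P11
  rx_pin: P10
  baud_rate: 9600
  debug:
    direction: BOTH
    # The tuya component already reads the bytes, so no dummy receiver is needed.
    dummy_receiver: false
    after:
      # Log after 64 collected bytes or 20 ms of silence, whichever comes first.
      bytes: 64
      timeout: 20ms
    sequence:
      - lambda: UARTDebug::log_hex(direction, bytes, ' ');
```

Tuya MCU frames start with the header bytes 55 AA, so a dump taken during the failure should show whether frames arrive intact, arrive with missing bytes, or are already corrupted on the wire.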

I wonder if handling only 1-2 datapoints would work better than handling all of them?
If it does, the memory leak theory becomes more plausible, though it might just extend the runtime until memory fills up.

I only have this issue with the Zemismart SPM02-D2TW.
But I don’t have other BK72xx devices running in “production” either.

I’ve had the same issue for a year or so with my solar hot water controller, and solved it the same way using an automation to trigger a reset if the data goes stale for more than an hour. The errors are logged every few minutes but don’t cause much issue in practice, depending on which data point is not being read correctly. It changes over time though, so sometimes temperature sensors can no longer be read and I get anomalies on the graphs.
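A stale-data restart of this kind could look roughly like the following Home Assistant automation. It is only a sketch, not the exact automation described above: both entity ids are hypothetical, and it assumes the ESPHome device exposes a restart button. The state trigger with `for:` and no `to:` fires when the sensor has not changed for an hour.

```yaml
# Sketch only: entity ids are hypothetical placeholders.
automation:
  - alias: "Restart controller when Tuya data goes stale"
    trigger:
      - platform: state
        entity_id: sensor.hot_water_tank_temperature
        # No state change for one hour is treated as stale.
        for: "01:00:00"
    action:
      - service: button.press
        target:
          entity_id: button.hot_water_controller_restart
```

A value that is genuinely constant for an hour would also trigger a restart, so it is best to pick a sensor that normally fluctuates. The `for:` timer also does not survive a Home Assistant restart.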

It looks like a problem in LibreTiny as identical config works perfectly on an ESP32 module in the same unit, which I’d connected temporarily while ordering a module that had a compatible pinout with the original PCB.

I’m using a WB3S board as it’s one of the few modules available that can be directly soldered to my device without modifying the PCB layout (the original module was Realtek based). I don’t really want to desolder it again as the pads get quite weak, and the serial issue is more of an annoyance than a complete show-stopper.

This is still an issue with the ESPHome 2024.12.2 release.

Forgot to mention that OpenBeken works fine on the same module, so this is definitely an issue specific to LibreTiny.

I have seen multiple cb2s module failures. The interesting part is that even when I use an esp-02 in place of the cb2s, I still see similar module failures. The only correlation I can see so far is heat. The failures are all with Aubess Mini smart switches, which are typically enclosed in a receptacle box. I have also observed other related failures of the cb2s. I have replaced more than 5 of the Beken bk72xx modules with esp-02’s. Most are fine after that, but some are repeat offenders. In some cases only power cycling the module gets it working again, and at some point that fails too. It seems to be a flash quality problem.

I bought another, different power meter, which also runs on a CB2S module.
I have been testing on it.
This one seems to be more stable, but once in a while, every few days, it stops reporting anything for 5-10 minutes and then continues as usual.
It differs from the SPM02-D2TW in that it starts reporting again by itself, while the SPM02-D2TW needs a restart…