Packet Transport - Rolling Code Problems

Hi folks,
Trying to get data from a set of devices to go to a specific device that can then respond to this input.

If at all relevant, receiver (jake) is an ESP8266. Senders are all ESP32-C3s.

How it’s currently set up; on (one of) the senders (they’re all in effect set up the same):

udp:
  id: udp_to_jake
  addresses: 
    - 192.168.2.62
  port: 18511

packet_transport:
  - platform: udp
    udp_id: udp_to_jake
    update_interval: 1000ms
    encryption: !secret udp_encryption_4
    rolling_code_enable: True
    binary_sensors: 
      - id: ld2450_presence
        broadcast_id: mrcharrington_presence

and on the receiver:

udp:
  - id: udp_listen
    port:
      listen_port: 18511
      broadcast_port: 18511

packet_transport:
  - platform: udp
    udp_id: udp_listen
    id: packet_in
    ping_pong_enable: True
    providers:
      - name: mrcharrington
        encryption: !secret udp_encryption_4
      - name: obrian
        encryption: !secret udp_encryption_5
      - name: winston
        encryption: !secret udp_encryption_6

All is good. It works. For a bit. Until something seems to go wrong with the rolling code (a short while in)

22:23:26	[W]	[packet_transport:386]	Rolling code for mrcharrington 00000003:0000085B is old
22:23:27	[W]	[packet_transport:386]	Rolling code for mrcharrington 00000003:0000085C is old
22:23:28	[W]	[packet_transport:386]	Rolling code for mrcharrington 00000003:0000085D is old
22:23:29	[W]	[packet_transport:386]	Rolling code for mrcharrington 00000003:0000085E is old
22:23:30	[W]	[packet_transport:386]	Rolling code for mrcharrington 00000003:0000085F is old
22:23:30	[W]	[packet_transport:386]	Rolling code for mrcharrington 00000003:00000860 is old
22:23:31	[W]	[packet_transport:386]	Rolling code for mrcharrington 00000003:00000861 is old
22:23:32	[W]	[packet_transport:386]	Rolling code for mrcharrington 00000003:00000862 is old
22:23:33	[W]	[packet_transport:386]	Rolling code for mrcharrington 00000003:00000863 is old

(etc. etc.)

The data then seems to fail to transfer over.

What I’ve done so far:

  • Taken jake back to a minimal firmware (ie. to get the old rolling code out). Removed all components, left OTA and Wi-Fi only. Then reflashed (thought was this would clear out the advanced rolling code if that was the problem).
  • Removed and re-added the rolling_code_enable line from the senders (works fine with it out).

My first thought was that somewhere along the line I’d re-flashed the sender after the receiver and it hadn’t retained the old code. But I’d expect the first action to sort that.

Not sure exactly what the answer is here. I guess the security isn’t essential, but it’s a bizarre problem that is frustrating me somewhat.

Any thoughts appreciated.

First reboot the receiver - it doesn’t save rolling codes in flash, so will take the first one it sees after a reboot as the starting point.

What flash save interval do you have in the sender?
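For reference, the flash save interval is set through the preferences component. A minimal sketch of what that looks like on the sender; the value shown is only illustrative, not a recommendation:

```yaml
# Sender side (sketch). Controls how often pending preference
# writes are flushed to flash.
preferences:
  flash_write_interval: 1min  # illustrative value only
```

If nothing like this is in the YAML, the device is running with the default interval.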

If you set verbose logging in the receiver you will get more information in the logs about what codes have been seen.
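A minimal sketch of turning that on, assuming the receiver has no other logger settings that need preserving:

```yaml
# Receiver side (sketch). VERBOSE adds a lot of output, so it is
# best used only while debugging.
logger:
  level: VERBOSE
```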

First I suggest you add these buttons to your YAML on both the receiver and the sender.

button:
  - platform: restart
    name: "restart"
  - platform: factory_reset
    name: Restart with Factory Default Settings
    id: onkyo_factory_reset
    entity_category: "diagnostic"

The restart is just a simple reboot, but it can be done remotely then.
The factory reset however is a reboot, but with a wipe of the NVRAM, which makes sure you are running with a virgin setup when you bug hunt.
Factory reset both receiver and sender and then test.

Those log messages indicate a timing issue, so if you are not using a time service, try adding one to both receiver and sender.

Not a timing issue in the sense of clock time. The rolling code is incremented by the sender on each packet, that message suggests the sender’s code went backwards, which would imply an issue saving to flash across a reboot.

Timing service may have been part of the problem (although re-reading above, possibly not). Adding:

time:
  - platform: homeassistant

To both receiver and sender and they seemed to behave for longer. After a few minutes, old rolling codes started to become a problem again (so maybe not). Home Assistant is running on a standard x86 computer, so has a real clock etc.

As for the flash save interval, this is not explicitly set in any of the devices. The 8266 (receiver) has:

esp8266:
  board: esp_wroom_02
  restore_from_flash: True

Set. This is so that on a reboot switches go back to where they were before (there’s a relay attached; this is attached to an alarm panel and I don’t really want it going off because the ESP decided to restart).

I think I may have fixed it:

Step 1
I did decrease the frequency of pushes and that seemed to be working (5s instead of 1000ms/1s). The only devices reporting old rolling codes initially were ones that I didn’t change to 5s.

As it’s progressive, I wouldn’t have expected this to be the problem: surely if the ESP8266 can’t process an input, the next input will still have a higher value, unless it’s trying to decode old packets after reading the new packets and can’t keep up(?).

However, after a few hours running, we had bad rolling codes again and the sensors had gone unknown for these.

Meanwhile the logs weren’t massively happy with being verbose (some API disconnects) but this seems less frequent with the 5s pushing time. I guess ESP8266s possibly don’t have the compute power for what I’m asking for (three devices acting as provider).

Step 2
After step 1 failed, I tried something slightly different. Rather than having three devices feed into the ESP8266 I opted to use one of the ESP32-C3s as an intermediary. It takes in the data from the other two over packet transport and then submits all three to the ESP8266.

This seemed to fix it. Possibly the problem was I was overwhelming the ESP8266.
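For anyone wanting to try the same thing, a rough sketch of what the intermediary could look like. This is not my exact config: it assumes mrcharrington is the relay, that obrian and winston now list the relay's IP under their own udp addresses, and that they broadcast their sensors under the remote_id names shown. The udp_relay id and the obrian_presence/winston_presence ids are made up for illustration.

```yaml
# Relay node (sketch): receives from two providers, forwards everything to jake.
udp:
  id: udp_relay            # hypothetical id
  addresses:
    - 192.168.2.62         # jake
  port: 18511

packet_transport:
  - platform: udp
    udp_id: udp_relay
    encryption: !secret udp_encryption_4
    rolling_code_enable: true
    providers:
      - name: obrian
        encryption: !secret udp_encryption_5
      - name: winston
        encryption: !secret udp_encryption_6
    binary_sensors:
      - id: ld2450_presence          # the relay's own sensor
        broadcast_id: mrcharrington_presence
      - id: obrian_presence          # relayed
        broadcast_id: obrian_presence
      - id: winston_presence         # relayed
        broadcast_id: winston_presence

# Local copies of the remote sensors, so they can be re-sent above.
binary_sensor:
  - platform: packet_transport
    provider: obrian
    remote_id: obrian_presence       # assumed broadcast_id on obrian
    id: obrian_presence
  - platform: packet_transport
    provider: winston
    remote_id: winston_presence      # assumed broadcast_id on winston
    id: winston_presence
```

With this arrangement jake only needs a single provider entry (the relay) instead of three.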

Thanks for the input, having the factory reset button was very helpful to assure that I knew that the devices were clean.

Could it be because you have multiple senders and they somehow use the same rolling code table on the receiver?

Not entirely sure, but the multiple devices and rolling codes seem to work fine on the ESP32-C3 without any issue.

I think I may have just overwhelmed the ESP8266.

In the end I’ve had to go without encryption as, while with a single device sending the information it wasn’t having the rolling code errors, it did keep disconnecting from the API and periodically rebooting (Exception given as the reason). Heap fragmentation was also hitting the low-60%s.

Did increase the CPU frequency speed to 160MHz on the 8266:

esphome:
  name: jake
  platformio_options: 
    board_build.f_cpu: 160000000L

But problem remained.

The board does have five automations on it, which possibly isn’t helping (they shouldn’t have to do anything unless the API is off, but with the disconnects etc. I imagine checking the conditions causes overhead). After removing encryption I upped the frequency of updates to 1s (thinking without encryption this would be fine) but it still seemed to be struggling at 80MHz (API disconnects), so I kept it at 160MHz, which seems to be stable.

This is a secure Wi-Fi network (WPA2 for 8266/WPA3 for ESP32-C3) and they’re isolated in their own firewall zone so I guess I’ll survive. A little less than ideal.

Perhaps shouldn’t have used an 8266 for this purpose; at the time I wasn’t confident in doing something myself for this context - power in there is 12V not 5V/3.3V; this is a Shelly UNI that can just take that; the Gen 1 devices seemed easier to flash. Possible future upgrade.

This is a bit of a follow-up to the previous issue, so opted to not open a new thread - I didn’t find a solution and just thought something might be a problem with the ESP8266’s (ie. lacking compute power to actually complete the task). This was likely true, but the issue has persisted.

Since the earlier posts I acquired two ESP32-C3 dev boards (one to do the work that jake was supposed to be doing, one to do something different). I’m using ESPNow this time, as this is supposed to be a contingency for if Home Assistant is down, so everything still works (in such a case Wi-Fi might be dead also).

Receiver:

  - platform: espnow
    id: packet_in
    peer_address: "[MAC HERE]"
    providers:
      - name: goldstein
        encryption: !secret udp_encryption_goldstein

Provider:

  - platform: espnow
    id: unicast_to_julia
    espnow_id: espnow_component
    peer_address: "[MAC HERE]"
    update_interval: 60s
    encryption: !secret udp_encryption_goldstein
    rolling_code_enable: True
    binary_sensors:
      - id: ld2450_presence
        broadcast_id: goldstein_presence
      - id: ampleforth_presence
        broadcast_id: ampleforth_presence
      - id: mrcharrington_presence
        broadcast_id: mrcharrington_presence
      - id: winston_presence
        broadcast_id: winston_presence
      - id: obrian_presence
        broadcast_id: obrian_presence
      - id: bigbrother_presence
        broadcast_id: bigbrother_presence

However, they are not making friends here. Somewhere along the line Goldstein’s rolling code went out of sync and it’s endless “rolling code is old” errors on both this ESP and another one (and when I tried to divert via a third it also had the problem, likely a legacy of previously being connected together). I’m almost certain this does not match the documentation (ie. the code is not stored on the receiver device and will be wiped on reboot). The problem is persistent.

In the log the rolling code increases each send as expected.

Attempts made:

  • Factory reset (nope, on boot immediately comes up rejecting the rolling code).
  • Reboot (nope, on boot rejects the rolling code).
  • Reflash (nope, on boot rejects the rolling code)
  • Minimal flash then reflash (nope, on boot rejects the rolling code).
  • Hardwired USB flash (nope, identical problem)
  • Full power disconnection, all pins, USB etc. sat on my desk connected to nothing, for 20 mins (nope, on switch on, rejects the rolling code immediately)
  • Removing and re-adding the packet transport/platform: espnow section (and espnow section) from the sender (nope)

No ESPs have batteries or should have a mystery power source I don’t know about. Waveshare ESP32-C3-Zero-M (had problem as receiver); Seeed Xiao ESP32-C3 (had problem as receiver); and the other devices are Apollo Automation MTR-1s (one of these had the problem when a receiver and another is the sender).

I assume it must be storing it somewhere on the device, which doesn’t match with what the documentation says. This perhaps feels more like a bug than a technical problem?

It’s definitely not saved on the receiver (you can examine the code if you like.)

You don’t by any chance have two devices with the same name sending packets? That would do it.

If you turn on VERBOSE logging in the receiver you will get the MAC address of the sender logged. If you see two different MAC addresses that’s your problem.
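To illustrate the naming point: as far as I understand it, the provider name the receiver matches on is the sender's node name, so each name in the providers list should correspond to exactly one physical device. A sketch using the names from this thread (the two halves belong to two different devices):

```yaml
# On the sender: the node name is what the receiver sees as the provider.
esphome:
  name: goldstein

# On the receiver: this name must match one, and only one, sending device.
packet_transport:
  - platform: espnow
    providers:
      - name: goldstein
        encryption: !secret udp_encryption_goldstein
```

If a second device were also flashed with name: goldstein, the receiver would see two independent rolling code sequences under one name, and whichever is lower would be reported as old.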

I found the problem! I did take a look at the code and (while not a C wizard) insofar as I could make sense I couldn’t see it storing.

So, in the packet_transport: section I had multiple

- platform: espnow
  rolling_code_enable: True
  [...]

- platform: espnow
  rolling_code_enable: True

etc. etc.

For different devices. Some of them had “peer” set for a specific one. Others did not.

I reduced it down to a single instance eg:

  - platform: espnow
    id: unicast_out_to_julia
    espnow_id: espnow_component
    peer_address: [MAC]
    rolling_code_enable: True
    encryption: !secret udp_encryption_goldstein
    binary_sensors:
      - id: ld2450_presence
        broadcast_id: goldstein_presence
      - id: ampleforth_presence
        broadcast_id: ampleforth_presence
      - id: mrcharrington_presence
        broadcast_id: mrcharrington_presence
      - id: winston_presence
        broadcast_id: winston_presence
      - id: obrian_presence
        broadcast_id: obrian_presence
      - id: bigbrother_presence
        broadcast_id: bigbrother_presence

  - platform: espnow
    id: unicast_to_caliente
    espnow_id: espnow_component
    peer_address: [MAC]
    update_interval: 30s
    rolling_code_enable: True
    encryption: !secret udp_encryption_goldstein
    sensors:
      - id: combined_temperature
        broadcast_id: goldstein_temperature

  - platform: espnow
    id: packet_in
    ping_pong_enable: True
    update_interval: 500ms
    encryption: !secret udp_encryption_goldstein
    ping_pong_recycle_time: 1min
    rolling_code_enabled: true
    providers:
      - name: winston
        encryption: !secret udp_encryption_winston
      - name: ampleforth
        encryption: !secret udp_encryption_ampleforth
      - name: mrcharrington
        encryption: !secret udp_encryption_mrcharrington
      - name: bigbrother
        encryption: !secret udp_encryption_bigbrother
      - name: obrian
        encryption: !secret udp_encryption_obrian
    sensors: 
      - id: current_alarm_state_numeric
        broadcast_id: current_alarm_state_numeric
    binary_sensors:
      - id: entrance_timer_running
        broadcast_id: entrance_timer_running
      - id: zone_1_binary
        broadcast_id: zone_1_binary
      - id: zone_2_binary
        broadcast_id: zone_2_binary
      - id: zone_3_binary
        broadcast_id: zone_3_binary
      - id: zone_4_binary
        broadcast_id: zone_4_binary
      - id: zone_5_binary
        broadcast_id: zone_5_binary
      - id: zone_6_binary
        broadcast_id: zone_6_binary

Became:

  - platform: espnow
    id: packet_in
    update_interval: 60s
    encryption: !secret udp_encryption_goldstein
    rolling_code_enable: true
    sensors: 
      - id: current_alarm_state_numeric
        broadcast_id: current_alarm_state_numeric
      - id: combined_temperature
        broadcast_id: goldstein_temperature
    binary_sensors:
      - id: entrance_timer_running
        broadcast_id: entrance_timer_running
      - id: zone_1_binary
        broadcast_id: zone_1_binary
      - id: zone_2_binary
        broadcast_id: zone_2_binary
      - id: zone_3_binary
        broadcast_id: zone_3_binary
      - id: zone_4_binary
        broadcast_id: zone_4_binary
      - id: zone_5_binary
        broadcast_id: zone_5_binary
      - id: zone_6_binary
        broadcast_id: zone_6_binary
      - id: ld2450_presence
        broadcast_id: goldstein_presence
      - id: ld2450_presence
        broadcast_id: goldstein_presence
      - id: ampleforth_presence
        broadcast_id: ampleforth_presence
      - id: mrcharrington_presence
        broadcast_id: mrcharrington_presence
      - id: winston_presence
        broadcast_id: winston_presence
      - id: obrian_presence
        broadcast_id: obrian_presence
      - id: bigbrother_presence
        broadcast_id: bigbrother_presence
    providers: 
      name: julia
      encryption: !secret udp_encryption_goldstein

I also took ping pong out as it seemed to stop updates being instantly pushed when I reduced the update_interval (likely requiring a full 60s cycle or another sensor to update to trigger it). I increased the interval initially on the basis of “maybe this is too fast”, and then noticed without ping it seemed to update anyway. Doesn’t need to send anything out if nothing is changing.

It seems it doesn’t like having multiple entries. Assume the peer_address was possibly being ignored(?), three rolling codes were being maintained and getting muddled (possibly sending out one initially, then future ones were all “old” because they were from the other espnow packet_transport entity)?

I can’t seem to purposefully drag them out of sync now (which is good), so haven’t tested if it recovers, but no issues spotted in 24 hours.

I guess the key rule is perhaps one packet transport per output. UART probably being the exception to this as it would have to be a separate one as each different UART would have a separate id/pins etc. My thought was it was wasteful to broadcast packets to all senders/receivers (as is now happening), but if that’s happening anyway… they’re only going to be able to read things sent by a specific “provider:” anyway so will likely disregard/not waste time on unreadable packets(?)

Thanks for the guidance 🙂 It’s nice to have network harmony.

Good catch. There is a bug then, in that the rolling code is currently managed by each packet_transport instance separately, but they all share the same preference storage. Probably the rolling code should be per-device not per instance.

I’ll figure out a fix, but good to know you are up and running.

There is extra overhead in broadcasting rather than targeting a MAC address because the WiFi layer ignores any packet not addressed to it or the broadcast address, but with an update interval of 60s that’s not a big deal.

Thanks. It’s still working absolutely fine 24 hours later.

Just to query; I’m guessing if I have:

espnow:
  peers:
    - [MAC 1]
    - [MAC 2]
    - [MAC 3]

That will just reach those peers. There’s a warning in the documentation about avoiding broadcasts as it might reach devices not owned by you (which I guess is fine if we’re using encryption/not the end of the world)?

Did accidentally forget to add a device to the list of MACs for a receiver and despite having the encryption key and provider set correctly the transferring sensor showed as “unknown” till I added the provider to the peer list, so may already know the answer, but just want to confirm.
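In case it helps anyone else, a sketch of the receiver side as I understand it from that experience: the provider's MAC goes in the espnow peer list and the provider's name and key go in the packet transport. The MAC below is a made-up placeholder, and this is an illustration rather than a tested config.

```yaml
# Receiver side (sketch). The MAC is a placeholder, not a real device.
espnow:
  id: espnow_component
  peers:
    - "AA:BB:CC:DD:EE:01"   # goldstein, the provider

packet_transport:
  - platform: espnow
    espnow_id: espnow_component
    providers:
      - name: goldstein
        encryption: !secret udp_encryption_goldstein
```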

Thanks 🙂

I’ve not used espnow myself, so can’t say but I assume that’s correct. The manual warning about broadcasts is a bit of a red herring - any ESPNow transmission is potentially capturable by anyone whether it’s a broadcast or not, they would just need to spoof the MAC address if it’s not a broadcast.

And yes, if using encryption you are reasonably safe. I still wouldn’t use it to secure a bank, but you should be safe from consumer-grade attacks.