Custom firmware ESPHome-Xiaomi_bslamp2

Hey Maurice,

Great work on all of this. I’m currently trying to build a custom component to interact with the front panel.

On the voltages, wouldn’t it be more meaningful to measure the duty cycle of the GPIO pin over the voltage to the LED? I think just measuring voltage you could run into a few different issues.

Just 2c.

1 Like

Cool that you’re working on the front panel!

Since I’m mainly trying to come up with code to make the LED driving backward compatible, measuring voltages in the old and new situation to see if I do the correct things seemed good enough to me.
For the white temperature light, this worked very well, so I stuck to that method.

I must say, that because of the results that I get for the RGB LEDs (weird graphs requiring more and more measurements to reverse engineer), I am starting to lean towards hooking up another MCU to the LED GPIOs, to do some automatic measurements on duty cycles based on various color inputs. The whole “change light in HA and measure pins by hand”-thing is getting old quickly :smiley:

1 Like

I’m away from home this week but when I was poking around the device it seemed like the while LED’s are driven differently to the RGB LED’s. My probing around suggested that the RGB channels are driven using PWM vs the voltage based driver chips for the white channels.

1 Like

Yes, those are two different beasts alltogether, RGB and white.
It makes for a nice reverse engineering puzzle :-p

Once in a while, my lamp reboots spontaneously. I hooked up the serial console and waited for one of these crashes, and I saw the following on my console output:

[15:02:43]Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
[15:02:43]Core 0 register dump:
[15:02:43]PC      : 0x400d6cff  PS      : 0x00060130  A0      : 0x80163618  A1      : 0x3ffb1f00  
[15:02:43]A2      : 0x3ffb8b90  A3      : 0x3ffd3658  A4      : 0x3ffd365c  A5      : 0x00008cc0  
[15:02:43]A6      : 0x00050b23  A7      : 0x3ffb8058  A8      : 0xbaad5678  A9      : 0x3ffb1ea0  
[15:02:43]A10     : 0x00000005  A11     : 0x3f40305c  A12     : 0x00000043  A13     : 0x3f4013a5  
[15:02:43]A14     : 0x3ffd3364  A15     : 0x00060b23  SAR     : 0x0000000a  EXCCAUSE: 0x0000001c  
[15:02:43]EXCVADDR: 0xbaad569c  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  
[15:02:43]
[15:02:43]Backtrace: 0x400d6cff:0x3ffb1f00 0x40163615:0x3ffb1f20 0x401636a9:0x3ffb1f40 0x400de3a5:0x3ffb1f60 0x400e0eb2:0x3ffb1f90 0x400eda41:0x3ffb1fb0 0x4008926d:0x3ffb1fd0
[15:02:44]
[15:02:44]Rebooting...

So a LoadProhibited exception while reading from memory location 0x3ffd3364.
A new puzzle to chew on. I will check my code to see if I might be doing bad memory stuff. I surely hope so, otherwise we might be looking at a bug in the underlying platform code.

Update:

Hrm, not my code:

0x400d6cff: esphome::api::APIServer::loop() at src/esphome/components/api/api_server.cpp line 67
0x40163615: esphome::Component::call_loop() at src/esphome/core/component.cpp line 61
0x401636a9: esphome::Component::call() at src/esphome/core/component.cpp line 82
0x400de3a5: esphome::Application::loop() at src/esphome/core/application.cpp line 72
0x400e0eb2: loop() at src/main.cpp line 265
0x400eda41: loopTask(void*) at /root/.platformio/packages/framework-arduinoespressif32/cores/esp32/main.cpp line 19
0x4008926d: vPortTaskWrapper at /home/paul/src/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143

Update:

I see various reboots here, quite annoying. It’s not directly clear where the issues are triggered, so I took back a step and now will run my lamp with a mostly empty esphome configuration. If that doesn’t show the crashes, I will add stuff one step at a time to see where things go haywire.

3 Likes

I have some hopes on new framework releases to get rid of some memory management issues. The crashes that I see all seem related to invalid memory access. I’m currently working on getting everything compiled, because the single core processor is a bit of a challenge.

One thing that I noticed is that there seem to be two different issues:

  • The connection of Home Assistant to the ESP32 is dropped quite often, because the ESP32 reports that the final TCP ACK for an ESPHome ping sequence is not received by the ESP32 (and kills the connection). In tcpdump I do see an ack though, so it;'s either not transported to the ESP32, or it misses out on the packet. Right after the ESP32 sends the TCP FIN sequence, Home Assistant will always successfully reconnect.

  • When I have the logging screen open from the ESPHome web terminal, then I do see ugly reboots of the device. When I don’t have any logging channel attached, the device seems stable. To test for reboots while not logging, I added the uptime sensor platform to the config. The below graph shows the uptime. Twice I had the logging console open, and those two times show up clearly in the uptime.

afbeelding

So still working on drilling this down. If I can get to the point where the device does not reboot, but loses its connection with Home Assistant once in a while, I am willing to accept that as a feasible setup. The connection loss in Home Assistant is irritating from a development standpoint, but it is not hugely disturbing for daily use.

Update:

I managed to hack and compile the firmware using [email protected] and [email protected], but unfortunately that did not change the disconnect and reboot behavior of the device :-1: I just saw a few disconnects followed by a reboot. The reboot was still happening at src/esphome/components/api/api_server.cpp line 67 (which isn’t a surprise, since the esphome api code was not changed in the process, I mainly hoped for less disconnects triggering less reboots as a result).

Update:

Hello from deep down in the firmware code. I am in the process of debugging the recurring disconnects and I might have found something interesting. With extra debugging added, I see the following pattern:

SENSOR STATE IS PUBLISHED
[00:56:17][D][sensor:099]: 'Bedside Lamp Office Uptime': Sending state 462.02200 s with 0 decimals of accuracy
[00:56:17][VV][api.service:105]: send_sensor_state_response: SensorStateResponse {
[00:56:17]  key: 1910806034
[00:56:17]  state: 462.022
[00:56:17]  missing_state: NO
[00:56:17]}

A TCP ACK IS REGISTERED, SETTING BUSY TO FALSE
[00:56:17]-S: 0x3ffd35c8
[00:56:17]_sent() sets busy = false: 0x3ffd35c8
[00:56:17]_sent() end of function, busy = 0: 0x3ffd35c8

A TCP SEND IS REGISTERED SETTING BUSY TO TRUE (which  means: waiting for a TCP ACK)
[00:56:17]send() sets busy = 1, _pcb_sent_at=462060: 0x3ffd35c8

POLLING NOW CONTINUOUSLY SEES BUSY = TRUE
[00:56:17]-P: 0x3ffd35c8
[00:56:17][W][AsyncTCP.cpp:963] _poll(): _pcb_busy=1, now=462223, _pcb_sent_at=462060, _rx_last_packet=462056
[00:56:17]-P: 0x3ffd35c8
[00:56:17][W][AsyncTCP.cpp:963] _poll(): _pcb_busy=1, now=462723, _pcb_sent_at=462060, _rx_last_packet=462056
[00:56:18]-P: 0x3ffd35c8
[00:56:18][W][AsyncTCP.cpp:963] _poll(): _pcb_busy=1, now=463223, _pcb_sent_at=462060, _rx_last_packet=462056
[00:56:18]-P: 0x3ffd35c8
[00:56:18][W][AsyncTCP.cpp:963] _poll(): _pcb_busy=1, now=463723, _pcb_sent_at=462060, _rx_last_packet=462056
[00:56:19]-P: 0x3ffd35c8
[00:56:19][W][AsyncTCP.cpp:963] _poll(): _pcb_busy=1, now=464223, _pcb_sent_at=462060, _rx_last_packet=462056
[00:56:19]-P: 0x3ffd35c8
[00:56:19][W][AsyncTCP.cpp:963] _poll(): _pcb_busy=1, now=464723, _pcb_sent_at=462060, _rx_last_packet=462056
[00:56:20]-P: 0x3ffd35c8
[00:56:20][W][AsyncTCP.cpp:963] _poll(): _pcb_busy=1, now=465223, _pcb_sent_at=462060, _rx_last_packet=462056
[00:56:20]-P: 0x3ffd35c8
[00:56:20][W][AsyncTCP.cpp:963] _poll(): _pcb_busy=1, now=465723, _pcb_sent_at=462060, _rx_last_packet=462056
[00:56:21]-P: 0x3ffd35c8
[00:56:21][W][AsyncTCP.cpp:963] _poll(): _pcb_busy=1, now=466223, _pcb_sent_at=462060, _rx_last_packet=462056
[00:56:21]-P: 0x3ffd35c8
[00:56:21][W][AsyncTCP.cpp:963] _poll(): _pcb_busy=1, now=466723, _pcb_sent_at=462060, _rx_last_packet=462056
[00:56:22]-P: 0x3ffd35c8
[00:56:22][W][AsyncTCP.cpp:963] _poll(): _pcb_busy=1, now=467223, _pcb_sent_at=462060, _rx_last_packet=462056

UNTIL IT DECIDES IT IS A TIMEOUT
[00:56:22]_poll() sets busy = false after ack timeout: 0x3ffd35c8
[00:56:22][W][AsyncTCP.cpp:969] _poll(): Houston, we have an ack timeout 4
[00:56:22][V][api.connection:696]: Error: Disconnecting Home Assistant 2021.3.2 (192.168.100.135)

The weird thing in the pattern here, is that there is a send in the log that is coming AFTER the ACK that to me looks like the ACK for the send call. If the code handles these two events in the wrong order here (sometimes), then the timeout can be explained very easily.

Some signs that this might indeed be a code issue:

  • The timeout is always on this step of the data exchange.
  • The connection seems to work just fine. I do see the correct TCP flows when scanning the network at the Home Assistant side.
  • When I hack the code to simply ignore the ACK timeouts, then the socket connection with Home Assistant is kept as-is and it is still very much functional. I get update and can change light settings, without running into issues. So disabling the timeout check, actually makes my device more stable :wink:
  • It is also likely not Home Assistant that is breaking the network connection here, because the same kind of connection drops can be observed when viewing logging from the ESPHome web console. This means I have two endpoints that show the same behavior, making the network stack in the ESP32 more likely the culprit.

Unfortunately, one needs to sleep at some point, so I’ll continue this bug hunt tomorrow.

Update:

With some more debugging I am now 100% sure that there is a race condition going on, and that the ACK is processed before the AsyncTCP code has decided that it wants to process an ACK.

I color coded the console output and what you see is the red text, which is interrupted mid sentence by the green text. I also added timestamps (now = …) and those proof 100% that this is a race condition. The earlier logline in green has a later timestamp than the last bit of the interrupted red log line.

Going to cook up a fix and see if the upstream project can incorporate it!

Update: Issue created with AsyncTCP upstream

I created an issue ticket for AsyncTCP.
And I creaed a fix in a forked repo, which I described in another thread.

3 Likes

Only work and no joy for me nowadays. Illness in the family doesn’t really contribute on the positive side either, but I do follow your work with interest whenever time allows it, mmakaay. Could your latest finding possibly explain the erratic disconnection errors that Yeelight is famous for?

Sorry to hear that, all the best for you and your family.

The disconnect issue that I found could be the issue, but only if the standard firmware makes use of the same AsyncTCP library code. Maybe you could check your copy of the firmware, to see if your can find the AsyncTCP string in there (it’s a long shot, but if it is visible as a string, chances are high that they are using it).

My modified version of the library is performing great. Here’s the graph of my lamp uptime and connection for the last 8 hours:

afbeelding

Only one recent disconnect, but that one might slightly have to do with me unplugging the device and moving it upstairs to my work desk there.
Having tackled this issue also gets rid of the spontaneous reboot issue. Not because the underlying issue is fixed, but because it is not triggered anymore. I’ll still look at fixing that reboot issue, but right now getting on with the actual implementation for the lamp’s RGB channels is a lot more tempting.

Thanks, mmakaay. When life serves you lemons, try squeezing it in your eyes to get the full experience :wink:

Searching with ‘Strings’ gives nothing when using “AsyncTCP”.
I did however split the bin-file into partitions using ‘esp32_image_parser’ giving me nine entries; nvs, otadata, phy_init, miio_fw1, miio_fw2, test, mfi_p, factory_nvs, core dump and minvs.
My plan was to have a closer look at these using Ghidra that has an ESP32-extension that disassembles and decompiles the files, but I am at very best a novice when it comes to coding. It looks like this probably fits your skills a lot better than mine.

If you can put up the miio_fw* files for download somewhere, I can take a look at them.
I still have a lamp using the original firmware, which I will extract before modding that one, but that one first needs some soldering to get there.

Funny find of the day: while measuring the GPIO outputs for RGB white (R=255, G=255, B=255) over a brightness range of 1 - 255, I noticed something unexpected in the measurements:

There is a weird little bump in the graph!

I measured these BTW, to find an R:G:B ratio to work with for the LEDs.
Below you can see the analysis of this.

I found a 1.0 : 2.10 : 0.78 ratio. It’s not fully static across the whole brightness spectrum as you can see in the graph (especially the green channel does rise significantly over brightnesses 1 - 100), but I will simply assume that keeping it static in my code will work.

I’ve never used this device, have no intention of using it, but wanted to stop by and say - great job!

I love the maker community and what people are able to do when they put their minds to it.

5 Likes

Peculiar findings, mmakaay. One would believe that driving a led was a bit more straightforward. I was wrong when remembering that GPIO4 was linked to the “Xiaomi test function” It is GPIO25. I suppose it is linked to the entry “test” when looking at the original firmware, and I wonder what it does test. If it cycles through the different default values of the lamp it could be easier measuring.

Ah, GPIO 25 is the one that I found to be connected, but which I could not relate it to base functionality of the lamp. It’s pulled up bij default, so maybe pulling it to ground might be the way to trigger the test partition. I will try that.

It wouldn’t be to useful for measuring though. I can easily automate the lamp settings, it’s the reading that I do manually. I did solder some wires to the RGB pins and hooked those up to my logic analyzer. That maken is very easy to get a pwm reading. However, from there on it is still manual work.
I am requesting the urge to build my own logic analyzer :laughing:

Just for fun, a picture of Project Frankenstein :laughing:

1 Like

“Beware; for I am fearless, and therefore powerful.” -Frankenstein- :grin::+1:t2:

1 Like

The firmware is coming along nicely. The main thing I want to fix for RGB, is the balance of the white color in the middle of the RGB color selection. There’s a hint of red in there, which can easily be aproved upon.
I took a quick video to show the responsiveness of this ESPHome-based firmware. I’m very happy with it. The connection stability fix works like a charm and the lamp has been working solid for the last few days.

Upcoming: a bit more of RGB hacking and then I’ll continue with the front touch panel integration.

5 Likes

That’s just marvellous. I’m tipping my hat to your work and dedication, mmakaay! I had a quick look at the Yeelight forum here, and there are quite a few irritated people over there. Yeelight just came up with a version 43 - special edition also. Supposedly gonna fix LAN control option from resetting…? There is something smelling a bit off of the whole thing :thinking:
When you bring this together I would not be surprised if you’ll be one of the most popular people around. :grin:

2 Likes

I will bring this together for sure, being popular is just a nice side-effect :smile:

I just finished tracking and fixing a bug with transitioning the light from ON to OFF. When turning the light on, I could get a nice transitioning from of to the required color+brightness. However, when turning the light off, it simply stayed on during the transitioning period, to eventually turn off abruptly. Very annoying, because it mainly feels as an unresponsive lamp.

I found that this was not an issue with my code, but something in the ESPHome light component code.
I filed a pull request and hope that it will be accepted:

Note: if this code is not accepted or if accepting it takes too long, then I have a work-around available that archieves the same thing from within my code.

Update:

In my codebase, I implemented a work-around for the reported issue, which automatically triggers as long as the big is not fixed in the upstream code. This means that the RGB transitions now all look good (between colors, brightnesses and ON/OFF switching) and will remain to do so when this bug gets fixed.

Here’s a new video of the lamp, which shows the new transitioning in action:

Note that these transitions are optional. If you prefer to have no transitions (like shown in my previous video a few posts up), you can configure this by setting the transition time to 0s.

1 Like

I moved the repository code to a proper ESPHome component format. This makes installation a lot easier and cleaner. It also allows me to make the configuration a clean yaml config, instead of having to inject objects using an ugly lambda function. Here’s what the component configuration now looks like:

light:
  - platform: yeelight_bs2
    name: ${friendly_name} Custom Light
    red: led_r
    green: led_g
    blue: led_b
    white: led_w
    master1: master1
    master2: master2
    default_transition_length: ${transition_length}
    effects:
      - random:
          name: "Slow Random"
          transition_length: 30s
          update_interval: 30s
      - random:
          name: "Fast Random"
          transition_length: 3s
          update_interval: 4s

Full details are in the updated repo.

As you can see, I still need component definitions for the various GPIOs. I want to modify the component to make these bulit-in defaults (I presume these pinouts do not differ between devices) to make the configuration even cleaner.