Any known issues with integrations not reliably being able to add sensors on initial add?

I’m seeing that the integration I work on can no longer consistently add all of its devices and sensors when it is first added to Home Assistant. Sometimes it works and everything is present, but now frequently 2 (of 4) devices and some of the sensors on one of the present devices are missing until Home Assistant is restarted.

Comparing debug log output between the good and bad cases, I cannot see any obvious difference in behavior. I see async_setup_entry() in __init__.py called in both cases, with all of the expected platform initialization and sensor additions taking place. In the bad case the sensors for several of the platforms are still missing.

Then the 2 extra devices get detected after querying the main device and I execute:
await self.hass.config_entries.async_forward_entry_unload(self.config_entry, Platform.SENSOR)
await self.hass.config_entries.async_forward_entry_setup(self.config_entry, Platform.SENSOR)
in both the good and bad case. In the good case that results in the 2 extra devices appearing in the list of found devices in the config flow. In the bad case they do not.

In the latest repro the end-to-end time for the enumeration was 100-200 ms faster than in the good case, so it doesn’t seem to be a race condition due to slow enumeration.

Looking at the latest repro and comparing the final set of sensors (on the two devices that made it), it’s clear that partway through async_setup_entry() in __init__.py, while it is setting up the platforms that the integration supports, it just stops adding sensors even though the calls to async_add_entities occur as normal.

A couple of months back I had never seen this occur, whereas now I can repro it about 50% of the time - so it’s not a rare occurrence.

At this point I think this probably isn’t a subtle bug in the integration; it looks more like a regression in the core Home Assistant code. Has anyone else seen this problem?

I have never encountered this myself, but in your case I’d track it down by progressively adding new debugging lines further upstream in the code.

I’ve added debugging for every initialization method in my integration but in both good/bad cases everything executes the same. Are you suggesting I add debug logging to the HA code as well to try and narrow down where the behavior diverges?

Sure, do that too if needed. Also, do interactive debugging. A few well placed breakpoints can finally point you to the bug.

Interactive debugging is likely going to be a bust, since I can’t reproduce the failure on demand and I currently have little to no knowledge of HA’s internal code flow, so I don’t yet know where the sensors I provide should be getting added. Hence the logging to try and narrow down where the problem might be.
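As a rough sketch of the kind of logging I mean: wrap the add-entities callback each platform receives so that every entity handed over gets logged, and the good and bad runs can be diffed line by line. The helper name logged_add_entities is made up for illustration, and it assumes the callback takes the entities as its first positional argument. Note it only records what the integration hands over, not what HA actually registers.

```python
import logging
from typing import Any, Callable, Iterable

_LOGGER = logging.getLogger(__name__)


def logged_add_entities(
    add_entities: Callable[..., Any], platform: str
) -> Callable[..., Any]:
    """Wrap an add-entities callback so every entity handed over is logged.

    Hypothetical helper: assumes the wrapped callback takes the entities
    as its first positional argument.
    """

    def wrapper(new_entities: Iterable[Any], *args: Any, **kwargs: Any) -> Any:
        # Materialise generators so they can be logged and still passed on.
        entities = list(new_entities)
        for entity in entities:
            _LOGGER.debug(
                "%s: adding %s (unique_id=%s)",
                platform,
                type(entity).__name__,
                getattr(entity, "unique_id", None),
            )
        _LOGGER.debug("%s: handed over %d entities", platform, len(entities))
        return add_entities(entities, *args, **kwargs)

    return wrapper
```

In a platform's async_setup_entry() this would be used as async_add_entities = logged_add_entities(async_add_entities, "sensor") before the first call.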

For example, place some breakpoints that trigger only if the buggy condition occurs. Make sure that you set the project to stop all threads when any breakpoint hits. When one triggers you can move back up the stack inspecting things. Bear in mind, though, that in asynchronous code like HA the current stack only shows part of the execution flow.
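A minimal sketch of such a conditional breakpoint, assuming you can build a list of the unique IDs you expect and a list of the ones that actually got registered (how to obtain the registered IDs from HA is left out here; check_entities and its parameters are hypothetical names):

```python
import logging
from typing import Iterable, Set

_LOGGER = logging.getLogger(__name__)


def check_entities(
    expected_ids: Iterable[str],
    registered_ids: Iterable[str],
    stop: bool = False,
) -> Set[str]:
    """Return the expected IDs that were not registered.

    Logs the current stack when something is missing and, if stop is True,
    drops into the debugger so the state can be inspected on the spot.
    """
    missing = set(expected_ids) - set(registered_ids)
    if missing:
        # stack_info=True appends the current call stack to the log record.
        _LOGGER.warning(
            "Entities missing after setup: %s", sorted(missing), stack_info=True
        )
        if stop:
            breakpoint()  # only reached when the buggy condition occurs
    return missing
```

Called at the end of setup with stop=True, this only halts on the runs where something is actually missing, so the roughly 50% of good runs pass straight through.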