Constant crashing [resolved]

Clean install: brand-new Pi 4, brand-new SD card, latest RPi 64-bit image, brand-new power supply, brand-new Aeotec stick. I followed the Aeotec instructions to the letter and added two Z-Wave plugs. Now it’s just CONSTANT crashing, and I can’t add any Z-Wave device: inclusion mode never finds anything. By constant crashing I mean it’s usually not even up long enough to troubleshoot, maybe 30 minutes if I’m lucky. I’ve spent 2 days troubleshooting and googling errors. I’m not exactly sure where to post errors or which logs to include, so apologies if this is the wrong place. I’d greatly appreciate any help.

Will pay a bounty for a successful root-cause resolution.

The only error in the logs (it occurs fairly often, but not on a regular schedule):

21-04-21 01:54:00 ERROR (MainThread) [supervisor.api.ingress] Ingress error: 400, message='Invalid response status', url=URL('http://172.30.33.0:8099/socket.io/?EIO=4&transport=websocket&sid=

I’m having the exact same issue. I’ve reinstalled multiple times. I thought it was my hardware, so I picked up a Home Assistant Blue: same thing. I’ve installed every way you can (Pi, Docker, VM, Blue) and get the same problem.

Thanks for responding. I’m sorry you’re having this problem, but glad it’s not just me. Do you have errors in your core log?

What about errors in your system log and host console (connect a monitor for the latter)?

There were no notable errors in the console log. There are some errors in the core log, but it’s hard to know if they’re the cause.

Here’s the one at the tail of that log:

2021-04-21 01:44:52 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/data_entry_flow.py", line 131, in async_init
    result = await self._async_handle_step(
  File "/usr/src/homeassistant/homeassistant/data_entry_flow.py", line 214, in _async_handle_step
    result: dict = await getattr(flow, method)(user_input)
  File "/usr/src/homeassistant/homeassistant/components/ipp/config_flow.py", line 125, in async_step_zeroconf
    info = await validate_input(self.hass, self.discovery_info)
  File "/usr/src/homeassistant/homeassistant/components/ipp/config_flow.py", line 49, in validate_input
    printer = await ipp.printer()
  File "/usr/local/lib/python3.8/site-packages/pyipp/ipp.py", line 208, in printer
    response_data = await self.execute(
  File "/usr/local/lib/python3.8/site-packages/pyipp/ipp.py", line 177, in execute
    response = await self._request(data=message)
  File "/usr/local/lib/python3.8/site-packages/pyipp/ipp.py", line 146, in _request
    return await response.read()
  File "/usr/local/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 1032, in read
    self._body = await self.content.read()
  File "/usr/local/lib/python3.8/site-packages/aiohttp/streams.py", line 370, in read
    block = await self.readany()
  File "/usr/local/lib/python3.8/site-packages/aiohttp/streams.py", line 392, in readany
    await self._wait("readany")
  File "/usr/local/lib/python3.8/site-packages/aiohttp/streams.py", line 306, in _wait
    await waiter
  File "/usr/local/lib/python3.8/site-packages/aiohttp/helpers.py", line 656, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError

Could you please let me know if you have any success with anything? Have you tried joining the beta channel?

You may not be alone, but the majority of users (me included) have rock-solid systems running configurations like yours or much more complex ones.
So my recommendation is the usual way to nail down problems in complex IT scenarios:

  1. Strip down to a minimum system and see if the problem persists. Remove the Aeotec stick and any other hardware and integrations you installed. When the problem resolves, build up again step by step until you find the culprit. Sometimes the interaction of multiple factors is needed to reproduce the problem.

  2. If that does not solve the problem, start replacing the components of your remaining minimum system. Start with the cheapest (probably the SD card). Replace part by part, including the power supply and the Pi itself.

  3. If the problem still persists, check your environment. Are there any intense radio or magnetic sources close to your installation that might induce surges into your system? Are there heavy spikes or outages on your power grid? Is there a powerful antenna nearby? An overhead power line over your house? A Wi-Fi access point, a microwave oven, or a motor that is switched on and off regularly?
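The "build up again step by step" part of step 1 can be sketched as a simple loop. This is a minimal illustration, assuming a hypothetical `reproduces(active)` check that tells you whether the problem occurs with exactly the listed components installed (in practice that means running the box for a while and watching for core restarts). The function name and the component labels are placeholders, not anything from Home Assistant itself.

```python
from typing import Callable, List, Optional, Sequence


def find_culprit(
    components: Sequence[str],
    reproduces: Callable[[List[str]], bool],
) -> Optional[List[str]]:
    """Add components back one at a time onto a minimal base system.

    `reproduces` is a hypothetical test harness: it returns True if the
    problem shows up with exactly the components in `active` installed.

    Returns the active set at the point the problem first reappears (the
    last entry is the trigger, possibly only in combination with earlier
    ones), [] if even the minimal system fails, or None if it never fails.
    """
    active: List[str] = []
    if reproduces(active):
        # Problem persists on the bare system: move on to step 2 (swap parts).
        return active
    for component in components:
        active.append(component)
        if reproduces(active):
            return list(active)
    return None


if __name__ == "__main__":
    # Placeholder scenario: the fault needs both the stick and its integration.
    parts = ["usb_stick", "zwave_integration", "printer_integration"]
    fails = lambda active: "usb_stick" in active and "zwave_integration" in active
    print(find_culprit(parts, fails))  # -> ['usb_stick', 'zwave_integration']
```

Because the loop stops at the first failing set, the order you add things back matters; putting the most likely suspects last keeps the earlier, known-good baseline as large as possible.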

If these instructions lead to a solution, please donate an adequate bounty to Nabu Casa, Wikipedia, Avaaz or the like. ;o)


Hi Jorg

Thanks for the reply.

Some background: I’m an engineer with a lifetime of troubleshooting experience, and I’ve spent dozens of hours trying to find the root cause of this problem without success. I have four other instances of HA running without problems, and several years of experience with HA. This does not appear to be a hardware problem, and we’ve already swapped hardware to test, to no avail. We’d expect to find logs that would either eliminate one system or implicate another. The system is bare bones, with no unnecessary hardware, software, add-ons, etc. This is the second ground-up build in which we’ve had this problem (including on replacement hardware). There is no EM or RF interference nearby: I’m an EE and an FCC-licensed operator, and I’m confident it’s not external interference. We’re happy to pay a substantial bounty to anyone who can identify the problem, even if it’s only so we can submit a bug report.

Do any of them have the same config as the system with the crashing issue?

Have you tried restoring a snapshot from one of the stable systems to the one with crashing issues?

Forgot to add, in case it’s helpful: everything worked perfectly at first. In each case, the problems began after the first reboot.

Have you tried the 32-bit install? I seem to remember early problems with 64-bit, and I’m not sure what the current state is. I don’t run on an RPi (and would suggest that nobody does past the “play” stage), so I can’t provide much more help with it. I just noticed that you mentioned the 64-bit OS and thought it would at least make for an interesting data point.

Terry

The other ones are running in containers on a host OS, so not exactly the same. Also, some of the other instances use the older Aeotec stick; this one uses the new Gen5+ stick. I have tried restoring a snapshot from this instance (made when it was working), but it still fails after a host boot. I have not tried restoring a snapshot from another instance.

Hmm, no, I haven’t tried that. Will try it today.

Have you tried seeing how long it stays up without the Aeotec stick?

Our other instances all run on NUCs, but this one is so small we thought a Pi 4 would be fine. I have lots of Pis to test with but no spare NUCs, unfortunately.

No, but I can try that and will post the results. Do you suspect a driver issue? Is it possible for a driver to crash core?

That, or a power issue. Your power supply may be fine for a Pi, but the stick draws power as well.

Either way, easily eliminated as a cause.


The stick draws 40 mA in normal operation. I have an official Pi 4 power supply, which is rated at 3 A. There is nothing in any other slot. We’ve tried different power supplies, but it’s not a power problem, because the OS never goes down. Only HA core crashes; the Pi remains responsive throughout each core restart.

So you did not try without the stick?
It’s an easy test.

It’s not at this location. Will try today. Weird that the stick would crash core, but I’ll try.