If there is a lot of disk activity on the NAS that HA runs on, HA becomes unresponsive and stays that way even after the activity clears.
I run HA in a container on a Synology NAS. Its networking is configured to use the host’s (i.e. not a bridge).
If I download a large torrent I get a lot of log entries for HA like this for various device state updates:
2024/07/21 20:57:51,stdout,2024-07-21 20:57:50.147 WARNING (MainThread) [homeassistant.helpers.entity] Updating state for device_tracker.zenwifi_xxxxxx (<class 'homeassistant.components.nmap_tracker.device_tracker.NmapTrackerEntity'>) took 1.192 seconds. Please create a bug report at https://github.com/home-assistant/core/issues?q=is%3Aopen+is%3Aissue+label%3A%22integration%3A+nmap_tracker%22
And then the log suddenly stops and I am not able to reach HA from my browser.
Even if the network and disk congestion clears HA remains unavailable and no new log entries are written.
Any advice? (Apart from the obvious.)
UPDATE 2024-08-01: edited to reflect that it is caused by intense disk activity rather than network congestion.
It is your NAS that cannot handle more connections.
Each connection reserves some RAM for cache, and with torrents this adds up fast and eats away your available RAM.
Once the RAM is gone, no new connections can be made until the cache is released again. Because torrents often use UDP, which has no built-in way to close a connection, the torrent client generally just lets connections time out, which can take many minutes.
Would I expect to see low RAM reported on the NAS? When I log into the NAS it seems to be working fine and the RAM usage isn’t high. Other apps and services are a bit sluggish due to the congestion but they are all still working. Also they become fast again immediately after the torrent has downloaded whereas HA remains inaccessible until I restart it many hours later.
No idea if it reports it or not.
The other apps usually just have a few connections that stay open all the time, so their network resources are already reserved.
HA opens and closes many connections all the time.
Yes, but still just one new.
HA creates connections to all the devices and addons/containers/servers and browsers viewing the GUI.
Each addon/container/server that runs on the NAS might make connections too.
Whilst performing some other operations on my NAS that were disk intensive I found the same issue with HA: it becomes unresponsive and then does not recover.
Again, the logs show warnings for device state updates taking longer (seconds instead of microseconds) than they should - and then the log abruptly stops and there is no other activity.
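To see how quickly the update times climb before the log goes silent, the warnings can be pulled out of the log with a small script. This is only a rough sketch: the regular expression is based on the single warning line quoted in the first post, and the function name `slow_updates` and its threshold are made up for illustration, not part of HA.

```python
import re
from typing import Iterable, List, Tuple

# Pattern based on the one warning line quoted in this thread (an assumption:
# other HA versions may word the message differently).
SLOW_UPDATE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) WARNING .*?"
    r"Updating state for (?P<entity>\S+) .*?took (?P<secs>\d+(?:\.\d+)?) seconds"
)


def slow_updates(
    lines: Iterable[str], threshold: float = 1.0
) -> List[Tuple[str, str, float]]:
    """Return (timestamp, entity_id, seconds) for warnings at or above threshold."""
    hits = []
    for line in lines:
        match = SLOW_UPDATE.search(line)
        if match is None:
            continue
        seconds = float(match.group("secs"))
        if seconds >= threshold:
            hits.append((match.group("ts"), match.group("entity"), seconds))
    return hits
```

Feeding it an open log file (for example `slow_updates(open(path))`, with `path` pointing at wherever your container writes its log) gives a list that shows whether the delays grow steadily or jump just before HA stops logging.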
Other apps running on the NAS are fine. They slow down when there is intensive disk activity but then recover when the activity has finished - HA does not.
I have updated the OP title to reflect that it is disk activity, rather than network congestion.
Can anyone offer an explanation or solution? - Thanks
Synology DS220+, Intel Celeron J4025 2-core CPU and 2GB RAM
2 x WD Red Plus 4TB 3.5 SATA 128MB HDD
Not sure what you are after for networking - it’s pretty standard - using a single ethernet port to an Asus router to a VirginMedia cable modem.
Primary apps are qBitTorrent running in a container
Emby media player running as a DSM Package
Home Assistant Core and Zigbee2MQTT running in containers
I get the problem when qBitTorrent gets over 90% disk utilisation for a lengthy period of time (10-15mins).
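For anyone who wants to record the disk utilisation figure alongside the HA log rather than reading it off Resource Manager, here is a minimal sketch that derives it from two readings of `/proc/diskstats` on Linux, the same way `iostat` computes `%util`. The device name `sda` is an assumption (a Synology may name its disks differently), and the function names are hypothetical.

```python
import time
from typing import Optional

# In /proc/diskstats the 13th whitespace-separated column is the cumulative
# time (in milliseconds) the device has spent with I/O in flight.
IO_TICKS_FIELD = 12


def io_ticks_ms(diskstats_text: str, device: str) -> Optional[int]:
    """Return the cumulative I/O-busy time in ms for a device, or None if absent."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > IO_TICKS_FIELD and fields[2] == device:
            return int(fields[IO_TICKS_FIELD])
    return None


def utilisation_percent(ticks_before: int, ticks_after: int, elapsed_ms: float) -> float:
    """Share of the interval during which the device was busy, capped at 100."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return min(100.0, 100.0 * (ticks_after - ticks_before) / elapsed_ms)


def sample(device: str = "sda", interval_s: float = 5.0) -> Optional[float]:
    """Read /proc/diskstats twice and return utilisation over the interval."""
    with open("/proc/diskstats") as f:
        before = io_ticks_ms(f.read(), device)
    start = time.monotonic()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = io_ticks_ms(f.read(), device)
    elapsed_ms = (time.monotonic() - start) * 1000
    if before is None or after is None:
        return None  # device name not found
    return utilisation_percent(before, after, elapsed_ms)
```

Logging `sample()` once a minute with a timestamp would make it possible to line up the periods above 90% with the point where the HA log stops.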
Your hardware is simply not powerful enough to run all those containers.
Your CPU is a 2-core CPU with one thread per core, so it will be swapping data a lot when running that many containers.
At the same time, your amount of RAM is pretty low.
HA can run on a RPi 3B with just 1 GB of RAM, but that is only for the most basic setup.
You might have 1 GB more, but you also run a NAS, media player and torrent server at the same time.
You might be able to alleviate the problem a bit with more RAM, but only to a certain degree; after that your CPU will start to limit you.
Thanks for your suggestions @WallyR. However, I am pretty sure it is HDD contention. When the problem starts I can log on to the NAS and use Resource Manager to see the utilisation of resources and what is using them. RAM and CPU look fine, at no more than 80%, but the HDD is hitting 100%.
Note that the issue isn’t that HA suffers when this is the case (all services become slow to respond) - the issue is that it does not recover afterwards. It just stops logging and then becomes totally unresponsive.
HA works fine when using Emby etc. - it is only when I hit the hard disk hard that it stops. This can be as simple as copying a large number of large files.
Unfortunately not. Now and again the CPU usage rises for no apparent reason and stays there for days, even weeks. And then, again for no apparent reason, it drops far enough for the fan to no longer engage.
I am living with it now because it is okay for longer periods of time than it is not okay!
That’s too bad. It is incredibly strange that hard-disk load (in an unrelated container, no less) makes HA (running in its own VM) unresponsive, and that it then stays unresponsive. This also affects the Z-Wave controller in my case. It seems to stop responding entirely, and the only way to bring it back is to reboot the HA VM after the disk activity stops.
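Until someone finds the root cause, one stopgap is a watchdog that restarts HA when it stops answering. The sketch below is for the container setup described earlier in the thread, not for a VM, and it rests on assumptions: HA listening on its default port 8123 on the host, a container called `homeassistant` (a placeholder name), and the `docker` CLI being available on the NAS. All function names are made up for illustration.

```python
import subprocess
import time
import urllib.error
import urllib.request
from typing import Callable

HA_URL = "http://127.0.0.1:8123/"  # assumption: default HA port, host networking
CONTAINER = "homeassistant"        # hypothetical container name, change to yours


def ha_responds(url: str = HA_URL, timeout: float = 10.0) -> bool:
    """True if anything answers HTTP at the URL within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # an HTTP error status still means the server is alive
    except OSError:
        return False  # refused, unreachable or timed out


def should_restart(
    probe: Callable[[], bool],
    attempts: int = 3,
    wait: Callable[[], None] = lambda: None,
) -> bool:
    """Restart only if every one of `attempts` consecutive probes fails."""
    for i in range(attempts):
        if probe():
            return False
        if i < attempts - 1:
            wait()
    return True


def restart_container(name: str = CONTAINER) -> None:
    """Restart the container through the docker CLI."""
    subprocess.run(["docker", "restart", name], check=True)


def main() -> None:
    """Probe three times, a minute apart, and restart only if all probes fail."""
    if should_restart(ha_responds, wait=lambda: time.sleep(60)):
        restart_container()
```

Calling `main()` from a scheduled task every few minutes would bring HA back without a manual restart. Requiring several failed probes in a row is deliberate, so that a merely slow HA during heavy disk activity is not restarted needlessly. It treats the symptom only and does not explain why HA fails to recover by itself.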