Performance issues on GOOD hardware? Would appreciate any help

I’ve been using hassio for about two years now and never had any problems until the last few weeks, after I upgraded to the latest core/supervisor versions that were pushed recently.

Apologies if this is posted in the wrong category; I couldn’t find a support category on here anywhere.

Originally posted on Discord.

I am now getting significant lag even just using Lovelace: 5+ seconds of load time. What’s worse, I’m now running out of RAM (2 GB) for the first time in two years.

Running the latest hassio (2023.3.2) through VirtualBox on my Linux PC. Followed this guide to install.
System: Home Assistant OS 9.5 (amd64 / qemux86-64)
Home Assistant Core: 2023.3.3
Home Assistant Supervisor: 2023.03.1
Hardware:
i3 processor @ 2.10GHz
SSD 120GB
RAM: 2GB
Wired Ethernet connection at 1 Gbit/s.

Not running much on HASSIO itself.

  • zigbee2mqtt
  • adaptive-lighting
  • tasmota
  • mqtt broker
  • nginx proxy
  • terminal & ssh (disabled)
  • file editor (disabled)

Been tearing my hair out over this, and I’m very unsure how to debug the issue. I found very few tutorials on diagnosing hassio performance issues, so I’ll post what I can figure out.

Does anyone know what could be going on? What should I try next? I’m reading about possible memory leaks in the build from last April - have those been fixed? People are able to run this on a Raspberry Pi, for God’s sake, even with a clutter of programs, so why can’t I run it on an actual PC?

Debug info (please ask for more!)
py-spy top --pid 66 --rate 30
Turning the sample rate any higher causes it to lag behind.

Collecting samples from 'python3 -m homeassistant --config /config' (python v3.10.10)
Total Samples 35612
GIL: 80.65%, Active: 170.97%, Threads: 24

  %Own   %Total  OwnTime  TotalTime  Function (filename)
 19.35%  19.35%   162.5s    164.0s   _worker (concurrent/futures/thread.py)
 16.13%  16.13%   159.1s    159.1s   do_execute (sqlalchemy/engine/default.py)
  0.00%   0.00%   86.70s    89.70s   run_once (pychromecast/socket_client.py)
  0.00%  70.97%   65.40s    579.5s   _run_event_loop (homeassistant/components/recorder/core.py)
  3.23%  25.81%   35.60s    243.6s   get_sun_events (adaptive_lighting/switch.py)
  3.23%   3.23%   32.20s    34.80s   _loop (paho/mqtt/client.py)
  0.00%   0.00%   29.33s    29.37s   read_events (watchdog/observers/inotify_c.py)
  3.23%   3.23%   26.87s    26.87s   select (scapy/supersocket.py)
  3.23%   6.45%   22.43s    28.43s   __setattr__ (astral/__init__.py)
  3.23%   3.23%   21.40s    21.80s   run (zeroconf/_services/browser.py)
  3.23%   9.68%   20.27s    115.1s   time_of_transit (astral/sun.py)
  0.00%   0.00%   19.47s    19.47s   localize (pytz/__init__.py)
  3.23%   3.23%   19.13s    37.60s   eq_of_time (astral/sun.py)
  0.00%   0.00%   18.43s    18.43s   write (asyncio/selector_events.py)
  0.00%   3.23%   16.97s    30.13s   _sock_connect (asyncio/selector_events.py)
  0.00%  25.81%   16.00s    259.7s   <listcomp> (adaptive_lighting/switch.py)
  3.23%   3.23%   15.87s    15.87s   minutes_to_timedelta (astral/sun.py)
  3.23%   3.23%   14.80s    15.43s   register (selectors.py)
  0.00%  70.97%   14.57s    845.2s   _run (asyncio/events.py)
  0.00%   0.00%   14.47s    14.47s   do_commit (sqlalchemy/engine/default.py)
  3.23%   3.23%   13.67s    13.67s   _real_close (socket.py)
  0.00%   0.00%   13.20s    14.07s   fnva (fnvhash/__init__.py)
  0.00%   0.00%   12.97s    31.77s   orm_setup_cursor_result (sqlalchemy/orm/context.py)
  0.00%   0.00%   12.13s    12.17s   unregister (selectors.py)
  3.23%   9.68%   11.03s    72.57s   _async_write_ha_state (homeassistant/helpers/entity.py)
  0.00%   0.00%   10.80s    21.40s   create_task (asyncio/base_events.py)
  0.00%   0.00%   10.67s    16.07s   cascade_iterator (sqlalchemy/orm/mapper.py)
  0.00%   0.00%   10.63s    10.63s   dequeue (logging/handlers.py)
  0.00%   0.00%   10.50s    10.50s   julianday (astral/sun.py)
  0.00%   0.00%   10.23s    31.73s   validate_mapping (voluptuous/schema_builder.py)
  0.00%   0.00%    8.80s     8.80s   hour_angle (astral/sun.py)

The following commands were executed after hassio reported it was out of memory:

  ~ ps aux
PID   USER     TIME  COMMAND
    1 root      0:16 /sbin/docker-init -- /init
    7 root      0:00 s6-svscan -t0 /var/run/s6/services
   36 root      0:00 s6-supervise s6-fdholderd
  834 root      0:00 s6-supervise stdin
  836 root      0:00 s6-supervise ttyd
  837 root      0:00 s6-supervise sshd
  840 root      0:00 bash /usr/bin/bashio ./run
  844 root      0:00 sshd: /usr/sbin/sshd -D -e [listener] 0 of 10-100 startups
  888 root      0:04 {tmux: server} tmux -u new -A -s homeassistant zsh -l
  889 root      0:03 zsh -l
 1150 root      0:00 ttyd -d1 -i hassio -p 62866 tmux -u new -A -s homeassistant zsh -l
 1163 root      0:00 sshd: hassio [priv]
 1165 hassio    0:00 sshd: hassio@pts/0
 1166 root      0:01 -zsh
 1201 root      0:00 ps aux
➜  ~ free
              total        used        free      shared  buff/cache   available
Mem:        3043676     2903124       46748        3112       93804       92516
Swap:        760916      760896          20

Glances shows python3 as using 93% CPU, but not which addon/service/integration is running it.
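
Each add-on (and core itself) runs as its own Docker container, so I assume I could at least narrow it down per container from the host console (or from the SSH add-on with protection mode off) with something like the following - treat it as a guess, I haven’t verified it on HAOS:

docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

But that still wouldn’t tell me which integration inside the core container is responsible.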

When I do run out of memory, I’m forced to hard-reset the VirtualBox VM. I’ve tried just sending acpipowerbutton, but even after waiting several hours hassio still wouldn’t shut down. Here’s the debug for what hassio is doing when out of memory:

I’ve been trying to debug this on my own for several weeks now and am a bit burnt out.

EDIT: Tried upgrading to 2023.3.3 to see if Python 3.11 would help at all. Also added another GB of RAM (3 GB total) for the heck of it. It makes things a teeny bit faster, but I still ran out of memory after running Home Assistant for around 19 hours.
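
For reference, these are the VirtualBox commands I’m using from the host (the VM name “homeassistant” is just what mine happens to be called - substitute your own):

VBoxManage controlvm "homeassistant" acpipowerbutton    # the ACPI shutdown that never completes
VBoxManage modifyvm "homeassistant" --memory 3072       # how I bumped the RAM to 3 GB (with the VM powered off)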

Start by disabling third-party integrations and see if that helps.

Can I simply rename the custom_components folder in the /config directory to achieve that?

Well, it’s been 10 minutes since renaming custom_components and I’m ready to call it fixed. For the first time in weeks, the Lovelace frontend loads in under a second.
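
For anyone else who lands here, this is roughly what I did from the SSH add-on (paths assume the default /config layout):

mv /config/custom_components /config/custom_components.disabled
ha host reboot    # a full reboot, not just a core restart - that turned out to matter (see below)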

Are there any tools available for debugging which custom component is causing the performance issues, or do you recommend I re-enable them one at a time?

Re-enable half of them at a time. Then you know which half the problem integration is in. Do that again repeatedly with the problem half and you should narrow it down pretty quickly.
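
Something like this, just to illustrate the idea (the folder name is a placeholder - use whatever components you actually have in there):

mkdir -p /config/custom_components
mv /config/custom_components.disabled/some_component /config/custom_components/
ha core restart    # then watch RAM and frontend load times before moving the next batch back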

Thank you! Figured out which component it was. I removed these components last week while debugging the issue on my own, but I never did a ‘reboot’ until this time, only a ‘restart’. Glad you convinced me to try again!

Will leave this thread open until more time has passed for confirmation it’s been solved.

Can you confirm which custom component was causing the error?