The issue seen on the recording you posted should be resolved in 0.113
I’m having the same issue: it appeared after 12 hours with 0.113.2, and now after 1 hour with 0.113.3. Any help?
Please post a py-spy.
Also, if you have the spotcast custom integration installed, be sure to upgrade it.
There is no py-spy for ARM devices, right?
You can install it via cargo if you are using the latest Home Assistant images and enter the Docker container as described at https://developers.home-assistant.io/docs/operating-system/debugging/
apk add cargo
cargo install py-spy
mkdir /config/www
/root/.cargo/bin/py-spy record --pid 227 --duration 120 --output /config/www/pi.svg
You’ll need to use top or something else to find the correct PID (the 227 above is only an example).
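If top is awkward inside the container, the PID lookup can be sketched in a few lines of Python. This is only a minimal sketch: it assumes a Linux-style /proc filesystem and that the Home Assistant command line contains the string "homeassistant". The function name `find_pids` is made up for this example.

```python
from pathlib import Path
from typing import List


def find_pids(needle: str, proc_root: str = "/proc") -> List[int]:
    """Return the PIDs whose command line contains `needle`."""
    pids: List[int] = []
    root = Path(proc_root)
    if not root.is_dir():
        return pids  # no /proc here, nothing to scan
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited or is not readable
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace")
        if needle in cmdline:
            pids.append(int(entry.name))
    return sorted(pids)


if __name__ == "__main__":
    # "homeassistant" is an assumption about how the process was started.
    for pid in find_pids("homeassistant"):
        print(pid)
```

Each printed number can then be passed to `--pid` in the py-spy command above.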
EDIT: I was able to compile using pip instead of cargo, but had to use the instructions at this GitHub issue: https://github.com/benfred/py-spy/issues/23
I’m getting the following error trying to install py-spy:
error: could not find native static library `unwind`, perhaps an -L flag is missing?
error: aborting due to previous error
error: could not compile `remoteprocess`.
To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: failed to compile `py-spy v0.3.3`, intermediate artifacts can be found at `/tmp/cargo-installtigzV2`
Caused by:
build failed
I’m running Home Assistant as a VM on Proxmox.
Are you using the latest image with python 3.8.3?
The cargo install will only work on the newer images with the newer rust.
Edit: I see you got it working by other means
Yes, I am on .113.3 with the latest supervisor and OS as well, so not sure what the issue was there but going the pip route was fine once I found the other hoop to jump through.
Hello, I also have a problem with high cpu for several versions. From reboot it grows from 2% to about 35% (RPI4 Docker Supervised currently at 0.114).
Thank you for any tips.
Py-Spy for 2 top PIDs (120s):
Thanks for the py-spy recordings. Telegram looks like it’s taking up a fair amount of CPU time.
Did you hit the 35% cpu when these were taken?
Also, a dump at the same time would be helpful.
No, it was the day after the reset (peak about 15%). Now there was an update to 0.114.1. I have to wait around 2 days to see if CPU consumption will increase to 35%. I will wait, let you know, and try to make a py-spy record again.
Hi @bdraco, I’m also one of the users having the CPU-increase-up-to-saturation, and trying to isolate and fix it locally by applying all changes going on in the integrations under suspect :), so I think I can add some interesting info of what I found:
Suspects
- Indeed, telegram_bot (configured as “polling”) takes a lot of threads and uses a lot of CPU (but before 0.112 it was not a problem). Changing it to “webhooks” (which, with NabuCasa, works seamlessly) does reduce this, but makes no difference to the underlying CPU issue (I could disable it for a test run if required).
- I have 3 cast devices, but I removed the integration and marked it as “ignore”.
- 3 esphome devices. I locally applied the changes from “Add the ability to use the shared Home Assistant Zeroconf instance” (esphome/aioesphomeapi Pull Request #13, by bdraco) and “Use the shared Zeroconf instance in esphome” (home-assistant/core Pull Request #38747, by bdraco).
- 8 Shelly devices (custom integration ShellyForHass), which I read somewhere was a suspect, and indeed it is. I manually changed the custom component repo and the main library (PyShelly) to share the zeroconf instance from HA, and I have it ready to make both PRs if it works, but my first impression after the change was that the issue appeared sooner! (CPU saturation in < 15h, when it initially took > 1 day.)
- 2 Sonos devices, which I think I also saw in the suspect list. They take 9 threads (but with no increase over time, and they are not especially present in the py-spy plots).
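For anyone wondering what “sharing the zeroconf instance” means in practice, here is a minimal sketch of the pattern. It deliberately does not use the real zeroconf API: `Browser`, `get_shared_browser`, `Integration` and `setup_all` are hypothetical stand-ins, and the only point is that every integration receives the same object instead of creating its own.

```python
from typing import List, Optional


class Browser:
    """Stand-in for an expensive resource (NOT the real zeroconf class)."""

    instances_created = 0

    def __init__(self) -> None:
        # Count constructions so the effect of sharing is visible.
        Browser.instances_created += 1


_shared: Optional[Browser] = None


def get_shared_browser() -> Browser:
    """Create the shared instance on first use, then keep returning it."""
    global _shared
    if _shared is None:
        _shared = Browser()
    return _shared


class Integration:
    def __init__(self, name: str, browser: Optional[Browser] = None) -> None:
        self.name = name
        # Only fall back to a private instance when none is handed in.
        self.browser = browser if browser is not None else Browser()


def setup_all(names: List[str]) -> List[Integration]:
    shared = get_shared_browser()
    return [Integration(name, shared) for name in names]


if __name__ == "__main__":
    integrations = setup_all(["cast", "esphome", "shelly"])
    print(Browser.instances_created)
```

With the shared instance, three integrations construct the resource once; without it, each one would construct its own.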
Setup context:
- RPI4-4GB+SSD+UPS+conbee II stick, solid as rock
- Supervisor install with docker (no pure-hassio), v0.114.0 with custom changes (the ones described above)
- ADDONS: Appdaemon4, deCONZ, esphome, vcode, ADB, adguard, mqtt, dhcp, mariaDB, samba, nginx-proxy-manager
- Custom integrations: shelly, xiaomi_miio_fan, eventsensor
- Integrations: homekit, mobile_app(iOS) x4, sonos x2, androidtv x2, hue x16, nut, tplink x1, nabucasa + alexa, denon_avr, tuya x3, influxdb, recorder, and the ones from addons: deconz x29, esphome x3, adguard, mqtt
Last experiment
I’ve just found and applied all changes from your 2 PRs in zeroconf: “Reduce the time window that the handlers lock is held” (python-zeroconf/python-zeroconf Pull Request #287) and “Ensure all listeners are cleaned up on ServiceBrowser cancelation” (python-zeroconf/python-zeroconf Pull Request #290), so I will restart once more to test the behavior → Edited: done, with interesting results:
These are my last HA sessions, plotting a 15-min rolling mean of the reported CPU usage.
The current one, with all changes described, looks stable for now (~7h running), but it is running with a 3-4% increase in CPU usage over the stable reference on v0.112, so maybe there is something more under the hood
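For anyone who wants to reproduce that kind of plot from their own data, the smoothing step can be sketched as below. This is only an illustration: it assumes the samples are (timestamp in seconds, CPU percent) pairs sorted by time, and the function name `rolling_mean` is made up here, not something from Home Assistant.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, cpu percent)


def rolling_mean(samples: Iterable[Sample], window_s: float = 900.0) -> List[Sample]:
    """Time-based rolling mean; the default window is 15 minutes."""
    window: Deque[Sample] = deque()
    total = 0.0
    out: List[Sample] = []
    for ts, value in samples:
        window.append((ts, value))
        total += value
        # Drop samples that are a full window (or more) older than `ts`.
        while window[0][0] <= ts - window_s:
            _, old = window.popleft()
            total -= old
        out.append((ts, total / len(window)))
    return out


if __name__ == "__main__":
    data = [(0, 10.0), (300, 20.0), (600, 30.0), (900, 40.0)]
    for ts, mean in rolling_mean(data):
        print(ts, mean)
```

A slow upward drift in the smoothed series after a restart is the pattern described in this thread; single spikes are averaged out.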
I can provide py-spy dumps and plots. This dump is from right now:
For PyShelly: is it better once https://github.com/jstasiak/python-zeroconf/pull/290 is applied?
The only other thing that stands out is that there have been some recent influxdb changes that another user reported were causing an issue.
For PyShelly: is it better once https://github.com/jstasiak/python-zeroconf/pull/290 is applied?
I think so, but couldn’t be precise about that.
But it is better than without it for sure, in general for the system-as-one, I cannot say specifically for pyshelly
(I’m not even sure if it is using the mdns in my config, as all devices are defined by ip and discovery is off right now :))
The only other thing that stands out is that there have been some recent influxdb changes that another user reported were causing an issue.
Thanks, I’ll search about it, but my first impression is that influx is not the cause, and I’ve not seen any influx error log or presence in dumps/py-spy dumps/plots…, but I’ll review it.
The thing that surprised me the most was the pyhap usage. I don’t use homekit_controller, just homekit, to control a few things with Siri on watches.
BTW, the current run looks stable, but things are definitely going wrong in the network/HA: I’m having a lot of hue bridge fetch errors, deconz zigbee sensors sometimes not triggering, small delays on automations, and similar ghostly things… (I even tried to restart everything: router/zigbee hubs/individual devices, but it’s not that)
Tomorrow I’ll try to disable some things to continue testing, and maybe update to 114.2 and redo the custom changes in pyshelly, aioesphome and zeroconf again…
Is there anything interesting (related to this issue) in the 0.114.1 and 0.114.2 releases that could help here?
If you are on python 3.8, https://github.com/home-assistant/core/pull/38821 could fix the executor being overloaded (in 0.114.1).
Saw it yesterday, I think. It is one of the customizations, first with 100, last runs with the selected value of 64 from the PR. No apparent change in behaviour
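To make the executor tweak concrete: on Python 3.8 the default `ThreadPoolExecutor` size is `min(32, os.cpu_count() + 4)`, so an asyncio application that pushes a lot of blocking work to the default executor can queue up behind a small pool. The sketch below is not the code from the PR; it only shows, using standard library calls, how a larger default executor can be installed. The value 64 is the one mentioned above, and `blocking_io` and `run_demo` are made-up names.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

MAX_WORKERS = 64  # value mentioned in this thread; tune for your system


def blocking_io(n: int) -> int:
    """Stand-in for a blocking call such as a synchronous library request."""
    time.sleep(0.01)
    return n * 2


async def _main() -> List[int]:
    loop = asyncio.get_running_loop()
    # Replace the loop's default executor with a larger pool.
    loop.set_default_executor(ThreadPoolExecutor(max_workers=MAX_WORKERS))
    # Passing None selects the default executor installed above.
    jobs = [loop.run_in_executor(None, blocking_io, n) for n in range(100)]
    return list(await asyncio.gather(*jobs))


def run_demo() -> List[int]:
    return asyncio.run(_main())


if __name__ == "__main__":
    print(len(run_demo()))
```

A bigger pool only helps when the jobs are waiting on I/O; it will not reduce CPU usage caused by busy threads.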
Mostly in reference to pyhap:
Also, to analyze the py-spy recordings, look at the file and line numbers, and if it’s a select()/sleep()-like operation, exclude it from the analysis, as you can assume it’s using no CPU and just blocking.
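That filtering rule can be applied mechanically. The sketch below assumes the samples are available as folded stack lines of the form `frame;frame;frame count` with the innermost frame last (the layout `py-spy record --format raw` is expected to produce). The set of blocking function names is my own guess, and the sample lines in the demo are invented, not taken from any recording in this thread.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: leaf functions with these names are just waiting, not burning CPU.
BLOCKING_NAMES = {"select", "sleep", "wait", "poll"}


def busy_leaf_frames(lines: Iterable[str], top: int = 5) -> List[Tuple[str, int]]:
    """Sum sample counts per innermost frame, skipping blocking frames."""
    counts: Counter = Counter()
    for line in lines:
        stack, _, count = line.strip().rpartition(" ")
        if not stack or not count.isdigit():
            continue  # not a folded stack line
        leaf = stack.split(";")[-1]        # innermost frame
        func = leaf.split(" ", 1)[0]       # text before " (file:line)"
        if func in BLOCKING_NAMES:
            continue                       # blocked, not using CPU
        counts[leaf] += int(count)
    return counts.most_common(top)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        "run (threading.py:870);select (selectors.py:415) 90",
        "run (threading.py:870);handle (zeroconf/__init__.py:1200) 25",
        "run (threading.py:870);handle (zeroconf/__init__.py:1200) 5",
        "run (threading.py:870);wait (threading.py:302) 40",
    ]
    for frame, samples in busy_leaf_frames(sample):
        print(samples, frame)
```

Whatever remains at the top of the list after the blocking frames are dropped is a better candidate for the real CPU consumer than the tallest bar in the raw flame graph.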
As requested @bdraco I am sending py-spy records (120, 360s and dump). After two days, CPU usage jumped to 30%. Is there anything else I can do to help fix this problem? Thanks for your help
Thank you. telegram is the only thing that stands out. There is quite a bit of time in zeroconf. Let’s see if 0.114.3 solves the issue for you, as it has the zeroconf fix.
I realized I only posted this in https://community.home-assistant.io/t/high-cpu-usage-after-0-113.
Here is the current status of all the known CPU-related issues that I’m aware of: