Python3 high CPU Usage

bdraco · July 12, 2020, 12:11am

The issue seen on the recording you posted should be resolved in 0.113

XinuX · August 1, 2020, 10:37pm

I’m having the same issue: first after 12 hours (with 0.113.2), and now after 1 hour (with 0.113.3). Any help?

bdraco · August 2, 2020, 9:09am

Please post a py-spy.

Also, if you have the spotcast custom integration installed, be sure to upgrade it.

XinuX · August 2, 2020, 9:27am

There is no py-spy for ARM devices, right?

bdraco · August 2, 2020, 11:12pm

You can install it via cargo if you are using the latest Home Assistant images. Enter the Docker container first, following https://developers.home-assistant.io/docs/operating-system/debugging/

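# Install the Rust toolchain, then build py-spy from source
apk add cargo
cargo install py-spy
# Create the output directory if it does not already exist
mkdir -p /config/www
# Record 120 seconds from the target process (replace 227 with your PID)
/root/.cargo/bin/py-spy record --pid 227 --duration 120 --output /config/www/pi.svg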

You’ll need to use top or something similar to find the correct PID.
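
One way to find the PID without reading through top is to scan /proc for the Home Assistant process. The following is a minimal sketch, not part of py-spy or Home Assistant: the function names are made up for illustration, and it assumes a Linux /proc filesystem and that the process command line contains the string homeassistant.

```python
import os


def read_proc_cmdlines(proc_root="/proc"):
    """Yield (pid, command line) for every process visible under proc_root (Linux only)."""
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, name, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited or is not readable
        # Arguments are NUL-separated in /proc/<pid>/cmdline
        yield int(name), raw.replace(b"\0", b" ").decode(errors="replace").strip()


def match_pids(entries, needle="homeassistant"):
    """Return the PIDs whose command line contains `needle` (an assumed marker string)."""
    return [pid for pid, cmdline in entries if needle in cmdline]


# Usage inside the container:
#     print(match_pids(read_proc_cmdlines()))
```

If more than one PID is returned, compare them against the CPU column in top and pass the busiest one to py-spy with --pid.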

apop · August 6, 2020, 8:11pm

EDIT: I was able to compile using pip instead of cargo, but had to use the instructions at this GitHub issue: https://github.com/benfred/py-spy/issues/23

I’m getting the following error trying to install py-spy:

error: could not find native static library `unwind`, perhaps an -L flag is missing?

error: aborting due to previous error

error: could not compile `remoteprocess`.

To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: failed to compile `py-spy v0.3.3`, intermediate artifacts can be found at `/tmp/cargo-installtigzV2`

Caused by:
  build failed

I’m running Home Assistant as a VM on Proxmox.

bdraco · August 6, 2020, 10:55pm

Are you using the latest image with Python 3.8.3?

The cargo install will only work on the newer images with the newer rust.

Edit: I see you got it working by other means

apop · August 6, 2020, 11:34pm

Yes, I am on 0.113.3 with the latest supervisor and OS as well, so I’m not sure what the issue was there, but going the pip route was fine once I found the other hoop to jump through.

bedson · August 15, 2020, 5:36pm

Hello, I have also had a high CPU problem for several versions. After a reboot, usage grows from 2% to about 35% (RPi 4, Docker Supervised, currently on 0.114).
Thank you for any tips.

Py-Spy for 2 top PIDs (120s):

bdraco · August 15, 2020, 7:11pm

Thanks for the py-spy recordings. Telegram looks like it’s taking up a fair amount of CPU time.

Did you hit the 35% cpu when these were taken?

Also, a dump at the same time would be helpful.

bedson · August 15, 2020, 7:26pm

No, it was the day after the reset (peak about 15%). There has now been an update to 0.114.1, so I have to wait around 2 days to see if CPU consumption increases to 35%. I will let you know and try to make a py-spy recording again.

azogue · August 17, 2020, 6:01pm

Hi @bdraco, I’m also one of the users seeing CPU usage climb up to saturation. I’ve been trying to isolate and fix it locally by applying all the pending changes to the suspect integrations :), so I think I can add some interesting info about what I found:

Suspects

Indeed, telegram_bot (configured as “polling”) takes a lot of threads and uses a lot of CPU (but before 0.112 it was not a problem). Changing it to “webhooks” (which, with NabuCasa, works seamlessly) does reduce this, but it does not change the underlying CPU issue (I could disable it for a test run if required).
I have 3 cast devices, but I removed the integration and marked it as “ignore”.
3 esphome devices. I locally applied the changes from “Add the ability to use the shared Home Assistant Zeroconf instance” (esphome/aioesphomeapi Pull Request #13, by bdraco) and “Use the shared Zeroconf instance in esphome” (home-assistant/core Pull Request #38747, by bdraco).
8 Shelly devices (custom integration ShellyForHass), which I read somewhere that it was a suspect, and indeed it is.

I manually changed the custom component (CC) repo and the main library (PyShelly) to share the zeroconf instance from HA, and I have both PRs ready to submit if it works, but my first impression after the change was that the issue appeared sooner! (CPU saturation in < 15 h, when it initially took > 1 day)
2 Sonos devices, which I think I also saw in the suspect list; they take 9 threads (but show no increase over time and are not especially prominent in the py-spy plots).

Setup context:

RPI4-4GB+SSD+UPS+conbee II stick, solid as a rock
Supervised install with Docker (not pure hassio), v0.114.0 with custom changes (the ones described above)
ADDONS: Appdaemon4, deCONZ, esphome, vcode, ADB, adguard, mqtt, dhcp, mariaDB, samba, nginx-proxy-manager
Custom integrations: shelly, xiaomi_miio_fan, eventsensor
Integrations: homekit, mobile_app(iOS) x4, sonos x2, androidtv x2, hue x16, nut, tplink x1, nabucasa + alexa, denon_avr, tuya x3, influxdb, recorder, and the ones from addons: deconz x29, esphome x3, adguard, mqtt

Last experiment

I’ve just found and applied all changes from your 2 PRs in zeroconf: “Reduce the time window that the handlers lock is held” (python-zeroconf/python-zeroconf Pull Request #287) and “Ensure all listeners are cleaned up on ServiceBrowser cancelation” (python-zeroconf/python-zeroconf Pull Request #290), ~~so I will restart once more to test the behavior~~ → Edited: done, with interesting results:

These are my latest HA sessions, plotted as a 15-min rolling mean of the reported CPU usage.

The current one, with all the changes described, looks stable for now (~7 h running), but it is running with 3-4% higher CPU usage than the stable reference on v0.112, so maybe there is something more under the hood.
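
For anyone who wants to smooth their own CPU readings the same way, a 15-minute rolling mean can be sketched as below. This is an illustration only, not the code used for the plots in this post: the function name and the (timestamp, percent) input format are assumptions.

```python
from collections import deque
from datetime import datetime, timedelta


def rolling_mean(samples, window=timedelta(minutes=15)):
    """Trailing time-window mean of CPU readings.

    samples: iterable of (timestamp, cpu_percent), sorted by timestamp.
    Returns a list of (timestamp, mean of the readings in (t - window, t]).
    """
    buf = deque()   # readings currently inside the window
    total = 0.0     # running sum of the values in buf
    out = []
    for ts, value in samples:
        buf.append((ts, value))
        total += value
        # Drop readings that have fallen out of the trailing window
        while buf[0][0] <= ts - window:
            _, old_value = buf.popleft()
            total -= old_value
        out.append((ts, total / len(buf)))
    return out
```

A trailing window is used so that each point only depends on past readings, which makes a slow upward drift after a restart easy to see.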

I can provide py-spy dumps and plots. This dump is from right now:

bdraco · August 17, 2020, 6:17pm

For PyShelly: is it better once https://github.com/jstasiak/python-zeroconf/pull/290 is applied?

The only other thing that stands out is that there have been some recent influxdb changes that another user reported were causing an issue.

azogue · August 17, 2020, 6:57pm

> For PyShelly: is it better once https://github.com/jstasiak/python-zeroconf/pull/290 is applied?

I think so, but I can’t be precise about that.
It is better than without it for sure for the system as a whole; I cannot say specifically for pyshelly (I’m not even sure it is using mDNS in my config, as all devices are defined by IP and discovery is off right now :))

> The only other thing that stands out is that there have been some recent influxdb changes that another user reported were causing an issue.

Thanks, I’ll look into it, but my first impression is that influx is not the cause: I’ve not seen any influx errors in the log, nor any sign of it in the dumps or py-spy plots. I’ll review it anyway.

The thing that surprised me the most was the pyhap usage: I don’t use homekit_controller, just homekit to control a few things with Siri on watches.

BTW, the current run remains stable, but things are definitely going wrong in the network/HA: I’m seeing a lot of hue bridge fetch errors, deconz zigbee sensors sometimes not triggering, small delays on automations, and similar ghostly behavior… (I even tried restarting everything: router, zigbee hubs, individual devices, but that’s not it.)

Tomorrow I’ll try disabling some things to continue testing, and maybe update to 0.114.2 and redo the custom changes in pyshelly, aioesphome and zeroconf…
Is there anything in the 0.114.1 and 0.114.2 releases related to this issue that could help here?

bdraco · August 17, 2020, 7:10pm

If you are on Python 3.8, https://github.com/home-assistant/core/pull/38821 could fix the executor being overloaded (in 0.114.1).

azogue · August 17, 2020, 7:59pm

Saw it yesterday, I think. It is one of my customizations: first with 100, and the latest runs with the value of 64 selected in the PR. No apparent change in behaviour.
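
For context, the values 100 and 64 are taken here to mean the size of the thread pool that the event loop uses for blocking work; that reading is an assumption, and the sketch below is not the code from the PR. It only shows the general asyncio pattern of installing a larger default executor. On Python 3.8 the default pool size is min(32, cpu_count + 4), which comes to 8 on a 4-core Raspberry Pi 4.

```python
import asyncio
import os
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 64  # value quoted in this thread; assumed to be the pool size


def default_pool_size():
    """Pool size ThreadPoolExecutor picks on Python 3.8+ when max_workers is not given."""
    return min(32, (os.cpu_count() or 1) + 4)


async def main():
    loop = asyncio.get_running_loop()
    # Install a larger pool as the loop's default executor
    loop.set_default_executor(ThreadPoolExecutor(max_workers=MAX_WORKERS))
    # Passing None as the executor uses the default pool installed above
    return await loop.run_in_executor(None, sum, [1, 2, 3])


# Usage:
#     asyncio.run(main())
```

A larger pool only helps when jobs are queuing behind blocked workers; it does not reduce the CPU used by the jobs themselves.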

bdraco · August 17, 2020, 8:12pm

Mostly in reference to pyhap:

Also, to analyze the py-spy recordings, look at the file and line numbers. If a frame is a select()/sleep()-like operation, exclude it from the analysis, as you can assume it is using no CPU and is just blocking.
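
That filtering step can be sketched in a few lines if the recording is saved as collapsed stacks with py-spy record --format raw instead of an SVG. This is a rough illustration, not a py-spy feature: the function names are hypothetical, the list of blocking calls is an assumption to tune, and it assumes each line looks like "frame;frame;frame count" with frames written as "function (file:line)".

```python
from collections import Counter

# Assumed names of blocking calls to ignore; extend as needed
IDLE_NAMES = ("select", "sleep", "poll", "wait")


def is_idle(frame):
    """True if the frame's function name is one of the assumed blocking calls."""
    func = frame.split(" (", 1)[0].strip()
    return func in IDLE_NAMES


def busy_leaf_counts(lines):
    """Sum sample counts per innermost frame, skipping stacks that end in a blocking call."""
    counts = Counter()
    for line in lines:
        stack, _, samples = line.strip().rpartition(" ")
        if not stack or not samples.isdigit():
            continue  # blank or malformed line
        leaf = stack.split(";")[-1]  # innermost frame is last in a collapsed stack
        if is_idle(leaf):
            continue
        counts[leaf] += int(samples)
    return counts


# Usage, with a hypothetical file name:
#     with open("/config/www/pi.txt") as fh:
#         print(busy_leaf_counts(fh).most_common(10))
```

Whatever remains at the top of the list after the blocking frames are removed is the better candidate for where the CPU time is actually going.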

bedson · August 18, 2020, 4:45pm

As requested, @bdraco, I am sending py-spy recordings (120 s, 360 s, and a dump). After two days, CPU usage jumped to 30%. Is there anything else I can do to help fix this problem? Thanks for your help.

bdraco · August 18, 2020, 4:57pm

Thank you. Telegram is the only thing that stands out. There is quite a bit of time in zeroconf. Let’s see if 0.114.3 solves the issue for you, as it has the zeroconf fix.

bdraco · August 19, 2020, 3:02am

I realized I only posted this in https://community.home-assistant.io/t/high-cpu-usage-after-0-113.

Here is the current status of all the known CPU-related issues that I’m aware of: