High CPU usage after 0.113

There wasn’t anything apparent in the py-spy recording. A thread dump from py-spy might reveal if there is an integration creating a lot of threads.

This is the output of the dump command. I executed it at the moment when CPU usage was at about 80%.

About a minute later the CPU usage went down again to about 10%. Looks like I get about one spike of high CPU usage every 2 minutes or so. At the same time I see an increase in network traffic (incoming and outgoing).

On MQTT I do have an OpenMQTTGateway, one Zigbee2MQTT Gateway and 5 Tasmota devices. Not much going on there.

Process 5995: python3 -m homeassistant --config /config
Python v3.8.3 (/usr/local/bin/python3.8)

Thread 0x560CE284B8B0 (idle): "MainThread"
    select (selectors.py:468)
    _run_once (asyncio/base_events.py:1823)
    run_forever (asyncio/base_events.py:570)
    run_until_complete (asyncio/base_events.py:603)
    run (asyncio/runners.py:43)
    run (homeassistant/runner.py:119)
    main (homeassistant/__main__.py:312)
    <module> (homeassistant/__main__.py:320)
    _run_code (runpy.py:87)
    _run_module_as_main (runpy.py:194)
Thread 0x7FEA28243B20 (active): "Thread-1"
    dequeue (logging/handlers.py:1427)
    _monitor (logging/handlers.py:1478)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA2813FB20 (active): "SyncWorker_0"
    _worker (concurrent/futures/thread.py:78)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA272FFB20 (active): "SyncWorker_1"
    _worker (concurrent/futures/thread.py:78)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA269BFB20 (active): "SyncWorker_2"
    _worker (concurrent/futures/thread.py:78)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA263FFB20 (active): "SyncWorker_3"
    _worker (concurrent/futures/thread.py:78)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA25DFFB20 (active): "SyncWorker_4"
    scan (netdisco/ssdp.py:259)
    run (concurrent/futures/thread.py:57)
    _worker (concurrent/futures/thread.py:80)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA257FFB20 (active): "SyncWorker_5"
    _worker (concurrent/futures/thread.py:78)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA251FFB20 (active): "SyncWorker_6"
    _worker (concurrent/futures/thread.py:78)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA24BFFB20 (active): "SyncWorker_7"
    _worker (concurrent/futures/thread.py:78)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA23EBFB20 (idle): "Recorder"
    wait (threading.py:302)
    get (queue.py:170)
    run (homeassistant/components/recorder/__init__.py:338)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA237FFB20 (active): "Thread-2"
    dequeue (logging/handlers.py:1427)
    _monitor (logging/handlers.py:1478)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA216D9B20 (active): "zeroconf-Engine-275"
    update_record (zeroconf/__init__.py:1612)
    update_record (netdisco/mdns.py:21)
    update_record (zeroconf/__init__.py:2622)
    handle_response (zeroconf/__init__.py:2662)
    handle_read (zeroconf/__init__.py:1405)
    run (zeroconf/__init__.py:1314)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA215D5B20 (idle): "zeroconf-Reaper_276"
    wait (threading.py:306)
    wait (zeroconf/__init__.py:2360)
    run (zeroconf/__init__.py:1422)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA1FDD5B20 (active): "Thread-6"
    loop (paho/mqtt/client.py:1163)
    loop_forever (paho/mqtt/client.py:1782)
    _thread_main (paho/mqtt/client.py:3428)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)
Thread 0x7FEA1A4EDB20 (active): "Thread-7"
    loop (paho/mqtt/client.py:1163)
    loop_forever (paho/mqtt/client.py:1782)
    _thread_main (paho/mqtt/client.py:3428)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)

In this case it would be best to get a py-spy recording with a duration of at least 120 seconds

I have:
Home Assistant 0.114.1

arch aarch64
chassis embedded
dev false
docker true
docker_version 19.03.8
hassio true
host_os HassOS 5.0
installation_type Home Assistant OS
os_name Linux
os_version 4.19.127-v8
python_version 3.8.3
supervisor 234

Using System Monitor platform:
pi4

@bdraco one thing I have noticed is that the CPU utilisation climbs a fraction higher with each and every event from any platform. Could this be a result of the recent listener optimisations?

That code is very straight forward, so it’s not that. I wish it were as then we wouldn’t be waiting on upstream merges to fix the known ones.

1 Like

Also, here is the latest list of known issues I’ve been working though https://github.com/home-assistant/core/issues/38862#issuecomment-674275786

Unfortunatley since 0.113 I’m having the same issue of CPU usage constantly going up (only reboot helps). The same happens after updating to 0.114.2.

image

arch armv7l
chassis embedded
dev false
docker true
docker_version 19.03.11
hassio true
host_os HassOS 4.12
installation_type Home Assistant OS
os_name Linux
os_version 4.19.127-v7
python_version 3.8.3
supervisor 234
timezone Europe/Warsaw
version 0.114.2
virtualenv false

Running on Raspberry Pi 3 B+.

Restart is not the same as reboot. I hope you mean restart.

Could you please wait until the CPU use is high and run the py-spy tool mentioned further up in this topic then paste the results here. This will help with diagnosis of the issue.

It doesn’t seem to go higher than ~30% after a few days, the load sensor (15m) indicates a value around 1 though.
I hope the fix on zeroconf 0.28.1 will solve my issue. I’ll let you know when 0.114.3 is released. Thank you

Yeah, I meant restart. Unfortunately I did it once again so I will need to wait for CPU usage to rise… which probably will take few days. Will get back with py-spy results. Thanks!

1 Like

That makes sense if it is zeroconf since the thread it is running it is probably maxing out the cpu core.

Just wanted to add I’m experiencing the same issue since 0.113 on an RPi3 B+. You can see after a restart it’s normal but it slowly continues to increase. It seems to stop around 20-25%. Will keep an eye out for 0.114.3 and update as soon as it’s released.

Please post py-spy recordings & dumps if you can.

Sorry for the delay. I did a py-spy recording for about 4 minutes. During the whole time CPU usage was above 60% on an i7 CPU. btw… I’m now on 0.114.2

This is how it looked for the last 6 hours:

I will now disable all automations just to make sure it’s not related to them.

I’m still not sure where the network traffic comes from. I’m also investigating this one.

Update: Network traffic is mostly from 224.0.0.251 (multicast DNS).

I just realized that restarting Home-Assistant throws this in the log:

2020-08-20 08:12:22 ERROR (zeroconf-ServiceBrowser__nut._tcp.local.-_ipps._tcp.local.-_api._udp.local.-_plugwise._tcp.local.-_Volumio._tcp.local.-_googlecast._tcp.local.-_dkapi._tcp.local.-_miio._udp.local.-_axis-video._tcp.local.-_spotify-connect._tcp.local.-_daap._tcp.local.-_hap._tcp.local.-_elg._tcp.local.-_wled._tcp.local.-_viziocast._tcp.local.-_bond._tcp.local.-_esphomelib._tcp.local.-_ipp._tcp.local.-_printer._tcp.local._362) [homeassistant.components.zeroconf] Failed to get info for device
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zeroconf/__init__.py", line 213, in service_update
    service_info = zeroconf.get_service_info(service_type, name)
  File "/usr/local/lib/python3.8/site-packages/zeroconf/__init__.py", line 2371, in get_service_info
    info = ServiceInfo(type_, name)
  File "/usr/local/lib/python3.8/site-packages/zeroconf/__init__.py", line 1732, in __init__
    if not type_.endswith(service_type_name(name, allow_underscores=True)):
  File "/usr/local/lib/python3.8/site-packages/zeroconf/__init__.py", line 268, in service_type_name
    raise BadTypeInNameException("Type '%s' must end with '._tcp.local.' or '._udp.local.'" % type_)
zeroconf.BadTypeInNameException: Type '' must end with '._tcp.local.' or '._udp.local.'

It only appears once after starting up. Probably not related. Just wanted to mention it.

Update: the message repeats. It is not only once.

Sorry for spamming here… just wanted to let you know it is not related to the automations. Nothing changed while all automations have been disabled.

Something happened at around 13:30 and CPU usage is way up there again.

py-spy does show about the same as last time.

I had the zeroconf error again at 13:25. Could be related. Exact same error as I’ve posted above.

I also see these kind of messages every few minutes:

2020-08-20 13:25:55 INFO (MainThread) [homeassistant.components.mqtt.discovery] Component has already been discovered: binary_sensor 240AC48074A0, sending update
2020-08-20 13:25:55 INFO (MainThread) [homeassistant.components.mqtt] Got update for entity with hash: ('binary_sensor', '240AC48074A0') '{'state_topic': 'opengw/OpenMQTTGatew
ay_ESP32_BLE/LWT', 'name': 'OpenMQTTGateway_ESP32_BLE', 'unique_id': '240AC48074A0', 'device_class': 'connectivity', 'payload_on': 'online', 'payload_off': 'offline', 'payload
_available': 'online', 'payload_not_available': 'offline', 'platform': 'mqtt'}'
2020-08-20 13:25:55 INFO (MainThread) [homeassistant.components.mqtt] Ignoring unchanged update for: binary_sensor.openmqttgateway_esp32_ble_2

So I will re-enable all automations but get rid of the OpenMQTTGateway for testing.

You have a device on your network that is broadcasting an invalid mdns name. We recently added a workaround for this issue to prevent the ServiceBrowser from collapsing. It shouldn’t cause any issue anymore besides an annoying log message. The solution is for the vendor to fix the device firmware if they are interested in doing so (most of these were caused by older HomeKit stacks that have been fixed with device firmware updates).

Thank you for the py-spy. 0.114.3 is out. Can you give it a spin?

3 Likes

This sounds promising, thanks @bdraco. There are many devices on the network and I assume a few of them could have a bad firmware.

I’m installing the update right now. Will report back in a few hours. I’ve also removed the OpenMQTTGateway from the network