Hi,
I have a little problem with my HassOS installation on a Raspberry Pi 3B. Every now and then, the system freezes or becomes extremely slow. I have often seen this around updates, which then fail because the update process complains that it has no internet connection.
Unfortunately, the observability of HassOS really sucks. Luckily, I just found a Telegraf add-on, and thanks to it I could see that last night the system hung from shortly after midnight until shortly before 8 in the morning. I don’t see anything suspicious in the metrics, though, such as slowly increasing RAM usage of a container. When the system came back, the homeassistant container seemed to be hogging the CPU, but not for long.
So I checked the host logs and found that the OOM killer has been killing python3. I can’t tell which python3 process that was, though. Homeassistant? Supervisor? Oh, and apparently a coredump was produced as well… That would certainly explain the lockup: producing a coredump slows down even my Ryzen 7 desktop, so I can see how it would lock up a poor little Pi for hours…
[75880.681731] telegraf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[75880.681755] CPU: 2 PID: 13176 Comm: telegraf Tainted: G C 5.10.92-v8 #1
[75880.681761] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
[75880.681767] Call trace:
[75880.681783] dump_backtrace+0x0/0x1b0
[75880.681791] show_stack+0x20/0x30
[75880.681800] dump_stack+0xec/0x154
[75880.681807] dump_header+0x50/0x20c
[75880.681816] oom_kill_process+0x208/0x210
[75880.681823] out_of_memory+0xec/0x330
[75880.681833] __alloc_pages_slowpath.constprop.0+0x824/0xba0
[75880.681841] __alloc_pages_nodemask+0x2a4/0x320
[75880.681847] pagecache_get_page+0x13c/0x2e0
[75880.681853] filemap_fault+0x6c8/0xa60
[75880.681862] ext4_filemap_fault+0x3c/0xa00
[75880.681871] __do_fault+0x44/0x110
[75880.681878] handle_mm_fault+0x6b4/0xd90
[75880.681885] do_page_fault+0x148/0x3e0
[75880.681891] do_translation_fault+0x60/0x78
[75880.681900] do_mem_abort+0x48/0xb0
[75880.681908] el0_ia+0x68/0xd0
[75880.681914] el0_sync_handler+0x98/0xc0
[75880.681923] el0_sync+0x180/0x1c0
[75880.681981] Mem-Info:
[75880.682000] active_anon:80133 inactive_anon:86000 isolated_anon:0
[75880.682000] active_file:245 inactive_file:1154 isolated_file:47
[75880.682000] unevictable:0 dirty:0 writeback:0
[75880.682000] slab_reclaimable:10355 slab_unreclaimable:12195
[75880.682000] mapped:157 shmem:4 pagetables:3816 bounce:0
[75880.682000] free:4513 free_pcp:0 free_cma:0
[75880.682014] Node 0 active_anon:320532kB inactive_anon:344000kB active_file:980kB inactive_file:4616kB unevictable:0kB isolated(anon):0kB isolated(file):188kB mapped:628kB dirty:0kB writeback:0kB shmem:16kB writeback_tmp:0kB kernel_stack:12160kB all_unreclaimable? no
[75880.682032] DMA free:18052kB min:53248kB low:57344kB high:61440kB reserved_highatomic:0KB active_anon:320532kB inactive_anon:344000kB active_file:1024kB inactive_file:4356kB unevictable:0kB writepending:0kB present:970752kB managed:931500kB mlocked:0kB pagetables:15264kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[75880.682040] lowmem_reserve[]: 0 0 0 0
[75880.682082] DMA: 1220*4kB (UMEC) 457*8kB (UME) 337*16kB (UE) 119*32kB (UME) 19*64kB (ME) 2*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19208kB
[75880.682218] 1814 total pagecache pages
[75880.682260] 310 pages in swap cache
[75880.682269] Swap cache stats: add 636466, delete 636156, find 47616/567183
[75880.682277] Free swap = 0kB
[75880.682284] Total swap = 232872kB
[75880.682299] 242688 pages RAM
[75880.682306] 0 pages HighMem/MovableOnly
[75880.682314] 9813 pages reserved
[75880.682321] 16384 pages cma reserved
[75880.682329] Tasks state (memory values in pages):
[75880.682337] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[75880.682371] [ 111] 1001 111 2035 229 40960 115 -900 dbus-daemon
[75880.682384] [ 112] 0 112 269780 96 126976 455 0 os-agent
[75880.682398] [ 117] 0 117 33742 104 204800 131 -250 systemd-journal
[75880.682411] [ 121] 0 121 97108 349 122880 324 0 udisksd
[75880.682423] [ 138] 0 138 3581 21 65536 386 -1000 systemd-udevd
[75880.682443] [ 340] 1005 340 21524 33 61440 148 0 systemd-timesyn
[75880.682456] [ 343] 0 343 247903 656 167936 352 0 NetworkManager
[75880.682468] [ 347] 0 347 58384 0 73728 261 0 rauc
[75880.682480] [ 353] 0 353 75250 0 65536 123 0 rngd
[75880.682492] [ 355] 0 355 2938 36 61440 144 0 systemd-logind
[75880.682505] [ 356] 0 356 2352 62 49152 108 0 wpa_supplicant
[75880.682518] [ 400] 0 400 530 7 32768 21 0 hciattach
[75880.682571] [ 402] 0 402 1881 39 40960 62 0 bluetoothd
[75880.682583] [ 437] 0 437 789067 10724 630784 3279 0 dockerd
[75880.682596] [ 445] 0 445 387017 1499 258048 1067 0 containerd
[75880.682608] [ 855] 0 855 287102 0 122880 221 0 docker-proxy
[75880.682621] [ 862] 0 862 287038 0 122880 182 0 docker-proxy
[75880.682633] [ 876] 0 876 418327 506 217088 367 1 containerd-shim
[75880.682646] [ 895] 0 895 49 0 28672 5 0 s6-svscan
[75880.682659] [ 987] 0 987 49 0 28672 4 0 s6-supervise
[75880.682673] [ 1146] 0 1146 49 0 28672 3 0 s6-supervise
[75880.682686] [ 1150] 0 1150 177333 181 86016 865 0 observer
[75880.682699] [ 1176] 0 1176 315210 1357 221184 518 0 docker
[75880.682711] [ 1177] 0 1177 827 1 40960 31 0 hassos-cli
[75880.682724] [ 1227] 0 1227 418391 568 217088 364 1 containerd-shim
[75880.682736] [ 1246] 0 1246 49 0 28672 4 0 s6-svscan
[75880.682749] [ 1345] 0 1345 49 0 28672 3 0 s6-supervise
[75880.682761] [ 1536] 0 1536 49 0 28672 4 0 s6-supervise
[75880.682773] [ 1537] 0 1537 49 0 28672 3 0 s6-supervise
[75880.682786] [ 1540] 0 1540 43941 9734 372736 7723 0 python3
[75880.682798] [ 1541] 0 1541 1094 123 40960 397 0 bash
[75880.682810] [ 1705] 0 1705 418391 555 212992 366 1 containerd-shim
[75880.682823] [ 1727] 0 1727 49 0 28672 3 0 s6-svscan
[75880.682836] [ 1811] 0 1811 370842 1379 249856 516 0 docker
[75880.682848] [ 1823] 0 1823 45 0 16384 4 0 foreground
[75880.682860] [ 1824] 0 1824 49 0 28672 3 0 s6-supervise
[75880.682872] [ 1836] 0 1836 44 0 16384 3 0 foreground
[75880.682884] [ 1893] 0 1893 653 1 36864 87 0 cli.sh
[75880.682897] [ 1999] 0 1999 411 0 32768 11 0 sleep
[75880.682909] [ 2015] 0 2015 418391 540 212992 370 1 containerd-shim
[75880.682928] [ 2035] 0 2035 49 0 28672 5 0 s6-svscan
[75880.682942] [ 2114] 0 2114 49 0 28672 3 0 s6-supervise
[75880.682955] [ 2272] 0 2272 418391 526 217088 391 1 containerd-shim
[75880.682968] [ 2294] 0 2294 49 0 28672 6 0 s6-svscan
[75880.682980] [ 2333] 0 2333 49 0 28672 3 0 s6-supervise
[75880.682992] [ 2337] 0 2337 180002 4654 159744 317 0 coredns
[75880.683004] [ 2423] 0 2423 49 0 28672 3 0 s6-supervise
[75880.683017] [ 2605] 0 2605 418391 532 208896 346 1 containerd-shim
[75880.683029] [ 2657] 0 2657 49 0 28672 4 0 s6-svscan
[75880.683042] [ 2718] 0 2718 49 0 28672 4 0 s6-supervise
[75880.683054] [ 2978] 0 2978 49 0 28672 3 0 s6-supervise
[75880.683066] [ 2983] 0 2983 218 8 28672 3 0 mdns-repeater
[75880.683114] [ 3085] 0 3085 49 0 28672 3 0 s6-supervise
[75880.683126] [ 3086] 0 3086 49 0 28672 3 0 s6-supervise
[75880.683139] [ 3089] 0 3089 1080 1 36864 504 0 bash
[75880.683151] [ 3090] 0 3090 23668 158 86016 504 0 pulseaudio
[75880.683163] [ 3116] 0 3116 1081 0 36864 502 0 bash
[75880.683176] [ 3117] 0 3117 1256 1 49152 80 0 udevadm
[75880.683188] [ 3124] 0 3124 501 4 36864 98 0 rlwrap
[75880.683200] [ 3125] 0 3125 427 0 36864 11 0 cat
[75880.683216] [ 3375] 0 3375 287102 0 118784 178 0 docker-proxy
[75880.683258] [ 3382] 0 3382 305614 0 126976 213 0 docker-proxy
[75880.683270] [ 3397] 0 3397 418455 545 208896 356 1 containerd-shim
[75880.683282] [ 3417] 0 3417 49 0 28672 4 0 s6-svscan
[75880.683295] [ 3505] 0 3505 49 0 28672 4 0 s6-supervise
[75880.683307] [ 3990] 0 3990 49 0 28672 3 0 s6-supervise
[75880.683319] [ 3991] 0 3991 49 0 28672 4 0 s6-supervise
[75880.683332] [ 3993] 0 3993 5383 14 73728 4171 0 ttyd
[75880.683344] [ 3995] 0 3995 1079 0 36864 118 0 sshd
[75880.683356] [ 4579] 0 4579 418391 545 208896 350 1 containerd-shim
[75880.683369] [ 4629] 0 4629 49 0 28672 5 0 s6-svscan
[75880.683381] [ 4743] 0 4743 49 0 28672 4 0 s6-supervise
[75880.683394] [ 4899] 0 4899 49 0 28672 3 0 s6-supervise
[75880.683406] [ 4902] 0 4902 6306 81 86016 3665 0 hass-configurat
[75880.683425] [ 8202] 0 8202 418391 553 217088 344 1 containerd-shim
[75880.683437] [ 8222] 0 8222 49 0 28672 4 0 s6-svscan
[75880.683449] [ 8265] 0 8265 49 0 28672 3 0 s6-supervise
[75880.683462] [ 8414] 0 8414 49 0 28672 3 0 s6-supervise
[75880.683474] [ 8417] 0 8417 208494 98743 1642496 13090 0 python3
[75880.683487] [ 12672] 0 12672 418327 524 225280 338 1 containerd-shim
[75880.683499] [ 12691] 0 12691 49 0 28672 4 0 s6-svscan
[75880.683512] [ 12733] 0 12733 49 0 28672 4 0 s6-supervise
[75880.683525] [ 13143] 0 13143 49 0 28672 3 0 s6-supervise
[75880.683538] [ 13146] 0 13146 1242507 3755 307200 986 0 telegraf
[75880.683551] [ 15272] 0 15272 418327 571 212992 333 1 containerd-shim
[75880.683563] [ 15292] 0 15292 49 0 28672 7 0 s6-svscan
[75880.683575] [ 15335] 0 15335 49 0 28672 4 0 s6-supervise
[75880.683589] [ 15774] 0 15774 49 0 28672 3 0 s6-supervise
[75880.683602] [ 15775] 0 15775 49 0 28672 3 0 s6-supervise
[75880.683614] [ 15779] 0 15779 1449 1 40960 188 0 nginx
[75880.683627] [ 15778] 0 15778 66716 1 385024 2987 0 npm start --set
[75880.683675] [ 15826] 0 15826 89771 13675 1089536 8363 0 node-red
[75880.683688] [ 15947] 0 15947 1507 16 40960 234 0 nginx
[75880.683702] [ 75231] 0 75231 289090 317 147456 15 0 runc
[75880.683714] [ 75232] 0 75232 288738 333 143360 3 0 runc
[75880.683726] [ 75233] 0 75233 288674 389 135168 22 0 runc
[75880.683738] [ 75234] 0 75234 270290 338 131072 2 0 runc
[75880.683751] [ 75235] 0 75235 270578 208 135168 22 0 runc
[75880.683763] [ 75236] 0 75236 288674 357 143360 9 0 runc
[75880.683775] [ 75237] 0 75237 288674 328 143360 3 0 runc
[75880.683788] [ 75238] 0 75238 288674 368 139264 22 0 runc
[75880.683800] [ 75239] 0 75239 288738 347 143360 4 0 runc
[75880.683813] [ 75240] 0 75240 270226 338 131072 4 0 runc
[75880.683825] [ 75292] 0 75292 270290 325 131072 35 0 runc
[75880.683837] [ 75293] 0 75293 270642 385 139264 25 0 runc
[75880.683850] [ 75295] 0 75295 270226 321 135168 11 0 runc
[75880.683863] [ 75296] 0 75296 270226 317 135168 7 0 runc
[75880.683875] [ 75297] 0 75297 270354 321 135168 11 0 runc
[75880.683887] [ 75298] 0 75298 288738 322 143360 18 0 runc
[75880.683934] [ 75299] 0 75299 270290 332 131072 7 0 runc
[75880.683947] [ 75300] 0 75300 288738 314 143360 23 0 runc
[75880.683959] [ 75304] 0 75304 270578 207 126976 35 0 runc
[75880.683972] [ 75305] 0 75305 270642 312 135168 18 0 runc
[75880.683984] [ 75308] 0 75308 288674 160 139264 32 0 runc
[75880.683997] [ 75309] 0 75309 270642 305 135168 21 0 runc
[75880.684009] [ 75310] 0 75310 270642 318 139264 16 0 runc
[75880.684021] [ 75312] 0 75312 288738 316 143360 10 0 runc
[75880.684034] [ 75313] 0 75313 289090 306 135168 43 0 runc
[75880.684049] [ 75315] 0 75315 288738 301 139264 29 0 runc
[75880.684062] [ 75316] 0 75316 270226 325 126976 8 0 runc
[75880.684074] [ 75319] 0 75319 270290 303 131072 24 0 runc
[75880.684086] [ 75323] 0 75323 288802 308 139264 22 0 runc
[75880.684099] [ 75327] 0 75327 270226 317 131072 33 0 runc
[75880.684111] [ 75331] 0 75331 270290 313 131072 16 0 runc
[75880.684124] [ 76220] 0 76220 2822 103 45056 0 0 systemd-coredum
[75880.684136] [ 76342] 0 76342 1094 125 40960 395 0 bash
[75880.684149] [ 76343] 0 76343 1358 126 40960 0 0 curl
[75880.684164] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=158eefef5b9d1fd8d17343bd719d3c1a07d228d598b19d7e65b433f6d4fc9757,mems_allowed=0,global_oom,task_memcg=/docker/42cd56a53fb2a0d53362bb21d2cc9d994767270e27cd17e673cca4fd7a96b243,task=python3,pid=8417,uid=0
[75880.684661] Out of memory: Killed process 8417 (python3) total-vm:833976kB, anon-rss:394972kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1604kB oom_score_adj:0
[75881.073776] oom_reaper: reaped process 8417 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[75889.596406] systemd-coredump[76220]: Failed to get COMM: No such process
[75890.408397] audit: type=1701 audit(1645335256.608:223): auid=4294967295 uid=0 gid=0 ses=4294967295 subj==unconfined pid=117 comm="systemd-journal" exe="/usr/lib/systemd/systemd-journald" sig=6 res=1
[75890.834466] systemd-coredump[76514]: Process 117 (systemd-journal) of user 0 dumped core.
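The oom-kill summary line does carry one hint about which python3 was killed: its `task_memcg` field names a Docker cgroup, and the last path component looks like a full container ID. Below is a minimal sketch that pulls that ID out of the line. The helper name `parse_oom_kill` is made up for illustration, and the assumption that the cgroup path maps one-to-one to a container ID is mine, not something the log confirms.

```python
import re

# The oom-kill summary line from the log above, with the timestamp dropped.
OOM_LINE = (
    "oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),"
    "cpuset=158eefef5b9d1fd8d17343bd719d3c1a07d228d598b19d7e65b433f6d4fc9757,"
    "mems_allowed=0,global_oom,"
    "task_memcg=/docker/42cd56a53fb2a0d53362bb21d2cc9d994767270e27cd17e673cca4fd7a96b243,"
    "task=python3,pid=8417,uid=0"
)


def parse_oom_kill(line):
    """Hypothetical helper: pull selected key=value fields out of the line."""
    fields = {}
    for key in ("task_memcg", "task", "pid"):
        # Anchor on a preceding comma so "task" does not match "task_memcg".
        match = re.search(rf"(?:^|,){key}=([^,]+)", line)
        if match:
            fields[key] = match.group(1)
    return fields


info = parse_oom_kill(OOM_LINE)
# Assumption: the last path component of task_memcg is the container ID.
victim_container = info["task_memcg"].rsplit("/", 1)[-1]
print(f"killed {info['task']} (pid {info['pid']}) in container {victim_container[:12]}")
```

With host shell access, comparing that ID against the output of `docker ps --no-trunc` should show whether the killed process belonged to the homeassistant container, the supervisor, or an add-on.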
Side note: I know this dump says telegraf invoked the OOM killer, but the problem existed before that. I only installed Telegraf because of the problem.
As for observability… The “observer” just tells me
Supervisor: Connected
Supported: Unsupported
Healthy: Unhealthy
Right… Unsupported? Probably because of the Telegraf add-on? Unhealthy? Uh, yeah, I know, but WHY and HOW?
The active add-ons, by the way, are File Editor, Node-RED, Telegraf and Terminal. I don’t even use Samba.
So far it’s obvious that this is a memory issue, but without insight into the architecture of HassOS and with so little observability, I have no idea how to track it down.
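One rough way to see where the memory went at the moment of the kill is to rank the rows of the "Tasks state" table by resident plus swapped-out memory. The sketch below does that for a handful of rows copied from the log. It assumes 4 kB pages (consistent with the `1220*4kB` entry in the free-list line) and that `swapents` counts pages, as the "memory values in pages" header suggests. The names `ROW` and `parse_tasks` are made up for illustration.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as suggested by the "1220*4kB" free-list entry

# A few rows copied from the "Tasks state" table above (not the full table).
TASK_ROWS = """\
[75880.682583] [ 437] 0 437 789067 10724 630784 3279 0 dockerd
[75880.682786] [ 1540] 0 1540 43941 9734 372736 7723 0 python3
[75880.682992] [ 2337] 0 2337 180002 4654 159744 317 0 coredns
[75880.683474] [ 8417] 0 8417 208494 98743 1642496 13090 0 python3
[75880.683538] [ 13146] 0 13146 1242507 3755 307200 986 0 telegraf
[75880.683627] [ 15778] 0 15778 66716 1 385024 2987 0 npm start --set
[75880.683675] [ 15826] 0 15826 89771 13675 1089536 8363 0 node-red
"""

# Columns: [pid] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+\d+\s+\d+\s+\d+\s+(?P<rss>\d+)\s+\d+\s+"
    r"(?P<swapents>\d+)\s+-?\d+\s+(?P<name>.+)$"
)


def parse_tasks(text):
    """Return (name, pid, rss_kb, swap_kb) tuples for each table row found."""
    tasks = []
    for line in text.splitlines():
        m = ROW.search(line)  # the timestamp bracket contains a dot, so it is skipped
        if m:
            rss_kb = int(m["rss"]) * PAGE_KB
            swap_kb = int(m["swapents"]) * PAGE_KB
            tasks.append((m["name"], int(m["pid"]), rss_kb, swap_kb))
    return tasks


# Largest consumers (resident + swapped) first.
tasks = sorted(parse_tasks(TASK_ROWS), key=lambda t: t[2] + t[3], reverse=True)
for name, pid, rss_kb, swap_kb in tasks:
    print(f"{name:<16} pid={pid:<6} rss={rss_kb:>7} kB  swap={swap_kb:>6} kB")
```

For pid 8417 this gives an RSS of 394972 kB, which matches the `anon-rss:394972kB` in the "Out of memory: Killed process" line, so the page-size assumption looks right. That single python3 process held well over a third of the 931500 kB of managed memory while swap was completely used up (`Free swap = 0kB`), with node-red a distant second.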