Crash report

bkbartk · September 20, 2021, 6:27pm

Hello,

For the last few weeks HA has been crashing a few times a week.
I finally found a log file, home-assistant.log.1,
and it ends like this:

2021-09-20 19:06:35 WARNING (SyncWorker_3) [homeassistant.components.rpi_power.binary_sensor] Under-voltage was detected. Consider getting a uninterruptible power supply for your Raspberry Pi.
2021-09-20 19:07:35 WARNING (SyncWorker_4) [homeassistant.components.rpi_power.binary_sensor] Under-voltage was detected. Consider getting a uninterruptible power supply for your Raspberry Pi.
2021-09-20 19:12:35 WARNING (SyncWorker_1) [homeassistant.components.rpi_power.binary_sensor] Under-voltage was detected. Consider getting a uninterruptible power supply for your Raspberry Pi.
2021-09-20 19:13:14 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [1746431136] Client unable to keep up with pending messages. Stayed over 512 for 5 seconds
2021-09-20 19:13:20 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [1746431136] Client unable to keep up with pending messages. Stayed over 512 for 5 seconds
2021-09-20 19:16:21 DEBUG (Thread-3) [arris_dcx960.arrisdcx960] Disconnected from mqtt client: 1
2021-09-20 19:17:23 WARNING (Thread-4) [homeassistant.components.mqtt] Disconnected from MQTT server core-mosquitto:1883 (1)

After this there is nothing, but HA becomes totally unresponsive.
The under-voltage messages are normal and shouldn’t be an issue.
I have the feeling the issue arose when I configured DuckDNS and opened port 8123,
but I’m not fully sure about this.

This message stands out in the log, but I can’t figure out what those pending messages are:

Client unable to keep up with pending messages. Stayed over 512 for 5 seconds

Is there some way I can find where those pending messages come from or what they mean?
Any other solution is of course also welcome.
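One way to see which component is filling the log before a hang is to count entries per level and logger. This is only a minimal sketch, assuming the log line layout shown above (timestamp, level, thread in parentheses, logger in square brackets); the names `LOG_LINE`, `summarise_log` and `summarise_file` are made up for illustration and are not part of Home Assistant.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time> LEVEL (thread) [logger] message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]+)\) \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def summarise_log(lines):
    """Count log entries per (level, logger); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[(match["level"], match["logger"])] += 1
    return counts


def summarise_file(path):
    """Same as summarise_log, but reads the entries from a log file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return summarise_log(handle)


# Hypothetical usage, run from the config directory:
# for (level, logger), n in summarise_file("home-assistant.log.1").most_common(10):
#     print(n, level, logger)
```

This does not explain what the pending websocket messages are, it only shows which loggers were busiest before the log stops.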

tom_l · September 20, 2021, 10:44pm

Incorrect. They are not normal and are the cause of your system instability. Get a better power supply.

bkbartk · September 21, 2021, 6:50am

Ok,
I already replaced my charger with this one,

Since then I get far fewer of these messages, but I still get some.
What power supply should I use for an RPi 3?
Would the official one suffice,
or are there better ones?

tom_l · September 21, 2021, 7:15am

That should be sufficient. However, as it is still generating low-voltage errors, have you tried replacing the USB power cable?
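If you can get a shell on the Pi where the `vcgencmd` tool is available (an assumption, it may not be reachable from every Home Assistant install), `vcgencmd get_throttled` prints a bitmask such as `throttled=0x50005`. A minimal sketch for decoding that value, with the bit meanings taken from the Raspberry Pi documentation; `decode_throttled` is just an illustrative name:

```python
# Bit positions reported by `vcgencmd get_throttled`
THROTTLED_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Turn 'throttled=0x50005' (or a bare hex value) into a list of active flags."""
    value = int(output.strip().split("=", 1)[-1], 16)
    return [
        text
        for bit, text in sorted(THROTTLED_FLAGS.items())
        if value & (1 << bit)
    ]


# Hypothetical usage:
# print(decode_throttled("throttled=0x50005"))
```

The "has occurred" bits stay set until the next reboot, so they show whether under-voltage happened at any point since boot, even if the log warning has stopped.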

bkbartk · September 21, 2021, 8:41am

Thank you,
yes, I replaced everything,

but I have just ordered the official RPi power supply,

so hopefully that will do the trick.

bkbartk · September 24, 2021, 5:40pm

I installed a new adapter and now the undervoltage messages are gone.
I don’t have any issues with this at all.

I also applied a heatsink,
but what I do notice is a sudden temperature spike, with the system becoming slow or unavailable.

The log shows some timeouts:

2021-09-24 19:22:53 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /core/stats request
2021-09-24 19:22:53 ERROR (MainThread) [homeassistant.components.hassio] Failed to to call /core/stats -
2021-09-24 19:22:53 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /supervisor/stats request
2021-09-24 19:22:53 ERROR (MainThread) [homeassistant.components.hassio] Failed to to call /supervisor/stats -
2021-09-24 19:25:53 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /supervisor/stats request
2021-09-24 19:25:53 ERROR (MainThread) [homeassistant.components.hassio] Failed to to call /supervisor/stats -
2021-09-24 19:25:53 ERROR (MainThread) [homeassistant.components.hassio.handler] Timeout on /core/stats request
2021-09-24 19:25:53 ERROR (MainThread) [homeassistant.components.hassio] Failed to to call /core/stats -

but only for the requests that did get through.
It seems like the system is really busy doing something, but I don’t have a clue what it is.
Is there some way I can find what causes those issues?

The Supervisor and Observer are both unresponsive.

bkbartk · September 25, 2021, 7:11pm

I still want to know what causes my cpu to spike,
so I created a shell command like this.

shell_command:
  # %M is minutes; %m (month) would repeat the month in the time part
  log_cpu_usage: top -b -n1 > top_"`date +"%Y-%m-%d %H:%M:%S"`".txt

A sensor like this:

sensor:
- platform: systemmonitor
  resources:
    - type: processor_use
    - type: processor_temperature

And an automation like this:

- id: '1632596313135'
  alias: Log CPU percentage
  description: ''
  trigger:
  - platform: numeric_state
    entity_id: sensor.processor_temperature
    for:
      hours: 0
      minutes: 1
      seconds: 0
      milliseconds: 0
    above: '55'
  condition: []
  action:
  - service: shell_command.log_cpu_usage
  mode: single

The files are created in the config directory,
so I hope I can log the process that causes the CPU to spike.
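One thing worth double-checking in a timestamped file name like this: in `date` and `strftime` format strings, `%M` is the minute while `%m` is the month, so mixing them up gives files whose "minutes" never change. A small sketch of the same naming scheme in Python (`snapshot_filename` is a made-up helper, not something Home Assistant provides):

```python
from datetime import datetime


def snapshot_filename(now):
    """Build a name like 'top_2021-09-25 19:11:05.txt' for a top snapshot."""
    # %H:%M:%S -> hour:minute:second; %m here would insert the month instead.
    return "top_" + now.strftime("%Y-%m-%d %H:%M:%S") + ".txt"


# Hypothetical usage:
# print(snapshot_filename(datetime.now()))
```

With `%m` in the time part, every snapshot taken within the same hour and second of the minute would map to the same name and overwrite the previous file.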

bkbartk · September 28, 2021, 1:24pm

OK, final comment, I hope.

the top command didn’t work.
So I tried glances and portainer,
but that made the issue somewhat worse.

It seems like a memory issue after all,
so I set
gpu_mem=16
in config.txt,
which gave me 48 MB extra,
and I followed this instruction to increase the swap file

so fingers crossed.
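To check whether the `gpu_mem` and swap changes actually took effect, the totals in `/proc/meminfo` can be compared before and after the reboot. A minimal sketch, assuming a Linux system that exposes `/proc/meminfo` in the usual `Name: value kB` form; the function names are illustrative only:

```python
def parse_meminfo(text):
    """Return {field: value} from the contents of /proc/meminfo (values mostly in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_summary(info):
    """Summarise RAM and swap headroom in whole MB (kB divided by 1024)."""
    return {
        "total_mb": info.get("MemTotal", 0) // 1024,
        "available_mb": info.get("MemAvailable", 0) // 1024,
        "swap_total_mb": info.get("SwapTotal", 0) // 1024,
        "swap_free_mb": info.get("SwapFree", 0) // 1024,
    }


def read_memory_summary(path="/proc/meminfo"):
    """Read and summarise the live values from the running system."""
    with open(path, encoding="ascii") as handle:
        return memory_summary(parse_meminfo(handle.read()))


# Hypothetical usage on the Pi:
# print(read_memory_summary())
```

If `total_mb` grows after lowering `gpu_mem` and `swap_total_mb` matches the new swap size, both changes are active; a steadily shrinking `available_mb` before each hang would support the memory theory.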