Nest Integration causing HA lockup/freeze during setup

Hello,

On a fresh install, I am having to switch to the Desktop App SDM method, from the older Web App method (I believe it was called) I used to use. During setup, I get all the way to Enter Project ID, which responds with “An unknown error occurred”, then my entire HA instance stops responding - I end up having to restart the core container.

Unfortunately because of this, the log also stops rolling. The last line before HA stops responding indicates that authentication was successful (I have logger set to debug for all Nest/SDM/OAuth components and helpers).

Can anyone please help with this given the little information I have; or confirm that they too are having this issue?

Documentation Followed
Home Assistant Supervised, Debian 11 on Proxmox KVM

Thank you,

This is not a known issue. I am not familiar if home assistant supervised has anythin to do with it, but it may be worth seeing what versions of google-nest-sdm and grpcio are installed via pip3 freeze as a debugging hint.

You don’t get any log messages for the "An unknown error occurred”? That is surprising.

Hey, thanks for responding! I show:
google-nest-sdm==0.4.9
grpcio==1.31.0

I actually got a little further on the latest attempt. Last related log after Successfully Authenticated is:
[homeassistant.components.nest.config_flow] Creating subscriber id …

I also see another error I hadn’t seen before (after HA becomes inaccessible), but don’t know if it’s at all related:

ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/aiohttp/http_websocket.py", line 676, in ping
    await self._send_frame(message, WSMsgType.PING)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/http_websocket.py", line 653, in _send_frame
    self._write(header + message)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/http_websocket.py", line 663, in _write
    raise ConnectionResetError("Cannot write to closing transport")
ConnectionResetError: Cannot write to closing transport

as well as

[homeassistant.components.websocket_api.http.connection] [140455236334304] Connection closed by client

I assumed the latter error has to do with my web interface connection closing, but not sure.

I will set aiohttp to debug and test again.

Edit:

No aiohttp error this time. Also, I’ve realized Nest (with working config) is what broke my previous HA install - which is the whole reason I started fresh. Once I disabled Nest on the old, it stopped “locking up” 2 minutes into startup like it was before.

Hi, this definitely sounds like a huge issue so lets try to get to the bottom of this if you have time to collect more data.

There is one known issue where grpcio installation can slow down the system, when its compiling from source code rather than a wheel. And this would show that its installing the package:

logger:
    default: info
    logs:
        homeassistant.util.package: debug

That should log some messages about installing packages, or you could see this if you issue a “ps -x” and see a pip install command running for google-nest-sdm. Nest Integration is failing API and killing HA for 10mins · Issue #59516 · home-assistant/core · GitHub is tracking it

If it’s this, we may be able to get you a temporary workaround until we fix the root cause

However, if its saying “Creating subscriber id” then I would assume it is past that part already… Especially since you already have grpcio and google-nest-sdm installed.

There is also discussion in API: 500: Internal server error with Nest Integration · Issue #63054 · home-assistant/core · GitHub that sounds likely similar.

Edit: Actually, not sure the original issue is the same, but the second report (which sounds unrelated may be)

Greatly appreciate you looking into it, thanks!

I see absolutely nothing related in the logs after the subcriber ID part, aside from another of the websocket errors (not related?). However here is my docker exec -it homeassistant ps ax:

    1 root      0:00 foreground  backtick  -D  3000  -n  S6_SERVICES_GRACETIME
   40 root      0:00 [s6-supervise]
  189 root      0:00 s6-supervise home-assistant
  192 root      0:49 python3 -m homeassistant --config /config
  387 root      0:00 [s6-ls]
  388 root      0:00 [s6-sort]
  390 root      0:00 foreground  s6-svwait  -D  -t  220000  /var/run/s6/service
  397 root      0:00 s6-svwait -D -t 220000 /var/run/s6/services/home-assistant
  398 root      0:00 s6-ftrigrd
  405 root      0:00 ps ax

docker exec -it homeassistant ps -x returns as not an option

But some anecdotal things that might be of use:

  • Sometimes the HA container freezes until I restart it, but sometimes it comes back on it’s own for awhile. It stopped logging and immediately became unresponsive, after “Unknown error occurred”, for almost two minutes, then started logging again and was accessible. Then it froze again shortly after; eventually the container automatically restarted.
    Supervisor log:
21-12-30 18:36:37 ERROR (MainThread) [supervisor.homeassistant.api] Error on call http://172.30.32.1:8123/api/config:
21-12-30 18:36:37 WARNING (MainThread) [supervisor.misc.tasks] Watchdog miss API response from Home Assistant
21-12-30 18:39:08 ERROR (MainThread) [supervisor.homeassistant.api] Error on call http://172.30.32.1:8123/api/config:
21-12-30 18:39:08 ERROR (MainThread) [supervisor.misc.tasks] Watchdog found a problem with Home Assistant API!
21-12-30 18:39:08 INFO (SyncWorker_1) [supervisor.docker.interface] Restarting ghcr.io/home-assistant/qemux86-64-homeassistant
  • Sometimes the logs will continue rolling in while HA is unresponsive, and other times logging will freeze as well

Are you running 2021.12?

How about also adding these additional logger debug entries:

logger:
  default: info
  logs:
    homeassistant.components.nest: debug
    homeassistant.helpers.config_entry_flow: debug
    homeassistant.helpers.config_entry_oauth2_flow: debug
    google_nest_sdm: debug
    google_nest_sdm.auth: debug
    google_nest_sdm.device: debug
    google_nest_sdm.google_nest_subscriber: debug

This may give us some additional visibility into things

I am, I’m on 2021.12.7, and here are the logger entries (some migrated from troubleshooting the old method):

default: info
logs:
  aiohttp: debug
  homeassistant.util.package: debug
  homeassistant.components.amcrest: debug
  homeassistant.helpers.config_entry_flow: debug
  homeassistant.helpers.config_entry_oauth2_flow: debug
  homeassistant.components.nest: debug
  homeassistant.components.nest.config_flow: debug
  homeassistant.components.nest.climate_sdm: debug
  homeassistant.components.nest.camera_sdm: debug
  homeassistant.components.nest.sensor_sdm: debug
  google_nest_sdm: debug
  google_nest_sdm.device: debug
  google_nest_sdm.device_manager: debug
  google_nest_sdm.google_nest_subscriber: debug
  google_nest_sdm.event: debug

I’m sorry, I know it’s frustrating without logs. Maybe there is another component to set at debug?

Ok, maybe also include the auth one that is missing rha robe logs outgoing commands.

There may be some additional pubsub packages we can include…

Maybe a pubsub subscriber thread is getting locked up and there isn’t a timeout.

Ah, sorry, I overlooked that one. Including the auth one does not produce any additional logs, unfortunately. I’m really not sure why I’m not getting anything beyond Subscriber id - baffling.

Edit: for anyone else having this issue, I’m currently working around it by using HomeBridge to facilitate the connection to Nest.

Two other thoughts:

  • I’ve heard that enabling debugpy: can help us find functions that are taking a long time (Remote Python Debugger - Home Assistant)
  • What about turning the default to debug if that doesn’t help

My best guess is maybe the pubsub client worker thread is wedged, and maybe the event thread gets blocked, taking down home assistant. I can definitely look into adding timeouts if that is the case, in addition to fixing the root cause. Having some stack trace of what is “stuck” could help.

What is odd is usually it would throw an exception would that happens. I would think it is in homeassistant.components.websocket_api or something from here core/connection.py at b17120a5113a424cab9c621225956f6c8ec67a87 · home-assistant/core · GitHub

I’m also wondering if anything about pubsub or subscriber anywhere in the log before this…

Thank you for working on this, definitely important to get this resolved.

Can you check the version of google-cloud-pubsub also?

I am curious if more recent versions like 2.4.1 or higher have this problem.

As of Unable to upgrade grpcio to newer versions due to armv7 link error · Issue #56669 · home-assistant/core · GitHub I am optimistic this issue may be fixed (right now in dev builds), but it would be really helpful to work on confirmation in case if not then i can continue to find a solution for the next release.

Do you have time to do additional testing with me?

Hi there, I’m very sorry for the extended time away from this.

I have absolutely no clue what did it, but upon revisiting this and testing with full debug, it worked flawlessly. The only thing I can really think of is some security updates on the host… not even an HA update.

I’m happy to try to break it for more testing, but I’m shocked and happy it’s working (all of a sudden).

google-cloud-pubsub==2.1.0

Here are some relevant logs I could find:

2022-01-13 19:15:41 DEBUG (ThreadPoolExecutor-1_0) [google_nest_sdm.google_nest_subscriber] Existing subscription not found; Creating

2022-01-13 19:15:41 DEBUG (ThreadPoolExecutor-1_0) [google_nest_sdm.google_nest_subscriber] Creating subscription: {'name': 'projects/[...]/subscriptions/home-assistant-[...]', 'topic': 'projects/sdm-prod/topics/enterprise-[...]', 'message_retention_duration': seconds: 900
}


2022-01-13 19:15:43 DEBUG (ThreadPoolExecutor-2_0) [google_nest_sdm.google_nest_subscriber] Subscriber 'projects/[...]/subscriptions/home-assistant-[...]' configured on topic 'projects/sdm-prod/topics/[...]'
2022-01-13 19:15:43 DEBUG (ThreadPoolExecutor-2_0) [google.cloud.pubsub_v1.subscriber._protocol.streaming_pull_manager] Creating a stream, default ACK deadline set to 10 seconds.
2022-01-13 19:15:43 DEBUG (ThreadPoolExecutor-2_0) [google.cloud.pubsub_v1.subscriber._protocol.dispatcher] Started helper thread Thread-CallbackRequestDispatcher
2022-01-13 19:15:43 DEBUG (ThreadPoolExecutor-2_0) [google.api_core.bidi] Started helper thread Thread-ConsumeBidirectionalStream
2022-01-13 19:15:43 DEBUG (Thread-LeaseMaintainer) [google.cloud.pubsub_v1.subscriber._protocol.leaser] The current deadline value is 10 seconds.
2022-01-13 19:15:43 DEBUG (Thread-LeaseMaintainer) [google.cloud.pubsub_v1.subscriber._protocol.leaser] Snoozing lease management for 0.842503 seconds.
2022-01-13 19:15:43 DEBUG (ThreadPoolExecutor-2_0) [google.cloud.pubsub_v1.subscriber._protocol.leaser] Started helper thread Thread-LeaseMaintainer
2022-01-13 19:15:43 DEBUG (ThreadPoolExecutor-2_0) [google.cloud.pubsub_v1.subscriber._protocol.heartbeater] Started helper thread Thread-Heartbeater
2022-01-13 19:15:43 DEBUG (Thread-Heartbeater) [google.cloud.pubsub_v1.subscriber._protocol.heartbeater] Sent heartbeat.
2022-01-13 19:15:43 DEBUG (MainThread) [google_nest_sdm.google_nest_subscriber] Starting background watchdog thread
2022-01-13 19:15:43 DEBUG (MainThread) [google_nest_sdm.auth] request[get]=https://smartdevicemanagement.googleapis.com/v1/enterprises/[...]/structures
2022-01-13 19:15:43 DEBUG (Thread-ConsumeBidirectionalStream) [google.api_core.bidi] waiting for recv.

2022-01-13 19:15:44 DEBUG (MainThread) [google_nest_sdm.auth] request[get]=https://smartdevicemanagement.googleapis.com/v1/enterprises/[...]/devices,
2022-01-13 19:15:44 DEBUG (Thread-LeaseMaintainer) [google.cloud.pubsub_v1.subscriber._protocol.leaser] Snoozing lease management for 6.476877 seconds.,
2022-01-13 19:15:44 DEBUG (Thread-LeaseMaintainer) [google.cloud.pubsub_v1.subscriber._protocol.leaser] The current deadline value is 10 seconds.

Thank you very much for your time, attention, and help on this!