Shelly Gen4 devices not connecting after HA restart or network restart

Hello,

I’m having issues with some Shelly Gen4 devices after restarting HA or my network. I’m using the official Shelly integration, so this topic does not resolve my issue.

First of all: I’m using an Omada network with SDN, which is also integrated into HA. This comes in handy for the workaround I’m currently using: I can see whether a device is available on the network (which it is in 99.9% of all cases when it is not connected to HA).

I’m using the “signal strength” entity to detect whether a Shelly device is unavailable to HA. The rest of the device’s entities are also marked as unavailable in this case. In this scenario the Shelly remains online on the network, and its web interface can be reached by entering its IP address in a web browser.

I waited for hours to see whether the connection would be re-established, but sadly nothing happens. First and foremost, I want to understand: why is HA not connected to my Shelly Gen4 devices? I have some old Gen1 devices in the same network that use CoIoT for the connection, and they have no issues in the same scenario.

A friend sometimes has the exact same issue with his devices (he mainly uses Gen3 devices).

The issue can be solved manually by hitting the “reconnect” button in HA. It takes around 5 seconds and the device is back in action. So I’m wondering why HA does not attempt a reconnect on its own when the device is unavailable.

I created a little helper template to “see” whether a device is unavailable:

{% set my_pattern = '^sensor\..*_signalstarke$' %}
{% set reconnect_suffix = '_reconnect' %}
{% set cooldown_minutes = 5 %}

{% set sensors = states.sensor 
  | selectattr('entity_id', 'search', my_pattern)
  | selectattr('state', 'in', ['unavailable', 'unknown', 'none'])
  | list %}

{% set ns = namespace(connected=true) %}
{% for s in sensors %}
  {% set dev_id = device_id(s.entity_id) %}
  {% set device_entities = device_entities(dev_id) %}
  {% set has_tracker = device_entities | select('search', '^device_tracker\.') | list | count > 0 %}
  {% set button_id = device_entities | select('search', 'button\..*' ~ reconnect_suffix ~ '$') | first | default(none) %}
  
  {% if has_tracker and button_id is not none %}
    {# A button entity has no 'last_triggered' attribute; its state holds the last-press timestamp. #}
    {% set last_triggered = as_datetime(states(button_id)) %}
    {% set is_cooled_down = last_triggered is none or (now() - last_triggered).total_seconds() > (cooldown_minutes * 60) %}
    {% if is_cooled_down %}
      {% set ns.connected = false %}
    {% endif %}
  {% endif %}
{% endfor %}

{{ ns.connected }}

It basically checks whether there are disconnected devices that have a reconnect button it can “hit” and whose button has not been pressed within the cooldown_minutes timeframe. I also have a script for the automatic reconnect:

alias: Shelly-Automatischer-Reconnect
icon: mdi:wifi-sync
sequence:
  - repeat:
      for_each: |
        {% set my_pattern = '^sensor\..*_signalstarke$' %}
        {% set reconnect_suffix = '_reconnect' %}
        {% set cooldown_minutes = 5 %}

        {% set sensors = states.sensor 
          | selectattr('entity_id', 'search', my_pattern)
          | selectattr('state', 'in', ['unavailable', 'unknown', 'none'])
          | list %}

        {% set ns = namespace(results=[]) %}
        {% for s in sensors %}
          {% set dev_id = device_id(s.entity_id) %}
          {% set device_entities = device_entities(dev_id) %}
          {% set has_tracker = device_entities | select('search', '^device_tracker\.') | list | count > 0 %}
          {% set button_id = device_entities | select('search', 'button\..*' ~ reconnect_suffix ~ '$') | first | default(none) %}
          
          {% if has_tracker and button_id is not none %}
            {# A button entity has no 'last_triggered' attribute; its state holds the last-press timestamp. #}
            {% set last_triggered = as_datetime(states(button_id)) %}
            {% set is_cooled_down = last_triggered is none or (now() - last_triggered).total_seconds() > (cooldown_minutes * 60) %}
            
            {% if is_cooled_down %}
              {% set ns.results = ns.results + [button_id] %}
            {% endif %}
          {% endif %}
        {% endfor %}

        {{ ns.results }}
      sequence:
        - target:
            entity_id: "{{ repeat.item }}"
          action: button.press
          data: {}
        - delay: "00:00:01"
description: ""
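
To run the script without manual intervention, an automation along these lines might work. This is only a minimal sketch under assumptions: `binary_sensor.shelly_all_connected` is a hypothetical entity ID for the helper template above (assumed to be set up as a template binary sensor, so "off" means at least one device needs a reconnect), and `script.shelly_automatischer_reconnect` is assumed to be the entity ID of the script. Both need to be adjusted to the real IDs.

```yaml
alias: Shelly reconnect watchdog
description: Starts the reconnect script while the helper reports a disconnected device.
triggers:
  # Fires once the helper has reported a disconnected device for one minute.
  - trigger: state
    entity_id: binary_sensor.shelly_all_connected  # hypothetical helper entity ID
    to: "off"
    for: "00:01:00"
  # Periodic re-check in case the first reconnect attempt did not help.
  - trigger: time_pattern
    minutes: "/10"
conditions:
  # Only act while the helper still reports a disconnected device.
  - condition: state
    entity_id: binary_sensor.shelly_all_connected
    state: "off"
actions:
  - action: script.turn_on
    target:
      entity_id: script.shelly_automatischer_reconnect  # assumed script entity ID
mode: single
```

The periodic trigger is there because the helper stays "off" across failed attempts, so a pure state trigger would fire only once.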

But I do not understand why this is necessary. Is it a bug in the Shelly integration? Everything is up to date with the current stable release as of this post.

What are the errors in home-assistant.log?

I cannot find any log entry regarding a Shelly device. The only entries I can find are those that occur when a Shelly is disconnected (by power loss). I’m currently installing a new device. I’m not sure whether there is another log option besides “Settings” → “System” → “Logs” and the debug log?

I just enabled Shelly debug logging to have more data the next time it occurs. I will force the problem by restarting the PoE switch for my WLAN APs, but this will take some time.

action: logger.set_level
data:
  aioshelly: debug
  homeassistant.components.shelly: debug
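
One caveat: levels set through logger.set_level are not kept across an HA restart, so the restart case would not be captured this way. A minimal sketch of the persistent equivalent in configuration.yaml, assuming `warning` is an acceptable default level for everything else:

```yaml
# configuration.yaml
logger:
  default: warning  # assumed default; keep whatever level is already configured
  logs:
    # Same two loggers as in the action call above, but applied at startup.
    aioshelly: debug
    homeassistant.components.shelly: debug
```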

The debug log file is a mess: I have too many devices, and there are ~30 messages per second. Is it possible to limit the log output to specific devices somehow? Or is there any log file that might help without debug output?
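
One possible way to cut the volume is the `filters` option of the logger integration, which drops every message that matches a regular expression. The sketch below inverts that with a negative lookahead, so only messages mentioning one device survive. Several things here are assumptions: `192.168.1.50` is a placeholder IP, the messages of interest are assumed to contain the device IP on their first line, and `aioshelly.rpc_device.wsrpc` is only an example logger name. A filter applies to exactly the named logger and not to its children, so the names should be copied from the square brackets in the actual log lines.

```yaml
# configuration.yaml
logger:
  default: warning
  logs:
    aioshelly: debug
    homeassistant.components.shelly: debug
  filters:
    # Example logger name (assumption); use the names shown in [brackets] in the log.
    aioshelly.rpc_device.wsrpc:
      # Drops every message that does NOT contain the placeholder IP.
      - '^(?!.*192\.168\.1\.50)'
```

Messages from loggers without a filter entry are still written in full, so each noisy logger needs its own entry.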

I’m also not sure whether this log file contains any identifiers that should not be made public. Is it safe to upload?