Bluetooth raspberry pi 3 fails after some time and does not recover

Hi,

Using a raspberry pi 3+ with HA OS. I wanted to start using bluetooth as a device tracker, but it seems to fail after some time. To recover I delete the integration, so a new one is made and everythin works for a while (some hours most of the time).

the log show the following:
homeassistant.components.bluetooth.scanner.ScannerStartError: hci0 (B8:27:EB:D5:72:69): Failed to start Bluetooth: [org.bluez.Error.NotReady] Resource Not Ready
2023-01-05 19:35:53.998 INFO (MainThread) [homeassistant.components.bluetooth.scanner] hci0 (B8:27:EB:D5:72:69): Bluetooth scanner has gone quiet for 90s, restarting
2023-01-05 19:35:59.017 WARNING (MainThread) [bluetooth_auto_recovery.recover] Bluetooth adapter hci0 [B8:27:EB:D5:72:69] could not be reset due to timeout
2023-01-05 19:36:04.044 WARNING (MainThread) [bluetooth_auto_recovery.recover] Bluetooth adapter hci0 [B8:27:EB:D5:72:69] could not be reset due to timeout
2023-01-05 19:36:04.058 ERROR (MainThread) [homeassistant.components.bluetooth.scanner] hci0 (B8:27:EB:D5:72:69): Failed to restart Bluetooth scanner: hci0 (B8:27:EB:D5:72:69): Failed to start Bluetooth: [org.bluez.Error.NotReady] Resource Not Ready
Traceback (most recent call last):
File “/usr/src/homeassistant/homeassistant/components/bluetooth/scanner.py”, line 229, in _async_start
await self.scanner.start() # type: ignore[no-untyped-call]
File “/usr/local/lib/python3.10/site-packages/bleak/init.py”, line 156, in start
await self._backend.start()
File “/usr/local/lib/python3.10/site-packages/bleak/backends/bluezdbus/scanner.py”, line 191, in start
self._stop = await manager.active_scan(
File “/usr/local/lib/python3.10/site-packages/bleak/backends/bluezdbus/manager.py”, line 357, in active_scan
assert_reply(reply)
File “/usr/local/lib/python3.10/site-packages/bleak/backends/bluezdbus/utils.py”, line 20, in assert_reply
raise BleakDBusError(reply.error_name, reply.body)
bleak.exc.BleakDBusError: [org.bluez.Error.NotReady] Resource Not Ready

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File “/usr/src/homeassistant/homeassistant/components/bluetooth/scanner.py”, line 364, in _async_scanner_watchdog
await self._async_start()
File “/usr/src/homeassistant/homeassistant/components/bluetooth/scanner.py”, line 310, in _async_start
raise ScannerStartError(

Not sure what the problem is, hopefully someone here does.

Thanks

This topic was discussed a lot already.
I’m facing the same issue as you.
I just did a try to something but not yet got the result of it, let me explain:

I did an automation that detect that a sensor is unavailable (bluetooth is probably dead) for more than 5 minutes.

Simple trigger:

trigger:
  - platform: state
    entity_id:
      - sensor.xiaomi_temperature
    to: unknown
    for:
      hours: 0
      minutes: 5
      seconds: 0
  - platform: state
    entity_id:
      - sensor.xiaomi_temperature
    to: unavailable
    for:
      hours: 0
      minutes: 5
      seconds: 0

If this happens, I’m running a shell_script that I did and that seems to fix the issue without touching the integration when run in a shell window:

shell_command:
  restart_bluetooth: >
    /bin/hciconfig hci0 down
    sudo /sbin/rmmod btusb
    sudo /sbin/modprobe btusb
    /bin/hciconfig hci0 up

For this to run under the home assistant user, I had to do the following in SSH to allow it

sudo setcap 'cap_net_raw,cap_net_admin+eip' `which hcitool`
sudo setcap 'cap_net_raw,cap_net_admin+eip' `which hciconfig`

And give the right to do sudo for /sbin/rmmod and /sbin/modprobe

I’m not saying that this will solve the issue. Moreover, I think I might need to do some additional command like

sudo systemctl restart bluetooth
sudo systemctl restart hciuart

All this is only valid in Core, I don’t know how to transpose that in Docker or HassOS.
I hope that one of the 2 will happen:

  1. Bluetooth will remain stable
  2. My automation will do the necessary to recover

I’ll let you know, probably tomorrow as it never last more than 6-7 hours without that

Bluetooth adapter hci0 [xx:yy:zz:aa:bb:cc] could not be reset due to timeout
hci0 (xx:yy:zz:aa:bb:cc: Failed to restart Bluetooth scanner: hci0 (xx:yy:zz:aa:bb:cc): Failed to start Bluetooth: [org.bluez.Error.NotReady] Resource Not Ready

First test is a failure, none of my Bluetooth devices is going to unknown/unavailable and Bluetooth was down for a long time.
I need another condition to trigger the reset.

Bluetooth management socket connection lost: [Errno 9] Bad file descriptor
10:22:50 AM – (WARNING) runner.py - message first occurred at January 5, 2023 at 9:32:20 PM and shows up 386 times

Bluetooth adapter hci0 [xx:yy:zz:aa:bb:cc] could not be reset due to timeout
10:22:50 AM – (WARNING) components/bluetooth/util.py - message first occurred at January 5, 2023 at 9:32:20 PM and shows up 386 times

hci0 (xx:yy:zz:aa:bb:cc): Error stopping scanner: [org.bluez.Error.InProgress] Operation already in progress
10:22:45 AM – (ERROR) bluetooth - message first occurred at January 5, 2023 at 9:30:17 PM and shows up 387 times

What I have tried, was to trigger an update of the tracker when the status changed to away (which was a possible indication that the device failed) via device_tracker.see. But the result was that the tracker reported me at home, even when I was away.
Maybe that is an optionn to trigger your script?

Could be but, unfortunately for me, my BT devices are behaving exactly as normal and there is no other sensor that could prove them wrong and could be the indication that something is wrong.

I’m trying to create a command_line sensor as it seems that

hciconfig -a

is successful when BT is running fine (return 0), giving an output like so:

hci0:   Type: Primary  Bus: UART
        BD Address: xx:yy:zz:aa:bb:cc  ACL MTU: 1021:8  SCO MTU: 64:1
        UP RUNNING PSCAN
        RX bytes:20582 acl:0 sco:0 events:846 errors:0
        TX bytes:5104 acl:0 sco:0 commands:229 errors:0
        Features: 0xbf 0xfe 0xcf 0xfe 0xdb 0xff 0x7b 0x87
        Packet type: DM1 DM3 DM5 DH1 DH3 DH5 HV1 HV2 HV3
        Link policy: RSWITCH SNIFF
        Link mode: PERIPHERAL ACCEPT
        Name: 'HomeAssistant'
        Class: 0x6c0000
        Service Classes: Rendering, Capturing, Audio, Telephony
        Device Class: Miscellaneous,
        HCI Version: 4.1 (0x7)  Revision: 0x0
        LMP Version: 4.1 (0x7)  Subversion: 0x2209
        Manufacturer: Broadcom Corporation (15)

but is failing (return 1) when we are in the failure state, with an output like so:

hci0:   Type: Primary  Bus: UART
        BD Address: xx:yy:zz:aa:bb:cc  ACL MTU: 1021:8  SCO MTU: 64:1
        UP RUNNING PSCAN
        RX bytes:1907142 acl:148 sco:0 events:70846 errors:0
        TX bytes:99284 acl:136 sco:0 commands:10394 errors:0
        Features: 0xbf 0xfe 0xcf 0xfe 0xdb 0xff 0x7b 0x87
        Packet type: DM1 DM3 DM5 DH1 DH3 DH5 HV1 HV2 HV3
        Link policy: RSWITCH SNIFF
        Link mode: PERIPHERAL ACCEPT
Can't read local name on hci0: Connection timed out (110)

So it will be possible to detect that it is time to reset.

Automation
alias: Watchdog timer
description: ""
trigger:
  - platform: time_pattern
    minutes: /50
condition: []
action:
  - service: script.watchdog
    data:
      entity_id: sensor.digoo_temp
      time: 4800
mode: restart
Script
alias: Entity Watchdog
sequence:
  - service: persistent_notification.dismiss
    data_template:
      notification_id: Watchdog:{{entity_id}}
  - condition: template
    value_template: >-
      {{ ( now()|as_timestamp() -
      states[entity_id].last_updated|default(0,true)|as_timestamp()|default(0,true)
      ) > time }}
  - service: persistent_notification.create
    data_template:
      title: "Watchdog: {{ entity_id }}"
      message: >
        {%- if states[entity_id].last_updated %} Entity {{ entity_id }} has not
        been updated in {{ ( now()|as_timestamp() -
        states[entity_id].last_updated|default(0,true)|as_timestamp()|default(0,true)
        )|int }} seconds which is longer than the max allowed of {{ time }}. {%-
        else %} Entity {{ entity_id }} does not seem to exist. {%- endif %}
      notification_id: Watchdog:{{entity_id}}
  - service: switch.turn_on
    target:
      entity_id:
        - switch.sys_restart_gateway
    data: {}
mode: single

What about a watchdog automation and script to detect if sensor value hasn’t changed in a while like in the example above.

I thought of this for myself, as my temperature will obviously change constantly.
But that won’t solve the problem for a more “static” BT device, such as presence detection, which might be unchanged for a (very) long time.

But in my case, yes, constant temperature is a good sign :slight_smile:

EDIT: Do you think that the “last_update” will change for such a device? I don’t think so but I don’t have such a device. Maybe @spijkers09 can tell?

Do all devices not quit at once? You would only need to know when 1 stops to know the trackers also aren’t working.

Yes, they do.
But I’m trying to find a solution suitable for everyone (if any).

So I implemented the watchdog now. It is not doing anything but the notification.
It might probably solve my case.

But I’m the lucky one (for once) as I have a control sensor to trigger the reset, but maybe someone will not have such a device and would like to use the command_line sensor, maybe?

I’m running the 2 and will see if it is a coherent outcome between the 2 methods.

I ran into another issue I had to fix first. I had a ‘double’ device_tracker, 1 BT and 1 GPS (no idea where that came from, probably to much fxxing around).

I have 1 issue that does not make sense to me. In the terminal, I ran bluetoothctl devices and I see a number of devices, but my mobile is not one of them. Even though it is in known_devices.

bluetoothissue1

Even when the device tracker for my mobile has failed, the rest still appears functional if I use the bluetoothctl command

bluetoothissue2

So not sure what is wrong here and no idea how to further investigate

Since when I’m doing a hciconfig -a every minute, it did not fail, in 11 hours straight.
Weird. Too early to correlate though.

Ok, time for an update, both methods detected the failure.
The error occurred at 6:49, detected by hciconfig -a | grep Manufacturer at 6:51 and by the watchdog at 6:55.

This was the good news.

The bad news is that my script did not recover fully the Bluetooth correctly.
As a result, the graph of my temperature is better, as the sensor is unavailable but as it did not recover, it is missing for a long time.

So I’ll work on the script part and find a solution!

EDIT: I’ve a new script, I’ll put it to the test now.

Ok, I am at the point of giving up. I cannot use the hciconfig (as it is not installed on RPi 3).
I see a failure (in the log) and something that looks like recovery and using bluetoothctl, it looks like BT is seeing a number of devices.
But using a BLE scanner (on my mobile), I see mostly the same devices, but no RPi 3 (even after bluetoothctl discoverable on) and RPi3 does not see my device.
As my mobile is not one of the devices as shown using bluetoothctl, the debug logs do not indicate if/how/why it is no longer tracked.

So I will try to use the bluetooth LE tracker and see what happens, but I doubt if that will make a difference

I have it on mine.
Just install the package bluez with apt or apt-get

It is seing my iPhone but I had to go in the Bluetooth settings to scan for new devices before it was in the list of the bluetoothctl scan on.but all this is lost after a while anyway. Even if I did a trust on my phone’s bt address.

To restart my bluetooth on RPI3b+ I set up a shell command.

shell_command:
  bluetoothctl_off: bluetoothctl power off
  bluetoothctl_on: bluetoothctl power on
  
alias: Bluetoothctl Power Restart
sequence:
  - service: shell_command.bluetoothctl_off
    data: {}
  - delay:
      hours: 0
      minutes: 0
      seconds: 5
      milliseconds: 0
  - service: shell_command.bluetoothctl_on
    data: {}
mode: single 

Then run it from this script

Interesting approach. what do you use as a trigger for the script?

@Spiro, I think I tried that but I always had my logs flooded with these kinds of lines:

Bluetooth management socket connection lost: [Errno 9] Bad file descriptor
10:22:50 AM – (WARNING) runner.py - message first occurred at January 5, 2023 at 9:32:20 PM and shows up 386 times

Bluetooth adapter hci0 [xx:yy:zz:aa:bb:cc] could not be reset due to timeout
10:22:50 AM – (WARNING) components/bluetooth/util.py - message first occurred at January 5, 2023 at 9:32:20 PM and shows up 386 times

hci0 (xx:yy:zz:aa:bb:cc): Error stopping scanner: [org.bluez.Error.InProgress] Operation already in progress
10:22:45 AM – (ERROR) bluetooth - message first occurred at January 5, 2023 at 9:30:17 PM and shows up 387 times

Right now, my script is the following:

shell_command:
  restart_bluetooth: >
    sudo /bin/hciconfig hci0 down
    sudo systemctl stop bluetooth
    sudo systemctl stop hciuart
    sudo /sbin/rmmod btusb
    sudo /sbin/modprobe btusb
    sudo systemctl start hciuart 2>/dev/null
    sudo systemctl start bluetooth
    sleep 5
    sudo /bin/hciconfig hci0 up

I did it manually in a shell command prompt. It seems to work well. I hope it will be ok from HA, it should be. Fingers crossed.

I’ll try your bluetoothctl power off/on next time, I’m disabling my script for now and send myself a notification to try it.

As simple trigger for now, you can do a time pattern trigger, like every hour.

trigger:
  - platform: time_pattern
    hours: /1

There will always be time later to refine the trigger

To be honest my bluetooth rarely goes down. If it were a problem I’d add it to a script to be run by the watch man automation above. Occasionally I run it . Usually when I have been messing with bluetooth devices.