HAOS - Proxmox - Frigate - Coral not working

ant-thomas · June 24, 2023, 2:46pm

Hello,

Trying to get a Coral TPU working with Frigate under HAOS running via a Proxmox VM.
System
Dell 7040
Core i5 6500
40 GB RAM
Home Assistant 2023.5.2
Supervisor 2023.06.2
Operating System 10.1
Frontend 20230503.3

Using dual edge TPU on an M.2 to PCI-E adapter - I’m aware only one TPU will show/work, this isn’t an issue at the moment.
PCI passthrough should be set up and working, but this could be where the issue lies.

The device is passed through to HAOS and /dev/apex_0 exists under HAOS

Frigate lists the Coral device as a detector in the config section of the UI, but the logs report that no EdgeTPU was detected.

Unfortunately, Frigate crashes after a minute or so with the following in its log:

2023-06-24 14:38:36.468180251  [2023-06-24 15:38:36] detector.coral                 INFO    : Starting detection process: 227
2023-06-24 14:38:36.468183620  [2023-06-24 15:38:36] frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as pci
2023-06-24 14:38:36.469141208  [2023-06-24 15:38:36] frigate.detectors.plugins.edgetpu_tfl ERROR   : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
2023-06-24 14:38:36.470747302  Process detector:coral:
2023-06-24 14:38:36.470750193  Traceback (most recent call last):
2023-06-24 14:38:36.470751572    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
2023-06-24 14:38:36.470752697      delegate = Delegate(library, options)
2023-06-24 14:38:36.470755531    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
2023-06-24 14:38:36.470770200      raise ValueError(capture.message)
2023-06-24 14:38:36.470771530  ValueError
2023-06-24 14:38:36.470772502  
2023-06-24 14:38:36.470774169  During handling of the above exception, another exception occurred:
2023-06-24 14:38:36.470775164  
2023-06-24 14:38:36.470776174  Traceback (most recent call last):
2023-06-24 14:38:36.470777370    File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2023-06-24 14:38:36.470796621      self.run()
2023-06-24 14:38:36.470797941    File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2023-06-24 14:38:36.470799111      self._target(*self._args, **self._kwargs)
2023-06-24 14:38:36.470800243    File "/opt/frigate/frigate/object_detection.py", line 98, in run_detector
2023-06-24 14:38:36.470814185      object_detector = LocalObjectDetector(detector_config=detector_config)
2023-06-24 14:38:36.470815412    File "/opt/frigate/frigate/object_detection.py", line 52, in __init__
2023-06-24 14:38:36.470816584      self.detect_api = create_detector(detector_config)
2023-06-24 14:38:36.470817770    File "/opt/frigate/frigate/detectors/__init__.py", line 24, in create_detector
2023-06-24 14:38:36.470818811      return api(detector_config)
2023-06-24 14:38:36.470819972    File "/opt/frigate/frigate/detectors/plugins/edgetpu_tfl.py", line 37, in __init__
2023-06-24 14:38:36.470821148      edge_tpu_delegate = load_delegate("libedgetpu.so.1.0", device_config)
2023-06-24 14:38:36.470834973    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
2023-06-24 14:38:36.470836198      raise ValueError('Failed to load delegate from {}\n{}'.format(
2023-06-24 14:38:36.470837290  ValueError: Failed to load delegate from libedgetpu.so.1.0

dmesg also shows the following:

RAM did not enable within timeout
Error in device open cb: -110
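
The -110 in the second dmesg line is a negated Linux errno value; 110 is ETIMEDOUT, which is consistent with the "did not enable within timeout" message above it. A minimal sketch for decoding such lines, assuming Linux errno numbering (the lookup table and the `decode_dmesg_errno` helper are illustrative only and not part of Frigate or the Coral driver):

```python
import re

# Linux errno values, hard-coded so the lookup does not depend on the
# platform running this script. Only a few common ones are listed.
LINUX_ERRNO = {
    5: "EIO",
    13: "EACCES",
    19: "ENODEV",
    110: "ETIMEDOUT",
}


def decode_dmesg_errno(line):
    """Return (code, name) for a trailing negative errno in a dmesg line, or None."""
    match = re.search(r":\s*-(\d+)\s*$", line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, LINUX_ERRNO.get(code, "unknown")


print(decode_dmesg_errno("Error in device open cb: -110"))  # (110, 'ETIMEDOUT')
```

A timeout at device open points at the device not responding to the driver, rather than at Frigate's configuration.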

Any ideas or suggestions to fix this?
I feel like I’m nearly there, since the device shows up in HAOS under /dev.

vlar · August 5, 2023, 6:47pm

Your post is fairly recent, so I’m bumping it to ask whether you solved the issue.
I’m facing the same problem:

[2023-08-05 18:46:08] frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as pci
2023-08-05 18:46:08.117291167  Process detector:coral:
2023-08-05 18:46:08.117648925  [2023-08-05 18:46:08] frigate.detectors.plugins.edgetpu_tfl ERROR   : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.

My LXC config file:

unprivileged: 1
lxc.mount.entry: /dev/apex_0 dev/apex_0 none bind,optional,create=file 0, 0
lxc.cgroup2.devices.allow: c 120:* rwm
lxc.apparmor.profile: unconfined
lxc.cgroup2.devices.allow: a
lxc.cap.drop:
lxc.mount.auto: cgroup:rw
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.cgroup2.devices.allow: c 29:0 rwm
lxc.cgroup2.devices.allow: c 189:* rwm

 ls /dev/apex_0 
/dev/apex_0
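
A plain `ls` only confirms that the device node exists; it does not show who owns it or whether the current user can open it. A minimal sketch of a fuller check, to be run inside the container, assuming Python 3 is available there (the `describe_device` helper is hypothetical, not something Frigate ships):

```python
import grp
import os
import pwd
import stat


def describe_device(path):
    """Report owner, group, mode and read/write access for a device node."""
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)  # no passwd entry for this UID
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)  # no group entry for this GID
    return {
        "is_char_device": stat.S_ISCHR(info.st_mode),
        "owner": owner,
        "group": group,
        "mode": stat.filemode(info.st_mode),
        "read_write": os.access(path, os.R_OK | os.W_OK),
    }


if __name__ == "__main__":
    path = "/dev/apex_0"
    if os.path.exists(path):
        print(describe_device(path))
    else:
        print(f"{path} not found")
```

If the owner comes back as nobody, or `read_write` is False, the node is visible but the process in the container may not be able to use it.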

Mine is also a PCI Express card on an adapter. Any luck on your end?

ant-thomas · August 6, 2023, 2:58pm

Yes, it is now working.

The catch is that I left it for a couple of weeks because I had no more time to try and fix it.
When I came back to it, I found it had sorted itself out.

I’m unsure whether it was restarts or updates that got it going, but it is working well now.
It is currently running detection on 3 cameras without issues.

vlar · August 7, 2023, 6:26pm

Thanks for your feedback.
I also spent more time on it and was able to figure it out.
I’m posting an update here so it may help others.
The issue was the ownership of the /dev/apex_0 file, which showed up inside the container as owned by the nobody user. I mapped the Proxmox root user to the LXC root user and it worked (not the most secure approach, but the only one I could get working).
Here is the LXC file:

lxc.mount.entry: /dev/apex_0 dev/apex_0 none bind,optional,create=file 0, 0
lxc.cgroup2.devices.allow: c 120:* rwm
lxc.apparmor.profile: unconfined
lxc.cgroup2.devices.allow: a
lxc.mount.auto: cgroup:rw
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.cgroup2.devices.allow: c 29:0 rwm
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.cgroup.devices.allow: c 243:* rwm
lxc.idmap: u 0 0 1
lxc.idmap: g 0 0 1
lxc.idmap: u 1 100000 65536
lxc.idmap: g 1 100000 65536
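
The `lxc.idmap` lines are what change the ownership seen inside the container. A minimal sketch of the lookup, assuming the usual unprivileged default of `u 0 100000 65536` (an assumption, since the original config did not list its idmap) and that /dev/apex_0 is owned by UID 0 on the host; both helper functions are illustrative only:

```python
def parse_idmap(lines, kind="u"):
    """Parse 'lxc.idmap: u <container> <host> <count>' lines into tuples."""
    ranges = []
    for line in lines:
        key, _, value = line.partition(":")
        if key.strip() != "lxc.idmap":
            continue
        fields = value.split()
        if len(fields) == 4 and fields[0] == kind:
            ranges.append(tuple(int(f) for f in fields[1:]))
    return ranges


def host_to_container(host_id, ranges):
    """Return the container ID that a host ID maps to, or None if unmapped."""
    for container_start, host_start, count in ranges:
        if host_start <= host_id < host_start + count:
            return container_start + (host_id - host_start)
    return None


# Assumed default mapping for an unprivileged container.
DEFAULT_UNPRIVILEGED = ["lxc.idmap: u 0 100000 65536"]
# UID mapping lines from the config posted above.
POSTED = ["lxc.idmap: u 0 0 1", "lxc.idmap: u 1 100000 65536"]

print(host_to_container(0, parse_idmap(DEFAULT_UNPRIVILEGED)))  # None
print(host_to_container(0, parse_idmap(POSTED)))  # 0
```

Under the assumed default, host UID 0 has no counterpart in the container, and unmapped IDs are displayed as nobody. With the posted mapping, host root becomes container root, which is also why this approach gives up part of the isolation an unprivileged container normally provides.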