Whisper: self-hosted performance for French

So, I’m testing the voice assistant, and I’m not impressed by Whisper’s performance, especially compared to Rhasspy, which ran fine on an RPi 4 :wink:

At least for me, anything below “medium” is basically unusable. I cannot reliably make Whisper understand “éteindre la lumière dans le bureau” (“turn off the light in the office”), for instance.

Some examples of what is recognized:

  • “et tindre la lumière dans le bureau.”
  • “Est-à-dire la lumière dans le bureau.”
  • “Et t’as la lumière dans le bureau.”

I’m Belgian, but I don’t think my accent is that thick :joy:

“medium-int8” is better, but then it takes around 15 s to get an answer.
I tried on different CPUs:

  • Intel(R) Core™ i5-3470S CPU @ 2.90GHz
  • Intel(R) Xeon(R) CPU E3-1245 V2 @ 3.40GHz
  • Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz

with marginal differences.
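
For reference, here is the add-on configuration I’m testing with. The option names are from the current Whisper add-on and may differ in other versions; `beam_size` in particular may not exist in older releases:

```yaml
model: medium-int8
language: fr
beam_size: 1
```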

I would be interested to get feedback on the experience of others on that matter.
Thanks.


You’re not alone.
I’m French, and this is what I get when I try to say “Allume la Guirlande Blanche” (“turn on the white garland”) with the tiny-int8 model:

  • Allume, la hierlande blanche.
  • A l’une, la guerre lande de blanche.
  • A l’une la hierlande blanche.
  • A lume à la Guirlande Blanche.
  • Allume la hierande blanche.

For what it’s worth, I can’t get it to recognise anything in Dutch either, and it takes 15+ seconds on my i5 as well.

It’d be nice if I could train the instance by correcting what it heard or something.


Why did you stop using Rhasspy, koying? I was just trying to get it running; they say Rhasspy 3 supports the Wyoming protocol.

I used Rhasspy 2 before synesthesiam basically said he was moving on because he needed money; then I lost interest.
Now that he works for Nabu Casa, I’m a bit confused as to why he would be doing the work twice, tbh: once for NC and once for Rhasspy.

For now, I’ll let the dust settle a bit and see which one sticks (I’m assuming having both is not sustainable long-term).

According to the docs, faster-whisper (which is what the add-on uses, at least) is configured to use 4 CPU cores by default.
That would explain the marginal differences between different CPUs.

There is the OMP_NUM_THREADS environment variable, but I’m not sure how to use it. I already tried adding the variable to the add-on’s docker container, and also tried modifying whisper/rootfs/etc/s6-overlay/s6-rc.d/whisper/run as a temporary workaround, but did not succeed.
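
For a standalone container (outside the add-on), something like the following should at least pass the variable in. This is a sketch based on the rhasspy/wyoming-whisper image; whether OMP_NUM_THREADS actually overrides faster-whisper’s own thread setting is an assumption on my part:

```shell
# Sketch: standalone wyoming-whisper container with the thread count raised.
# OMP_NUM_THREADS is read by the underlying math libraries; image tag,
# entrypoint flags, and the variable's effect may differ in your version.
docker run -d --name whisper \
  -p 10300:10300 \
  -e OMP_NUM_THREADS=8 \
  -v whisper-data:/data \
  rhasspy/wyoming-whisper \
  --model medium-int8 --language fr
```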

Thanks, but I read somewhere else that the number of CPUs didn’t make a real difference either.

Same here. Today I received my M5Stack ATOM Echo and was all excited to test the voice recognition, but I’m very disappointed.
Whisper understands hardly anything in French, not even simple queries like “Allume la lumière du salon” (“turn on the living-room light”). After multiple tries, talking very slowly (like talking to a child), I managed to turn on a light, but it took more than 15 seconds to process and my server’s CPU usage went through the roof.
As far as I’m concerned, it’s unusable right now :disappointed:

Same issues in Italian. It understands words, but there is always some wrong letter, so the command is refused. The speed is also terrible.

I have a question for non-Nabu Casa users: when Assist was introduced some months ago, STT worked very well out of the box without any configuration (I guess using an external STT, maybe Google?). Now, with the latest updates, it is no longer working and we need to install Whisper. Is it possible to use online STT as before for non-Nabu Casa users?

Same here. I tried Dutch. It eventually gets funny how hard Whisper tries not to understand me.
“Zet boom-kamer lichten uit”. I got similar results on a iPhone, laptop and Atom Echo.

Piper in Dutch also has poor results, while the Belgian Dutch voice is much better.


Hi,

I’m French too, and it’s not working at all; Whisper gives me errors in the logs:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service whisper: starting
s6-rc: info: service whisper successfully started
s6-rc: info: service discovery: starting
INFO:__main__:Ready
[18:48:26] INFO: Successfully send discovery information to Home Assistant.
s6-rc: info: service discovery successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
INFO:wyoming_faster_whisper.handler: Allume la cuisine !
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-22' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:28> exception=ConnectionResetError('Connection lost')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 35, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 45, in handle_event
    await self.write_event(self.wyoming_info_event)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 26, in write_event
    await async_write_event(event, self.writer)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/event.py", line 114, in async_write_event
    await writer.drain()
  File "/usr/lib/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib/python3.9/asyncio/streams.py", line 190, in _drain_helper
    raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-23' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:28> exception=ConnectionResetError('Connection lost')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 35, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 45, in handle_event
    await self.write_event(self.wyoming_info_event)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 26, in write_event
    await async_write_event(event, self.writer)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/event.py", line 114, in async_write_event
    await writer.drain()
  File "/usr/lib/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib/python3.9/asyncio/streams.py", line 190, in _drain_helper
    raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost

Home Assistant Generic x86-64 installed on an Intel NUC.

Tested on a laptop and from mobile through the Home Assistant mobile app.

I tried configuring Piper, Whisper, and local Assist to English instead of French, but it doesn’t work either.

Same issue here, worse in French but unusable in both languages without going mad :upside_down_face:

During normal usage the CPU load is between 25-40%, but it rises up to 83%, causing a slowdown, when Whisper struggles to recognise a simple voice command (using tiny-int8).

Maybe this is down to the RPi 4’s performance; I haven’t yet tested with a more powerful machine or a GPU.

ConnectionResetError: Connection lost
INFO:wyoming_faster_whisper.handler: Turn off the light.
INFO:wyoming_faster_whisper.handler: Turn off the light.
INFO:wyoming_faster_whisper.handler: Turn off the butt.
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-65' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:28> exception=ConnectionResetError('Connection lost')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 35, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 45, in handle_event
    await self.write_event(self.wyoming_info_event)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 26, in write_event
    await async_write_event(event, self.writer)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/event.py", line 114, in async_write_event
    await writer.drain()
  File "/usr/lib/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib/python3.9/asyncio/streams.py", line 190, in _drain_helper
    raise ConnectionResetError('Connection lost')

It would be cool to be able to train the model.

I’ve mixed the best of both worlds:

  • One normal pipeline (openWakeWord, Whisper, Piper) in a docker stack with the base model, which I use with proximity microphones (smartphone, ESP32+INMP). It has a lot of trouble with the word “éteind” (“turn off”), but if I say “arrête” (“stop”) it’s OK, and other words don’t give me much trouble. Latency is about 1 second (LXC with 2 cores and 4 GB of RAM on a i4490 CPU).
  • One Rhasspy docker with an MQTT broker, connected to Home Assistant for the room microphones, which handles them in less than 1 second.

So I’ve adapted the system depending on the mic. If I only wanted one system, it would be Whisper with the small model, which needed 7 seconds to work.
I would like to have only one system to manage, but as you said, Whisper needs more improvement to be usable in French.
I’ll wait, and in the meantime I’m keeping an eye on Whisper JAX.


The best case for accuracy in French is the original Whisper implementation from OpenAI. Unfortunately, even in that best case, with the large model and no quantization, French has a relatively high error rate.

Note this is with large. The small, medium, etc. models are going to be much worse. Even large with faster-whisper is going to be worse, because the model is quantized, which reduces accuracy further.

Whisper, regardless of framework, isn’t going to beat the error rates OpenAI published, and again, those are for the large model, which is unacceptably slow on any CPU.

I have a server with a GTX 970. How can I make a Whisper installation on that server integrate with Home Assistant? Can you help me with that? @kristiankielhofner

The GTX 970 is just one generation behind what’s typically used. The GTX 970 is Maxwell and most software support targets Pascal (1xxx) and beyond.

I don’t have any Maxwell devices around anymore but with Willow we have had users report using them successfully because our main branch for Willow Inference Server currently targets CUDA 11. That said we are moving to a hard requirement for CUDA 12 (Pascal and up).

I know there are some efforts in HA, etc to get faster-whisper running smoothly on GPU (it supports it natively so it’s not that hard). The good news for you is faster-whisper out of the box targets CUDA 11 which is supported on Maxwell. For HA faster-whisper it’s “just” a matter of getting the Nvidia drivers running and passing the GPU to the docker container.
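
For a standalone container, that’s roughly the sketch below. It assumes the NVIDIA Container Toolkit is installed on the host and that the image actually bundles the CUDA/cuDNN libraries (the stock image may not, in which case a custom build is needed); the --device flag is forwarded to faster-whisper and its support may vary by version:

```shell
# Sketch: pass the GPU through to a standalone wyoming-whisper container.
# Requires the NVIDIA Container Toolkit on the host for --gpus to work.
docker run -d --name whisper-gpu \
  --gpus all \
  -p 10300:10300 \
  -v whisper-data:/data \
  rhasspy/wyoming-whisper \
  --model small --language fr --device cuda
```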

I actually have a GTX 960 lying around, so that might be an option.
Question is: is it worth it? Would it be 10x faster or more?

I’ve spent hours trying to get my old GTX 660 (Kepler) to run Whisper in a docker environment, but I finally gave up: of all the CUDA and cuDNN versions I tried, the only cuDNN compatible with my old GPU was 7.6, and the faster-whisper docker image with CUDA uses libraries from cuDNN 8. So, to be sure, test your GPU with at least cuDNN 8 (and CUDA 11).
