I wrote in another thread that I noticed HUGE differences while testing different models and providers with the OpenRouter integration.
Especially compared to the popular OpenAI models like gpt-??-mini.
My main model so far was gpt-4.1-mini, which has now been replaced with gpt-oss-120b hosted by Groq (a cloud model hosting platform, not to be confused with Grok, the LLM from xAI).
They are not the cheapest provider, but the one with the lowest latency that was shown on OpenRouter.
Their gpt-oss-120b pricing is still less than half the price of gpt-4.1-mini from OpenAI.
Sadly, OpenRouter (or the HA integration?) seems to have a lot of problems.
Regardless of the model or provider, I often get API errors or bad response errors in the conversation.
It looks like this isn’t just me; there are plenty of other reports here on the forums and on GitHub.
So I found two integrations that can be used:
- One that is designed especially for the cloud provider that I mentioned above: GitHub - HunorLaczko/ha-groq-cloud-api: HACS custom integration for using Groq Cloud API in the Assist pipeline, reducing the workload on the Home Assistant server.
- And another one that was made for every OpenAI-compatible API endpoint and works flawlessly with cloud endpoints too: GitHub - skye-harris/hass_local_openai_llm: Home Assistant LLM integration for local OpenAI-compatible services (llamacpp, vllm, etc)
This one seems to be more up to date with the latest HA changes.
How fast is it?
First, I tried a simple question where the answer is in the training data, so the models didn’t need any tool calls: What’s the height of the Eiffel Tower?
- gpt-4.1-mini feels snappy here, with a response time of about 2 seconds.
- gpt-oss-120b on Groq took under 0.5 seconds in comparison.
That’s not the real reason to switch, right?
But let’s take a look at a question with tool calls.
The next try was What will the outdoor temperature be tomorrow between 8am and 9am?, which uses the Weather LLM script provided by TheFes.
- gpt-4.1-mini needed 7 seconds to get the answer and didn’t feel snappy at all anymore.
- gpt-oss-120b on Groq took less than 2 seconds in comparison.
Fun part: gpt-oss-120b even used one more tool call than gpt-4.1-mini, to verify which date is tomorrow, and was still way faster.
And lastly, a more complex example:
How much solar energy did we export this september compared to last september. Just give me the kwh for both months and calculate the difference. Also tell me the mean outdoor temperature for the garden thermometer for both months.
Both models were successful and replied with the same, correct values.
- gpt-4.1-mini took about 35 seconds to complete.
- gpt-oss-120b needed only about 7.5 seconds in comparison.
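If you want to sanity-check the raw latency outside of Home Assistant, both Groq and OpenAI expose the same OpenAI-compatible /chat/completions endpoint, so a stopwatch around a plain HTTP request is enough. This is a minimal sketch using only the Python standard library; the Groq base URL and model name in the example are assumptions, so check your provider’s docs and use your own API key:

```python
import json
import time
import urllib.request


def build_chat_request(base_url, api_key, model, question):
    """Build an OpenAI-compatible /chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def timed_ask(base_url, api_key, model, question):
    """Send the question and return (answer, seconds elapsed)."""
    req = build_chat_request(base_url, api_key, model, question)
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    return body["choices"][0]["message"]["content"], elapsed


if __name__ == "__main__":
    # Base URL and model id are assumptions; verify them against Groq's docs.
    answer, secs = timed_ask(
        "https://api.groq.com/openai/v1",
        "YOUR_API_KEY",
        "openai/gpt-oss-120b",
        "What's the height of the Eiffel Tower?",
    )
    print(f"{secs:.2f}s: {answer}")
```

Note that this only measures a single round trip without any HA tool calls, so it reflects the simple-question numbers above rather than the multi-step examples.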
A few words about the hass_local_openai_llm integration linked above:
It seems to keep pace with the latest HA development.
You can create separate entries for assistants or AI tasks.
It also lets you modify the prompt and supports STT and TTS streaming.
Feel free to share your own experience in this thread.