OpenAI and also many other LLM companies aren’t optimized for low latency and fast time-to-first-token responses.
Here’s what I use and Nathan referred to.
Take a look here for some comparison response times to gpt-4.1-mini:
The best combination of a really fast provider, with a good tool calling model that has a large context size AND is affordable I could find is:
Model: gtp-oss-120b
Provider/LLM-hoster: http://groq.com
It provides an Open-AI compatible API endpoint and can therefore be used with this integration:
Easy to setup, as the configuration of this LLM integration is based on the official HA OpenAI integration, just with a few additional fields.
In Groq you need to register an account, add a payment method and create an API key.
It’s billed by usage afterwards (and you can set limits), so you don’t need to pay a larger amount pre-paid to test.