VPE with cloud AI

OpenAI and also many other LLM companies aren’t optimized for low latency and fast time-to-first-token responses.

Here’s what I use and Nathan referred to.
Take a look here for some comparison response times to gpt-4.1-mini:

The best combination of a really fast provider, with a good tool calling model that has a large context size AND is affordable I could find is:

Model: gtp-oss-120b
Provider/LLM-hoster: http://groq.com

It provides an Open-AI compatible API endpoint and can therefore be used with this integration:

Easy to setup, as the configuration of this LLM integration is based on the official HA OpenAI integration, just with a few additional fields.
In Groq you need to register an account, add a payment method and create an API key.

It’s billed by usage afterwards (and you can set limits), so you don’t need to pay a larger amount pre-paid to test.