New assist with LLM combination, lot of tokens?

So I tried the new combination of Home Assistant and ChatGPT to control the house, and I see a lot of tokens being used. I only make simple requests like in the YouTube presentation, e.g. “it is too dark, can you raise the cover in the living room” or “set living cover to 70 %”. On average, the token count in the API usage goes up by several thousand after every request. So far I have made 36 API requests with gpt-4o and used 339,000 context tokens and 1,800 generated tokens. Is that usage normal? I have around 230 entities exposed to Assist; does the API send everything to ChatGPT on each call? With that token usage, my monthly limit of 20$ will be gone after a short time, as we use our voice assistants a lot for setting cover positions, toggling lights/TVs and other stuff.
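To put those totals in per-request terms, here is a quick back-of-the-envelope calculation. It uses only the figures quoted above; the function name is just for illustration.

```python
# Totals reported in the post above.
requests = 36
context_tokens = 339_000
generated_tokens = 1_800


def average_per_request(total_tokens: int, request_count: int) -> float:
    """Return the mean number of tokens per API request."""
    return total_tokens / request_count


avg_context = average_per_request(context_tokens, requests)
avg_generated = average_per_request(generated_tokens, requests)

print(f"context tokens per request:   {avg_context:.0f}")
print(f"generated tokens per request: {avg_generated:.0f}")
```

So almost all of the usage is context (input) that is sent with each request, not the text the model writes back.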


I have the same question.
I have 16 requests and already 200,000 generated tokens. I only turned lights off and on and asked a simple question.
That could be really expensive if used with an Atom Echo (which is what I was actually planning to do).

Look for an integration called Fallback Conversation. It will try to handle the request locally before reaching out to the LLM.

Same here. I have been using ChatGPT for a while now to generate nice weather reports for us in the morning, at almost no cost.

Having changed to the default settings as advised in the blog post, I notice it’s now using gpt-4o. After a couple of hours of playing I have already racked up a couple of dollars in costs.

From what I can see, gpt 3.5, which is what I was using, is a lot cheaper, and Gemini 1.5 is cheaper still. However, it appears that the only model that supports caching (as in, remembers the recent conversation or intent) is the costly 4o model.

So for the moment, although tempted a little by the caching of 4o, I have reverted to gpt 3.5 due to costs.


Same problem here. I set gpt to recommended settings.

I gave some simple commands and asked some simple questions with really short answers from gpt and I have now:

13 API requests
130,000 tokens
totalling 0.66$

The OpenAI website also shows me that gpt-4o was used.

If you switch to 3.5 turbo, it’s the same amount of tokens, but much cheaper. And it still works great on the commands I have tried (just simple heat and light commands).


Sure, 3.5 is ten times cheaper. But that’s not the point. The question is why it’s using so many tokens for simple tasks such as switching lights.
Some videos show gpt 3.5 being used for less than 1 cent a month, and gpt-4o for about 10 cents a month.
Same problem here, btw.

OK, the documentation of the OpenAI integration says that gpt 3.5 turbo is used, not gpt-4o, when using the recommended settings.

I have now switched to gpt 3.5 turbo and yeah, it still uses 10,000 tokens for turning on a light.

Even at 1/10 of the price it’s still 0.6 cents per command. Still too expensive.
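As a rough check of that per-command figure, here is a small calculation based only on numbers reported in this thread: 130,000 tokens costing 0.66$ on gpt-4o, and the "ten times cheaper" estimate for 3.5. The 10x ratio is an assumption taken from the posts, not from an official price list.

```python
def cost_per_token(total_cost_usd: float, total_tokens: int) -> float:
    """Effective price per token derived from a usage report."""
    return total_cost_usd / total_tokens


# Usage reported earlier in the thread for gpt-4o.
gpt4o_rate = cost_per_token(0.66, 130_000)

# Assumption from the thread: 3.5 turbo is about ten times cheaper.
gpt35_rate = gpt4o_rate / 10

tokens_per_command = 10_000
cost_cents = tokens_per_command * gpt35_rate * 100

print(f"approx. cost per command on 3.5 turbo: {cost_cents:.2f} cents")
```

This lands at roughly half a cent per command, the same ballpark as the figure above.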

The openai website says that 1000 tokens are 750 words…

But don’t get me wrong. I know it’s a beta feature.

I am just interested in how the calculation of the tokens works.


It looks like it depends on how many entities are exposed to Assist. I think it is sending all of the entity data to ChatGPT. For testing, I removed a lot of exposed entities and the token count went down. It is still a lot, but not as many as before, so I think it is related to the exposed entities.


Oh OK, I have 242 entities exposed to Assist. I will try removing some. Thanks for the hint!

I would really like to see back-end data for what is being sent in relation to the entities.

If I use 4o for conversation.process (which I believe does not send exposed entity data), it’s considerably cheaper to use, and for my small number of automations and scripts that use it, it’s nothing to worry about.

The real costs come when HA sends the entity data via Assist.

Does anyone know how we can intercept what data is sent?
Does it send the entity state and all attributes, last triggered, etc.? That would soon add up, I guess. From what I read, 1 token is equal to approx. 4 characters.
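To get a feel for how that adds up, here is a minimal sketch using the 4-characters-per-token rule of thumb. The per-entity line format (id, name, state, aliases) and the example entity are assumptions for illustration only; what HA actually sends may look quite different, and a real tokenizer will give different counts.

```python
CHARS_PER_TOKEN = 4  # rule of thumb quoted above, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate token count as characters / 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def entity_line(entity_id: str, name: str, state: str, aliases: list[str]) -> str:
    """Hypothetical compact serialisation of one entity."""
    return f"{entity_id},{name},{state},{'/'.join(aliases)}"


# Hypothetical example entity.
line = entity_line("cover.living_room", "Living Room Cover", "open", ["blinds"])
per_entity = estimate_tokens(line)

# Scale up to the 230 exposed entities mentioned in the first post.
total = per_entity * 230

print(f"~{per_entity} tokens per entity, ~{total} tokens for 230 entities")
```

If attributes are included as well, each line gets much longer and the total grows accordingly.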

OK, that makes total sense, since it needs the info to control something. It would be great if it were possible to control which devices/entities are exposed to a certain assistant. Then one could expose just the entities that are supposed to be controlled via GPT-4o, and expose everything to gpt 3.5, Gemini, or HA Cloud.

I reduced my exposed entities from 340 to 100. It is still using 5,000 tokens for a simple request. I disabled all sensors. But this can’t be the way. An assistant lives on data :smile:
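For comparison, here is a rough tokens-per-entity figure from two reports in this thread. It ignores the fixed part of the prompt and the request text itself, so treat it as an upper-bound guess, not a measurement.

```python
# (label, tokens per request, exposed entities) as reported in the thread.
reports = [
    ("first post", 339_000 / 36, 230),
    ("post above", 5_000, 100),
]

tokens_per_entity = {
    label: tokens / entities for label, tokens, entities in reports
}

for label, value in tokens_per_entity.items():
    print(f"{label}: ~{value:.0f} tokens per exposed entity")
```

Both come out at a few dozen tokens per entity, which fits the idea that usage scales with the number of exposed entities.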

I think the problem is possibly due to the way Home Assistant is serialising the entity data in order to provide it to the LLM. Is it possible to view the data sent by HA and see if we could optimise the mapping template used?

I haven’t played around with the new settings, so please bear with me if this is totally off. :slight_smile:

To me it sounds like all of your requests are sent to ChatGPT, whereas I thought only the ones that HA can’t handle itself are sent?

My understanding is (as I said, haven’t played around, so can’t say for sure):

  • turn the livingroom lights on => handled by HA
  • open the blinds in the office and turn the light on => not clear enough, gets sent to ChatGPT

Is that what’s happening for you, or is all data sent to ChatGPT, even if HA could handle it by itself? That would explain why there is such a high consumption of tokens.

The requests are sent to whatever conversation agent you have set in the assist settings based on what assistant you sent the request to. As far as I am aware there is no option (as yet) to have HA calculate what conversation agent to use. I believe there may be a HACS add-on similar to this but haven’t checked it out.

You can obviously set up as many conversation agents as you want and even use the same model but have each one with different settings.

I have been using ChatGPT for a while now and never had any real costs involved (pennies only), but the conversation agent I have set up with “control Home Assistant” set to yes appears to send lots of data to the AI each and every time I use it. It appears that the more entities you have exposed, the greater the cost.

If you use an AI conversation agent, then every time you make a request, all information (such as entities) is sent to that agent (OpenAI etc.).
More entities, more tokens used.

There is a HACS integration called Fallback Agent.
There you can set 2 conversation agents. In my case I set HA first and OpenAI second.

Now all requests go to the HA agent first, and only when it can’t handle them do they go to OpenAI.
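The idea can be sketched roughly like this. This is not the integration's actual code, just an illustration of the fallback pattern; the two agent functions are hypothetical stand-ins for the HA agent and OpenAI.

```python
from typing import Callable, Optional, Sequence

# An agent returns a response, or None if it cannot handle the request.
Agent = Callable[[str], Optional[str]]


def fallback_converse(text: str, agents: Sequence[Agent]) -> Optional[str]:
    """Try each agent in order and return the first response."""
    for agent in agents:
        response = agent(text)
        if response is not None:
            return response
    return None


def local_agent(text: str) -> Optional[str]:
    """Hypothetical stand-in for the built-in HA agent (no tokens used)."""
    known = {"turn on the living room lights": "Turned on the lights"}
    return known.get(text.lower())


llm_calls: list[str] = []


def llm_agent(text: str) -> Optional[str]:
    """Hypothetical stand-in for OpenAI; records each (costly) call."""
    llm_calls.append(text)
    return f"(LLM) handled: {text}"


print(fallback_converse("Turn on the living room lights", [local_agent, llm_agent]))
print(fallback_converse("Is it too dark in here?", [local_agent, llm_agent]))
print(f"requests that reached the LLM: {len(llm_calls)}")
```

Only the second request reaches the LLM, so simple commands cost nothing.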

That’s the one I was thinking of. I’m interested to know whether it adds much latency when a request is passed on to OpenAI?

Try it out: GitHub - m50/ha-fallback-conversation: HomeAssistant Assist Fallback Conversation Agent

And in the OpenAI prompt I use:

Available Devices:

{% for entity in exposed_entities -%}
{{ entity.entity_id }},{{ }},{{ entity.state }},{{entity.aliases | join('/')}}
{% endfor -%}

So only the exposed entities are sent.

For STT I recommend the Vosk add-on with a large model (not the default one that the Vosk add-on uses). It makes STT nearly instant with good detection (better than Whisper STT).