My Journey to a reliable and enjoyable locally hosted voice assistant

I linked your LLM integration repo somewhere on this forum a few weeks ago (as I use it myself),
and already wondered why there's no official thread from you about this integration.

So welcome and thank you. :slightly_smiling_face:


Glad it’s working well for you :slight_smile:

I have noticed an odd issue with it since the 2026.1 update, which seems to break the llama.cpp backend (the only one tested so far) when using voice input with Assist tooling enabled… I'm not sure if anyone else has encountered it so far (I know a lot of people don't update immediately), but I've identified what's changed in the tooling definitions and will put out a fix release tomorrow :slight_smile:


Fantastic! I was a bit hesitant to put that into the bugfix release there as well, in case there was some issue with some model or server combination I hadn’t tested… but the testing I had done myself on it seemed promising enough that it was worth taking the risk on.

Because the date and time are injected at the end of the conversation, it should also mean your LLM generations stay fast, as we aren't breaking the cache early on to maintain it! Compare the before-and-after response times of asking "what is the time" :slight_smile:
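A minimal sketch of that idea (assumed structure for illustration, not the integration's actual code): keep the static system prompt first, and append the dynamic date/time after the conversation, so the cached prefix tokens stay identical between requests.

```python
import datetime

# Static prefix: identical on every request, so llama.cpp can reuse its KV cache.
STATIC_SYSTEM_PROMPT = "You are a helpful home voice assistant."

def build_messages(user_text: str) -> list[dict]:
    # The date/time goes at the END of the message list. If it were injected
    # into the system prompt at the top, every request would start with
    # different tokens and the prefix cache would be invalidated each turn.
    now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
        {"role": "system", "content": f"Current date and time: {now}"},
    ]

msgs = build_messages("what is the time")
# Only the tail of the prompt changes across turns; the head stays cacheable.
```

The trade-off is that the model sees the timestamp last rather than up front, but for "what is the time" style queries that is exactly where it is cheapest to put.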

Thanks :slight_smile: I shared it on reddit when first released and in a few comments there since, but I'm otherwise the type of introvert that doesn't maintain much of an online presence, so that's my excuse for why there's no official thread :slight_smile:


Kinda trivial question, but in your prompt sometimes you use # and sometimes you use ## for comments. Does it actually care?

That is the typical hierarchy for Markdown, making it a nested header. I'm not sure what difference it makes for the LLM, but I asked Gemini and it says that consistent hierarchy helps with its self-attention, and the nested hierarchy helps it understand that things are related.

Even if it doesn't help the LLM, though, it makes it much easier to see and manage the hierarchy of the prompt, which is a benefit of its own.
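A toy example of what that looks like (an invented prompt, not the integration's actual one): `#` marks a top-level section and `##` marks subsections nested under it, so the outline is visible at a glance when editing.

```python
# Hypothetical prompt showing Markdown heading hierarchy: "##" sections
# are nested under the single "#" section above them.
PROMPT = """\
# Instructions
You are a voice assistant for Home Assistant.

## Tools
Call tools only when needed.

## Style
Answer in one short sentence.
"""

# Picking the sections apart by heading level is trivial for a human or a script.
top_level = [line for line in PROMPT.splitlines() if line.startswith("# ")]
nested = [line for line in PROMPT.splitlines() if line.startswith("## ")]
```

Whether or not the model's attention benefits, the two heading levels make the prompt's structure machine-checkable and easy to reorganize.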


LLMs do not differentiate. They read it all. As long as it's well structured, it'll figure it out. You can often use comments in code to instruct the LLM while machine-readable parsers ignore that text.
