My Journey to a reliable and enjoyable locally hosted voice assistant

crzynik · January 28, 2026, 1:54pm

Avoid Prompt Bloat - rebuilding my system prompt

I built my prompt over months by adding functionality, finding undesirable behavior, and then making an adjustment to fix each one. This got me the behavior I wanted, but at the cost of bloat and slower responses (though the slowdown is less meaningful on my 3090).

At its largest, my prompt was about 4300 tokens and had considerable redundancy. I set out to improve it by establishing a stronger identity and core behavior at the beginning, with the goal of getting closer to the desired output by default without needing to spell everything out with examples. It does still require some specific examples, though, which seems to be the norm for smaller models like the 30B model I am currently running.
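For anyone wanting to hunt down this kind of redundancy in their own prompt, here is a minimal sketch of one way to flag near-duplicate instruction lines. This is not the method I used (I did it by hand). The function name `find_redundant_lines`, the 0.8 similarity threshold, and the sample prompt lines are all assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_redundant_lines(prompt: str, threshold: float = 0.8):
    """Return pairs of non-empty prompt lines that look like near-duplicates.

    Hypothetical helper: the 0.8 threshold is an arbitrary starting point,
    not a tuned value.
    """
    lines = [line.strip() for line in prompt.splitlines() if line.strip()]
    pairs = []
    for a, b in combinations(lines, 2):
        # Character-level similarity between 0.0 and 1.0, ignoring case.
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


# Made-up example prompt, not my real one.
sample_prompt = """
Keep responses short and succinct.
Never mention entity IDs.
Keep responses short and concise.
"""

for first, second, score in find_redundant_lines(sample_prompt):
    print(f"{score}: {first!r} ~ {second!r}")
```

Character similarity only catches lines that are worded almost the same. Instructions that repeat the same idea in different words still need a manual read-through.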

My goal was to rewrite the prompt to be more efficient while maintaining all of the same functionality and desired behavior. I think I might be pickier than most, but I want most responses to be succinct, without any unnecessary speech that does not carry the desired information.

I was able to reduce my prompt from ~4300 tokens to ~1300 tokens while maintaining, and in some cases improving, the desired behavior and consistency.

This has also reduced the response time on my 3090 from an average of 2 seconds down to an average of 1 second.
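To put those numbers in relative terms, here is a quick sketch of the arithmetic. It uses only the approximate figures quoted above; the helper name `percent_reduction` is just for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Approximate figures from this post.
token_cut = percent_reduction(4300, 1300)   # prompt size in tokens
latency_cut = percent_reduction(2.0, 1.0)   # average response time in seconds

print(f"Prompt size reduced by about {token_cut:.0f}%")
print(f"Response time reduced by about {latency_cut:.0f}%")
```

So roughly a 70% smaller prompt gave roughly a 50% faster response on my setup. The response time does not drop by the same share as the prompt size, presumably because part of each response is spent on things other than processing the system prompt.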