Azure AI Foundry conversation

I would like to share a new conversation agent integration, Azure OpenAI SDK Conversation.

Initially, it comes with the following features:

  • Azure AI Foundry Integration: Natively connects to Azure AI Foundry endpoints, using the latest API versions.
  • Stateful Conversations: Includes a unique “MCP (Master Control Program)” server that maintains conversation context, allowing for more natural and
    intelligent follow-up interactions.
  • Robust Tool Calling: The agent can reliably call Home Assistant services to control your devices, with flexible targeting by entity, device, or area.
  • Local Intent Handling: For fast, offline execution of simple commands (like turning lights on/off) without needing to call the Azure API.
  • Vocabulary Normalization: Understands a wider range of commands by normalizing synonyms (e.g., “deactivate” becomes “turn off”).
  • Advanced Logging & Debugging: Detailed logging options for requests, responses, and system prompts to help with troubleshooting.
  • Dynamic Prompts with Jinja2: Use Jinja2 templates in your system prompt to provide the model with real-time Home Assistant entity states.
  • Optional Web Search: Can be configured to use Bing to fetch real-time information from the web.

More info can be found in the README.md.

It would be great if you could check it out with your preferred Azure AI Foundry models. You can get a free trial to test it. I'm looking forward to your feedback, especially regarding conversation quality, model compatibility, and any feature suggestions you may have.

How is this different from the official OpenAI Conversation integration?
This component was written from scratch to be Azure-native and to introduce a more advanced, stateful architecture. While the official integration
provides a direct, stateless connection to OpenAI, this one uses a stateful middleware (the MCP server) to enable more complex conversational memory
and features not found in the core integration.

Please provide more details on how you’re using MCP and the difference between the core integration.

Why would one want this instead? (Why the MCP route, and EXACTLY HOW does it work?) Because honestly, a hacked OpenAI integration that points at Azure OAI endpoints works just fine if you have your tools set up (see Friday's Party for what I mean).

Note, I'm not against MCP; in fact, I'm a huge fan. But I'm having trouble seeing what this buys me, and I'd hate for someone to add complexity they don't need. (I've already got multistage RAG and tools.) Also, HA has an MCP client built in, so if it's just MCP over SSE I need, I'll use that?

What problem are you specifically solving?

Hi Nathan,
first things first: this is my first vibe-coding project. As an old programmer, I'm more than happy to use LLMs for rapid prototyping. I spent some hours chatting with an LLM about your questions, which was actually really great! After a few answers claiming that my MCP is doing magic, I ended up with the following response, which honestly depicts my goals… LLM content follows :wink:
Thank you for the excellent, detailed questions. You’ve hit on some key points, and it’s clear the README needs to better explain the project’s
specific goals and current state, as this is very much a work in progress.

The core problem this integration aims to solve, especially in its current phase, is the lack of visibility and data in the conversation pipeline.
Before we can build a truly efficient and intelligent agent, we first need to understand what’s actually happening.

The Primary Goal: Introspection and Data-Driven Optimization

Right now, the main feature of this component is to act as a power-user tool for introspection. Its primary goal is to make it easier to know what’s
going on under the hood by:

  1. Logging Every Request and Response: It creates a detailed log of the exact payloads sent to and received from the LLM. This provides a clear dataset
    of your interactions.
  2. Integrating and Logging the Local Intent Handler: The component logs when a command is successfully handled by Home Assistant’s local intent system.
    This is crucial because it tells us which commands didn’t need the LLM at all.
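To make the idea concrete, the kind of per-turn record such logging could produce might look like the sketch below. The field names are hypothetical, not the component's actual log format:

```python
import json
import time

# Hypothetical per-turn log record; field names are illustrative only.
def log_turn(user_text, handled_locally, request_payload=None, response_payload=None):
    record = {
        "ts": time.time(),
        "user_text": user_text,
        "handled_locally": handled_locally,  # True => the LLM was never called
        "llm_request": request_payload,      # exact payload sent, if any
        "llm_response": response_payload,    # exact payload received, if any
    }
    return json.dumps(record)

entry = json.loads(log_turn("turn off the kitchen light", handled_locally=True))
print(entry["handled_locally"])  # tells us which commands never reached the LLM
```

Replaying these records is what would later let the component rank entities by how often they actually appear in voice commands.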

Having lots of entities makes it nearly impossible to know which ones are frequently used in voice commands, which are barely used, and which are
never used at all. By analyzing the logs this component produces, the long-term goal is to intelligently and automatically filter the context to get
smaller payloads. The current implementation is the data-collection phase for that future optimization.

The Future Vision: What This Buys You

The data we’re collecting will power several advanced features that a standard integration can’t offer:

  • Data-Driven Context Filtering: Once we know which entities you actually talk to, we can stop sending the state of hundreds of irrelevant ones, leading
    to the massive token savings we’ve been discussing.
  • Dynamic Vocabulary and STT Correction: The component is designed to build a vocabulary of your specific, successfully used entity names. This has two
    benefits:
    1. It can heighten the priority for local intent recognition, making direct commands faster and more reliable.
    2. It will help address Speech-to-Text (STT) errors. By knowing that light.kitchen_light is a valid and frequently used entity, the system can learn
      to favor “kitchen” over a misheard “chicken,” ensuring your commands are understood correctly.
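The "chicken" → "kitchen" idea can be sketched with simple fuzzy matching against the learned vocabulary. The names and the cutoff below are illustrative assumptions, not the component's implementation:

```python
from difflib import get_close_matches

# Hypothetical vocabulary of frequently used entity names learned from the logs.
KNOWN_NAMES = ["kitchen", "living room", "bedroom", "hallway"]

def correct_token(token, vocabulary=KNOWN_NAMES, cutoff=0.5):
    """Snap a possibly misheard word onto the closest known entity name."""
    matches = get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token

print(correct_token("chicken"))  # -> "kitchen"
```

A real implementation would likely weight candidates by usage frequency as well, so a common entity name wins over a rare one at equal edit distance.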

Clarifying the Other Features

  • MCP and Deltas: the current MCP implementation sends only state deltas on subsequent turns, which
    sacrifices conversational memory for token efficiency. This is a pragmatic, temporary trade-off. The long-term vision is to replace this simple delta
    system with the intelligent, data-driven context filtering described above, powered by the insights gathered from the logs.
  • Early Wait: This feature is another part of the focus on user experience. For commands that don’t require a detailed response from the LLM (like most
    “turn on/off” actions), it allows the UI to respond instantly while the agent completes its work in the background, making the system feel much
    faster.
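The delta mechanism described above amounts to diffing the entity-state snapshot between turns. A minimal sketch, with made-up entity IDs:

```python
# Send only entities whose state changed since the previous turn.
def state_delta(previous, current):
    """Return only the entries of `current` that are new or changed."""
    return {
        entity_id: state
        for entity_id, state in current.items()
        if previous.get(entity_id) != state
    }

turn1 = {"light.kitchen": "off", "sensor.temp": "21.5"}
turn2 = {"light.kitchen": "on", "sensor.temp": "21.5"}
print(state_delta(turn1, turn2))  # only light.kitchen changed
```

The trade-off mentioned above is visible here: the unchanged `sensor.temp` never reappears in later turns, so the model loses it unless it stays in conversational memory.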

So, to answer “Why would one want this instead?”: you would choose this component if you are a power user who wants deep insight into your agent’s
operations and wants to be part of building a truly intelligent, context-aware agent that learns from your specific usage patterns. We are currently
building the framework to make that possible, and feedback from users like you is critical to guide the next steps.

Respectfully, I'd suggest considering a different approach… I'd offer that filtering the entities on its own does not provide any value. Summarization yields much better results.

Yes, trimming entities shrinks the context set, but smaller context isn't the answer. In fact, richer, denser context is… because the LLM itself can tell me what it needs. Yes, we need better context-management tools, but "not used" probably isn't the measure I'd consider… I do like the logging angle, though.

Here are my random thoughts.

And what happens when you go for rich, not small?

Hi Nathan,
I’ll use your idea of maximizing data sent to LLM. Let me share the next development phases.

Phase 1: Complete Context Management

My next implementation will focus on maintaining full conversation context with:

Dynamic System Prompt via Jinja2

System prompts are generated using Jinja2 templates, enabling:

  • Dynamic generation based on user context, available entities, and room-specific data
  • Conditional logic directly in templates
  • Template modularity and reusability
  • Integration with validation frameworks like Pydantic

This approach keeps every interaction contextually relevant without manual prompt engineering. The context will be regenerated on every request, with no delta calculation.
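For illustration, a system-prompt template along these lines might look as follows, assuming the component exposes Home Assistant's standard template variables such as `states` and `now()`:

```jinja2
You are a voice assistant for this home.
Current light states:
{%- for s in states.light %}
- {{ s.entity_id }} is {{ s.state }}
{%- endfor %}
The time is {{ now().strftime("%H:%M") }}.
```

Because the template is re-rendered on every request, the model always sees fresh states rather than a stale snapshot from the start of the conversation.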

Configurable Sliding Window

I’ll implement a configurable sliding window for managing request/response history:

  • Recent messages always available to the LLM
  • Efficient memory management with token limits
  • User-adjustable window size based on needs
  • Let the user reset the context
  • Tagged context for different purposes

The sliding window pattern strikes a practical balance between context preservation and resource constraints.
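A minimal sketch of such a window, assuming a rough characters-per-token estimate (the window size and token budget below are illustrative, not the component's defaults):

```python
from collections import deque

def estimate_tokens(text):
    # Crude assumption: ~4 characters per token.
    return max(1, len(text) // 4)

class SlidingWindow:
    def __init__(self, max_messages=10, max_tokens=500):
        self.messages = deque(maxlen=max_messages)  # oldest turns fall off
        self.max_tokens = max_tokens

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def reset(self):
        """User-triggered context reset."""
        self.messages.clear()

    def window(self):
        """Most recent messages that fit the token budget, oldest first."""
        selected, used = [], 0
        for msg in reversed(self.messages):  # walk back from the newest
            cost = estimate_tokens(msg["content"])
            if used + cost > self.max_tokens:
                break
            selected.append(msg)
            used += cost
        return list(reversed(selected))

w = SlidingWindow(max_messages=3)
for i in range(5):
    w.add("user", f"message {i}")
print([m["content"] for m in w.window()])  # only the 3 newest survive
```

The two limits are independent on purpose: `max_messages` caps memory use unconditionally, while `max_tokens` trims further when individual turns are long.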
I'll do some research on content disposition for better LLM handling.
To keep a LangGraph migration ready:

  • Keep agent logic in isolated modules (not tightly coupled to HA state management)
  • Use typed state objects for context (easier to port to LangGraph StateDict later)
  • Implement logging hooks for introspection (LangGraph has built-in observability)

Phase 2: Usage-Based Entity Filtering

After initial data collection, I'll implement entity filtering based on actual usage patterns:

Data-Driven Analysis

The system will track:

  • Frequently used entities: always in context
  • Rarely used entities: included conditionally
  • Unused entities: automatically excluded

This directly reduces payload size sent to the LLM, improving response times and reducing token costs.
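The three usage tiers could be derived from the collected counts roughly like this (the thresholds and entity IDs are made up for the example):

```python
from collections import Counter

# Hypothetical usage counts aggregated from the request logs.
usage = Counter({
    "light.kitchen": 42,   # frequently used -> always in context
    "cover.garage": 3,     # rarely used     -> conditionally included
    "sensor.attic": 0,     # never used      -> excluded
})

def select_entities(usage, always_at=10, conditional_at=1):
    """Split entities into always-included and conditionally-included tiers."""
    always = [e for e, n in usage.items() if n >= always_at]
    conditional = [e for e, n in usage.items() if conditional_at <= n < always_at]
    return always, conditional

always, conditional = select_entities(usage)
print(always)       # ['light.kitchen']
print(conditional)  # ['cover.garage']
```

Anything below `conditional_at` simply never makes it into the payload, which is where the token savings come from.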

Intelligent Filtering Strategies

  • Area-based: prioritize entities in frequently used rooms
  • Integration-based: prefer frequently controlled integrations
  • Temporal patterns: consider daily/weekly usage rhythms

Phase 3: Automatic Summarization

When the payload/token count reaches a configurable threshold, I'll implement automatic historical summarization:

Mechanism

  • Automatic trigger: when approaching the token limit
  • Contextual summarization:
    • LLM-generated summaries that preserve critical information
    • local summarization?
  • User-transparent: completely invisible to end users
  • Applies to user prompts/LLM responses only (the Jinja2 context is always fresh)
  • Hybrid strategy:
    • Recent messages: full fidelity
    • Mid-range history: condensed
    • Older context: heavily summarized or removed

This maintains conversational coherence across long sessions while reducing API costs by 80-90%.
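The hybrid strategy can be sketched as follows; `summarize` is a placeholder for whatever LLM (or local) summarizer ends up being used, and the tier sizes are illustrative:

```python
def summarize(messages):
    # Placeholder: a real implementation would ask an LLM for a summary here.
    return {"role": "system", "content": f"[summary of {len(messages)} older messages]"}

def compact_history(messages, recent=4, midrange=4):
    cut_recent = max(0, len(messages) - recent)   # start of the full-fidelity tail
    cut_mid = max(0, cut_recent - midrange)       # start of the condensed mid-range
    old = messages[:cut_mid]
    mid = messages[cut_mid:cut_recent]
    new = messages[cut_recent:]
    compacted = []
    if old:
        compacted.append(summarize(old))          # oldest: heavily summarized
    if mid:
        compacted.append(summarize(mid))          # mid-range: condensed
    compacted.extend(new)                         # recent: full fidelity
    return compacted

history = [{"role": "user", "content": f"turn {i}"} for i in range(12)]
print(len(compact_history(history)))  # 2 summaries + the 4 most recent turns -> 6
```

In practice the trigger would be the estimated token count rather than the message count, but the tiering logic stays the same.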

Implementation Benefits

For Power Users

  • Complete visibility into interaction data
  • Full configurability per use case
  • Data-driven optimization decisions

For System Intelligence

  • Learns user-specific entity relevance
  • Adapts to individual usage patterns
  • Self-optimizes over time

Development Roadmap

  1. Phase 1 (immediate): Full context + Jinja2 + Sliding Window
  2. Phase 2 (post-data collection): Usage-based entity filtering
  3. Phase 3 (advanced optimization): Automatic summarization with token management

This incremental approach validates each component before adding complexity.

Looking forward to your thoughts.

Best,
Carlo