This is an earlier architecture for the Park Whisperer conversational agent, built on the Azure OpenAI Assistants API: a stateful, persistent assistant model with managed threads and runs, distinct from the chat-completions-with-tools approach used in production. It exposes 16 function tools backed by GPT-4o-mini, with vector search over a Cosmos DB knowledge base, all configured as Infrastructure as Code and updated via a Python deployment script.
The Assistants API uses a fundamentally different execution model from chat completions. The assistant object is persistent and server-managed. Conversations are Threads. Each LLM invocation is a Run on a thread — which may pause mid-execution, requiring the client to submit tool results and resume.
The assistant is created and updated by update-assistant.py pushing from agent-config.json.
Run status progression: queued → in_progress → requires_action → completed.
A Run pauses at requires_action with a submit_tool_outputs action. The client executes the function calls, then submits results to resume the Run.
At completed, the client retrieves the assistant's messages from the thread. The full exchange is stored server-side in the Thread.
All agent configuration — system prompt, model selection, and all 16 tool definitions — lives in
version-controlled JSON files. update-assistant.py reads the config and pushes it to
the Azure OpenAI service, treating the assistant configuration as deployable infrastructure.
entity_id and date filters
Calls beta.assistants.update(), then snapshots the result
Prints all 16 function names before updating — visual confirmation of what's being pushed
Saves updated state to agent-assistant-config.json for drift detection
API version: 2024-05-01-preview — Assistants API was still in beta
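The deployment flow described above (read the config, print the function names, call beta.assistants.update(), save a snapshot) might look roughly like the following. This is a minimal sketch, not the actual update-assistant.py: the config keys `name`, `model`, `instructions`, and `functions` are assumptions about the JSON layout, and `client` is assumed to be an already-configured OpenAI SDK client for the Azure resource.

```python
import json
from pathlib import Path


def load_assistant_payload(config_path: str) -> dict:
    """Read the version-controlled config and build keyword arguments for the update call."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # Assumed config keys: "name", "model", "instructions", "functions".
    tools = [{"type": "function", "function": fn} for fn in config["functions"]]
    return {
        "name": config["name"],
        "model": config["model"],
        "instructions": config["instructions"],
        "tools": tools,
    }


def push_assistant(client, assistant_id: str, payload: dict, snapshot_path: str) -> dict:
    """Print the tool names, update the assistant, then snapshot the deployed state."""
    for tool in payload["tools"]:
        print(tool["function"]["name"])  # visual confirmation of what is being pushed
    assistant = client.beta.assistants.update(assistant_id, **payload)
    deployed = assistant.model_dump()  # SDK response objects are pydantic models
    Path(snapshot_path).write_text(json.dumps(deployed, indent=2), encoding="utf-8")
    return deployed
```

Keeping payload construction separate from the network call means the config-to-payload mapping can be checked without touching the service.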
agent_config.js)
| Function | Category | Router Bot Endpoint | Description |
|---|---|---|---|
| get_current_wait_times | Live | /v1/currentWaits | Real-time attraction wait times, updated every 10 minutes. Optional entity_id filter. |
| get_current_down_attractions | Live | /v1/currentDown | Attractions experiencing downtime. Critical for plan-change guidance. |
| get_park_schedule | Live | /v1/dailySchedules | Park operating hours for planning arrival and departure. |
| get_lightning_lane_availability | Lightning Lane | /v1/lightningLane | LL return time availability. Filters: queue_type (PAID_RETURN_TIME or RETURN_TIME), sold_out_only. |
| get_top_wait_times | Ranked | /v1/topTenWaits | Ranked longest waits over daily/weekly/monthly periods. |
| get_top_restaurant_waits | Ranked | /v1/topTenRestaurants | Ranked restaurant wait times for dining crowd avoidance. |
| get_sellout_events | Lightning Lane | /v1/selloutEvents | LL return time sellout events. Shows which slots are no longer available. |
| get_top_sellouts | Lightning Lane | /v1/topSellouts | Attractions ranked by sellout speed. Indicates most-in-demand LL windows. |
| get_historical_wait_patterns | History | /v1/waitSnapshots | Wait time history for a specific attraction + date. Required params: entityName, date. |
| get_downtime_summary | History | /v1/downSummary | Daily downtime summary. Multi-day queries call once per date and aggregate. |
| get_park_advisories | Live | /v1/advisories | Current park advisories, alerts, and special event announcements. No parameters. |
| get_purchase_options | Lightning Lane | /v1/purchaseOptions | Individual LL and Premier Pass pricing and availability. |
| get_park_shows | Live | /v1/parkShows | Show schedules and entertainment listings. |
| get_release_events | Lightning Lane | /v1/releaseEvents | LL release timing events — when return times become available. |
| get_park_entities | Reference | /v1/parkEntities | Park entity reference data — IDs, names, types for all attractions and parks. |
| search_park_knowledge | Vector | /v1/searchKnowledge | Cosmos DB vector search over park_knowledge container. Required param: query. Optional: type, limit (default 3). |
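
One way to route a tool call to its Router Bot endpoint is a plain lookup table keyed by function name. The sketch below copies a subset of the paths from the table above. The host name is hypothetical, and passing arguments as a GET query string is an assumption about how the Router Bot accepts parameters.

```python
from urllib.parse import urlencode

# Function name -> Router Bot path, copied from the table above (subset shown).
ENDPOINTS = {
    "get_current_wait_times": "/v1/currentWaits",
    "get_current_down_attractions": "/v1/currentDown",
    "get_lightning_lane_availability": "/v1/lightningLane",
    "get_historical_wait_patterns": "/v1/waitSnapshots",
    "search_park_knowledge": "/v1/searchKnowledge",
}

BASE_URL = "https://router-bot.example.com"  # hypothetical host


def build_tool_url(function_name: str, arguments: dict) -> str:
    """Map a tool call to a Router Bot URL, dropping unset (None) arguments."""
    path = ENDPOINTS[function_name]  # KeyError signals an unknown tool name
    params = {k: v for k, v in arguments.items() if v is not None}
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"
```

For example, `build_tool_url("get_current_wait_times", {"entity_id": "abc"})` yields a URL ending in `/v1/currentWaits?entity_id=abc`.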
The search_park_knowledge tool queries a Cosmos DB container with vector embeddings.
The agent is instructed to use this tool when guests ask for advice, strategies, tips, or recommendations —
as opposed to live operational data. Nine semantic knowledge types are defined and searchable by type filter.
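As an illustration, a function-tool definition for search_park_knowledge could be shaped like this, using the parameters listed in the table (required `query`, optional `type`, `limit` defaulting to 3). The description text and the helper `normalize_search_args` are hypothetical; the nine knowledge type names are not listed in this document, so no enum is given.

```python
SEARCH_PARK_KNOWLEDGE_TOOL = {
    "type": "function",
    "function": {
        "name": "search_park_knowledge",
        # Hypothetical wording; the real description lives in the config files.
        "description": "Search park advice, strategies, tips, and recommendations.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "type": {"type": "string"},  # one of the nine knowledge types
                "limit": {"type": "integer", "default": 3},
            },
            "required": ["query"],
        },
    },
}


def normalize_search_args(args: dict) -> dict:
    """Check required params and fill schema defaults before calling the endpoint."""
    schema = SEARCH_PARK_KNOWLEDGE_TOOL["function"]["parameters"]
    missing = [name for name in schema["required"] if name not in args]
    if missing:
        raise ValueError(f"missing required parameter(s): {missing}")
    filled = dict(args)
    for name, spec in schema["properties"].items():
        if name not in filled and "default" in spec:
            filled[name] = spec["default"]
    return filled
```

Applying defaults on the client side keeps the endpoint call explicit even when the model omits optional arguments.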
The most compelling property of the Assistants API is that the conversation history lives server-side in Thread objects. The client only needs to store a thread_id — not reconstruct or send the full message history on every turn. For a multi-turn park planning conversation that accumulates dozens of messages and multiple tool call results, this matters: passing the full history in the messages array on every call adds tokens and latency proportional to conversation length.
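To make the growth concrete, the sketch below adds up how many tokens a stateless client transmits over a session when every call resends the whole history. The per-turn figures in the usage note are illustrative, not measurements from Park Whisperer, and the function counts only what the client sends; it makes no claim about how the service accounts for tokens internally.

```python
def resent_history_tokens(tokens_per_turn: list[int]) -> int:
    """Total tokens a stateless client sends when each call resends all prior turns."""
    total = 0
    history = 0
    for tokens in tokens_per_turn:
        history += tokens  # the history now includes this turn
        total += history   # the whole history goes out on this call
    return total
```

With ten turns of 100 tokens each, the conversation itself holds 1,000 tokens, but `resent_history_tokens([100] * 10)` returns 5500, because the total grows with the square of the number of turns.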
In practice, the Thread-based model works well for long-running planning sessions where a guest asks follow-up questions across many turns. The server-side state also means if the client drops the connection mid-turn, the Run can continue and the result is retrievable later — a resilience property that the stateless chat completions model doesn't have.
The Assistants API Run lifecycle requires polling: create a Run, poll until requires_action or completed, execute tools, submit outputs, poll again. Each poll is an HTTP round-trip. A conversation turn requiring three tool calls involves: create run → poll → submit three tool outputs → poll → retrieve messages. This polling loop adds latency and complexity compared to streaming chat completions where the tool call response arrives in the stream and the agentic loop runs in the same function execution.
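The loop described above can be sketched as follows. It assumes `client` is an OpenAI SDK client configured for Azure and that a Run has already been created on the thread. `handlers` is a hypothetical mapping from function name to a Python callable, and the one-second poll interval is arbitrary.

```python
import json
import time

TERMINAL = {"completed", "failed", "cancelled", "expired", "incomplete"}


def drive_run(client, thread_id, run_id, handlers, poll_interval=1.0):
    """Poll a Run until it finishes, executing tool calls whenever it pauses."""
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status == "requires_action":
            outputs = []
            for call in run.required_action.submit_tool_outputs.tool_calls:
                args = json.loads(call.function.arguments or "{}")
                result = handlers[call.function.name](**args)
                outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
            # All outputs for this pause go back in a single request.
            client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id, run_id=run_id, tool_outputs=outputs
            )
        elif run.status in TERMINAL:
            return run.status
        else:
            time.sleep(poll_interval)  # queued, in_progress, or cancelling
```

Every pass through the loop is at least one HTTP round-trip, which is where the added latency comes from. Once the function returns "completed", the assistant's reply can be read from the thread's messages.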
The production system uses streaming chat completions inside an Azure Function, which allows token-by-token streaming to the client — users see the response building in real time. The Assistants API (at beta/2024-05-01-preview) did not support streaming for runs with tool calls, which made the UX feel slower even when total latency was similar.
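For comparison, with streaming chat completions the text and the tool-call fragments arrive in the same stream, and the client assembles tool-call arguments piece by piece. The production Azure Function code is not shown in this document; the following is only a sketch of that general technique, with the hypothetical helper `collect_stream` consuming chunks shaped like the SDK's streamed chat-completion chunks.

```python
def collect_stream(chunks):
    """Split a chat-completions stream into visible text and assembled tool calls."""
    text_parts = []
    tool_calls = {}  # index -> {"id", "name", "arguments"}
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices
        delta = chunk.choices[0].delta
        if delta.content:
            text_parts.append(delta.content)  # a real handler would forward this to the client
        for piece in delta.tool_calls or []:
            slot = tool_calls.setdefault(piece.index, {"id": None, "name": "", "arguments": ""})
            if piece.id:
                slot["id"] = piece.id
            if piece.function and piece.function.name:
                slot["name"] = piece.function.name
            if piece.function and piece.function.arguments:
                slot["arguments"] += piece.function.arguments  # JSON arrives in fragments
    return "".join(text_parts), [tool_calls[i] for i in sorted(tool_calls)]
```

Because the assembled calls are available as soon as the stream ends, the tool execution and the follow-up completion can run in the same function invocation, with no polling in between.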
Two tools, get_park_shows and get_release_events, were present in this v4 Assistants version but absent from early production iterations. Show schedules are straightforward: the agent can tell a guest when Fantasmic plays tonight. The release events tool tracks when Lightning Lane return times become available throughout the day, which is a more advanced use case: guests asking "when should I try to book TRON Lightning Lane?" can get data-driven timing recommendations rather than generic advice.
The 16th tool, get_park_entities, provides the reference entity data (IDs, names, types) that other tools use for filtering. In practice, the agent rarely needs to call this explicitly — it uses entity IDs from the system context — but having it as a callable tool means the agent can resolve an ambiguous attraction name to an ID before calling other tools. This resolved a class of "attraction not found" errors when guests used informal names.
Separating the agent configuration into version-controlled JSON files (agent-config.json, agent-functions.json) and deploying via script is the most durable practice from this version. It meant that changes to the system prompt or tool definitions were tracked in git, reviewable as diffs, and reproducible — rather than being made directly in the Azure portal where changes leave no audit trail.
The agent-assistant-config.json snapshot pattern (reading back the deployed state after update) plays the same role as a Terraform state file: it records what was actually deployed, allowing drift detection between what's in agent-config.json and what the Azure OpenAI service has. This pattern carried forward into the production deployment pipeline, where function app settings are read back and validated after each deploy.
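A drift check of this kind can be as simple as comparing a few fields between the desired config and the snapshot. The sketch below assumes both are dicts with `model`, `instructions`, and an Assistants-style `tools` list; those field names, and the choice to compare tools by function name only, are assumptions rather than the project's actual logic.

```python
TRACKED_FIELDS = ("model", "instructions")  # assumed field names


def tool_names(tools: list) -> list:
    """Sorted function names from an Assistants-style tools list."""
    return sorted(t["function"]["name"] for t in tools if t.get("type") == "function")


def detect_drift(desired: dict, deployed: dict) -> dict:
    """Return {field: (desired, deployed)} for every tracked field that differs."""
    drift = {}
    for field in TRACKED_FIELDS:
        if desired.get(field) != deployed.get(field):
            drift[field] = (desired.get(field), deployed.get(field))
    want = tool_names(desired.get("tools", []))
    have = tool_names(deployed.get("tools", []))
    if want != have:
        drift["tools"] = (want, have)
    return drift
```

An empty result means the snapshot matches the config on the tracked fields; anything else names the field that changed and shows both values.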
Three concrete reasons drove the migration to the stateless chat completions approach for production:
The trade-off: the production system must send the full conversation history on every turn, which uses more tokens for long sessions. But the streaming UX improvement and reduced operational complexity were judged worth that cost.