Production · Claude Sonnet · Live Tools · SWA Proxy

Conversational Park Agent

A natural-language Q&A interface that routes through the same Azure Function, SLM classifier, and tool library used by the scheduled content pipelines. Ask "What should I rope-drop tomorrow?" and Claude Sonnet pulls live wait patterns, LL sellout history, and the weather forecast from the same PostgreSQL RAG and real-time tools that generate the daily bulletin, then answers in plain conversational text instead of a multi-platform social post.

1 Shared Function · Claude Sonnet 4.5 · auto Pipeline Routing · 4 History Turns · 15+ Live Tools · SWA Proxy Auth

One Function Serves Both Scheduled Pipelines and Live Q&A

The agentChat Azure Function (POST /api/agentChat) uses the same SLM classifier, the same 15+ live tools, and the same PostgreSQL RAG as the scheduled content pipelines. What changes is the system prompt and the output format: pipelines use PIPELINE_SYSTEM_PROMPT and get section-marked blog/social/Reels output; the chat agent uses CHAT_SYSTEM_PROMPT and gets plain conversational prose. The tool lookup, the pre-fetch layer config, and the agentic loop are identical code paths.

HTTP Trigger
POST /api/agentChat (same Azure Function as the pipeline runner)
SWA proxy: /api/pipeline/proxy/api/agentChat → park-knowledge-updater
Auth: x-functions-key via PARK_KNOWLEDGE_KEY app setting (SWA-side)
Supports ?agentBackend=staging|prod URL param → localStorage override
Pipeline Resolution
_pipeline_from_question() keyword-matches the user question → content_type
weather/rain/storm → weather_midday
rope drop/first ride → rope_drop_strategy
lightning lane/LL/sellout → ll_intelligence_report
top ten/ranking/historical/trend → daily_trend_report
wait/line/down/status/reliable → operations_bulletin (default)
content_type in request body → explicit override, skip keyword matching
Context Retrieval
query_park_intelligence(): same PostgreSQL RAG as pipeline prefetch
SLM classifier (Phi-3 Mini / Phi-4-mini fallback) selects retrieval strategy
_query_categories() resolves knowledge_types from pipeline config for filtering
max_context_chars truncation prevents RAG blowing up the context window
_compact_query_result(): strips vector/embedding fields, keeps content/title/facts
Live Tools (Conditional)
_needs_live_tools() decides whether to call live endpoints
Historical queries (last week/trend/average) → skip live tools, use RAG only
Live queries (current/right now/today/status) → _pipeline_tools() subset
15+ tools: currentWaits, currentDown, lightningLane, dailySchedules, parkShows, weather, selloutEvents, waitSnapshots…
Tool calls run in ThreadPoolExecutor (parallel), same loop as pipeline generation
LLM Generation
Claude Sonnet 4.5 (CHAT_MODEL_ID; higher reasoning than Haiku for Q&A)
Bedrock Prompt Cache: enabled when context > 1200 chars (cache_control=ephemeral)
CHAT_SYSTEM_PROMPT + SOCIAL_REPLY_PROMPT: conversational, no section markers
Prior conversation messages sent as history (max 4 turns, 500 chars each)
return_usage: true → token counts in response for cost audit
Response
Plain conversational text, no ---BLOG--- / ---INSTAGRAM--- markers
traceEvents[] lists tools called (for debug display in UI)
metadata.tokensUsed: input_tokens + output_tokens for session cost tracking
_strip_token_footer() removes cost annotations from the visible reply text
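
The context-retrieval and history limits in the steps above can be sketched roughly as follows. This is an illustration only, not the function's actual code: the field names in KEEP_FIELDS, the helper names, and the reading of "turn" as one message are all assumptions. Only the limits (4 turns, 500 characters) and the keep/strip behavior come from the description.

```python
from typing import Any

# Assumed field names: the description only says vector/embedding fields are
# stripped and content/title/facts are kept.
KEEP_FIELDS = ("title", "content", "facts")
MAX_HISTORY_TURNS = 4    # "max 4 turns"
MAX_HISTORY_CHARS = 500  # "500 chars each"


def compact_query_result(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the text-bearing fields of each RAG row."""
    return [{k: row[k] for k in KEEP_FIELDS if k in row} for row in rows]


def truncate_context(text: str, max_context_chars: int) -> str:
    """Hard cap on the RAG context that is passed to the model."""
    return text[:max_context_chars]


def trim_history(messages: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep the most recent messages and clip each one.

    Assumption: one message counts as one turn here.
    """
    recent = messages[-MAX_HISTORY_TURNS:]
    return [
        {"role": m["role"], "content": m["content"][:MAX_HISTORY_CHARS]}
        for m in recent
    ]
```

Clipping by characters rather than tokens keeps the sketch dependency-free; a real implementation might prefer a token-based budget.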

Same Model, Different Instructions — Different Output Shape

The only thing separating a scheduled pipeline response from a chat response is the system prompt. Same model, same tools, same data. The system prompt changes what Claude thinks its job is.

Pipeline System Prompt
Identity: Park Whisperer — content production mode

Anti-hallucination: Use ONLY data in LIVE DATA section

Output contract: Follow voice/tone in the content prompt below. Section markers required (---BLOG---, ---FACEBOOK---, etc.)

Persona tone: Voice-neutral — each pipeline's Cosmos prompt owns its own personality

Temperature: 0 (format fidelity over creativity)

Chat System Prompt
Identity: Park Whisperer — knowledgeable insider friend

Anti-hallucination: Same rule, same wording

Output contract: Plain conversational text only. No section markers. No markdown outlines. 1-3 short paragraphs. Paste-ready for social DMs/comments.

Persona tone: Warm, direct, enthusiastic — hard-coded in CHAT_SYSTEM_PROMPT

Additional contract: SOCIAL_REPLY_PROMPT appended; bans "Pro move:", "TL;DR:", labels, tables

Park Whisperer Agent · Live
👤
Should I rope-drop TRON or Seven Dwarfs Mine Train tomorrow morning?
🎢
TRON, no question. Seven Dwarfs has the fastest-selling Individual Lightning Lane at Walt Disney World — it routinely sells out before 7:30 AM even on slower crowd days. TRON's wait also climbs fast, but the LL pricing gives you a bit more wiggle room than SDMT.

Rope-drop TRON, then immediately book SDMT LL on your phone while you're in the TRON queue. That's the sequence that actually gets you both in the first two hours.
query_intelligence · get_lightning_lane_availability · 382 in · 91 out tokens
👤
What if it rains?
🎢
Both TRON and Seven Dwarfs are covered/indoor, so light rain doesn't affect either. The real risk with rain is outdoor queues like Splash Mountain and Jungle Cruise getting temporarily paused — which actually shortens indoor ride lines as crowds shift. If storms are forecast for early morning, TRON's outdoor loading platform can delay operations. Check the forecast around 6 AM and pivot to Seven Dwarfs first if there's active lightning risk.
get_weather_forecast · get_rides_at_risk · history: 1 turn · 441 in · 104 out tokens

Question Keywords → Pipeline Type → Tool Subset

_pipeline_from_question() keyword-matches the incoming question to the most relevant pipeline type. That pipeline type determines which tools are available and which knowledge_types filter the RAG query — the same config Cosmos stores for the scheduled pipeline.

Question Keywords | Routed Pipeline | Tools Activated
weather, rain, storm, forecast, heat, indoor | weather_midday | WEATHER_TOOLS: get_weather_forecast, get_weather_intelligence, get_rides_at_risk, get_ride_sensitivity…
rope drop, ride first, first ride, opening, early entry | rope_drop_strategy | ROPE_DROP_TOOLS: currentWaits, lightningLane, parkSchedule, purchaseOptions, entityContext…
lightning lane, ll multipass, sellout, sold out, return time | ll_intelligence_report | LL-focused subset: lightningLane, selloutEvents, topSellouts, releaseEvents…
top ten, ranking, historical, trend, average, best time, how often | daily_trend_report | TREND_TOOLS: waitSnapshots, weeklyAvgWait, monthlyAvgWait, dailyWaitSummary, query_intelligence…
anomaly, predict, correlation, why | ai_deep_insights | INSIGHTS_TOOLS: heavier query_intelligence queries + structured weekly aggregates
wait, line, down, closed, status, reliable (default) | operations_bulletin | BULLETIN_TOOLS: currentWaits, currentDown, lightningLane, shows, advisories, entityContext…
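
A minimal sketch of how keyword routing like this could work, assuming first-match-wins in the table's row order and prefix matching at word starts. Both are assumptions, as are the ROUTES structure and the hyphen normalization; the real _pipeline_from_question() may differ. The keyword lists are read from the table above.

```python
import re
from typing import Optional

# Row order is assumed to be the match priority.
ROUTES = [
    ("weather_midday", ("weather", "rain", "storm", "forecast", "heat", "indoor")),
    ("rope_drop_strategy", ("rope drop", "ride first", "first ride", "opening", "early entry")),
    ("ll_intelligence_report", ("lightning lane", "ll multipass", "sellout", "sold out", "return time")),
    ("daily_trend_report", ("top ten", "ranking", "historical", "trend", "average", "best time", "how often")),
    ("ai_deep_insights", ("anomaly", "predict", "correlation", "why")),
    ("operations_bulletin", ("wait", "line", "down", "closed", "status", "reliable")),
]
DEFAULT_PIPELINE = "operations_bulletin"


def pipeline_from_question(question: str, content_type: Optional[str] = None) -> str:
    """Return the pipeline type for a question."""
    if content_type:
        # An explicit content_type in the request body skips keyword matching.
        return content_type
    # Treat "rope-drop" like "rope drop".
    q = question.lower().replace("-", " ")
    for pipeline, keywords in ROUTES:
        for kw in keywords:
            # Leading word boundary only: "rain" matches "rains" but not "train".
            if re.search(r"\b" + re.escape(kw), q):
                return pipeline
    return DEFAULT_PIPELINE


print(pipeline_from_question("Should I rope-drop TRON or Seven Dwarfs Mine Train?"))
```

The word-boundary check matters for this domain: plain substring matching would send any question mentioning "Mine Train" to the weather route because "train" contains "rain".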

Building the Chat Layer on the Pipeline Foundation

Why not build a separate chat function instead of sharing one?

The knowledge retrieval, SLM classifier, and tool definitions already existed for the scheduled pipelines. A separate function would mean duplicating the RAG query logic, the tool wrappers, the SLM client with its confidence gate and fallback path, and the Bedrock LangChain setup — approximately 800 lines of tested code that would then need to stay in sync whenever the pipeline infrastructure changed.

Using the same function also means the chat agent benefits automatically from improvements to the pipeline layer — when a new tool is added to ROPE_DROP_TOOLS, the rope-drop question routing in chat gets it for free. When the Phi-3 Mini confidence threshold is tuned for better classification, both pipelines and chat benefit simultaneously.

Why does the chat agent skip live tools for historical questions?

_needs_live_tools() checks the question for historical keywords ("last week", "average", "trend", "how often", "best time", "total downtime") and short-circuits to RAG-only mode. This isn't a cost optimization — it's accuracy. If someone asks "what's the average wait for Space Mountain on weekday mornings?" and the agent calls get_current_wait_times, it gets today's snapshot and might report a number that's uncharacteristically high or low. The PostgreSQL knowledge base holds precomputed day-of-week/hour aggregates specifically for this question type; they're more accurate for the question being asked.

Live tools are activated only when the question has explicit live-data indicators: "current", "right now", "today", "status", "available", "weather". For operations_bulletin and weather pipeline routes, any of these triggers the live tool set. For trend/insights routes, even those keywords fall back to RAG because aggregated patterns are more reliable than a single current reading.
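
The gating rule described here might look something like the sketch below. The keyword lists are the ones quoted in the text; the precedence (historical hints checked before live hints), the RAG_ONLY_PIPELINES set, the treatment of rope-drop and LL routes the same as the bulletin route, and the run_tools helper are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

HISTORICAL_HINTS = ("last week", "average", "trend", "how often", "best time", "total downtime")
LIVE_HINTS = ("current", "right now", "today", "status", "available", "weather")
# Trend/insights routes stay on RAG even when live keywords appear.
RAG_ONLY_PIPELINES = {"daily_trend_report", "ai_deep_insights"}


def needs_live_tools(question: str, pipeline: str) -> bool:
    """Decide whether the live tool subset should be called at all."""
    q = question.lower()
    if pipeline in RAG_ONLY_PIPELINES:
        return False
    if any(hint in q for hint in HISTORICAL_HINTS):
        return False  # precomputed aggregates answer these better
    return any(hint in q for hint in LIVE_HINTS)


def run_tools(tools: dict[str, Callable[[], object]]) -> dict[str, object]:
    """Run zero-argument tool callables in parallel and collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(tools))) as pool:
        futures = {name: pool.submit(fn) for name, fn in tools.items()}
        return {name: future.result() for name, future in futures.items()}
```

For example, needs_live_tools("What's the average wait for Space Mountain on weekday mornings?", "operations_bulletin") returns False under these assumptions, so the question is answered from RAG alone.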

How does Bedrock Prompt Cache reduce cost on multi-turn conversations?

When the assembled context (system prompt + RAG results + tool definitions) exceeds 1200 characters, the agent adds cache_control: ephemeral to the system message. Bedrock caches the tokenized representation of those blocks and bills the cached tokens at a lower rate on subsequent calls that arrive while the cache entry is still live.

For a multi-turn conversation where the user asks follow-up questions ("What if it rains?" after asking about rope-drop strategy), the tool definitions and RAG context don't change — the entire input is the same except for the new question and history. Without caching, each follow-up incurs the full input token cost for the tool definitions (~2000 tokens) even though they're identical to the previous call. With prompt caching, those tokens are billed as cache reads at ~10% of the standard input rate.
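
As a rough illustration of the threshold and the savings, the sketch below builds an Anthropic-style system block and estimates the billed input for one follow-up call. The helper names and the 400-token figure for the fresh part of the request are hypothetical; the 1200-character threshold, the ~2000 cached tokens, and the ~10% cache-read rate come from the text. Any surcharge for writing the cache on the first call is ignored.

```python
CACHE_MIN_CHARS = 1200  # threshold quoted in the text


def build_system_blocks(system_prompt: str, context: str) -> list[dict]:
    """Return system content blocks, marked cacheable when large enough."""
    text = f"{system_prompt}\n\n{context}"
    block: dict = {"type": "text", "text": text}
    if len(text) > CACHE_MIN_CHARS:
        block["cache_control"] = {"type": "ephemeral"}
    return [block]


def effective_input_tokens(cached_tokens: int, fresh_tokens: int,
                           cache_read_ratio: float = 0.10) -> float:
    """Input tokens in standard-rate equivalents for a cache-hit call."""
    return cached_tokens * cache_read_ratio + fresh_tokens


# Follow-up turn: ~2000 cached tokens plus a hypothetical 400 fresh tokens.
without_cache = 2000 + 400
with_cache = effective_input_tokens(2000, 400)
print(without_cache, with_cache)
```

Under these numbers the follow-up is billed as roughly 600 standard-rate input tokens instead of 2400.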

Why add SOCIAL_REPLY_PROMPT on top of CHAT_SYSTEM_PROMPT?

The interface's right panel is a "Reply Blocks" artifact composer — the intended use case is a social media manager or community manager asking "what should I reply to this guest question?" and copying the answer into Instagram comments or Twitter DMs. That use case has specific failure modes that CHAT_SYSTEM_PROMPT alone doesn't prevent.

Without SOCIAL_REPLY_PROMPT, Sonnet tends to format replies as labeled outlines: "Top Priority: ...", "Pro move: ...", "TL;DR: ...". Those labels look like AI output in a social reply and get deleted by any human editor before posting. SOCIAL_REPLY_PROMPT explicitly bans those patterns and enforces: plain text only, 1-3 paragraphs, bullets only when they make the reply paste-ready, no markdown, no metadata. The output arrives pre-formatted for social copy-paste with no editing required.

How is the SWA proxy routing configured?

The chat UI is served by Azure Static Web Apps (Park-Whisperer-Web). Direct browser calls to the Function App would expose the API key in client-side JavaScript. Instead, all API calls go to /api/pipeline/proxy/api/agentChat — a SWA API function (pipelineProxy) that holds PARK_KNOWLEDGE_KEY in its environment and proxies the request to the Function App with the key injected server-side.

The ?agentBackend=staging URL parameter (or the PW_AGENT_BACKEND localStorage key) redirects traffic to /api/agentChat?agentTarget=staging, which the proxy resolves to the staging Function App. This lets the production UI test against a staging backend without deploying a separate frontend, which is useful for validating new tool definitions or routing changes before promoting them to production.
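
The backend selection could be modeled as below. This is a language-neutral illustration written in Python, not the frontend's actual code: the function names, the precedence of the URL parameter over the stored override, the fallback to prod for unknown values, and appending agentTarget to the proxy route are all assumptions.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

VALID_BACKENDS = ("staging", "prod")
PROXY_ROUTE = "/api/pipeline/proxy/api/agentChat"


def resolve_backend(page_url: str, stored_override: Optional[str] = None) -> str:
    """Pick the backend from ?agentBackend=..., else the stored override, else prod."""
    params = parse_qs(urlsplit(page_url).query)
    requested = params.get("agentBackend", [None])[0] or stored_override
    return requested if requested in VALID_BACKENDS else "prod"


def proxy_path(backend: str) -> str:
    """Build the request path; the function key is never added on the client side."""
    if backend == "staging":
        return f"{PROXY_ROUTE}?agentTarget=staging"
    return PROXY_ROUTE


print(proxy_path(resolve_backend("https://example.com/chat?agentBackend=staging")))
```

Falling back to prod for anything outside the allow-list keeps a mistyped parameter from sending traffic to an unintended target.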
