Production · Phi-3 Mini · LangChain · AWS Bedrock

SLM Content Generation Pipeline

A scheduled content factory that generates multi-platform park content (social posts, blogs, video scripts, morning briefings, evening recaps) fully automatically. It combines a fine-tuned Phi-3 Mini SLM, deployed to Azure Container Apps, with a LangChain agentic loop backed by Claude on AWS Bedrock. The SLM classifies query intent for tool selection; a parallel pre-fetch layer pulls knowledge from the PostgreSQL RAG before the LLM sees a single token; tool calls execute in parallel via ThreadPoolExecutor; and every pipeline type has a curated, type-specific tool subset.

10 Pipeline Types · 30+ LangChain Tools · Parallel Tool Calls · 3 Content Passes · 0 Hallucinated Data · Claude Generation Model

How the Pipeline Works End-to-End

Each pipeline type runs on a schedule (operations bulletin every morning, weather updates 3× daily, trend reports nightly, etc.). A Container App Job launches the pipeline runner, which assembles context from the knowledge RAG + real-time tools, then calls Claude via LangChain to produce formatted multi-platform content.

Trigger
- Azure Container App Job (scheduled)
- Pipeline registry resolves content_type → tool set + prompt template
- Pipeline config loaded from Cosmos DB (prompt_source: cosmos or hardcoded)
- Config loaded fresh per run, so there is no stale prompt risk
Pre-fetch Layer
- PIPELINE_PREFETCH[content_type] config
- All fetches run in parallel before LLM call
- Knowledge queries: queryIntelligence → PostgreSQL RAG (semantic + structured)
- Real-time tool calls: current waits, LL status, weather, park schedule
- Results injected into prompt as structured context, not as tool calls during generation
- KPI fact pack: kpi_daily_digest + kpi_crowd_index pre-loaded for gating
KPI Gate
- should_gate_for_fact_pack() evaluates KPI data quality
- Gates require: crowd_score present, top_10_waits populated, operational_date matches today
- Gate reason logged: aws_auth_failure / generation_failed / insufficient_kpi_data
- Gate failure → pipeline skips generation, logs reason, no empty content published
Agentic Loop
- LangChain + ChatBedrock (Claude claude-3-5-sonnet / claude-3-haiku)
- Tool calls execute in parallel via ThreadPoolExecutor
- Type-specific tool subset per pipeline (BULLETIN_TOOLS, WEATHER_TOOLS, etc.)
- temperature=0; structured output format enforced in the system prompt
- Actual token counts from usage_metadata (not an estimate)
- Tool failures return "No data available"; missing data is never hallucinated
Multi-Pass Generation
- Pass 1: data synthesis (all tool results → structured summary)
- Pass 2: platform-specific formatting (Twitter/X, Instagram, Facebook, blog)
- Pass 3 (Reels): video script + captions for Instagram Reels / TikTok
- Section markers parsed from response → split into per-platform content artifacts
- Generation plan built from pipeline config (enabled channels per content_type)
Publish
- blog_publisher.py writes artifacts to content store
- Structured content artifact (ContentArtifact) with per-platform sections
- Video job triggered: video_job_trigger → park-video-worker Container App Job
- Audit trail logged: generation time, model, token counts, gate status

10 Scheduled Content Pipelines

operations_bulletin

Daily Operations Bulletin

Full park ops snapshot: current waits, LL status, closures, dining, shows. Targets social + blog. Heaviest pre-fetch — 6 knowledge queries + 7 real-time tools in parallel.

BULLETIN_TOOLS · 14 tools
weather_morning / weather_midday / weather_evening

Weather Intelligence Updates

3× daily weather-focused content: NWS forecast, storm risk, ride sensitivity, rides at risk of closure. All three use identical WEATHER_TOOLS but different prompt tone (morning plan vs. midday adjustment vs. evening recap).

WEATHER_TOOLS · 9 tools
rope_drop_strategy

Rope Drop Strategy

Pre-open park strategy: first attractions to target, LL prioritization, weather impact on opening conditions. Uses forecast + current LL + purchase options + entity proximity context.

ROPE_DROP_TOOLS · 10 tools
daily_trend_report

Daily Trend Report

End-of-day analysis: wait time trends, sellout/release patterns, weekly comparisons. Uses TREND_TOOLS including query_intelligence for historical context from the knowledge RAG.

TREND_TOOLS · knowledge RAG
yesterday_recap

Yesterday Recap

Narrative recap of prior day's operations: headline waits, LL drama, downtime incidents, weather impact. Structured by pre-fetched daily actuals from precomputed_docs.

TREND_TOOLS · precomputed_docs
morning_briefing

Morning Briefing

Pre-visit preparation content: what to expect today based on historical day-of-week patterns, LL opening strategy, weather outlook. MORNING_TOOLS blend real-time current state with pattern knowledge.

MORNING_TOOLS · 10 tools
ai_deep_insights

AI Deep Insights

Long-form analytical content: week-over-week comparisons, crowd pattern explanations, predictive guidance. INSIGHTS_TOOLS combine query_intelligence (RAG) with structured daily aggregates for multi-layer analysis.

INSIGHTS_TOOLS · knowledge + structured
evening_wrap

Evening Wrap

Day-end narrative: today's operational highlights, LL sellout counts, top waits, crowd vs. historical baseline. Pre-fetches today's KPI digest and crowd index for data-driven hooks.

BULLETIN_TOOLS · KPI facts
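
The mapping from content type to tool subset listed above could be captured in a small registry. The following is a minimal sketch, not the project's actual code: `PipelineSpec`, `PIPELINE_REGISTRY`, and `resolve_pipeline` are hypothetical names, and only the content types and tool-set names come from the cards above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical registry entry: one scheduled content type and its tool subset."""
    content_type: str
    tool_set: str  # name of the tool subset, e.g. "BULLETIN_TOOLS"


# content_type -> tool-set name, taken from the pipeline cards above.
_TOOL_SET_BY_TYPE = {
    "operations_bulletin": "BULLETIN_TOOLS",
    "weather_morning": "WEATHER_TOOLS",
    "weather_midday": "WEATHER_TOOLS",
    "weather_evening": "WEATHER_TOOLS",
    "rope_drop_strategy": "ROPE_DROP_TOOLS",
    "daily_trend_report": "TREND_TOOLS",
    "yesterday_recap": "TREND_TOOLS",
    "morning_briefing": "MORNING_TOOLS",
    "ai_deep_insights": "INSIGHTS_TOOLS",
    "evening_wrap": "BULLETIN_TOOLS",
}

PIPELINE_REGISTRY = {
    content_type: PipelineSpec(content_type, tool_set)
    for content_type, tool_set in _TOOL_SET_BY_TYPE.items()
}


def resolve_pipeline(content_type: str) -> PipelineSpec:
    """Look up a pipeline, failing loudly on an unknown content type."""
    try:
        return PIPELINE_REGISTRY[content_type]
    except KeyError:
        raise ValueError(f"unknown content_type: {content_type!r}") from None
```

Keeping the mapping in one table makes it easy to see that the three weather pipelines share a tool set and differ only in prompt.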

All Context Assembled Before the LLM Sees a Token

Every pipeline has a PIPELINE_PREFETCH config that defines exactly which knowledge queries and real-time tool calls to fire in parallel before the LLM call. This pattern eliminates agentic round-trips for predictable data needs — the LLM receives a complete context block and can focus on generation, not data retrieval.

The operations_bulletin pipeline pre-fetches 6 semantic knowledge queries and 7 live tool calls in parallel. Entity names are embedded into query strings (not just type filters) so BM25 ranking in the knowledge store surfaces the right headliner rides — not low-priority attractions written later in the upsert cycle. Template variables like {last_same_dow} and {date} are resolved at runtime from an ET-anchored date context before queries are dispatched.

Knowledge · popularity_ranking
"TRON Rise Resistance Seven Dwarfs Mine Train Guardians Cosmic Rewind Slinky Dog Dash Flight Passage Smugglers Run Tower Terror popularity ranking top wait times"
mode: knowledge · limit: 5 · entity names embedded for BM25 accuracy
Knowledge · ll_sellout_timeline + lightning_lane_volatility
"Rise Resistance TRON Seven Dwarfs Mine Train Guardians Cosmic Rewind Slinky Dog Dash Lightning Lane sellout fastest demand volatility"
mode: knowledge · limit: 5
Structured · dining (same-DOW historical baseline)
"top 10 restaurants by wait time date={last_same_dow}"
mode: structured · limit: 10 · resolves to prior-week same-day for accurate DOW patterns
Real-time tools (parallel)
get_current_wait_times · get_current_down_attractions · get_lightning_lane_status · get_upcoming_shows · get_entity_context · get_top_restaurants · get_restaurant_wait_times
All 7 execute concurrently · failures return "No data available" (never block)
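
The pre-fetch pattern described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pipeline's real implementation: `run_prefetch` and `build_context_block` are hypothetical helpers, each fetcher is assumed to be a zero-argument callable returning text, and the `[name]` block format is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_prefetch(fetchers: Dict[str, Callable[[], str]], max_workers: int = 8) -> Dict[str, str]:
    """Run every fetcher concurrently; a failing fetcher yields a placeholder instead of raising."""

    def safe(fn: Callable[[], str]) -> str:
        try:
            return fn()
        except Exception as exc:  # timeout, HTTP error, anything else
            return f"No data available: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the calls overlap, then collect in config order.
        futures = {name: pool.submit(safe, fn) for name, fn in fetchers.items()}
        return {name: future.result() for name, future in futures.items()}


def build_context_block(results: Dict[str, str]) -> str:
    """Join the pre-fetched results into one labelled block for the prompt (format is assumed)."""
    return "\n\n".join(f"[{name}]\n{body}" for name, body in results.items())
```

Because failures are converted to strings inside the worker, one slow or broken endpoint cannot block the other fetches or abort the run.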

Phi-3 Mini as the Query Classifier

A fine-tuned Phi-3 Mini model deployed to Azure Container Apps handles the query classification step that decides which retrieval strategy to use for each tool call. The same classifier serves both user-facing RAG queries and the automated content pipeline tool calls. The SLM runs at sub-200ms with minimal inference cost; Phi-4-mini on Azure OpenAI serves as the automatic fallback.

Routing Logic
- SHA-256 hash of query → deterministic routing (same question = same path)
- SLM_ROLLOUT_PERCENTAGE env var (0–100) controls traffic split
- SLM_ROLLOUT_PERCENTAGE=0 → 100% Phi-4-mini (safe default)
- SLM_ROLLOUT_PERCENTAGE=100 → all traffic to Phi-3 Mini
- Gradual rollout: 10% → 25% → 50% → 100% as confidence is validated
Confidence Gate
- SLM returns confidence: 0.0–1.0
- SLM_CONFIDENCE_THRESHOLD = 0.85 (default)
- confidence ≥ 0.85 → use SLM result, stamp classification_source="slm"
- confidence < 0.85 → fall back to Phi-4-mini, stamp "slm_fallback_low_confidence" + slm_confidence value
- Fallback preserves full classification for logging, so both paths are auditable
Error Fallback
- Timeout (2000ms default), HTTP error, invalid strategy → SLM error
- 2 retries with exponential backoff (100ms, 200ms) before raising
- SLMClassificationError caught → Phi-4-mini fallback, stamp "slm_fallback_error"
- SLM container can be stopped for maintenance with zero pipeline disruption
Serving Infrastructure
- Azure Container Apps (scales to zero when idle)
- HTTP endpoint: POST /classify
- Reusable httpx.Client with connection pooling (max 20 connections)
- Phi-3 Mini: 3.8B parameters, ~3× faster than Phi-4-mini for classification task
- Same classification schema as Phi-4-mini, so it is a drop-in replacement
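
The routing, confidence gate, and error fallback above might fit together roughly as in the sketch below. Several parts are assumptions made for illustration: the way the SHA-256 digest is reduced to a 0–99 bucket, the `Classification` dataclass, the `classify` signature, and the `"fallback_model"` label for traffic that never reaches the SLM. Retries with backoff are omitted; `slm` is assumed to raise `SLMClassificationError` only after its retries are exhausted.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


class SLMClassificationError(Exception):
    """Raised when the SLM call times out, errors, or returns an invalid strategy."""


@dataclass
class Classification:
    strategy: str
    confidence: float
    classification_source: str = ""
    slm_confidence: Optional[float] = None


def routes_to_slm(query: str, rollout_percentage: int) -> bool:
    """Map the query's SHA-256 digest to a stable bucket in [0, 100) (bucketing scheme is assumed)."""
    digest = hashlib.sha256(query.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percentage


def classify(
    query: str,
    slm: Callable[[str], Classification],
    fallback: Callable[[str], Classification],
    rollout_percentage: int = 0,
    threshold: float = 0.85,
) -> Classification:
    if not routes_to_slm(query, rollout_percentage):
        result = fallback(query)
        result.classification_source = "fallback_model"  # hypothetical label
        return result
    try:
        slm_result = slm(query)
    except SLMClassificationError:
        result = fallback(query)
        result.classification_source = "slm_fallback_error"
        return result
    if slm_result.confidence >= threshold:
        slm_result.classification_source = "slm"
        return slm_result
    # Low confidence: use the fallback answer but keep the SLM score for auditing.
    result = fallback(query)
    result.classification_source = "slm_fallback_low_confidence"
    result.slm_confidence = slm_result.confidence
    return result
```

Hashing the query rather than drawing a random number means the same question always takes the same path, which keeps rollout comparisons reproducible.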

30+ LangChain Tools — Two Backend Systems

Every tool wraps one of two HTTP endpoints. LangChain tool docstrings are the only documentation the model sees for tool selection — they are written as precise data-query specifications, not conversational descriptions. Tool failures return "No data available: [reason]" and are never propagated as exceptions into the generation context.

| Tool | Backend | Used By | Key Data |
|---|---|---|---|
| query_intelligence | RAG · park-knowledge-updater → PostgreSQL RAG | TREND_TOOLS, INSIGHTS_TOOLS | Historical patterns, knowledge articles. Supports mode=semantic/structured/knowledge, knowledge_types filter to avoid full-library scan. |
| get_current_wait_times | Real-time · router-bot /api/ai/v1/currentWaits | BULLETIN, MORNING, ROPE_DROP | Live wait times all attractions, operational status |
| get_lightning_lane_status | Real-time · router-bot /api/ai/v1/lightningLane | BULLETIN, MORNING, ROPE_DROP | LL availability, pricing, sellout status per ride |
| get_weather_intelligence | Real-time · router-bot /api/ai/weather/intelligence | All tool sets | AI-analyzed weather with crowd/wait impact predictions, storm risk |
| get_weather_forecast | Real-time · router-bot /api/ai/weather/forecast | WEATHER_TOOLS, ROPE_DROP | Hourly NWS forecast, storm risk %, wind risk %, comfort level |
| get_rides_at_risk | Real-time · router-bot | WEATHER_TOOLS, ROPE_DROP | Outdoor rides with closure risk given current/forecast weather |
| get_ride_sensitivity | Real-time · router-bot | WEATHER_TOOLS, ROPE_DROP | Historical weather sensitivity profile per attraction (rain/wind/lightning) |
| get_down_summary | Real-time · router-bot | BULLETIN, TREND, INSIGHTS | Today's downtime incidents, total hours, impacted attractions |
| get_top_wait_aggregates | Real-time · router-bot | TREND_TOOLS, INSIGHTS_TOOLS | Today's ranked top-10 waits from precomputed_docs |
| get_daily_sellout_summary | Real-time · router-bot | TREND_TOOLS, INSIGHTS_TOOLS | Today's LL sellout counts per attraction |
| get_daily_release_summary | Real-time · router-bot | TREND_TOOLS, INSIGHTS_TOOLS | Today's LL release/restock events (high count = extreme demand cycling) |
| get_entity_context | Real-time · router-bot → park_knowledge parkEntities | BULLETIN, ROPE_DROP, evening_wrap | Attraction metadata: land, park, type, proximity. Used for spatial narrative. |
| get_weekly_aggregation | Real-time · router-bot | TREND_TOOLS, INSIGHTS_TOOLS | Week-over-week comparative wait/downtime metrics |

LangChain + Claude on AWS Bedrock

The previous implementation used boto3 directly against the Bedrock Agent runtime (RETURN_CONTROL flow), which serialized tool calls and had no actual token counting. The current implementation uses LangChain's ChatBedrock (bedrock-runtime:InvokeModel directly), parallel tool execution, and real token counts from usage_metadata.

Previous

boto3 + Bedrock Agent runtime

RETURN_CONTROL tool calls were serial. Token counting was a chars÷4 estimate. Output format required retries. Every iteration incurred AWS Agent service overhead.

Current

LangChain + ChatBedrock direct

bedrock-runtime:InvokeModel directly — no Agent service overhead. Tool calls fire in parallel via ThreadPoolExecutor. Real token counts from usage_metadata. Output format enforced at temperature=0.

Tool Parallelism

ThreadPoolExecutor per turn

When the model emits multiple tool_calls in a single turn, all execute concurrently. Results collected via as_completed(). Wall time = slowest tool, not sum.

Failure Mode

Explicit "No data available"

Every tool wraps its HTTP call in a try/except. Timeout, HTTP error, or exception → return "No data available: [reason]". Model instructed to omit that section rather than invent data.

Token Tracking

usage_metadata per call

Input tokens, output tokens, and total tracked per generation call. Used for cost audit trail and pipeline-level token budget enforcement.

Model Selection

Claude 3.5 Sonnet / 3 Haiku

Content-type-specific model selection: long-form insight generation uses Sonnet; real-time bulletin updates use Haiku for lower latency and cost.
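
The per-turn parallelism and the explicit failure mode described in the cards above can be sketched together. This is a sketch under assumptions rather than the project's loop: `execute_tool_calls` is a hypothetical helper, `tools` is assumed to be a plain dict of name → callable, and each call is assumed to be a dict with `id`, `name`, and `args` keys, which is the shape LangChain exposes on `AIMessage.tool_calls`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List


def execute_tool_calls(
    tool_calls: List[Dict[str, Any]],
    tools: Dict[str, Callable[..., Any]],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Run all tool calls from one model turn concurrently; return {call_id: result_text}."""

    def run_one(call: Dict[str, Any]) -> str:
        try:
            return str(tools[call["name"]](**call["args"]))
        except Exception as exc:  # unknown tool, timeout, HTTP error, bad args
            return f"No data available: {exc}"

    results: Dict[str, str] = {}
    if not tool_calls:
        return results

    with ThreadPoolExecutor(max_workers=min(max_workers, len(tool_calls))) as pool:
        future_to_id = {pool.submit(run_one, call): call["id"] for call in tool_calls}
        # as_completed yields futures as they finish, so total wall time tracks the slowest call.
        for future in as_completed(future_to_id):
            results[future_to_id[future]] = future.result()
    return results
```

Each result is keyed by the call id so it can be returned to the model as the matching tool message, regardless of completion order.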

3-Call Architecture: Data → Platforms → Video

Content generation uses multiple sequential LLM calls to separate concerns. Pass 1 synthesizes raw tool data into a structured summary. Pass 2 reformats for each enabled platform. Pass 3 (Reels/TikTok) builds the video script. Each pass uses different temperature and prompt directives.

Pass 1 — Data Synthesis
- All pre-fetched context + agentic tool results → structured summary
- temperature=0 for factual accuracy
- Crowd index, top wait times, LL drama, closures, weather impact
- Section markers delimit data categories for Pass 2 re-use
Pass 2 — Platform Format
- Platform generation plan: enabled channels from pipeline config
- Channels: Twitter/X (280 char), Instagram (caption + hashtags), Facebook (long form), blog (full article)
- build_enabled_call2_format_block() builds per-channel format instructions
- Section markers in response → split into per-platform artifacts
- _normalize_generation_channels() resolves channel config per content type
Pass 3 — Reels/Video Script
- Triggered only when video channel enabled for content type
- Template: get_facebook_reel_template() → structured hook/body/CTA script
- _resolve_reels_call3_override() selects weather/recap/morning variants
- Output: video_job_trigger → park-video-worker Container App Job renders the reel
- Reels call3 override allows content-type-specific video templates
Platform Config
- SOCIAL_CHANNEL_ORDER defines canonical output order
- platform_prompt_config.py centralizes all channel format specs
- Content type drives which channels are enabled (not all types post to all platforms)
- Context-aware: has_dashboard flag adds portal-link CTAs to applicable channels
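
Splitting a section-marked response into per-platform artifacts could look like the sketch below. The marker syntax (`=== CHANNEL ===` on its own line), the contents of `SOCIAL_CHANNEL_ORDER`, and the `split_sections` helper are all assumptions for illustration; the source names the constant and the technique but does not show the format.

```python
import re
from typing import Dict, Iterable

# Assumed canonical order; the real SOCIAL_CHANNEL_ORDER may differ.
SOCIAL_CHANNEL_ORDER = ["twitter", "instagram", "facebook", "blog"]

# Hypothetical marker format: a line such as "=== TWITTER ===".
_MARKER = re.compile(r"^===[ \t]*([A-Za-z_]+)[ \t]*===[ \t]*$", re.MULTILINE)


def split_sections(response: str, enabled: Iterable[str]) -> Dict[str, str]:
    """Split a marked-up response into {channel: text}, keeping only enabled channels."""
    enabled_set = {name.lower() for name in enabled}
    matches = list(_MARKER.finditer(response))
    sections: Dict[str, str] = {}
    for index, match in enumerate(matches):
        name = match.group(1).lower()
        # A section runs from the end of its marker to the start of the next marker.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(response)
        if name in enabled_set:
            sections[name] = response[match.end():end].strip()
    # Emit in canonical order regardless of the order the model wrote them.
    return {ch: sections[ch] for ch in SOCIAL_CHANNEL_ORDER if ch in sections}
```

Filtering by the enabled set at parse time means a stray section for a disabled channel is dropped rather than published.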

Engineering Choices

Why pre-fetch context instead of letting the agent call tools?

For scheduled pipeline types (not interactive chat), the data requirements are fully predictable. Pre-fetching in parallel has two major advantages: total wall time equals the slowest single call (not sum of all calls), and the LLM receives a complete context block rather than making tool decisions under generation pressure. The agentic loop still fires for unexpected data needs, but the pre-fetch layer covers 90%+ of what each pipeline type needs. The PIPELINE_PREFETCH config makes it explicit and testable — a new engineer can see exactly what data flows into each pipeline type before reading any LLM code.

How does the KPI gate prevent publishing on bad data days?

should_gate_for_fact_pack() inspects the pre-fetched KPI fact pack before any LLM call. Required conditions: crowd_score is present and numeric, top_10_waits has at least one entry, and operational_date matches today (Eastern Time). If any condition fails, the pipeline logs a gate reason (insufficient_kpi_data, aws_auth_failure, generation_failed) and skips generation entirely. This prevents publishing content built from stale or partial data — which would be worse than publishing nothing. The gate reason is auditable in the pipeline run log.
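
The three gate conditions can be sketched as a single check. The function name comes from the text above, but the signature, the `(should_gate, reason)` return shape, and the assumption that `operational_date` is an ISO `YYYY-MM-DD` string are illustrative guesses, not the project's actual contract.

```python
from datetime import date, datetime
from numbers import Real
from typing import Optional, Tuple
from zoneinfo import ZoneInfo


def should_gate_for_fact_pack(
    fact_pack: dict, today: Optional[date] = None
) -> Tuple[bool, Optional[str]]:
    """Return (True, reason) when generation should be skipped, else (False, None)."""
    if today is None:
        # "Today" is anchored to Eastern Time, as described above.
        today = datetime.now(ZoneInfo("America/New_York")).date()

    crowd = fact_pack.get("crowd_score")
    # bool is a subclass of int, so exclude it explicitly.
    crowd_ok = isinstance(crowd, Real) and not isinstance(crowd, bool)
    waits_ok = bool(fact_pack.get("top_10_waits"))
    # Assumes operational_date is stored as an ISO date string.
    date_ok = fact_pack.get("operational_date") == today.isoformat()

    if crowd_ok and waits_ok and date_ok:
        return False, None
    return True, "insufficient_kpi_data"
```

Accepting `today` as a parameter keeps the check deterministic in tests while defaulting to the Eastern Time date in production runs.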

Why embed entity names into knowledge query strings rather than just type filters?

PostgreSQL BM25-style ranking (and the previous Cosmos query layer) scores documents by text relevance. If you query knowledge_types=["popularity_ranking"] with a generic question, the text gives the ranker little to separate one attraction from another, and it returns low-priority attractions written last in the upsert cycle. Headliner rides (TRON, Rise of the Resistance, Seven Dwarfs Mine Train) that were upserted earlier score lower on recency-based metrics. By embedding headliner names directly into the query string ("TRON Rise Resistance Seven Dwarfs Mine Train Guardians Cosmic Rewind popularity ranking"), the text matching layer surfaces the right rides first regardless of upsert order. This was discovered through operational observation: bulletin content was featuring Remy's Ratatouille Adventure over Space Mountain because Remy was written later and scored higher on recency.

Why switch from boto3 Bedrock Agent runtime to LangChain ChatBedrock?

The Bedrock Agent runtime (RETURN_CONTROL loop) serialized tool calls: the agent emitted one tool call, waited for the response, then decided whether to call another. This meant a 3-tool bulletin build took 3 sequential 45-second HTTP calls = 135 seconds. LangChain's ChatBedrock calls bedrock-runtime:InvokeModel directly — we own the agentic loop. When Claude emits 5 tool calls in a single turn, we fire all 5 in parallel via ThreadPoolExecutor and collect results. The same bulletin build now takes ~45 seconds (cost of the slowest tool). The secondary benefit is output format control: with the Bedrock Agent, getting structured section-marked output reliably required retries. At temperature=0 with explicit format instructions in the system prompt, Claude's first response is always correctly formatted.

How does same-day-of-week dining data work?

Restaurant wait patterns are strongly day-of-week dependent (Magic Kingdom Saturdays vs. Tuesdays have completely different dining pressure). The structured knowledge query "top 10 restaurants by wait time date={last_same_dow}" resolves {last_same_dow} to the prior week's same day-of-week (e.g., for a Thursday bulletin, it fetches last Thursday's restaurant data). This gives the LLM a meaningful historical dining baseline that reflects the same crowd pattern as today — much more accurate than a trailing 7-day average, which blends weekday and weekend patterns.
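
Resolving {last_same_dow} amounts to subtracting seven days from the Eastern Time date. A minimal sketch follows; `build_date_context` and `resolve_template` are hypothetical helper names, and the ISO date format for the substituted values is an assumption.

```python
from datetime import date, datetime, timedelta
from typing import Dict, Optional
from zoneinfo import ZoneInfo


def build_date_context(today: Optional[date] = None) -> Dict[str, str]:
    """Build template variables from an ET-anchored date (ISO format is assumed)."""
    if today is None:
        today = datetime.now(ZoneInfo("America/New_York")).date()
    return {
        "date": today.isoformat(),
        # Same weekday, one week earlier.
        "last_same_dow": (today - timedelta(days=7)).isoformat(),
    }


def resolve_template(template: str, context: Dict[str, str]) -> str:
    """Fill {date} / {last_same_dow} placeholders in a query string."""
    return template.format(**context)


# Usage: a Thursday bulletin resolves to the previous Thursday.
ctx = build_date_context(date(2024, 5, 2))
query = resolve_template("top 10 restaurants by wait time date={last_same_dow}", ctx)
# query == "top 10 restaurants by wait time date=2024-04-25"
```

Anchoring the date to Eastern Time before subtracting avoids an off-by-one day when the job runs on a host whose clock is in UTC.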
