Personal Production Platform · Solo-Built · Azure · GCP · AWS

Park Whisperer
A Full-Stack AI Platform, Built Solo

The Park Whisperer is a personal production project — a complete AI platform for Walt Disney World intelligence, built and operated alone across three cloud providers. It started as a simple data collector in 2023 and has grown into a system that ingests satellite weather data, trains ML models, runs a multi-agent knowledge RAG, generates and publishes daily content to three platforms, and produces short-form video with no human involvement in the workflow. Every component here has a case study.

Case studies

Cloud providers

~10 min

Shortest refresh cadence

<$100

Monthly operating cost

286

Entities tracked

24/7

Production uptime

Live in Production

The platform is running now at park-whisperer.com

All components described below are deployed and active. The chat agent, real-time ride data, live weather JSON, and daily content pipelines are operational. You can explore the live site directly.

Park Agent Chat ↗ Live Ride Waits ↗ Live Weather ↗ AI-Generated Blog ↗ LL Sellout Intel ↗ KPI Dashboard ↗

System Architecture

Every Layer of the Platform

Each box is a real, deployed component. Colors indicate the cloud layer. Arrows show the primary data flow from raw sources through intelligence outputs.

External Data Sources

ThemeParks.wiki API · 286 WDW entities

NOAA GOES-18 GLM · Lightning (L2 LCFA)

FAA METAR · 17 FL stations · aviationweather.gov

NWS / SPC APIs · Hourly forecasts · Alerts

UWyo Radiosonde · CAPE / LI / K-Index

↓

Ingestion Layer · every 5–12 min

park-data-ingest (Azure Container Apps)

lightning-watcher-optimized (GCP Cloud Run)

weather-updater-v6 (GCP Cloud Run)

weather-updater-atmos (12hr · GCP Cloud Run)

outage-tracker (Azure Container Apps · 5min)

↓

Storage Layer

PostgreSQL Flexible Server (Azure · park ops + RAG + pgvector)

BigQuery (GCP · weather_updater · tbl_lightning_whisperer · ml_training_features)

Azure Blob Storage (video assets · content archive)

GCS (ML models · weather JSON API · insights)

↓

ML & Intelligence Layer · every 20–30 min

weather-ml-trainer (on-demand · 6 scikit-learn models)

weather-ml-predictor (20min · nowcast + spatial boost)

weather-insights-generator (20min · risk narratives)

weather-json-publisher (15min · storm state machine)

↓

Knowledge & RAG Layer · Azure Functions

Knowledge ETL pipeline (1.6M+ records → pgvector HNSW)

Phi-3 Mini SLM classifier (Container Apps · query intent routing)

queryIntelligence function (hybrid BM25 + vector search · 34+ knowledge types)

6 Azure Function endpoints (intelligence · forecast · ride-impact · weather)

↓

Agent & Content Generation Layer · scheduled pipelines

SLM content pipeline (10 types · LangChain + Claude via Bedrock)

Multi-model content gen (Phi-3 classify → Haiku tools → Sonnet output)

Conversational park agent (Azure Functions · shared tool library)

park-pipeline-worker (Azure Container App · blog publisher)

↓

Video Production Layer · per publish run

ElevenLabs TTS (per-character timestamps)

Caption generator (word-sync · semantic color coding)

Pillow brand overlay (v2 branding · parallax depth)

FFmpeg encoder (CRF 18 · H.264)

park-video-worker (Azure Container App Job)

↓

Outputs · published or served

Instagram · TikTok · YouTube Shorts (Reels auto-published)

Blog posts (park intelligence articles)

Weather JSON API (public GCS · 5-min refresh · storm state)

Park Agent Chat (Azure Static Web App · natural language Q&A)

Component Index

Every deployed service — with its case study

Service	Cloud	Cadence	What it does	Case Study
park-data-ingest	Azure	10 min	ThemeParks.wiki API → PostgreSQL. 286 entities, wait times, schedules, LL sellout events.	Park Data Ingest ↗
lightning-watcher-optimized	GCP	10 min	GOES-18 GLM NetCDF → BigQuery. Spatial filter, haversine park attribution, dedup insert.	Lightning Tracker ↗
weather-updater-v6	GCP	10 min	METAR ingestion from 17 FL stations → BigQuery weather_updater_v3. Present weather decode, derived fields.	Weather ML ↗
weather-updater-atmos	GCP	12 hr	Tampa Bay radiosonde (Station 72210) → BigQuery. CAPE, CIN, Lifted Index, K-Index, PWAT.	Weather ML ↗
weather-ml-trainer	GCP	on-demand	Trains 6 scikit-learn models on 44K+ observations. Serializes to GCS pkl files.	Weather ML ↗
weather-ml-predictor	GCP	20 min	Loads models from GCS, spatial ST_DISTANCE query, applies boost, publishes nowcast JSON.	Weather ML ↗
weather-json-publisher	GCP	15 min	Fuses GLM + METAR + NWS + SPC into 5-state storm machine. Publishes live.json to GCS.	Lightning Tracker ↗
outage-tracker	Azure	5 min	Correlates live ride DOWN events with weather risk scores. Opens/closes outage log entries.	Lightning Tracker ↗
Knowledge ETL + pgvector	Azure	scheduled	Builds 1.6M+ record knowledge base. OpenAI embeddings → PostgreSQL HNSW index.	Knowledge RAG ↗
queryIntelligence (Azure Function)	Azure	on-demand	Phi-3 Mini classifies intent → routes to 3 query strategies (vector, numeric, hybrid). 34+ knowledge types.	Knowledge RAG ↗
SLM content pipeline	Azure AWS	scheduled	LangChain agent + Claude on Bedrock. 10 pipeline types, 30+ tools, 3-pass generation, zero hallucinated data.	SLM Pipeline ↗
park-video-worker	Azure	per publish	ElevenLabs TTS → word-sync captions → Pillow brand overlay → FFmpeg encode → publish.	AI Video Team ↗
Park Agent Chat (SWA)	Azure	on-demand	Natural language Q&A via Azure Static Web App proxy. Shares function + tool library with content pipelines.	Park Agent ↗

Why It's Built This Way

Key architectural decisions across the platform

Single Azure Function serves agent + content pipelines

The conversational park agent and all 10 scheduled content pipelines share the same Azure Function, SLM classifier, and tool library. There's no forked code between interactive and batch use cases. The query routing logic was designed to work equally well whether triggered by a user question or a pipeline cron event.

Park Agent Chat ↗

PostgreSQL over Cosmos DB for RAG and ops data

The platform migrated to Cosmos DB at 5.9M documents and migrated back after discovering the NoSQL round-trip was adding $394/month with no benefit. PostgreSQL with pgvector handles both the vector search workload and the operational data — fixed cost, no per-query billing, simpler architecture.

Data Pipeline Evolution ↗

GOES-18 satellite over commercial lightning networks

Commercial ground-based lightning detection is licensed data with per-strike pricing. NOAA GOES-18 GLM data is free, 24/7, and covers Central Florida with optimal nadir accuracy. Pre-computing park attribution at ingest time means all downstream consumers get park-specific flash counts with a simple WHERE clause — no runtime spatial math.

Lightning Tracker ↗

Multi-cloud by workload fit, not by preference

Azure hosts the operational data (PostgreSQL), agent functions, and content pipelines because the function platform and Container Apps cost model is well-suited to event-driven workloads. GCP hosts the weather ML because BigQuery's SQL-native ML and geospatial functions (ST_DISTANCE) make the spatial storm tracking query trivial. AWS Bedrock provides Claude access for content generation. Each cloud does what it's best at.

About ↗

Word-sync video captions from per-character TTS timestamps

ElevenLabs' convert_with_timestamps returns per-character timing that the caption generator uses to compute exact word boundaries. This drives frame-accurate word-pop captions where each word highlights as it's spoken — numbers purple, ride names mint, status coral. The alternative (WebVTT or forced-alignment) would require a second AI call or external tool.

AI Video Team ↗

Storm state machine requires confirming signals, not single-source triggers

The CLEAR → SEVERE state machine fuses GLM flash data, METAR flags, NWS alerts, storm tracking vectors, NWS hourly probabilities, and SPC categorical outlook. No single source triggers a state change alone — each has failure modes (stale soundings, automation errors, alert latency) that would produce false alerts if used in isolation.

Lightning Tracker ↗

Park WhispererA Full-Stack AI Platform, Built Solo

Every Layer of the Platform

Every deployed service — with its case study

Key architectural decisions across the platform

Every Park Whisperer component, in depth

Park Whisperer
A Full-Stack AI Platform, Built Solo