Personal Production Platform · Solo-Built · Azure · GCP · AWS

Park Whisperer
A Full-Stack AI Platform, Built Solo

The Park Whisperer is a personal production project — a complete AI platform for Walt Disney World intelligence, built and operated alone across three cloud providers. It started as a simple data collector in 2023 and has grown into a system that ingests satellite weather data, trains ML models, runs a multi-agent knowledge RAG, generates and publishes daily content to three platforms, and produces short-form video with no human involvement in the workflow. Every component here has a case study.

10
Case studies
3
Cloud providers
~10 min
Shortest refresh cadence
<$100
Monthly operating cost
286
Entities tracked
24/7
Production uptime
Live in Production
The platform is running now at park-whisperer.com
All components described below are deployed and active. The chat agent, real-time ride data, live weather JSON, and daily content pipelines are operational. You can explore the live site directly.

Every Layer of the Platform

Each box is a real, deployed component. Colors indicate the cloud layer. Arrows show the primary data flow from raw sources through intelligence outputs.

External Data Sources
ThemeParks.wiki API · 286 WDW entities
NOAA GOES-18 GLM · Lightning (L2 LCFA)
FAA METAR · 17 FL stations · aviationweather.gov
NWS / SPC APIs · Hourly forecasts · Alerts
UWyo Radiosonde · CAPE / LI / K-Index
Ingestion Layer · every 5–12 min
park-data-ingest (Azure Container Apps)
lightning-watcher-optimized (GCP Cloud Run)
weather-updater-v6 (GCP Cloud Run)
weather-updater-atmos (12hr · GCP Cloud Run)
outage-tracker (Azure Container Apps · 5min)
Storage Layer
PostgreSQL Flexible Server (Azure · park ops + RAG + pgvector)
BigQuery (GCP · weather_updater · tbl_lightning_whisperer · ml_training_features)
Azure Blob Storage (video assets · content archive)
GCS (ML models · weather JSON API · insights)
ML & Intelligence Layer · every 20–30 min
weather-ml-trainer (on-demand · 6 scikit-learn models)
weather-ml-predictor (20min · nowcast + spatial boost)
weather-insights-generator (20min · risk narratives)
weather-json-publisher (15min · storm state machine)
Knowledge & RAG Layer · Azure Functions
Knowledge ETL pipeline (1.6M+ records → pgvector HNSW)
Phi-3 Mini SLM classifier (Container Apps · query intent routing)
queryIntelligence function (hybrid BM25 + vector search · 34+ knowledge types)
6 Azure Function endpoints (intelligence · forecast · ride-impact · weather)
Agent & Content Generation Layer · scheduled pipelines
SLM content pipeline (10 types · LangChain + Claude via Bedrock)
Multi-model content gen (Phi-3 classify → Haiku tools → Sonnet output)
Conversational park agent (Azure Functions · shared tool library)
park-pipeline-worker (Azure Container App · blog publisher)
Video Production Layer · per publish run
ElevenLabs TTS (per-character timestamps)
Caption generator (word-sync · semantic color coding)
Pillow brand overlay (v2 branding · parallax depth)
FFmpeg encoder (CRF 18 · H.264)
park-video-worker (Azure Container App Job)
Outputs · published or served
Instagram · TikTok · YouTube Shorts (Reels auto-published)
Blog posts (park intelligence articles)
Weather JSON API (public GCS · 5-min refresh · storm state)
Park Agent Chat (Azure Static Web App · natural language Q&A)

Every deployed service — with its case study

Service Cloud Cadence What it does Case Study
park-data-ingest
Azure 10 min ThemeParks.wiki API → PostgreSQL. 286 entities, wait times, schedules, LL sellout events. Park Data Ingest ↗
lightning-watcher-optimized
GCP 10 min GOES-18 GLM NetCDF → BigQuery. Spatial filter, haversine park attribution, dedup insert. Lightning Tracker ↗
weather-updater-v6
GCP 10 min METAR ingestion from 17 FL stations → BigQuery weather_updater_v3. Present weather decode, derived fields. Weather ML ↗
weather-updater-atmos
GCP 12 hr Tampa Bay radiosonde (Station 72210) → BigQuery. CAPE, CIN, Lifted Index, K-Index, PWAT. Weather ML ↗
weather-ml-trainer
GCP on-demand Trains 6 scikit-learn models on 44K+ observations. Serializes to GCS pkl files. Weather ML ↗
weather-ml-predictor
GCP 20 min Loads models from GCS, spatial ST_DISTANCE query, applies boost, publishes nowcast JSON. Weather ML ↗
weather-json-publisher
GCP 15 min Fuses GLM + METAR + NWS + SPC into 5-state storm machine. Publishes live.json to GCS. Lightning Tracker ↗
outage-tracker
Azure 5 min Correlates live ride DOWN events with weather risk scores. Opens/closes outage log entries. Lightning Tracker ↗
Knowledge ETL + pgvector
Azure scheduled Builds 1.6M+ record knowledge base. OpenAI embeddings → PostgreSQL HNSW index. Knowledge RAG ↗
queryIntelligence (Azure Function)
Azure on-demand Phi-3 Mini classifies intent → routes to 3 query strategies (vector, numeric, hybrid). 34+ knowledge types. Knowledge RAG ↗
SLM content pipeline
Azure AWS scheduled LangChain agent + Claude on Bedrock. 10 pipeline types, 30+ tools, 3-pass generation, zero hallucinated data. SLM Pipeline ↗
park-video-worker
Azure per publish ElevenLabs TTS → word-sync captions → Pillow brand overlay → FFmpeg encode → publish. AI Video Team ↗
Park Agent Chat (SWA)
Azure on-demand Natural language Q&A via Azure Static Web App proxy. Shares function + tool library with content pipelines. Park Agent ↗

Key architectural decisions across the platform

Single Azure Function serves agent + content pipelines
The conversational park agent and all 10 scheduled content pipelines share the same Azure Function, SLM classifier, and tool library. There's no forked code between interactive and batch use cases. The query routing logic was designed to work equally well whether triggered by a user question or a pipeline cron event.
PostgreSQL over Cosmos DB for RAG and ops data
The platform migrated to Cosmos DB at 5.9M documents and migrated back after discovering the NoSQL round-trip was adding $394/month with no benefit. PostgreSQL with pgvector handles both the vector search workload and the operational data — fixed cost, no per-query billing, simpler architecture.
GOES-18 satellite over commercial lightning networks
Commercial ground-based lightning detection is licensed data with per-strike pricing. NOAA GOES-18 GLM data is free, 24/7, and covers Central Florida with optimal nadir accuracy. Pre-computing park attribution at ingest time means all downstream consumers get park-specific flash counts with a simple WHERE clause — no runtime spatial math.
Multi-cloud by workload fit, not by preference
Azure hosts the operational data (PostgreSQL), agent functions, and content pipelines because the function platform and Container Apps cost model is well-suited to event-driven workloads. GCP hosts the weather ML because BigQuery's SQL-native ML and geospatial functions (ST_DISTANCE) make the spatial storm tracking query trivial. AWS Bedrock provides Claude access for content generation. Each cloud does what it's best at.
Word-sync video captions from per-character TTS timestamps
ElevenLabs' convert_with_timestamps returns per-character timing that the caption generator uses to compute exact word boundaries. This drives frame-accurate word-pop captions where each word highlights as it's spoken — numbers purple, ride names mint, status coral. The alternative (WebVTT or forced-alignment) would require a second AI call or external tool.
Storm state machine requires confirming signals, not single-source triggers
The CLEAR → SEVERE state machine fuses GLM flash data, METAR flags, NWS alerts, storm tracking vectors, NWS hourly probabilities, and SPC categorical outlook. No single source triggers a state change alone — each has failure modes (stale soundings, automation errors, alert latency) that would produce false alerts if used in isolation.
Explore All Case Studies

Every Park Whisperer component, in depth

All Case Studies ↗ About ↗
Park Agent Chat ↗ Knowledge RAG ↗ Weather ML Pipeline ↗ Lightning Tracker ↗ SLM Content Pipeline ↗ AI Video Team ↗ Park Data Ingest ↗ Data Pipeline Evolution ↗ Weather Whisperer ↗ Multi-Model Content ↗ Ride Whisperer v6 ↗ Ride Whisperer (Prophet) ↗