The Weather Whisperer ML pipeline ingests METAR aviation weather reports from 17 Central Florida stations, fuses them with upper-air atmospheric soundings (CAPE, Lifted Index, K-Index), and produces six trained models that generate 2-hour thunderstorm, precipitation, wind, fog, and venue-impact nowcasts every 20 minutes — purpose-built to tell park visitors what weather means for the rides they're about to board.
The intelligence pipeline transforms raw FAA METAR observations into park-actionable predictions through four distinct processing stages, each running as an independent GCP Cloud Run Job on its own cadence.
TS, VCTS, LTG for thunderstorms; BR/FG for fog), computes derived fields (dewpoint depression, precip rolling windows), and upserts to BigQuery weather_updater_v3. Also flags adverse events: adverse_thunderstorm, adverse_high_wind_event, adverse_fog_mist.

ml_training_features: a BigQuery view that assembles hourly feature vectors from 8+ years of historical METAR and atmospheric data.

Trains 6 independent scikit-learn models and serializes each to GCS as a pickled .pkl file. Training runs on-demand after data accumulation milestones or when model drift is detected. The thunderstorm model uses all 7 atmospheric indices plus METAR trends; the fog model uses only 6 surface-level features (atmospheric soundings don't add predictive value for radiation fog).

All models use Random Forest classifiers with class-weight balancing to handle the natural imbalance of severe weather events (thunderstorms occur on roughly 15–20% of summer afternoons). ROC AUC scores reflect test set evaluation on a 20% holdout using stratified splitting.
| Model | Target | ROC AUC | Accuracy | Key Feature Groups |
|---|---|---|---|---|
| thunderstorm_nowcast (RandomForest · n=150 · depth=12 · balanced) | target_thunderstorm | 72% | | CAPE, Lifted Index, K-Index, Total Totals, PWAT, pressure change, dewpoint depression |
| precipitation_prediction (RandomForest · balanced) | target_precipitation | 99% | | Precip rolling windows, dewpoint depression, visibility, hourly precip |
| high_wind_prediction (RandomForest · balanced) | target_high_wind (≥25 kt gusts) | 100% | | Wind speed, wind gust change 1h, pressure change, temperature change 3h |
| fog_prediction (RandomForest · n=100 · depth=8 · balanced) | target_fog | 92% | | Dewpoint depression, visibility, wind speed, hour of day (radiation fog window) |
| venue_impact_prediction (RandomForest · KISM primary features) | Operational impact at WDW | 81% | | KISM-specific features (nearest WDW station), multi-hour precip, atmospheric indices |
| high_impact_venue_prediction (RandomForest · high-impact threshold) | High-severity operational impact | 95% | | KISM + CAPE thresholds, historical high-impact event labels |
The fog model intentionally uses no atmospheric sounding data. Surface METAR features (dewpoint depression <2°F, wind <5 kt, visibility trending down) proved more predictive for Florida radiation fog than upper-air instability indices, which describe convective initiation rather than surface-based fog formation.
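The training setup described above (Random Forest, balanced class weights, stratified 20% holdout, pickle serialization) can be sketched roughly as follows. This is a minimal illustration rather than the pipeline's actual code: the synthetic data from `make_classification` stands in for the ml_training_features view, the feature count and random seeds are arbitrary, and the upload to GCS is omitted. The `n_estimators=150` and `max_depth=12` values mirror the thunderstorm_nowcast row, on the assumption that "n" and "depth" refer to those parameters.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real feature table (assumption): ~7% positives.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4, n_redundant=2,
    weights=[0.93, 0.07], random_state=0,
)

# 20% holdout; stratify=y keeps the rare positive class proportionally
# represented in both the training and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# class_weight="balanced" up-weights the minority class during training.
model = RandomForestClassifier(
    n_estimators=150, max_depth=12, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

# ROC AUC is computed from predicted probabilities of the positive class.
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Serialize the fitted model; a real job would write these bytes to GCS.
model_blob = pickle.dumps(model)
print(f"holdout ROC AUC: {test_auc:.3f}, pickle size: {len(model_blob)} bytes")
```

Scoring on predicted probabilities rather than hard labels matters here because ROC AUC measures ranking quality across all thresholds, which is more informative than accuracy when positives are rare.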
The thunderstorm nowcast model uses 17 input features across four categories. The atmospheric sounding features (CAPE, LI, K-Index) consistently rank as the top predictors — surface METAR alone cannot capture the pre-storm thermodynamic instability that drives Florida convection.
cape_j_kg, cin_j_kg, lifted_index, k_index, total_totals_index, showalter_index, pwat_inches, avg_temp_f, dewpoint_depression, avg_pressure_hpa, pressure_change_1h, pressure_change_3h, total_precip_in, avg_wind_speed_kt, max_wind_gust_kt, nearby_tstm_count, nearby_precip_count, upwind_tstm_count, hour_of_day, temp_change_1h, temp_change_3h

hour_of_day captures this pattern: storm probability peaks sharply in afternoon hours.

Pure single-station METAR features miss a critical signal: a station's atmosphere can still be clear while a storm 50 miles away is confirmed active and propagating toward it. The spatial boost uses BigQuery's geography functions to measure real-time storm proximity across the entire 17-station network and inject that awareness into the model output.
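The storm-proximity idea behind the spatial boost can be illustrated with a small sketch. The pipeline itself relies on BigQuery's geography functions; the Python below is only a stand-in that uses a haversine distance instead. The station coordinates are approximate, the choice of KMCO and KTPA as example stations is an assumption, and the function name and 50-mile default radius are hypothetical.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in statute miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


# Approximate coordinates, for illustration only (assumption).
STATIONS = {
    "KISM": (28.29, -81.44),
    "KMCO": (28.43, -81.31),
    "KTPA": (27.98, -82.53),
}


def nearby_tstm_count(target, active_storm_stations, radius_mi=50.0):
    """Count other stations currently reporting a storm within radius_mi of target."""
    lat, lon = STATIONS[target]
    return sum(
        1
        for station_id in active_storm_stations
        if station_id != target
        and haversine_miles(lat, lon, *STATIONS[station_id]) <= radius_mi
    )


print(nearby_tstm_count("KISM", {"KMCO", "KTPA"}))
```

With these coordinates, KMCO falls well inside the 50-mile radius of KISM while KTPA falls outside it, so only one of the two active storms is counted.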
A multiplicative boost (probability × multiplier) scales with the base probability, so its effect is uneven across the range: a 60% base probability multiplied by 1.5 gives 90% (a 30-point jump), whereas a base of 5% gives only 7.5% (a 2.5-point increase). The effect is too variable depending on where in the probability range the model lands.
An additive offset applies a fixed increment regardless of starting point, which more faithfully represents the idea that "a confirmed storm 50 miles away adds approximately X% to your probability of seeing one in the next 2 hours." The hard clamps at 0.6 maximum boost and 0.95 maximum final probability prevent overflow and preserve model calibration at the extremes.
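The difference between the two approaches, and the effect of the clamps, can be seen in a short sketch. The 0.6 and 0.95 limits come from the text above; the function names and the 0.15 example boost are hypothetical, and the actual boost calculation may differ.

```python
MAX_BOOST = 0.6        # cap on the additive boost itself
MAX_FINAL_PROB = 0.95  # cap on the boosted probability


def apply_additive_boost(base_prob, boost):
    """Add a clamped boost to the model probability, then clamp the result."""
    boost = min(max(boost, 0.0), MAX_BOOST)
    return min(max(base_prob + boost, 0.0), MAX_FINAL_PROB)


def apply_multiplicative_boost(base_prob, multiplier):
    """Rejected alternative, shown for comparison only."""
    return min(base_prob * multiplier, 1.0)


for base in (0.05, 0.60, 0.90):
    additive = apply_additive_boost(base, 0.15)
    multiplicative = apply_multiplicative_boost(base, 1.5)
    print(f"base={base:.2f} additive={additive:.3f} multiplicative={multiplicative:.3f}")
```

The additive version moves the 0.05 and 0.60 bases by the same 0.15 increment, while the multiplicative version moves them by very different amounts. The 0.90 base shows the final clamp: the additive result stops at 0.95 rather than exceeding 1.0.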
Tampa Bay atmospheric soundings are collected twice daily (00Z and 12Z) and have a 25-hour valid window. A sounding from the previous morning can show high CAPE values that are accurate for that time but stale by noon the next day, when skies are clear and sea-breeze dynamics have not yet initialized.
Without a suppressor, the BigQuery views would emit HIGH thunderstorm risk scores on clear sunny mornings because the cached CAPE value from 18 hours prior is still loaded. The clear-sky suppressor in v_weather_current_enhanced gates CAPE and Lifted Index contributions: if the current METAR shows no present weather, visibility ≥5 miles, and no precipitation in the last 3 hours, atmospheric contributions are zeroed out. This prevents false HIGH alerts on clear mornings while preserving them when surface conditions corroborate the upper-air instability.
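The gating logic of the clear-sky suppressor can be sketched in a few lines. The real rule lives in the v_weather_current_enhanced BigQuery view; the Python below only mirrors the three conditions named above. The `SurfaceObs` type, its field names, and the numeric score values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SurfaceObs:
    present_weather: str      # METAR present-weather codes, "" when none reported
    visibility_mi: float      # statute miles
    precip_last_3h_in: float  # inches of precipitation over the last 3 hours


def is_clear_sky(obs: SurfaceObs) -> bool:
    """True when all three clear-sky conditions from the text hold."""
    return (
        not obs.present_weather.strip()
        and obs.visibility_mi >= 5.0
        and obs.precip_last_3h_in == 0.0
    )


def gated_atmospheric_score(cape_score: float, li_score: float, obs: SurfaceObs) -> float:
    """Zero out CAPE and Lifted Index contributions under clear skies."""
    if is_clear_sky(obs):
        return 0.0
    return cape_score + li_score


clear_morning = SurfaceObs(present_weather="", visibility_mi=10.0, precip_last_3h_in=0.0)
storm_nearby = SurfaceObs(present_weather="VCTS", visibility_mi=6.0, precip_last_3h_in=0.0)

print(gated_atmospheric_score(3.0, 2.0, clear_morning))
print(gated_atmospheric_score(3.0, 2.0, storm_nearby))
```

Requiring all three conditions together keeps the suppressor conservative: any single sign of active weather at the surface is enough to let the sounding-derived scores through.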
The morning dewpoint suppression is a companion rule: the dewpoint depression risk score contribution is active only between 11:00 and 21:00 local time, since high dewpoint readings at 06:00 indicate overnight humidity that will dry out with morning solar heating, not convective risk.
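A time-of-day gate like this is simple to express. The sketch below assumes both ends of the 11:00–21:00 window are inclusive, which the text does not specify; the function name and score value are hypothetical.

```python
ACTIVE_START_HOUR = 11  # local time, assumed inclusive
ACTIVE_END_HOUR = 21    # local time, assumed inclusive


def gated_dewpoint_score(dewpoint_score: float, local_hour: int) -> float:
    """Return the dewpoint depression contribution only inside the active window."""
    if ACTIVE_START_HOUR <= local_hour <= ACTIVE_END_HOUR:
        return dewpoint_score
    return 0.0


for hour in (6, 11, 15, 22):
    print(hour, gated_dewpoint_score(2.0, hour))
```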
Thunderstorm events at any given FAA station occur on roughly 70–80 days per year in Central Florida, the highest-frequency severe weather environment in the continental US. But the class imbalance is still significant: hourly observations are collected roughly 24 × 365 = 8,760 times per year, of which perhaps 500–700 contain a thunderstorm flag. That's a ~6–8% positive class rate.
With one year of data this would produce approximately 500–700 positive samples for training — sufficient for a basic model but too thin for reliable probability calibration, especially for the multi-feature atmospheric interactions that drive convection. Eight years gives approximately 4,000–5,600 positive samples, enough to reliably learn the atmospheric state patterns that precede Florida afternoon thunderstorms (specifically: high CAPE + low CIN + elevated K-Index + afternoon local time + sea-breeze convergence signals in the surface observations).
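The sample-size arithmetic above can be checked directly. The inputs are the estimates quoted in the text (500–700 flagged hours per year, 8 years of history); nothing here is measured data.

```python
HOURS_PER_YEAR = 24 * 365             # hourly observations per station-year
POSITIVE_HOURS_PER_YEAR = (500, 700)  # low and high estimates from the text
YEARS = 8

# Share of hourly observations that carry a thunderstorm flag.
positive_rate = tuple(n / HOURS_PER_YEAR for n in POSITIVE_HOURS_PER_YEAR)

# Positive training samples accumulated over the full history.
positives_total = tuple(n * YEARS for n in POSITIVE_HOURS_PER_YEAR)

print(f"observations per year: {HOURS_PER_YEAR}")
print(f"positive rate: {positive_rate[0]:.1%} to {positive_rate[1]:.1%}")
print(f"positives over {YEARS} years: {positives_total[0]} to {positives_total[1]}")
```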
The class_weight='balanced' parameter also compensates by up-weighting the minority (thunderstorm) class during training, but this only helps if there are enough minority samples for the tree structure to learn meaningful splits. Eight years of data provides that depth.
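To make the up-weighting concrete, the sketch below reproduces the formula scikit-learn documents for `class_weight='balanced'`, namely n_samples / (n_classes × class count), in plain Python. The class counts are illustrative: 8 years of hourly observations with 4,800 positives, a hypothetical midpoint of the 4,000–5,600 range given above.

```python
def balanced_class_weights(class_counts):
    """Weights per class: n_samples / (n_classes * count_for_class)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


total_hours = 8 * 24 * 365  # 70,080 hourly observations
positives = 4800            # hypothetical midpoint, not a measured count
counts = {0: total_hours - positives, 1: positives}

weights = balanced_class_weights(counts)
print(weights)
print(f"minority/majority weight ratio: {weights[1] / weights[0]:.1f}")
```

The weight ratio equals the inverse of the class frequency ratio, so each thunderstorm hour counts as much as roughly a dozen non-storm hours in this example. Weighting changes how much each sample matters, not how many distinct storm situations the trees get to see, which is why the amount of history still matters.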
The venue_impact_prediction and high_impact_venue_prediction models are KISM-specific — KISM (Kissimmee Gateway Airport, 3.5 miles from Magic Kingdom) is the most spatially relevant FAA station for WDW operations. The venue impact target is derived from historical correlation between KISM weather events and observed ride closure rates in the park operations database.
A thunderstorm at KISM doesn't automatically mean ride closures. The correlation depends on storm intensity (CAPE-derived), proximity (KISM vs. peripheral stations), and which rides are operating (outdoor coasters close at different thresholds than indoor dark rides). The venue impact models learn this relationship directly from the historical data rather than encoding fixed rules, producing a probability output that reflects the empirical operational impact of past weather events on the same property.
Coverage spans a ~100-mile radius around WDW. KISM is designated the primary venue-impact station. Multi-station spatial queries are the foundation of the storm tracking system.