Weather ML Intelligence Pipeline

Data Pipeline

From Raw Aviation Reports to Operational Predictions

The intelligence pipeline transforms raw FAA METAR observations into park-actionable predictions through four distinct processing stages, each running as an independent GCP Cloud Run Job on its own cadence.

Stage 1 · Every 10 min

METAR Ingestion — weather-updater-v6

Polls 17 Central Florida FAA stations via aviationweather.gov. Parses raw METAR strings: decodes present weather groups (TS, VCTS, LTG for thunderstorms; BR/FG for fog), computes derived fields (dewpoint depression, precip rolling windows), and upserts to BigQuery weather_updater_v3. Also flags adverse events: adverse_thunderstorm, adverse_high_wind_event, adverse_fog_mist.

aviationweather.gov API | BigQuery weather_updater_v3 | METAR present-weather decode
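The decode step described for Stage 1 can be illustrated with a minimal sketch. It handles only a simplified subset of METAR present-weather codes. The function name decode_metar, the regular expressions, and the sample reports are illustrative assumptions, not code or data from weather-updater-v6; only the flag names adverse_thunderstorm and adverse_fog_mist come from the pipeline description.

```python
import re

# Illustrative patterns only: a simplified subset of METAR present-weather groups.
TS_GROUP = re.compile(r"^[-+]?(VC)?TS[A-Z]*$")           # TS, VCTS, -TSRA, +TSRA
FOG_GROUP = re.compile(r"^(VC)?(MI|PR|BC|FZ)?(FG|BR)$")  # FG, BR, BCFG, MIFG
TEMP_GROUP = re.compile(r"^(M?\d{2})/(M?\d{2})$")        # 31/24 or M02/M05


def _celsius(field):
    """METAR encodes negative temperatures with an 'M' prefix."""
    return -int(field[1:]) if field.startswith("M") else int(field)


def decode_metar(raw):
    """Return adverse-weather flags and dewpoint depression in degrees F."""
    tokens = raw.split()
    body = tokens[:tokens.index("RMK")] if "RMK" in tokens else tokens
    groups = body[2:]  # skip the station identifier and the day/time group

    thunderstorm = (
        any(TS_GROUP.match(t) for t in groups)
        or any(t.startswith("LTG") for t in tokens)  # lightning noted in remarks
    )
    fog = any(FOG_GROUP.match(t) for t in groups)

    depression_f = None
    for t in groups:
        m = TEMP_GROUP.match(t)
        if m:
            temp_c, dewpoint_c = _celsius(m.group(1)), _celsius(m.group(2))
            depression_f = (temp_c - dewpoint_c) * 9 / 5  # C difference to F difference
            break

    return {
        "adverse_thunderstorm": thunderstorm,
        "adverse_fog_mist": fog,
        "dewpoint_depression_f": depression_f,
    }


# Made-up sample report, for illustration only.
sample = "KISM 121853Z 24008G18KT 5SM VCTS BR SCT035CB 31/24 A2992 RMK AO2 LTG DSNT W"
print(decode_metar(sample))
```

A production parser would also need to handle report prefixes such as AUTO or COR, missing groups, and the rolling precipitation windows, which this sketch leaves out.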

Stage 2 · Every 12 hrs

Atmospheric Soundings — weather-updater-atmos

Fetches upper-air radiosonde data from the University of Wyoming soundings archive for Tampa Bay station 72210. Parses CAPE (convective available potential energy), CIN (convective inhibition), Lifted Index, K-Index, Total Totals Index, Showalter Index, and PWAT (precipitable water). These 7 atmospheric parameters are the most predictive features for the thunderstorm model — they describe the thermodynamic state of the atmosphere above the surface, where convection is initiated.

UWyo soundings archive (Station 72210) | CAPE / CIN / Lifted Index | K-Index · Total Totals · PWAT

Stage 3 · On-demand

Model Training — weather-ml-trainer

Reads from ml_training_features — a BigQuery view that assembles hourly feature vectors from 8+ years of historical METAR and atmospheric data. Trains 6 independent scikit-learn models and serializes each to GCS as a pickled .pkl file. Training runs on-demand after data accumulation milestones or when model drift is detected. The thunderstorm model uses all 7 atmospheric indices plus METAR trends; the fog model uses only 6 surface-level features (atmospheric soundings don't add predictive value for radiation fog).

RandomForestClassifier · GradientBoostingRegressor | 44,000+ training samples | GCS pickled model storage

Stage 4 · Every 20 min

ML Predictions — weather-ml-predictor

Loads pickled models from GCS, queries current conditions + spatial storm context via BigQuery, and generates prediction payloads for all 17 stations. Applies a spatial boost adjustment to the thunderstorm model when nearby or upwind stations report active storms. Publishes per-station nowcast JSON and venue-impact JSON to public GCS. Total execution time: <30 seconds.

JSON to GCS (public, no auth) | Spatial boost via ST_DISTANCE | <30s execution, <$0.01/run

Six Trained Models

Model Performance at 44,000+ Observations

All models use Random Forest classifiers with class-weight balancing to handle the natural imbalance of severe weather events (thunderstorms occur on roughly 15–20% of summer afternoons). ROC AUC scores reflect test set evaluation on a 20% holdout using stratified splitting.

Model (configuration)	Target	ROC AUC	Accuracy	Key Feature Groups
thunderstorm_nowcast (RandomForest · n=150 · depth=12 · balanced)	`target_thunderstorm`	0.750	72%	CAPE, Lifted Index, K-Index, Total Totals, PWAT, pressure change, dewpoint depression
precipitation_prediction (RandomForest · balanced)	`target_precipitation`	0.990	99%	Precip rolling windows, dewpoint depression, visibility, hourly precip
high_wind_prediction (RandomForest · balanced)	`target_high_wind` (≥25 kt gusts)	1.000	100%	Wind speed, wind gust change 1h, pressure change, temperature change 3h
fog_prediction (RandomForest · n=100 · depth=8 · balanced)	`target_fog`	0.900	92%	Dewpoint depression, visibility, wind speed, hour of day (radiation fog window)
venue_impact_prediction (RandomForest · KISM primary features)	Operational impact at WDW	0.847	81%	KISM-specific features (nearest WDW station), multi-hour precip, atmospheric indices
high_impact_venue_prediction (RandomForest · high-impact threshold)	High-severity operational impact	0.950	95%	KISM + CAPE thresholds, historical high-impact event labels

The fog model intentionally uses no atmospheric sounding data. Surface METAR features (dewpoint depression <2°F, wind <5 kt, visibility trending down) proved more predictive for Florida radiation fog than upper-air instability indices, which describe convective initiation rather than surface-based fog formation.

Feature Engineering

What the Models Actually See

The thunderstorm nowcast model uses 17 input features across four categories. The atmospheric sounding features (CAPE, LI, K-Index) consistently rank as the top predictors — surface METAR alone cannot capture the pre-storm thermodynamic instability that drives Florida convection.

Atmospheric Soundings (Upper-Air)

cape_j_kg

Energy for convective initiation

cin_j_kg

Inhibition — lid suppressing storms

lifted_index

Parcel stability; <-2 = unstable

k_index

Moisture depth + lapse rate

total_totals_index

Thunderstorm composite index

showalter_index

Shallow convection threshold

pwat_inches

Total column water vapor

Surface METAR Observations

avg_temp_f

Surface temperature

dewpoint_depression

temp_f − dewpoint_f; <5°F = humid

avg_pressure_hpa

Station pressure (altimeter)

pressure_change_1h

Falling = approaching system

pressure_change_3h

Medium-term trend

total_precip_in

Hourly accumulation

avg_wind_speed_kt

Sustained wind

max_wind_gust_kt

Peak gust in window

Spatial Storm Context

nearby_tstm_count

Stations with TS within 100 mi

nearby_precip_count

Stations with precip within 100 mi

upwind_tstm_count

TS stations within 150 mi (all dirs)

Computed via BigQuery ST_DISTANCE CROSS JOIN across all 17 stations per prediction cycle
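The three spatial counts can be illustrated outside BigQuery with a plain-Python sketch. It swaps in a haversine great-circle distance as a stand-in for ST_DISTANCE, and it is not the pipeline's query. The coordinates are approximate, the station "FAR" is hypothetical, and the observation flags are made up. Following the literal description above, the 150-mile ring is treated as including stations that also fall inside the 100-mile ring; that overlap is an assumption.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MI = 3958.8


def distance_mi(a, b):
    """Great-circle (haversine) distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MI * asin(sqrt(h))


def spatial_context(target, coords, latest):
    """Count active storm/precip stations around `target`, excluding itself."""
    counts = {"nearby_tstm_count": 0, "nearby_precip_count": 0, "upwind_tstm_count": 0}
    for station, obs in latest.items():
        if station == target:
            continue
        d = distance_mi(coords[target], coords[station])
        if d <= 100:  # nearby ring
            counts["nearby_tstm_count"] += obs["ts"]
            counts["nearby_precip_count"] += obs["precip"]
        if d <= 150:  # upwind ring, all bearings (assumed to overlap the nearby ring)
            counts["upwind_tstm_count"] += obs["ts"]
    return counts


# Approximate coordinates; "FAR" is a hypothetical station about 130 mi south of KISM.
COORDS = {
    "KISM": (28.29, -81.44),
    "KMCO": (28.43, -81.31),
    "KTPA": (27.98, -82.53),
    "FAR": (26.40, -81.44),
}
# Made-up latest observation flags per station.
LATEST = {
    "KISM": {"ts": False, "precip": False},
    "KMCO": {"ts": True, "precip": True},
    "KTPA": {"ts": False, "precip": True},
    "FAR": {"ts": True, "precip": False},
}
print(spatial_context("KISM", COORDS, LATEST))
```

Running this for every station is the same work as the 17 × 17 cross join, just expressed as a loop.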

Temporal Context

hour_of_day

Critical for FL sea-breeze window

temp_change_1h

Rapid cooling = outflow boundary

temp_change_3h

Diurnal heating trend

Florida peak convective window: 15:00–20:00 UTC (11 AM–4 PM EDT). hour_of_day captures this pattern: storm probability peaks sharply in the afternoon hours.

Spatial Storm Tracking

How Nearby Storms Modify the Prediction

Pure single-station METAR features miss a critical signal: a station's atmosphere can still be clear while a storm 50 miles away is confirmed active and propagating toward it. The spatial boost uses BigQuery's geography functions to measure real-time storm proximity across the entire 17-station network and inject that awareness into the model output.

Two-Ring Spatial Context Around Each Station

100 mi — Nearby Ring

+20% boost per active TS station

Immediate threat vector
Any confirmed TS in window
Drives ACTIVE storm state

150 mi — Upwind Ring

+15% boost per active TS station

Approaching storm signal
Includes all bearing directions
Typical Florida sea-breeze reach

Cross-JOIN Architecture

BigQuery ST_DISTANCE

17 × 17 = 289 distance pairs
Computed fresh every 20 min
Filters to latest obs per station

spatial_boost = min(0.6, nearby_tstm × 0.20 + upwind_tstm × 0.15)
final_prob = min(0.95, model_prob + spatial_boost)

# Example: 1 nearby storm + 1 upwind storm
# spatial_boost = min(0.6, 0.20 + 0.15) = 0.35
# If base model says 9% → boosted to 44% (MODERATE risk level)
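The two formulas above translate directly into Python. This is a minimal sketch: the function names are illustrative, and the constants are the ones quoted in the formulas.

```python
def spatial_boost(nearby_tstm, upwind_tstm):
    """Additive boost from active storm stations, capped at 0.6."""
    return min(0.6, nearby_tstm * 0.20 + upwind_tstm * 0.15)


def final_probability(model_prob, nearby_tstm, upwind_tstm):
    """Model probability plus spatial boost, capped at 0.95."""
    return min(0.95, model_prob + spatial_boost(nearby_tstm, upwind_tstm))


# Worked example from the text: 9% base, 1 nearby storm, 1 upwind storm.
print(round(final_probability(0.09, 1, 1), 2))
# Both clamps engaged: 4 nearby storms on a 50% base.
print(spatial_boost(4, 0), final_probability(0.50, 4, 0))
```

With four nearby storms the raw boost would be 0.80, so the 0.6 cap applies first and the 0.95 cap then limits the final value.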

Why the boost is additive rather than multiplicative

A multiplicative boost (probability × multiplier) scales with the base probability, so its effect depends heavily on where in the probability range the model lands. A 60% base probability multiplied by 1.5 gives 90% (a 30-point jump), whereas a base of 5% gives only 7.5% (a 2.5-point nudge). The boost would therefore do least exactly when it matters most: when the local model sees little risk but a nearby storm is already confirmed.

An additive offset applies a fixed increment regardless of starting point, which more faithfully represents the idea that "a confirmed storm 50 miles away adds approximately X percentage points to your probability of seeing one in the next 2 hours." The hard clamps at 0.6 maximum boost and 0.95 maximum final probability keep the sum from exceeding 1 and preserve model calibration at the extremes.
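The difference between the two approaches is easy to check numerically. The sketch below uses the 1.5 multiplier from the example above and a 0.20 offset (one nearby storm); the helper names are illustrative.

```python
def multiplicative_gain(base, multiplier=1.5):
    """Probability points added by scaling the base probability."""
    return base * multiplier - base


def additive_gain(base, offset=0.20, cap=0.95):
    """Probability points added by a fixed offset, respecting the 0.95 cap."""
    return min(cap, base + offset) - base


for base in (0.05, 0.60):
    print(
        f"base={base:.2f}  "
        f"multiplicative=+{multiplicative_gain(base):.3f}  "
        f"additive=+{additive_gain(base):.3f}"
    )
```

The multiplicative gain grows from 0.025 to 0.30 across the two bases, while the additive gain stays at 0.20 until the cap is reached.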

The clear-sky suppressor: why CAPE alone can't trigger a HIGH risk

Tampa Bay atmospheric soundings are collected twice daily (00Z and 12Z) and have a 25-hour valid window. A sounding from the previous day can show high CAPE values that were accurate when it was taken but are stale by the following morning, when skies are clear and the sea breeze has not yet developed.

Without a suppressor, the BigQuery views would emit HIGH thunderstorm risk scores on clear sunny mornings because the cached CAPE value from 18 hours prior is still loaded. The clear-sky suppressor in v_weather_current_enhanced gates CAPE and Lifted Index contributions: if the current METAR shows no present weather, visibility ≥5 miles, and no precipitation in the last 3 hours, atmospheric contributions are zeroed out. This prevents false HIGH alerts on clear mornings while preserving them when surface conditions corroborate the upper-air instability.

The morning dewpoint suppression is a companion rule: the dewpoint depression contribution to the risk score is applied only between 11:00 and 21:00 local time. A small dewpoint depression at 06:00 doesn't indicate convective risk; it reflects overnight humidity that will dry out with morning solar heating.
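The two gating rules can be sketched in Python as follows. The real logic lives in the BigQuery view v_weather_current_enhanced, so this is only an illustration of the conditions described above. The SurfaceObs type, the function names, the score values, and the treatment of 21:00 as exclusive are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SurfaceObs:
    present_weather: str      # decoded present-weather groups; "" when none reported
    visibility_mi: float
    precip_last_3h_in: float
    local_hour: int           # 0-23, local time


def is_clear_sky(obs):
    """No present weather, visibility of at least 5 miles, no precip in the last 3 hours."""
    return (
        obs.present_weather == ""
        and obs.visibility_mi >= 5
        and obs.precip_last_3h_in == 0
    )


def gated_scores(obs, cape_score, lifted_index_score, dewpoint_score):
    """Zero out contributions that the surface observation does not corroborate."""
    if is_clear_sky(obs):
        cape_score = 0.0
        lifted_index_score = 0.0
    if not 11 <= obs.local_hour < 21:  # assumed half-open window
        dewpoint_score = 0.0
    return cape_score, lifted_index_score, dewpoint_score


# Hypothetical score contributions on a clear 08:00 morning with a stale high-CAPE sounding.
print(gated_scores(SurfaceObs("", 10.0, 0.0, 8), 3.0, 2.0, 1.0))
```

When the surface observation shows a thunderstorm in the afternoon, all three contributions pass through unchanged.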

Why 8 years of data matters for a binary classification problem

Thunderstorm events at any given FAA station occur on roughly 70–80 days per year in Central Florida, the most thunderstorm-prone region of the continental US. But the class imbalance is still significant: hourly observations are collected roughly 24 × 365 = 8,760 times per year, of which perhaps 500–700 contain a thunderstorm flag. That's a ~6–8% positive class rate (500 / 8,760 ≈ 5.7%, 700 / 8,760 ≈ 8.0%).

With one year of data this would produce approximately 500–700 positive samples for training — sufficient for a basic model but too thin for reliable probability calibration, especially for the multi-feature atmospheric interactions that drive convection. Eight years gives approximately 4,000–5,600 positive samples, enough to reliably learn the atmospheric state patterns that precede Florida afternoon thunderstorms (specifically: high CAPE + low CIN + elevated K-Index + afternoon local time + sea-breeze convergence signals in the surface observations).

The class_weight='balanced' parameter also compensates by up-weighting the minority (thunderstorm) class during training, but this only helps if there are enough minority samples for the tree structure to learn meaningful splits. Eight years of data provides that depth.
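The arithmetic in this section, and the weights that class_weight='balanced' implies, can be reproduced in a few lines. The sketch uses scikit-learn's documented formula for balanced weights, n_samples / (n_classes × class_count), written out by hand. The 600-positive figure is simply the midpoint of the 500–700 range above, not a measured value.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hourly observations


def positive_rate(positives_per_year):
    """Share of hourly observations carrying a thunderstorm flag."""
    return positives_per_year / HOURS_PER_YEAR


def balanced_class_weights(n_negative, n_positive):
    """Weights per scikit-learn's 'balanced' rule: n_samples / (n_classes * count)."""
    n_samples = n_negative + n_positive
    return {0: n_samples / (2 * n_negative), 1: n_samples / (2 * n_positive)}


for per_year in (500, 700):
    print(per_year, f"{positive_rate(per_year):.1%}", "8-year positives:", 8 * per_year)

# Midpoint assumption: 600 positive hours out of 8,760.
weights = balanced_class_weights(HOURS_PER_YEAR - 600, 600)
print({label: round(w, 2) for label, w in weights.items()})
```

At that midpoint each thunderstorm hour is weighted about 7.3 versus about 0.54 for a non-thunderstorm hour, which rebalances the loss but cannot create new minority examples.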

Venue impact models: translating meteorology to operational decisions

The venue_impact_prediction and high_impact_venue_prediction models are KISM-specific — KISM (Kissimmee Gateway Airport, 3.5 miles from Magic Kingdom) is the most spatially relevant FAA station for WDW operations. The venue impact target is derived from historical correlation between KISM weather events and observed ride closure rates in the park operations database.

A thunderstorm at KISM doesn't automatically mean ride closures. The correlation depends on storm intensity (CAPE-derived), proximity (KISM vs. peripheral stations), and which rides are operating (outdoor coasters close at different thresholds than indoor dark rides). The venue impact models learn this relationship directly from the historical data rather than encoding fixed rules, producing a probability output that reflects the empirical operational impact of past weather events on the same property.

Station Network

17-Station Central Florida Coverage

Coverage spans a ~100-mile radius around WDW. KISM is designated the primary venue-impact station. Multi-station spatial queries are the foundation of the storm tracking system.

KISM

Kissimmee Gateway Airport

Primary WDW venue station — 3.5 mi from Magic Kingdom. Used for all venue-impact models.

KMCO

Orlando International

Regional anchor — 15 mi NE of WDW. High-reliability obs, good upper-air consistency.

KORL

Orlando Executive

Urban heat island reference for Orlando. 12 mi NE.

KLAL

Lakeland Linder

West vector — storm track from Tampa Bay sea-breeze convergence zone.

KTPA

Tampa International

65 mi west. Critical west-approach upwind station; Tampa Bay sea-breeze origin.

KSFB

Sanford Orlando

North vector — I-4 corridor storms propagating south.

KVDF

Tampa Executive

SW approach. Sea-breeze collision zone between Gulf and Atlantic moisture.

KMLB

Melbourne Orlando Intl

East vector — Atlantic sea-breeze. Storms initiate at the east coast and move west.

+ 9 more

Daytona · Ocala · Gainesville · Sebring · Vero Beach · Punta Gorda · Fort Myers · Sarasota · Flagler

Outer ring for long-range spatial context (100–150 mi upwind detection).