Every 20 seconds, the GOES-18 Geostationary Lightning Mapper (GLM) publishes a new file of the lightning it has detected across its field of view over the Western Hemisphere. This case study traces the complete pipeline from those satellite NetCDF files to park-specific spatial attribution, storm state transitions, and per-ride closure probability, and explains the non-obvious design decisions that make satellite lightning data operationally useful for a theme park visitor.
Standard lightning detection services (Vaisala's NLDN, Earth Networks) are ground-based networks that triangulate stroke location from electromagnetic signals. GOES-18's GLM sensor is fundamentally different: it's an optical photometer operating at 777.4 nm that detects the light emitted by the lightning channel itself — from 35,786 km above the equator.
The GLM's Level 2 output is organized in three levels: Events (a single pixel exceeding its background threshold in one ~2 ms frame), Groups (spatially adjacent events in the same frame), and Flashes (groups clustered in space and time into one complete discharge). All three levels are stored together in the L2 LCFA (Lightning Cluster-Filter Algorithm) NetCDF files; this pipeline reads only the flash-level records.
Using the Event level would produce 5–50× more data points per real storm — each lightning bolt produces multiple optical pulses, and the event product records each one separately. This inflates flash counts dramatically and makes the flash-rate-per-hour metric unusable as a storm intensity proxy. The Flash product merges all optical pulses from a single discharge into one record with an aggregate energy value, which is the meteorologically meaningful unit — it maps to what a human observing the storm would count as "one bolt."
The consequence for the park intelligence system: when the storm state machine checks flash_rate_per_hour >= 30 to trigger SEVERE state, that threshold is calibrated to Flash-level counts. Using Event-level data with the same threshold would trigger SEVERE for any active cell.
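To make the calibration point concrete, here is a minimal sketch (not the pipeline's code) of how the same storm scores against the 30 flash/hr SEVERE threshold when counted at Event versus Flash level. It assumes events arrive as hypothetical (event_id, parent_flash_id) pairs with four events per flash, and that rates are scaled up from a 30-minute query window; in the real LCFA files the event-to-flash link runs through the group level.

```python
# Sketch only: the event/flash pairs are synthetic, and the 30-minute window
# scaling is an assumption about how a per-hour rate is derived.
SEVERE_FLASH_RATE = 30.0  # flashes/hour, the SEVERE threshold described above


def per_hour(count: int, window_minutes: float = 30.0) -> float:
    """Scale a count observed over a query window to a per-hour rate."""
    return count * 60.0 / window_minutes


def distinct_flashes(events: list[tuple[int, int]]) -> int:
    """Collapse (event_id, parent_flash_id) records onto their parent flash."""
    return len({flash_id for _event_id, flash_id in events})


# A modest cell: 10 flashes in 30 minutes, each recorded as 4 optical events.
events = [(flash * 4 + k, flash) for flash in range(10) for k in range(4)]

event_rate = per_hour(len(events))               # counts every optical event
flash_rate = per_hour(distinct_flashes(events))  # counts each discharge once

print(f"event-level: {event_rate:.0f}/hr, SEVERE={event_rate >= SEVERE_FLASH_RATE}")
print(f"flash-level: {flash_rate:.0f}/hr, SEVERE={flash_rate >= SEVERE_FLASH_RATE}")
```

For the same cell, the Event-level count reaches 80/hr and trips SEVERE, while the Flash-level count is 20/hr and does not.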
Commercial ground-based lightning networks (NLDN, Earth Networks) are licensed data with per-strike or subscription pricing. At the volume this system ingests — potentially hundreds of flashes per hour during active Florida summer storms — the cost would be prohibitive compared to the free NOAA GOES-18 data feed.
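More importantly, ground networks have coverage gaps. Network detection efficiency (the percentage of actual strokes that are detected) varies by geographic area, antenna spacing, and soil conductivity, and Central Florida's lakes and wetlands make detection efficiency variable across the region. The GLM has no such dependence on ground-sensor geometry: one instrument observes its whole field of view from 35,786 km, so there are no gaps between sensor sites. Its spatial resolution is best near the satellite sub-point and coarsens toward the edge of the field of view; GOES-18 operates as GOES-West, with its sub-point at 137.2°W, well west of Central Florida (about 81.5°W), so the parks are not near this instrument's nadir.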
The trade-off: GLM's ~8–12 km (roughly 5–7.5 mile) positional accuracy is coarser than that of ground-based networks, which can locate strokes to within 500 m. For a 20-mile radius park filter, this error is operationally acceptable: a flash that is actually 15 miles away but is measured as 14 miles away due to sensor accuracy is still correctly classified as "within range."
Each row is one Level-2 flash that passed the spatial filter. The most important design decision in the schema is the addition of nearest_park and distance_miles, which are computed at ingest time so consumers never need to run geospatial math at query time.
| Field | Type | Meteorological / Engineering Meaning |
|---|---|---|
| strike_id | STRING | Stable unique ID per flash. Derived from source NetCDF filename + flash index. Safe for deduplication across ingestion runs — idempotent upsert key. |
| timestamp | TIMESTAMP | UTC flash time. Central Florida is UTC-4 (EDT) in storm season. Peak convective hours: 1500–2000 UTC (11 AM–4 PM local). The service does not write rows for zero-flash windows — missing hours mean no activity, not missing data. |
| latitude, longitude | FLOAT | Flash centroid in WGS84. Positional accuracy ~8–12 km; GLM spatial resolution is best near the satellite sub-point and coarsens toward the edge of the field of view. |
| energy | FLOAT | Radiant energy in joules. Strong proxy for convective intensity — high-energy flashes indicate vigorous updrafts and dense ice crystal zones in the mixed-phase layer. Nullable: treat as enrichment, not a required field. Used to compute avg_energy in the summary. |
| nearest_park | STRING | Pre-computed at ingest: which WDW park center (Magic Kingdom, EPCOT, Hollywood Studios, Animal Kingdom) is spatially closest. Enables instant per-park flash count queries without any runtime geospatial computation. |
| distance_miles | FLOAT | Haversine distance from flash centroid to nearest park center (0.00–20.00 mi). Pre-computed at ingest. The primary proximity metric used by the storm state machine (nearest_flash_miles <= 10 → ACTIVE, <= 20 → APPROACH). |
| type | STRING | Always "flash" — reserved for future product levels (group, event) if needed. |
| source_file | STRING | Source GOES-18 NetCDF filename. Embeds exact satellite scan time in filename pattern (GLM_L2_LCFA_G18_*.nc). Used for deduplication audit and provenance tracing. |
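A minimal sketch of the row shape and its idempotent key. The text says only that strike_id is derived from the filename plus flash index; the SHA-256 scheme, the dataclass, and the dict-based upsert here are illustrative assumptions, and the filename is hypothetical. Writing the same rows twice leaves one row per strike_id, which is the property a BigQuery MERGE keyed on strike_id would provide.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class LightningRow:
    """One flash that passed the spatial filter (mirrors the schema table above)."""
    strike_id: str
    timestamp: datetime          # UTC flash time
    latitude: float
    longitude: float
    energy: Optional[float]      # nullable enrichment
    nearest_park: str
    distance_miles: float
    source_file: str
    type: str = "flash"


def make_strike_id(source_file: str, flash_index: int) -> str:
    """Deterministic ID: the same file and flash index always yield the same key."""
    return hashlib.sha256(f"{source_file}#{flash_index}".encode()).hexdigest()[:32]


def upsert(table: dict[str, LightningRow], rows: list[LightningRow]) -> None:
    """Keyed write: re-sending a row overwrites it instead of appending a duplicate."""
    for row in rows:
        table[row.strike_id] = row


source = "GLM_L2_LCFA_G18_example.nc"  # hypothetical name matching the documented pattern
ts = datetime(2024, 7, 15, 19, 0, tzinfo=timezone.utc)
rows = [
    LightningRow(make_strike_id(source, i), ts, 28.42, -81.58, None, "Magic Kingdom", 1.5, source)
    for i in range(3)
]

table: dict[str, LightningRow] = {}
upsert(table, rows)
upsert(table, rows)  # simulated retry of the same window after a transient failure
print(len(table))    # 3, not 6
```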
The WDW resort complex spans ~40 square miles with four major parks separated by up to 3 miles. The spatial system uses two layers of filtering: a rectangular bounding box for fast pre-filtering, and precise haversine distance calculations against each park's exact center coordinate.
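The two-layer filter can be sketched as follows: a cheap rectangle test rejects most flashes without trigonometry, and only the survivors get haversine distances to each park. The park coordinates below are approximate, illustrative values rather than the system's configured centers; the 69-miles-per-degree padding is a deliberately conservative approximation, and the box math assumes a northern-hemisphere site.

```python
import math
from typing import Optional

EARTH_RADIUS_MILES = 3958.8
MAX_MILES = 20.0

# Approximate park centers for illustration only; not the system's configured values.
PARKS = {
    "Magic Kingdom": (28.4177, -81.5812),
    "EPCOT": (28.3747, -81.5494),
    "Hollywood Studios": (28.3575, -81.5583),
    "Animal Kingdom": (28.3553, -81.5901),
}


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two WGS84 points, in statute miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def bounding_box(parks: dict, radius_miles: float) -> tuple[float, float, float, float]:
    """(min_lat, max_lat, min_lon, max_lon) padded by radius_miles around all parks.

    69.0 mi/degree slightly underestimates a degree, so the pad errs on the large
    side; longitude padding uses the box's most poleward latitude for the same reason.
    """
    lats = [lat for lat, _ in parks.values()]
    lons = [lon for _, lon in parks.values()]
    dlat = radius_miles / 69.0
    dlon = radius_miles / (69.0 * math.cos(math.radians(max(lats) + dlat)))
    return min(lats) - dlat, max(lats) + dlat, min(lons) - dlon, max(lons) + dlon


BOX = bounding_box(PARKS, MAX_MILES)


def attribute(lat: float, lon: float) -> Optional[tuple[str, float]]:
    """Return (nearest_park, distance_miles), or None if the flash is out of range."""
    min_lat, max_lat, min_lon, max_lon = BOX
    if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
        return None  # layer 1: cheap rectangle test, no trigonometry
    park, dist = min(
        ((name, haversine_miles(lat, lon, plat, plon)) for name, (plat, plon) in PARKS.items()),
        key=lambda pair: pair[1],
    )
    if dist > MAX_MILES:
        return None  # layer 2: box corners lie farther than 20 miles from every park
    return park, round(dist, 2)
```

The second check is not redundant: the rectangle's corners lie well beyond 20 miles from every park, so the box alone would over-admit flashes.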
The naive approach to answering "which flashes are near Magic Kingdom right now?" is to join tbl_lightning_whisperer against a park coordinates table and compute haversine distance at query time. For a 30-minute window with 200 flashes, this is fast. But the publisher runs every 15 minutes and queries this table every cycle, and the storm state machine also queries it — potentially dozens of computations per run.
Pre-computing at ingest means every query against the table is a simple WHERE distance_miles <= X AND nearest_park = 'Magic Kingdom': a plain filter on two stored columns. No spatial join, no function call per row, no repeated coordinate lookups. BigQuery scans only the columns a query references, and if the table is partitioned on timestamp and clustered on nearest_park, it can also skip storage blocks that cannot match, so these queries stay cheap as the table grows.
The nearest_park assignment also serves a display and analysis function: park-level flash counts ("5 flashes within 10 miles of Animal Kingdom in the last 30 minutes") are an instantly queryable aggregation. Without pre-computation, this would require a subquery per park per query.
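Because attribution is stored per row, the per-park count reduces to a filter and a GROUP BY. The sketch pairs an illustrative BigQuery query (table reference left unqualified, since the project and dataset are not given here; the @radius_miles parameter name is hypothetical) with a pure-Python equivalent over sample rows.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Illustrative BigQuery SQL; project/dataset qualification omitted.
PER_PARK_SQL = """
SELECT nearest_park, COUNT(*) AS flashes
FROM tbl_lightning_whisperer
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE)
  AND distance_miles <= @radius_miles
GROUP BY nearest_park
"""


def per_park_counts(rows: list[dict], now: datetime, radius_miles: float = 10.0,
                    window: timedelta = timedelta(minutes=30)) -> Counter:
    """Same logic as PER_PARK_SQL: plain comparisons on stored columns, no geometry."""
    cutoff = now - window
    return Counter(
        r["nearest_park"]
        for r in rows
        if r["timestamp"] >= cutoff and r["distance_miles"] <= radius_miles
    )


now = datetime(2024, 7, 15, 19, 0, tzinfo=timezone.utc)
rows = [
    {"nearest_park": "Animal Kingdom", "distance_miles": 4.0, "timestamp": now - timedelta(minutes=5)},
    {"nearest_park": "Animal Kingdom", "distance_miles": 8.0, "timestamp": now - timedelta(minutes=10)},
    {"nearest_park": "Animal Kingdom", "distance_miles": 12.0, "timestamp": now - timedelta(minutes=12)},  # too far
    {"nearest_park": "Magic Kingdom", "distance_miles": 3.0, "timestamp": now - timedelta(minutes=45)},   # too old
    {"nearest_park": "EPCOT", "distance_miles": 9.5, "timestamp": now - timedelta(minutes=20)},
]
print(per_park_counts(rows, now))  # Counter({'Animal Kingdom': 2, 'EPCOT': 1})
```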
Florida summer thunderstorms are typically fast-developing, isolated convective cells (not organized squall lines). A sea-breeze cell can go from initiation to first flash in 15–25 minutes, and typical storm motion across Central Florida is 10–20 mph, primarily east-to-west in the afternoon, as the Gulf and Atlantic sea breezes converge over the I-4 corridor.
At 15 mph storm motion, a cell 20 miles away will reach the park in approximately 80 minutes. This provides enough lead time for guests to receive a meaningful warning: the system's storm state transitions from APPROACH to ACTIVE as the storm closes within 10 miles, giving guests roughly 40 minutes of warning before the storm arrives. A larger radius (50 miles) would produce excessive false alerts for storms that dissipate before reaching the park. A smaller radius (10 miles) would produce warnings with insufficient lead time for organized park response.
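The lead-time figures follow from distance over speed, assuming the cell moves at a constant speed straight toward the park:

$$
t = \frac{d}{v} \times 60\ \tfrac{\text{min}}{\text{h}}, \qquad
t_{20\,\text{mi}} = \frac{20}{15} \times 60 = 80\ \text{min}, \qquad
t_{10\,\text{mi}} = \frac{10}{15} \times 60 = 40\ \text{min}
$$

Across the 10–20 mph range of typical storm motion, the 20-mile radius corresponds to 60–120 minutes of lead time and the 10-mile ACTIVE radius to 30–60 minutes.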
The 10-mile threshold for the ACTIVE trigger is calibrated to park lightning safety protocols: many outdoor venues begin shelter-in-place procedures when lightning is within 8–10 miles. The 10-mile threshold allows the system to recommend seeking shelter approximately in step with what the park's own safety team is doing.
- Compute nearest_park (the minimum-distance park) and distance_miles. Discard the flash if the distance to the nearest park exceeds 20 miles.
- Derive strike_id from the NetCDF filename + flash index. This makes BigQuery upserts idempotent: re-running the same 10-minute window after a transient failure never creates duplicates.
- Upsert the rows into tbl_lightning_whisperer. The publisher queries this table every 15 minutes for the last 30 minutes, computing flash rate, nearest flash distance, and per-park totals.

The publisher computes a unified storm state every 15 minutes by fusing GLM flash data, METAR thunderstorm flags from 17 stations, NWS alert status, storm tracking vectors, NWS hourly probability-of-thunderstorm forecasts, and SPC categorical outlook risk levels. Apart from an NWS Tornado Warning, no single source triggers a state change on its own; the system requires confirming signals.
Each individual data source has failure modes that would produce false alerts if used alone. METAR TS flags depend on the station observer (some stations are automated and can miss events). The NWS alert API has latency after the forecast office issues a warning. The GLM can capture residual flashes from a dissipating cell that's already past the park. The SPC categorical outlook is a daily issuance that doesn't know about rapidly-developing individual cells.
By requiring multiple confirming signals for the ACTIVE and SEVERE states, the system avoids triggering high-urgency recommendations from any single stale or anomalous data point. The one exception is a Tornado Warning from the NWS: because it reflects a human forecaster's decision, it is treated as sufficient on its own to set SEVERE, without GLM confirmation. The cost of a missed tornado warning far exceeds the cost of a false severe alert.
The storm_is_near logic also applies a conservative default: if the tracking distance is unknown (null), the storm is treated as near. Erring toward "near" is the safer mistake, because a missed close storm is worse than an unnecessary alert for a distant one.
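A minimal sketch of how these rules could combine, including the null-distance default. The 10-mile, 20-mile, and 30 flash/hr thresholds and the Tornado Warning override come from the text; the Signals structure, treating a METAR TS report or an NWS thunderstorm alert as the confirming signal, using nearest_flash_miles as the distance input, and the WATCH cutoffs (30% probability of thunder, SPC Marginal or higher) are all assumptions for illustration, not the publisher's actual logic.

```python
from dataclasses import dataclass
from typing import Optional

ACTIVE_MILES = 10.0        # from the text
APPROACH_MILES = 20.0      # from the text
SEVERE_FLASH_RATE = 30.0   # flashes/hour, from the text
WATCH_POT_PERCENT = 30.0   # assumption: NWS probability-of-thunder cutoff
WATCH_SPC = {"MRGL", "SLGT", "ENH", "MDT", "HIGH"}  # assumption: SPC categories implying WATCH


@dataclass
class Signals:
    """Hypothetical snapshot of the fused inputs for one publisher run."""
    flash_rate_per_hour: float = 0.0
    nearest_flash_miles: Optional[float] = None  # None when there are no recent flashes
    metar_ts_stations: int = 0                   # stations currently reporting TS
    nws_thunderstorm_alert: bool = False
    nws_tornado_warning: bool = False
    pot_percent: float = 0.0                     # NWS hourly probability of thunder
    spc_category: str = "NONE"


def storm_is_near(distance_miles: Optional[float], limit: float = ACTIVE_MILES) -> bool:
    """Unknown distance counts as near: a missed close storm costs more than a false alert."""
    return True if distance_miles is None else distance_miles <= limit


def storm_state(s: Signals) -> str:
    if s.nws_tornado_warning:
        return "SEVERE"  # forecaster decision; no GLM confirmation required
    confirmed = s.metar_ts_stations > 0 or s.nws_thunderstorm_alert
    near = storm_is_near(s.nearest_flash_miles)
    if confirmed and near and s.flash_rate_per_hour >= SEVERE_FLASH_RATE:
        return "SEVERE"
    if confirmed and near:
        return "ACTIVE"
    if s.nearest_flash_miles is not None and s.nearest_flash_miles <= APPROACH_MILES:
        return "APPROACH"  # GLM proximity alone, without a confirming signal
    if s.pot_percent >= WATCH_POT_PERCENT or s.spc_category in WATCH_SPC:
        return "WATCH"
    return "CLEAR"
```

With every input at its default (the off-season case), storm_state returns CLEAR; a METAR TS report with no flash distance returns ACTIVE because of the conservative default.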
During CLEAR and WATCH states, the publisher runs on its standard 15-minute Cloud Scheduler cadence and writes the GCS JSON output with a Cache-Control: public, max-age=300 header. When the storm state reaches APPROACH or higher, the system switches to an adaptive fast-poll mode: the output is published with Cache-Control: public, max-age=60 instead, so front-ends polling the GCS endpoint pick up each new publication within about 1 minute rather than up to 5 minutes.

This mode adds no API calls; it is cache-header tuning only. The publisher still runs on the same Cloud Scheduler trigger, so the state itself still changes at most once per run. One caveat: the header travels with the object, so the copy published while the state was still CLEAR or WATCH carries max-age=300, and a client that cached it just before the transition can see the first APPROACH update up to 5 minutes late. The 60-second window applies to every publication after that, which is where it matters most: the APPROACH-to-ACTIVE and ACTIVE-to-SEVERE transitions reach visitors within about a minute.
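A sketch of the header switch, assuming the google-cloud-storage Python client (Blob.cache_control and Blob.upload_from_string are real client members; the bucket name, object name, and a payload carrying a "state" key are placeholders). The helper that picks the header is plain Python, so it can be checked without credentials.

```python
import json

FAST_POLL_STATES = {"APPROACH", "ACTIVE", "SEVERE"}


def cache_control_for_state(state: str) -> str:
    """60 s cache while a storm is approaching or active, 300 s otherwise."""
    max_age = 60 if state in FAST_POLL_STATES else 300
    return f"public, max-age={max_age}"


def publish_state(bucket_name: str, object_name: str, payload: dict) -> None:
    """Upload the JSON snapshot with a state-dependent Cache-Control header.

    Needs the google-cloud-storage package and application credentials.
    """
    from google.cloud import storage  # lazy import: the helper above runs without it

    blob = storage.Client().bucket(bucket_name).blob(object_name)
    blob.cache_control = cache_control_for_state(payload["state"])
    blob.upload_from_string(json.dumps(payload), content_type="application/json")


# Usage (placeholders): publish_state("example-bucket", "storm_state.json", {"state": "APPROACH"})
```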
The ride weather profiles encode empirical closure thresholds derived from historical closure data.
Each profile uses GLM flash rate (flash_rate_per_hour) and nearest flash distance
(nearest_flash_miles) as the primary triggers — not just METAR thunderstorm flags,
which are binary and lack the spatial precision needed for per-ride decisions.
| Category | Example Rides | TS Closure % | GLM Trigger (Flash/hr) | GLM Trigger (Distance) | Primary Factor |
|---|---|---|---|---|---|
| Meet & Greet Outdoor | Meet Ariel at Her Grotto | 37.5% | ≥5 flash/hr | ≤10 mi | Lightning |
| Indoor Shows Weather-Aware | Canada Far and Wide, Awesome Planet, PhilharMagic | 13–29% | ≥10 flash/hr | ≤10 mi | Severe thunderstorm |
| Outdoor Walkthrough | Swiss Family Treehouse, Dumbo, Tomorrowland Speedway | 11–14% | ≥5 flash/hr | ≤10 mi | Wind + lightning |
| Open-Air Attractions | WDW Railroad, Journey of Water, Kali River Rapids | 9–10% | ≥5 flash/hr | ≤10 mi | Lightning |
| Outdoor Coasters (moderate) | Seven Dwarfs Mine Train, Barnstormer, Astro Orbiter | 4–8% | ≥5 flash/hr | ≤10 mi | Lightning + wind |
| Outdoor Coasters (low) | Magic Carpets, Rock 'n' Roller Coaster, Nemo | 3–4% | ≥10 flash/hr | ≤10 mi | Severe lightning only |
| Indoor Dark Rides | Space Mountain, Haunted Mansion, Pirates, Rise | 0% | Not applicable | Not applicable | Operational only |
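The GLM trigger columns translate directly into a lookup. This sketch evaluates only those two columns; it does not reproduce the closure percentages, and how the real profiles combine the GLM trigger with wind or METAR inputs is not described here. The dictionary and function names are hypothetical.

```python
from typing import Optional

# (minimum flashes/hour, maximum nearest-flash distance in miles), transcribed
# from the GLM trigger columns above; None = no GLM trigger for the category.
GLM_TRIGGERS: dict[str, Optional[tuple[float, float]]] = {
    "Meet & Greet Outdoor": (5, 10),
    "Indoor Shows Weather-Aware": (10, 10),
    "Outdoor Walkthrough": (5, 10),
    "Open-Air Attractions": (5, 10),
    "Outdoor Coasters (moderate)": (5, 10),
    "Outdoor Coasters (low)": (10, 10),
    "Indoor Dark Rides": None,
}


def glm_trigger_met(category: str, flash_rate_per_hour: float,
                    nearest_flash_miles: Optional[float]) -> bool:
    """True when both the flash-rate and distance triggers for a category are met."""
    trigger = GLM_TRIGGERS[category]
    if trigger is None or nearest_flash_miles is None:
        return False  # no GLM trigger, or no recent flashes to measure
    min_rate, max_miles = trigger
    return flash_rate_per_hour >= min_rate and nearest_flash_miles <= max_miles
```

For example, 6 flashes/hr with the nearest flash at 8 miles meets the Meet & Greet Outdoor trigger but not the Outdoor Coasters (low) trigger, which needs 10 flashes/hr.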
The outage tracker (weather-ride-outage-tracker) is a separate Cloud Run Job that runs every 5 minutes. It queries the live rides-down view (v_rides_currently_down) for rides currently flagged DOWN during park operating hours, then compares each downed ride's closure time against the simultaneous weather snapshot from v_weather_current_enhanced at KISM.
If the thunderstorm_risk_score was ≥55 or wind_risk_score was ≥55 at the time the ride went down, the closure is tagged as likely weather-driven (LIKELY_WEATHER_LIGHTNING, LIKELY_WEATHER_WIND, or LIKELY_WEATHER_GENERAL) and written to tbl_weather_ride_outage_log with a UUID derived from the entity ID and down timestamp — making the insert idempotent.
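A sketch of the tagging rule and the idempotent key. The ≥55 thresholds and tag names come from the text; mapping "both scores high" to LIKELY_WEATHER_GENERAL and using a name-based UUIDv5 over entity ID plus down timestamp are assumptions (the text says only that the UUID is derived from those two values). The namespace string and the "ride_123" entity ID are hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

RISK_THRESHOLD = 55
# Hypothetical fixed namespace; it must never change or old IDs stop matching.
OUTAGE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "weather-ride-outage-tracker")


def classify_outage(thunderstorm_risk_score: float, wind_risk_score: float) -> Optional[str]:
    """Tag a closure as likely weather-driven, or return None if neither score is high."""
    lightning = thunderstorm_risk_score >= RISK_THRESHOLD
    wind = wind_risk_score >= RISK_THRESHOLD
    if lightning and wind:
        return "LIKELY_WEATHER_GENERAL"  # assumed mapping when both scores are high
    if lightning:
        return "LIKELY_WEATHER_LIGHTNING"
    if wind:
        return "LIKELY_WEATHER_WIND"
    return None


def outage_id(entity_id: str, down_at: datetime) -> str:
    """Name-based UUID: the same ride and down timestamp always map to the same ID."""
    return str(uuid.uuid5(OUTAGE_NAMESPACE, f"{entity_id}|{down_at.isoformat()}"))


down_at = datetime(2024, 7, 15, 19, 5, tzinfo=timezone.utc)
print(classify_outage(72, 30))  # LIKELY_WEATHER_LIGHTNING
print(outage_id("ride_123", down_at) == outage_id("ride_123", down_at))  # True
```

Because the ID is a pure function of its inputs, re-processing the same closure produces the same key, so an insert keyed on it cannot duplicate the row.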
The log is used to build the historical closure correlation that feeds the ride weather sensitivity percentages in the profiles table. Over time, each ride accumulates a record of how frequently it was down under various weather conditions, which provides the empirical foundation for updating the sensitivity thresholds.
An outage is opened when the ride goes down during active weather. The tracker checks open outages every 5 minutes and closes them (sets ride_up_at) when the ride reappears in the active rides list. This produces duration_minutes per weather closure — an operational metric that quantifies how long lightning-driven closures typically last by ride category.
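The open/close cycle can be sketched as follows; the Outage class, its ride_down_at field, and the function name are hypothetical stand-ins for the log table. Because ride_up_at is stamped by the first 5-minute poll that sees the ride running, each duration_minutes can overstate the true closure by up to one polling interval.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Outage:
    """Hypothetical in-memory stand-in for a tbl_weather_ride_outage_log row."""
    entity_id: str
    ride_down_at: datetime
    ride_up_at: Optional[datetime] = None
    duration_minutes: Optional[float] = None


def close_recovered(open_outages: list[Outage], active_ride_ids: set[str],
                    now: datetime) -> list[Outage]:
    """Close every open outage whose ride is back in the active list; return those closed."""
    closed = []
    for outage in open_outages:
        if outage.ride_up_at is None and outage.entity_id in active_ride_ids:
            outage.ride_up_at = now
            outage.duration_minutes = (now - outage.ride_down_at).total_seconds() / 60
            closed.append(outage)
    return closed


t0 = datetime(2024, 7, 15, 19, 5, tzinfo=timezone.utc)
outages = [Outage("ride_a", t0), Outage("ride_b", t0)]
closed = close_recovered(outages, {"ride_a", "ride_c"}, t0 + timedelta(minutes=45))
print([(o.entity_id, o.duration_minutes) for o in closed])  # [('ride_a', 45.0)]
```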
METAR is_thunderstorm is a binary flag derived from the present-weather groups (TS, VCTS) and lightning remarks (LTG) in the METAR string. It's recorded by the station observer or automated sensor and reflects the weather at or near the station at observation time. It tells you "a thunderstorm is happening at KISM" but not how active the cell is, how close the lightning is, or how rapidly flashes are occurring.
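A simplified sketch of how such a flag can be derived, assuming a hand-rolled token match (the pipeline's actual METAR decoder is not described): TS groups in the body, including VCTS and intensity or precipitation suffixes, plus LTG tokens in the remarks. Whatever the input, the output collapses to a single boolean.

```python
import re

# A thunderstorm present-weather group: TS with optional intensity (+/-),
# vicinity (VC), and precipitation suffixes, e.g. TS, +TSRA, -TSRAGR, VCTS.
TS_GROUP = re.compile(r"^[-+]?(VC)?TS(RA|SN|PL|GR|GS|UP)*$")


def is_thunderstorm(metar: str) -> bool:
    """Binary flag: TS in the body, or an LTG remark; nothing about distance or rate."""
    body, _, remarks = metar.partition(" RMK ")
    if any(TS_GROUP.match(token) for token in body.split()):
        return True
    # Lightning appears in remarks, e.g. "LTG DSNT W" or "OCNL LTGICCG".
    return any(token.startswith("LTG") for token in remarks.split())


print(is_thunderstorm("KISM 151853Z 27015G25KT 3SM +TSRA BKN020CB 28/24 A2992 RMK AO2"))  # True
print(is_thunderstorm("KISM 151853Z 09008KT 10SM FEW040 32/22 A3001 RMK AO2 TSNO"))  # False
```

Only the body is matched against TS groups so that remarks such as TSNO (thunderstorm sensor not operational) are not mistaken for a report.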
GLM flash rate provides three continuous signals that the binary METAR flag cannot: rate (flashes per hour as a storm intensity proxy), proximity (nearest flash in miles as a spatial threat measure), and trajectory (whether the nearest-flash distance is decreasing over successive readings). A flash rate of 2/hr with nearest flash at 18 miles is a different situation from 25/hr at 6 miles. Both could set is_thunderstorm=True at the station, but the risk profiles and recommended actions are entirely different.
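The trajectory signal is the one with no METAR analogue. A minimal sketch, assuming hypothetical (minutes, nearest_flash_miles) readings from successive publisher runs; it uses only the first and last readings, so a least-squares fit over all readings would be less sensitive to a new cell initiating closer to the park.

```python
from typing import Optional


def closing_speed_mph(readings: list[tuple[float, float]]) -> float:
    """Average closing speed from (minutes, nearest_flash_miles) readings.

    Positive means the nearest flash is getting closer. Uses only the first
    and last readings, so it assumes at least two readings at distinct times.
    """
    (t0, d0), (t1, d1) = readings[0], readings[-1]
    return (d0 - d1) / (t1 - t0) * 60.0


def eta_minutes(readings: list[tuple[float, float]]) -> Optional[float]:
    """Minutes until arrival at the current closing speed, or None if not closing."""
    speed = closing_speed_mph(readings)
    if speed <= 0:
        return None
    return readings[-1][1] * 60.0 / speed


approaching = [(0.0, 18.0), (15.0, 14.25)]  # two publisher runs, 15 minutes apart
print(closing_speed_mph(approaching), eta_minutes(approaching))  # 15.0 57.0
print(eta_minutes([(0.0, 6.0), (15.0, 9.0)]))  # None: the cell is moving away
```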
For the outdoor attractions with specific GLM thresholds (e.g., Meet Ariel closes at 5+ flash/hr within 10 miles), the flash rate and distance combination provides the precision necessary to distinguish "we should close this" from "we should monitor this." Binary METAR flags alone cannot support this level of specificity.
Central Florida's lightning season follows Florida's convective season: roughly April through October, driven by the Gulf-to-Atlantic sea-breeze convergence and high ambient moisture. Lightning occurrence outside this window is minimal — isolated events exist but the storm frequency is too low to justify continuous data ingestion costs.
The lightning-watcher-optimized Cloud Run job is triggered by trigger-lightning-watcher-optimized Cloud Scheduler, which is paused November through March with a single gcloud command. This eliminates the associated BigQuery write and compute costs during the off-season without requiring any code changes or infrastructure teardown. Re-enabling for storm season is a one-command operation. The BigQuery table persists across seasons with all historical data intact.
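If the seasonal toggle were scripted rather than run by hand, a helper could pick the gcloud subcommand by month. This is a hypothetical sketch: gcloud scheduler jobs pause and resume are real commands, but the region is a placeholder and the month window simply encodes the April through October season described above.

```python
from datetime import date

JOB_NAME = "trigger-lightning-watcher-optimized"
SEASON_MONTHS = range(4, 11)  # April (4) through October (10)


def scheduler_command(today: date, location: str) -> list[str]:
    """Build the gcloud call that resumes the job in season and pauses it otherwise."""
    action = "resume" if today.month in SEASON_MONTHS else "pause"
    return ["gcloud", "scheduler", "jobs", action, JOB_NAME, f"--location={location}"]


print(" ".join(scheduler_command(date(2024, 11, 1), "us-east1")))
# gcloud scheduler jobs pause trigger-lightning-watcher-optimized --location=us-east1
```

Passing the list to subprocess.run(..., check=True) would execute it; "us-east1" stands in for whatever region the job actually lives in.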
The storm state machine handles the off-season gracefully: with the lightning table returning empty results (no recent flashes), flash_rate_per_hour is 0 and nearest_flash_miles is null. The null distance triggers the conservative "treat as near" default only in the ACTIVE state computation, and without any accompanying METAR thunderstorm flags or NWS warnings, the state correctly stays at CLEAR or WATCH, depending on forecast conditions.