Satellite Remote Sensing · GOES-18 GLM · Geospatial · Real-Time · BigQuery

GOES-18 Geostationary Lightning Mapper
From Satellite Optical Sensor to Ride Closure Prediction

The GOES-18 Geostationary Lightning Mapper continuously detects transient luminosity from lightning across much of the Western Hemisphere, and NOAA publishes those detections in files roughly every 20 seconds. This case study traces the complete pipeline from satellite NetCDF files to park-specific spatial attribution, storm state transitions, and per-ride closure probability. It also explains the non-obvious design decisions that make satellite lightning data operationally useful for a theme park visitor.

20 sec · GLM file cadence
20 mi · Park proximity filter
~8–12 km · GLM spatial accuracy
5 states · Storm state machine
4 parks · Per-park attribution
5 min · Outage tracker cadence

Why Satellite Lightning Instead of Ground Networks

Standard lightning detection services (Vaisala's NLDN, Earth Networks) are ground-based networks that triangulate stroke locations from electromagnetic signals. GOES-18's GLM is fundamentally different. It is an optical imager tuned to the 777.4 nm atomic-oxygen emission line, and it detects the light emitted by the lightning channel itself from geostationary orbit, 35,786 km above the equator.

GOES-18 Geostationary Lightning Mapper
NOAA / NASA · Level 2 LCFA Product
Product: Level 2 LCFA (Lightning Cluster-Filter Algorithm), flash level, the most aggregated of its three levels
File cadence: ~20 seconds per NetCDF file
Detection method: optical sensor at 777.4 nm (atomic oxygen emission line)
Spatial accuracy: ~8–12 km (finest near the sub-satellite point, coarser toward the edge of the field of view)
Coverage: 24/7, day and night; detects lightning inside thick cloud from light scattered out of the cloud top; no ground-network gaps
Energy field: flash radiant energy in joules (convective-intensity proxy)
Geostationary: continuous view, independent of ground sensors · Detects cloud-to-cloud and cloud-to-ground
NWS Alerts (companion source)
api.weather.gov · Orange + Osceola counties
Product: active alerts GeoJSON (area polygons)
Latency: 2–5 min after the NWS issues a warning
Events tracked: Severe Thunderstorm Warning, Tornado Warning, Tornado Watch, Flash Flood Warning, + 10 more
County filter: Orange + Osceola only (WDW spans both)
Integration: feeds the storm state machine as a confirming signal alongside GLM
Human forecaster judgment · Polygon-based area coverage
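Below is a minimal sketch of the alert query. It assumes the api.weather.gov `/alerts/active` endpoint's `zone` filter and the NWS county zone IDs FLC095 (Orange) and FLC097 (Osceola). The zone IDs, the `TRACKED_EVENTS` subset, and the function names are illustrative. The full list of tracked event types is not reproduced. The NWS API asks clients to send an identifying User-Agent header, which the fetch helper does.

```python
import json
import urllib.request
from urllib.parse import urlencode

NWS_ALERTS_URL = "https://api.weather.gov/alerts/active"
# Assumed NWS county zone IDs (state + "C" + county FIPS): Orange, Osceola.
COUNTY_ZONES = ("FLC095", "FLC097")
# Illustrative subset of the tracked event types named in the card.
TRACKED_EVENTS = {
    "Severe Thunderstorm Warning",
    "Tornado Warning",
    "Tornado Watch",
    "Flash Flood Warning",
}


def alerts_url(zones=COUNTY_ZONES):
    """Build the active-alerts URL filtered to the given county zones."""
    return f"{NWS_ALERTS_URL}?{urlencode({'zone': ','.join(zones)})}"


def tracked_alerts(geojson):
    """Return event names of active alerts the state machine cares about."""
    return [
        feature["properties"]["event"]
        for feature in geojson.get("features", [])
        if feature["properties"].get("event") in TRACKED_EVENTS
    ]


def fetch_alerts(user_agent, timeout=10):
    """Fetch the alert GeoJSON (network call; not exercised here)."""
    request = urllib.request.Request(alerts_url(), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


print(alerts_url())
```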
GLM Level 2 Flash: what a "flash" means and why it's the right product level

The GLM L2 LCFA files organize detections into three nested levels:

- **Events:** a single pixel exceeding the detection threshold in one ~2 ms frame.
- **Groups:** adjacent events in the same frame.
- **Flashes:** groups clustered in space and time into one complete discharge, which may span many groups.

This pipeline uses the flash-level records from the LCFA files.

Using the Event level would produce 5–50× more data points per real storm — each lightning bolt produces multiple optical pulses, and the event product records each one separately. This inflates flash counts dramatically and makes the flash-rate-per-hour metric unusable as a storm intensity proxy. The Flash product merges all optical pulses from a single discharge into one record with an aggregate energy value, which is the meteorologically meaningful unit — it maps to what a human observing the storm would count as "one bolt."

The consequence for the park intelligence system: when the storm state machine checks flash_rate_per_hour >= 30 to trigger SEVERE state, that threshold is calibrated to Flash-level counts. Using Event-level data with the same threshold would trigger SEVERE for any active cell.
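To make the calibration point concrete, here is a sketch of how a per-hour flash rate might be derived from the 30-minute window the publisher queries. It assumes the rate is simply the window count scaled to one hour. The function name is hypothetical. So is the 5-events-per-flash multiplier, which sits at the low end of the 5–50× range quoted above.

```python
from datetime import datetime, timedelta, timezone

SEVERE_FLASH_RATE = 30  # flashes/hr; calibrated to Flash-level counts


def flash_rate_per_hour(times, now, window_minutes=30):
    """Count detections in the trailing window and scale to a per-hour rate."""
    start = now - timedelta(minutes=window_minutes)
    count = sum(1 for t in times if start < t <= now)
    return count * 60 / window_minutes


now = datetime(2024, 7, 15, 19, 0, tzinfo=timezone.utc)
# Ten flashes spread over the last 30 minutes (one every 3 minutes).
flashes = [now - timedelta(minutes=3 * i) for i in range(10)]
# Hypothetical: the same storm counted at the Event level, ~5 events per flash.
events = [t for t in flashes for _ in range(5)]

flash_rate = flash_rate_per_hour(flashes, now)
event_rate = flash_rate_per_hour(events, now)
print(flash_rate, flash_rate >= SEVERE_FLASH_RATE)  # 20.0 False
print(event_rate, event_rate >= SEVERE_FLASH_RATE)  # 100.0 True
```

The same moderate cell stays below the SEVERE threshold when counted as flashes. It crosses the threshold when counted as events.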

Why the satellite beats ground networks for this use case

Commercial ground-based lightning networks (NLDN, Earth Networks) are licensed data with per-strike or subscription pricing. At the volume this system ingests — potentially hundreds of flashes per hour during active Florida summer storms — the cost would be prohibitive compared to the free NOAA GOES-18 data feed.

More importantly, ground networks have coverage gaps. Network detection efficiency (the percentage of actual strokes that are detected) varies with geographic area, antenna spacing, and soil conductivity, and Central Florida's lakes and wetlands make detection efficiency variable. The GLM does not depend on ground sensor placement. It views the whole domain optically from geostationary orbit (~35,786 km), so there are no network-spacing gaps. Central Florida lies well east of the GOES-18 sub-satellite point (137.2° W), however, so its pixel footprints are coarser than at nadir.

The trade-off: GLM's ~8–12 km positional accuracy (about 5–7.5 miles) is coarser than that of ground-based networks, which can locate strokes to within 500 m. For a 20-mile radius park filter this is operationally acceptable. A flash that is actually 15 miles away but measured as 14 miles away is still correctly classified as "within range." The error matters mainly for flashes measured within a few miles of the 10- and 20-mile thresholds, where it can place a flash on the wrong side of the line.
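The sketch below illustrates the boundary problem. It converts the quoted positional error to miles, then checks whether a measured distance is far enough from a threshold that the error cannot flip the classification. It treats the full 8–12 km figure as a worst-case error bound, which is an assumption for illustration. The helper names are hypothetical.

```python
KM_PER_MILE = 1.609344


def km_to_miles(km):
    return km / KM_PER_MILE


def classification_is_robust(measured_miles, threshold_miles, error_km):
    """True if the positional error cannot move the flash across the threshold."""
    return abs(measured_miles - threshold_miles) > km_to_miles(error_km)


for error_km in (8, 12):
    band = km_to_miles(error_km)
    robust = classification_is_robust(14.0, 20.0, error_km)
    print(f"{error_km} km = {band:.2f} mi; 14 mi vs 20 mi threshold robust: {robust}")
```

With the 8 km figure (4.97 mi), a flash measured at 14 miles is safely inside the 20-mile radius. With the 12 km figure (7.46 mi) it is not.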

tbl_lightning_whisperer: What Every Row Represents

Each row is one Level-2 Flash event that passed the spatial filter. The most important design decision in the schema is the addition of nearest_park and distance_miles: computed at ingest time so consumers never need to run geospatial math at query time.

| Field | Type | Meteorological / Engineering Meaning |
|---|---|---|
| strike_id | STRING | Stable unique ID per flash, derived from the source NetCDF filename + flash index. Safe for deduplication across ingestion runs; the idempotent upsert key. |
| timestamp | TIMESTAMP | UTC flash time. Central Florida is UTC-4 (EDT) in storm season. Peak convective hours: 1500–2000 UTC (11 AM–4 PM local). The service writes no rows for zero-flash windows, so missing hours mean no activity, not missing data. |
| latitude, longitude | FLOAT | Flash centroid in WGS84. Positional accuracy ~8–12 km. Central Florida lies well east of the GOES-18 sub-satellite point (137.2° W), so pixel footprints there are coarser than at nadir. |
| energy | FLOAT | Radiant energy in joules. Proxy for convective intensity: high-energy flashes indicate vigorous updrafts and dense ice-crystal zones in the mixed-phase layer. Nullable; treat as enrichment, not a required field. Used to compute avg_energy in the summary. |
| nearest_park | STRING | Pre-computed at ingest: which WDW park center (Magic Kingdom, EPCOT, Hollywood Studios, Animal Kingdom) is spatially closest. Enables instant per-park flash-count queries without any runtime geospatial computation. |
| distance_miles | FLOAT | Haversine distance from the flash centroid to the nearest park center (0.00–20.00 mi), pre-computed at ingest. The primary proximity metric used by the storm state machine (nearest_flash_miles <= 10 → ACTIVE, <= 20 → APPROACH). |
| type | STRING | Always "flash"; reserved for future product levels (group, event) if needed. |
| source_file | STRING | Source GOES-18 NetCDF filename. Embeds the exact satellite scan time in the filename pattern (GLM_L2_LCFA_G18_*.nc). Used for deduplication audits and provenance tracing. |
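One way to pin the schema down in ingestion code is a typed row object that enforces the documented ranges before anything is written. This is a minimal sketch. The class name, the validation rules (drawn from the park list and the 0–20 mile range in the table), and the `to_bq_dict` helper are assumptions, not the service's actual code.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

PARKS = {"Magic Kingdom", "EPCOT", "Hollywood Studios", "Animal Kingdom"}


@dataclass(frozen=True)
class LightningFlashRow:
    strike_id: str
    timestamp: datetime          # timezone-aware; stored as UTC
    latitude: float              # WGS84 flash centroid
    longitude: float
    energy: Optional[float]      # joules; nullable enrichment
    nearest_park: str
    distance_miles: float        # haversine miles to nearest_park, 0 to 20
    source_file: str
    type: str = "flash"

    def __post_init__(self):
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")
        if self.nearest_park not in PARKS:
            raise ValueError(f"unknown park: {self.nearest_park}")
        if not 0.0 <= self.distance_miles <= 20.0:
            raise ValueError(f"distance out of range: {self.distance_miles}")

    def to_bq_dict(self):
        """JSON-serializable row with the timestamp rendered as ISO 8601 UTC."""
        row = asdict(self)
        row["timestamp"] = self.timestamp.astimezone(timezone.utc).isoformat()
        return row


row = LightningFlashRow(
    strike_id="example-id",
    timestamp=datetime(2024, 7, 15, 19, 0, tzinfo=timezone.utc),
    latitude=28.35,
    longitude=-81.60,
    energy=None,
    nearest_park="Animal Kingdom",
    distance_miles=0.7,
    source_file="GLM_L2_LCFA_G18_example.nc",  # hypothetical filename
)
print(row.to_bq_dict()["timestamp"])  # 2024-07-15T19:00:00+00:00
```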

The Geographic Reference System for Walt Disney World

The WDW resort complex spans ~40 square miles. Its four major parks are up to about 4.5 miles apart, measured from Magic Kingdom to Hollywood Studios using the center coordinates below. The spatial system uses two layers of filtering: a rectangular bounding box for fast pre-filtering, and precise haversine distance calculations against each park's center coordinate.

Two-Stage Spatial Filter
Stage 1 — Bounding Box Pre-filter
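LAT: 28.21° N – 28.56° N
LON: 81.74° W – 81.40° W
Area: ~21 × 24 miles (E–W × N–S)
Applied to raw GLM data before any haversine math. The box covers roughly 500 square miles of a hemispheric field of view, so it discards nearly every flash in a file. Only flashes inside the box proceed to Stage 2. As specified, the box extends only about 9–10 miles beyond the outermost park center in each direction. It therefore clips much of the 10–20 mile band that Stage 2 would otherwise keep.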
Stage 2 — Per-Park Haversine Check (≤ 20 mi)
Magic Kingdom: 28.4189, -81.5811
EPCOT: 28.3747, -81.5494
Hollywood Studios: 28.3574, -81.5580
Animal Kingdom: 28.3553, -81.5902
KISM reference (Kissimmee Gateway Airport, the METAR reference station; not a park): 28.2897, -81.4372
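The two stages can be sketched in plain Python. The park centers and box bounds are the values listed above. The 3958.8-mile mean Earth radius, the function names, and rounding to two decimals are assumptions of this sketch.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (spherical model)

PARK_CENTERS = {
    "Magic Kingdom": (28.4189, -81.5811),
    "EPCOT": (28.3747, -81.5494),
    "Hollywood Studios": (28.3574, -81.5580),
    "Animal Kingdom": (28.3553, -81.5902),
}
# Stage 1 box: (lat_min, lat_max, lon_min, lon_max); negative longitude = west.
BBOX = (28.21, 28.56, -81.74, -81.40)
MAX_DISTANCE_MILES = 20.0


def in_bbox(lat, lon, bbox=BBOX):
    lat_min, lat_max, lon_min, lon_max = bbox
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def attribute_flash(lat, lon):
    """Return (nearest_park, distance_miles), or None if either stage rejects the flash."""
    if not in_bbox(lat, lon):
        return None  # Stage 1: cheap rectangle test
    park, dist = min(
        ((name, haversine_miles(lat, lon, plat, plon))
         for name, (plat, plon) in PARK_CENTERS.items()),
        key=lambda item: item[1],
    )
    if dist > MAX_DISTANCE_MILES:
        return None  # Stage 2: outside the 20-mile radius
    return park, round(dist, 2)


print(attribute_flash(28.35, -81.60))      # nearest park: Animal Kingdom, under a mile away
print(attribute_flash(28.5926, -81.5811))  # None: ~12 mi north of Magic Kingdom, outside the box
```

The second call shows the box as specified clipping the radius. A flash about 12 miles due north of Magic Kingdom is within 20 miles, but it falls outside the box, so Stage 1 drops it.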
Why pre-computing nearest_park and distance_miles at ingest is critical

The naive approach to answering "which flashes are near Magic Kingdom right now?" is to join tbl_lightning_whisperer against a park coordinates table and compute haversine distance at query time. For a 30-minute window with 200 flashes, this is fast. But the publisher runs every 15 minutes and queries this table every cycle, and the storm state machine also queries it — potentially dozens of computations per run.

Pre-computing at ingest means every query against the table is a simple WHERE distance_miles <= X AND nearest_park = 'Magic Kingdom', a plain comparison on two stored columns. There is no spatial join, no function call per row, and no repeated coordinate lookup. BigQuery evaluates such filters cheaply. If the table is clustered on these columns, it can also skip storage blocks that cannot match.

The nearest_park assignment also serves a display and analysis function: park-level flash counts ("5 flashes within 10 miles of Animal Kingdom in the last 30 minutes") are an instantly queryable aggregation. Without pre-computation, this would require a subquery per park per query.
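Because the attribution columns are stored, per-park counts reduce to a filter plus a GROUP BY. The sketch below holds an illustrative BigQuery query (named parameter `@max_miles`, with the table reference shortened as in the text) alongside an in-memory equivalent over rows already fetched. Both are assumptions about how the publisher might phrase the query, not its actual SQL.

```python
from collections import Counter

# Illustrative BigQuery Standard SQL; project/dataset prefix omitted.
PER_PARK_SQL = """
SELECT nearest_park, COUNT(*) AS flashes
FROM tbl_lightning_whisperer
WHERE `timestamp` >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE)
  AND distance_miles <= @max_miles
GROUP BY nearest_park
"""


def per_park_counts(rows, max_miles):
    """In-memory equivalent of PER_PARK_SQL for rows already limited to the window."""
    return Counter(row["nearest_park"] for row in rows if row["distance_miles"] <= max_miles)


rows = [
    {"nearest_park": "Animal Kingdom", "distance_miles": 4.2},
    {"nearest_park": "Animal Kingdom", "distance_miles": 8.9},
    {"nearest_park": "EPCOT", "distance_miles": 15.3},
]
print(per_park_counts(rows, 10.0))  # Counter({'Animal Kingdom': 2})
```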

The 20-mile radius: how it was calibrated to Florida storm behavior

Florida summer thunderstorms are typically fast-developing, isolated convective cells (not organized squall lines). A sea-breeze cell can go from initiation to first flash in 15–25 minutes, and typical storm motion across Central Florida is 10–20 mph — primarily east-to-west in the afternoon as Gulf sea-breeze moisture meets Atlantic sea-breeze moisture over the I-4 corridor.

At 15 mph storm motion, a cell 20 miles away will reach the park in approximately 80 minutes. This provides enough lead time for guests to receive a meaningful warning: the system's storm state transitions from APPROACH to ACTIVE as the storm closes within 10 miles, giving guests roughly 40 minutes of warning before the storm arrives. A larger radius (50 miles) would produce excessive false alerts for storms that dissipate before reaching the park. A smaller radius (10 miles) would produce warnings with insufficient lead time for organized park response.
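The lead-time arithmetic above is distance divided by speed. The sketch below assumes the cell moves straight toward the park at constant speed; real cells curve, change speed, or dissipate. The function name is hypothetical.

```python
def lead_time_minutes(distance_miles, storm_speed_mph):
    """Minutes until a cell at distance_miles arrives, if it heads straight for the park."""
    if storm_speed_mph <= 0:
        raise ValueError("storm speed must be positive")
    return distance_miles * 60 / storm_speed_mph


for speed in (10, 15, 20):
    approach = lead_time_minutes(20, speed)  # entering APPROACH at 20 mi
    active = lead_time_minutes(10, speed)    # entering ACTIVE at 10 mi
    print(f"{speed} mph: {approach:.0f} min from 20 mi, {active:.0f} min from 10 mi")
```

Across the quoted 10–20 mph range, the 10-mile ACTIVE threshold leaves roughly 30–60 minutes before arrival.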

The 10-mile threshold for the ACTIVE trigger is calibrated to park lightning safety protocols: many outdoor venues begin shelter-in-place procedures when lightning is within 8–10 miles. The 10-mile threshold allows the system to recommend seeking shelter approximately in-step with what the park's own safety team is doing.

GOES-18 NetCDF → BigQuery in 5 Steps

01
NOAA GOES-18 Feed
Every ~20 seconds, NOAA posts a new GLM L2 LCFA NetCDF file to its public HTTPS endpoint. Each file covers one 20-second interval, so about 30 files accumulate between 10-minute ingestion runs whether or not storms are active. Storms make the files larger, not more numerous.
02
Bounding Box Pre-filter
After download, all flash records outside the 28.21–28.56° N / 81.74–81.40° W box are discarded in memory before any haversine math. The box is a tiny fraction of the GLM field of view, so this removes nearly every record in a typical file, and all of them when no storms are near Central Florida.
03
Haversine + Park Attribution
For each surviving flash, compute haversine distance to all 4 park centers. Assign nearest_park (minimum distance park) and distance_miles. Discard if distance to nearest park exceeds 20 miles.
04
Strike ID Generation
Generate a stable strike_id from the NetCDF filename + flash index. This makes BigQuery upserts idempotent — re-running the same 10-minute window after a transient failure never creates duplicates.
05
BigQuery INSERT
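Batch-write surviving flashes to tbl_lightning_whisperer. For re-runs to be duplicate-free, the write must deduplicate on strike_id, for example with a MERGE keyed on strike_id. A plain INSERT would append the same rows again. The publisher queries this table every 15 minutes for the last 30 minutes, computing flash rate, nearest flash distance, and per-park totals.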
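Steps 04 and 05 depend on the ID being a pure function of its inputs. This minimal sketch assumes a SHA-256 hash of filename plus flash index. The hash choice, the 32-character truncation, the staging table name, and the helpers are all hypothetical. The dict-based `upsert` stands in for a MERGE keyed on strike_id, and the MERGE text shows one way to express that in BigQuery.

```python
import hashlib

# Illustrative BigQuery MERGE: insert only flashes whose strike_id is new.
# `staging_flashes` is a hypothetical table holding the current batch.
MERGE_SQL = """
MERGE tbl_lightning_whisperer AS t
USING staging_flashes AS s
ON t.strike_id = s.strike_id
WHEN NOT MATCHED THEN INSERT ROW
"""


def make_strike_id(source_file, flash_index):
    """Deterministic ID: the same file and index always yield the same strike_id."""
    key = f"{source_file}#{flash_index}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]


def upsert(table, rows):
    """Keyed write: re-sending a row overwrites it instead of appending a duplicate."""
    for row in rows:
        table[row["strike_id"]] = row
    return table


source = "GLM_L2_LCFA_G18_example.nc"  # hypothetical filename
batch = [{"strike_id": make_strike_id(source, i), "flash_index": i} for i in range(3)]
table = {}
upsert(table, batch)
upsert(table, batch)  # simulated re-run of the same window after a failure
print(len(table))  # 3
```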

Fusing Lightning + METAR + NWS Into Five Actionable States

The publisher computes a unified storm state every 15 minutes by fusing six inputs: GLM flash data, METAR thunderstorm flags from 17 stations, NWS alert status, storm tracking vectors, NWS hourly probability-of-thunderstorm forecasts, and SPC categorical outlook risk levels. Apart from an NWS Tornado Warning, no single source is meant to trigger a high-urgency state on its own. The system looks for confirming signals.

CLEAR
No GLM activity · No METAR TS · No NWS watch · SPC no-risk · NWS 6hr <30%
WATCH
SPC Slight Risk+ · NWS 6hr ≥30% · NWS watch issued · SPC TSTM background
APPROACH
Radar tracking APPROACHING · NWS warning issued · GLM <20 mi · Any METAR TS · SPC Enhanced+ · NWS 3hr ≥50%
ACTIVE
METAR TS within 35 mi · Max risk ≥75 + near storm · GLM <10 mi w/ flash rate >0 · NWS warning + any GLM · NWS 1hr ≥70%
SEVERE
Tornado Warning issued · GLM flash rate ≥30/hr (intense convection)
Why the state machine requires confirming signals rather than single-source triggers

Each individual data source has failure modes that would produce false alerts if used alone. METAR TS flags depend on the station observer (some stations are automated and can miss events). The NWS alert API has latency after the forecast office issues a warning. The GLM can capture residual flashes from a dissipating cell that's already past the park. The SPC categorical outlook is a daily issuance that doesn't know about rapidly-developing individual cells.

By requiring multiple confirming signals for the ACTIVE and SEVERE states, the system avoids triggering high-urgency recommendations from any single stale or anomalous data point. The one deliberate exception is the Tornado Warning path into SEVERE. An NWS Tornado Warning (a human forecaster decision) is treated as immediately actionable without GLM confirmation, because the cost of a missed tornado warning far exceeds the cost of a false severe alert.

The storm_is_near logic also applies a conservative default: if the tracking distance is unknown (null), the storm is treated as near. Wrongly assuming that an unlocated storm is distant is the more dangerous error, because a missed close storm costs far more than an unnecessary alert for a distant one.
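Below is a simplified sketch of the state logic. It uses only a subset of the listed signals: GLM rate and distance, a tracking distance with a max risk score, METAR TS, NWS watch and warning, and Tornado Warning. Forecast probabilities, SPC outlooks, and the METAR station-distance condition are omitted. The signal names, the 10-mile `storm_is_near` threshold, and the rule ordering are assumptions. The sketch shows the confirming-signal structure and the null-distance default; it does not reproduce the publisher.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class StormState(IntEnum):
    CLEAR = 0
    WATCH = 1
    APPROACH = 2
    ACTIVE = 3
    SEVERE = 4


@dataclass
class Signals:
    flash_rate_per_hour: float = 0.0
    nearest_flash_miles: Optional[float] = None      # None: no recent flash
    tracking_distance_miles: Optional[float] = None  # None: storm not located
    max_risk: float = 0.0
    metar_ts: bool = False
    nws_watch: bool = False
    nws_warning: bool = False
    tornado_warning: bool = False


def storm_is_near(distance_miles, threshold_miles=10.0):
    """Conservative default: an unknown (None) distance counts as near."""
    return distance_miles is None or distance_miles <= threshold_miles


def glm_within(distance_miles, threshold_miles):
    """GLM proximity needs an actual flash, so None never counts as within range."""
    return distance_miles is not None and distance_miles <= threshold_miles


def compute_state(s):
    has_glm = s.flash_rate_per_hour > 0
    # Tornado Warning needs no GLM confirmation.
    if s.tornado_warning or s.flash_rate_per_hour >= 30:
        return StormState.SEVERE
    if ((has_glm and glm_within(s.nearest_flash_miles, 10))
            or (s.nws_warning and has_glm)
            or (s.max_risk >= 75 and storm_is_near(s.tracking_distance_miles))):
        return StormState.ACTIVE
    if s.nws_warning or s.metar_ts or glm_within(s.nearest_flash_miles, 20):
        return StormState.APPROACH
    if s.nws_watch:
        return StormState.WATCH
    return StormState.CLEAR


print(compute_state(Signals()).name)  # CLEAR
```

With all defaults (no flashes, null distances), the result is CLEAR. The null-distance default only matters once another signal, here the max risk score, is already elevated.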

Adaptive fast-poll: what happens to refresh cadence during active storms

During CLEAR and WATCH states, the publisher runs on its standard 15-minute Cloud Scheduler cadence. When storm state reaches APPROACH or higher, the system triggers an adaptive fast-poll mode: the GCS JSON output is published with a cache_control: public, max-age=60 header (60-second browser cache) instead of the normal 300-second cache. Consuming front-ends polling the GCS endpoint will automatically pick up state changes within 1 minute of publication instead of 5 minutes.

This is not additional API calls — it's cache header tuning. The publisher still runs on the same Cloud Scheduler trigger. The change ensures that once a cell enters the 20-mile radius and triggers APPROACH, the park visitor's front-end sees the state update within the next 60-second browser refresh cycle rather than waiting up to 5 minutes for a stale cached response.
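The header choice reduces to a lookup on the current state. The names below are hypothetical; the 60 s and 300 s values come from the text. With the google-cloud-storage client, a string like this would typically be assigned to a blob's `cache_control` property before the JSON is uploaded.

```python
FAST_POLL_STATES = {"APPROACH", "ACTIVE", "SEVERE"}
FAST_MAX_AGE_S = 60
NORMAL_MAX_AGE_S = 300


def cache_control_for(state):
    """Cache-Control header for the published JSON, shortened during storms."""
    max_age = FAST_MAX_AGE_S if state in FAST_POLL_STATES else NORMAL_MAX_AGE_S
    return f"public, max-age={max_age}"


for state in ("CLEAR", "WATCH", "APPROACH", "SEVERE"):
    print(state, cache_control_for(state))
```

The publisher still runs every 15 minutes. The shorter max-age therefore limits how stale a client's copy can be after each publication; it does not make new states appear more often.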

Translating Flash Rate into Per-Ride Closure Probability

The ride weather profiles encode empirical closure thresholds derived from historical closure data. Each profile uses GLM flash rate (flash_rate_per_hour) and nearest flash distance (nearest_flash_miles) as the primary triggers — not just METAR thunderstorm flags, which are binary and lack the spatial precision needed for per-ride decisions.

| Category | Example Rides | TS Closure % | GLM Trigger (flashes/hr) | GLM Trigger (distance) | Primary Factor |
|---|---|---|---|---|---|
| Meet & Greet Outdoor | Meet Ariel at Her Grotto | 37.5% | ≥5 | ≤10 mi | Lightning |
| Indoor Shows Weather-Aware | Canada Far and Wide, Awesome Planet, PhilharMagic | 13–29% | ≥10 | ≤10 mi | Severe thunderstorm |
| Outdoor Walkthrough | Swiss Family Treehouse, Dumbo, Tomorrowland Speedway | 11–14% | ≥5 | ≤10 mi | Wind + lightning |
| Open-Air Attractions | WDW Railroad, Journey of Water, Kali River Rapids | 9–10% | ≥5 | ≤10 mi | Lightning |
| Outdoor Coasters (moderate) | Seven Dwarfs Mine Train, Barnstormer, Astro Orbiter | 4–8% | ≥5 | ≤10 mi | Lightning + wind |
| Outdoor Coasters (low) | Magic Carpets, Rock 'n' Roller Coaster, Nemo | 3–4% | ≥10 | ≤10 mi | Severe lightning only |
| Indoor Dark Rides | Space Mountain, Haunted Mansion, Pirates, Rise | 0% | Not applicable | Not applicable | Operational only |
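The GLM trigger columns translate into a simple rule per category. A ride category is flagged when the flash rate meets its minimum and the nearest flash is within its distance. The sketch below uses hypothetical names and encodes the thresholds from the table. The text does not describe how a met trigger is combined with the historical TS closure percentage into a final probability, so the sketch stops at the trigger.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlmTrigger:
    min_flash_rate: float      # flashes per hour
    max_distance_miles: float


# Thresholds from the ride table; None means no GLM trigger (indoor dark rides).
GLM_TRIGGERS = {
    "Meet & Greet Outdoor": GlmTrigger(5, 10),
    "Indoor Shows Weather-Aware": GlmTrigger(10, 10),
    "Outdoor Walkthrough": GlmTrigger(5, 10),
    "Open-Air Attractions": GlmTrigger(5, 10),
    "Outdoor Coasters (moderate)": GlmTrigger(5, 10),
    "Outdoor Coasters (low)": GlmTrigger(10, 10),
    "Indoor Dark Rides": None,
}


def glm_trigger_met(category, flash_rate_per_hour, nearest_flash_miles: Optional[float]):
    """True when both the rate and distance conditions for the category hold."""
    trigger = GLM_TRIGGERS[category]
    if trigger is None or nearest_flash_miles is None:
        return False  # no GLM trigger for this category, or no recent flash at all
    return (flash_rate_per_hour >= trigger.min_flash_rate
            and nearest_flash_miles <= trigger.max_distance_miles)


# 6 flashes/hr with the nearest flash 8 miles out:
print(glm_trigger_met("Meet & Greet Outdoor", 6, 8.0))    # True
print(glm_trigger_met("Outdoor Coasters (low)", 6, 8.0))  # False
```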
The outage tracker: correlating real-time ride status with weather signals

The outage tracker (weather-ride-outage-tracker) is a separate Cloud Run Job that runs every 5 minutes. It queries the live rides-down view (v_rides_currently_down) for rides currently flagged DOWN during park operating hours, then compares each downed ride's closure time against the simultaneous weather snapshot from v_weather_current_enhanced at KISM.

If the thunderstorm_risk_score was ≥55 or wind_risk_score was ≥55 at the time the ride went down, the closure is tagged as likely weather-driven (LIKELY_WEATHER_LIGHTNING, LIKELY_WEATHER_WIND, or LIKELY_WEATHER_GENERAL) and written to tbl_weather_ride_outage_log with a UUID derived from the entity ID and down timestamp — making the insert idempotent.
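Below is a sketch of the tagging and ID steps. It uses Python's name-based `uuid.uuid5`, so the same entity and down timestamp always produce the same ID. The text does not say how LIKELY_WEATHER_GENERAL is chosen; treating it as "both scores at or above 55" is an assumption, as are the namespace string and the entity ID format.

```python
import uuid
from datetime import datetime, timezone

RISK_THRESHOLD = 55
# Hypothetical namespace so IDs are stable across runs and machines.
OUTAGE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "weather-ride-outage-tracker")


def classify_outage(thunderstorm_risk_score, wind_risk_score):
    """Weather tag for a closure, or None if neither score reached the threshold."""
    lightning = thunderstorm_risk_score >= RISK_THRESHOLD
    wind = wind_risk_score >= RISK_THRESHOLD
    if lightning and wind:
        return "LIKELY_WEATHER_GENERAL"  # assumption: both signals elevated
    if lightning:
        return "LIKELY_WEATHER_LIGHTNING"
    if wind:
        return "LIKELY_WEATHER_WIND"
    return None


def outage_id(entity_id, ride_down_at):
    """Deterministic UUID from entity ID + down timestamp, so re-inserts collide."""
    return str(uuid.uuid5(OUTAGE_NAMESPACE, f"{entity_id}|{ride_down_at.isoformat()}"))


down = datetime(2024, 7, 15, 19, 5, tzinfo=timezone.utc)
print(classify_outage(68, 30))  # LIKELY_WEATHER_LIGHTNING
```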

The log is used to build the historical closure correlation that feeds the ride weather sensitivity percentages in the profiles table. Over time, each ride accumulates a record of how frequently it was down under various weather conditions, which provides the empirical foundation for updating the sensitivity thresholds.

Outage lifecycle

An outage is opened when the ride goes down during active weather. The tracker checks open outages every 5 minutes and closes them (sets ride_up_at) when the ride reappears in the active rides list. This produces duration_minutes per weather closure — an operational metric that quantifies how long lightning-driven closures typically last by ride category.
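The close-out pass can be sketched as a function over the open outages and the set of rides currently reported up; field and function names are hypothetical. Because the tracker runs every 5 minutes, `ride_up_at` here is the time of the check that noticed the recovery. A recorded duration can therefore exceed the true one by up to one polling interval.

```python
from datetime import datetime, timedelta, timezone


def close_recovered(open_outages, operating_entity_ids, checked_at):
    """Split open outages into still-open and closed (with ride_up_at and duration)."""
    still_open, closed = [], []
    for outage in open_outages:
        if outage["entity_id"] in operating_entity_ids:
            minutes = (checked_at - outage["ride_down_at"]).total_seconds() / 60
            closed.append({**outage, "ride_up_at": checked_at,
                           "duration_minutes": round(minutes, 1)})
        else:
            still_open.append(outage)
    return still_open, closed


down = datetime(2024, 7, 15, 19, 5, tzinfo=timezone.utc)
outages = [
    {"entity_id": "ride_a", "ride_down_at": down},
    {"entity_id": "ride_b", "ride_down_at": down},
]
still_open, closed = close_recovered(outages, {"ride_a"}, down + timedelta(minutes=45))
print(closed[0]["duration_minutes"], len(still_open))  # 45.0 1
```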

Why GLM flash rate is used instead of METAR is_thunderstorm for ride decisions

METAR is_thunderstorm is a binary flag derived from thunderstorm codes in the METAR string: TS and VCTS in the present-weather group, and LTG in the remarks. It is recorded by the station observer or automated sensor and reflects the weather at or near the station at observation time. It tells you "a thunderstorm is happening at KISM," but not how active the cell is, how close the lightning is, or how rapidly flashes are occurring.

GLM data provides three graded signals that the binary METAR flag cannot:

- **Rate:** flashes per hour, as a storm-intensity proxy.
- **Proximity:** nearest flash in miles, as a spatial threat measure.
- **Trajectory:** whether the nearest-flash distance shrinks over successive readings.

A flash rate of 2/hr with the nearest flash at 18 miles is a different situation from 25/hr at 6 miles. Both could appear as is_thunderstorm=True at the station, but the risk profiles and recommended actions are entirely different.
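To make the contrast concrete, here is a deliberately simplified METAR thunderstorm check, shown next to the two GLM scenarios from the paragraph. It looks for TS or VCTS in the body and LTG in the remarks. The parser is a hypothetical sketch that ignores many METAR edge cases, and the sample METAR strings are invented.

```python
def is_thunderstorm(metar):
    """Binary flag: TS/VCTS present-weather codes in the body, or LTG in remarks."""
    body, _, remarks = metar.partition(" RMK ")
    for token in body.split()[1:]:  # skip the station identifier
        code = token.lstrip("+-")
        if code.startswith("TS") or code.startswith("VCTS"):
            return True
    return any(token.startswith("LTG") for token in remarks.split())


# Invented observations: same binary flag, very different GLM pictures.
scenarios = [
    ("KISM 151953Z 27010KT 10SM VCTS FEW040CB 31/23 A2998 RMK AO2 LTG DSNT W", 2, 18.0),
    ("KISM 152053Z 24018G28KT 3SM +TSRA BKN025CB 26/23 A2994 RMK AO2 LTG OHD", 25, 6.0),
]
for metar, rate, nearest in scenarios:
    print(is_thunderstorm(metar), f"{rate} flashes/hr, nearest {nearest} mi")
```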

For the outdoor attractions with specific GLM thresholds (e.g., Meet Ariel closes at 5+ flash/hr within 10 miles), the flash rate and distance combination provides the precision necessary to distinguish "we should close this" from "we should monitor this." Binary METAR flags alone cannot support this level of specificity.

Seasonal operation: why the ingestion service is paused November–March

Central Florida's lightning season follows Florida's convective season: roughly April through October, driven by the Gulf-to-Atlantic sea-breeze convergence and high ambient moisture. Lightning occurrence outside this window is minimal — isolated events exist but the storm frequency is too low to justify continuous data ingestion costs.

The lightning-watcher-optimized Cloud Run job is triggered by trigger-lightning-watcher-optimized Cloud Scheduler, which is paused November through March with a single gcloud command. This eliminates the associated BigQuery write and compute costs during the off-season without requiring any code changes or infrastructure teardown. Re-enabling for storm season is a one-command operation. The BigQuery table persists across seasons with all historical data intact.
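The seasonal toggle can be scripted. The sketch below builds the `gcloud scheduler jobs pause` or `resume` command for the job named above, with the month range mirroring the April–October season. The location value is a placeholder because the text does not name the region, and the helper names are hypothetical. By default it only returns the command (a dry run).

```python
import subprocess

JOB_NAME = "trigger-lightning-watcher-optimized"


def is_lightning_season(month):
    """April through October, the convective season described in the text."""
    return 4 <= month <= 10


def scheduler_command(in_season, location, job=JOB_NAME):
    action = "resume" if in_season else "pause"
    return ["gcloud", "scheduler", "jobs", action, job, f"--location={location}"]


def apply_season(month, location, dry_run=True):
    """Return (and optionally run) the gcloud command matching the season."""
    command = scheduler_command(is_lightning_season(month), location)
    if not dry_run:
        subprocess.run(command, check=True)
    return command


print(" ".join(apply_season(11, "REGION")))  # REGION is a placeholder
```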

The storm state machine handles the off-season gracefully. With the lightning table returning no recent flashes, flash_rate_per_hour is 0 and nearest_flash_miles is null. The conservative "treat as near" default for null distances applies only within the ACTIVE state computation. Without accompanying METAR thunderstorm flags or NWS warnings, the state correctly stays at CLEAR or WATCH, depending on forecast conditions.
