Weather Whisperer — BigQuery ML

Architecture

5-Layer GCP Stack

External weather APIs → Cloud Run ingestion jobs → BigQuery raw tables → curated views → JSON publisher → GCS public bucket → Azure web dashboards. Each layer is independently scalable and separately schedulable via Cloud Scheduler.

Data Sources

aviationweather.gov (METAR) University of Wyoming (Radiosonde) api.weather.gov (NWS) 3 external APIs

Ingestion

Cloud Run Jobs metar-ingestion-job atmospheric-ingestion-job forecast-ingestion-job Cloud Scheduler

Analytics / ML

BigQuery weather_analytics dataset 5 raw tables (partitioned + clustered) 5 curated views Ride impact ML models

Publishing

weather-json-publisher insights-generator-job (daily) manifest.json generation 15-min publish cycle

Storage / CDN

GCS public bucket CORS-enabled station/{id}/latest.json rides/{id}/impact_latest.json CDN-ready

Consumption

Azure Static Web Apps Azure Web Apps Direct HTTPS from GCS Cross-cloud JSON fetch

IaC / Ops

Terraform (15 files) Secret Manager (API keys) 3 service accounts (least-privilege) Preflight + validation scripts

Data Flow

From Raw Observation to Ride Recommendation

How METAR bytes become an OPERATE/MONITOR/CLOSE decision for a roller coaster.

Multi-Source Ingestion — 3 Parallel Jobs

METAR job (every 15 min): fetches surface observations from aviationweather.gov — temperature, dewpoint, pressure, wind, precipitation, visibility. Derives rolling 3hr/6hr precip, 1hr pressure change, 1hr wind change, and boolean adverse-condition flags (thunderstorm, fog, high wind, heavy rain).

Atmospheric job (every 12 hr): fetches University of Wyoming radiosonde upper-air soundings — CAPE, CIN, Lifted Index, K-Index, Total Totals, storm-relative helicity, precipitable water. These instability indices drive the atmospheric stability component of the ride impact score.

Forecast job (every hour): fetches NWS point forecasts — temperature, feels-like, humidity, precip rate, cloud cover, wind, weather condition category.

Cloud Run Jobs Cloud Scheduler Exponential backoff

BigQuery Raw Tables — Partitioned + Clustered

All 3 sources land in dedicated BigQuery tables partitioned by time and clustered by station ID. Partition filters prevent full table scans — essential for cost control at scale. The 5 schemas cover 26 fields in metar_observations, 12 in weather_forecasts, and 22 instability metrics in wyoming_atmospheric_profiles.

DAY partitioning station_id clustering 5 tables

ML Ride Impact Scoring — Daily Insights Job

The insights-generator-job (runs daily at 06:00 UTC) joins current METAR observations with each ride's weather profile (ride_weather_profiles table: per-ride wind/rain/temperature/lightning thresholds and weather sensitivity score). Produces 5 component impact scores — wind, rain, lightning, temperature, atmospheric stability — normalized 0–1. Aggregates to overall_impact_score and emits a string recommendation: OPERATE, MONITOR, or CLOSE with a natural-language reason.

ride_weather_profiles ride_impact_scores 5 impact components OPERATE / MONITOR / CLOSE

Curated BigQuery Views

5 views over the raw tables produce pre-joined, web-ready documents: latest station conditions, latest regional geojson, last-6-hour weather events, latest atmospheric sounding, and next-24-hour forecast. Views serve as the source of truth for the publisher — one query per view, no ad-hoc SQL in the publish job.

curated_station_latest curated_region_latest curated_forecast_next_24h curated_atmosphere_latest

JSON Publisher → GCS → Azure Dashboards

The publisher job (every 15 min) queries each curated view and serializes to structured JSON files in the public GCS bucket. CORS configured for web access. Azure Static Web Apps and Web Apps fetch directly from GCS over HTTPS — cross-cloud, no proxy, no CDN cost unless needed. A manifest.json at the bucket root lists all objects with last-modified timestamps for client-side freshness checks.

GCS public bucket CORS enabled manifest.json Azure web apps

ML Model Design

5-Component Ride Impact Score

Each ride has a ride_weather_profiles record encoding its physical thresholds and sensitivity. The impact model joins live conditions against these profiles to produce normalized 0–1 component scores and a final OPERATE / MONITOR / CLOSE recommendation.

💨

Wind Impact

Current wind speed (knots) vs. per-ride wind_threshold_kt. Gust detection via 1-hour change flag.

0 → 1 score

🌧️

Rain Impact

Hourly precipitation rate vs. per-ride rain_threshold_in_hr. Rolling 3hr/6hr totals for sustained rain detection.

0 → 1 score

⚡

Lightning Impact

Thunderstorm boolean flag from METAR adverse conditions. Per-ride lightning_distance_mi threshold. Atmospheric instability (CAPE, K-Index) as leading indicator.

0 → 1 score

🌡️

Temperature Impact

Current temp vs. per-ride temperature_min_f and temperature_max_f operating envelope. Feels-like temperature from NWS forecasts.

0 → 1 score

📡

Atmospheric Impact

Radiosonde-derived instability indices: CAPE (J/kg), Lifted Index, Showalter Index, Total Totals, storm-relative helicity. Early warning layer before surface conditions change.

0 → 1 score

🎢

Overall Score → Decision

Weighted aggregate of 5 components, scaled by per-ride weather_sensitivity_score (0–1). Threshold bands: OPERATE (low), MONITOR (medium), CLOSE (high).

OPERATE · MONITOR · CLOSE

ride_weather_profiles Schema — Per-Ride Thresholds

Field	Type	Description
ride_id	STRING	Unique ride identifier
ride_type	STRING	roller_coaster · water_ride · outdoor_show · …
wind_threshold_kt	FLOAT64	Max operating wind speed (knots)
lightning_distance_mi	FLOAT64	Closure radius for lightning (miles)
rain_threshold_in_hr	FLOAT64	Max operating precip rate (in/hr)
temperature_min_f	FLOAT64	Minimum operating temperature (°F)
temperature_max_f	FLOAT64	Maximum operating temperature (°F)
weather_sensitivity_score	FLOAT64	Impact weight multiplier (0–1)
outdoor_exposure	BOOL	Significant outdoor exposure flag

BigQuery Schemas

5 Tables — Partitioned + Clustered

All raw tables use DAY partitioning on the time field and clustering on station_id — queries with time and station filters never scan more than the relevant partition slices.

metar_observations — 26 fields, partitioned by observation_time+

Field	Type	Description
station_id	STRING	ICAO station identifier (e.g. KORL)
observation_time	TIMESTAMP	UTC observation time — partition key
temp_f / dewpoint_f	FLOAT64	Temperature and dewpoint (°F)
relative_humidity	FLOAT64	Relative humidity (%)
pressure_hpa	FLOAT64	Station pressure (hPa)
wind_speed_kt / wind_gust_kt	FLOAT64	Wind speed and gust (knots)
precip_rolling_3hr / 6hr	FLOAT64	Rolling precipitation totals (derived)
pressure_change_1hr	FLOAT64	1-hour pressure trend (derived)
is_thunderstorm / is_fog / is_mist	BOOL	Condition flags from METAR present weather
adverse_heavy_rain / thunderstorm / fog / high_wind	BOOL	Adverse event flags for quick filtering

wyoming_atmospheric_profiles — 22 instability fields, partitioned by sounding_timestamp+

Field	Type	Description
cape_j_kg	FLOAT64	Convective Available Potential Energy (J/kg) — thunderstorm potential
cin_j_kg	FLOAT64	Convective Inhibition (J/kg) — cap strength
lifted_index	FLOAT64	Lifted Index — negative = unstable, <-6 = severe
k_index	FLOAT64	K-Index — thunderstorm probability (>40 = high)
total_totals_index	FLOAT64	Total Totals — >55 indicates severe potential
srh_j_kg	FLOAT64	Storm-Relative Helicity (J/kg) — tornado potential
pwat_inches	FLOAT64	Precipitable water — flood potential
lcl / lfc / el pressure	FLOAT64	Lifted condensation level, free convection level, equilibrium level (hPa)
thickness_1000_500_m	FLOAT64	1000-500mb thickness — rain vs snow discriminator

weather_forecasts — 12 fields, partitioned by forecast_timestamp+

Field	Type	Description
forecast_timestamp	TIMESTAMP	Forecast valid time (UTC) — partition key
temperature_f / apparent_temperature_f	FLOAT64	NWS temperature and feels-like (°F)
precipitation_intensity_mm_hr	FLOAT64	Forecast precip rate (mm/hr)
cloud_cover_percent	FLOAT64	Cloud cover (%)
weather_condition_category	STRING	NWS categorical condition string

Curated Views + GCS Output

5 Views → 7 GCS JSON Paths

Each curated view is a stable, pre-joined query over the raw tables. The publisher serializes each to a structured JSON file at a predictable GCS path — web apps fetch by path, no API layer needed.

curated_station_latest

Latest METAR for each monitored station — temp, wind, precip, adverse flags, rolling totals.

→ station/{station_id}/latest.json

curated_region_latest

All stations aggregated to a regional GeoJSON feature collection — map layer for dashboard overlays.

→ region/latest.geojson

curated_event_window_last_6h

Weather events (adverse conditions) in the past 6 hours — event log feed for the operational dashboard.

→ events/last_6h.json

curated_atmosphere_latest

Latest radiosonde sounding — CAPE, CIN, instability indices, precipitable water. Severe weather outlook.

→ atmosphere/latest.json

curated_forecast_next_24h

NWS hourly forecast for the next 24 hours — temperature, precipitation, cloud cover, condition category.

→ forecast/next_24h.json

ride_impact_scores

Per-ride ML impact scores — 5 component scores + overall score + OPERATE/MONITOR/CLOSE + reason string.

→ rides/{ride_id}/impact_latest.json

manifest.json

Bucket root manifest listing all objects with last-modified timestamps. Client-side freshness check before fetching.

→ manifest.json (root)

Cloud Run Jobs

5 Scheduled Jobs

All compute runs as Cloud Run Jobs — no always-on instances, billed per execution. Cloud Scheduler drives the cadence; each job is independently deployable via Terraform.

Every 15 minutes

metar-ingestion-job

Fetches current METAR observations from aviationweather.gov for all monitored Central Florida stations. Derives rolling precip, pressure change, and adverse condition booleans before writing to BigQuery.

Cloud Run→ BigQuery

Every 12 hours

atmospheric-ingestion-job

Fetches University of Wyoming radiosonde upper-air soundings (00Z + 12Z). Parses 22 atmospheric stability indices — CAPE, CIN, Lifted Index, K-Index, helicity, precipitable water. Writes to wyoming_atmospheric_profiles.

Cloud Run→ BigQuery

Every 1 hour

forecast-ingestion-job

Fetches NWS point forecasts for monitored stations. Parses hourly temperature, precip rate, cloud cover, wind, condition categories for the next 24-hour window.

Cloud Run→ BigQuery

Every 15 minutes

weather-json-publisher

Queries all 5 curated BigQuery views and serializes results to structured JSON. Uploads to GCS public bucket with CORS headers. Updates manifest.json. Azure web apps fetch directly from GCS on the next page load.

Cloud RunBigQuery →GCS

Daily — 06:00 UTC

insights-generator-job

Joins current METAR + atmospheric data with ride_weather_profiles. Computes 5 per-component impact scores, aggregates to overall_impact_score, emits OPERATE/MONITOR/CLOSE recommendations with natural-language reason strings to ride_impact_scores table.

Cloud RunBigQuery ML

Infrastructure & Operations

15 Terraform Files · 6-Step Deployment

Full Terraform IaC with separate tfvars for production. 6 numbered scripts take a fresh GCP project to running production in ~30 minutes.

Terraform

15 IaC Files

main.tf, bigquery.tf, bigquery_views.tf, storage.tf, cloud_run_jobs.tf, cloud_scheduler.tf, service_accounts.tf, iam.tf. Production tfvars template included.

Service Accounts

3 Least-Privilege SAs

weather-ingestion-sa: BigQuery dataEditor + jobUser + SecretManager accessor.
weather-publisher-sa: BigQuery dataViewer + GCS objectAdmin.
weather-ml-trainer-sa: BigQuery dataEditor + GCS objectViewer.

Step 01

01_preflight_check.sh

Validates GCP auth, required APIs enabled, gcloud + Terraform + Docker installed, project billing active. Exits with clear list of blockers before touching any cloud resource.

Step 02–03

Infrastructure + Applications

02_deploy_infrastructure.sh: Terraform apply (~10 min). 03_deploy_applications.sh: Docker build + Cloud Run job deployments (~15 min).

Step 04–05

Validate + Monitor

04_validate_deployment.sh: triggers all jobs, checks BigQuery row counts, verifies GCS objects. 05_setup_monitoring.sh: deploys CloudWatch dashboard JSON + alert policies.

Cost

$12–35/month

BigQuery $5–15 (storage + queries with partition filters). Cloud Run $5–10 (per-execution only). GCS $1–3. Cloud Scheduler $0.10. Alert threshold: >$100/month triggers notification.

Metric	Target	Alert Threshold	Severity
Job success rate	> 99%	< 95%	P1
METAR data freshness	< 20 min	> 30 min	P1
Job execution duration	< 2 min	> 5 min	P2
BigQuery storage	< 10 GB	> 50 GB	P3
GCS object count	~200	< 50 or > 1000	P2
Monthly cost	< $35	> $100	P2
HTTP 429 rate (external APIs)	< 1/hour	> 10/hour	P2 — exponential backoff active

Design Decisions

Key Engineering Choices

Why BigQuery for ML rather than a dedicated model serving layer?+

The ride impact scoring problem has a small feature set (9 input fields per ride × 5 weather variables) and a deterministic scoring function. BigQuery ML handles this as a SQL join + arithmetic, not a trained neural network. Using BigQuery keeps everything in one platform — ingest, store, transform, score, and export — with no model serving infrastructure to maintain. The "ML" here is structured feature engineering and threshold-based scoring, which is exactly the right tool when the physics is well-understood (wind knots vs. a roller coaster's known closure threshold) and explainability is required (the recommendation_reason string must be human-readable for operators).

Why radiosonde data (upper-air soundings) in addition to surface METAR?+

CAPE, CIN, and Lifted Index from upper-air soundings are leading indicators of convective development — they signal thunderstorm potential 3–6 hours before surface conditions deteriorate. METAR alone would only react to thunderstorms after they're already overhead. The atmospheric component of the ride impact score gives operations staff advance warning: a high CAPE + low CIN sounding at 12Z means the afternoon convective risk is elevated even if current surface conditions look fine. The University of Wyoming archives soundings at 00Z and 12Z — the 12-hour ingestion schedule matches this availability window exactly.

Why publish JSON to GCS instead of an API endpoint?+

The consuming application is hosted on Azure. A GCP API Gateway in front of BigQuery would add ~$50–100/month in API call costs and a round-trip to GCP on every dashboard load. Instead, the publisher job runs every 15 minutes and writes pre-computed JSON to a CORS-enabled public GCS bucket. Azure web apps fetch the JSON directly — no GCP API calls at read time. GCS standard storage for ~200 objects costs under $1/month. The manifest.json lets the client skip fetching if objects haven't changed since the last poll. This architecture decouples the compute (GCP) from the consumption (Azure) with no persistent connection required.

Why partition + cluster BigQuery tables? What does it save?+

METAR observations accumulate at ~96 records/day per station (every 15 min). At 10 monitored stations, that's ~960 rows/day, ~350,000 rows/year — small in absolute terms, but the curated views query these tables on every publisher run (every 15 min = 96 times/day). Without partitioning, BigQuery would scan the full table on every query. DAY partitioning on observation_time with clustering on station_id means the curated_station_latest view only scans today's partition (~960 rows) and skips all other stations via clustering — reducing per-query bytes scanned from GB to KB. BigQuery charges $5/TB scanned; partition + clustering drops the query cost to near zero.

Why Cloud Run Jobs instead of Cloud Functions or always-on services?+

Cloud Run Jobs are the right primitive for batch/scheduled workloads that run for seconds to minutes and then exit. Each ingestion job runs for <2 minutes every 15–60 minutes — an always-on container or Cloud Function with a long timeout would be idle >95% of the time. Cloud Run Jobs bill per CPU-second of execution, not per allocation time. Cloud Functions have a 9-minute max timeout — the atmospheric ingestion job parses complex sounding data and could approach this. Cloud Run Jobs have a 1-hour timeout with no per-invocation startup overhead once the container is cached. Cloud Scheduler triggers each job on its own schedule independently.

Weather Whisperer
BigQuery ML for Ride Operations

5-Layer GCP Stack

From Raw Observation to Ride Recommendation

Multi-Source Ingestion — 3 Parallel Jobs

BigQuery Raw Tables — Partitioned + Clustered

ML Ride Impact Scoring — Daily Insights Job

Curated BigQuery Views

JSON Publisher → GCS → Azure Dashboards

5-Component Ride Impact Score

Wind Impact

Rain Impact

Lightning Impact

Temperature Impact

Atmospheric Impact

Overall Score → Decision

ride_weather_profiles Schema — Per-Ride Thresholds

5 Tables — Partitioned + Clustered

5 Views → 7 GCS JSON Paths

curated_station_latest

curated_region_latest

curated_event_window_last_6h

curated_atmosphere_latest

curated_forecast_next_24h

ride_impact_scores

manifest.json

5 Scheduled Jobs

metar-ingestion-job

atmospheric-ingestion-job

forecast-ingestion-job

weather-json-publisher

insights-generator-job

15 Terraform Files · 6-Step Deployment

15 IaC Files

3 Least-Privilege SAs

01_preflight_check.sh

Infrastructure + Applications

Validate + Monitor

$12–35/month

Operational SLOs

Key Engineering Choices

Weather WhispererBigQuery ML for Ride Operations

5-Layer GCP Stack

From Raw Observation to Ride Recommendation

Multi-Source Ingestion — 3 Parallel Jobs

BigQuery Raw Tables — Partitioned + Clustered

ML Ride Impact Scoring — Daily Insights Job

Curated BigQuery Views

JSON Publisher → GCS → Azure Dashboards

5-Component Ride Impact Score

Wind Impact

Rain Impact

Lightning Impact

Temperature Impact

Atmospheric Impact

Overall Score → Decision

ride_weather_profiles Schema — Per-Ride Thresholds

5 Tables — Partitioned + Clustered

5 Views → 7 GCS JSON Paths

curated_station_latest

curated_region_latest

curated_event_window_last_6h

curated_atmosphere_latest

curated_forecast_next_24h

ride_impact_scores

manifest.json

5 Scheduled Jobs

metar-ingestion-job

atmospheric-ingestion-job

forecast-ingestion-job

weather-json-publisher

insights-generator-job

15 Terraform Files · 6-Step Deployment

15 IaC Files

3 Least-Privilege SAs

01_preflight_check.sh

Infrastructure + Applications

Validate + Monitor

$12–35/month

Operational SLOs

Key Engineering Choices

Weather Whisperer
BigQuery ML for Ride Operations