Speed and Heading Profiling for Maritime Analytics
Speed and Heading Profiling for Maritime Analytics is a deterministic, compute-bound workflow that transforms raw, temporally irregular Automatic Identification System (AIS) positional fixes into validated speed-over-ground (SOG) and course-over-ground (COG) trajectories. Within the AIS Vessel Tracking & Route Automation pillar, this operation serves as the foundational kinematic layer for downstream spatial analytics, hydrodynamic modeling, and regulatory compliance monitoring. The operational intent is singular: ingest high-volume AIS telemetry, compute geodesically accurate kinematic derivatives, apply deterministic quality filters, and output partitioned, cloud-ready profiles without exceeding memory budgets or introducing projection-induced distortion.
Operational Schema & Ingestion Architecture
Raw AIS streams arrive as decoded NMEA payloads containing positional fixes, vessel metadata, and navigational status. For production profiling, the ingestion layer must enforce a strict, typed schema before any kinematic computation occurs. The minimal viable schema includes mmsi (UInt32), timestamp (datetime with UTC timezone), lat/lon (Float64), sog (Float32), cog (Float32), and nav_status (UInt8). Missing or malformed fields must be coerced or flagged, not imputed, to preserve auditability and traceability across regulatory audits.
Memory constraints dictate that full-trajectory loading into in-memory GeoDataFrames is unsustainable at regional or basin scale. Instead, ingestion should use columnar formats that can be scanned lazily and out of core. Apache Parquet read through Polars' lazy API provides the necessary throughput, with projection and predicate pushdown so that only the required columns and row groups are decoded. Coordinate Reference System (CRS) handling is equally critical: AIS data is natively WGS84 (EPSG:4326). Reprojecting to planar systems (e.g., UTM) for speed/heading computation introduces edge-case distortion, seam discontinuities at zone boundaries, and unnecessary compute overhead. Geodesic calculations on native lat/lon coordinates preserve accuracy across all latitudes and eliminate CRS transformation latency.
Ingestion pipelines must align with streaming ingestion architectures. When integrating with Real-Time AIS Stream Ingestion Pipelines, the profiling layer should operate on micro-batches (5–15 minute windows) partitioned by mmsi and temporal index. This guarantees bounded memory usage, enables horizontal scaling across distributed workers, and maintains strict temporal ordering required for derivative calculations.
Vectorized Geodesic Kinematics
Kinematic profiling requires precise computation of inter-fix distance, elapsed time, and directional change. Row-wise iteration is unacceptable in production; all operations must be vectorized. The following implementation uses pyproj.Geod for WGS84 geodesic distance, numpy for directional math, and polars for memory-efficient columnar execution. By operating directly on native arrays, we bypass the overhead of geometry object instantiation while maintaining sub-meter positional fidelity.
import numpy as np
import polars as pl
from pyproj import Geod

# WGS84 geodesic calculator, created once and reused for every batch
geod = Geod(ellps="WGS84")

KNOT_IN_MPS = 0.514444  # 1 knot = 0.514444 m/s


def compute_kinematics(df: pl.DataFrame) -> pl.DataFrame:
    """
    Vectorized SOG/COG computation for one vessel's AIS micro-batch.
    Expects the fixes of a single mmsi; rows are sorted by timestamp here.
    """
    df = df.sort("timestamp")

    lat = df["lat"].to_numpy()
    lon = df["lon"].to_numpy()

    # Previous fix for each row. np.roll wraps the last fix into row 0,
    # which keeps the inputs finite; row 0 is masked out below.
    lat_prev = np.roll(lat, 1)
    lon_prev = np.roll(lon, 1)

    # Temporal delta in seconds (NaN for the first row, which has no predecessor)
    dt_seconds = (
        df["timestamp"].diff().dt.total_seconds().cast(pl.Float64).to_numpy()
    )

    # Geodesic distance (meters) and forward azimuth (degrees) from the
    # previous fix to the current one; Geod.inv accepts array-like inputs
    fwd_az, _, dist_m = geod.inv(lon_prev, lat_prev, lon, lat)

    # Only rows with a positive time delta have a defined derivative
    valid = dt_seconds > 0.0
    safe_dt = np.where(valid, dt_seconds, 1.0)

    # SOG in knots; COG normalized from [-180, 180] to [0, 360)
    sog_calc = np.where(valid, dist_m / safe_dt / KNOT_IN_MPS, np.nan)
    cog_calc = np.where(valid, np.mod(fwd_az, 360.0), np.nan)

    # Convert NaN to null so invalid deltas are explicit in the output schema
    return df.with_columns(
        pl.Series("sog_calc", sog_calc).fill_nan(None).cast(pl.Float32),
        pl.Series("cog_calc", cog_calc).fill_nan(None).cast(pl.Float32),
        pl.Series("dt_sec", dt_seconds).fill_nan(None).cast(pl.Float32),
    )


# Execution pattern: apply the function once per vessel
# lazy_frame = pl.scan_parquet("s3://ais-raw/year=2024/month=10/*.parquet")
# profiled = (
#     lazy_frame.sort("mmsi", "timestamp")
#     .group_by("mmsi", maintain_order=True)
#     .map_groups(compute_kinematics, schema=None)
# )
The pyproj geodesic inverse method provides rigorous WGS84 ellipsoidal calculations, avoiding the spherical approximations that degrade accuracy at high latitudes or across long baselines. For comprehensive geodesic implementation standards, refer to the official pyproj documentation.
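To make the spherical-versus-ellipsoidal gap concrete, the sketch below compares a haversine distance on a mean-radius sphere against Geod.inv on WGS84. The helper names haversine_m and spherical_error are hypothetical, and the mean radius is an assumed choice of sphere.

```python
import math

from pyproj import Geod

geod = Geod(ellps="WGS84")
MEAN_EARTH_RADIUS_M = 6_371_008.8  # assumed sphere: IUGG mean Earth radius


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters on a sphere of mean Earth radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * MEAN_EARTH_RADIUS_M * math.asin(math.sqrt(a))


def spherical_error(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Relative error of the spherical distance against the WGS84 geodesic."""
    _, _, ellipsoidal_m = geod.inv(lon1, lat1, lon2, lat2)
    return (haversine_m(lon1, lat1, lon2, lat2) - ellipsoidal_m) / ellipsoidal_m


# One degree of latitude along a meridian at 80°N
high_latitude_error = spherical_error(20.0, 80.0, 20.0, 81.0)
```

For this high-latitude segment the sphere underestimates the distance by a few tenths of a percent, and that bias carries straight through to the derived SOG.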
Deterministic Quality Filtering & Anomaly Suppression
Raw AIS telemetry contains GPS drift, multipath errors, and transmission artifacts that corrupt kinematic derivatives. Production profiling must apply deterministic filters before downstream routing:
- Stationary Suppression: Vessels at anchor or moored exhibit GPS jitter that generates false SOG/COG spikes. Apply a threshold filter (sog_calc < 0.5 knots) and flag nav_status == 1 (at anchor) or nav_status == 5 (moored) to suppress kinematic updates during stationary periods.
- Temporal Gap Enforcement: AIS dropout or satellite handoff creates discontinuous trajectories. Enforce a maximum temporal delta (dt_sec <= 3600). Gaps exceeding this threshold terminate the current trajectory segment and require a new initialization.
- Heading Rate-of-Change Limit: Commercial vessels cannot execute instantaneous course changes. Apply a rolling filter on the wrapped angular difference: with d = abs(cog_calc - lag(cog_calc, 1)), require min(d, 360 - d) <= 45° over a 5-minute window. The wrap is needed so that a change from 350° to 10° counts as 20°, not 340°. Violations indicate GPS multipath or sensor fault and are interpolated or nullified.
- SOG/COG Consistency Check: Cross-validate computed SOG against raw AIS-reported SOG. Discrepancies exceeding 20% of the reported value trigger a quality flag (q_flag = 1) for manual review or downstream anomaly routing.
These filters operate as deterministic boolean masks, ensuring zero stochastic behavior and full reproducibility across pipeline runs.
Partitioned Output & Downstream Routing
Once kinematic profiles pass quality validation, they must be serialized into cloud-optimized formats that support predicate pushdown and partition pruning. The output schema appends sog_calc, cog_calc, dt_sec, and q_flag to the original record set. Data is partitioned hierarchically: mmsi/year/month/day. This structure enables efficient time-range queries and aligns with standard data lakehouse architectures.
Partitioned Parquet files are written using Polars’ sink_parquet method, which streams results to object storage without materializing the full result set in RAM. For query plans that the streaming engine can execute end to end, this keeps peak memory bounded regardless of dataset size; stages that call back into Python, such as per-vessel user-defined functions, may force intermediate materialization and should be checked before relying on that bound. The resulting profiles feed directly into behavioral classification engines, where kinematic signatures are mapped to operational states such as transit, maneuvering, or fishing. For implementation details on trajectory classification, consult the Segmenting Vessel Routes by Behavior documentation.
Production deployments should enforce strict schema evolution policies and retain raw telemetry alongside profiled outputs for forensic auditing. When combined with robust ingestion, deterministic filtering, and partitioned storage, this profiling layer delivers the high-fidelity kinematic foundation required for modern coastal and marine spatial analytics. For additional guidance on columnar execution optimization, review the Polars execution model documentation.