Applying MLLW to Coastal Survey Data

Coastal survey datasets routinely arrive referenced to arbitrary local benchmarks, NAVD88, or Mean Sea Level (MSL), creating immediate vertical misalignment with nautical charting standards and coastal engineering baselines. Applying MLLW to Coastal Survey Data requires deterministic vertical datum transformation, precise horizontal-vertical CRS coupling, and memory-constrained processing architectures capable of handling multi-terabyte bathymetric rasters and point clouds. This workflow isolates the operational transformation step: ingesting raw elevation/bathymetry grids, aligning them with precomputed tidal datum offset surfaces, and serializing the result into cloud-native, vertically referenced outputs without exceeding worker memory limits.

Vertical Datum Offset Architecture

MLLW (Mean Lower Low Water) is a tidal datum derived from the arithmetic mean of the lower low water height of each tidal day over a 19-year National Tidal Datum Epoch. Unlike geoid-based vertical datums, MLLW is hydrodynamically defined and exhibits pronounced spatial variability across estuarine gradients, continental shelves, and open-coast environments. Operational pipelines do not compute MLLW from harmonic constituents in real-time; instead, they consume precomputed offset grids (e.g., NOAA VDATUM separation surfaces, regional tidal models, or agency-derived MLLW-to-NAVD88 grids). The transformation is strictly additive: Z_MLLW = Z_raw - Offset, where Offset represents the vertical distance from the raw datum to MLLW at each coordinate.

Vertical datum alignment must be treated as a first-class pipeline constraint. Horizontal CRS (e.g., EPSG:26918) and vertical CRS (e.g., EPSG:5829 for MLLW) must be explicitly declared in metadata to prevent downstream misinterpretation by GIS engines, hydrodynamic solvers, or charting systems. The foundational architecture for managing these spatial references and tidal surfaces is documented in Marine Spatial Data Fundamentals & Architecture, which establishes the schema for datum-aware data lakes, version-controlled offset grids, and standardized metadata propagation.

Chunked Transformation Pipeline & Memory Constraints

Loading full coastal extents into memory triggers immediate OOM failures in containerized environments, particularly when processing high-resolution multibeam bathymetry or LiDAR-derived DEMs. Production pipelines must enforce strict chunking boundaries aligned to raster tile sizes (typically 256×256 or 512×512) and leverage lazy evaluation. The transformation logic operates on aligned xarray DataArrays backed by dask arrays, ensuring that only the active computational window resides in RAM. Vertical offset grids must be resampled to match the target survey grid’s resolution, extent, and CRS prior to arithmetic operations to prevent edge artifacts, coordinate drift, and NaN propagation.

For comprehensive implementation patterns regarding datum arithmetic, coordinate system validation, and harmonic-to-surface conversion, refer to Tidal Datum Transformations in Python.

Production-Grade Implementation

The following implementation demonstrates a chunked, memory-aware transformation pipeline. It enforces explicit CRS validation, aligns offset surfaces via nearest-neighbor or bilinear resampling, applies the vertical shift, and serializes the output with compliant vertical datum metadata.

import logging
import pathlib
from typing import Optional, Tuple

import dask
import dask.array as da
import numpy as np
import rasterio
import xarray as xr
from rasterio.crs import CRS
from rasterio.enums import Resampling
from rasterio.warp import reproject

logging.basicConfig(level=logging.INFO, format="%(asctime)s | %(levelname)s | %(message)s")
logger = logging.getLogger(__name__)

def apply_mllw_transform(
    survey_path: pathlib.Path,
    offset_path: pathlib.Path,
    output_path: pathlib.Path,
    chunk_size: int = 512,
    target_horizontal_crs: str = "EPSG:26918",
    target_vertical_crs: str = "EPSG:5829",
    resampling_method: Resampling = Resampling.bilinear,
) -> None:
    """
    Apply MLLW vertical datum transformation to coastal survey rasters using chunked dask arrays.
    
    Parameters
    ----------
    survey_path : pathlib.Path
        Path to raw bathymetry/elevation grid (GeoTIFF/NetCDF).
    offset_path : pathlib.Path
        Path to precomputed vertical offset surface (raw_datum -> MLLW).
    output_path : pathlib.Path
        Destination path for vertically referenced output.
    chunk_size : int
        Dask chunk dimension (square).
    target_horizontal_crs : str
        Target horizontal EPSG code.
    target_vertical_crs : str
        Target vertical EPSG code (e.g., EPSG:5829 for MLLW).
    resampling_method : rasterio.enums.Resampling
        Resampling strategy for offset alignment.
    """
    logger.info(f"Initializing MLLW transformation pipeline: {survey_path.name}")
    
    # Configure dask for memory-constrained workers
    dask.config.set(scheduler="threads", num_workers=4)
    
    # Load survey data with explicit chunking
    with xr.open_dataset(survey_path, engine="rasterio", chunks={"x": chunk_size, "y": chunk_size}) as survey_ds:
        survey_da = survey_ds.band_data.sel(band=1)
        survey_crs = CRS.from_string(survey_ds.spatial_ref.attrs.get("crs_wkt", survey_ds.rio.crs.to_string()))
        
        if not survey_crs.equals(CRS.from_string(target_horizontal_crs)):
            raise ValueError(f"Survey CRS {survey_crs} does not match target {target_horizontal_crs}. Reproject first.")
            
        logger.info(f"Loaded survey grid: {survey_da.shape}, CRS: {survey_crs}")
        
        # Load offset grid
        with xr.open_dataset(offset_path, engine="rasterio", chunks={"x": chunk_size, "y": chunk_size}) as offset_ds:
            offset_da = offset_ds.band_data.sel(band=1)
            offset_crs = CRS.from_string(offset_ds.spatial_ref.attrs.get("crs_wkt", offset_ds.rio.crs.to_string()))
            
            # Align offset to survey grid if CRS/transform mismatch exists
            if not offset_crs.equals(survey_crs):
                logger.warning("Offset CRS mismatch. Reprojecting offset surface to survey CRS.")
                offset_da = offset_da.rio.reproject(survey_crs, resampling=resampling_method)
            
            # Ensure spatial alignment (same bounds/resolution)
            if not np.allclose(survey_da.coords["x"].values, offset_da.coords["x"].values) or \
               not np.allclose(survey_da.coords["y"].values, offset_da.coords["y"].values):
                logger.info("Resampling offset grid to survey spatial resolution.")
                offset_da = offset_da.rio.reproject_match(survey_da, resampling=resampling_method)
            
            # Apply vertical datum transformation: Z_MLLW = Z_raw - Offset
            logger.info("Executing chunked vertical datum transformation.")
            mllw_da = (survey_da - offset_da).rename("elevation_mllw")
            
            # Attach vertical CRS metadata
            mllw_da.attrs.update({
                "crs": target_vertical_crs,
                "grid_mapping": "vertical_datum",
                "units": "meters",
                "long_name": "Elevation relative to Mean Lower Low Water",
                "vertical_datum": "MLLW",
                "vertical_datum_epsg": target_vertical_crs.split(":")[1]
            })
            
            # Serialize to cloud-native format (Zarr recommended for chunked workflows)
            logger.info(f"Serializing output to {output_path}")
            mllw_da.to_netcdf(output_path, mode="w", format="NETCDF4", engine="netcdf4")
            
    logger.info("MLLW transformation pipeline completed successfully.")

if __name__ == "__main__":
    # Example execution
    apply_mllw_transform(
        survey_path=pathlib.Path("data/raw_survey_navd88.tif"),
        offset_path=pathlib.Path("data/offsets/navd88_to_mllw.tif"),
        output_path=pathlib.Path("data/processed/survey_mllw.nc"),
        chunk_size=512
    )

Validation & Output Serialization

Post-transformation validation must verify vertical datum integrity, spatial alignment, and metadata compliance. Automated QA steps should include:

  1. Extent & Resolution Verification: Confirm x/y coordinate arrays match the source survey grid exactly.
  2. NaN Propagation Check: Ensure NaN values in the offset surface do not artificially flatten valid bathymetry. Masking or nearest-neighbor fallback should be applied where tidal models lack coverage.
  3. Vertical CRS Tagging: GIS engines require explicit vertical datum declarations. NetCDF outputs must include grid_mapping variables referencing the vertical CRS, while GeoTIFF outputs require VerticalCSType GeoTags (e.g., 9829 for MLLW). Refer to the EPSG Geodetic Parameter Dataset for authoritative vertical CRS codes and transformation matrices.
  4. Cloud-Native Serialization: Zarr or chunked NetCDF4 formats preserve dask chunk boundaries and enable parallel downstream consumption. Avoid monolithic TIFF outputs for multi-terabyte extents.

For authoritative tidal datum definitions and regional separation surface generation, consult the official NOAA VDATUM documentation. Pipelines should version-control offset surfaces alongside survey products to maintain reproducible vertical baselines across tidal epoch updates.