DEM Interpolation Techniques for Seafloor Mapping

Transforming sparse, noisy multibeam returns into continuous, analysis-ready terrain models is the gridding stage of the Bathymetric Processing & Terrain Modeling pipeline. This workflow operationalizes bathymetric interpolation with a deterministic, memory-bounded architecture: ingest validated depth measurements, apply spatially constrained IDW or Kriging within projected coordinates, and emit Cloud-Optimized GeoTIFFs (COGs) without exceeding container memory limits or introducing datum-induced offsets. Every design decision in this implementation is driven by the constraint that output grids must be bit-for-bit reproducible across re-runs and unambiguously traceable to their source soundings.

Reference Configuration and Specification Table

Parameter	Recommended value	Rationale
Input CRS	Projected metric (UTM or regional Lambert)	Distance-based weights require metric units
Vertical datum	MSL, MLLW, or NAVD88 — explicit in metadata	Prevents silent datum shifts in handoff
Grid resolution	1–5 m (nearshore), 25–50 m (continental shelf)	Match to survey line spacing × 0.5
IDW power exponent	2 (escarpments: 3–4; flat plains: 1.5)	Controls spatial falloff rate
IDW search radius	3× median point spacing	Avoids extrapolation into data voids
Max neighbors (`k`)	50	Caps memory per chunk query
Chunk size	2048 × 2048 pixels	16 MiB of float32 output per tile; neighbor-query arrays scale as tile pixels × k
Tile overlap	⌈search radius / resolution⌉ pixels	Prevents seams when tiles are interpolated from locally loaded soundings
COG tile size	256 × 256	GDAL streaming default; cloud-range-request optimal
COG compression	`deflate`, predictor 3	Floating-point predictor; ~40–60% size reduction
Python libraries	`numpy 1.26`, `scipy 1.13`, `rasterio 1.3`, `pyproj 3.6`	Tested combination; pin in `requirements.txt`
RMSE threshold (check soundings)	≤ 0.15 m RMS (shallow water)	Stricter than IHO S-44 Order 1a; close to Special Order (TVU 0.25 m at 95%)
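The search-radius rule in the table (3× median point spacing) can be derived from the soundings themselves. A minimal sketch, assuming the soundings are already in a projected metric CRS; `suggest_search_radius` is a hypothetical helper, not part of any library.

```python
import numpy as np
from scipy.spatial import cKDTree


def suggest_search_radius(points_xy: np.ndarray, factor: float = 3.0) -> float:
    """Return factor x the median nearest-neighbour spacing of the soundings.

    Assumes points_xy is an (N, 2) array in projected metric coordinates.
    Exact duplicate soundings report zero spacing, so deduplicate first.
    """
    if points_xy.shape[0] < 2:
        raise ValueError("Need at least two soundings to estimate point spacing.")
    tree = cKDTree(points_xy)
    # k=2 because each point's first neighbour (column 0) is the point itself
    distances, _ = tree.query(points_xy, k=2)
    median_spacing = float(np.median(distances[:, 1]))
    return factor * median_spacing
```

On a regular 2 m lattice the median spacing is 2 m, so the suggested radius is 6 m.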

Pre-Interpolation Conditioning and CRS Enforcement

Interpolation must never precede rigorous point cloud conditioning. Raw acoustic returns carry water-column noise, nadir spikes, and multipath artifacts that propagate directly into any gridded surface. Apply automated outlier rejection, sound-velocity refraction correction, and vertical datum alignment before the gridding stage — the complete procedure is defined in Point Cloud Filtering for Multibeam Sonar. Only validated, CRS-tagged depth measurements should enter the interpolation pipeline.

CRS validation is not optional. Spatial weighting functions in both IDW and Kriging compute Euclidean distances. Feeding geographic coordinates (WGS84 lat/lon decimal degrees) into a distance-based kernel introduces metric distortion that grows with latitude — at 60° N, one degree of latitude is ~111 km but one degree of longitude is ~55 km, breaking isotropy assumptions and producing systematically elongated prediction footprints. The pipeline must detect the input CRS, transform to a local projected CRS using pyproj.Transformer following the same conventions as the project-wide CRS alignment workflow, and carry vertical datum metadata (MSL, MLLW, or NAVD88) through every transformation without implicit overwrite. Where soundings arrive on differing chart datums, reconcile them through the tidal datum transformation step before they reach the gridding stage — a vertical offset baked into the input points cannot be recovered after interpolation.
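Choosing the "local projected CRS" for a survey in geographic coordinates is often just a matter of picking its UTM zone. A minimal sketch, assuming WGS 84 based UTM (EPSG 326xx north, 327xx south) is acceptable for the survey; `utm_epsg_for` is a hypothetical helper. Surveys that straddle zones may be better served by a regional Lambert projection, and pyproj also offers `pyproj.database.query_utm_crs_info` for this lookup.

```python
def utm_epsg_for(lon: float, lat: float) -> int:
    """Return the EPSG code of the WGS 84 / UTM zone containing (lon, lat) in degrees.

    Uses the regular 6-degree zone grid only (the Norway and Svalbard
    exceptions are ignored) and rejects latitudes outside UTM's 80 S to 84 N range.
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"Longitude {lon} outside [-180, 180].")
    if not -80.0 <= lat <= 84.0:
        raise ValueError(f"Latitude {lat} outside UTM coverage; use a polar projection.")
    zone = min(int((lon + 180.0) // 6.0) + 1, 60)   # lon = 180 belongs to zone 60
    return (32600 if lat >= 0.0 else 32700) + zone   # 326xx north, 327xx south
```

For example, a survey centred at 70.5° W, 42° N falls in zone 19N and maps to EPSG:32619.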

The IDW estimate at a prediction location is the inverse-distance-weighted mean of the $k$ nearest soundings within the search radius:

\hat{z}(\mathbf{x}_0) = \frac{\sum_{i=1}^{k} d_i^{-p}\, z_i}{\sum_{i=1}^{k} d_i^{-p}}, \qquad d_i = \lVert \mathbf{x}_0 - \mathbf{x}_i \rVert_2

where $p$ is the power exponent and $d_i$ is the Euclidean distance. That distance is only meaningful once $\mathbf{x}_0$ and $\mathbf{x}_i$ share a projected metric CRS, which is why enforce_projected_crs raises when the target CRS is geographic.
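The formula can be checked by hand for a single prediction location. With $p = 2$, soundings 1 m and 2 m away with depths 10 m and 20 m receive weights 1 and 0.25, so the estimate is (10 + 5) / 1.25 = 12 m. A minimal sketch of the single-point estimate, assuming projected metric coordinates; `idw_point` is a hypothetical helper. It returns a coincident sounding directly, whereas the chunked implementation later on clamps distances to 1e-6 m instead.

```python
import numpy as np


def idw_point(x0: np.ndarray, xy: np.ndarray, z: np.ndarray, power: float = 2.0) -> float:
    """IDW estimate at x0 from k neighbours xy (k, 2) with depths z (k,)."""
    d = np.linalg.norm(xy - x0, axis=1)      # Euclidean distances d_i
    if np.any(d == 0.0):
        return float(z[d == 0.0][0])         # prediction point coincides with a sounding
    w = d ** (-power)                        # weights d_i^{-p}
    return float(np.sum(w * z) / np.sum(w))  # weighted mean of the neighbour depths
```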

import logging

import numpy as np
from pyproj import CRS, Transformer

log = logging.getLogger(__name__)


def enforce_projected_crs(
    points_xy: np.ndarray,
    source_epsg: int,
    target_epsg: int,
) -> np.ndarray:
    """
    Reproject (x, y) points from source_epsg to target_epsg.

    Raises ValueError if the target CRS is geographic (non-metric).
    always_xy=True keeps (lon, lat) / (easting, northing) axis order
    regardless of the CRS definition's native axis order.
    """
    target_crs = CRS.from_epsg(target_epsg)
    if target_crs.is_geographic:
        raise ValueError(
            f"Target EPSG:{target_epsg} is geographic — interpolation requires a projected metric CRS."
        )
    transformer = Transformer.from_crs(
        CRS.from_epsg(source_epsg),
        target_crs,
        always_xy=True,
    )
    xs, ys = transformer.transform(points_xy[:, 0], points_xy[:, 1])
    log.info("Reprojected %d points EPSG:%d → EPSG:%d", len(xs), source_epsg, target_epsg)
    return np.column_stack([xs, ys])

Algorithm Selection

Deterministic methods (IDW, Triangulated Irregular Networks) and geostatistical methods (Ordinary Kriging, Universal Kriging) have distinct computational and accuracy profiles, and these trade-offs govern algorithm routing in production:

IDW scales efficiently with point density but produces bullseye artifacts around isolated soundings where track spacing is wide. Ordinary Kriging models spatial autocorrelation and provides prediction variance, but variogram fitting adds configuration overhead, and a naive global solve costs O(N³) time and O(N²) memory without neighborhood approximations. For production pipelines spanning continental shelf to abyssal plain transitions, the routing rule is: use IDW when point spacing is at most 1.5× the target grid resolution; switch to Kriging when a fitted variogram shows a nugget-to-sill ratio below 0.3, indicating strong spatial structure worth capturing. Detailed parameterization and variogram fitting are covered in Kriging vs IDW for Bathymetry Interpolation.
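The routing rule can be expressed as a small function. A sketch under stated assumptions: `choose_interpolator` is a hypothetical name, the nugget and sill come from an already-fitted variogram, Kriging takes precedence when both conditions hold, and the "review" outcome covers sparse data without strong spatial structure, a case the rule itself does not address.

```python
from typing import Optional


def choose_interpolator(
    median_spacing_m: float,
    grid_resolution_m: float,
    nugget: Optional[float] = None,
    sill: Optional[float] = None,
) -> str:
    """Return "kriging", "idw", or "review" following the routing rule."""
    # Strong spatial structure (nugget/sill < 0.3) favours Kriging (assumed to take precedence)
    if nugget is not None and sill is not None and sill > 0 and nugget / sill < 0.3:
        return "kriging"
    # Dense data relative to the grid: IDW is adequate and much cheaper
    if median_spacing_m <= 1.5 * grid_resolution_m:
        return "idw"
    # Sparse data with weak structure: not covered by the rule, flag for manual review
    return "review"
```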

Memory-Bounded Python Implementation

The implementation below bounds memory through spatial chunking against a single pre-built cKDTree spatial index. Each chunk is expanded by an overlap of ⌈search_radius / resolution⌉ pixels on every side, and only the inner (unpadded) crop of each expanded tile is written to the output raster.

import logging
from typing import Any

import numpy as np
import rasterio
from rasterio.enums import Resampling
from rasterio.transform import from_origin
from rasterio.windows import Window
from scipy.spatial import cKDTree

log = logging.getLogger(__name__)


def interpolate_bathy_chunked(
    points_xy: np.ndarray,      # shape (N, 2), projected metric coordinates
    points_z: np.ndarray,       # shape (N,), depth values (positive-down or sign-convention consistent)
    bounds: tuple[float, float, float, float],  # (minx, miny, maxx, maxy)
    resolution: float,
    crs_epsg: int,
    search_radius: float,
    idw_power: float = 2.0,
    max_points: int = 50,
    chunk_px: int = 2048,
    output_cog: str = "seafloor_dem.tif",
    nodata: float = float("nan"),
) -> dict[str, Any]:
    """
    Memory-bounded bathymetric IDW interpolation with COG output.

    Partitions the survey extent into overlapping tiles, computes local IDW
    grids using cKDTree radius queries, and writes a tiled, compressed GeoTIFF.
    Overlap of search_radius prevents discontinuities at tile seams.
    """
    if points_xy.shape[0] == 0:
        raise ValueError("points_xy is empty — no soundings to interpolate.")

    minx, miny, maxx, maxy = bounds
    width = int(np.ceil((maxx - minx) / resolution))
    height = int(np.ceil((maxy - miny) / resolution))

    if width < 1 or height < 1:
        raise ValueError(f"Grid dimensions ({width}×{height}) are invalid for bounds {bounds}.")

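    # Tile padding in pixels. Redundant with a single global tree (cell values are
    # tile-independent) but required if tiles are fed locally loaded soundings.
    overlap_px = max(1, int(np.ceil(search_radius / resolution)))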

    transform = from_origin(minx, maxy, resolution, resolution)

    # Build the spatial index once; it holds every sounding in RAM
    # (see the OOM failure mode below for surveys too large for one node)
    log.info("Building cKDTree for %d soundings …", len(points_z))
    tree = cKDTree(points_xy)

    profile: dict[str, Any] = {
        "driver": "GTiff",
        "dtype": "float32",
        "width": width,
        "height": height,
        "count": 1,
        "crs": f"EPSG:{crs_epsg}",
        "transform": transform,
        "tiled": True,
        "blockxsize": 256,
        "blockysize": 256,
        "compress": "deflate",
        "predictor": 3,     # floating-point predictor for float32 — ~40-60% extra compression
        "nodata": nodata,
    }

    with rasterio.open(output_cog, "w", **profile) as dst:
        for row_off in range(0, height, chunk_px):
            for col_off in range(0, width, chunk_px):
                # Expanded tile (includes overlap padding)
                exp_col = max(0, col_off - overlap_px)
                exp_row = max(0, row_off - overlap_px)
                exp_w = min(chunk_px + 2 * overlap_px, width - exp_col)
                exp_h = min(chunk_px + 2 * overlap_px, height - exp_row)

                col_idx = np.arange(exp_w)
                row_idx = np.arange(exp_h)
                cols, rows = np.meshgrid(col_idx, row_idx)

                grid_x = minx + (exp_col + cols) * resolution + resolution / 2.0
                grid_y = maxy - (exp_row + rows) * resolution - resolution / 2.0
                query_pts = np.column_stack([grid_x.ravel(), grid_y.ravel()])

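                # Query up to max_points neighbors within search_radius
                distances, indices = tree.query(
                    query_pts,
                    k=max_points,
                    distance_upper_bound=search_radius,
                    workers=-1,    # use all CPU threads for query; safe with read-only tree
                )
                # query() returns 1-D arrays when k == 1; restore the neighbor axis
                distances = distances.reshape(len(query_pts), -1)
                indices = indices.reshape(len(query_pts), -1)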

                valid = np.isfinite(distances)
                safe_idx = np.where(valid, indices, 0)
                # Avoid zero-division at coincident points; 1e-6 m is sub-millimetre
                weights = np.where(valid, 1.0 / np.maximum(distances, 1e-6) ** idw_power, 0.0)
                weight_sum = weights.sum(axis=1)

                z_neighbors = points_z[safe_idx]
                cell_z = np.full(len(query_pts), nodata, dtype=np.float32)
                has_data = weight_sum > 0
                cell_z[has_data] = (
                    (weights * z_neighbors).sum(axis=1)[has_data] / weight_sum[has_data]
                )

                # Trim overlap back to the actual (non-padded) tile
                inner_col_start = col_off - exp_col
                inner_row_start = row_off - exp_row
                inner_w = min(chunk_px, width - col_off)
                inner_h = min(chunk_px, height - row_off)

                tile_z = cell_z.reshape(exp_h, exp_w)
                inner_tile = tile_z[
                    inner_row_start: inner_row_start + inner_h,
                    inner_col_start: inner_col_start + inner_w,
                ]
                dst.write(inner_tile.astype(np.float32), 1, window=Window(col_off, row_off, inner_w, inner_h))

                log.debug(
                    "Wrote tile col=%d row=%d size=%dx%d coverage=%.1f%%",
                    col_off, row_off, inner_w, inner_h,
                    100.0 * has_data.sum() / len(has_data),
                )

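        # Internal overviews are required for COG compliance, but a GeoTIFF written this
        # way stores them after the full-resolution data; rewrite it with
        # COPY_SRC_OVERVIEWS (see the rio-cogeo failure mode below) before publishing.
        log.info("Building overview pyramids …")
        dst.build_overviews([2, 4, 8, 16, 32], Resampling.average)
        dst.update_tags(ns="rio_overview", resampling="average")

    log.info("GeoTIFF written: %s (%d×%d)", output_cog, width, height)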
    return {"output": output_cog, "width": width, "height": height, "crs_epsg": crs_epsg}

Key implementation decisions:

cKDTree built once, queried per tile. The tree construction is O(N log N) and the resulting object is read-only, enabling thread-safe multi-worker queries via workers=-1.
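Overlap padding. Each expanded tile includes overlap_px extra rows/columns on all sides, and only the inner crop is written. Because every tile queries the same global tree, a cell's neighbor set (and therefore its value) is independent of the tile layout, so this version cannot produce tile seams even without the padding. The padding becomes essential when each tile is interpolated from a locally loaded subset of soundings (for example, the per-quadrant trees described under the OOM failure mode): the subset must extend search_radius beyond the tile edge, otherwise adjacent tiles see different neighbor sets along their shared boundary and produce hard discontinuities.
predictor=3 for float32. PREDICTOR=3 is GDAL's floating-point predictor (PREDICTOR=2 is horizontal differencing). On smooth float32 terrain it typically yields substantial additional compression over plain deflate; omitting it is a common performance oversight.
nodata=NaN. NaN is unambiguous in downstream rasterio/xarray reads, and any arithmetic that ignores the mask propagates NaN visibly. A numeric sentinel such as −9999 is exactly representable in float32, but a consumer that ignores the nodata tag will treat it as a real depth and silently corrupt means, minima, and slope statistics.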

Validation Gates and Quality Control

A grid that passes visual inspection can still carry systematic depth offsets. Three mandatory checks must run before any output is staged for downstream use.

1. Coverage ratio

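import numpy as np
import rasterio


def check_coverage(cog_path: str, min_coverage: float = 0.90) -> float:
    """Return the filled-cell fraction (finite and not nodata); raise if below threshold."""
    with rasterio.open(cog_path) as src:
        data = src.read(1, masked=True)
    # getmaskarray always yields a full boolean array, even when nothing is masked;
    # the isfinite test also catches NaN cells that the dataset mask did not flag
    invalid = np.ma.getmaskarray(data) | ~np.isfinite(np.ma.getdata(data))
    filled = 1.0 - float(invalid.sum()) / invalid.size
    if filled < min_coverage:
        raise RuntimeError(
            f"Coverage {filled:.2%} below threshold {min_coverage:.2%}. "
            "Expand search_radius or use a coarser grid resolution."
        )
    return filled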

2. Control-point RMSE

Compare grid-interpolated depths against independently held-out check soundings. The acceptance metric is the root-mean-square residual between sampled grid depths $\hat{z}_j$ and reference depths $z_j$ at the $M$ check locations:

\mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{j=1}^{M}\left(\hat{z}_j - z_j\right)^2}

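IHO S-44 defines the maximum allowable total vertical uncertainty (TVU) at 95% confidence as $\mathrm{TVU}(d) = \sqrt{a^2 + (b\,d)^2}$ for depth $d$, with $a = 0.5$ m and $b = 0.013$ for Order 1a, and $a = 0.25$ m and $b = 0.0075$ for Special Order. The default 0.15 m RMS threshold below is therefore stricter than Order 1a and close to Special Order in shallow water: 0.25 m at 95% corresponds to about 0.13 m (1σ) under a normal error model.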

def check_rmse(
    cog_path: str,
    check_xy: np.ndarray,   # shape (M, 2), projected metric
    check_z: np.ndarray,    # shape (M,), reference depths
    max_rmse: float = 0.15,
) -> float:
    """Sample grid at check-sounding locations and compute RMSE."""
    import rasterio
    from rasterio.sample import sample_gen

    with rasterio.open(cog_path) as src:
        sampled = np.array(
            [v[0] for v in sample_gen(src, ((x, y) for x, y in check_xy))]
        )

    valid = np.isfinite(sampled) & np.isfinite(check_z)
    if valid.sum() < 10:
        raise RuntimeError(f"Fewer than 10 valid check-sounding intersections ({valid.sum()}).")

    rmse = float(np.sqrt(np.mean((sampled[valid] - check_z[valid]) ** 2)))
    if rmse > max_rmse:
        raise RuntimeError(
            f"Control-point RMSE {rmse:.4f} m exceeds threshold {max_rmse:.4f} m."
        )
    return rmse
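Because the S-44 limit grows with depth, a single fixed RMSE threshold is only an approximation across a survey that spans shallow and deep water. A minimal sketch of a depth-dependent check, assuming the standard S-44 (a, b) coefficients per order (verify them against the edition the survey is certified to); the dictionary and function names are hypothetical.

```python
import numpy as np

# IHO S-44 TVU coefficients (a in metres, b dimensionless), 95% confidence level.
S44_TVU_COEFFS = {
    "special": (0.25, 0.0075),
    "order_1a": (0.5, 0.013),
    "order_1b": (0.5, 0.013),
    "order_2": (1.0, 0.023),
}


def s44_max_tvu(depth_m, order: str = "order_1a"):
    """Maximum allowable TVU at depth d: sqrt(a^2 + (b * d)^2); scalars or arrays."""
    a, b = S44_TVU_COEFFS[order]
    return np.sqrt(a ** 2 + (b * np.abs(depth_m)) ** 2)


def fraction_within_tvu(residuals, depths_m, order: str = "order_1a") -> float:
    """Fraction of check-sounding residuals whose magnitude is within the TVU at their depth."""
    limits = s44_max_tvu(np.asarray(depths_m, dtype=float), order)
    return float(np.mean(np.abs(np.asarray(residuals, dtype=float)) <= limits))


def tvu95_to_sigma(tvu_95: float) -> float:
    """1-sigma equivalent of a 95% TVU, assuming normally distributed errors."""
    return tvu_95 / 1.96
```

Since the TVU is a 95% bound, one plausible acceptance test is `fraction_within_tvu(residuals, depths) >= 0.95`, with residuals taken from the same check soundings used by check_rmse.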

3. Void-patch audit

Scattered NaN cells are acceptable data voids. Large contiguous NaN patches indicate a search radius that is too tight for the point spacing in that region, or a data gap requiring a different interpolation strategy (TIN extrapolation, nearest-neighbor gap-fill with explicit flagging).

def audit_void_patches(cog_path: str, max_void_patch_px: int = 500) -> int:
    """Return count of void patches larger than max_void_patch_px cells."""
    from scipy.ndimage import label

    with rasterio.open(cog_path) as src:
        data = src.read(1)

    void_mask = ~np.isfinite(data)
    labeled, num_features = label(void_mask)
    sizes = np.bincount(labeled.ravel())[1:]  # exclude background label 0
    large = int((sizes > max_void_patch_px).sum())
    if large > 0:
        log.warning("%d void patches exceed %d px — consider widening search_radius.", large, max_void_patch_px)
    return large
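The nearest-neighbor gap-fill with explicit flagging mentioned above can be built on SciPy's Euclidean distance transform, which can return, for every cell, the index of the nearest valid cell. A minimal sketch, assuming NaN marks voids; `nearest_fill_with_flag` and its `max_fill_px` cap are hypothetical, and the returned flag would typically be written to a separate band or sidecar mask so filled cells stay distinguishable from measured ones.

```python
from typing import Optional, Tuple

import numpy as np
from scipy.ndimage import distance_transform_edt


def nearest_fill_with_flag(
    grid: np.ndarray, max_fill_px: Optional[float] = None
) -> Tuple[np.ndarray, np.ndarray]:
    """Fill NaN cells from the nearest valid cell and return (filled, filled_flag).

    Cells farther than max_fill_px from any valid cell stay NaN and unflagged.
    """
    void = ~np.isfinite(grid)
    if not void.any():
        return grid.copy(), void
    if void.all():
        raise ValueError("Grid has no valid cells to fill from.")
    # The EDT measures distance to the nearest zero element of `void`,
    # i.e. the nearest valid cell, and optionally returns that cell's index
    dist, idx = distance_transform_edt(void, return_distances=True, return_indices=True)
    filled = grid[tuple(idx)]          # valid cells index themselves, so they are unchanged
    flag = void.copy()                 # True where the value was gap-filled
    if max_fill_px is not None:
        too_far = dist > max_fill_px
        filled[too_far] = np.nan
        flag &= ~too_far
    return filled, flag
```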

For the artifact sources that feed into these failures, see Removing Bathymetric Artifacts and Noise and Surface Smoothing Algorithms in Python.

Common Failure Modes and Diagnosis

Bullseye rings around isolated soundings

Symptom: Concentric circular contours radiating outward from individual depth measurements, visible in hillshade renderings.

Root cause: IDW power exponent is too high (≥ 4) in low-density regions, concentrating nearly all weight on the single nearest neighbor and producing a peaked artifact.

Remediation: Reduce idw_power to 2.0 for open-ocean data. For mixed-density surveys, implement density-adaptive power: compute local point density in a 5× search-radius window and reduce the exponent proportionally where density falls below one point per grid cell.
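The density-adaptive remediation can be sketched as follows, under explicit assumptions: the "5× search-radius window" is read as a circle of radius 5 × search_radius, the exponent scales linearly with points per grid cell below 1, and a `min_power` floor (hypothetical, default 1.0) keeps the exponent from collapsing in very sparse areas. `density_adaptive_power` is a hypothetical helper.

```python
import numpy as np
from scipy.spatial import cKDTree


def density_adaptive_power(
    tree: cKDTree,
    query_xy: np.ndarray,
    search_radius: float,
    resolution: float,
    base_power: float = 2.0,
    min_power: float = 1.0,
    window_factor: float = 5.0,
) -> np.ndarray:
    """Per-cell IDW exponent, reduced proportionally where density < 1 sounding per cell."""
    window_r = window_factor * search_radius
    # Number of soundings inside the circular density window around each query cell
    counts = np.asarray(
        tree.query_ball_point(query_xy, r=window_r, return_length=True), dtype=float
    )
    cells_in_window = np.pi * window_r ** 2 / resolution ** 2
    points_per_cell = counts / cells_in_window
    # Scale linearly below one point per cell, then clip to [min_power, base_power]
    return np.maximum(min_power, base_power * np.clip(points_per_cell, 0.0, 1.0))
```

In the chunked loop, the per-cell exponent would replace the scalar one, e.g. `1.0 / np.maximum(distances, 1e-6) ** power[:, None]`.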

COG fails `rio-cogeo` validation despite `tiled=True`

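Symptom: rio cogeo validate --strict rejects the file, typically with errors about the main IFD offset or the ordering of overview blocks relative to the full-resolution image.

Root cause: A regular GTiff written block by block stores the full-resolution image first, and overviews built afterwards, whether in write mode or in a later update, are appended after it. A COG requires the IFDs at the start of the file and the overview data ahead of the full-resolution data, a layout that GDAL only produces when copying with the COPY_SRC_OVERVIEWS creation option (or, from GDAL 3.1, with the COG driver).

Remediation: Close the file after the main write, reopen it in update (r+) mode, build overviews, then rewrite it with rasterio.shutil.copy and copy_src_overviews=True.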

import os

import rasterio
from rasterio.enums import Resampling
from rasterio.shutil import copy as rio_copy


def finalize_cog(output_cog: str) -> None:
    """Rewrite a closed, tiled GeoTIFF into COG layout (overviews before full-res data)."""
    tmp_path = output_cog + ".tmp.tif"
    os.replace(output_cog, tmp_path)

    # Reopen in update mode so the overviews are stored inside the temporary file
    with rasterio.open(tmp_path, "r+") as src:
        src.build_overviews([2, 4, 8, 16, 32], Resampling.average)
        src.update_tags(ns="rio_overview", resampling="average")

    # COPY_SRC_OVERVIEWS makes GDAL write the overviews ahead of the full-resolution blocks
    rio_copy(
        tmp_path, output_cog, driver="GTiff", copy_src_overviews=True,
        tiled=True, blockxsize=256, blockysize=256, compress="deflate", predictor=3,
    )
    os.remove(tmp_path)

Systematic depth offset at datum boundaries

Symptom: Grid values differ from known depths by a constant offset (e.g., +0.8 m) in a specific geographic zone.

Root cause: Input soundings carry mixed vertical datums (e.g., MSL in the offshore zone versus NAVD88 in the nearshore). The pipeline merged them without datum alignment.

Remediation: Tag each sounding with its vertical datum identifier before merging. Use pyproj.Transformer with a compound CRS (horizontal + vertical) to transform all soundings to a single vertical datum before the cKDTree is built. For CONUS coastal waters, NOAA's VDatum tool publishes the vertical datum grids; pyproj can download grids that are distributed on the PROJ CDN at runtime once network access is enabled (pyproj.network.set_network_enabled), but confirm that the grids for the specific datum pair are available before relying on them.
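Tagging soundings by datum is only useful if the merge step enforces it. A minimal sketch of such a guard, assuming soundings arrive in batches that each carry one vertical datum label; `SoundingBatch` and `merge_single_datum` are hypothetical names, and the actual datum transformation is left to pyproj as described above.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

import numpy as np


@dataclass(frozen=True)
class SoundingBatch:
    """Hypothetical container: one batch of soundings tagged with its vertical datum."""
    xy: np.ndarray          # (N, 2) projected metric coordinates
    z: np.ndarray           # (N,) depths referenced to vertical_datum
    vertical_datum: str     # e.g. "MLLW", "MSL", "NAVD88"


def merge_single_datum(
    batches: Sequence[SoundingBatch], expected_datum: str
) -> Tuple[np.ndarray, np.ndarray]:
    """Concatenate batches, refusing any batch not already on expected_datum."""
    if not batches:
        raise ValueError("No sounding batches to merge.")
    mismatched = sorted({b.vertical_datum for b in batches if b.vertical_datum != expected_datum})
    if mismatched:
        raise ValueError(
            f"Batches on {mismatched} must be transformed to {expected_datum} before merging."
        )
    xy = np.vstack([b.xy for b in batches])
    z = np.concatenate([b.z for b in batches])
    return xy, z
```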

OOM crash during `cKDTree` construction

Symptom: MemoryError during cKDTree(points_xy) with surveys exceeding ~300 M points.

Root cause: scipy.spatial.cKDTree loads all points into RAM. For continental-scale surveys this can exceed available node memory.

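Remediation: Partition the survey into spatial quadrants using a bounding-box split, build one tree per quadrant from the soundings inside the quadrant plus a search_radius margin, and process the tiles that intersect each quadrant against that local tree. Libraries such as pykdtree (a multithreaded C kd-tree that also accepts float32 coordinates) reduce build time and memory, but they still hold every point in RAM, so partitioning remains the general fix for continental-scale inputs.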
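The partitioning step can be sketched as a generator that yields one local tree per block, so that roughly one tree is alive at a time. A sketch under assumptions: blocks are a regular n × n split of the bounds, and points are sliced from an in-memory array purely to show the padding rule; in a real out-of-core pipeline each block's soundings would be read from spatially indexed storage instead. `iter_partitioned_trees` is a hypothetical helper.

```python
from typing import Iterator, Tuple

import numpy as np
from scipy.spatial import cKDTree

Block = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def iter_partitioned_trees(
    points_xy: np.ndarray, bounds: Block, n_split: int, search_radius: float
) -> Iterator[Tuple[Block, cKDTree, np.ndarray]]:
    """Yield (block_bounds, local_tree, global_indices) for an n_split x n_split grid.

    Each local tree holds the soundings inside its block plus a search_radius
    margin, so any cell inside the block finds the same neighbours within
    search_radius as a global tree would. Map local hits back via global_indices.
    """
    minx, miny, maxx, maxy = bounds
    xs = np.linspace(minx, maxx, n_split + 1)
    ys = np.linspace(miny, maxy, n_split + 1)
    for i in range(n_split):
        for j in range(n_split):
            block = (float(xs[i]), float(ys[j]), float(xs[i + 1]), float(ys[j + 1]))
            # Pad the point selection (not just the grid) by search_radius
            in_x = (points_xy[:, 0] >= block[0] - search_radius) & (points_xy[:, 0] <= block[2] + search_radius)
            in_y = (points_xy[:, 1] >= block[1] - search_radius) & (points_xy[:, 1] <= block[3] + search_radius)
            sel = np.flatnonzero(in_x & in_y)
            if sel.size == 0:
                continue  # empty block: every cell in it is a data void
            yield block, cKDTree(points_xy[sel]), sel
```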

Pipeline Integration and Downstream Handoff

The COG produced by this stage is consumed by two immediate downstream steps:

Surface smoothing — Surface Smoothing Algorithms in Python applies anisotropic filtering to suppress interpolation ringing before hydrodynamic modeling or habitat mapping.
Artifact detection — Removing Bathymetric Artifacts and Noise runs a second-pass audit of the gridded surface to identify residual acquisition striping that was not caught by the point-level filter.

The metadata manifest that accompanies the COG must include:

{
  "source_survey_id": "NOAA-SURVEY-2024-001",
  "input_crs_epsg": 32619,
  "output_crs_epsg": 32619,
  "vertical_datum": "MLLW",
  "resolution_m": 2.0,
  "interpolation_method": "IDW",
  "idw_power": 2.0,
  "search_radius_m": 15.0,
  "coverage_ratio": 0.97,
  "check_point_rmse_m": 0.09,
  "void_patches_gt500px": 0,
  "processing_timestamp_utc": "2026-06-25T14:22:00Z",
  "output_cog": "gs://agency-survey-bucket/2024-001/seafloor_dem_2m.tif"
}

Store this manifest as a sidecar JSON file alongside the COG in object storage. The tiled GeoTIFF is the right container for a single gridded surface; when the deliverable is instead a multi-resolution or time-varying stack, weigh the trade-offs in NetCDF vs GeoTIFF for marine data before committing the output format. Downstream pipeline stages (smoothing, artifact removal, habitat classification) must validate the vertical_datum and output_crs_epsg fields before consuming the grid — silent datum mismatches are the most common cause of systematic errors in multi-survey mosaics.
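The consumer-side check can be sketched as a small validator run before any downstream stage reads the grid. A minimal sketch, assuming the manifest layout shown above; the required-field subset and the function names are hypothetical choices, not a fixed schema.

```python
import json
from typing import Any, Dict, Mapping

# Fields a consumer cannot safely proceed without (an assumed minimal subset)
REQUIRED_FIELDS = ("vertical_datum", "output_crs_epsg", "resolution_m", "output_cog")


def validate_manifest(manifest: Mapping[str, Any], expected_datum: str, expected_epsg: int) -> None:
    """Raise ValueError if the manifest is incomplete or its datum/CRS differ from expectations."""
    missing = [name for name in REQUIRED_FIELDS if name not in manifest]
    if missing:
        raise ValueError(f"Manifest is missing required fields: {missing}")
    if manifest["vertical_datum"] != expected_datum:
        raise ValueError(
            f"Vertical datum {manifest['vertical_datum']!r} != expected {expected_datum!r}"
        )
    if int(manifest["output_crs_epsg"]) != expected_epsg:
        raise ValueError(
            f"Output CRS EPSG:{manifest['output_crs_epsg']} != expected EPSG:{expected_epsg}"
        )


def load_manifest(path: str, expected_datum: str, expected_epsg: int) -> Dict[str, Any]:
    """Read a sidecar JSON manifest and validate it before the grid is consumed."""
    with open(path, encoding="utf-8") as fh:
        manifest = json.load(fh)
    validate_manifest(manifest, expected_datum, expected_epsg)
    return manifest
```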

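For containerized deployments, resource limits of memory: 8Gi and cpu: 4 are a starting point for a 100 km² survey at 2 m resolution, but peak memory is set by the per-chunk neighbor query rather than the output raster. Each (chunk_px + 2·overlap_px)² × k array of float64 distances or int64 indices occupies about 1.7 GB at chunk_px = 2048 and k = 50, and several such arrays are alive at once, so the default chunk size can exceed 8 GiB; halving chunk_px to 1024 cuts the query footprint by roughly a factor of four. Concurrent survey processing via Airflow or Prefect should enforce concurrency limits at the DAG level to prevent node memory contention.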
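To size chunk_px against a container limit, the neighbor-query arrays can be estimated directly, since each holds one entry per (tile pixel, neighbor) pair. A minimal sketch; the number of simultaneously live arrays (`n_buffers=6`: distances, indices, index copy, weights, neighbor depths, and a product temporary) and the 256-pixel alignment to the COG block size are assumptions, and the estimate ignores the tree, the output tile, and the query coordinates. Both function names are hypothetical.

```python
import math


def query_buffer_bytes(chunk_px: int, overlap_px: int, k: int, itemsize: int = 8) -> int:
    """Bytes in one (tile_pixels, k) neighbor-query array, e.g. float64 distances."""
    side = chunk_px + 2 * overlap_px
    return side * side * k * itemsize


def max_chunk_px(
    budget_bytes: int, overlap_px: int, k: int,
    n_buffers: int = 6, itemsize: int = 8, align: int = 256,
) -> int:
    """Largest chunk_px (a multiple of align) whose n_buffers query arrays fit in budget_bytes."""
    max_side = math.isqrt(budget_bytes // (n_buffers * k * itemsize))
    chunk = (max_side - 2 * overlap_px) // align * align
    if chunk < align:
        raise ValueError("Budget too small for one aligned chunk; lower k or raise the limit.")
    return chunk
```

Under these assumptions, an 8 GiB budget with k = 50 and an 8-pixel overlap gives chunk_px = 1792.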

Related

Kriging vs IDW for Bathymetry Interpolation — algorithm-level parameterization and variogram fitting
Point Cloud Filtering for Multibeam Sonar — upstream conditioning that this stage depends on
Surface Smoothing Algorithms in Python — post-interpolation regularization
Removing Bathymetric Artifacts and Noise — second-pass audit on the gridded surface
