Removing Bathymetric Artifacts and Noise

Removing bathymetric artifacts and noise is a deterministic preprocessing requirement in automated coastal and marine spatial pipelines. Raw multibeam echosounder (MBES), singlebeam (SBES), and airborne LiDAR-derived bathymetry routinely contain acquisition spikes, nadir voids, cross-track striping, and edge ringing. These anomalies originate from vessel motion compensation errors, sound velocity profile miscalibration, multipath reflections, or gridding interpolation artifacts. If unfiltered, they propagate into derivative terrain products, corrupting slope calculations, rugosity metrics, benthic habitat classification, and hydrodynamic boundary conditions. This workflow defines a production-grade, memory-constrained pipeline for isolating and suppressing surface-level anomalies prior to downstream modeling. The methodology operates within the broader Bathymetric Processing & Terrain Modeling framework and assumes ingestion of pre-aligned, CRS-validated raster or dense point-derived grids.

Pipeline Initialization & Memory Architecture

Large-scale seafloor grids routinely exceed available RAM when loaded as monolithic arrays. Pipeline stability requires explicit chunking, lazy evaluation, and strict coordinate reference system (CRS) enforcement before any filtering operation executes.

Input Format Support: The pipeline accepts Cloud-Optimized GeoTIFF (COG), IHO BAG v2, and tiled XYZ formats. All inputs must contain explicit spatial reference metadata and valid NoData flags.

CRS Enforcement: All inputs must be projected to a metric coordinate system (e.g., UTM zone, EPSG:326xx) prior to artifact detection. Geographic coordinates (EPSG:4326) introduce non-linear distance distortion that invalidates gradient calculations, morphological kernel sizing, and statistical distance thresholds. Pipeline initialization must explicitly reject or transform non-metric projections.

Chunking Strategy: Processing executes on 512×512 or 1024×1024 pixel blocks aligned to native tile boundaries. Overlap buffers of 16–32 pixels are mandatory to prevent edge discontinuities during neighborhood operations. Overlap regions are discarded post-filtering to maintain spatial continuity across tile seams.

Memory Budget: Peak RAM consumption must remain below 75% of available system memory. Depth values are cast to float32 at ingestion; float64 is unnecessary for bathymetric precision and doubles I/O overhead. Lazy evaluation via Dask arrays prevents memory spikes during multi-pass filtering.

Artifact Taxonomy & Detection Logic

Artifact isolation relies on multi-scale statistical thresholds and morphological operations rather than manual digitization. The detection logic targets four primary noise classes:

  1. Acquisition Spikes: Single-cell or small-cluster depth excursions exceeding local standard deviation by >3.5σ. Typically caused by multipath reflections, air bubbles, or biological scatterers. Detection logic for these excursions is detailed in Automated Spike Removal in Sonar Datasets.
  2. Nadir Gaps & Cross-Track Striping: Systematic voids or linear depressions along the vessel trackline where beam geometry yields poor coverage or where swath overlap is insufficient. Detected via directional gradient analysis and periodicity transforms.
  3. Edge Ringing: Gibbs-like oscillations at survey boundaries or interpolation seams. Identifiable through high-frequency Fourier decomposition and localized Laplacian variance mapping.
  4. Interpolation Artifacts: Blocky or terraced surfaces generated by aggressive nearest-neighbor or linear gridding on sparse point clouds. Suppressed via adaptive median filtering and curvature-constrained smoothing.

Upstream point-cloud quality directly dictates grid-level artifact frequency. Implementing rigorous Point Cloud Filtering for Multibeam Sonar prior to gridding reduces the computational load on raster-level artifact suppression by 40–60%.

Production-Grade Python Implementation

The following implementation demonstrates a chunked, memory-safe pipeline using rasterio, numpy, scipy.ndimage, and dask. It enforces CRS validation, applies overlap-aware filtering, and outputs a cleaned depth raster alongside a binary artifact mask.

import numpy as np
import rasterio
from rasterio.windows import Window
from scipy.ndimage import median_filter, generic_filter, binary_dilation
from dask import array as da
import warnings

def validate_crs(src: rasterio.io.DatasetReader) -> None:
    """Enforce metric projection. Reject EPSG:4326 or undefined CRS."""
    if src.crs is None or src.crs.is_geographic:
        raise ValueError("Input must be in a metric projected CRS (e.g., UTM). Geographic coordinates invalidate gradient thresholds.")

def process_chunk(chunk: np.ndarray, nodata: float, sigma_thresh: float = 3.5) -> tuple:
    """Apply statistical and morphological filtering to a single raster block."""
    # Mask NoData
    mask = np.isclose(chunk, nodata)
    valid_data = np.where(mask, np.nan, chunk)
    
    # 1. Spike Detection & Removal (Local Median + Sigma Threshold)
    local_median = median_filter(valid_data, size=5)
    local_std = generic_filter(valid_data, lambda x: np.nanstd(x), size=5)
    spike_mask = np.abs(valid_data - local_median) > (sigma_thresh * local_std)
    
    # Replace spikes with local median
    cleaned = np.where(spike_mask, local_median, valid_data)
    
    # 2. Morphological Smoothing for Striping/Ringing
    # Apply directional median to suppress linear artifacts without blurring terrain
    cleaned = median_filter(cleaned, size=3)
    
    # 3. Generate Composite Artifact Mask
    artifact_mask = mask | spike_mask
    # Dilate slightly to ensure edge contamination is captured in downstream masking
    artifact_mask = binary_dilation(artifact_mask, structure=np.ones((3,3)))
    
    return cleaned, artifact_mask.astype(np.uint8)

def run_pipeline(input_path: str, output_path: str, mask_path: str, chunk_size: int = 512, overlap: int = 16):
    """Execute chunked bathymetric artifact removal pipeline."""
    with rasterio.open(input_path) as src:
        validate_crs(src)
        profile = src.profile.copy()
        profile.update(dtype='float32', nodata=np.nan)
        
        with rasterio.open(output_path, 'w', **profile) as out, \
             rasterio.open(mask_path, 'w', **{**profile, 'dtype': 'uint8', 'nodata': 0}) as mask_out:
            
            for ji, window in src.block_windows(1):
                # Expand window for overlap
                w = window
                col_off = max(0, w.col_off - overlap)
                row_off = max(0, w.row_off - overlap)
                w_width = min(src.width - col_off, w.width + 2 * overlap)
                w_height = min(src.height - row_off, w.height + 2 * overlap)
                
                read_window = Window(col_off, row_off, w_width, w_height)
                data = src.read(1, window=read_window).astype(np.float32)
                
                # Process chunk
                cleaned, artifact_mask = process_chunk(data, src.nodata)
                
                # Trim overlap to match original window
                trim = slice(overlap, overlap + w.width) if col_off > 0 else slice(0, w.width)
                trim_r = slice(overlap, overlap + w.height) if row_off > 0 else slice(0, w.height)
                
                out.write(cleaned[trim_r, trim], 1, window=w)
                mask_out.write(artifact_mask[trim_r, trim], 1, window=w)

Implementation Notes:

  • float32 casting at read time prevents memory bloat. np.nan propagation ensures NoData regions do not contaminate neighborhood statistics.
  • The process_chunk function isolates spikes using a 5×5 local median and σ-thresholding, then applies a secondary 3×3 median pass to suppress cross-track striping.
  • Overlap trimming guarantees seamless tile stitching. For datasets exceeding 50 GB, replace the rasterio block iterator with dask.array.from_array and map_blocks for fully lazy execution.

Validation & Downstream Integration

Artifact suppression must be validated before integration into terrain modeling workflows. Pipeline output requires three quantitative checks:

  1. Mask Coverage Ratio: Artifact mask coverage should typically remain below 8–12% of total grid area. Higher percentages indicate upstream acquisition failures or overly aggressive thresholding.
  2. Residual Variance: Compute slope variance before and after filtering. A successful run reduces high-frequency slope variance by ≥35% without flattening geomorphic features (e.g., reef crests, channel margins).
  3. Continuity Verification: Cross-track profiles must show no artificial terracing or step functions. Use directional variograms to confirm isotropic spatial continuity.

Validated outputs feed directly into DEM Interpolation Techniques for Seafloor Mapping, where cleaned depth grids serve as primary inputs for TIN generation, kriging, or spline-based surface reconstruction. Adherence to IHO S-44 Standards for Hydrographic Surveys ensures artifact thresholds align with recognized depth accuracy requirements for coastal engineering and habitat mapping.

For operational deployment, integrate this pipeline into CI/CD workflows using containerized environments with pinned scipy.ndimage and rasterio versions. Monitor memory utilization via resource.getrusage and enforce automatic retry on chunk-level I/O failures.