Using PDAL for Bathymetric Point Cloud Cleaning
Raw multibeam echosounder (MBES) datasets routinely contain water-column backscatter, surface multipath returns, acoustic sidelobe artifacts, and vessel-wake noise. Manual GUI-based cleaning fails at survey scale and breaks automated reproducibility. Using PDAL for Bathymetric Point Cloud Cleaning establishes a deterministic, scriptable workflow that enforces spatial consistency, manages memory overhead, and gates outputs before downstream interpolation. This pipeline targets automated coastal and marine spatial analysis environments where throughput, schema compliance, and CI/CD validation dictate operational viability. It operates under the Pillar: Bathymetric Processing & Terrain Modeling, within the Cluster: Point Cloud Filtering for Multibeam Sonar, and directly informs related workflows including DEM interpolation, surface smoothing, artifact removal, and memory bottleneck resolution.
Pipeline Architecture & Filter Sequencing
PDAL executes filters sequentially through a directed acyclic graph. For bathymetric data, the cleaning sequence must isolate true seafloor returns while preserving acoustic resolution and slope continuity. The production JSON below implements range clipping, statistical outlier rejection, ASPRS-compliant reclassification, and spatial partitioning.
{
"pipeline": [
{
"type": "readers.las",
"filename": "input_survey.laz"
},
{
"type": "filters.range",
"limits": "Z[0:120], Classification[1:9]"
},
{
"type": "filters.outlier",
"method": "statistical",
"mean_k": 24,
"multiplier": 2.0,
"tag": "outlier"
},
{
"type": "filters.assign",
"assignment": "Classification[Classification!=9] = 18"
},
{
"type": "filters.splitter",
"length": 400,
"origin_x": "auto",
"origin_y": "auto"
},
{
"type": "writers.las",
"filename": "cleaned_%.laz",
"compression": "laszip",
"forward": "all"
}
]
}
Operational Notes:
filters.rangeremoves surface returns (Z > tidal datum + max draft) and deep-water noise. Adjust bounds to local bathymetry and survey datum.filters.outlieruses a k-nearest neighbor density estimator. For high-density MBES (>50 pts/m²), increasemean_kto 32–48 to prevent over-filtering of steep slope returns.filters.assignreclassifies non-ground points to ASPRS code 18 (high noise). This preserves original geometry while flagging artifacts for downstream exclusion.filters.splittertiles the dataset into 400m² chunks, enabling parallel execution and preventing heap exhaustion.
Integrating this sequence into broader Bathymetric Processing & Terrain Modeling workflows requires strict adherence to vertical datum alignment before PDAL execution. Misaligned tidal corrections will cause filters.range to clip valid seafloor returns or retain surface noise.
Memory Optimization & Streaming Execution
MBES surveys routinely exceed 40–80 GB. PDAL defaults to in-memory loading, which triggers OOM failures on standard CI runners or edge processing nodes. Force disk-backed streaming and cap heap allocation:
# Enable streaming mode and limit worker threads to prevent memory thrashing
pdal pipeline pipeline.json --streaming --threads 4
For Python-based orchestration, bypass the default pdal.Pipeline in-memory buffer by leveraging pdal.Pipeline.execute() with explicit chunking. Use filters.splitter upstream to force tile-based I/O. Monitor RSS usage; if heap exceeds 75% of available RAM, reduce tile size to 200m and enable --streaming at the CLI level. PDAL’s internal memory manager respects OS-level cgroups, but explicit thread capping prevents Python GIL contention during parallel tile writes.
CRS Alignment & Vertical Datum Enforcement
Bathymetric pipelines fail silently when horizontal and vertical coordinate reference systems diverge. PDAL does not auto-resolve datum shifts. Enforce explicit transformation chains using filters.reprojection or filters.transformation before filtering:
{
"type": "filters.reprojection",
"in_srs": "EPSG:4326+EPSG:5703",
"out_srs": "EPSG:32618+EPSG:5703",
"forward": "all"
}
Always validate vertical datums against local tidal benchmarks (e.g., MLLW, NAVD88, or LAT). Use pdal info --metadata to verify VCS and PCS codes. If your survey provider delivers raw ellipsoidal heights, apply a geoid model via filters.transformation with --grid pointing to a .gtx or .tif geoid file. Failure to normalize Z-values before filters.range execution guarantees datum-induced clipping or artifact retention.
Validation & Quality Gates
Automated cleaning requires deterministic post-processing checks. Implement density thresholds, coverage validation, and classification audits before passing data to Point Cloud Filtering for Multibeam Sonar or interpolation modules.
Use PDAL’s filters.stats to generate summary metrics:
pdal pipeline stats_pipeline.json
Parse the output JSON to enforce CI/CD gates:
- Minimum point density: ≥ 15 pts/m² (shallow), ≥ 5 pts/m² (deep)
- Classification 9 (ground) ratio: ≥ 85%
- Z-range compliance: No values outside survey bounds ± 2m tolerance
Reject tiles that fail these thresholds. Log failures with exact bounding coordinates for manual review. This prevents garbage-in-garbage-out propagation into DEM generation.
Python Integration & CI/CD Orchestration
Embed PDAL directly into Python spatial analysis stacks using the pdal Python bindings. Wrap pipeline execution in retry logic and structured logging:
import pdal
import json
import logging
logging.basicConfig(level=logging.INFO)
def run_bathy_cleaner(pipeline_json: dict, input_file: str) -> bool:
pipeline = pdal.Pipeline(json.dumps(pipeline_json))
try:
pipeline.execute()
stats = pipeline.metadata
ground_ratio = stats['stats']['count']['Classification_9'] / stats['stats']['count']['total']
if ground_ratio < 0.85:
logging.warning(f"Ground ratio {ground_ratio:.2%} below threshold. Flagging tile.")
return False
logging.info("Pipeline executed successfully. Output validated.")
return True
except pdal.PipelineError as e:
logging.error(f"PDAL execution failed: {e}")
return False
Reference the official PDAL Pipeline Documentation for schema validation and filter parameter updates. For vertical datum transformations, consult NOAA VDatum Technical Documentation to ensure geoid offsets match your survey epoch.
Production Deployment Checklist
This architecture eliminates manual intervention, enforces spatial rigor, and scales across distributed marine survey campaigns. Deploy, validate, and iterate.