Abstract
Three-dimensional object detection from LiDAR point clouds is a cornerstone of autonomous driving perception,
yet single-sensor systems remain vulnerable to occlusion in complex urban environments.
This paper proposes a symmetric dual-camera LiDAR fusion framework that combines
PointPillar and CenterPoint 3D LiDAR detectors with YOLOv8-based 2D detections from two complementary
camera viewpoints: a drone (top-down, 40m altitude) and a subject-vehicle
forward camera. The fusion operates at the decision level (late fusion),
where camera-confirmed LiDAR detections receive confidence boosts while unconfirmed
low-confidence detections are suppressed. The full dual-camera fusion achieves
a +0.92 pp mAP@0.5 gain (+4.4% relative; sign test p = 0.001, t-test p < 0.0001),
with all ten seeds showing a positive improvement.
System overview: symmetric dual-camera LiDAR fusion combining PointPillar and CenterPoint 3D detection with YOLOv8 drone and forward-camera 2D detections for occlusion-robust perception.
Method
Overview of the proposed dual-camera LiDAR fusion pipeline. The PointPillar and CenterPoint 3D detectors
and two independent YOLOv8 2D detectors produce detections from their respective sensors.
The symmetric late-fusion module refines confidence scores based on uniform boost and suppress rules across both cameras.
Sensor configuration in CARLA Town10HD: ego vehicle with 64-channel LiDAR + forward camera, supplemented by a drone camera at 40m altitude.
Dual YOLOv8 detector architecture: independent models process drone (top-down) and forward camera (SDC) views.
Three-Stage Pipeline
1. 3D LiDAR Detection (PointPillar + CenterPoint) — processes the ego vehicle's 64-channel LiDAR point cloud into 3D bounding boxes with class labels and confidence scores, using two complementary architectures.
2. 2D Camera Detection (YOLOv8) — two independent YOLOv8m models process drone (top-down, 40 m) and forward-camera images, fine-tuned on CARLA-rendered data.
3. Symmetric Late Fusion — camera-confirmed LiDAR detections receive confidence boosts (×1.15 single-camera, ×1.30 dual-camera), while unconfirmed low-confidence detections are suppressed (×0.75); both cameras apply the boost and suppress rules uniformly.
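The symmetric fusion rule above can be sketched as a simple confidence update. This is a minimal illustration: the boost and suppress factors come from the paper, but the low-confidence threshold value and the confirmation flags (in practice derived from 2D/3D box matching) are hypothetical placeholders.

```python
def fuse_confidence(conf: float, drone_hit: bool, sdc_hit: bool,
                    low_conf_thresh: float = 0.3) -> float:
    """Symmetric late fusion: both cameras apply boost and suppress uniformly.

    conf            -- LiDAR detection confidence score
    drone_hit       -- True if a matching drone-camera 2D detection exists
    sdc_hit         -- True if a matching forward-camera 2D detection exists
    low_conf_thresh -- illustrative threshold (assumed, not from the paper)
    """
    if drone_hit and sdc_hit:
        return min(conf * 1.30, 1.0)   # dual-camera confirmation
    if drone_hit or sdc_hit:
        return min(conf * 1.15, 1.0)   # single-camera confirmation
    if conf < low_conf_thresh:
        return conf * 0.75             # unconfirmed low-confidence: suppress
    return conf                        # unconfirmed but confident: keep as-is
```

An asymmetric (boost-only forward camera) variant would simply skip the suppress branch when only the SDC view disagrees; the paper's ablation indicates this forfeits most of the forward camera's contribution.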
Key Finding: Symmetric > Asymmetric
Counterintuitively, applying both boost and suppress operations to
both cameras outperforms asymmetric designs in which the forward camera is restricted
to boost-only. The forward camera's value lies entirely in suppressing
false positives, not in boosting true positives.
Results
Evaluated on a CARLA Town10HD dataset with 2,600 frames, 35,837 annotations (Car + Pedestrian),
and ten-seed repeated random sub-sampling validation.
PointPillar Results (10-seed average)
| Configuration | Car AP | Ped AP | mAP@0.5 | Δ | Significance |
|---|---|---|---|---|---|
| LiDAR-only | 38.50 | 2.69 | 20.76 ± 0.38 | — | — |
| + SDC (boost-only) | 38.50 | 2.69 | 20.75 ± 0.38 | -0.01 | n.s. |
| + Drone | 40.21 | 2.57 | 21.39 ± 0.35 | +0.63 | p = 0.001* |
| + Symmetric | 40.74 | 2.61 | 21.68 ± 0.31 | +0.92 | p = 0.001* |
CenterPoint Results (10-seed average)
| Configuration | Car AP | Ped AP | mAP@0.5 | Δ | Significance |
|---|---|---|---|---|---|
| LiDAR-only | 39.04 | 5.56 | 22.30 ± 0.28 | — | — |
| + SDC (boost-only) | 38.94 | 5.55 | 22.25 ± 0.25 | -0.05 | n.s. |
| + Drone | 40.39 | 5.55 | 22.97 ± 0.32 | +0.67 | p = 0.001* |
| + Symmetric | 40.54 | 5.54 | 23.04 ± 0.31 | +0.74 | p = 0.001* |
Ten-seed averaged mAP@0.5 with standard deviation error bars across PointPillar and CenterPoint detectors
Improvement over LiDAR-only baseline across all metrics and configurations. Bold borders indicate statistical significance.
Per-Seed Consistency
Per-seed mAP@0.5 consistency. All fusion configurations consistently outperform
the LiDAR-only baseline across all ten random seeds (10/10 positive, sign test p = 0.001).
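The reported sign test follows directly from the 10/10 per-seed outcome. A short sketch of the one-sided binomial calculation, assuming a null hypothesis that fusion helps or hurts with equal probability:

```python
from math import comb

def sign_test_p(n_pos: int, n_total: int) -> float:
    """One-sided sign test: P(at least n_pos positive outcomes | p = 0.5)."""
    return sum(comb(n_total, k) for k in range(n_pos, n_total + 1)) / 2 ** n_total

# All ten seeds improved over the LiDAR-only baseline:
p = sign_test_p(10, 10)  # 1/1024 ≈ 0.00098, which rounds to the reported p = 0.001
```

With every seed positive, the p-value reduces to (1/2)^10, consistent with the p = 0.001 quoted throughout the results.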
Camera Contribution Ablation (PointPillar)
| Configuration | mAP@0.5 | Δ (pp) | Relative |
|---|---|---|---|
| LiDAR-only | 20.76 | — | — |
| + SDC camera (boost only) | 20.75 | -0.01 | n.s. |
| + Drone camera (symmetric) | 21.39 | +0.63 | +3.0% |
| + Both cameras (symmetric) | 21.68 | +0.92 | +4.4% |
The drone camera drives the majority of improvement, while the forward camera’s value lies
in its ability to suppress false positives rather than boost true positives.
Symmetric fusion outperforms asymmetric across both PointPillar and CenterPoint detectors,
with all gains statistically significant (sign test p = 0.001, t-test p < 0.0001).
Sensitivity analysis of fusion hyperparameters (boost factor, suppress factor, confidence threshold). The optimal symmetric configuration achieves +0.92 pp mAP@0.5 improvement.
Qualitative Results
Representative scenarios showing LiDAR-only (left) vs dual-camera fused (right) detections. Fusion recovers occluded vehicles and suppresses false positives in cluttered scenes.
False positive suppression: the drone camera's overhead view confirms or denies LiDAR detections, reducing spurious boxes in occluded regions.
Multi-View Sample Pairs
Synchronized samples from all three sensors show how complementary viewpoints resolve ambiguities:
Drone camera view (40m altitude) — complete overhead coverage
Forward camera (SDC) view — frontal sector with depth perception
Side-by-side drone and SDC views for the same frame, showing complementary coverage areas.
Video Demonstrations
These videos show the fusion system operating in real-time on CARLA Town10HD sequences.
Driving sequence: ego vehicle navigating through urban traffic with multi-sensor detections overlaid.
Drone camera detection: YOLOv8 detections from the overhead perspective at 40m altitude.
Fusion demonstration: combining LiDAR 3D detections with dual-camera 2D confirmations in real-time.
Citation
@article{zhou2026bevfusion,
  title={Dual-Camera LiDAR Fusion for Occlusion-Robust 3D Detection in Urban Driving Simulation},
  author={Zhou, Xingnan and Alecsandru, Ciprian},
  year={2026},
  note={In Preparation}
}