Abstract
Three-dimensional object detection from LiDAR point clouds is a cornerstone of autonomous driving perception,
yet single-sensor systems remain vulnerable to occlusion in complex urban environments.
This paper proposes a symmetric dual-camera LiDAR fusion framework that combines
PointPillar and CenterPoint 3D LiDAR detectors with YOLOv8-based 2D detections from two complementary
camera viewpoints: a drone (top-down, 40m altitude) and a subject-vehicle
forward camera. The fusion operates at the decision level (late fusion),
where camera-confirmed LiDAR detections receive confidence boosts while unconfirmed
low-confidence detections are suppressed. The full dual-camera fusion achieves
a +0.92 pp mAP@0.5 gain (+4.4% relative; sign test p = 0.001, t-test p < 0.0001),
with all ten seeds showing a positive improvement.
System overview: symmetric dual-camera LiDAR fusion combining PointPillar and CenterPoint 3D detection with YOLOv8 drone and forward-camera 2D detections for occlusion-robust perception.
Method
Overview of the proposed dual-camera LiDAR fusion pipeline. The PointPillar and CenterPoint 3D detectors
and two independent YOLOv8 2D detectors produce detections from their respective sensors.
The symmetric late-fusion module refines confidence scores based on uniform boost and suppress rules across both cameras.
Sensor configuration in CARLA Town10HD: ego vehicle with 64-channel LiDAR + forward camera, supplemented by a drone camera at 40m altitude.
Dual YOLOv8 detector architecture: independent models process drone (top-down) and forward camera (SDC) views.
Three-Stage Pipeline
1. 3D LiDAR Detection (PointPillar + CenterPoint) — processes the ego vehicle's 64-channel LiDAR point cloud into 3D bounding boxes with class labels and confidence scores, using two complementary architectures.
2. 2D Camera Detection (YOLOv8) — two independent YOLOv8m models process drone (top-down, 40 m) and forward-camera images, fine-tuned on CARLA-rendered data.
3. Symmetric Late Fusion — camera-confirmed LiDAR detections receive confidence boosts (×1.15 single-camera, ×1.30 dual-camera), while unconfirmed low-confidence detections are suppressed (×0.75); both cameras apply the boost and suppress rules uniformly.
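The symmetric fusion rule above can be sketched as a simple confidence update. This is a minimal illustration: the boost and suppress factors come from the paper, but the low-confidence threshold value and the confirmation flags (in practice derived from 2D/3D box matching) are hypothetical placeholders.

```python
def fuse_confidence(conf: float, drone_hit: bool, sdc_hit: bool,
                    low_conf_thresh: float = 0.3) -> float:
    """Symmetric late fusion: both cameras apply boost and suppress uniformly.

    conf            -- LiDAR detection confidence score
    drone_hit       -- True if a matching drone-camera 2D detection exists
    sdc_hit         -- True if a matching forward-camera 2D detection exists
    low_conf_thresh -- illustrative threshold (assumed, not from the paper)
    """
    if drone_hit and sdc_hit:
        return min(conf * 1.30, 1.0)   # dual-camera confirmation
    if drone_hit or sdc_hit:
        return min(conf * 1.15, 1.0)   # single-camera confirmation
    if conf < low_conf_thresh:
        return conf * 0.75             # unconfirmed low-confidence: suppress
    return conf                        # unconfirmed but confident: keep as-is
```

An asymmetric (boost-only forward camera) variant would simply skip the suppress branch when only the SDC view disagrees; the paper's ablation indicates this forfeits most of the forward camera's contribution.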
Key Finding: Symmetric > Asymmetric
Counterintuitively, applying both boost and suppress operations to
both cameras outperforms asymmetric designs in which the forward camera is restricted
to boost-only. The forward camera's value lies entirely in suppressing
false positives, not in boosting true positives.
Results
Evaluated on a CARLA Town10HD dataset with 2,600 frames, 35,837 annotations (Car + Pedestrian),
and ten-seed repeated random sub-sampling validation.
PointPillar Results (10-seed average)
| Configuration | Car AP | Ped AP | mAP@0.5 | Δ | Significance |
|---|---|---|---|---|---|
| LiDAR-only | 38.50 | 2.69 | 20.76 ± 0.38 | — | — |
| + SDC (boost-only) | 38.50 | 2.69 | 20.75 ± 0.38 | -0.01 | n.s. |
| + Drone | 40.21 | 2.57 | 21.39 ± 0.35 | +0.63 | p = 0.001* |
| + Symmetric | 40.74 | 2.61 | 21.68 ± 0.31 | +0.92 | p = 0.001* |
CenterPoint Results (10-seed average)
| Configuration | Car AP | Ped AP | mAP@0.5 | Δ | Significance |
|---|---|---|---|---|---|
| LiDAR-only | 39.04 | 5.56 | 22.30 ± 0.28 | — | — |
| + SDC (boost-only) | 38.94 | 5.55 | 22.25 ± 0.25 | -0.05 | n.s. |
| + Drone | 40.39 | 5.55 | 22.97 ± 0.32 | +0.67 | p = 0.001* |
| + Symmetric | 40.54 | 5.54 | 23.04 ± 0.31 | +0.74 | p = 0.001* |
Ten-seed averaged mAP@0.5 with standard deviation error bars across PointPillar and CenterPoint detectors
Improvement over LiDAR-only baseline across all metrics and configurations. Bold borders indicate statistical significance.
Per-Seed Consistency
Per-seed mAP@0.5 consistency. All fusion configurations consistently outperform
the LiDAR-only baseline across all ten random seeds (10/10 positive, sign test p = 0.001).
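The reported sign test follows directly from the 10/10 per-seed outcome. A short sketch of the one-sided binomial calculation, assuming a null hypothesis that fusion helps or hurts with equal probability:

```python
from math import comb

def sign_test_p(n_pos: int, n_total: int) -> float:
    """One-sided sign test: P(at least n_pos positive outcomes | p = 0.5)."""
    return sum(comb(n_total, k) for k in range(n_pos, n_total + 1)) / 2 ** n_total

# All ten seeds improved over the LiDAR-only baseline:
p = sign_test_p(10, 10)  # 1/1024 ≈ 0.00098, which rounds to the reported p = 0.001
```

With every seed positive, the p-value reduces to (1/2)^10, consistent with the p = 0.001 quoted throughout the results.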
Camera Contribution Ablation (PointPillar)
| Configuration | mAP@0.5 | Δ (pp) | Relative |
|---|---|---|---|
| LiDAR-only | 20.76 | — | — |
| + SDC camera (boost only) | 20.75 | -0.01 | n.s. |
| + Drone camera (symmetric) | 21.39 | +0.63 | +3.0% |
| + Both cameras (symmetric) | 21.68 | +0.92 | +4.4% |
The drone camera drives the majority of improvement, while the forward camera’s value lies
in its ability to suppress false positives rather than boost true positives.
Symmetric fusion outperforms asymmetric across both PointPillar and CenterPoint detectors,
with all gains statistically significant (sign test p = 0.001, t-test p < 0.0001).
Sensitivity analysis of fusion hyperparameters (boost factor, suppress factor, confidence threshold). The optimal symmetric configuration achieves +0.92 pp mAP@0.5 improvement.
Qualitative Results
Representative scenarios showing LiDAR-only (left) vs dual-camera fused (right) detections. Fusion recovers occluded vehicles and suppresses false positives in cluttered scenes.
False positive suppression: the drone camera's overhead view confirms or denies LiDAR detections, reducing spurious boxes in occluded regions.
Multi-View Sample Pairs
Synchronized samples from all three sensors show how complementary viewpoints resolve ambiguities:
Drone camera view (40m altitude) — complete overhead coverage
Forward camera (SDC) view — frontal sector with depth perception
Side-by-side drone and SDC views for the same frame, showing complementary coverage areas.
Video Demonstrations
These videos show the fusion system operating in real-time on CARLA Town10HD sequences.
Driving sequence: ego vehicle navigating through urban traffic with multi-sensor detections overlaid.
Drone camera detection: YOLOv8 detections from the overhead perspective at 40m altitude.
Fusion demonstration: combining LiDAR 3D detections with dual-camera 2D confirmations in real-time.
Citation
@article{zhou2026bevfusion,
  title={Dual-Camera LiDAR Fusion for Occlusion-Robust 3D Detection in Urban Driving Simulation},
  author={Zhou, Xingnan and Alecsandru, Ciprian},
  year={2026},
  note={In Preparation}
}