Prerequisite:
This project builds upon the 2D perception stack developed in:
ROS2 Autonomous Perception Stack — 2D Perception
Extend the perception stack beyond monocular vision by integrating LiDAR data and basic sensor fusion techniques.
Question: What is this object, and how far away is it in the 3D world?
- Add LiDAR sensor to CARLA vehicle (completed)
- Publish LiDAR point clouds through ROS2 (completed)
- Visualize point cloud data (completed)
- Point cloud processing
CARLA Online RGB Camera + LiDAR Integration
- Extract camera intrinsic parameters (completed)
- Extract LiDAR extrinsic parameters (completed)
- Validate camera–LiDAR synchronization (completed)
- Transform LiDAR points into camera coordinates (completed)
- Project LiDAR points onto image plane (completed)
- Visualize projected LiDAR points on RGB images (completed)
- ✅ Camera intrinsics extracted from ROS2 CameraInfo
- ✅ Camera matrix validated
- ✅ LiDAR extrinsics derived from CARLA sensor configuration
- ✅ ROS2 TimeSynchronizer implemented
- ✅ Exact camera–LiDAR timestamp synchronization verified
- ✅ PointCloud2 parsing implemented
- ✅ LiDAR → camera coordinate transformation implemented
- ✅ Perspective projection implemented
- ✅ Image bounds filtering implemented
- ✅ OpenCV overlay visualization implemented
- ✅ Depth-colored projection visualization implemented
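The transformation, projection, and bounds-filtering steps in the checklist above can be sketched in a few lines of NumPy. This is a minimal sketch, not the project's actual node: it assumes both sensors are mounted without rotation, that sensor frames follow the ROS convention (x forward, y left, z up), and that the camera optical frame is x right, y down, z forward. The function names are hypothetical; the mounting positions and intrinsics are the ones listed in this README.

```python
import numpy as np

# Mounting positions in the vehicle frame (metres), from the sensor
# configuration in this README. Zero rotation is an assumption.
CAMERA_POS = np.array([-1.5, 0.0, 2.4])
LIDAR_POS = np.array([0.0, 0.0, 2.4])

# Pinhole intrinsics for the 640 x 480, 90 degree FOV camera.
K = np.array([[320.0, 0.0, 320.0],
              [0.0, 320.0, 240.0],
              [0.0, 0.0, 1.0]])


def lidar_to_camera_optical(points_lidar):
    """Map (N, 3) LiDAR points into the camera optical frame.

    Assumes x forward, y left, z up for both sensor frames and
    x right, y down, z forward for the optical frame.
    """
    body = points_lidar + (LIDAR_POS - CAMERA_POS)
    return np.stack([-body[:, 1], -body[:, 2], body[:, 0]], axis=1)


def project_points(points_lidar, width=640, height=480):
    """Return (pixels, depths) for the points that land inside the image."""
    cam = lidar_to_camera_optical(np.asarray(points_lidar, dtype=float))
    cam = cam[cam[:, 2] > 0.1]            # drop points behind the camera
    uvw = cam @ K.T                       # perspective projection
    pixels = uvw[:, :2] / uvw[:, 2:3]
    inside = ((pixels[:, 0] >= 0) & (pixels[:, 0] < width) &
              (pixels[:, 1] >= 0) & (pixels[:, 1] < height))
    return pixels[inside], cam[inside, 2]
```

The returned depths can drive a color map for the depth-colored overlay. If the point cloud arrives in CARLA's native left-handed frame (y to the right), the sign of the y term in the axis swap would need to flip.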
Topic:
/carla/ego_vehicle/rgb_front/image
Resolution:
- 640 × 480
FOV:
- 90°
Position:
- x = -1.5
- y = 0.0
- z = 2.4
Topic:
/carla/ego_vehicle/lidar
Position:
- x = 0.0
- y = 0.0
- z = 2.4
Parameters:
- Range: 50 m
- Channels: 32
- Points/sec: 56000
- Rotation Frequency: 10 Hz
fx = 320.0
fy = 320.0
cx = 320.0
cy = 240.0
Camera Matrix:
| | | |
|---|---|---|
| 320 | 0 | 320 |
| 0 | 320 | 240 |
| 0 | 0 | 1 |
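These values are consistent with an ideal pinhole camera whose focal length follows from the image width and horizontal field of view, `fx = width / (2 * tan(FOV / 2))`. The sketch below reproduces them as a cross-check against the values read from CameraInfo. It assumes square pixels and a principal point at the image center; the function name is hypothetical.

```python
import math


def intrinsics_from_fov(width, height, fov_deg):
    """Build a 3x3 pinhole camera matrix from image size and horizontal FOV.

    Assumes square pixels (fy == fx) and a centered principal point.
    """
    fx = width / (2.0 * math.tan(math.radians(fov_deg) / 2.0))
    fy = fx
    cx = width / 2.0
    cy = height / 2.0
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


camera_matrix = intrinsics_from_fov(640, 480, 90.0)
```

For a 640 × 480 image with a 90° FOV, `tan(45°)` is 1, so `fx` comes out at roughly 320 (up to floating-point rounding), matching the matrix above.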
Used:
- ROS2 `message_filters.TimeSynchronizer`
Results:
Image TS : 3045.320203
LiDAR TS : 3045.320203
Delta : 0.000 ms
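`TimeSynchronizer` only fires its callback for messages whose header stamps match exactly, so a delta of 0 ms is the expected outcome when both sensors tick on the same simulation step. The check itself can be sketched without ROS2 installed. The helper names below are hypothetical, and each stamp is passed as a `(sec, nanosec)` pair mirroring the fields of a ROS2 message header stamp.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def stamp_to_ns(sec, nanosec):
    """Convert a (sec, nanosec) stamp to integer nanoseconds."""
    return sec * NANOSECONDS_PER_SECOND + nanosec


def stamp_delta_ms(image_stamp, lidar_stamp):
    """Absolute time difference between two stamps, in milliseconds."""
    delta_ns = abs(stamp_to_ns(*image_stamp) - stamp_to_ns(*lidar_stamp))
    return delta_ns / 1e6


# The stamps reported above, split into seconds and nanoseconds.
image_stamp = (3045, 320_203_000)
lidar_stamp = (3045, 320_203_000)
print(f"Delta : {stamp_delta_ms(image_stamp, lidar_stamp):.3f} ms")
```

Working in integer nanoseconds avoids the rounding that creeps in when two large floating-point timestamps are subtracted.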
CARLA Online RGB Camera + LiDAR Integration
- Run YOLOv8m-seg TensorRT perception pipeline (completed)
- Associate projected LiDAR points with detected objects (completed)
- Filter object-specific point clusters using segmentation masks (completed)
- Record synchronized camera–LiDAR data (completed)
- Build deterministic camera–LiDAR dataset (completed)
- Implement ROS2 camera–LiDAR replay publisher (completed)
- Validate object association on recorded datasets (completed)
- ✅ YOLOv8m-seg TensorRT inference integrated
- ✅ Segmentation mask extraction implemented
- ✅ Projected LiDAR points associated with segmented objects
- ✅ Object-specific LiDAR point filtering implemented
- ✅ CARLA synchronous recording pipeline implemented
- ✅ Synchronized camera–LiDAR dataset generation implemented
- ✅ Camera image recording implemented
- ✅ LiDAR point cloud recording implemented
- ✅ Frame-level timestamp logging implemented
- ✅ Deterministic offline replay pipeline implemented
- ✅ ROS2 Image publisher implemented
- ✅ ROS2 PointCloud2 publisher implemented
- ✅ Adjustable replay FPS implemented
- ✅ Exact camera–LiDAR timestamp synchronization during replay verified
- ✅ Existing perception pipeline validated on replayed datasets
- ✅ Repeatable offline testing workflow established
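Associating projected LiDAR points with a segmented object amounts to a mask lookup at each projected pixel. The following is a minimal sketch under assumptions: the mask is a boolean `height × width` array already resized to the camera image, and `pixels` and `points_3d` are row-aligned arrays produced by the projection step. The function name is hypothetical.

```python
import numpy as np


def points_in_mask(pixels, points_3d, mask):
    """Keep the 3D points whose projected pixel falls inside the mask.

    pixels:    (N, 2) array of (u, v) image coordinates
    points_3d: (N, 3) array of the corresponding 3D points
    mask:      (H, W) boolean segmentation mask for one object
    """
    pixels = np.asarray(pixels, dtype=float)
    points_3d = np.asarray(points_3d, dtype=float)
    mask = np.asarray(mask).astype(bool)
    height, width = mask.shape

    cols = np.floor(pixels[:, 0]).astype(int)
    rows = np.floor(pixels[:, 1]).astype(int)
    valid = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)

    hit = np.zeros(len(pixels), dtype=bool)
    hit[valid] = mask[rows[valid], cols[valid]]
    return points_3d[hit]
```

Points that hit the mask may still belong to the background seen through gaps or along object edges, so some depth-based outlier rejection on the returned cluster is usually worth considering.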
Offline detection point clouds
- Compute Monocular Camera Distance (completed)
- Compute LiDAR Distance (completed)
- Compute Sensor Fusion (Camera + LiDAR) Distance (completed)
- ✅ Monocular camera distance estimation implemented
- ✅ LiDAR distance estimation using object-specific LiDAR point clusters implemented
- ✅ Camera and LiDAR distance visualization integrated into perception pipeline
- ✅ Weighted camera–LiDAR fusion distance estimation implemented
- ✅ Object-level distance estimation validated on synchronized camera–LiDAR streams
- ✅ End-to-end RGB + LiDAR distance estimation pipeline established
- Monocular geometric distance estimation
- LiDAR-based object range estimation
- Camera–LiDAR sensor fusion
- Object-level metric scene understanding
- Multi-modal perception validation
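The three estimates can be illustrated with a short sketch. The README does not state which monocular method or fusion weights are used, so everything here is an assumption: the monocular estimate uses the pinhole relation with a known real-world object height, the LiDAR estimate takes the median range of the object's points, and the fusion is a fixed weighted average. All function names and the default weight are hypothetical.

```python
import numpy as np


def monocular_distance(bbox_height_px, real_height_m, fy=320.0):
    """Pinhole estimate: distance = fy * real height / pixel height."""
    return fy * real_height_m / bbox_height_px


def lidar_distance(object_points):
    """Median Euclidean range of the (N, 3) points assigned to one object."""
    ranges = np.linalg.norm(np.asarray(object_points, dtype=float), axis=1)
    return float(np.median(ranges))


def fused_distance(camera_m, lidar_m, lidar_weight=0.8):
    """Weighted average; falls back to whichever estimate is available."""
    if lidar_m is None:
        return camera_m
    if camera_m is None:
        return lidar_m
    return lidar_weight * lidar_m + (1.0 - lidar_weight) * camera_m
```

The median is used instead of the mean so that a few background points leaking into the cluster do not pull the LiDAR estimate far off. Weighting LiDAR more heavily reflects that it measures range directly, but the right weight depends on the scene and should be tuned against ground truth.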
Camera, LiDAR, and Fused Distance Estimation
Replay Node
(Publishes 5 sensor topics: 4 cameras + LiDAR)
│
▼
Synchronization Node
(Receives and validates all sensors)
│
▼
Perception Node
├── Stitch images
├── YOLO inference
├── LiDAR projection
├── 2D–3D association
├── Distance estimation
└── BEV (later)
│
▼
Visualization Node
(Display images, overlays, FPS, BEV)
Expand perception coverage using multiple synchronized cameras.
Question: Can I perceive my surroundings in all directions in the 3D world?
- Multi-camera ROS2 integration (completed)
- Camera synchronization (completed)
- Multi-stream visualization (completed)
- Unified perception visualization (completed)
- Multi-camera object detection (completed)
- Project LiDAR onto all camera views (completed)
- Perform detection and extract per-object point clouds (completed)
- Estimate distance of surrounding objects (completed)
- ✅ Multi-camera (front, rear, left, right) ROS2 perception pipeline implemented
- ✅ 360° synchronized multi-camera perception established
- ✅ Camera–LiDAR projection implemented for all camera views
- ✅ Multi-camera 2D–3D object association implemented
- ✅ Object-specific LiDAR point cloud extraction implemented
- ✅ Per-object LiDAR distance estimation across all camera views implemented
- ✅ Unified 360° perception visualization with detections, point clouds, and distance overlays integrated
- ✅ End-to-end 360° camera–LiDAR perception pipeline established
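Projecting LiDAR onto every camera view needs one extra step compared with the single-camera case: rotating each point by the camera's yaw before the optical-frame axis swap. The sketch below assumes four cameras at 90° yaw increments, all at the ego origin, each 640 × 480 with a 90° FOV. The yaw table, the shared mounting position, and the function names are hypothetical, and frames follow the ROS convention (x forward, y left, z up).

```python
import numpy as np

# Hypothetical yaw of each camera about the vertical axis, in degrees.
CAMERA_YAWS_DEG = {"front": 0.0, "left": 90.0, "rear": 180.0, "right": -90.0}


def to_camera_optical(points_ego, yaw_deg, cam_pos=(0.0, 0.0, 0.0)):
    """Map (N, 3) ego-frame points into one camera's optical frame."""
    yaw = np.radians(yaw_deg)
    c, s = np.cos(yaw), np.sin(yaw)
    # Inverse of a yaw rotation: takes ego-frame vectors into the camera body.
    ego_to_body = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
    offset = np.asarray(points_ego, dtype=float) - np.asarray(cam_pos, dtype=float)
    body = offset @ ego_to_body.T
    # Body (x fwd, y left, z up) -> optical (x right, y down, z fwd).
    return np.stack([-body[:, 1], -body[:, 2], body[:, 0]], axis=1)


def cameras_seeing(point_ego, width=640, height=480, focal=320.0):
    """Return the names of the cameras whose image contains the point."""
    seen = []
    for name, yaw_deg in CAMERA_YAWS_DEG.items():
        x, y, z = to_camera_optical([point_ego], yaw_deg)[0]
        if z <= 0.1:                      # behind or too close to the camera
            continue
        u = focal * x / z + width / 2.0
        v = focal * y / z + height / 2.0
        if 0 <= u < width and 0 <= v < height:
            seen.append(name)
    return seen
```

With real mounting offsets, neighboring views overlap near their edges, so a point can be reported by two cameras at once.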
Combine camera, LiDAR, and multi-camera perception into a unified perception system.
Question: Where is everything relative to me in a unified world representation?
- Object localization in the ego coordinate frame (completed)
- Unified world representation (completed)
- Bird's-Eye View generation (completed)
- Improve on Phase 9.1
- Analyze overlapping fields of view
- Merge an object when it is seen twice in the intersection of two camera views
- Compare the BEV results between Phase 9.1 and Phase 9.2
- Assess how good the perception is
- Identify each object uniquely (possibly with YOLO instance segmentation)
- Get ground-truth distances from the CARLA simulator
- Compare them with the Phase 9.1 and Phase 9.2 estimates
- Check the difference
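Two of the items above, merging objects seen by two cameras and placing objects in a bird's-eye view, can be sketched together. This is one possible approach, not necessarily the one used in this project: detections with the same label whose ego-frame positions lie within a distance threshold are averaged, and the result is mapped onto a square BEV grid with the ego vehicle at the center. The threshold, grid size, resolution, and function names are all hypothetical.

```python
import math


def merge_duplicates(detections, max_gap_m=1.0):
    """Greedily merge same-label detections that lie close together.

    detections: list of (label, x, y) tuples in the ego frame (metres).
    Returns a list of (label, x, y) with duplicates averaged.
    """
    merged = []  # each entry: [label, sum_x, sum_y, count]
    for label, x, y in detections:
        for entry in merged:
            cx, cy = entry[1] / entry[3], entry[2] / entry[3]
            if entry[0] == label and math.hypot(x - cx, y - cy) <= max_gap_m:
                entry[1] += x
                entry[2] += y
                entry[3] += 1
                break
        else:
            merged.append([label, x, y, 1])
    return [(label, sx / n, sy / n) for label, sx, sy, n in merged]


def ego_to_bev_pixel(x, y, size_px=400, metres_per_px=0.25):
    """Map an ego-frame position to (row, col) on a square BEV image.

    The ego vehicle sits at the image center; forward (x) points up and
    left (y) points left in the image.
    """
    row = int(round(size_px / 2 - x / metres_per_px))
    col = int(round(size_px / 2 - y / metres_per_px))
    return row, col
```

A fixed distance threshold is the simplest criterion and can wrongly merge two nearby objects of the same class; comparing instance masks or point clusters would be more robust.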
Build a complete multi-modal perception stack resembling modern autonomous systems.
Once the full multi-camera + LiDAR system is in place:
- Evaluate distance against ground truth.
- Compare single-camera vs multi-camera vs fused estimates.
- Analyze how BEV/world representation affects localization.
- Optimize the distance estimation strategy if needed.
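A comparison against ground truth could use a handful of standard error metrics computed per estimation method. The sketch below is an assumed evaluation helper, not part of the existing pipeline; the function name is hypothetical, and both inputs are expected to be per-object distances in metres in matching order.

```python
import numpy as np


def distance_errors(estimates_m, ground_truth_m):
    """Summarize the error between estimated and ground-truth distances.

    Returns mean absolute error, root-mean-square error, and signed bias
    (positive means the estimator overshoots on average), all in metres.
    """
    est = np.asarray(estimates_m, dtype=float)
    truth = np.asarray(ground_truth_m, dtype=float)
    if est.shape != truth.shape:
        raise ValueError("estimates and ground truth must align one-to-one")
    err = est - truth
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "bias": float(np.mean(err)),
    }
```

Calling it once each for the camera, LiDAR, and fused estimates over the same set of objects gives directly comparable numbers. Reporting the bias alongside the MAE helps separate a systematic offset (for example, a wrong assumed object height) from random noise.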
Which object is which over time? (Tracking with DeepSORT or a similar tracker.)
Deploy the optimized perception stack on embedded edge hardware.
- NVIDIA Jetson Nano
- NVIDIA Jetson Xavier NX
- NVIDIA Jetson Orin Nano
- Containerize perception pipeline
- Prepare deployment scripts
- Package ROS2 perception nodes
- Validate TensorRT deployment workflow
- Optimize memory usage
- Optimize power consumption
- Tune TensorRT inference settings
- Analyze thermal behavior
- Evaluate deployment constraints
| Platform | FPS | Latency | GPU Utilization | Memory Usage |
|---|---|---|---|---|
| Desktop GPU | | | | |
| Jetson Nano | | | | |
| Xavier NX | | | | |
| Orin Nano | | | | |
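The FPS and latency columns could be filled from per-frame timings collected the same way on every platform. The following is a minimal sketch of such a measurement, assuming frames are processed one at a time so that FPS is the inverse of the mean per-frame latency. The function names are hypothetical, and `fn` stands in for whatever callable runs one full perception step.

```python
import math
import statistics
import time


def time_calls(fn, iterations=100):
    """Run fn repeatedly and return the per-call durations in seconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize_latencies(latencies_s):
    """Reduce per-frame latencies (seconds) to FPS and latency figures."""
    ordered = sorted(latencies_s)
    mean_s = statistics.fmean(ordered)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "fps": 1.0 / mean_s,
        "mean_latency_ms": mean_s * 1000.0,
        "p95_latency_ms": ordered[p95_index] * 1000.0,
    }
```

Discarding the first few iterations as warm-up is usually sensible, since TensorRT engines and GPU clocks tend to take a moment to reach steady state. GPU utilization and memory usage have to come from platform tools and are not covered here.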
- Real-time FPS
- End-to-end latency
- Memory footprint
- Power efficiency
- Thermal stability
- Continuous perception testing
- Long-duration stability testing
- Resource monitoring
- Failure analysis
- Edge deployment workflow
- TensorRT deployment package
- Containerized perception stack
- Edge benchmarking report
- Embedded deployment guide
Deploy a robotics perception system on real edge hardware with validated real-time performance.
- Integrate transformer-based detector
- Accuracy
- FPS
- Latency
- Edge suitability
- CNN vs ViT perception comparison
- GitHub repository
- Demo video
- Benchmark report
- ROS2 modular perception stack
- CNN vs ViT comparison



