Module 4 — Advanced Perception and Prediction
Parent: Phase 5 — Autonomous Driving
Time: 6–12 months
Prerequisites: Modules 1–3, Phase 3 (Sensor Fusion — Kalman filtering, multi-sensor math).
Why this is advanced
Modules 1–3 teach you to understand and work with a single-camera, production-deployed ADAS (openpilot). This module goes further: multi-sensor perception, BEV architectures, trajectory prediction, HD maps, and simulation, which together form the research and engineering frontier for L3+ autonomy.
1. Advanced Sensor Hardware
- LiDAR:
    - Working principles: mechanical spinning (Velodyne), solid-state or hybrid solid-state (Livox, Innoviz), scanning time-of-flight at 1550 nm (Luminar), FMCW (Aeva).
    - Trade-offs: range, resolution, FoV, latency, cost.
    - Point cloud representations: raw points, voxels, range images, pillars.
- Radar:
    - Automotive radar: 77 GHz FMCW, range-Doppler processing, angular resolution.
    - 4D imaging radar: measures elevation in addition to azimuth, range, and Doppler velocity.
    - Radar advantages: all-weather operation, direct radial-velocity measurement, long range.
- Camera systems:
    - Sensor types: CCD vs CMOS, global vs rolling shutter.
    - Optics: FoV, focal length, aperture, HDR techniques.
    - ISP pipelines for automotive (connection to Module 2 camerad).
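
To make the "pillars" point cloud representation concrete, here is a minimal sketch in plain Python. The grid ranges, the cell size, and the function names `pillarize` and `pillar_features` are illustrative assumptions, and the per-pillar summary is a hand-crafted stand-in for the learned encoder a real detector would use.

```python
from collections import defaultdict


def pillarize(points, cell_size=0.5, x_range=(0.0, 4.0), y_range=(-2.0, 2.0)):
    """Group (x, y, z) points into vertical pillars on an x-y grid.

    Returns a dict mapping (ix, iy) cell indices to the points in that cell.
    Points outside the configured ranges are dropped.
    """
    pillars = defaultdict(list)
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        ix = int((x - x_range[0]) // cell_size)
        iy = int((y - y_range[0]) // cell_size)
        pillars[(ix, iy)].append((x, y, z))
    return dict(pillars)


def pillar_features(pillars):
    """Summarize each pillar as (point count, mean height)."""
    return {
        cell: (len(pts), sum(p[2] for p in pts) / len(pts))
        for cell, pts in pillars.items()
    }


# Toy point cloud: two points share a cell, one is alone, one is out of range.
points = [(0.1, 0.1, 0.0), (0.2, 0.3, 1.0), (1.6, -1.9, 0.5), (9.0, 0.0, 0.2)]
pillars = pillarize(points)
features = pillar_features(pillars)
```

A voxel representation follows the same pattern with a third index along z; pillars simply leave the height axis unbinned.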
2. Multi-Sensor Calibration
- Intrinsic calibration:
    - Camera intrinsics (focal length, principal point, distortion) using checkerboard or ArUco patterns with OpenCV, or AprilGrid targets with Kalibr.
- Extrinsic calibration:
    - Spatial transforms between sensors: camera-LiDAR, camera-radar, LiDAR-IMU.
    - Target-based methods (checkerboard, ArUco; e.g. ACSC for camera-LiDAR) and targetless methods (e.g. LI-Calib for LiDAR-IMU).
- Temporal calibration:
    - Aligning sensor timestamps across hardware clocks.
    - PPS signals, IEEE 1588 PTP, cross-correlation of observed events.
    - Impact of time misalignment on fusion accuracy.
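
The cross-correlation idea for temporal calibration can be sketched as follows. This is a simplified illustration under stated assumptions: both sensors observe the same event (here a synthetic Gaussian pulse), both signals are already resampled to a common period `dt`, and the function name `estimate_time_offset` is hypothetical.

```python
import math


def estimate_time_offset(ref, other, dt, max_lag):
    """Estimate how many seconds `other` is delayed relative to `ref`.

    Tries every integer lag in [-max_lag, max_lag] and keeps the one that
    maximizes the cross-correlation sum of ref[i] * other[i + lag].
    Assumes the shared event lies well inside both sample windows.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(other):
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag * dt


# Synthetic event: the second sensor sees the same pulse 3 samples later.
dt = 0.01  # 100 Hz sampling (assumed)
camera = [math.exp(-((i - 50) / 5.0) ** 2) for i in range(100)]
lidar = [math.exp(-((i - 53) / 5.0) ** 2) for i in range(100)]

offset = estimate_time_offset(camera, lidar, dt, max_lag=10)

# Effect on fusion: an uncorrected offset shifts a moving object by speed * offset.
speed_mps = 30.0
position_error_m = speed_mps * offset
```

With these toy signals the estimated offset is 30 ms, which at 30 m/s corresponds to roughly 0.9 m of apparent displacement between the two sensors' views of the same object.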
Projects:

* Build a calibration target and calibrate a camera-LiDAR pair: estimate the camera intrinsics with OpenCV or Kalibr, then the camera-LiDAR extrinsics with a target-based tool such as ACSC. Verify by projecting LiDAR points onto the camera image.
* Implement a radar-camera fusion pipeline: associate radar detections with camera bounding boxes for velocity-augmented detection.
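
The verification step in the first project (projecting LiDAR points onto the camera image) can be sketched with a pinhole model. The frame conventions, the intrinsic values, and the function name `project_lidar_to_image` are assumptions for illustration; lens distortion is ignored here, so a real check would undistort the image first or apply the distortion model.

```python
def project_lidar_to_image(points_lidar, R, t, fx, fy, cx, cy, width, height):
    """Project 3D LiDAR points into pixel coordinates.

    R (3x3 nested list) and t (length 3) map LiDAR coordinates into the
    camera frame: p_cam = R @ p_lidar + t. Returns (u, v, depth) tuples for
    points in front of the camera that land inside the image.
    """
    pixels = []
    for p in points_lidar:
        xc, yc, zc = (
            sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)
        )
        if zc <= 0:  # behind the image plane
            continue
        u = fx * xc / zc + cx
        v = fy * yc / zc + cy
        if 0 <= u < width and 0 <= v < height:
            pixels.append((u, v, zc))
    return pixels


# Assumed conventions: LiDAR x forward, y left, z up;
# camera x right, y down, z forward.
R = [[0, -1, 0],
     [0, 0, -1],
     [1, 0, 0]]
t = [0.0, 0.0, 0.0]  # sensors co-located in this toy example

points = [(10.0, 0.0, 0.0), (10.0, 1.0, 0.0), (-5.0, 0.0, 0.0)]
pixels = project_lidar_to_image(points, R, t, fx=1000.0, fy=1000.0,
                                cx=640.0, cy=360.0, width=1280, height=720)
```

If the extrinsics are good, projected points should line up with object edges in the image; a consistent shift or rotation of the overlay usually points to an error in `R` or `t`.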
3. BEV (Bird's-Eye View) Perception
- BEV transformers:
    - BEVFormer: attention-based cross-view feature lifting from perspective to BEV.
    - BEVFusion: multi-modal (camera + LiDAR) fusion in BEV space.
    - Tesla's Occupancy Network: dense 3D voxel prediction from cameras.
- 3D occupancy prediction:
    - Voxel-based occupancy (Occ3D, SurroundOcc) as an alternative to explicit object detection.
    - Represents the environment as a dense 3D grid of occupied/free voxels.
- Temporal fusion:
    - Incorporate temporal information using recurrent architectures or long-range temporal attention.
    - Track implicit object states across frames without explicit tracking.
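
One ingredient that temporal fusion in BEV space typically needs is aligning the previous frame's BEV grid to the current ego pose, so that static objects fall in the same cell before features are combined. The sketch below is an illustrative assumption rather than any specific model's implementation: it uses a scalar value per cell instead of a feature vector, nearest-neighbor lookup instead of interpolation, and a hypothetical function name `align_previous_bev`.

```python
import math


def align_previous_bev(prev_bev, dx, dy, dyaw, cell_size):
    """Warp the previous BEV grid into the current ego frame.

    (dx, dy, dyaw) is the pose of the current ego frame expressed in the
    previous ego frame. The ego vehicle sits at the grid center; columns run
    along x and rows along y. Cells with no source data are filled with 0.0.
    """
    h, w = len(prev_bev), len(prev_bev[0])
    cos_a, sin_a = math.cos(dyaw), math.sin(dyaw)
    aligned = [[0.0] * w for _ in range(h)]
    for row in range(h):
        for col in range(w):
            # Metric center of this cell in the current ego frame.
            x_cur = (col + 0.5 - w / 2) * cell_size
            y_cur = (row + 0.5 - h / 2) * cell_size
            # Same physical location expressed in the previous ego frame.
            x_prev = cos_a * x_cur - sin_a * y_cur + dx
            y_prev = sin_a * x_cur + cos_a * y_cur + dy
            prev_col = math.floor(x_prev / cell_size + w / 2)
            prev_row = math.floor(y_prev / cell_size + h / 2)
            if 0 <= prev_row < h and 0 <= prev_col < w:
                aligned[row][col] = prev_bev[prev_row][prev_col]
    return aligned


# A static object 2 m ahead in the previous frame; the ego then drives 1 m forward.
prev_bev = [[0.0] * 5 for _ in range(5)]
prev_bev[2][4] = 1.0
aligned = align_previous_bev(prev_bev, dx=1.0, dy=0.0, dyaw=0.0, cell_size=1.0)
```

After alignment the object appears one cell closer to the ego vehicle, which is where the current frame should also observe it.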
Projects:

* Train a BEVFusion-style model on nuScenes for camera+LiDAR 3D object detection. Compare with a LiDAR-only baseline.
4. Trajectory Prediction
- Multi-modal prediction:
    - TNT, MTR, Wayformer: predict multiple plausible future trajectories with probability scores.
    - Why multi-modal: a vehicle at an intersection might go straight, turn left, or turn right.
- Interaction modeling:
    - Social force models, graph neural networks (GRIP, HYPER).
    - Transformer-based interaction encoders for agent-agent reasoning.
- Map-conditioned prediction:
    - Incorporate lane geometry and traffic rules.
    - VectorNet, MapTR-style map encoding.
    - Constraining predictions to physically plausible, lane-following trajectories.
Projects:

* Implement TNT or MTR on the Waymo Open Motion Dataset. Evaluate minADE/minFDE. Visualize multi-modal predictions.
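
For reference, minADE and minFDE over K predicted modes can be computed as in the sketch below. It assumes 2D waypoints at matching timestamps and takes each minimum independently over the modes; official benchmark tooling may differ in details such as evaluation horizons or how the best mode is selected, so treat this as a sanity-check implementation only.

```python
import math


def min_ade_fde(pred_modes, gt):
    """Return (minADE, minFDE) for K candidate trajectories.

    pred_modes: list of K trajectories, each a list of (x, y) waypoints.
    gt: ground-truth trajectory with the same number of waypoints.
    ADE is the mean pointwise Euclidean error; FDE is the error at the last
    waypoint. Each metric keeps the best value across the K modes.
    """
    ades, fdes = [], []
    for traj in pred_modes:
        dists = [math.dist(p, g) for p, g in zip(traj, gt)]
        ades.append(sum(dists) / len(dists))
        fdes.append(dists[-1])
    return min(ades), min(fdes)


gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
modes = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0)],  # nearly straight
    [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)],  # left turn
]
min_ade, min_fde = min_ade_fde(modes, gt)
```

Because only the best mode counts, these metrics reward covering the true outcome with at least one hypothesis, which is why they suit multi-modal predictors.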
5. HD Maps and Online Mapping
- HD map components:
    - Layers: lane geometry, road markings, traffic signs, traffic lights, 3D landmarks.
    - Formats: OpenDRIVE, Lanelet2.
- Online map building:
    - MapTR, BeMapNet: predict HD map elements from sensor data in real time.
    - Reduces dependence on pre-built maps and supports uncharted areas.
- Map-based localization:
    - Point cloud matching: NDT, ICP.
    - Camera-map matching for precise positioning within the HD map.
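
To illustrate point cloud matching for localization, here is a minimal 2D point-to-point ICP sketch. It is a teaching example under assumptions: brute-force nearest-neighbor matching, no outlier rejection, a small initial misalignment, and hypothetical function names. Production systems work in 3D with spatial indexing and robust cost functions.

```python
import math


def apply_transform(points, theta, tx, ty):
    """Rotate points by theta (radians) and translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp_step(source, target):
    """One ICP iteration: match nearest neighbors, then solve the rigid fit."""
    matched = [min(target, key=lambda q: math.dist(p, q)) for p in source]
    n = len(source)
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    mx = sum(q[0] for q in matched) / n
    my = sum(q[1] for q in matched) / n
    # Closed-form least-squares rotation for centered 2D point pairs.
    cross = sum((p[0] - sx) * (q[1] - my) - (p[1] - sy) * (q[0] - mx)
                for p, q in zip(source, matched))
    dot = sum((p[0] - sx) * (q[0] - mx) + (p[1] - sy) * (q[1] - my)
              for p, q in zip(source, matched))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, mx - (c * sx - s * sy), my - (s * sx + c * sy)


def icp(source, target, iterations=10):
    """Return (theta, tx, ty) that maps source onto target."""
    theta_tot, tx_tot, ty_tot = 0.0, 0.0, 0.0
    current = list(source)
    for _ in range(iterations):
        theta, tx, ty = icp_step(current, target)
        current = apply_transform(current, theta, tx, ty)
        # Compose the incremental transform with the running estimate.
        c, s = math.cos(theta), math.sin(theta)
        tx_tot, ty_tot = c * tx_tot - s * ty_tot + tx, s * tx_tot + c * ty_tot + ty
        theta_tot += theta
    return theta_tot, tx_tot, ty_tot


# Toy scan: an L-shaped set of points; the "map" is the scan under a known pose.
scan = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
map_points = apply_transform(scan, 0.05, 0.1, -0.05)
theta_est, tx_est, ty_est = icp(scan, map_points)
```

ICP only converges to the right pose from a reasonable initial guess, which is one reason localization stacks seed it with odometry or GNSS and why NDT is often preferred when the initial error is larger.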
Projects:

* Deploy a MapTR-based online map system in CARLA. Compare online map quality against the ground-truth HD map.
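
One simple way to compare a predicted map element against its ground-truth counterpart is a symmetric Chamfer distance between sampled polyline points. The sketch below assumes both polylines have already been resampled to comparable point densities; the function name `chamfer_distance` and the averaging convention are illustrative choices, not a benchmark definition.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 2D point sets (in meters).

    Averages each point's distance to its nearest neighbor in the other set,
    in both directions, then averages the two directional terms.
    """
    d_ab = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    d_ba = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return 0.5 * (d_ab + d_ba)


# Ground-truth lane boundary vs. a prediction offset laterally by 0.2 m.
gt_lane = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred_lane = [(0.0, 0.2), (1.0, 0.2), (2.0, 0.2)]
error_m = chamfer_distance(pred_lane, gt_lane)
```

Thresholding this distance per element gives a match or no-match decision, which can then feed precision and recall style summaries over a whole map.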
6. Sensor Simulation and Synthetic Data
- Synthetic data generation:
    - CARLA, SUMO+CARLA, NVIDIA DRIVE Sim for photorealistic training data with ground-truth labels.
    - Labels: bounding boxes, semantic masks, depth, radar detections.
- Sensor model simulation:
    - LiDAR ray casting, radar multipath, camera noise models.
    - Reducing the sim-to-real gap in training data.
- Scenario generation:
    - Corner cases: fog, rain, night, occlusions, near-miss events.
    - Rare or unsafe scenarios that cannot be collected in real driving.
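
LiDAR ray casting with a simple noise model can be sketched in 2D as below. This is a toy illustration under assumptions: obstacles are circles, beams are evenly spaced over 360 degrees, noise is zero-mean Gaussian on range, and all function names are hypothetical. Simulators such as CARLA cast rays against full 3D scene geometry instead.

```python
import math
import random


def cast_ray(origin, angle, circles, max_range):
    """Return the distance to the nearest circle hit along one beam.

    circles is a list of (cx, cy, radius). Returns max_range on a miss.
    Assumes the sensor origin is outside every obstacle.
    """
    ox, oy = origin
    dx, dy = math.cos(angle), math.sin(angle)
    best = max_range
    for cx, cy, r in circles:
        # Solve |origin + t * d - center|^2 = r^2 for the nearest t >= 0.
        fx, fy = ox - cx, oy - cy
        b = fx * dx + fy * dy
        c = fx * fx + fy * fy - r * r
        disc = b * b - c
        if disc < 0:
            continue  # beam misses this circle
        t = -b - math.sqrt(disc)
        if 0 <= t < best:
            best = t
    return best


def simulate_scan(origin, circles, num_beams=8, max_range=50.0):
    """Cast num_beams evenly spaced beams and return their ranges."""
    return [cast_ray(origin, 2 * math.pi * k / num_beams, circles, max_range)
            for k in range(num_beams)]


def add_range_noise(ranges, sigma, seed=0):
    """Add zero-mean Gaussian noise to each range, clipped at zero."""
    rng = random.Random(seed)
    return [max(0.0, r + rng.gauss(0.0, sigma)) for r in ranges]


# One obstacle of radius 1 m centered 10 m ahead of the sensor.
scan = simulate_scan((0.0, 0.0), [(10.0, 0.0, 1.0)])
noisy_scan = add_range_noise(scan, sigma=0.02)
```

Narrowing the sim-to-real gap usually means replacing these idealized pieces one at a time, for example adding range-dependent dropout or intensity models fitted to recordings from the real sensor.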
Projects:

* Generate a 10,000-frame synthetic dataset in CARLA with varying weather, lighting, and traffic density. Train a 3D detector on synthetic vs. real data and compare.
Resources
| Resource | Why |
|---|---|
| nuScenes | Multi-modal AV dataset with 3D annotations and HD maps |
| Waymo Open Dataset | Large-scale dataset with motion prediction benchmarks |
| BEVFormer / BEVFusion papers | Foundational BEV perception architectures |
| Kalibr | Multi-sensor calibration toolkit |
| CARLA | Sensor simulation and synthetic data |
Next
→ Module 5 — Safety Standards and Deployment — ISO 26262, SOTIF, V2X, HIL testing, and production deployment.