Skip to content

Module 4 — Advanced Perception and Prediction

Parent: Phase 5 — Autonomous Driving

Time: 6–12 months

Prerequisites: Modules 1–3, Phase 3 (Sensor Fusion — Kalman filtering, multi-sensor math).


Why this is advanced

Modules 1–3 teach you to understand and work with a single-camera, production-deployed ADAS (openpilot). This module goes beyond: multi-sensor perception, BEV architectures, trajectory prediction, HD maps, and simulation — the research and engineering frontier for L3+ autonomy.


1. Advanced Sensor Hardware

  • LiDAR:

    • Working principles: mechanical spinning (Velodyne), solid-state (Livox, Innoviz), FMCW (Aeva, Luminar).
    • Trade-offs: range, resolution, FoV, latency, cost.
    • Point cloud representations: raw points, voxels, range images, pillars.
  • Radar:

    • Automotive radar: 77 GHz FMCW, range-Doppler processing, angular resolution.
    • 4D imaging radar: elevation + azimuth + range + velocity.
    • Radar advantages: all-weather, direct velocity measurement, long range.
  • Camera systems:

    • Sensor types: CCD vs CMOS, global vs rolling shutter.
    • Optics: FoV, focal length, aperture, HDR techniques.
    • ISP pipelines for automotive (connection to Module 2 camerad).

2. Multi-Sensor Calibration

  • Intrinsic calibration:

    • Camera intrinsics (focal length, principal point, distortion) using checkerboard or ArUco patterns with OpenCV or Kalibr.
  • Extrinsic calibration:

    • Spatial transforms between sensors: camera-LiDAR, camera-radar, LiDAR-IMU.
    • Target-based (checkerboard, ArUco) and targetless methods (LI-Calib, ACSC).
  • Temporal calibration:

    • Aligning sensor timestamps across hardware clocks.
    • PPS signals, IEEE 1588 PTP, cross-correlation of observed events.
    • Impact of time misalignment on fusion accuracy.

Projects: * Build a calibration target and calibrate a camera-LiDAR pair using Kalibr or ACSC. Verify by projecting LiDAR points onto camera image. * Implement a radar-camera fusion pipeline: associate radar detections with camera bounding boxes for velocity-augmented detection.


3. BEV (Bird's-Eye View) Perception

  • BEV transformers:

    • BEVFormer: attention-based cross-view feature lifting from perspective to BEV.
    • BEVFusion: multi-modal (camera + LiDAR) fusion in BEV space.
    • Tesla's Occupancy Network: dense 3D voxel prediction from cameras.
  • 3D occupancy prediction:

    • Voxel-based occupancy (Occ3D, SurroundOcc) as alternative to explicit object detection.
    • Represent environment as dense 3D grid of occupied/free voxels.
  • Temporal fusion:

    • Incorporate temporal information using recurrent architectures or long-range temporal attention.
    • Track implicit object states across frames without explicit tracking.

Projects: * Train a BEVFusion-style model on nuScenes for camera+LiDAR 3D object detection. Compare with LiDAR-only baseline.


4. Trajectory Prediction

  • Multi-modal prediction:

    • TNT, MTR, Wayformer: predict multiple plausible future trajectories with probability scores.
    • Why multi-modal: a vehicle at an intersection might go straight, turn left, or turn right.
  • Interaction modeling:

    • Social force models, graph neural networks (GRIP, HYPER).
    • Transformer-based interaction encoders for agent-agent reasoning.
  • Map-conditioned prediction:

    • Incorporate lane geometry and traffic rules.
    • VectorNet, MapTR-style map encoding.
    • Constraining predictions to physically plausible, lane-following trajectories.

Projects: * Implement TNT or MTR on the Waymo Open Motion Dataset. Evaluate minADE/minFDE. Visualize multi-modal predictions.


5. HD Maps and Online Mapping

  • HD map components:

    • Layers: lane geometry, road markings, traffic signs, traffic lights, 3D landmarks.
    • Formats: OpenDRIVE, Lanelet2.
  • Online map building:

    • MapTR, BeMapNet: predict HD map from sensor data in real-time.
    • Reduces dependence on pre-built maps, supports uncharted areas.
  • Map-based localization:

    • Point cloud matching: NDT, ICP.
    • Camera-map matching for precise positioning within HD map.

Projects: * Deploy a MapTR-based online map system in CARLA. Compare online map quality against ground-truth HD map.


6. Sensor Simulation and Synthetic Data

  • Synthetic data generation:

    • CARLA, SUMO+CARLA, NVIDIA DRIVE Sim for photorealistic training data with ground-truth labels.
    • Labels: bounding boxes, semantic masks, depth, radar detections.
  • Sensor model simulation:

    • LiDAR ray casting, radar multipath, camera noise models.
    • Reducing sim-to-real gap in training data.
  • Scenario generation:

    • Corner cases: fog, rain, night, occlusions, near-miss events.
    • Rare or unsafe scenarios that cannot be collected in real driving.

Projects: * Generate a 10,000-frame synthetic dataset in CARLA with varying weather, lighting, and traffic density. Train a 3D detector on synthetic vs. real data and compare.


Resources

Resource Why
nuScenes Multi-modal AV dataset with 3D annotations and HD maps
Waymo Open Dataset Large-scale dataset with motion prediction benchmarks
BEVFormer / BEVFusion papers Foundational BEV perception architectures
Kalibr Multi-sensor calibration toolkit
CARLA Sensor simulation and synthetic data

Next

Module 5 — Safety Standards and Deployment — ISO 26262, SOTIF, V2X, HIL testing, and production deployment.