Phase 1: Camera Calibration and Real-Time Object Detection with Depth

Pipeline for the non-contact monitoring project using camera01 from the Free-Viewpoint RGB-D Video Dataset (SJTU).

What it does

Camera calibration: Loads intrinsics (and extrinsics) from Camera Parameters/paras.txt for camera01 (index 0).
Depth: Converts grayscale depth frames to metric depth (meters) using the dataset formula.
Detection: Runs face (Haar cascade) and person (HOG) detection on the RGB stream using detectors bundled with OpenCV, so no extra model files are needed.
Depth per ROI: For each detection bounding box, computes median depth and overlays it on the image.
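
As a rough illustration of the depth and depth-per-ROI steps, the sketch below converts 8-bit gray levels to meters and takes the median inside a bounding box. It does not reproduce the dataset formula: it assumes the inverse-depth quantization commonly used for 8-bit depth maps, and `Z_NEAR`, `Z_FAR`, `gray_to_metric`, and `median_depth_in_box` are hypothetical names and placeholder values, not the contents of `depth_utils.py`. Substitute the formula and near/far planes given in the dataset documentation.

```python
import numpy as np

# Placeholder near/far planes in meters (assumptions, not dataset values).
Z_NEAR, Z_FAR = 0.5, 10.0


def gray_to_metric(gray, z_near=Z_NEAR, z_far=Z_FAR):
    """Map 8-bit gray levels to meters with an assumed inverse-depth mapping.

    Under this assumption, 255 maps to z_near and 0 maps to z_far.
    """
    v = np.asarray(gray).astype(np.float32) / 255.0
    return 1.0 / (v * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)


def median_depth_in_box(depth_m, box):
    """Median depth in meters inside box = (x, y, w, h).

    The box is clipped to the image. Returns None if the clipped box is
    empty or contains no finite, positive depth values.
    """
    x, y, w, h = box
    height, width = depth_m.shape[:2]
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    if x1 <= x0 or y1 <= y0:
        return None
    roi = depth_m[y0:y1, x0:x1]
    valid = roi[np.isfinite(roi) & (roi > 0)]
    return float(np.median(valid)) if valid.size else None
```

The median is used rather than the mean because a detection box usually includes some background pixels, and the median is less sensitive to those outliers.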

Setup

cd "Phase 4 - Track B - Nvidia Jetson/5. Edge AI Optimization/non-contact-monitoring-edge"
pip install -r phase1/requirements.txt

Ensure the dataset is present:

Free-Viewpoint-RGB-D-Video-Dataset-main/camera01-rgb.mp4
Free-Viewpoint-RGB-D-Video-Dataset-main/camera01-depth.mp4
Free-Viewpoint-RGB-D-Video-Dataset-main/Camera Parameters/paras.txt
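
A quick way to confirm the three files above are in place before running is sketched below. The helper name `missing_dataset_files` is hypothetical and is not part of the repository; only the relative paths come from the list above.

```python
from pathlib import Path

# Paths relative to the dataset root, taken from the list above.
REQUIRED = (
    "camera01-rgb.mp4",
    "camera01-depth.mp4",
    "Camera Parameters/paras.txt",
)


def missing_dataset_files(dataset_dir):
    """Return the required relative paths that are not files under dataset_dir."""
    root = Path(dataset_dir)
    return [rel for rel in REQUIRED if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files("Free-Viewpoint-RGB-D-Video-Dataset-main")
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("Dataset looks complete.")
```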

Run

From the non-contact-monitoring-edge directory:

python phase1/run_pipeline.py

Or from phase1:

cd phase1
python run_pipeline.py --dataset-dir ../Free-Viewpoint-RGB-D-Video-Dataset-main

Options

| Option | Description |
| --- | --- |
| `--dataset-dir PATH` | Root of the dataset (default: `../Free-Viewpoint-RGB-D-Video-Dataset-main`) |
| `--camera-index N` | Camera index in `paras.txt` (0 = camera01) |
| `--no-person` | Only run face detection (faster) |
| `--no-display` | No GUI (e.g. headless); still processes and prints progress |
| `--out PATH` | Write output video (e.g. `phase1_out.mp4`) |
| `--max-frames N` | Process at most N frames (0 = all) |
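
For readers who want to see how these flags might be declared, here is a minimal `argparse` sketch. It is an illustration, not the parser in `run_pipeline.py`: the flag names and the `--dataset-dir` default come from the table, while the defaults for `--camera-index`, `--out`, and `--max-frames`, and the function name `build_parser`, are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (defaults partly assumed)."""
    p = argparse.ArgumentParser(description="Phase 1 pipeline (sketch)")
    p.add_argument("--dataset-dir",
                   default="../Free-Viewpoint-RGB-D-Video-Dataset-main",
                   help="Root of the dataset")
    p.add_argument("--camera-index", type=int, default=0,  # assumed default
                   help="Camera index in paras.txt (0 = camera01)")
    p.add_argument("--no-person", action="store_true",
                   help="Only run face detection")
    p.add_argument("--no-display", action="store_true",
                   help="Run without a GUI window")
    p.add_argument("--out", default=None,  # assumed: no video written by default
                   help="Path of the output video")
    p.add_argument("--max-frames", type=int, default=0,  # assumed default
                   help="Process at most N frames (0 = all)")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

A headless run limited to 100 frames would then be invoked as `python phase1/run_pipeline.py --no-display --max-frames 100 --out phase1_out.mp4`.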

Module overview

| Module | Role |
| --- | --- |
| `calibration.py` | Parse `paras.txt`, expose K, R, t; world-to-image projection. |
| `depth_utils.py` | Grayscale → metric depth; median depth in a bounding box. |
| `detection.py` | Face (Haar) and person (HOG) detectors; composite that avoids duplicate person/face boxes. |
| `run_pipeline.py` | Main: load calibration, read RGB + depth, detect, compute depth per box, visualize. |
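
The world-to-image projection mentioned for `calibration.py` can be sketched with the standard pinhole model. This is an illustration under assumptions, not the module's code: it assumes R and t map world coordinates to camera coordinates (x_cam = R X + t) and ignores lens distortion. If `paras.txt` stores camera-to-world extrinsics, they would need to be inverted first. The function name `project_world_to_image` is hypothetical.

```python
import numpy as np


def project_world_to_image(points_world, K, R, t):
    """Project Nx3 world points to pixel coordinates with a pinhole model.

    Assumes world-to-camera extrinsics (x_cam = R @ X + t) and no distortion.
    Returns (uv, z): Nx2 pixel coordinates and the camera-frame depth of each
    point. Points with z <= 0 lie behind the camera and should be discarded
    by the caller.
    """
    X = np.asarray(points_world, dtype=np.float64).reshape(-1, 3)
    K = np.asarray(K, dtype=np.float64)
    R = np.asarray(R, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64).reshape(1, 3)

    X_cam = X @ R.T + t            # world frame -> camera frame
    z = X_cam[:, 2]
    uvw = X_cam @ K.T              # apply intrinsics
    uv = uvw[:, :2] / uvw[:, 2:3]  # perspective divide
    return uv, z


if __name__ == "__main__":
    # Illustrative intrinsics only; real values come from paras.txt.
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])
    uv, z = project_world_to_image([[0.0, 0.0, 2.0]], K, np.eye(3), np.zeros(3))
    print(uv, z)
```

With an identity rotation and zero translation, a point on the optical axis projects to the principal point (the last column of K).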

Citation

When using the dataset, cite: Guo S, Zhou K, Hu J, et al. A new free viewpoint video dataset and DIBR benchmark. Proceedings of the 13th ACM Multimedia Systems Conference. 2022: 265–271.