3D TRACKING SYSTEM
Multi-camera calibration and object tracking in 3D space.
1. The "Why": Beyond 2D
Single-camera computer vision is powerful, but it lacks depth. To truly understand an object's movement—whether for motion capture, robotics, or analysis—you need to know its position in 3D space (X, Y, Z).
This project aims to build a scalable system that can take feeds from multiple cameras, automatically calibrate them, and triangulate the precise 3D position of a target object in real-time.
2. The "How": System Architecture
Tech Stack
- Cameras: 3+ USB Webcams (Logitech C920) or Raspberry Pi Cameras
- Core Logic: Python 3.10 + NumPy for vector math
- Vision Library: OpenCV (cv2) for image processing
- Visualization: Plotly (Real-time 3D Scatter Plots)
- Networking: ZeroMQ for inter-process communication
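As a rough illustration of how these pieces could fit together, the sketch below has each camera process publish its 2D detections over a ZeroMQ PUB socket while a central fusion node subscribes to all of them. The ports, topic name, and message schema are assumptions for illustration, not the project's actual protocol.

```python
# Hypothetical messaging layer: one PUB socket per camera process, one SUB
# socket on the fusion node. Ports, topic, and message fields are assumed.
import json
import time
import zmq

def camera_worker(cam_id: int, port: int) -> None:
    """Publish (u, v) detections for one camera as JSON messages."""
    ctx = zmq.Context.instance()
    pub = ctx.socket(zmq.PUB)
    pub.bind(f"tcp://*:{port}")
    while True:
        # In the real system the (u, v) point would come from the OpenCV detector.
        msg = {"cam_id": cam_id, "t": time.time(), "u": 320.0, "v": 240.0}
        pub.send_multipart([b"detections", json.dumps(msg).encode()])
        time.sleep(1 / 30)  # ~30 FPS

def fusion_node(ports: list[int]) -> None:
    """Subscribe to every camera stream and collect detections for triangulation."""
    ctx = zmq.Context.instance()
    sub = ctx.socket(zmq.SUB)
    for port in ports:
        sub.connect(f"tcp://localhost:{port}")
    sub.setsockopt(zmq.SUBSCRIBE, b"detections")
    while True:
        _topic, payload = sub.recv_multipart()
        det = json.loads(payload)
        print(det["cam_id"], det["u"], det["v"])  # buffer by timestamp, then triangulate
```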
Triangulation Pipeline
The core challenge is converting 2D pixel coordinates (u, v) from multiple cameras into a single 3D world coordinate (X, Y, Z).
[Figure: data processing pipeline]
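For the simplest two-camera case, OpenCV ships a linear triangulation routine that performs this conversion directly. The sketch below uses cv2.triangulatePoints with placeholder intrinsics, camera poses, and pixel coordinates purely to show the shape of the data at this step; the numbers are assumptions, not calibration results.

```python
# Two-camera triangulation sketch. K, the camera poses, and the pixel
# coordinates below are placeholders; real values come from calibration.
import cv2
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])                          # assumed intrinsics

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])        # camera 1 at the world origin
P2 = K @ np.hstack([np.eye(3), [[-0.2], [0.0], [0.0]]])  # camera 2, ~20 cm baseline

pts1 = np.array([[320.0], [240.0]])   # (u, v) seen by camera 1, shape (2, N)
pts2 = np.array([[300.0], [240.0]])   # (u, v) seen by camera 2

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)          # homogeneous 4xN result
X = (X_h[:3] / X_h[3]).T                                 # Euclidean (N, 3) world points
print(X)
```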
Calibration: The Foundation
Before tracking can begin, we must define the geometry of the setup.
Intrinsic Calibration (Per Camera)
Every lens distorts the world (e.g., fisheye effect). We capture 20+ images of a checkerboard to calculate the Camera Matrix (K) and Distortion Coefficients. This allows us to "undistort" the image, making lines straight again.
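A minimal sketch of this step using OpenCV's standard checkerboard workflow; the board dimensions, square size, and image paths are assumptions chosen for illustration.

```python
# Intrinsic calibration sketch for one camera. Board size, square size, and
# the image folder are assumptions; the real capture set has 20+ views.
import glob
import cv2
import numpy as np

BOARD = (9, 6)       # inner corners per row/column of the checkerboard (assumed)
SQUARE = 0.025       # square edge length in meters (assumed)

# 3D positions of the board corners in the board's own plane (Z = 0).
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

obj_points, img_points = [], []
for path in glob.glob("calib/cam0/*.jpg"):   # hypothetical image folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Camera matrix (K) and distortion coefficients for this camera.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# Undistort a frame so straight lines in the world appear straight in the image.
undistorted = cv2.undistort(cv2.imread("calib/cam0/sample.jpg"), K, dist)
```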
Extrinsic Calibration (Global)
We need to know where Camera 2 is relative to Camera 1. By placing a known reference object (checkerboard) visible to all cameras, we compute the Rotation (R) and Translation (T) vectors. This creates a unified "World Coordinate System."
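One way to realize this, sketched below under the assumption that the shared checkerboard defines the world frame: cv2.solvePnP recovers each camera's R and T relative to the board, and K[R|T] then gives that camera's projection matrix for triangulation. The helper names are hypothetical.

```python
# Extrinsic calibration sketch: the shared checkerboard defines the world
# frame; solvePnP gives each camera's pose relative to it. K, dist, objp,
# and the corner detection reuse the intrinsic-calibration code above.
import cv2
import numpy as np

def camera_extrinsics(gray, objp, board, K, dist):
    """Return (R, T) mapping world (board) coordinates into this camera's frame."""
    found, corners = cv2.findChessboardCorners(gray, board)
    if not found:
        raise RuntimeError("checkerboard not visible in this camera")
    ok, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)
    R, _ = cv2.Rodrigues(rvec)          # 3x3 rotation from the axis-angle vector
    return R, tvec

def projection_matrix(K, R, T):
    """Build the 3x4 projection matrix P = K[R|T] used for triangulation."""
    return K @ np.hstack([R, T.reshape(3, 1)])
```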
Math: Direct Linear Transformation (DLT)
For each camera, we have a projection matrix P = K[R|T]. A 3D point X (in homogeneous coordinates) projects to a 2D pixel x via x ~ PX, where equality holds only up to scale.
With multiple cameras, each observation contributes two linear equations, and stacking them gives a homogeneous system AX = 0. We solve this with Singular Value Decomposition (SVD): the estimated 3D position is the right singular vector of A associated with the smallest singular value (equivalently, the eigenvector of A^T A with the smallest eigenvalue). This minimizes the algebraic error ||AX|| subject to ||X|| = 1, a linear stand-in for the true reprojection error.
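A minimal NumPy sketch of that DLT step, assuming the pixel observations have already been undistorted with each camera's intrinsics:

```python
# N-camera DLT triangulation: each camera contributes two rows of A built
# from its projection matrix P and observed pixel (u, v); the solution is
# the right singular vector of A with the smallest singular value.
import numpy as np

def triangulate_dlt(proj_mats, pixels):
    """proj_mats: list of 3x4 P matrices; pixels: list of (u, v) observations."""
    rows = []
    for P, (u, v) in zip(proj_mats, pixels):
        rows.append(u * P[2] - P[0])     # u * p3 - p1
        rows.append(v * P[2] - P[1])     # v * p3 - p2
    A = np.stack(rows)                   # shape (2N, 4)
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]                         # smallest-singular-value direction
    return X_h[:3] / X_h[3]              # back to Euclidean (X, Y, Z)
```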