COMPUTER VISION & MATH

3D TRACKING SYSTEM

Multi-camera calibration and object tracking in 3D space.

1. The "Why": Beyond 2D

Single-camera computer vision is powerful, but it lacks depth. To truly understand an object's movement—whether for motion capture, robotics, or analysis—you need to know its position in 3D space (X, Y, Z).

This project aims to build a scalable system that can take feeds from multiple cameras, automatically calibrate them, and triangulate the precise 3D position of a target object in real-time.

2. The "How": System Architecture

Tech Stack

  • Cameras: 3+ USB Webcams (Logitech C920) or Raspberry Pi Cameras
  • Core Logic: Python 3.10 + NumPy for vector math
  • Vision Library: OpenCV (cv2) for image processing
  • Visualization: Plotly (Real-time 3D Scatter Plots)
  • Networking: ZeroMQ for inter-process communication
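
Each camera runs as its own process and ships its 2D detections to a central triangulation process. A minimal sketch of the publisher side of that ZeroMQ link, assuming a PUB/SUB pattern with a JSON payload (the port, topic, and message fields here are illustrative assumptions, not a fixed protocol):

    import json
    import time

    import zmq

    # Publisher side: one PUB socket per camera worker; the triangulation
    # process subscribes to every worker and matches messages by timestamp.
    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:5555")  # one port per worker (assumed convention)

    while True:
        # Placeholder detection; in practice this comes from the tracker.
        detection = {"cam_id": 1, "u": 640.5, "v": 360.2, "t": time.time()}
        pub.send_multipart([b"det", json.dumps(detection).encode()])
        time.sleep(1 / 60)  # pace to the 60 fps capture rate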

Triangulation Pipeline

The core challenge is converting 2D pixel coordinates (u, v) from multiple cameras into a single 3D world coordinate (X, Y, Z).

CAMERA 1, View A (u1, v1)   CAMERA 2, View B (u2, v2)   CAMERA 3, View C (u3, v3)
            ↓  Projection Matrices (P1, P2, P3)
LINEAR TRIANGULATION: solve AX = 0 via SVD
            ↓
3D COORDINATE: P = (X, Y, Z)
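
Before generalizing to N cameras, the two-camera case can be sanity-checked with OpenCV's built-in solver. A minimal sketch with placeholder intrinsics and a synthetic point (the K values, baseline, and pixel coordinates below are made up so the result is verifiable):

    import cv2
    import numpy as np

    K = np.array([[800.0, 0, 640], [0, 800, 360], [0, 0, 1]])  # placeholder intrinsics
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])          # camera 1 at the origin
    P2 = K @ np.hstack([np.eye(3), [[-0.2], [0], [0]]])        # camera 2: 20 cm baseline

    pts1 = np.array([[640.0], [360.0]])  # target pixel in camera 1
    pts2 = np.array([[560.0], [360.0]])  # same target in camera 2

    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # homogeneous 4x1 result
    X = (X_h[:3] / X_h[3]).ravel()                   # -> approximately (0, 0, 2) metres
    print(X)

The DLT section below generalizes this same solve to any number of cameras.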

Data Processing Pipeline

RAW FEED (1080p @ 60 fps) → UNDISTORT (lens correction) → TRACKING (color masking) → KALMAN FILTER (trajectory smoothing)
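
A sketch of one frame passing through that chain, assuming a green target; K, dist, and the HSV bounds are placeholders for your own calibration values and target color:

    import cv2
    import numpy as np

    K = np.array([[800.0, 0, 640], [0, 800, 360], [0, 0, 1]])  # placeholder
    dist = np.zeros(5)                                         # placeholder

    kf = cv2.KalmanFilter(4, 2)  # state (u, v, du, dv), measurement (u, v)
    kf.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                                    [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)

    def track(frame):
        frame = cv2.undistort(frame, K, dist)                    # lens correction
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, (35, 80, 80), (85, 255, 255))    # green range (assumed)
        m = cv2.moments(mask)
        if m["m00"] > 0:                                         # target visible
            u, v = m["m10"] / m["m00"], m["m01"] / m["m00"]      # mask centroid
            kf.correct(np.array([[u], [v]], np.float32))
        return kf.predict()[:2].ravel()                          # smoothed (u, v)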

Calibration: The Foundation

Before tracking can begin, we must define the geometry of the setup.

Intrinsic Calibration (Per Camera)

Every lens distorts the world (e.g., fisheye effect). We capture 20+ images of a checkerboard to calculate the Camera Matrix (K) and Distortion Coefficients. This allows us to "undistort" the image, making lines straight again.
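
A sketch of that per-camera step with OpenCV, assuming a 9x6 inner-corner board, 25 mm squares, and images under calib/cam1/ (all three are assumptions):

    import glob

    import cv2
    import numpy as np

    BOARD = (9, 6)    # inner corners per row / column (assumed)
    SQUARE = 0.025    # square edge length in metres (assumed)

    # 3D board coordinates on the Z = 0 plane, reused for every view
    objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

    obj_pts, img_pts = [], []
    for path in glob.glob("calib/cam1/*.png"):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, BOARD)
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_pts.append(objp)
            img_pts.append(corners)

    # Assumes at least one image matched, so gray still holds the last frame
    rms, K, dist, _, _ = cv2.calibrateCamera(
        obj_pts, img_pts, gray.shape[::-1], None, None)
    print(f"RMS reprojection error: {rms:.3f} px")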

Extrinsic Calibration (Global)

We need to know where Camera 2 is relative to Camera 1. By placing a known reference object (checkerboard) visible to all cameras, we compute the Rotation (R) and Translation (T) vectors. This creates a unified "World Coordinate System."
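
One way to sketch this step is a per-camera solvePnP against the shared board, which makes the board itself the world origin (the exact registration method is an assumption; objp, BOARD, K, and dist carry over from the intrinsic step):

    import cv2
    import numpy as np

    def camera_pose(gray, objp, board, K, dist):
        """Pose of one camera relative to the shared checkerboard (world frame)."""
        found, corners = cv2.findChessboardCorners(gray, board)
        if not found:
            raise RuntimeError("board must be visible to this camera")
        _, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)
        R, _ = cv2.Rodrigues(rvec)  # world -> camera rotation
        return R, tvec              # a world point Xw maps to R @ Xw + tvec

    # Each camera's projection matrix for triangulation: P = K [R | t]
    # R, t = camera_pose(gray, objp, BOARD, K, dist)
    # P = K @ np.hstack([R, t])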

Math: Direct Linear Transformation (DLT)

Undistort Points → Construct Matrix A → SVD Decomposition → Normalize Vector → 3D Point

For each camera, we have a projection matrix P = K[R|T]. A 3D point X (in homogeneous coordinates) projects to a 2D pixel x via x = PX, with the equality holding only up to scale.

With multiple cameras, we stack these constraints into a homogeneous system AX = 0: each camera contributes two rows, u(p3·X) − p1·X = 0 and v(p3·X) − p2·X = 0, where pi is the i-th row of its projection matrix. We solve this with Singular Value Decomposition (SVD). The solution is the right singular vector corresponding to the smallest singular value, which gives the estimated 3D position minimizing the system's algebraic error.
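
A minimal NumPy sketch of that solve, assuming the pixel coordinates are already undistorted and each P is the 3x4 matrix K[R|T] from calibration:

    import numpy as np

    def triangulate_dlt(points_2d, proj_mats):
        """points_2d: iterable of (u, v); proj_mats: matching 3x4 matrices."""
        rows = []
        for (u, v), P in zip(points_2d, proj_mats):
            rows.append(u * P[2] - P[0])  # u * (p3 . X) - (p1 . X) = 0
            rows.append(v * P[2] - P[1])  # v * (p3 . X) - (p2 . X) = 0
        A = np.stack(rows)                # two rows per camera
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]                        # right singular vector, smallest sigma
        return X[:3] / X[3]               # dehomogenize to (X, Y, Z)

With three cameras, A is a 6x4 system, so the point is overdetermined and the SVD gives its least-squares fit using all views at once.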