3-D Human Body Tracking (1996)

As part of my Ph.D. thesis, in 1996 I developed the first system for vision-based 3-D tracking of unconstrained human movement. Using multiple cameras placed in the corners of a room, the system recovered 3-D body pose without requiring the person to wear special markers, as was (and still is) the norm in motion capture.

The system required an initialization phase in which the four cameras were calibrated for their internal parameters (e.g. focal length) and their geometric configuration. In this phase, a customized 3-D human model was also derived from frontal and side views of the person.


The 3-D rendering on the left shows the models that were derived from me and a (good) friend. The models had 22 degrees of freedom: six for the torso and four for each arm and each leg. Each body segment was represented by a tapered superquadric.

The 3-D models ELLEN and DARIU say “hi!”

Of course, these are not the highly accurate body models used in computer animation, which contain tens of thousands of polygons (e.g. obtained from a Cyberware scanner). But then again, the task here is not animation, i.e. rendering 3-D models onto 2-D images, but the more difficult inverse task of model pose recovery: estimating 3-D pose from 2-D images. For this, all that is needed is a 3-D model that is accurate enough to support vision-based tracking, preferably one described by only a few parameters.

Once the system has acquired personalized 3-D models of the human body, tracking can begin. First, at each time instant, the incoming multi-view images are preprocessed with an edge detector.


All background edges are subsequently filtered out; background edges are those that have remained stationary over prolonged periods of time. What remains are the edges corresponding to the human figure.
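As a rough illustration of this filtering step, here is a minimal sketch in Python. It assumes binary edge maps stored as nested lists and treats an edge pixel as background if it was also an edge in at least `min_hits` of the preceding frames. The function name, the `min_hits` threshold and the data layout are illustrative assumptions, not the original system's implementation.

```python
def filter_background_edges(history, current, min_hits):
    """Drop edge pixels of `current` that were also edges in at least
    `min_hits` of the earlier frames in `history` (assumed to be
    stationary background). All maps are binary nested lists."""
    rows, cols = len(current), len(current[0])
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not current[r][c]:
                continue  # not an edge pixel in the current frame
            hits = sum(frame[r][c] for frame in history)
            if hits < min_hits:
                result[r][c] = 1  # edge moved recently: keep it
    return result


# Toy one-row example: column 0 is a static background edge,
# while the edge of the moving figure changes position every frame.
history = [
    [[1, 1, 0, 0]],
    [[1, 0, 1, 0]],
]
current = [[1, 0, 0, 1]]
foreground = filter_background_edges(history, current, min_hits=2)
```

In this toy example only the edge pixel in the last column survives, because the pixel in column 0 was an edge in every earlier frame.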


Pose recovery is now formulated as a search problem in the pose parameter space of the 3-D model. We seek to align the 3-D human model so that its projected contours match the actual edges of the scene image in all four views. Once a good fit is found in all the camera views, we assume that we have found the correct 3-D positioning of the body.

[Images: model and scene edges (left); distance image of the edges (right)]

The similarity measure between model edges (shown in grey in the left image) and scene edges (shown in black in the left image) is based on the so-called distance transform (see the right image: the distance image of the edges on the left). In essence, it computes the average distance from the model edges to the closest scene edges. For details on how this works, see here.
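The measure can be sketched in a few lines of Python. This is a simplified illustration rather than the original code: it assumes binary edge maps as nested lists and uses a two-pass city-block distance transform (the original system may well have used a different distance metric), and the function names are made up for this example.

```python
def distance_transform(edges):
    """Two-pass city-block distance transform: every cell receives the
    distance to the nearest edge pixel (value 1) in `edges`."""
    rows, cols = len(edges), len(edges[0])
    inf = rows + cols  # larger than any city-block distance in the grid
    dist = [[0 if edges[r][c] else inf for c in range(cols)]
            for r in range(rows)]
    # Forward pass: propagate distances from the top and from the left.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom and from the right.
    for r in reversed(range(rows)):
        for c in reversed(range(cols)):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


def chamfer_distance(model_pixels, dist):
    """Average distance-image value under the projected model edge pixels."""
    return sum(dist[r][c] for r, c in model_pixels) / len(model_pixels)


scene_edges = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
dist = distance_transform(scene_edges)
aligned = chamfer_distance([(1, 1), (1, 2)], dist)  # model on top of the scene edges
shifted = chamfer_distance([(0, 1), (0, 2)], dist)  # model one row too high
```

The distance image is computed once per frame and view; scoring a candidate pose then only requires looking up the values under its projected edge pixels. A perfectly aligned model scores 0, and the score grows as the model drifts away from the scene edges.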

A standard non-greedy algorithm is used to search through possible body configurations using the above similarity measure. Because the search space corresponding to 22 degrees of freedom is infeasibly large, it is decomposed into smaller subspaces that are searched one after the other.
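A back-of-the-envelope count shows how much such a decomposition saves. The sketch below assumes, purely for illustration, that every degree of freedom is sampled at `n` discrete values and that the torso (6 parameters), both arms together (2 × 4) and both legs together (2 × 4) are each searched exhaustively in three successive stages. The real system did not necessarily sample or group the parameters this way.

```python
def joint_candidates(n):
    """Number of poses when all 22 parameters are searched at once."""
    return n ** 22


def staged_candidates(n):
    """Number of poses when torso (6 DOF), arms (8 DOF) and legs (8 DOF)
    are searched one stage after the other."""
    return n ** 6 + n ** 8 + n ** 8


n = 5  # hypothetical number of samples per degree of freedom
joint = joint_candidates(n)
staged = staged_candidates(n)
```

With five samples per parameter, the joint search would have to score about 2.4 × 10^15 poses, whereas the staged search scores 796,875.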


Searching all pose parameters simultaneously would be a bad idea!

Stage 1: searching for the torso position
Stage 2: searching for the arm positions
Stage 3: searching for the leg positions

Thus, the torso is aligned first, then the arms and finally the legs. Once the body pose has been determined, the system predicts, based on the current pose and the poses of the last few frames, what the body pose will be in the next image frame. This prediction is the starting point for the search at the next time step.
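One simple way to make such a prediction is to extrapolate each pose parameter linearly from the recent history. The sketch below does exactly that; it is an assumed stand-in, since the text does not say which prediction model the system used, and the function name and the list-of-floats pose format are hypothetical.

```python
def predict_next_pose(recent_poses):
    """Extrapolate the next pose from a short history (oldest first),
    adding the average per-frame change of each parameter to the
    current pose. This is an assumed constant-velocity model."""
    current = recent_poses[-1]
    if len(recent_poses) < 2:
        return list(current)  # no motion information: reuse the current pose
    oldest = recent_poses[0]
    steps = len(recent_poses) - 1
    return [cur + (cur - old) / steps for cur, old in zip(current, oldest)]


# Two hypothetical pose parameters over three frames: the first
# increases by 1.0 per frame, the second stays constant.
poses = [[0.0, 10.0], [1.0, 10.0], [2.0, 10.0]]
predicted = predict_next_pose(poses)
```

Here the predicted pose is `[3.0, 10.0]`, which would seed the search in the next frame.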

In summary, the system performs the following steps while in tracking mode: it preprocesses the images, aligns the 3-D models so that their projections match the preprocessed images, and predicts the body pose for the next frame. To initialize tracking, the system assumes an upright and unoccluded pose of the body, and determines the 3-D torso axis by triangulating the 2-D torso axes that are visible in the various views (see the dark lines in the figure below).

[Figure: 2-D torso axes (dark lines) in the camera views]

Once the 3-D torso axis has been determined, the other body pose parameters are found by a search process similar to the one used during tracking.

So how well does this work? Take a look at the video clips below.

Multi-view 3-D Tracking of Humans (1996)

waving example 1 (0.50 MB, MPEG)
waving example 2 (0.45 MB, MPEG)
walking example (0.25 MB, MPEG)


Copyright © 2001-2013 Gavrila. All rights reserved. Disclaimer.