Abstracts


 
Real-time Vision for Intelligent Vehicles

D. M. Gavrila, U. Franke, S. Görzig and C. Wöhler

Abstract

Driver assistance systems play an increasingly important role in modern vehicles. This paper discusses on-board vision systems which, by evaluating the surroundings of a vehicle and taking appropriate action, can improve the safety, convenience and efficiency of driving. The focus lies on the important ability to recognize objects, such as elements of the traffic infrastructure and traffic participants. We present various recognition results (traffic lights, traffic signs and pedestrians) and provide a framework for the integration of sensor and control modules in a scalable multi-agent system. Finally, we introduce our demonstrator vehicle UTA.




The Visual Analysis of Human Movement: A Survey

D. M. Gavrila

Abstract

The ability to recognize humans and their activities by vision is key for a machine to interact intelligently and effortlessly with a human-inhabited environment. Because of many potentially important applications, "Looking at People" is currently one of the most active application domains in computer vision. This survey identifies a number of promising applications and provides an overview of recent developments in this domain. The scope of this survey is limited to work on whole-body or hand motion; it does not include work on human faces. The emphasis is on discussing the various methodologies; they are grouped into 2-D approaches with or without explicit shape models and 3-D approaches. Where appropriate, systems are reviewed. We conclude with some thoughts about future directions.

Full Article


Autonomous Driving approaches Downtown

U. Franke, D. Gavrila, S. Görzig, F. Lindner, F. Paetzold, C. Wöhler

Abstract

Most computer vision systems for vehicle guidance developed in the past were designed for the comparatively simple highway scenario. Autonomous driving in the much more complex scenario of urban traffic or driver assistance systems like Intelligent Stop&Go are new challenges not only from the algorithmic but also from the system architecture point of view. This contribution describes our current work on these topics. It includes the appropriate algorithms as well as approaches to control the various vision modules.

Full Article


3D Object Recognition from 2D Images
using Geometric Hashing

D. M. Gavrila and F. C. A. Groen

Abstract

In this paper, a general technique for model-based recognition called Geometric Hashing is discussed. Its purpose is to identify an object in the scene, together with its position and orientation. The technique is based on an intensive off-line preprocessing stage in which transformation-invariant features of the models are indexed into a hash table. This makes the actual recognition particularly efficient. The algorithm stands out for its high inherent parallelism and its ability to deal with occluded scenes. This paper focuses on the use of Geometric Hashing for 3D object recognition from 2D images. An efficient method to represent a 3D model by its 2D projections is proposed. Experimental results are presented on random data and 3D objects. It has been found that distinguishing between different types of features in a model or scene results in a very efficient implementation of Geometric Hashing using a multi-dimensional hash table. The filtering ratio of this scheme turns out to be high enough to allow reliable recognition with the correct feature correspondence between model and scene. The algorithm successfully dealt with scenes with up to 50% occlusion and ran at speeds on the order of one second on a SPARCstation.
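As a rough illustration of the off-line indexing and on-line voting stages, here is a minimal Python sketch. It deliberately simplifies to 2D point models under similarity transforms (translation, rotation, uniform scale), not the paper's 3D-from-2D setting with typed features; the model data, the quantization step `step=0.25` and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def invariant_coords(p, b0, b1):
    """Coordinates of p in the frame of the ordered basis pair (b0, b1).

    The frame has origin b0 and x-axis b1 - b0; dividing by |b1 - b0|^2
    cancels translation, rotation and uniform scale.
    """
    ux, uy = b1[0] - b0[0], b1[1] - b0[1]
    dx, dy = p[0] - b0[0], p[1] - b0[1]
    n2 = ux * ux + uy * uy
    # Project onto u and onto u rotated by +90 degrees, i.e. (-uy, ux).
    return ((dx * ux + dy * uy) / n2, (-dx * uy + dy * ux) / n2)


def quantize(c, step):
    """Hash-table key: invariant coordinates snapped to a grid of size `step`."""
    return (round(c[0] / step), round(c[1] / step))


def build_table(models, step=0.25):
    """Off-line stage: index every model point under every ordered basis pair."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k not in (i, j):
                    key = quantize(invariant_coords(p, pts[i], pts[j]), step)
                    table[key].append((name, i, j))
    return table


def recognize(table, scene, b0, b1, step=0.25):
    """On-line stage: fix one scene basis pair and let the other points vote
    for (model, basis) entries; return the best entry and its vote count."""
    votes = defaultdict(int)
    for k, p in enumerate(scene):
        if k not in (b0, b1):
            key = quantize(invariant_coords(p, scene[b0], scene[b1]), step)
            for entry in table.get(key, []):
                votes[entry] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None


models = {"L": [(0, 0), (4, 0), (0, 3), (1, 1)],
          "square": [(0, 0), (1, 0), (1, 1), (0, 1)]}
table = build_table(models)
# Model "L" rotated by 90 degrees, scaled by 2 and translated by (5, 5).
scene = [(5 - 2 * y, 5 + 2 * x) for x, y in models["L"]]
print(recognize(table, scene, 0, 1))  # (('L', 0, 1), 2)
```

In practice several scene basis pairs would be tried until one collects enough votes to accept a hypothesis; the sketch tries only one, which is enough to show why the expensive work happens off-line.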

Full Article


R-tree Index Optimization

D. M. Gavrila

Abstract

The optimization of spatial indexing is an increasingly important issue considering the fact that spatial databases, in such diverse areas as geographical, CAD/CAM and image applications, are growing rapidly in size and often contain on the order of millions of items or more. This necessitates the storage of the index on disk, which has the potential of slowing down the access time significantly. In this paper, we discuss ways of minimizing the disk access frequency by grouping together data items which are close to one another in the spatial domain ("packing"). The data structure which we seek to optimize here is the R-tree for a given set of data objects.

Existing methods of building an R-tree index based on space-filling curves (Peano, Hilbert) are computationally cheap, but they do not preserve spatial locality well, in particular when dealing with higher dimensional data of non-zero extent. On the other hand, existing methods of packing based on all dimensions of the data, such as the several proposed dynamic R-tree insertion algorithms, do not take advantage of the fact that all the data objects are known beforehand. Furthermore, they are essentially serial in nature.

In this paper, we regard packing as an optimization problem and propose an iterative method of finding a close-to-optimal solution to the packing of a given set of spatial objects in D dimensions. The method achieves a high degree of parallelism by constructing the R-tree bottom up. In experiments on data of various dimensions and distributions, we have found that the proposed method can significantly improve on the packing performance of the R* insertion algorithm and the Hilbert curve. It is shown that the improvements increase with the skewness of the data and, in some cases, can even amount to an order of magnitude in terms of decreased response time.
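For reference, the Hilbert-curve baseline mentioned above (not the iterative optimization proposed in the paper) can be sketched as bottom-up packing: sort the objects by the Hilbert index of their centers, fill leaves of fixed capacity in that order, and repeat on the resulting bounding rectangles until a single root remains. The sketch assumes 2D boxes `(xmin, ymin, xmax, ymax)` with coordinates in [0, 1); `hilbert_index` follows the well-known iterative xy-to-d conversion, and all names are illustrative.

```python
def rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so its sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = rot(n, x, y, rx, ry)
        s //= 2
    return d


def mbr(boxes):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a group of boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def hilbert_pack(rects, capacity, order=10):
    """Return the tree levels, leaves first; each level is a list of node MBRs."""
    n = 1 << order

    def key(r):
        cx, cy = (r[0] + r[2]) / 2, (r[1] + r[3]) / 2
        return hilbert_index(n, int(cx * (n - 1)), int(cy * (n - 1)))

    level = sorted(rects, key=key)
    levels = []
    while True:
        nodes = [mbr(level[i:i + capacity]) for i in range(0, len(level), capacity)]
        levels.append(nodes)
        if len(nodes) == 1:
            return levels
        level = nodes  # parents keep the Hilbert order of their children


rects = [(x / 4, y / 4, x / 4 + 0.1, y / 4 + 0.1) for x in range(4) for y in range(4)]
levels = hilbert_pack(rects, capacity=4)
print([len(level) for level in levels])  # [4, 1]
```

The whole construction is a single sort plus linear passes, which is why curve-based packing is cheap; its weakness, as the abstract notes, is that one-dimensional order can separate objects that are close in space.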

Full Article


Vision-based 3-D Tracking of Humans in Action

D. M. Gavrila

Abstract

The ability to recognize humans and their activities by vision is essential for future machines to interact intelligently and effortlessly with a human-inhabited environment. Some of the more promising applications are discussed. A prototype vision system is presented for the tracking of whole-body movement using multiple cameras. 3-D body pose is recovered at each time instant based on occluding contours. The pose-recovery problem is formulated as a search problem and entails finding the pose parameters of a graphical human model whose synthesized appearance is most similar to the actual appearance of the real human in the multi-view images. Hermite deformable contours are proposed as a tool for the 2-D contour tracking problem. The main contribution of this dissertation is that it demonstrates for the first time a set of techniques that allow accurate vision-based 3-D tracking of arbitrary whole-body movement without the use of markers.

Full Article


Vision-based Pedestrian Detection: the PROTECTOR System

D. M. Gavrila, J. Giebel and S. Munder

Abstract

This paper presents the results of the first large-scale field tests on vision-based pedestrian protection from a moving vehicle. Our PROTECTOR system combines pedestrian detection, trajectory estimation, risk assessment and driver warning.

The paper pursues a “system approach” related to the detection component. An optimization scheme models the system as a succession of individual modules and finds a good overall parameter setting by combining individual ROCs using a convex-hull technique. On the experimental side, we present a methodology for the validation of the pedestrian detection performance in an actual vehicle setting. We hope this test methodology will contribute towards the establishment of benchmark testing, enabling this application to mature. We validate the PROTECTOR system using the proposed methodology and present interesting quantitative results based on tens of thousands of images from hours of driving. Although the results are promising, more research is needed before such systems can be placed in the hands of ordinary vehicle drivers.
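A minimal sketch of the convex-hull idea, under simplifying assumptions that are ours rather than the paper's: each module is summarized by a list of (false-positive rate, detection rate) operating points, two cascaded modules are treated as statistically independent so that their rates multiply, and only the upper convex hull of the combined points is kept, since points below it can never be the best choice under a linear trade-off between false alarms and missed detections. The function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of (false-positive rate, detection rate) points,
    anchored at the trivial operating points (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it does not make a clockwise turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def cascade_roc(roc_a, roc_b):
    """Operating points of module A followed by module B, assuming
    (hypothetically) independent errors, so the rates multiply."""
    return [(fa * fb, ta * tb) for fa, ta in roc_a for fb, tb in roc_b]


print(roc_convex_hull([(0.1, 0.5), (0.2, 0.6), (0.3, 0.85), (0.5, 0.8)]))
# [(0.0, 0.0), (0.1, 0.5), (0.3, 0.85), (1.0, 1.0)]

combined = cascade_roc([(0.5, 0.95), (0.2, 0.8)], [(0.3, 0.9), (0.1, 0.7)])
print(roc_convex_hull(combined))
```

Applying `cascade_roc` followed by `roc_convex_hull` one module at a time keeps the candidate set small, in the spirit of combining individual ROCs successively.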



Approach for Protection of Vulnerable Road Users using Sensor Fusion Techniques

M.-M. Meinecke, M. Obojski, M. Töns, R. Dörfler, P. Marchal, L. Letellier, D. M. Gavrila and R. Morris

Abstract

The evolution of automotive radar systems started with the Adaptive Cruise Control (ACC) application some years ago. The first commercial products are available on the market today. Recently, the range of applications has been increasing rapidly. The automotive industry and sensor suppliers are developing advanced systems for short-, mid- and long-range applications. Precrash systems operate in the short- and mid-range area and place the hardest requirements on sensors and control units. For example, highly dynamic obstacles have to be measured and tracked exactly. The false alarm rate must satisfy very strict limits so that safety systems (such as seat-belt pretensioners or other reversible systems) can be deployed without false positives. In addition, the decision algorithms for safety-system deployment need high accuracy and high measurement rates in highly dynamic street situations.

This paper deals with a special variant of precrash systems, namely precrash for pedestrian protection. The goal is to reduce the number of fatalities in vehicle-pedestrian collisions. Specific protection systems such as active braking or seat-belt pretensioners are currently under investigation. Triggering these protection systems requires a high-performance sensor platform. The object class information ("pedestrian" or "non-pedestrian") will be provided by a video image processor.

In this paper, approaches from the EC-funded project SAVE-U (5th Framework Programme of the European Commission) are presented. The sensor platform consists of radar and of cameras in the visible and infrared domains. The focus is on high-level and low-level data fusion architectures that fulfill the strict requirements.

Full Article


FPGA-based Template Matching using Distance Transforms

S. Hezel, D. M. Gavrila, A. Kugel and R. Männer

Abstract

This paper presents a high-performance FPGA solution to generic shape-based object detection in images. The underlying detection method involves representing the target object by binary templates containing positional and directional edge information. A particular scene image is preprocessed by edge segmentation, edge cleaning and distance transforms. Matching involves correlating the templates with the distance-transformed scene image and determining the locations where the mismatch is below a certain user-defined threshold. Although these matching methods have been successful in the past, a significant drawback has been their large computational cost when implemented on a sequential general-purpose processor.
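The matching step described above can be illustrated in plain Python (the paper's subject is an FPGA implementation, which this does not model, and directional edge information is omitted). The sketch uses the common two-pass 3-4 chamfer approximation of the distance transform and scores each template position by the mean distance at the template's edge points, divided by 3 to approximate pixel units; the toy image, template, threshold and function names are made up for illustration.

```python
INF = float("inf")


def chamfer_dt(edges):
    """Two-pass 3-4 chamfer distance transform of a binary edge image.
    Values approximate 3 x the Euclidean distance to the nearest edge pixel."""
    h, w = len(edges), len(edges[0])
    d = [[0 if edges[y][x] else INF for x in range(w)] for y in range(h)]
    # Forward pass: left, upper-left, up and upper-right neighbors.
    for y in range(h):
        for x in range(w):
            for dx, dy, c in ((-1, 0, 3), (-1, -1, 4), (0, -1, 3), (1, -1, 4)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:
                    d[y][x] = min(d[y][x], d[ny][nx] + c)
    # Backward pass: right, lower-right, down and lower-left neighbors.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            for dx, dy, c in ((1, 0, 3), (1, 1, 4), (0, 1, 3), (-1, 1, 4)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:
                    d[y][x] = min(d[y][x], d[ny][nx] + c)
    return d


def chamfer_match(dt, template, threshold):
    """Slide a template (list of (x, y) edge offsets) over the DT image and
    return sorted (score, x, y) for every position with mean distance <= threshold."""
    h, w = len(dt), len(dt[0])
    tw = max(px for px, _ in template) + 1
    th = max(py for _, py in template) + 1
    hits = []
    for ty in range(h - th + 1):
        for tx in range(w - tw + 1):
            score = sum(dt[ty + py][tx + px] for px, py in template) / (3.0 * len(template))
            if score <= threshold:
                hits.append((score, tx, ty))
    return sorted(hits)


# A 3x3 square outline as template, drawn into a 10x10 image at (4, 5).
template = [(x, y) for x in range(3) for y in range(3) if (x, y) != (1, 1)]
edges = [[0] * 10 for _ in range(10)]
for px, py in template:
    edges[5 + py][4 + px] = 1
print(chamfer_match(chamfer_dt(edges), template, 0.5)[0])  # (0.0, 4, 5)
```

The nested loops over all positions and all template points are exactly the sequential cost that motivates moving this computation into hardware.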

Full Article


SAVE-U: An innovative sensor platform for Vulnerable Road User protection

P. Marchal, D. M. Gavrila, L. Letellier, M.-M. Meinecke, R. Morris and M. Töns

Abstract

Among other initiatives to improve the safety of Vulnerable Road Users (VRUs), the European Commission is funding a research project called SAVE-U (IST-2001-34040), "Sensors and system architecture for VulnerablE road Users protection", aimed at developing an integrated safety concept for pedestrians and cyclists. SAVE-U started in March 2002 and will last 3 years. This paper provides an overview of the results of work performed during the first two years of the project.

Full Article


A Visual Quality Inspection System Based on a Hierarchical 3D Pose Estimation Algorithm

 C. von Bank, D. M. Gavrila and C. Wöhler

Abstract

This paper presents a quality inspection system based on an efficient model- and view-based algorithm for locating objects in images and estimating their pose. Off-line, edge templates are generated from a 3D model. On-line, a hierarchical edge template matching technique generates matching solutions from which the pose of the object is derived. This approach tackles the typical, difficult tradeoff between tessellation size and efficiency. The proposed method works for arbitrarily shaped 3D objects. The accuracy of pose estimation exceeds that of state-of-the-art algorithms even if the objects are viewed on a cluttered background. Since no high-level feature extraction is required, the algorithm is robust against changing ambient conditions such as illumination. The inspection system was successfully tested on two real-world inspection scenarios in engine production.

Full Article


A Bayesian Framework for Multi-Cue 3D Object Tracking

J. Giebel, D. M. Gavrila and C. Schnörr

Abstract

This paper presents a Bayesian framework for multi-cue 3D object tracking of deformable objects. The proposed spatio-temporal object representation involves a set of distinct linear subspace models or Dynamic Point Distribution Models (DPDMs), which can deal with both continuous and discontinuous appearance changes; the representation is learned fully automatically from training data. The representation is enriched with texture information by means of intensity histograms, which are compared using the Bhattacharyya coefficient. Direct 3D measurement is furthermore provided by a stereo system. State propagation is achieved by a particle filter which combines the three cues (shape, texture and depth) in its observation density function. The tracking framework integrates an independently operating object detection system by means of importance sampling. We illustrate the benefit of our integrated multi-cue tracking approach on pedestrian tracking from a moving vehicle.
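To make the texture cue concrete, a small sketch of comparing intensity histograms with the Bhattacharyya coefficient follows. The abstract does not say how the coefficient is turned into a particle weight; the Gaussian of the Bhattacharyya distance in `texture_likelihood`, its `sigma`, the bin layout and all function names are assumptions chosen for illustration.

```python
import math


def normalized_histogram(values, bins, lo=0.0, hi=256.0):
    """Intensity histogram with `bins` equal-width bins, normalized to sum to 1."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    total = float(sum(counts))
    return [c / total for c in counts]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient: 1 for identical distributions, 0 for disjoint ones."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def texture_likelihood(p, q, sigma=0.2):
    """Hypothetical particle weight: a Gaussian in the Bhattacharyya distance
    sqrt(1 - coefficient); sigma is an assumed tuning value."""
    dist2 = max(0.0, 1.0 - bhattacharyya(p, q))
    return math.exp(-dist2 / (2 * sigma ** 2))


model = normalized_histogram([12, 40, 41, 200], bins=4)
candidate = normalized_histogram([10, 45, 60, 210], bins=4)
print(bhattacharyya(model, candidate))  # 1.0 (same bin occupancy)
```

Because the coefficient only compares bin occupancy, it is insensitive to where within the region the intensities occur, which complements the shape cue.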

 

Full Article


From Door to Door - Principles and Applications of Computer Vision for Driver Assistant Systems

Uwe Franke, Dariu Gavrila, Axel Gern, Steffen Görzig, Reinhard Janssen, Frank Paetzold and Christian Wöhler

Abstract

Modern cars will not only recover information about their internal driving state (e.g. speed, location) but will also extract information from their surroundings. Radar-based advanced cruise control was commercialized by DaimlerChrysler (DC) in 1999 in its premium-class vehicles. A vision-based Lane Departure Warning system for heavy trucks will be introduced by DC in 2000.

This will be the beginning of a variety of vision systems for driver information, warning and active assistance. We are convinced that future cars will have their own eyes, since no other sensor can deliver comparably rich information about the car’s local environment. Rapidly falling costs for sensors and processors, combined with increasing image resolution, provide the basis for continuous growth of the vehicle’s intelligence. Two cameras will look in front of the car, in stereo. They can be accompanied by other cameras looking backwards and to the side of the vehicle.

In this chapter, we describe the achievements in vision-based driver assistance at DaimlerChrysler. We present systems that have been developed for highways as well as for urban traffic and describe principles that have proven robustness and efficiency for image understanding in traffic scenes.

 


Full Article


Virtual Sample Generation
for Template-based Shape Matching

D. M. Gavrila and J. Giebel

Abstract

This paper presents a method for improving the performance of matching systems that correlate using shape templates. The basic idea involves extending an existing set of training shapes with generated "virtual" shapes in order to improve representational capability. No a-priori feature correspondence is necessary among the original shapes in the training set. Instead, an integrated clustering and registration approach partitions the original shape samples into clusters of similar and registered shapes; in each cluster a separate feature space is embedded. This allows standard compact parameterizations to be derived for each cluster. This paper demonstrates that sampling these low-order spaces can produce an extended training set which facilitates superior matching performance, as measured by a ROC curve. In the experiments, we consider a realistic application involving thousands of pedestrian shapes and perform correlation matching based on distance transforms.

Full Article


A multi-sensor approach for the protection of vulnerable traffic participants - the PROTECTOR project

D. M. Gavrila, M. Kunert and U. Lages

Abstract

This paper describes ongoing work in the E.U. project PROTECTOR (Preventive Safety for Unprotected Road User) on the important problem of detecting vulnerable traffic participants from a moving vehicle. We focus on the three sensor technologies that are pursued: laser scanner, radar and video. We discuss their characteristics and present the relevant pattern recognition techniques. First prototype systems have been integrated in our demonstrator vehicles.

Full Article


Pedestrian Detection from a Moving Vehicle

D. M. Gavrila

Abstract

This paper presents a prototype system for pedestrian detection on-board a moving vehicle. The system uses a generic two-step approach for efficient object detection. In the first step, contour features are used in a hierarchical template matching approach to efficiently "lock" onto candidate solutions. Shape matching is based on Distance Transforms. By capturing the object's shape variability by means of a template hierarchy and using a combined coarse-to-fine approach in shape and parameter space, this method achieves very large speed-ups compared to a brute-force method. We have measured gains of several orders of magnitude. The second step utilizes the richer set of intensity features in a pattern classification approach to verify the candidate solutions (i.e. using Radial Basis Functions). We present experimental results on pedestrian detection off-line and on-board our Urban Traffic Assistant vehicle and discuss the challenges that lie ahead.

Full Article


Real-time Object Detection for "Smart" Vehicles

D. M. Gavrila and V. Philomin

Abstract

This paper presents an efficient shape-based object detection method based on Distance Transforms and describes its use for real-time vision on-board vehicles. The method uses a template hierarchy to capture the variety of object shapes; efficient hierarchies can be generated off-line for given shape distributions using stochastic optimization techniques (i.e. simulated annealing). Online, matching involves a simultaneous coarse-to-fine approach over the shape hierarchy and over the transformation parameters. Very large speedup factors are typically obtained when comparing this approach with the equivalent brute-force formulation; we have measured gains of several orders of magnitude.

We present experimental results on the real-time detection of traffic signs and pedestrians from a moving vehicle. Because of the highly time sensitive nature of these vision tasks, we also discuss some hardware-specific implementations of the proposed method as far as SIMD parallelism is concerned.

Full Article


Multi-feature Hierarchical Template Matching Using Distance Transforms

D. M. Gavrila

Abstract

We describe a multi-feature hierarchical algorithm to efficiently match N objects (templates) with an image using distance transforms (DTs). The matching is under translation, but it can cover more general transformations by generating the various transformed templates explicitly. The novel part of the algorithm is that, in addition to a coarse-to-fine search over the translation parameters, the N templates are grouped off-line into a template hierarchy based on their similarity. This way, multiple templates can be matched simultaneously at the coarse levels of the search, resulting in various speed-up factors. Furthermore, in matching, features are distinguished by type and separate DTs are computed for each type (e.g. based on edge orientations). These concepts are illustrated in the application of traffic sign detection.

Full Article


3-D Model-based Tracking of Humans in Action:
a Multi-View Approach

D. M. Gavrila and L.S. Davis

Abstract

We present a vision system for the 3-D model-based tracking of unconstrained human movement. Using image sequences acquired simultaneously from multiple views, we recover the 3-D body pose at each time instant without the use of markers. The pose-recovery problem is formulated as a search problem and entails finding the pose parameters of a graphical human model whose synthesized appearance is most similar to the actual appearance of the real human in the multi-view images. The models used for this purpose are acquired from the images. We use a decomposition approach and a best-first technique to search through the high dimensional pose parameter space. A robust variant of chamfer matching is used as a fast similarity measure between synthesized and real edge images.

We present initial tracking results from a large new Humans-In-Action (HIA) database containing more than 2500 frames in each of four orthogonal views. The sequences contain subjects involved in a variety of activities of varying complexity, ranging from simple one-person hand waving to challenging two-person close interaction in the Argentine Tango.

Full Article


Hermite Deformable Contours

D. M. Gavrila

Abstract

We propose the Hermite representation for deformable contour finding. This representation compares favorably in terms of versatility and controllability with other local contour representations that have been used previously for this purpose. The Hermite representation allows a compact representation of curved shapes, without the smoothing out of corners. It is also well suited for both interactive and tracking applications.
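For readers unfamiliar with the representation, a cubic Hermite segment interpolates two end points with prescribed end tangents. The sketch below assumes, for illustration, that each control point carries separate incoming and outgoing tangents, which is one way corners can be kept sharp; the function names and the sampling density are hypothetical.

```python
def hermite_point(p0, m0, p1, m1, t):
    """Cubic Hermite segment from p0 to p1 with end tangents m0 and m1, 0 <= t <= 1."""
    t2, t3 = t * t, t * t * t
    h00 = 2 * t3 - 3 * t2 + 1  # weight of p0
    h10 = t3 - 2 * t2 + t      # weight of m0
    h01 = -2 * t3 + 3 * t2     # weight of p1
    h11 = t3 - t2              # weight of m1
    return tuple(h00 * a + h10 * b + h01 * c + h11 * d
                 for a, b, c, d in zip(p0, m0, p1, m1))


def sample_closed_contour(points, t_in, t_out, per_segment=10):
    """Sample a closed contour made of Hermite segments.

    Control point i has its own incoming tangent t_in[i] and outgoing tangent
    t_out[i]; letting the two differ produces a corner at that point."""
    n = len(points)
    samples = []
    for i in range(n):
        j = (i + 1) % n
        for k in range(per_segment):
            samples.append(hermite_point(points[i], t_out[i], points[j], t_in[j],
                                         k / per_segment))
    return samples


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
zero = [(0, 0)] * 4
outline = sample_closed_contour(square, zero, zero)
print(len(outline))  # 40
```

With all tangents set to zero, as for `square`, every segment is a straight line and the corners stay sharp; nonzero tangents bend the segments without moving the control points, which is what makes the representation easy to control interactively.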

The Hermite representation is used to formulate the contour finding problem as an optimization problem using a maximum a posteriori energy criterion. Optimization is performed by dynamic programming. Our approach to contour tracking decouples the effects of transformation and deformation, using a template matching strategy to robustly account for the transformation effect. We demonstrate these ideas on a variety of images from different domains.
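The dynamic-programming step can be sketched in a simplified form: each control point of an open contour picks one of K candidate positions (for instance along its normal), with a precomputed image cost per candidate and a quadratic penalty on the change of candidate index between neighbors. This energy is an assumption standing in for the paper's MAP criterion on Hermite parameters; what dynamic programming buys is the exact minimum in O(n*K^2) time instead of enumerating K^n configurations. The function name and example costs are hypothetical.

```python
def dp_contour(external, smooth_weight):
    """Choose one candidate position per control point of an open contour.

    external[i][k] is the (assumed, precomputed) image cost of placing control
    point i at candidate k, e.g. negative edge strength along its normal.
    Exactly minimizes
        sum_i external[i][k_i] + smooth_weight * sum_i (k_i - k_{i-1})**2
    and returns (candidate indices, minimal energy)."""
    n, K = len(external), len(external[0])
    cost = list(external[0])
    back = []
    for i in range(1, n):
        new_cost, ptr = [], []
        for k in range(K):
            # Best predecessor candidate for placing point i at candidate k.
            j = min(range(K), key=lambda c: cost[c] + smooth_weight * (k - c) ** 2)
            new_cost.append(cost[j] + smooth_weight * (k - j) ** 2 + external[i][k])
            ptr.append(j)
        cost = new_cost
        back.append(ptr)
    k = min(range(K), key=lambda c: cost[c])
    best = cost[k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return path[::-1], best


# Candidate 2 is a strong edge everywhere; point 2 has a slightly better outlier at 0.
ext = [[5, 5, 0, 5, 5], [5, 5, 0, 5, 5], [-1, 5, 0, 5, 5], [5, 5, 0, 5, 5]]
print(dp_contour(ext, smooth_weight=1.0))  # ([2, 2, 2, 2], 0.0)
print(dp_contour(ext, smooth_weight=0.0))  # ([2, 2, 0, 2], -1.0)
```

The smoothness term is what lets the contour ignore the isolated outlier; without it each point simply takes its locally cheapest candidate.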

Full Article


Learning shape models from examples

D. M. Gavrila, J. Giebel and H. Neumann

Abstract

This paper addresses the problem of learning shape models from examples. The contributions are twofold. First, a comparative study is performed of various methods for establishing shape correspondence, based on shape decomposition, feature selection and alignment. Various registration methods using polygonal and Fourier features are extended to deal with shapes at multiple scales and the importance of doing so is illustrated. Second, we consider an appearance-based modeling technique which represents a shape distribution in terms of clusters containing similar shapes; each cluster is associated with a separate feature space. This representation is obtained by applying a novel simultaneous shape registration and clustering procedure on a set of training shapes. We illustrate the various techniques on pedestrian and plane shapes.

Full Article


The analysis of human motion and its application for visual surveillance

D. M. Gavrila

Abstract

"Looking at People" is currently one of the most active application areas in computer vision. This contribution provides a short overview of existing work on human motion as far as whole-body motion and gestures are concerned. The overview is based on a more extensive survey article [Gavrila 1999]; here, the emphasis lies on surveillance scenarios.

Full Article


Traffic Sign Recognition Revisited

D. M. Gavrila

Abstract

The first part of this paper provides an overview of previous work on traffic sign recognition. Various components are discussed, such as detection, classification and temporal integration. The second part of this paper covers a recently developed shape-based system, based on distance transforms. This system has been quite successful in detecting and recognizing traffic signs in real-time; we report single-image recognition rates above 90% in preliminary experiments, both off-line and on-board our demo vehicle.

Full Article


Fast Correlation Matching in Large (Edge) Image Databases

D. M. Gavrila and L. S. Davis

Abstract

Correlation-based matching methods are known to be very expensive when used on large image databases. In this paper, we will examine ways of speeding up correlation matching by phase coded filtering. Phase coded filtering is a technique to combine multiple patterns in one filter by assigning complex weights of unit magnitude to the individual patterns and summing them up in a composite filter. Several of the proposed composite filters are based on this idea, such as the Circular Harmonic Component (CHC) filters and the Linear Phase Coefficient Composite (LPCC) filters. We will consider the LPCC(1) filter in isolation and examine ways to improve its performance by assigning the complex weights to the individual patterns in a non-random manner so as to maximize the SNR of the filter w.r.t. the individual patterns.
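The phase-coding principle can be shown in one dimension: patterns are summed with unit-magnitude complex weights, a single correlation is computed, and the phase of a peak indicates which pattern produced it. The sketch uses fixed, evenly spaced phases rather than the optimized, non-random assignment the paper proposes, direct correlation rather than FFT-based filtering, and made-up binary patterns; the function names are hypothetical.

```python
import cmath
import math


def composite_filter(patterns, phases):
    """Sum of patterns, each multiplied by a unit-magnitude weight exp(i*phi)."""
    return [sum(cmath.exp(1j * phi) * p[n] for p, phi in zip(patterns, phases))
            for n in range(len(patterns[0]))]


def correlate(signal, filt):
    """Direct 1-D cross-correlation of a real signal with a complex filter."""
    L = len(filt)
    return [sum(signal[x + n] * filt[n].conjugate() for n in range(L))
            for x in range(len(signal) - L + 1)]


def decode(value, phases):
    """Index of the pattern whose phase code best explains a correlation peak:
    a match with pattern k gives a peak whose phase is close to -phases[k]."""
    ang = cmath.phase(value)

    def angular_error(phi):
        d = (ang + phi) % (2 * math.pi)
        return min(d, 2 * math.pi - d)

    return min(range(len(phases)), key=lambda k: angular_error(phases[k]))


patterns = [[1, 0, 1, 0, 0, 0], [0, 1, 0, 0, 1, 0], [0, 0, 0, 1, 0, 1]]
phases = [2 * math.pi * k / len(patterns) for k in range(len(patterns))]
h = composite_filter(patterns, phases)
signal = [0] * 20
for n, v in enumerate(patterns[1]):  # embed pattern 1 at position 7
    signal[7 + n] = v
c = correlate(signal, h)
print(round(abs(c[7]), 6), decode(c[7], phases))  # 2.0 1
```

One correlation now covers three patterns. Because the patterns here have disjoint supports, the peak carries no cross-talk; with overlapping patterns the cross-terms add noise, which is what limits how many patterns can safely be combined in one filter.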

In experiments on a database of 100 to 1000 edge images from the aerial domain, we examine the tradeoff between the speed-up (the number of patterns combined in a filter) and the unreliability (the number of resulting false matches) of the composite filter. Results indicate that for binary patterns with point densities of about 0.05, we can safely combine more than 20 patterns in the optimized LPCC(1) filter, which represents a speed-up of an order of magnitude over the brute-force approach of matching the individual patterns.

Full Article


Real-Time Dense Stereo for Intelligent Vehicles

W. van der Mark and D.M. Gavrila

Abstract

Stereo vision is an attractive passive sensing technique for obtaining 3D measurements. Recent hardware advances have given rise to a new class of real-time dense disparity estimation algorithms. This paper examines their suitability for intelligent vehicle (IV) applications. In order to gain a better understanding of the performance and computational cost tradeoff, we created our own framework of real-time implementations. This consists of different methodical components based on Single Instruction Multiple Data (SIMD) techniques.

We furthermore compare the resulting algorithmic variations with other publicly available algorithms. We argue that existing, publicly available stereo data sets are not very suitable for the IV domain. Therefore, our evaluation of stereo algorithms is based on novel, realistic-looking simulated data as well as real data from complex urban traffic scenes. In order to facilitate future benchmarks, all data used in this paper is made publicly available.

Our results reveal that there is a considerable influence of scene conditions on the performance of all tested algorithms. Approaches that aim for (global) search optimization are more affected by this than other approaches. The best overall performance is achieved by our multiple window algorithm which uses local matching and a left-right check for robust error rejection. Timing results show that the simplest of our SIMD variants are more than twice as fast as the most complex one. Nevertheless, the latter still achieves real-time processing speeds while its average accuracy is at least equal to that of publicly available non-SIMD algorithms.
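The left-right check mentioned above can be sketched as follows, assuming disparity maps stored as lists of rows with `None` for missing values, the convention that left pixel x matches right pixel x - d, and a tolerance of one disparity level (`max_diff=1`, an assumed default). The function name is hypothetical.

```python
def left_right_check(disp_left, disp_right, max_diff=1):
    """Keep a left-image disparity d at (x, y) only if the right-image disparity
    at the matched pixel (x - d, y) agrees within max_diff; otherwise mark None."""
    checked = []
    for row_l, row_r in zip(disp_left, disp_right):
        out = []
        for x, d in enumerate(row_l):
            xr = None if d is None else x - d
            if (xr is None or not 0 <= xr < len(row_r)
                    or row_r[xr] is None or abs(row_r[xr] - d) > max_diff):
                out.append(None)  # occluded, out of view or inconsistent match
            else:
                out.append(d)
        checked.append(out)
    return checked


left = [[None, None, 2, 2, 4, 7]]
right = [[2, 2, 2, 2, None, None]]
print(left_right_check(left, right))  # [[None, None, 2, 2, None, None]]
```

The check costs one lookup per pixel once both disparity maps exist, which is why it is an inexpensive way to reject errors in local matching.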

 

Full Article


A Multiple Detector Approach to Low-Resolution FIR Pedestrian Recognition

M. Mählisch, M. Oberländer, O. Löhlein, D. Gavrila and W. Ritter

Abstract

In this paper we present a recognition scheme which is both reliable and fast. The scheme comprises the simultaneous harmonized use of three powerful detection algorithms, the hyper permutation network (HPN), a hierarchical contour matching (HCM) algorithm and a cascaded classifier approach. Each algorithm is evaluated separately and afterwards, based on the evaluation results, the fusion of the detection results is performed by a particle filter approach.


Full Article


An Experimental Study on Pedestrian Classification

S. Munder and D. M. Gavrila

Abstract

Detecting people in images is key for several important application domains in computer vision. This paper presents an in-depth experimental study on pedestrian classification; multiple feature-classifier combinations are examined with respect to their ROC performance and efficiency. We investigate global vs. local and adaptive vs. non-adaptive features, as exemplified by PCA coefficients, Haar wavelets and local receptive fields (LRFs). In terms of classifiers, we consider the popular Support Vector Machines (SVMs), feed-forward neural networks and the k-nearest neighbor classifier.

Experiments are performed on a large data set consisting of 4000 pedestrian and over 25,000 non-pedestrian (labeled) images captured in outdoor urban environments. Statistically meaningful results are obtained by analyzing performance variances caused by varying training and test sets. We furthermore investigate how classification performance and training sample size are correlated. Sample size is adjusted by increasing the number of manually labeled training data, or by employing automatic bootstrapping or cascade techniques.

Our experiments show that the novel combination of SVMs with LRF features performs best. A boosted cascade of Haar wavelets can, however, reach quite competitive results at a fraction of the computational cost. The data set used in this paper is made public, establishing a benchmark for this important problem.

Full Article


Multi-cue Pedestrian Detection and Tracking
from a Moving Vehicle

D. M. Gavrila and S. Munder

Abstract

This paper presents a multi-cue vision system for the real-time detection and tracking of pedestrians from a moving vehicle. The detection component involves a cascade of modules, each utilizing complementary visual criteria to successively narrow down the image search space, balancing robustness and efficiency considerations. A novel aspect is the tight integration of the consecutive modules: (sparse) stereo-based ROI generation, shape-based detection, texture-based classification and (dense) stereo-based verification. For example, shape-based detection activates a weighted combination of texture-based classifiers, each attuned to a particular body pose.

Performance of individual modules and their interaction is analyzed by means of Receiver Operating Characteristics (ROCs). A sequential optimization technique allows the successive combination of individual ROCs, providing optimized system parameter settings in a systematic fashion and avoiding ad-hoc parameter tuning. Application-dependent processing constraints can be incorporated in the optimization procedure. Results from extensive field tests in difficult urban traffic conditions suggest system performance is at the leading edge.

Full Article


A Bayesian, Exemplar-based Approach to Hierarchical Shape Matching

D. M. Gavrila

Abstract

This paper presents a novel probabilistic approach to hierarchical, exemplar-based shape matching. No feature correspondence is needed among exemplars, just a suitable pairwise similarity measure. The approach uses a template tree to efficiently represent and match the variety of shape exemplars. The tree is generated off-line by a bottom-up clustering approach using stochastic optimization. Online matching involves a simultaneous coarse-to-fine approach over the template tree and over the transformation parameters. The main contribution of this paper is a Bayesian model to estimate the a-posteriori probability of the object class, given a certain match at a node of the tree. This model takes into account object scale and saliency, and allows for a principled setting of the matching thresholds such that unpromising paths in the tree traversal process are eliminated early on.

The proposed approach was tested in a variety of application domains. Here, results are presented on one of the more challenging domains: real-time pedestrian detection from a moving vehicle. A significant speed-up is obtained when comparing the proposed probabilistic matching approach with a manually tuned non-probabilistic variant, both utilizing the same template tree structure.

Full Article


CASSANDRA: Audio-Video Sensor Fusion for Aggression Detection

W. Zajdel, D. Krijnders, T. Andringa and D.M. Gavrila

Abstract

This paper presents a smart surveillance system named CASSANDRA, aimed at detecting instances of aggressive human behavior in public environments. A distinguishing aspect of CASSANDRA is the exploitation of the complementary nature of audio and video sensing to disambiguate scene activity in real-life, noisy and dynamic environments. At the lower level, independent analysis of the audio and video streams yields intermediate descriptors of a scene, such as screams, passing trains or articulation energy. At the higher level, a Dynamic Bayesian Network is used as a fusion mechanism that produces an aggregate aggression indication for the current scene.

Our prototype system is validated on a set of scenarios performed by professional actors at an actual train station to ensure a realistic audio and video noise setting.
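The simplest Dynamic Bayesian Network that fuses two cue streams is a two-state hidden Markov model filtered forward in time. The sketch below assumes hidden states "calm" and "aggressive", one binary audio descriptor (scream) and one binary video descriptor (high articulation energy), conditionally independent given the state. All probabilities are made-up illustrative values; the actual CASSANDRA network is richer than this.

```python
from typing import List, Sequence, Tuple

# Hidden states: index 0 = calm, 1 = aggressive (hypothetical two-state model).
TRANSITION = [[0.95, 0.05],   # P(next state | calm)
              [0.10, 0.90]]   # P(next state | aggressive)
# P(descriptor fires | state) per low-level detector (illustrative values only).
P_SCREAM = [0.05, 0.70]       # audio: scream detected
P_MOTION = [0.20, 0.80]       # video: high articulation energy


def likelihood(obs: Tuple[bool, bool], state: int) -> float:
    """Audio and video cues are assumed conditionally independent given the state."""
    scream, motion = obs
    pa = P_SCREAM[state] if scream else 1.0 - P_SCREAM[state]
    pv = P_MOTION[state] if motion else 1.0 - P_MOTION[state]
    return pa * pv


def filter_aggression(observations: Sequence[Tuple[bool, bool]],
                      prior: Sequence[float] = (0.99, 0.01)) -> List[float]:
    """Forward filtering; returns P(aggressive | observations so far) at each step."""
    belief = list(prior)
    history: List[float] = []
    for obs in observations:
        # Predict: propagate the belief through the transition model.
        predicted = [sum(belief[i] * TRANSITION[i][j] for i in range(2)) for j in range(2)]
        # Update: weight by the fused audio-video likelihood, then normalize.
        unnorm = [predicted[j] * likelihood(obs, j) for j in range(2)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        history.append(belief[1])
    return history
```

Because the transition model carries belief across time steps, a single noisy cue (for example a passing train triggering the audio detector) raises the aggression estimate less than a sustained, agreeing pattern in both modalities.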

Full Article


Pedestrian Detection and Tracking using a Mixture of View-Based Shape-Texture Models

S. Munder, C. Schnörr and D.M. Gavrila

Abstract

This paper presents a robust multi-cue approach to the integrated detection and tracking of pedestrians in cluttered urban environments. A novel spatio-temporal object representation is proposed that combines a generative shape model and a discriminative texture classifier, both composed of a mixture of pose-specific submodels. Shape is represented by a set of linear subspace models, an extension of Point Distribution Models, with shape transitions modeled by a first-order Markov process. Texture, i.e. the shape-normalized intensity pattern, is represented by a manifold implicitly delimited by a set of pattern classifiers, while texture transition is modeled by a random walk. Direct 3D measurements provided by a stereo system are also incorporated into the observation density function. We employ a Bayesian framework based on particle filtering to achieve integrated object detection and tracking. Large-scale experiments involving pedestrian detection and tracking from a moving vehicle demonstrate the benefit of the proposed approach.
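The particle filtering framework mentioned above follows a predict, weight, resample cycle. The sketch below shows a generic bootstrap particle filter step under strong simplifying assumptions: the state is a hypothetical (image x position, object width) pair, dynamics are a random walk, and the observation density is a Gaussian around a single detector measurement. It does not model the paper's pose-specific shape and texture mixtures or its stereo term.

```python
import math
import random
from typing import List, Tuple

State = Tuple[float, float]  # hypothetical state: (image x position, object width)


def particle_filter_step(particles: List[State], measurement: State,
                         rng: random.Random, motion_std: float = 2.0,
                         meas_std: float = 5.0) -> List[State]:
    """One bootstrap particle filter iteration: predict, weight, resample."""
    # 1) Predict: random-walk dynamics on each state component.
    moved = [(x + rng.gauss(0, motion_std), w + rng.gauss(0, motion_std))
             for x, w in particles]
    # 2) Weight: Gaussian observation likelihood around the measurement.
    #    A tiny constant keeps the total weight positive if all terms underflow.
    weights = [math.exp(-((x - measurement[0]) ** 2 + (w - measurement[1]) ** 2)
                        / (2 * meas_std ** 2)) + 1e-300 for x, w in moved]
    # 3) Resample in proportion to the weights (multinomial resampling).
    return rng.choices(moved, weights=weights, k=len(moved))
```

For example, particles initialized uniformly over the image and updated repeatedly with the same measurement concentrate around it; the state estimate can then be taken as the particle mean.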

Full Article


Monocular Pedestrian Recognition using Motion Parallax

M. Enzweiler, P. Kanter and D.M. Gavrila

Abstract

This paper presents a novel focus-of-attention strategy for monocular pedestrian recognition. It uses Bayes' rule to estimate the posterior for the presence of a pedestrian in a certain (rectangular) image region, based on motion parallax features. This posterior is used as a parameter to control the number of regions of interest (ROIs) that are passed to subsequent verification stages. For the latter, we use a state-of-the-art pedestrian recognition scheme which consists of multiple modules in a cascade architecture. We obtain optimized settings for the control parameters of the combined cascade system by a sequential ROC convex hull technique.

Experiments are conducted on image data captured from a moving vehicle in an urban environment. We demonstrate that the proposed focus-of-attention strategy reduces the false positives of an otherwise identical monocular pedestrian recognition system by a factor of two, at equal detection rates. The overall system maintains processing rates close to real-time.
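The ROC convex hull underlying the parameter optimization keeps only those operating points that are not outperformed by any interpolation between other points. A minimal sketch, assuming operating points are given as (false positive rate, detection rate) pairs, computes the upper hull with Andrew's monotone chain algorithm, anchored at the trivial points (0, 0) and (1, 1). How the paper applies this sequentially across cascade modules is not reproduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false_positive_rate, detection_rate)


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left turn o -> a -> b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of ROC operating points, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

Points below the hull, such as (0.5, 0.3) in a set that also contains (0.1, 0.6) and (0.3, 0.8), are discarded because a mixture of two hull points achieves a higher detection rate at the same false positive rate.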

Full Article


A Mixed Generative-Discriminative Framework for Pedestrian Classification

M. Enzweiler and D.M. Gavrila

Abstract

This paper presents a novel approach to pedestrian classification that uses synthesized virtual samples from a learned generative model to enhance the classification performance of a discriminative model. Our generative model captures prior knowledge about the pedestrian class in terms of a number of probabilistic shape and texture models, each attuned to a particular pedestrian pose. Active learning provides the link between the generative and discriminative models, in the sense that the former is selectively sampled such that the training process is guided towards the most informative samples of the latter.

In large-scale experiments on real-world datasets of tens of thousands of samples, we demonstrate a significant improvement in classification performance of the combined generative-discriminative approach over the discriminative-only approach (the latter exemplified by a neural network with local receptive fields and a support vector machine using Haar wavelet features).
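One common way to make "most informative samples" concrete is uncertainty sampling: generate many virtual samples and keep those closest to the current decision boundary of the discriminative model. The sketch below assumes a toy generative model (a two-component Gaussian mixture over 2-D feature vectors, one component per hypothetical pose) and a linear classifier with weights `w` and bias `b`. The selection criterion used in the paper may differ; this only illustrates the general idea.

```python
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def sample_generative(rng: random.Random, n: int,
                      pose_means: Sequence[Vec] = ((1.0, 2.0), (2.0, 1.0)),
                      std: float = 0.5) -> List[Vec]:
    """Draw virtual samples from a toy pose-specific Gaussian mixture."""
    samples: List[Vec] = []
    for _ in range(n):
        mx, my = rng.choice(pose_means)       # pick a pose component
        samples.append((rng.gauss(mx, std), rng.gauss(my, std)))
    return samples


def select_informative(samples: Sequence[Vec], w: Vec, b: float, k: int) -> List[Vec]:
    """Uncertainty sampling: keep the k samples with the smallest |w . x + b|."""
    return sorted(samples, key=lambda x: abs(w[0] * x[0] + w[1] * x[1] + b))[:k]
```

In a full active learning loop, the selected samples would be added to the training set, the discriminative model retrained, and the selection repeated with the updated boundary.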

Full Article


Multi-view 3D Human Pose Estimation combining Single-frame Recovery, Temporal Integration and Model Adaptation

M. Hofmann and D.M. Gavrila

Abstract

We present a system for the estimation of unconstrained 3D human upper body movement from multiple cameras. Its main novelty lies in the integration of three components: single-frame pose recovery, temporal integration and model adaptation. Single-frame pose recovery consists of a hypothesis generation stage, where candidate 3D poses are generated based on hierarchical shape matching in the individual camera views. In the subsequent hypothesis verification stage, candidate 3D poses are re-projected to the other camera views and ranked according to a multi-view matching score.

Temporal integration consists of computing the best trajectories by combining a motion model and observations in a Viterbi-style maximum likelihood approach. Poses that lie on the best trajectories are used to generate and adapt a texture model, which in turn enriches the shape component used for pose recovery. We demonstrate that our approach outperforms the state of the art in experiments with large and challenging real-world data from an outdoor setting. The new data set is made public to facilitate benchmarking.
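A Viterbi-style search over per-frame pose candidates can be sketched as dynamic programming on log-likelihoods. This example assumes each frame provides a list of candidate pose vectors with observation log-likelihoods, and that the motion model is a constant-position Gaussian, so the transition log-likelihood is the negative squared pose difference divided by 2·sigma². The pose representation and motion model are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Pose = Tuple[float, ...]  # hypothetical pose vector, e.g. joint angles


def viterbi_trajectory(candidates: Sequence[Sequence[Pose]],
                       obs_loglik: Sequence[Sequence[float]],
                       sigma: float = 1.0) -> List[int]:
    """Return the index of the chosen candidate in each frame (max-likelihood path)."""
    def trans(a: Pose, b: Pose) -> float:
        # Log-likelihood of moving from pose a to pose b (constant-position Gaussian).
        return -sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * sigma ** 2)

    score = list(obs_loglik[0])      # best log-likelihood of paths ending at each candidate
    back: List[List[int]] = []       # back-pointers, one list per frame after the first
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_score: List[float] = []
        ptr: List[int] = []
        for j, pose in enumerate(candidates[t]):
            best_i = max(range(len(prev)), key=lambda i: score[i] + trans(prev[i], pose))
            ptr.append(best_i)
            new_score.append(score[best_i] + trans(prev[best_i], pose) + obs_loglik[t][j])
        score = new_score
        back.append(ptr)
    # Backtrack from the best final candidate.
    path = [max(range(len(score)), key=score.__getitem__)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

The point of the global search is that a candidate with a slightly worse single-frame score can still be selected when it yields a smoother trajectory, which a greedy per-frame choice would miss.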

Full Article


Monocular Pedestrian Detection: Survey and Experiments

M. Enzweiler and D.M. Gavrila

Abstract

Pedestrian detection is a rapidly evolving area in computer vision with key applications in intelligent vehicles, surveillance and advanced robotics. The objective of this paper is to provide an overview of the current state of the art from both methodological and experimental perspectives.

The first part of the paper consists of a survey. We cover the main components of a pedestrian detection system and the underlying models. The second (and larger) part of the paper contains a corresponding experimental study. We consider a diverse set of state-of-the-art systems: wavelet-based AdaBoost cascade, HOG/linSVM, NN/LRF and combined shape-texture detection.

Experiments are performed on an extensive dataset captured on board a vehicle driving through an urban environment. The dataset includes many thousands of training samples as well as a 27-minute test sequence involving more than 20,000 images with annotated pedestrian locations. We consider a generic evaluation setting and one specific to pedestrian detection on board a vehicle. Results indicate a clear advantage of HOG/linSVM at higher image resolutions and lower processing speeds, and a superiority of the wavelet-based AdaBoost cascade approach at lower image resolutions and (near) real-time processing speeds. The dataset (8.5 GB) is made public for benchmarking purposes.
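Evaluating a detector against annotated pedestrian locations requires a rule for matching detections to ground truth. The sketch below uses a widely used PASCAL-style criterion, greedy matching in order of decreasing detector score with an intersection-over-union threshold of 0.5; this is an assumption, and the evaluation criteria defined in the paper may differ. Boxes are (x1, y1, x2, y2) tuples.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def evaluate_frame(detections: Sequence[Tuple[Box, float]], ground_truth: Sequence[Box],
                   min_iou: float = 0.5) -> Tuple[int, int]:
    """Return (true_positives, false_positives) for one image.

    Detections are matched greedily by decreasing score; each ground-truth box
    can be matched at most once, so duplicate detections count as false positives.
    """
    unmatched: List[Box] = list(ground_truth)
    tp = fp = 0
    for box, _score in sorted(detections, key=lambda d: -d[1]):
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= min_iou:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    return tp, fp
```

Summing these counts over a test sequence gives the detection rate (true positives divided by the number of annotated pedestrians) and the number of false positives per frame, the two axes of a typical detection ROC.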

Full Article


Dense Stereo-based ROI Generation for Pedestrian Detection

C. Keller, D.F. Llorca and D.M. Gavrila

Abstract

This paper investigates the benefit of dense stereo for the ROI generation stage of a pedestrian detection system. Dense disparity maps allow an accurate estimation of the camera height, pitch angle and vertical road profile, which in turn enables a more precise specification of the areas on the ground where pedestrians are to be expected.

An experimental comparison between sparse and dense stereo approaches is carried out on image data captured in complex urban environments (e.g. undulating roads, speed bumps). The ROI generation stage, based on dense stereo and specific camera and road parameter estimation, results in a detection performance improvement by a factor of five over the state of the art based on ROI generation by sparse stereo. Interestingly, the added processing cost of computing dense disparity maps is at least partially amortized by the fewer ROIs that need to be processed at the system level.
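How camera height and pitch follow from a disparity map can be shown for the simplest case of a planar road. Assume a rectified stereo rig with focal length f (pixels), baseline B, principal row v0, camera height H above the road and downward pitch θ, with image rows v increasing downward. A road pixel at row v then has disparity d(v) = (B/H)·((v − v0)·cos θ + f·sin θ), which is linear in v. Fitting d = α·v + β to road pixels gives α = B·cos θ / H and β + α·v0 = B·f·sin θ / H, hence tan θ = (β + α·v0)/(f·α) and H = B·cos θ / α. The sketch below implements only this flat-road case; the vertical road profile estimation described in the abstract is not covered, and road-pixel selection is assumed to be done beforehand.

```python
import math
from typing import Sequence, Tuple


def fit_line(v: Sequence[float], d: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of d = alpha * v + beta."""
    n = len(v)
    mv, md = sum(v) / n, sum(d) / n
    sxx = sum((x - mv) ** 2 for x in v)
    sxy = sum((x - mv) * (y - md) for x, y in zip(v, d))
    alpha = sxy / sxx
    return alpha, md - alpha * mv


def camera_height_pitch(v: Sequence[float], d: Sequence[float], f: float,
                        baseline: float, v0: float) -> Tuple[float, float]:
    """Estimate (height, pitch in radians) from (row, disparity) pairs of road pixels.

    Assumes a planar road: d(v) = (B / H) * ((v - v0) * cos(pitch) + f * sin(pitch)).
    """
    alpha, beta = fit_line(v, d)
    pitch = math.atan2(beta + alpha * v0, f * alpha)
    height = baseline * math.cos(pitch) / alpha
    return height, pitch
```

On real data the line fit would need to be robust (for example against pedestrians and other obstacles in the road area), and a non-planar road would call for a piecewise or curved profile instead of a single line.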

Full Article

Copyright © 2001-2010 Gavrila. All rights reserved.