In computer vision, we are interested in extracting information about the physical world from a set of images. In object recognition, for instance, we may ask whether a certain image portrays a given person or not. Unfortunately, images also depend on a number of uninteresting factors, such as viewpoint, illumination, and clutter, which make inference of the relevant information difficult. The effect of some of these nuisance factors (for instance viewpoint) can be captured as a set of transformations of the images, and can therefore be explicitly eliminated in the analysis. A large part of my work deals with representations and learning methods that can be used to do this.

Image Representations

Joint alignment. Invariant representations may be obtained by canonization. Joint alignment is the problem of canonizing a large collection of data simultaneously, while automatically determining the optimal canonical configurations. Inspired by image congealing, in [NIPS06] we develop a joint alignment technique based on a complexity-distortion trade-off. In [CVPR08] we generalize the approach, discuss when the problem is well posed, and study the effect of non-invertible image transformations and image boundaries. We also apply the method to the alignment of thousands of natural image patches.
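
As a toy illustration of the idea (a sketch only, not the actual method of [NIPS06]), the following congeals a stack of 1-D signals by coordinate descent over integer shifts, using the total pixel-wise variance of the aligned stack as a crude complexity proxy:

```python
import numpy as np

def stack_cost(signals, shifts):
    """Total pixel-wise variance of the shifted stack: a crude complexity proxy."""
    stack = np.stack([np.roll(s, d) for s, d in zip(signals, shifts)])
    return stack.var(axis=0).sum()

def congeal(signals, max_shift=3, n_iters=10):
    """Coordinate descent: re-optimize each signal's integer shift in turn."""
    signals = [np.asarray(s, float) for s in signals]
    shifts = [0] * len(signals)
    for _ in range(n_iters):
        for i in range(len(signals)):
            shifts[i] = min(range(-max_shift, max_shift + 1),
                            key=lambda d: stack_cost(
                                signals, shifts[:i] + [d] + shifts[i + 1:]))
    return shifts
```

Note that the canonical configuration is not fixed in advance: the signals agree on one during the descent, which is exactly the "joint" part of the problem.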

Feature growing. Since there exist no occlusion invariant features, viewpoint invariant features are intrinsically local. But what is the optimal support of a feature? In [CVPR06] we determine the best feature support automatically during matching, while validating putative correspondences of not-so-distinctive features.
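
The idea can be caricatured as follows (a hypothetical sketch, not the [CVPR06] algorithm): grow the patch around a putative correspondence and keep the radius at which the normalized cross-correlation of the two supports peaks:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two same-shaped patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom else 0.0

def grow_support(img1, p1, img2, p2, max_radius=8):
    """Grow the square support around a putative match (p1, p2) and
    return the radius with the best NCC score."""
    (y1, x1), (y2, x2) = p1, p2
    best_r, best_s = None, -np.inf
    for r in range(1, max_radius + 1):
        a = img1[y1 - r:y1 + r + 1, x1 - r:x1 + r + 1]
        b = img2[y2 - r:y2 + r + 1, x2 - r:x2 + r + 1]
        if a.shape != (2 * r + 1, 2 * r + 1) or a.shape != b.shape:
            break  # support fell off the image boundary
        s = ncc(a, b)
        if s > best_s:
            best_r, best_s = r, s
    return best_r, best_s
```

A small, not-so-distinctive support may match many places; a grown support that still correlates well is much stronger evidence for the correspondence.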

Natural deformation statistics. Robust viewpoint invariant (or insensitive) features are designed to handle only a limited set of image transformations. But which transformations are the most important? To answer this question we collect statistics of natural image deformations from realistic synthetic data [ECCV06].
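
As a toy version of such statistics (an illustrative sketch; the deformation model and data of [ECCV06] differ), one can look at the singular values of the local Jacobians of a dense correspondence field, which summarize local scaling and anisotropy:

```python
import numpy as np

def deformation_stats(flow_u, flow_v):
    """Singular values of the local Jacobian of the map (x, y) -> (x+u, y+v),
    computed per pixel from a dense correspondence (flow) field."""
    du_dy, du_dx = np.gradient(flow_u)   # gradients along rows (y), columns (x)
    dv_dy, dv_dx = np.gradient(flow_v)
    J = np.stack([np.stack([1.0 + du_dx, du_dy], axis=-1),
                  np.stack([dv_dx, 1.0 + dv_dy], axis=-1)], axis=-2)  # (H, W, 2, 2)
    return np.linalg.svd(J, compute_uv=False)  # (H, W, 2) singular values
```

Histograms of such quantities over many image pairs indicate which transformation classes (e.g. similarity vs. full affinity) a feature detector actually needs to absorb.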

General Viewpoint Invariant Features. Popular visual features (Harris-Affine, SIFT, ...) are invariant to a limited set of image transformations (affinity, similarity, ...). Here we prove the existence of viewpoint invariant features for generic 3-D shapes and arbitrary camera motion [ICCV05]. See [TR05] for a more detailed proof.

Learning

VicinalBoost. Inspired by vicinal risk minimization and tangent distance, we introduce transformation invariance in boosting. This reduces the generalization error and improves learning efficiency ([ICCV07], code).
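
A minimal caricature of the idea (not the actual VicinalBoost algorithm of [ICCV07]) is AdaBoost with decision stumps in which every training sample is replicated under a set of label-preserving transformations, so the weak learners are forced to respect the invariance:

```python
import numpy as np

def vicinal_adaboost(X, y, transforms, n_rounds=10):
    """AdaBoost with decision stumps on 1-D data, trained on the vicinal set:
    every sample replicated under each label-preserving transformation."""
    Xa = np.concatenate([t(np.asarray(X, float)) for t in transforms])
    ya = np.tile(np.asarray(y), len(transforms))
    w = np.full(len(Xa), 1.0 / len(Xa))
    stumps = []
    for _ in range(n_rounds):
        best = None
        for thr in np.unique(Xa):            # exhaustive stump search
            for sign in (1, -1):
                pred = sign * np.where(Xa >= thr, 1, -1)
                err = w[pred != ya].sum()
                if best is None or err < best[0]:
                    best = (err, thr, sign)
        err, thr, sign = best
        err = min(max(err, 1e-12), 1 - 1e-12)  # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = sign * np.where(Xa >= thr, 1, -1)
        w *= np.exp(-alpha * ya * pred)        # reweight mistakes up
        w /= w.sum()
        stumps.append((alpha, thr, sign))
    def classify(x):
        score = sum(a * s * np.where(np.asarray(x) >= t, 1, -1)
                    for a, t, s in stumps)
        return np.sign(score)
    return classify
```

The transformed copies play the role of the vicinal distributions: the ensemble cannot exploit decision boundaries that cut through the orbit of a training point.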

Relaxed Matching Kernels. In [CVPR08B] we propose a framework that encompasses pyramid matching kernels, spatial pyramid matching kernels, proximity distribution kernels, and a large number of similar kernels that measure the similarity of images. We study general properties of these kernels, propose an algorithm to compute them efficiently, and compare several of them on standard image categorization datasets.
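
For concreteness, here is a classic member of this family, the pyramid match kernel on sets of scalar features (a sketch under assumed conventions; the relaxed matching kernel framework of [CVPR08B] is more general): histogram intersections are accumulated from fine to coarse levels, counting only newly formed matches and halving the weight at each coarser level:

```python
import numpy as np

def pyramid_match_kernel(x, y, n_levels=4, lo=0.0, hi=1.0):
    """Pyramid match kernel between two sets of scalar features in [lo, hi]."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    prev_matches = 0.0
    k = 0.0
    for level in range(n_levels - 1, -1, -1):     # finest level first
        bins = np.linspace(lo, hi, 2 ** level + 1)
        hx, _ = np.histogram(x, bins=bins)
        hy, _ = np.histogram(y, bins=bins)
        matches = np.minimum(hx, hy).sum()         # histogram intersection
        new = matches - prev_matches               # matches formed at this level
        k += new / (2 ** (n_levels - 1 - level))   # coarser match = weaker evidence
        prev_matches = matches
    return k
```

The kernel is bounded by the smaller set size, and two identical sets score exactly their cardinality, all matched at the finest level.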

Other projects

Segments, context, and bag-of-features. In [ICCV07B] we study the effect of context and segmentation in a bag-of-features based visual category recognizer. Segments provide object boundaries and a simple anchor to attach contextual information. Since obtaining a correct segmentation from low-level cues alone is difficult, we generate multiple segmentations and then validate segments as the supports of objects in context.

Structure from motion. In [ICCV05] we integrate RANSAC into a Kalman filter; the result is an estimator that is very robust to outliers and suitable for on-line structure from motion estimation. In [CVPR07] we study the singularities of the squared feature re-projection residual that arise under forward motion. In particular, we show that bounding the depth estimates makes the error surface continuous.
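
The robust consensus step at the core of RANSAC can be sketched as follows for line fitting (an illustration of the general principle only, not the filter integration of [ICCV05]):

```python
import numpy as np

def ransac_line(points, n_iters=200, tol=0.05, rng=None):
    """RANSAC fit of y = a*x + b: fit minimal random pairs, keep the model
    with the largest consensus set, then least-squares refit on its inliers."""
    rng = np.random.default_rng(rng)
    pts = np.asarray(points, float)
    best_inliers = None
    for _ in range(n_iters):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue  # degenerate minimal sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = np.abs(pts[:, 1] - (a * pts[:, 0] + b)) < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    a, b = np.polyfit(pts[best_inliers, 0], pts[best_inliers, 1], 1)
    return a, b, best_inliers
```

In a filtering context, the same principle serves to gate the measurements: only the consensus set is allowed to drive the state update, so gross outliers never corrupt the estimate.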