In computer vision, we are interested in extracting information about the physical world from a set of images. In object recognition, for instance, we may ask whether a certain image portrays a given person. Unfortunately, images also depend on a number of uninteresting factors, such as viewpoint, illumination, and clutter, which make it difficult to infer the relevant information. The effect of some of these nuisance factors (for instance viewpoint) can be captured as a set of transformations of the images, and can therefore be explicitly eliminated in the analysis. A large part of my work deals with representations and learning methods that can be used to do this.

## Image Representations

**Joint alignment.** Invariant representations may be obtained by
canonization. Joint alignment is the problem of simultaneously
canonizing a large collection of data, while automatically
determining the optimal canonical configurations. Inspired by image
congealing, in [NIPS06] we
develop a joint alignment technique based on a complexity-distortion
trade-off. In [CVPR08]
we generalize the approach, discuss when the problem is well posed,
and study the effect of non-invertible image transformations and
image boundaries. We also apply the method to the alignment of
thousands of natural image patches.
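As a toy illustration of the idea (not the actual [NIPS06] method), the following sketch jointly aligns shifted copies of a 1-D signal by coordinate descent, using the total pixel-wise variance of the stack as a stand-in for the complexity term; all data and parameters are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: randomly shifted copies of a common 1-D template.
template = np.exp(-0.5 * ((np.arange(32) - 16) / 3.0) ** 2)
true_shifts = rng.integers(-4, 5, size=20)
signals = np.stack([np.roll(template, s) for s in true_shifts])

def stack_complexity(stack):
    # Proxy for the complexity term: total pixel-wise variance of the stack.
    return stack.var(axis=0).sum()

# Coordinate descent: pick each signal's shift to minimize stack complexity.
shifts = np.zeros(len(signals), dtype=int)
for _ in range(5):
    for i in range(len(signals)):
        shifts[i] = min(
            range(-6, 7),
            key=lambda s: stack_complexity(
                np.stack([np.roll(signals[j], -shifts[j]) if j != i
                          else np.roll(signals[i], -s)
                          for j in range(len(signals))])))

aligned = np.stack([np.roll(signals[i], -shifts[i])
                    for i in range(len(signals))])
```

Since each coordinate step only accepts a shift that does not increase the objective, the aligned stack is never more complex than the unaligned one.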

**Feature growing.** Since no occlusion-invariant
features exist, viewpoint-invariant features are intrinsically local. But
what is the optimal support of a feature?
In [CVPR06]
we determine the best feature support automatically during matching,
while validating putative correspondences of not-so-distinctive
features.
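A minimal sketch of the idea (with assumed details; this is not the [CVPR06] algorithm): grow the support of a putative correspondence and keep the radius where a match score such as normalized cross-correlation is highest:

```python
import numpy as np

def ncc(a, b):
    # Normalized cross-correlation of two equally sized patches.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def grow_support(img1, img2, p1, p2, r0=2, rmax=10):
    # Start from a small support and enlarge it, keeping the radius
    # with the best match score for the putative correspondence p1 <-> p2.
    best_r, best_score = r0, -1.0
    for r in range(r0, rmax + 1):
        a = img1[p1[0]-r:p1[0]+r+1, p1[1]-r:p1[1]+r+1]
        b = img2[p2[0]-r:p2[0]+r+1, p2[1]-r:p2[1]+r+1]
        s = ncc(a, b)
        if s > best_score:
            best_r, best_score = r, s
    return best_r, best_score

# Hypothetical usage on a synthetic textured image matched to itself:
img = np.sin(np.arange(40.0))[:, None] * np.cos(np.arange(40.0))[None, :]
r, score = grow_support(img, img, (20, 20), (20, 20))
```

A correct correspondence keeps a high score as the support grows, while a spurious one degrades, which is what allows not-so-distinctive features to be validated.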

**Natural deformation statistics.** Robust viewpoint-invariant
(or insensitive) features are designed to handle only a limited set of
image transformations. But which transformations are the most
important? To answer this question, we collect statistics of natural
image deformations from realistic synthetic
data [ECCV06].
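The flavor of such an experiment can be sketched as follows (with an entirely made-up deformation model, not the [ECCV06] data): sample random homographies, warp a patch grid, and record how far each warp is from its best affine approximation:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_homography(scale=0.1):
    # Hypothetical deformation model: identity plus a small random perturbation.
    H = np.eye(3) + scale * rng.standard_normal((3, 3))
    H[2, 2] = 1.0
    return H

def apply_h(H, pts):
    # Apply a homography to an (n, 2) array of points.
    q = H @ np.c_[pts, np.ones(len(pts))].T
    return (q[:2] / q[2]).T

# Sample patch coordinates on [-1, 1]^2.
g = np.linspace(-1, 1, 5)
pts = np.array([(x, y) for x in g for y in g])
design = np.c_[pts, np.ones(len(pts))]

residuals = []
for _ in range(200):
    warped = apply_h(random_homography(), pts)
    # Best affine fit to the warp, by least squares.
    Aff, *_ = np.linalg.lstsq(design, warped, rcond=None)
    residuals.append(np.sqrt(((warped - design @ Aff) ** 2).mean()))
residuals = np.array(residuals)
```

The distribution of such residuals indicates how much is lost by modeling viewpoint change with an affinity rather than a full homography.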

**General viewpoint-invariant features.** Popular visual
features (Harris-Affine, SIFT, ...) are invariant only to a limited set
of image transformations (affinities, similarities, ...). Here we prove
the existence of viewpoint-invariant features for generic 3-D shapes
and arbitrary camera
motion [ICCV05].
See [TR05]
for a more detailed proof.

## Learning

**VicinalBoost.** Inspired by vicinal risk and the tangent distance,
we introduce transformation invariance into boosting. This improves
both the generalization error and the learning efficiency
([ICCV07],
code).
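One way to picture the idea (a generic AdaBoost with vicinal augmentation, assumed here for illustration; the actual VicinalBoost algorithm differs): expand each training sample with transformed copies, then boost decision stumps on the expanded set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: label = sign(x). "Transformations" are small jitters.
x = rng.uniform(-1, 1, 100)
y = np.sign(x)

# Vicinal augmentation: each sample spawns transformed (jittered) copies.
xa = np.concatenate([x + d for d in (-0.05, 0.0, 0.05)])
ya = np.tile(y, 3)

# Plain AdaBoost with threshold stumps on the augmented training set.
w = np.ones(len(xa)) / len(xa)
F = np.zeros(len(xa))
for _ in range(10):
    best = None
    for theta in np.linspace(-1, 1, 41):
        for s in (-1.0, 1.0):
            pred = s * np.sign(xa - theta)
            err = w[pred != ya].sum()
            if best is None or err < best[0]:
                best = (err, theta, s)
    err, theta, s = best
    err = max(err, 1e-12)                      # guard against log(0)
    alpha = 0.5 * np.log((1 - err) / err)
    pred = s * np.sign(xa - theta)
    w *= np.exp(-alpha * ya * pred)            # reweight hard examples
    w /= w.sum()
    F += alpha * pred                          # accumulate the ensemble

train_acc = (np.sign(F) == ya).mean()
```

Requiring the ensemble to classify the transformed copies correctly is what builds the invariance into the learned classifier.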

**Relaxed matching kernels.**
In [CVPR08B]
we propose a framework that subsumes pyramid matching kernels,
spatial pyramid matching kernels, proximity distribution kernels,
and a large number of similar kernels that measure the similarity of
images. We study general properties of these kernels, propose an
algorithm to compute them efficiently, and compare several of them
on standard image categorization datasets.
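For concreteness, here is a toy instance of the family (1-D features and weights following the standard pyramid match construction; this is not the [CVPR08B] formulation): histogram intersections are computed at several resolutions, and matches found at finer levels are weighted more:

```python
import numpy as np

def hist_intersection(x, y, bins):
    # Number of matches at a given resolution: intersection of histograms.
    hx, _ = np.histogram(x, bins=bins, range=(0.0, 1.0))
    hy, _ = np.histogram(y, bins=bins, range=(0.0, 1.0))
    return int(np.minimum(hx, hy).sum())

def pyramid_match(x, y, levels=4):
    # I[i]: matches found at level i (bin width doubles with i).
    I = [hist_intersection(x, y, 2 ** (levels - i)) for i in range(levels + 1)]
    k, prev = 0.0, 0
    for i, Ii in enumerate(I):
        k += (Ii - prev) / 2 ** i   # new matches, weighted by 1 / bin width
        prev = Ii
    return k

# Hypothetical usage on two small feature sets in [0, 1):
a = np.array([0.10, 0.40, 0.90])
b = np.array([0.12, 0.70, 0.95])
k_ab = pyramid_match(a, b)
```

Coarsening the bins can only merge matches, so the level-wise increments are non-negative and the kernel is bounded by the size of the smaller feature set.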

## Other projects

**Segments, context, and bag-of-features.** In
[ICCV07B]
we study the effect of context and segmentation in a bag-of-features-based
visual category recognizer. Segments provide object boundaries
and a simple anchor to attach contextual information, but obtaining
correct segmentations from low-level cues alone is difficult. Therefore
multiple segmentations are generated, and segments are then validated
as supports of objects in context.

**Structure from motion.**
In [ICCV05]
we integrate RANSAC into a Kalman filter; the result is an estimator
that is very robust to outliers and suitable for on-line structure-from-motion
estimation.
In [CVPR07]
we study the singularities of the squared re-projection residual that
arise under forward motion. In particular, we show that bounding the
depth estimates makes the error surface continuous.
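The combination can be sketched on a scalar toy problem (all models and numbers are invented; the [ICCV05] filter operates on the full structure-from-motion state): RANSAC selects a consensus set of measurements, and only those enter the Kalman update:

```python
import numpy as np

rng = np.random.default_rng(0)

def ransac_inliers(z, thresh=0.5, iters=20):
    # Keep the hypothesis (a single sampled measurement) whose
    # consensus set within `thresh` is largest.
    best = np.zeros(len(z), dtype=bool)
    for _ in range(iters):
        h = z[rng.integers(len(z))]
        inl = np.abs(z - h) < thresh
        if inl.sum() > best.sum():
            best = inl
    return best

# Scalar Kalman filter tracking a constant state, with outlier-laden
# measurement batches gated by RANSAC before each update.
x_true = 2.0
x, P, R, Q = 0.0, 10.0, 0.1, 1e-4
for _ in range(30):
    z = x_true + 0.1 * rng.standard_normal(10)
    z[:3] += rng.uniform(5, 10, 3)    # gross outliers
    inl = ransac_inliers(z)
    zbar = z[inl].mean()              # fused inlier measurement
    Reff = R / inl.sum()
    P += Q                            # predict (static state model)
    K = P / (P + Reff)
    x = x + K * (zbar - x)            # update with inliers only
    P = (1 - K) * P
```

Without the RANSAC gate, the gross outliers would be averaged into every update and bias the estimate; with it, the filter sees only the consensus measurements.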