Projects

Visual Motion (Dynamics)

Structure from motion

Interacting with a complex, unknown, dynamic
environment requires continuously updated knowledge of its shape and
motion. We propose several algorithms aimed at inferring shape, motion
and appearance causally and incrementally. See also the related project
on virtual object insertion in live video.

Ambiguities and optimality in 3D motion estimation

Estimating 3D structure and motion can be
cast as a non-linear, high-dimensional optimization problem, prone to
local minima. Such local minima are intrinsic to the problem, and not
the algorithm or computational device used to solve it, and are
therefore true illusions. Can we identify, analyze, and categorize such
illusions, and devise optimal algorithms to infer the global estimate
when possible?

Vision-based control

Real-time, vision-based navigation and interaction

Vision is a remote, distributed, passive
sensor crucial for primates to move within the environment. While
successful application of vision in the loop of a control system has
been demonstrated under partially controlled conditions (freeway
guidance, spacecraft landing), we tackle navigation and interaction
within unknown and dynamic environments by building representations
that can be used for localization, mapping and navigation.

Deforming motions

How can we capture the "overall motion" for a
deforming object? How can we "separate" the overall motion from the
deformation? How do we characterize what is "conserved" during motion?
We propose a framework for modeling deforming motion that entails
defining a "moving average shape" and that allows for the simultaneous
registration and matching of images and for tracking deformable
objects.

We segment videos into domains of homogeneous
motion by minimizing an appropriate cost functional. Our method
allows tracking moving objects in video sequences, reconstructing the
different depth layers of a 3D scene filmed by a moving camera and
segmenting motion patterns which cannot be distinguished based on their
appearance.

Dynamic textures

Dynamic textures are sequences of images of
scenes that exhibit some form of temporal and possibly spatial
stationarity, such as fire, smoke, steam, foliage etc. Models of
dynamic textures can be used to generate novel synthetic sequences and
manipulate real ones.

How do we distinguish fog from steam? Models
of dynamic textures can be used to discriminate visual processes based
on their spatial as well as temporal statistics.

Human motion

At a fairly high level of abstraction, a
human moving about can be represented as a dynamical system, driven by
intentions (actions), and outputting actuator forces, resulting in joint
trajectories. We study how one can infer actions from remote
measurements of joint angles or trajectories. Ultimately we want to be
able to identify an action regardless of the particular individual, and
to identify the individual regardless of the action. Preliminary
results show that simple dynamical models allow for successful
classification of action classes, such as walking gaits.

Our goal in this project is to build
synthetic models of human faces that can be driven by a speech signal,
while retaining the distinctive features of a particular individual.

Shape (Geometry)

Modeling and representation

Is it possible to define a flexible
representation of shape that is linear, so that the sum of two shapes
is a shape, and operations like differentiation, averaging and
orthogonal projection make sense? We represent the shape of closed
planar contours as the zero level set of functions that satisfy certain
partial differential equations, so that they are (quasi) linear by
construction.
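
As an illustration only, the following sketch embeds two circles as the zero level sets of signed distance functions, averages the two functions, and locates the zero level set of the result. Signed distance functions are an assumption made here for simplicity; the text does not say which partial differential equations the embedding functions satisfy, and the helper names are hypothetical. Note that the average of two signed distance functions is not a signed distance function in general, which is one reason a representation can only be expected to be quasi-linear.

```python
import math

def circle_sdf(radius):
    """Signed distance function of a circle centred at the origin."""
    def phi(x, y):
        return math.hypot(x, y) - radius  # negative inside, zero on the contour
    return phi

def average(phi_a, phi_b):
    """Pointwise average of two embedding functions."""
    return lambda x, y: 0.5 * (phi_a(x, y) + phi_b(x, y))

def zero_crossing_on_ray(phi, lo=0.0, hi=10.0, tol=1e-9):
    """Locate the zero level set along the positive x-axis by bisection.

    Assumes phi is negative at lo and non-negative at hi.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid, 0.0) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Average the embeddings of two concentric circles with radii 1 and 2.
phi_mean = average(circle_sdf(1.0), circle_sdf(2.0))
mean_radius = zero_crossing_on_ray(phi_mean)
print(round(mean_radius, 6))
```

For concentric circles the averaged function is again the signed distance function of a circle, so the recovered radius should be the mean of the two radii.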

Planar contours can be easily recognized
despite being presented under various transformations, such as scaling,
translation, projective transformations, in addition to being subjected
to measurement noise. Is it possible to define a signature that is
invariant with respect to such transformations, and at the same time
insensitive to noise?
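
To make the notion of an invariant signature concrete, the sketch below computes a very simple one: the distances from the centroid to each contour sample, divided by their mean. This is a textbook construction chosen for illustration, not the signature studied in this project. It is invariant only to translation, rotation and uniform scaling (not to projective transformations), it assumes both contours are sampled at corresponding points in the same order, and it makes no claim about robustness to noise. All function names are hypothetical.

```python
import math

def centroid_distance_signature(points):
    """Distances from the centroid to each contour sample, divided by their mean."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    mean_dist = sum(dists) / len(dists)
    return [d / mean_dist for d in dists]

def similarity_transform(points, scale, angle, tx, ty):
    """Apply uniform scaling, rotation and translation to a list of 2D points."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
            for x, y in points]

# A closed planar contour: an ellipse sampled at 64 points.
contour = [(2.0 * math.cos(2.0 * math.pi * k / 64), math.sin(2.0 * math.pi * k / 64))
           for k in range(64)]
moved = similarity_transform(contour, scale=3.0, angle=0.7, tx=5.0, ty=-2.0)

sig_a = centroid_distance_signature(contour)
sig_b = centroid_distance_signature(moved)
# Largest discrepancy between the two signatures; zero up to rounding error.
max_diff = max(abs(a - b) for a, b in zip(sig_a, sig_b))
```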

Statistics

By introducing prior knowledge on the shape
of objects of interest, one can drastically improve the robustness of
segmentation processes to noise, background clutter and partial
occlusion. We investigate methods to integrate such priors into level
set based segmentation schemes. By minimizing an appropriate cost
functional we simultaneously generate a knowledge-driven segmentation
of the input image and a decision about where to apply which prior. As
a result we can simultaneously reconstruct multiple familiar objects in
a given image.

Matching

We develop variational techniques for
matching closed planar contours without distinct landmark points.

Computational aesthetics

Certain objects elicit perceptual responses:
a face can appear attractive or friendly, a car can appear aggressive
or comfortable, etc. Since such objects are characterized by their
shape (and to a lesser extent by their radiance), there must be some
form of "map" between geometry and qualitative perception. How is this
map represented? How can it be inferred? Can it be inverted, so as to
allow purposeful changes in geometry to achieve a desired perceptual
response?

Multiple view geometry

Through most of the past decade we have been
engaged in the study of the geometry of multiple views, which plays a
key role in the reconstruction of the 3D structure of the scene, the
motion and calibration of the camera.

Given a sequence of images of a scene
containing multiple rigid objects moving independently, one can
estimate the number of objects, the motion of each object, and what
portion of the visual field corresponds to what object using algebraic
techniques.

Occluding boundaries are visually salient
because they often result in discontinuities in image intensity.
T-junctions arise when a curve terminates at an occluding boundary
(forming a "T"). Unfortunately, T-junctions do not correspond to
physical points in the scene, as they move with the viewpoint.
Nevertheless, we show that the motion of T-junctions on the image plane
contains information about the scene that can be exploited for
reconstruction.

Visual Reconstruction (Photometry)

Radiance and shape estimation

Traditional stereo relies on the "brightness
constancy" assumption to establish correspondence between points in
different images. This allows "eliminating" photometry from the
equation and reduces stereo reconstruction to a purely geometric
problem. However, when the brightness constancy assumption is not satisfied, one
cannot "separate" the reconstruction of shape from the reconstruction
of reflectance. We show under what condition such separation yields
optimal algorithms. The cost functional can be integrated either in the
image, or on the scene surface where the image back-projects. When
integrating on the scene, the optimality conditions involve derivatives
of the (noise-ridden, measured) images. However, when integrating on
the image, the optimality conditions only involve derivatives of the
(noiseless, ideal) model. Therefore, one can devise
infinite-dimensional gradient-based reconstruction algorithms that do
not involve derivatives of the data, with obvious improvement in
robustness.
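
The role of brightness constancy in traditional stereo can be sketched on a single rectified scanline. Under that assumption a scene point has the same intensity in both images, so correspondence reduces to finding the shift (disparity) that minimizes a sum of squared differences. The scanline values and function names below are invented for illustration; this shows the classical matching step only, not the variational algorithms described above.

```python
def ssd(left, right, d):
    """Sum of squared differences between left[x] and right[x - d] over the overlap."""
    return sum((left[x] - right[x - d]) ** 2 for x in range(d, len(left)))

def best_disparity(left, right, max_d):
    """Pick the disparity that minimises the mean squared matching error."""
    costs = {d: ssd(left, right, d) / (len(left) - d) for d in range(max_d + 1)}
    return min(costs, key=costs.get)

# Synthetic scanline with a distinctive intensity profile (made-up values).
right = [0, 0, 1, 4, 9, 7, 3, 1, 0, 0, 0, 0, 2, 5, 2, 0]
true_d = 3
# Under brightness constancy the left scanline is the right one shifted by true_d.
left = [0] * true_d + right[:-true_d]

disparity = best_disparity(left, right, max_d=6)
print(disparity)
```

When the intensities of corresponding points differ between views, for example because of specular highlights, the minimum of this cost no longer identifies the true shift, which is the situation the paragraph above addresses.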

Non-lambertian reflection

Traditional stereo relies on establishing
correspondence between points in different images. Unfortunately, such
correspondence cannot be established unless the scene is made of dull
matte objects; it fails, for instance, with shiny, specular, or
translucent materials. We propose a novel approach that relies not on
matching image to image, but on matching each image to an underlying
model of the geometry (shape) and photometry (radiance tensor field) of
the scene.
Discrepancy from the model is measured by the deviation from the ideal
rank of the radiance tensor field; we develop optimal algorithms to
infer shape and radiance from collections of images, based on
variational techniques and level set methods to integrate partial
differential equations.

Stereoscopic segmentation

When a scene contains no "features" (constant
albedo) or too many features (dense self-similar texture), traditional
stereo matching algorithms fail to find proper "correspondence." We
therefore do not seek to match image to image, but instead match all data to
an underlying model of the scene geometry and its photometry, subject
to the assumption of constant albedo.

Even when an object has constant albedo, the
measured irradiance is not constant, because of shading and other effects. While
one could model this effect explicitly (see Stereoscopic Shading
project), if illumination is static one can assume that it is the
albedo that is smooth, and exploit this assumption to recover shape and
albedo.

Many real objects (especially man-made) are
made by composing different materials, and therefore they have
piecewise constant reflectance properties. We have developed algorithms
for estimating the shape, albedo, and albedo boundaries from
collections of images. The process involves performing region-based
segmentation on evolving surfaces.

When neither motion, shape, nor albedo is
known, under suitable conditions one can simultaneously estimate shape
and camera pose by jointly registering various "regions" of the scene.

Illumination and reflectance

Smooth objects with constant albedo result in
smooth measured images due to non-uniform illumination. We develop
techniques to estimate shape, albedo and illumination properties of the
scene under the assumption of constant albedo and a finite number of
point light sources.

Visual accommodation

It is well-known that blur conveys spatial
information. However, to what extent does it? Can one characterize the
set of shapes that are indistinguishable from any number of defocused
images? Since the answer depends on the radiance of the scene, do there
exist radiances (e.g. structured light patterns) that allow
reconstructing any shape? We present a mathematical analysis of the
observability properties of shape from defocus. We also present novel
techniques to reconstruct shape and radiance.

Under the conditions for which one can
reconstruct shape from defocused images, we develop inference
algorithms that are optimal in the sense of least-squares. By
exploiting the properties of semi-infinite orthogonal projectors in
Hilbert spaces we can transform an infinite-plus-one-dimensional
optimization problem into a much more efficient (regularized)
one-dimensional optimization, with obvious consequences for
computational efficiency.

We develop efficient algorithms for
reconstructing 3D shape and radiance from blurred images that are
optimal in the sense of relative entropy. The algorithms consist of
evolving a surface from an initial point towards a (local) minimum of
an energy functional, via the numerical integration of a suitable
partial differential equation.

Images depend on the shape of the scene, its
radiance, as well as the optical characteristics of the imaging device.
In this work we show that one can learn the optical characteristics
from data. Our approach is robust to the point where one can learn the
optical characteristic of a "virtual" camera using synthetic training
data, and apply the results to real cameras in order to reconstruct the
shape of real scenes.

Estimating shape and radiance from blurred
images is well-known to be a severely ill-posed inverse problem. In
this work we propose an efficient solution via the forward solution of
a diffusive partial differential equation with a space-varying stopping
time. This allows us to have a well-behaved, straightforward numerical
algorithm that has proven robust and efficient.
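
The idea of replacing deblurring with a forward diffusion can be sketched in one dimension. Blurring is modeled here as running the heat equation on the image, so the relative blur between two images is estimated by diffusing the sharper one forward until it best matches the blurrier one, with no deconvolution involved. This is a deliberately simplified assumption: the sketch uses a single global diffusion time, whereas the approach described above uses a space-varying stopping time. The function names, step size and signals are hypothetical.

```python
def diffuse(signal, steps, alpha=0.2):
    """Run `steps` explicit heat-equation updates on a 1D signal.

    alpha is the diffusion coefficient times the time step; the explicit
    scheme is stable for alpha <= 0.5.
    """
    u = list(signal)
    for _ in range(steps):
        padded = [u[0]] + u + [u[-1]]  # replicate the ends (zero-flux boundary)
        u = [padded[i + 1] + alpha * (padded[i] - 2.0 * padded[i + 1] + padded[i + 2])
             for i in range(len(u))]
    return u

def relative_blur(sharper, blurrier, max_steps=50):
    """Number of forward diffusion steps that best maps `sharper` onto `blurrier`."""
    best_steps, best_err = 0, float("inf")
    u = list(sharper)
    for steps in range(max_steps + 1):
        err = sum((a - b) ** 2 for a, b in zip(u, blurrier))
        if err < best_err:
            best_steps, best_err = steps, err
        u = diffuse(u, 1)  # advance the forward solution by one step
    return best_steps

# A step edge and a more defocused version of it (synthetic data).
sharp = [0.0] * 20 + [1.0] * 20
blurred = diffuse(sharp, 12)

estimated_steps = relative_blur(sharp, blurred)
print(estimated_steps)
```

In a shape-from-defocus setting the amount of diffusion needed at each location would depend on depth, which is where a space-varying stopping time would enter.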

Since images are captured by integrating
photon count over an interval of time (exposure), moving objects appear
blurred in ways that depend upon their shape, motion and reflectance.
We propose a collection of algorithms to estimate shape and motion of
moving objects from one single blurred image.

Visual Modeling for Recognition

Visual features for correspondence

How can we decide whether two images portray
the same scene? What is the scene? How is it related to the image? Are
there representations that are invariant with respect to nuisance
factors (viewpoint, illumination)? Are there image statistics
("features") that do not alter decision performance?

Filtering, control and identification

Given a process that exhibits complex dynamic
behavior, one can choose to model it globally with a very complex
model, or to choose a simple class of models and represent the process
locally, together with the partition of the data into neighborhoods. We
explore the problem of identifying simple local models and their domains
for dynamic processes.

Particle filters are flexible algorithms to
propagate the conditional density of a dynamical model, represented
weakly as a collection of samples drawn from it. We explore particle
algorithms for dynamical models whose state space has a non-trivial
geometric structure, such as a Lie group or a homogeneous space.
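
A minimal sketch of a particle filter on a state space with non-trivial geometry is given below, using the simplest Lie group, the circle SO(2), as the state (a heading angle). It is a generic bootstrap filter written for illustration, not the algorithms developed in this project. The motion model, noise levels, von Mises-style likelihood and all names are assumptions. The points to notice are that prediction composes group elements (angles are wrapped rather than added as real numbers) and that the estimate is a circular mean rather than an arithmetic one.

```python
import math
import random

def wrap(angle):
    """Map an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def circular_mean(angles):
    """Mean direction on the circle, computed from unit vectors."""
    return math.atan2(sum(math.sin(a) for a in angles),
                      sum(math.cos(a) for a in angles))

def particle_filter_step(particles, z, omega, q_sigma, kappa, rng):
    """One predict / weight / resample cycle for a heading on SO(2)."""
    # Predict: compose each sample with the known rotation plus a random perturbation.
    predicted = [wrap(p + omega + rng.gauss(0.0, q_sigma)) for p in particles]
    # Weight: von Mises-style likelihood that depends only on the angular difference.
    weights = [math.exp(kappa * (math.cos(z - p) - 1.0)) for p in predicted]
    # Resample with replacement in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))

rng = random.Random(0)
omega, q_sigma, r_sigma = 0.3, 0.05, 0.1   # assumed rotation rate and noise levels
kappa = 1.0 / r_sigma ** 2                 # concentration matched to measurement noise
truth = 3.0                                # starts near the wrap-around at pi
particles = [rng.uniform(-math.pi, math.pi) for _ in range(500)]

for _ in range(30):
    truth = wrap(truth + omega)
    z = wrap(truth + rng.gauss(0.0, r_sigma))  # noisy heading measurement
    particles = particle_filter_step(particles, z, omega, q_sigma, kappa, rng)

estimate = circular_mean(particles)
error = abs(wrap(estimate - truth))            # angular error, measured on the circle
```

An arithmetic mean of the samples would fail whenever the particle cloud straddles the discontinuity at plus or minus pi, which is why the estimate is formed from unit vectors.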

We are interested in controlling a
non-holonomic robot so as to follow a prescribed trajectory with
guaranteed performance. We propose an algorithm inspired by model-based
predictive control that involves controlling the local approximation of
the trajectory to be tracked, computed in real-time.

Visual Prosthetics

We explore the use of various signal
processing algorithms to enhance the perception capabilities of
patients with retinal implants.

DARPA Grand Challenge

Center for Computational Biology

The convergence of the biomedical revolution and the information
technology revolution is a major event in the history of science. The
emerging discipline of Computational Biology is a natural result of
this convergence. The mathematical and computational sciences lie at
the center of this new endeavor, providing the tools and framework for
model building and quantitative analysis.

The Center for Computational Biology (CCB) was established to develop, implement and test computational biology methods that are applicable across spatial scales and biological systems. Our objective is to help elucidate characteristics and relationships that would otherwise be impossible to detect and measure.

Interactions fostered by this multi-disciplinary scientific network will spawn novel strategies and will initiate training opportunities for the next generation of relevant and promising biological endeavors.

Active Vision Control System

CoMotion

The MURI Project includes students, faculty and staff from Stanford
University, UC Berkeley and UCLA. The aim of the project is to develop
computational methods for the simulation of collaborative motion of
autonomous vehicles. The multi-disciplinary team consists of faculty
and researchers from applied mathematics, statistics, computer
science, electrical engineering and aeronautical engineering who
combine their expertise to derive practical control algorithms for
groups of collaborating vehicles. (Please follow the links to each of
the faculty members to obtain their publications and presentations.)