Dynamic Texture Recognition
A geometric approach for dynamic texture recognition
Description
In this project we consider the problem of recognizing a sequence of
images based upon a joint photometric-dynamic model. This allows us to
distinguish not just steam from foliage, but fast turbulent steam from
haze, or to detect the presence of strong winds by looking at trees.
Recognition of objects based on their images is one of the central
problems in modern Computer Vision. We consider objects as being
described by their geometric, photometric and dynamic
properties. While a vast literature exists on recognition based on
geometry and photometry, less has been said about recognizing scenes
based upon their dynamics.
The objects that we are interested in recognizing are
sequences of images of dynamic scenes that exhibit some form of
temporal regularity, such as waves, steam, foliage, etc. These types of
video sequences are commonly known as dynamic textures. While
most of the approaches to recognition of complex motion patterns use
local features or compute optical flow, we take a different approach:
we start from the assumption that sequences of images are realizations
of stochastic processes, and we set out to classify and recognize not
individual realizations but statistical models that generate
them. Therefore, we pose the problem of recognizing dynamic textures
in the space of models where each dynamic texture is uniquely
represented. This entails choosing a distance between models. This is
a non-trivial problem since the space of models is non-linear, and
therefore, computing distances naively can lead one to conclude that
very similar models are dramatically different. We propose and analyze
three different distances in the space of autoregressive moving
average models and assess their discriminative power.
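The pitfall of a naive distance can be illustrated with a minimal sketch. It assumes each model is a linear dynamical system given by a state matrix A and an output matrix C, and that the Martin distance is computed as d^2 = -ln prod cos^2(theta_i), where theta_i are the principal angles between the observability subspaces of the two models. The truncation length m, the toy matrices, and all function names are hypothetical choices made for illustration, not the project's code.

```python
import numpy as np

def observability(A, C, m=20):
    """Stack C, CA, ..., CA^(m-1): a finite truncation of the observability matrix."""
    blocks = [C]
    for _ in range(m - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def principal_cosines(A1, C1, A2, C2, m=20):
    """Cosines of the principal angles between the two observability subspaces."""
    Q1, _ = np.linalg.qr(observability(A1, C1, m))
    Q2, _ = np.linalg.qr(observability(A2, C2, m))
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.clip(s, 1e-12, 1.0)  # guard against log(0) and round-off above 1

def martin_distance(A1, C1, A2, C2, m=20):
    """Square root of -ln prod cos^2(theta_i), on the truncated subspaces."""
    cosines = principal_cosines(A1, C1, A2, C2, m)
    return float(np.sqrt(-2.0 * np.sum(np.log(cosines))))

def euclidean_distance(A1, C1, A2, C2):
    """Naive distance: Euclidean norm of the stacked parameter differences."""
    return float(np.sqrt(np.sum((A1 - A2) ** 2) + np.sum((C1 - C2) ** 2)))

# Toy model (assumed values) and the same model in a different state basis.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.5]])
C = rng.standard_normal((5, 2))
T = np.array([[2.0, 1.0], [1.0, 1.0]])   # invertible change of basis
A_same = np.linalg.inv(T) @ A @ T
C_same = C @ T

print("Euclidean:", euclidean_distance(A, C, A_same, C_same))
print("Martin:   ", martin_distance(A, C, A_same, C_same))
```

Both parameter sets describe the same input-output behavior, yet the Euclidean norm reports a large difference, while the subspace-angle distance is numerically zero because the two observability matrices span the same column space.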
The main contributions of our approach are:
- Posed the problem of recognizing and classifying dynamic textures in the space of dynamic systems where each dynamic texture is uniquely represented.
- Proposed three distances to compare dynamic texture models: simple Euclidean norm, Finsler distance, and Martin distance.
- Proposed two different learning schemes for dynamic textures: one that is based on PCA and another one that is based on ICA.
- Collected 200 video sequences that we used to analyze the discriminative power of the three distances combined with the two different learning schemes (PCA and ICA). The highest hit-ratio (89.5%) is achieved by using PCA and Martin distance.
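As a rough illustration of what a PCA-based learning scheme might look like, the sketch below fits a linear dynamical model to a sequence of vectorized frames with a truncated SVD followed by a least-squares fit of the state transition. This is an assumption about the general shape of such a scheme, not a reproduction of the method used in the project; the function name, the state dimension, and the synthetic data are hypothetical.

```python
import numpy as np

def learn_pca_model(frames, n_states):
    """Fit x(t+1) = A x(t), y(t) = C x(t) + mean from frames.

    frames: array of shape (num_pixels, num_frames), one vectorized
    image per column.
    """
    mean = frames.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(frames - mean, full_matrices=False)
    C = U[:, :n_states]                        # principal components (orthonormal)
    X = s[:n_states, None] * Vt[:n_states]     # state trajectory, one column per frame
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])   # least-squares state transition
    return A, C, mean

# Synthetic stand-in for a video: 100 "pixels", 60 frames, 2 hidden states.
rng = np.random.default_rng(0)
t = np.arange(60)
basis = rng.standard_normal((100, 2))
signal = np.vstack([np.cos(0.3 * t), np.sin(0.3 * t)])
frames = basis @ signal + 0.01 * rng.standard_normal((100, 60))

A, C, mean = learn_pca_model(frames, n_states=2)
print(A.shape, C.shape)
```

The pair (A, C) returned here is the kind of model representation that a distance between models would then compare.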
Results
Nearest neighbor examples
The linked figure shows some results of the nearest neighbor computation
using PCA and Martin distance. The first column shows a sample from
one of the sequences in the database. The distance from the model of
this sequence to every other sequence is computed, and a sample of
the sequence "closest" to the test is shown in the second
column. The third column shows the second closest sequence, and so
on.
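The ranking described above can be sketched as follows, assuming the pairwise distances between models are already available as a square matrix. The matrix values and the function name are made up for illustration.

```python
import numpy as np

def rank_neighbors(distances, query):
    """Return the indices of all other sequences, closest first."""
    order = np.argsort(distances[query], kind="stable")
    return [int(j) for j in order if j != query]

# Hypothetical pairwise distances between four sequences.
D = np.array([[0.0, 0.2, 1.5, 0.9],
              [0.2, 0.0, 1.1, 1.3],
              [1.5, 1.1, 0.0, 0.4],
              [0.9, 1.3, 0.4, 0.0]])

print(rank_neighbors(D, 0))  # [1, 3, 2]
```

The first entry of the returned list corresponds to the second column of the display, the next entry to the third column, and so on.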
Confusion matrix: PCA and Martin distance
The linked figure shows the result of an experimental run on a small
subset of the database (40 dynamic textures out of the 200 in the
entire database), using Martin distance and PCA. The distance
between every pair of sequences in the subset is displayed in this plot
(confusion matrix). Each row/column of the matrix represents a
sequence, and a group of four sequences represents one category of
dynamic textures. Dark indicates a small distance, light a large
distance. The minimum distance is of course along the diagonal. Moving
along the vertical axis, we mark the first (o) and second (x) nearest
neighbors. For example, the closest dynamic texture to Smoke-1, along
the vertical axis, is Smoke-2 along the horizontal axis. Similarly,
the second closest dynamic texture to Smoke-1 is Water-Fall-b-1. From
this picture the discriminative power of the Martin distance is
already visible. By using the whole database we achieve a hit-ratio of
89.5%.
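One plausible way to compute a hit-ratio from such a distance matrix is sketched below. It assumes a "hit" means that the first nearest neighbor of a sequence (excluding the sequence itself) belongs to the same category; the text does not spell out the exact definition, so this is an assumption, and the labels and distances are invented for the example.

```python
import numpy as np

def hit_ratio(distances, labels):
    """Fraction of sequences whose nearest other sequence shares their label."""
    D = np.array(distances, dtype=float)   # copy so the input is not modified
    np.fill_diagonal(D, np.inf)            # a sequence is not its own neighbor
    nearest = np.argmin(D, axis=1)
    labels = np.asarray(labels)
    return float(np.mean(labels[nearest] == labels))

# Hypothetical distances: the last sequence is closest to the wrong category.
D = np.array([[0.0, 0.2, 1.5, 0.3],
              [0.2, 0.0, 1.1, 1.3],
              [1.5, 1.1, 0.0, 0.4],
              [0.3, 1.3, 0.4, 0.0]])
labels = ["smoke", "smoke", "water", "water"]

print(hit_ratio(D, labels))  # 0.75
```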
Confusion matrix: PCA and Euclidean distance
The linked figure shows the result of an experimental run on a small
subset of the database (40 dynamic textures), using the naive
Euclidean norm and PCA. The subset is the same one
used in the previous experiment. Again, moving along the vertical
axis, we mark the first (o) and second (x) nearest neighbors. The poor
recognition rate for the Euclidean distance is visible from the large
number of nearest neighbors (o) falling outside of the "same
category" grid lines. By using the whole database we achieve a
hit-ratio of 5.5%. This result was actually expected since the space
of models is non-linear, and demonstrates that computing distances
naively can lead one to conclude that very similar models are
dramatically different.
Related publications
- Saisan, P., Doretto, G., Wu, Y. N., and Soatto, S.
Dynamic texture recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 58–63, Kauai, Hawaii, USA, December 2001.
- Doretto, G., Chiuso, A., Wu, Y. N., and Soatto, S.
Dynamic textures. International Journal of Computer Vision, 51(2):91–109, 2003.