Gianfranco Doretto / Research / Project
Dynamic Texture Modeling
Modeling temporal stationarity
Description
Dynamic textures are sequences of images of moving scenes that exhibit
temporal regularity in a statistical sense: sea waves, smoke, foliage,
and whirlwinds, but also talking faces, traffic scenes, etc. We present
a characterization of dynamic textures that poses the problems of
modeling, learning, and synthesizing this type of sequence.
Consider a sequence of images of a moving scene. Each image is an
array of positive numbers that depend upon the shape, pose and motion
of the scene as well as upon its material properties (reflectance) and
on the light distribution of the environment. It is well known that
the joint reconstruction of photometry and geometry is an
intrinsically ill-posed problem: from any (finite) number of images it
is not possible to uniquely recover all unknowns (shape, motion,
reflectance and light distribution). Given this arbitrariness in the
reconstruction and interpretation of visual scenes, it is clear that
there is no notion of a "true" interpretation, and the
criterion for correctness is somewhat arbitrary (with the exception of
humans, who can use prior information and other sensory modalities,
such as touch). For this reason, in this project we analyze sequences
of images of moving scenes solely as visual signals; interpreting and
understanding a signal then amounts to inferring a stochastic model
that generates it. The goodness of the model can be measured in terms
of the total likelihood of the measurements or in terms of its
predictive power: a model should be
able to give accurate predictions of future signals. In a sense, we
look for an explanation of the image data that allows us to
recreate and extrapolate it. It can therefore be thought of as the
compressed version or the essence of the sequence of images.
Under the assumption that the temporal regularity of video sequences
translates into statistical stationarity of the video signal, the
general model for a dynamic texture is a dynamical system. We proved
that even the simplest instance of the model (i.e., a linear dynamical
system) can capture a variety of natural phenomena.
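Concretely, the simplest instance referred to above (the first-order linear model with Gaussian input) is the standard linear-Gaussian state-space model; written in conventional notation (the symbol names are the usual ones, assumed here for illustration):

```latex
\begin{aligned}
x(t+1) &= A\,x(t) + v(t), & v(t) &\sim \mathcal{N}(0, Q),\\
y(t)   &= C\,x(t) + w(t), & w(t) &\sim \mathcal{N}(0, R),
\end{aligned}
```

where y(t) is the vectorized image at time t, x(t) is a low-dimensional hidden state, A governs the temporal dynamics, and C maps states to images. Statistical stationarity then corresponds to the eigenvalues of A lying inside the unit circle.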
The main contributions of our approach are:
- Representation: we present a novel definition of dynamic texture that is general (even the simplest instance can capture the second-order statistics of a video signal), and precise (it allows making analytical statements and drawing from the rich literature on system identification).
- Learning: we propose two criteria: total likelihood or prediction error. For the simplest instance of the model we propose a closed-form solution for the learning problem.
- Recognition: we found that similar textures tend to cluster in model space, and assessed the potential for recognition of dynamic visual processes.
- Synthesis: we found that even the simplest model (first-order autoregressive moving average model with Gaussian input) captures a wide range of natural phenomena.
- Implementation: our algorithm is simple to implement, efficient to learn and fast to simulate; it allows one to generate infinitely long sequences from short input sequences.
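To illustrate the closed-form learning step mentioned above, here is a minimal NumPy sketch of SVD-based subspace identification under the linear-Gaussian model: factor the (mean-subtracted) frames into an appearance basis and a state trajectory, then fit the dynamics by least squares. Function and variable names are our own; this is a sketch of the suboptimal closed-form solution under stated assumptions, not the authors' exact code.

```python
import numpy as np

def learn_dynamic_texture(Y, n):
    """Closed-form (suboptimal) learning of a linear dynamic texture.

    Y : (p, tau) data matrix, one vectorized frame per column.
    n : dimension of the hidden state.
    Returns the dynamics matrix A, the appearance matrix C, the
    estimated state trajectory X, a noise factor B (so the input
    noise is B @ N(0, I)), and the mean frame.
    """
    # Remove the temporal mean so the model captures the fluctuations.
    Ymean = Y.mean(axis=1, keepdims=True)
    Y0 = Y - Ymean

    # Best rank-n factorization Y0 ~= C @ X via the SVD.
    U, s, Vt = np.linalg.svd(Y0, full_matrices=False)
    C = U[:, :n]                    # spatial appearance basis
    X = np.diag(s[:n]) @ Vt[:n, :]  # hidden-state trajectory

    # Least-squares fit of the dynamics: X[:, 1:] ~= A @ X[:, :-1].
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])

    # Input-noise covariance from the one-step prediction residuals.
    V = X[:, 1:] - A @ X[:, :-1]
    Q = V @ V.T / V.shape[1]
    B = np.linalg.cholesky(Q + 1e-8 * np.eye(n))
    return A, C, X, B, Ymean
```

The whole procedure reduces to one SVD and one pseudoinverse, which is why learning is efficient even on raw pixel data.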
Results
The following examples demonstrate the power of our model to
extrapolate new video sequences. Given a training sequence we apply
the learning procedure and extract the parameters of the model. We
then simulate the model to synthesize new video sequences.
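The simulation step can be sketched as follows: drive the learned dynamics with fresh Gaussian noise and read each frame out through the appearance matrix. Parameter names (A, C, a noise factor B, the mean frame) are our own conventions, assumed to come from the learning step; since the noise is generated on the fly, the sequence can be made arbitrarily long.

```python
import numpy as np

def synthesize(A, C, B, Ymean, n_frames, seed=0):
    """Simulate the learned model to synthesize new frames:
    x(t+1) = A x(t) + B v(t) with v(t) ~ N(0, I), and
    y(t) = C x(t) + mean frame.  Returns a (p, n_frames) array."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros(n)  # start from the mean image
    frames = np.empty((C.shape[0], n_frames))
    for t in range(n_frames):
        frames[:, t] = C @ x + Ymean.ravel()
        x = A @ x + B @ rng.standard_normal(n)
    return frames
```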
Note that the learning procedure has been applied directly to the raw
data, and no preprocessing has been performed. Also, for portability
reasons, the .avi movies are MPEG compressed (video codec V1), and the
quality of the synthesized images has degraded accordingly.
Grayscale sequences
In the following example, from four training sequences (smoke,
fountain, river waves, and curtain) of 100 grayscale frames each, we
synthesize 300 frames. (The training sequences have been borrowed from
the
MIT temporal texture database.)
Download .avi movie [2.3Mb]
Fountain
This example shows 100 frames of a color training
sequence and 200 synthesized frames. Note that we do not require the video
sequence to exhibit spatial regularity. In fact, the model aims at
modeling temporal correlation only, while spatial correlation can be
different at every point of the image plane.
Download .avi movie [1.04Mb]
Ocean waves
This example shows 100 frames of a
color training sequence and 200 synthesized frames. Here the video
sequence exhibits a lot of spatial regularity. Notice that the small
highlights of the training sequence are filtered out in the synthesized
sequence; indeed, one could use our model to perform video sequence
denoising.
Download .avi movie [722Kb]
Fire
This example shows 100 frames of a
color training sequence and 200 synthesized frames. This video sequence is
far from Gaussian, yet, although the model is linear, the synthesized
outcome still preserves the temporal dynamics very well, and the
images look appealing. (The training sequence has been borrowed from the
Artbeats Digital Film Library.)
Download .avi movie [890Kb]
Water
This example shows 100 frames of a
color training sequence and 200 synthesized frames. Note that the training
sequence is full of highlights, which makes the learning procedure much
more difficult. Nevertheless, the synthesized video sequence preserves
the temporal dynamics very well, while the quality of the images has
clearly degraded.
Download .avi movie [1.37Kb]
Related publications
- Doretto, G., Chiuso, A., Wu, Y. N., and Soatto, S. Dynamic textures. International Journal of Computer Vision, 51(2):91–109, 2003. Details | BibTeX | PDF (2.6MB)
- Soatto, S., Doretto, G., and Wu, Y. N. Dynamic textures. In Proceedings of the IEEE International Conference on Computer Vision, pp. 439–446, Vancouver, BC, Canada, July 2001. Oral presentation. Details | BibTeX | PDF (929.6kB)
- Doretto, G., Pundir, P., Wu, Y. N., and Soatto, S. Dynamic textures. Technical Report TR200032, UCLA Computer Science Department, 2000. Details | BibTeX | PDF (586.6kB)