Dynamic Texture Modeling
Modeling spatial and temporal stationarity
Description
Dynamic textures are sequences of images of moving scenes that exhibit
temporal regularity, such as sea waves, smoke, foliage, and traffic scenes.
An important subset of dynamic textures is the one where the sequences
exhibit not only temporal regularity, but also spatial regularity. We present
a characterization of this kind of dynamic texture, and pose the
problems of modeling, learning, and synthesis for this type of sequence.
Dynamic textures that are spatially regular (or homogeneous) are as
commonplace in regions of video sequences of natural scenes as
2D textures are in still images. Therefore, a model that is able
to jointly capture the spatial and temporal structure of
this kind of video sequence is a fundamental step in a variety of
applications, ranging from video compression and transmission to video
segmentation and, ultimately, recognition.
While it has been observed that the distribution of intensity levels
in natural images is highly kurtotic, such a distribution is
mainly due to the presence of occlusions or boundaries delimiting
statistically homogeneous regions. Therefore, within such
regions it makes sense to employ the simplest possible model that
can capture at least the second-order statistics. As for capturing
temporal regularity, it has been shown that linear Gaussian models
of high enough order produce synthetic sequences that are perceptually
indistinguishable from the originals, for sequences of natural
phenomena that are well-approximated by stationary processes.
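For reference, a linear Gaussian model of this kind is commonly written in state-space form. The symbols below are notation assumed here for illustration, not taken from this page: y_t is the vectorized frame at time t, x_t is a hidden state whose dimension plays the role of the model order, and v_t, w_t are Gaussian noise terms.

```latex
\begin{aligned}
x_{t+1} &= A\,x_t + v_t, & v_t &\sim \mathcal{N}(0, Q),\\
y_t     &= C\,x_t + w_t, & w_t &\sim \mathcal{N}(0, R).
\end{aligned}
```

Under this form, the second-order statistics of a stationary sequence are determined by A, C, Q, and R.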
In this work we make the assumption that temporal and spatial
regularity of video sequences translates into statistical
temporal and spatial stationarity of video signals. We propose to
model the spatio-temporal stationarity of video signals with an
extension of a simple class of multiscale autoregressive models. We
show how model parameters can be efficiently learned, and how they can
be employed to synthesize sequences that extend the original ones in
both space and time.
The main contributions of our approach are:
- Modeling: we characterize dynamic textures that are spatially homogeneous, and propose a new model that is an extension of a class of multiscale autoregressive models.
- Learning: we propose to learn the model using maximum likelihood, but also derive a closed-form sub-optimal solution for the efficient computation of the parameters that is based on SVD and least squares.
- Synthesis: we found that even the simplest model (that in space simulates a second-order Markov random field) captures a wide range of natural phenomena.
- Compression: we found that modeling spatio-temporal stationarity instead of only temporal stationarity allows an increase in the compression ratio on the order of hundreds.
- Implementation: our algorithm is simple to implement, efficient to learn, and fast to simulate; it allows one to generate sequences that extend the original ones in both space and time.
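To make the "SVD and least squares" idea in the Learning item concrete, here is a minimal sketch of a closed-form, sub-optimal fit for a temporal-only linear Gaussian model. It is a simplification: it does not implement the multiscale autoregressive spatial model proposed here, and the function names `learn_temporal_model` and `synthesize`, as well as the parameter `n_states`, are hypothetical.

```python
import numpy as np


def learn_temporal_model(frames, n_states):
    """Closed-form (sub-optimal) fit of x[t+1] = A x[t] + B n[t], y[t] = mean + C x[t].

    frames: array of shape (n_frames, ...) holding the training sequence.
    n_states: assumed dimension of the hidden state (a tuning choice).
    """
    frames = np.asarray(frames, dtype=float)
    n_frames = frames.shape[0]
    Y = frames.reshape(n_frames, -1).T            # one column per frame
    mean = Y.mean(axis=1, keepdims=True)          # temporal mean of each pixel

    # SVD step: principal spatial components and their time-varying coefficients.
    U, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    C = U[:, :n_states]                           # output matrix, orthonormal columns
    X = s[:n_states, None] * Vt[:n_states]        # state trajectory, (n_states, n_frames)

    # Least-squares step: A minimizes ||X[:, 1:] - A X[:, :-1]||_F.
    A_t, *_ = np.linalg.lstsq(X[:, :-1].T, X[:, 1:].T, rcond=None)
    A = A_t.T

    # Driving-noise covariance from the one-step residuals, and a factor B with B B^T = Q.
    resid = X[:, 1:] - A @ X[:, :-1]
    Q = resid @ resid.T / (n_frames - 1)
    w, V = np.linalg.eigh(Q)
    B = V * np.sqrt(np.clip(w, 0.0, None))
    return mean, C, A, B, X[:, 0]


def synthesize(mean, C, A, B, x0, n_frames, frame_shape, rng):
    """Simulate the learned model to produce n_frames new frames."""
    x = x0.copy()
    out = np.empty((n_frames,) + tuple(frame_shape))
    for t in range(n_frames):
        out[t] = (mean[:, 0] + C @ x).reshape(frame_shape)
        x = A @ x + B @ rng.standard_normal(x.shape[0])
    return out
```

With a training array `frames` of 100 frames, calling `learn_temporal_model(frames, n_states=20)` followed by `synthesize(..., n_frames=300, frame_shape=frames.shape[1:], rng=np.random.default_rng(0))` would extend the sequence in time only. The value 20 is an arbitrary illustration, and extension in space requires the spatial part of the model, which this sketch omits.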
Results
The following examples demonstrate the power of our model to
extrapolate new video sequences in both space and time. Given a
training sequence we apply the learning procedure and extract the
parameters of the model. We then simulate the model to synthesize new
video sequences.
To better satisfy the model's hypothesis of stationarity in both
space and time, we normalize the mean and
variance of each sequence before running the learning
algorithm. Note that the training sequences are, of course, not
perfectly stationary (especially in space), so the model infers the
"average" spatial structure of the original sequence. Also,
for portability reasons, the .avi movies are MPEG compressed (video
coder V1), and the quality of the synthesized images is degraded
accordingly.
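As an illustration of the normalization step, the sketch below rescales a whole sequence to zero mean and unit variance. Treating the mean and variance as single global values is an assumption made here; the text does not say whether they are computed globally, per pixel, or per color channel. The function names are hypothetical.

```python
import numpy as np


def normalize_sequence(frames, eps=1e-8):
    """Return a zero-mean, unit-variance copy of the sequence, plus the statistics used.

    Assumes one global mean and one global scale for the whole sequence.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean()
    scale = frames.std() + eps        # eps guards against a constant sequence
    return (frames - mean) / scale, mean, scale


def denormalize_sequence(frames, mean, scale):
    """Map a normalized (for example, synthesized) sequence back to the original range."""
    return np.asarray(frames, dtype=float) * scale + mean
```

The stored `mean` and `scale` would be applied to synthesized frames with `denormalize_sequence` so that they are displayed in the intensity range of the training sequence.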
Boiling water
This example shows 100 frames of a color training sequence and 300
synthesized frames. As the synthesis results show, the model
captures the spatial structure as well as the rapidly varying temporal
dynamics. We stress that the training sequences are,
of course, not perfectly stationary (especially in space), and the
model infers the "average" spatial
structure of the original sequence.
Download .avi movie [1.46MB]
Fountain
This example shows 100 frames of a color training sequence and 300
synthesized frames. As the synthesis results show, the model
captures the spatial structure as well as the temporal
dynamics. Even with the spatial extension, one can clearly perceive
the water falling down consistently.
Download .avi movie [1.27MB]
Ocean waves
This example shows 100 frames of a color training sequence and 300
synthesized frames. The synthesis results show that the appearance and
movement of the waves are well captured by the model. Notice that the small
highlights of the training sequence are filtered out in the synthesized sequence.
This suggests that our model could be used for video sequence denoising.
Download .avi movie [371KB]
Waterfall
This example shows 100 frames of a color training sequence and 300
synthesized frames. In this example, too, the spatial structure and dynamics are
well captured.
Download .avi movie [540KB]
Fire
In this example, too, we synthesize 300 frames from 100 frames of a color
training sequence. This example is included to show what happens
when the hypothesis of spatial homogeneity is broken. In this
sequence of fire the spatial stationarity assumption is strongly violated,
and the model captures a "homogenized"
spatial structure that generates images rather different from those of
the training sequence. Moreover, since the learning procedure
factorizes the training set by first learning the spatial parameters,
and then relies on these estimates to infer the temporal parameters,
the temporal statistics (temporal correlation) also appear corrupted
when compared with those of the original sequence.
Download .avi movie [2.02MB]
Related publications
- Doretto, G., Jones, E., and Soatto, S.
Spatially homogeneous dynamic textures. In Proceedings of European Conference on Computer Vision, pp. 591–602, Prague, Czech Republic, May 2004.
Oral Presentation
- Doretto, G., Chiuso, A., Wu, Y. N., and Soatto, S.
Dynamic textures. International Journal of Computer Vision, 51(2):91–109, 2003.