Dynamic Texture Recognition
A geometric approach for dynamic texture recognition
In this project we consider the problem of recognizing a sequence of images based upon a joint photometric-dynamic model. This allows us to distinguish not just steam from foliage, but fast turbulent steam from haze, or to detect the presence of strong winds by looking at trees.
Recognition of objects based on their images is one of the central problems in modern Computer Vision. We consider objects as being described by their geometric, photometric and dynamic properties. While a vast literature exists on recognition based on geometry and photometry, less has been said about recognizing scenes based upon their dynamics.
The objects that we are interested in recognizing are sequences of images of dynamic scenes that exhibit some form of temporal regularity, such as waves, steam, and foliage. These types of video sequences are commonly known as dynamic textures. While most approaches to the recognition of complex motion patterns use local features or compute optical flow, we take a different approach: we start from the assumption that sequences of images are realizations of stochastic processes, and we set out to classify and recognize not individual realizations but the statistical models that generate them. Therefore, we pose the problem of recognizing dynamic textures in the space of models, where each dynamic texture is uniquely represented. This entails choosing a distance between models. This is a non-trivial problem since the space of models is non-linear, and therefore computing distances naively can lead one to conclude that very similar models are dramatically different. We propose and analyze three different distances in the space of autoregressive moving average models and assess their discriminative power.
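For reference, a dynamic texture model of this kind can be written in linear state-space form, as in the International Journal of Computer Vision paper listed at the bottom of this page. The symbols below are not defined on this page and are introduced here only for illustration: $y_t$ is the image at time $t$ (stacked into a vector), $x_t$ is a low-dimensional hidden state, and $v_t$, $w_t$ are noise terms.

```latex
\begin{aligned}
x_{t+1} &= A\,x_t + v_t, & v_t &\sim \mathcal{N}(0, Q),\\
y_t     &= C\,x_t + w_t, & w_t &\sim \mathcal{N}(0, R).
\end{aligned}
```

Under this form, the parameters $(A, C)$ and $(TAT^{-1}, CT^{-1})$ describe the same output process for any invertible matrix $T$, which is one reason why comparing parameter matrices entry by entry is not a meaningful distance between models.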
The main contributions of our approach are:
- Posed the problem of recognizing and classifying dynamic textures in the space of dynamic systems where each dynamic texture is uniquely represented.
- Proposed three distances to compare dynamic texture models: simple Euclidean norm, Finsler distance, and Martin distance.
- Proposed two different learning schemes for dynamic textures: one that is based on PCA and another one that is based on ICA.
- Collected 200 video sequences that we used to analyze the discriminative power of the three distances combined with the two different learning schemes (PCA and ICA). The highest hit-ratio (89.5%) is achieved by using PCA and Martin distance.
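The PCA-based learning scheme and the Martin distance listed above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the noise terms are ignored, the observability matrix is truncated to a finite depth (an assumption made to keep the example short), and the subspace angles are computed with a standard QR-plus-SVD recipe.

```python
import numpy as np


def learn_model_pca(Y, n_states):
    """Fit y_t ~ C x_t, x_{t+1} ~ A x_t to frames stored as columns of Y.

    Y has shape (pixels, frames). Noise covariances are not estimated here.
    """
    Y = Y - Y.mean(axis=1, keepdims=True)           # remove the temporal mean
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                             # principal components
    X = s[:n_states, None] * Vt[:n_states]          # states, shape (n, frames)
    # Least-squares estimate of A from consecutive states.
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return A, C


def observability(A, C, depth):
    """Stack C, CA, ..., CA^(depth-1) into a finite observability matrix."""
    blocks, M = [], C
    for _ in range(depth):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)


def martin_distance_sq(model_1, model_2, depth=20):
    """Approximate squared Martin distance, -log(prod cos^2 theta_i)."""
    Q1, _ = np.linalg.qr(observability(*model_1, depth))
    Q2, _ = np.linalg.qr(observability(*model_2, depth))
    # Singular values of Q1^T Q2 are the cosines of the subspace angles.
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    cosines = np.clip(cosines, 1e-12, 1.0)          # guard against log(0)
    return float(-2.0 * np.sum(np.log(cosines)))
```

A model compared with itself should give a distance close to zero, while models fitted to unrelated sequences should give a clearly positive value. The truncation depth trades accuracy against the size of the stacked matrix.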
Nearest neighbor examples
Click the icon to see some results of the nearest neighbor computation using PCA and Martin distance. The first column shows a sample from one of the sequences in the database. The distance from the model of this sequence to every other subsequence is computed, and a sample of the sequence "closest" to the test is shown in the second column. The third column shows the second closest sequence and so on.
Confusion matrix: PCA and Martin distance
Click the icon to see the result of an experimental run on a small subset of the database (40 dynamic textures out of the 200 in the entire database), using Martin distance and PCA. The plot (confusion matrix) displays the pairwise distance between every pair of sequences in the subset. Each row/column of the matrix represents a sequence, and a group of four sequences represents one category of dynamic textures. Dark indicates a small distance, light a large distance. The minimum distance is of course along the diagonal. Moving along the vertical axis, we mark the first (o) and second (x) nearest neighbors. For example, the closest dynamic texture to Smoke-1, along the vertical axis, is Smoke-2 along the horizontal axis. Similarly, the second closest dynamic texture to Smoke-1 is Water-Fall-b-1. From this picture the discriminative power of the Martin distance is already visible. By using the whole database we achieve a hit-ratio of 89.5%.
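The nearest-neighbor marks and the hit-ratio described above can be reproduced from a pairwise distance matrix along the following lines. This sketch assumes that the hit-ratio is the fraction of sequences whose first nearest neighbor belongs to the same category, which this page does not state explicitly. The function names, the toy matrix, and the labels are hypothetical.

```python
import numpy as np


def nearest_neighbors(D, k=2):
    """For each row of the distance matrix D, return the k closest other rows."""
    D = np.array(D, dtype=float)      # work on a copy
    np.fill_diagonal(D, np.inf)       # exclude each sequence's match with itself
    return np.argsort(D, axis=1)[:, :k]


def hit_ratio(D, labels):
    """Fraction of sequences whose first nearest neighbor shares their label."""
    labels = np.asarray(labels)
    first = nearest_neighbors(D, k=1)[:, 0]
    return float(np.mean(labels[first] == labels))


# Toy example: two categories with two sequences each.
D_toy = [[0, 1, 5, 6],
         [1, 0, 4, 7],
         [5, 4, 0, 2],
         [6, 7, 2, 0]]
labels_toy = ["smoke", "smoke", "water", "water"]
neighbors_toy = nearest_neighbors(D_toy, k=2)
ratio_toy = hit_ratio(D_toy, labels_toy)
```

In the toy matrix every sequence is closest to the other member of its own category, so all first nearest neighbors count as hits.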
Confusion matrix: PCA and Euclidean distance
Click the icon to see the result of an experimental run on a small subset of the database (40 dynamic textures), using the naive Euclidean norm and PCA. The subset of the database is the same one that we used in the previous experiment. Again, moving along the vertical axis, we mark the first (o) and second (x) nearest neighbors. The poor recognition rate for the Euclidean distance is visible from the large number of nearest neighbors (o) falling outside of the "same category" grid lines. By using the whole database we achieve a hit-ratio of 5.5%. This result was expected since the space of models is non-linear, and it demonstrates that computing distances naively can lead one to conclude that very similar models are dramatically different.
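One way to see why a naive Euclidean comparison can fail is to write the same model in two different state bases. The sketch below is an illustration under stated assumptions, not the experiment reported above: the matrices are synthetic, the change of basis `T` is an arbitrary invertible matrix chosen for the example, and the subspace-angle distance uses a finite-depth observability matrix.

```python
import numpy as np


def observability(A, C, depth=20):
    """Stack C, CA, ..., CA^(depth-1) into a finite observability matrix."""
    blocks, M = [], C
    for _ in range(depth):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)


def subspace_distance_sq(O1, O2):
    """-log(prod cos^2 theta_i) for the angles between two column spaces."""
    Q1, _ = np.linalg.qr(O1)
    Q2, _ = np.linalg.qr(O2)
    cosines = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), 1e-12, 1.0)
    return float(-2.0 * np.sum(np.log(cosines)))


rng = np.random.default_rng(0)
A = 0.9 * np.linalg.qr(rng.standard_normal((3, 3)))[0]   # synthetic stable dynamics
C = rng.standard_normal((10, 3))                          # synthetic output matrix

# The same model expressed in another state basis (T is invertible, det = 7).
T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
T_inv = np.linalg.inv(T)
A2, C2 = T @ A @ T_inv, C @ T_inv

# Naive comparison of the parameter matrices, entry by entry.
euclidean = float(np.sqrt(np.linalg.norm(A - A2) ** 2 + np.linalg.norm(C - C2) ** 2))
# Comparison based on subspace angles between observability matrices.
martin_sq = subspace_distance_sq(observability(A, C), observability(A2, C2))
```

Both parameter sets generate the same outputs, so the subspace-based value is zero up to rounding, while the entrywise Euclidean value is not.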
- Saisan, P., Doretto, G., Wu, Y. N., and Soatto, S.
Dynamic texture recognition.
In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 58–63, Kauai, Hawaii, USA, December 2001.
- Doretto, G., Chiuso, A., Wu, Y. N., and Soatto, S.
Dynamic textures.
International Journal of Computer Vision, 51(2):91–109, 2003.