Real-time Virtual Object Insertion

Hailin Jin
hljin@cs.ucla.edu
Washington University
Paolo Favaro
fava@ee.wustl.edu
Washington University
Stefano Soatto
soatto@ucla.edu
UCLA
Project summary
We are developing a real-time system for Structure from Motion that allows virtual objects to be inserted on the fly! (See here for more information on real-time motion estimation. Code available!) While virtual object insertion has been demonstrated before, and is now even part of commercial video editing products (see for instance 2d3), current systems require several batch steps, including feature tracking, outlier rejection, and bundle adjustment, some of which involve human intervention. Our demo shows that results of similar quality can be obtained in a completely automatic fashion (no human intervention whatsoever), while processing the sequence causally. This enables a whole new range of applications for real-time interaction with mixed reality (real as well as virtual objects), all on commercial off-the-shelf hardware, and while the camera undergoes arbitrary motion, including hand-held. The system consists of off-the-shelf hardware (a camera connected to a Pentium PC) and software to
  • automatically select and track region features despite changes in illumination
  • estimate three-dimensional position and orientation of surface patches relative to an inertial reference frame despite individual point-features appearing and disappearing
  • insert a texture-mapped virtual object into the scene so as to make it appear to be part of the scene and moving with it.
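The last step above reduces to re-projecting the virtual object's geometry through the estimated camera pose at every frame. The following is a minimal sketch of that projection, assuming a calibrated pinhole camera; the function name `project_points` and the numeric values are illustrative, not part of our actual system.

```python
import numpy as np

def project_points(K, R, t, X):
    """Project 3-D points X (N x 3) into the image, given the
    estimated pose (R, t) and the intrinsic matrix K of a
    calibrated pinhole camera."""
    Xc = X @ R.T + t             # world -> camera coordinates
    x = Xc @ K.T                 # apply intrinsics
    return x[:, :2] / x[:, 2:3]  # perspective division

# Toy example: identity rotation, camera 5 units from the object.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
pixels = project_points(K, R, t, vertices)
```

As the filter updates (R, t) frame by frame, the re-projected vertices move consistently with the real scene, which is what makes the inserted object appear attached to it.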
This is all done in real time! Additional graphic features, such as cast shadows, inter-reflections, and occlusions, can be handled off-line.

In the following links, we show a short demo of our system. The original footage is taken with a stationary camera (Canon XL1) in front of which an object rotates on a turntable. We generate three sequences with a texture-mapped virtual object inserted. In Color Vase, the sequence plays back and forth, showing the frames where the virtual object is never occluded. In Gray Vase, the object undergoes a complete turn to show the accuracy of motion estimation. (If you watch very carefully, you may notice a small jump when the sequence returns to the beginning.) Such an error is a well-known but unavoidable problem in Structure from Motion, caused by drift in the scale factor: over a complete turn, every point in the scene disappears at least once. Nevertheless, the overall accumulated error of our estimates is reasonably low. In Bouncing Ball, an animated demo is shown to stimulate your imagination about how many special effects are possible with this system. We want to make two remarks. First, the processing is completely automatic (we did not hand-tune any of the data at any stage). Second, in this demo we did not handle the relative occlusion of the synthetic and real objects, but such a task is indeed possible: all that is needed is an estimate of the structure of the scene. (See here for more information on how to estimate the structure.)
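To illustrate the second remark: once the structure of the real scene is estimated, occlusion reduces to a per-pixel depth test between the rendered virtual object and the real surfaces. The sketch below shows this idea on toy arrays; the function name `composite` and the numbers are hypothetical, not part of our demo code.

```python
import numpy as np

def composite(frame, virt_rgb, virt_depth, scene_depth):
    """Overlay the rendered virtual object only where it lies in
    front of the estimated real scene (a per-pixel depth test)."""
    visible = virt_depth < scene_depth  # virtual surface closer to camera
    out = frame.copy()
    out[visible] = virt_rgb[visible]
    return out

# Toy example: a 2x2 image where the virtual object is in front
# of the real scene only in the left column.
frame = np.zeros((2, 2, 3), dtype=np.uint8)          # real footage (black)
virt_rgb = np.full((2, 2, 3), 255, dtype=np.uint8)   # rendered object (white)
virt_depth = np.array([[1.0, 3.0],
                       [1.0, 3.0]])
scene_depth = np.full((2, 2), 2.0)                   # estimated scene depth
result = composite(frame, virt_rgb, virt_depth, scene_depth)
```

In the left column the virtual object (depth 1) passes the test against the scene (depth 2) and is drawn; in the right column (depth 3) it is occluded and the original footage shows through.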