I do computer vision, mainly using "physics-based" approaches that explicitly consider illumination, reflection, refraction, scattering, and imaging. A brief description of recent and ongoing research threads can be found below. For more, check out the publications and people pages.
We are looking at different types of active and passive sensors that produce useful information about the world from measurements of radiant flux.
|Wide-angle Micro Sensors|
At least in the near term, micro-scale platforms like micro air vehicles and small sensor nodes are unlikely to have the power, volume, and mass budgets to support conventional imaging and post-capture processing for detection, tracking, and so on. To help overcome this, we're considering sensor designs that allow some components of scene analysis to happen optically, before light strikes the sensor. This includes optical designs (PAMI 2013) and methods for learning useful optical projections (NIPS 2011). For more, see Sanjeev's project page. This is part of, and largely motivated by, the RoboBees Project.
|Spectral Image Models for Sensing|
Collecting spectral image measurements, be they trichromatic or something else, requires giving up some spatial or temporal resolution. Depending on the task, certain schemes for sampling spectrum, space, and time will be more desirable than others; but regardless of the sampling pattern, reconstruction and scene analysis should be informed by the statistics of the underlying spectral light field. For this reason, we've collected a database of real-world hyperspectral images and taken an initial look at its statistics (CVPR 2011). We have also designed a camera that leverages joint spatial and spectral statistics to provide depth information and an augmented depth-of-field (ECCV 2012).
|Consumer Cameras as Radiometric Devices|
Most Internet images exist in narrow-gamut formats, with pixel values that are severely distorted by unknown tone-mapping operators. This limits our machines' abilities to use radiometric reasoning when interpreting these images, and it limits the utility of this vast weakly-labeled image collection for training future vision systems. To overcome these limitations, we are searching for reliable and scalable ways to undo the color distortions of consumer cameras. So far we have tried one approach that is deterministic (BMVC 2009) and an evolved one that is probabilistic (CVPR 2012).
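To make the tone-mapping problem concrete, here is a minimal sketch of the kind of inversion involved. The actual distortions of consumer cameras are unknown and more complex than any standard curve, and the papers above fit or learn them from data; as a stand-in for illustration, this simply inverts the standard sRGB transfer function to recover approximately linear radiance values.

```python
import numpy as np

def linearize_srgb(p):
    """Approximately undo an sRGB-like tone curve.

    p: pixel values in [0, 1]. Real consumer cameras apply unknown
    tone maps; this standard sRGB inverse is only an illustrative
    stand-in for the calibration that must actually be estimated.
    """
    p = np.asarray(p, dtype=float)
    return np.where(p <= 0.04045, p / 12.92, ((p + 0.055) / 1.055) ** 2.4)

# A mid-gray sRGB value of 0.5 corresponds to only ~21.4% linear radiance,
# which is why radiometric reasoning on raw pixel values goes wrong.
print(linearize_srgb(0.5))  # → ~0.214
```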
Information about shape, materials, motion, and illumination is encoded in an image in various ways, and reconstruction is the process of recovering this scene information from the bottom up. Some examples we are exploring:
|Shape from Specular Reflections|
The way that a curved mirror distorts its environment tells us about its shape, and this information becomes even more accessible when the object moves. We've derived the PDE that relates specular motion to shape and analyzed conditions for a unique solution. We began with the 2D case and a non-linear formulation in three dimensions (ICCV 2007). Recently we've used a re-parameterization in terms of the reflection vector to derive a simple, linear PDE in 3D that is easy to analyze and use (ICCV 2009), and often allows complete reconstruction from a single flow field (CVPR 2011).
|Inferring Reflectance with Real-world Illumination|
Optical material properties tell us something about how an object will behave when acted upon, so inferring these properties from an image seems useful. Motivated by this goal, we've considered this toy problem: Given a single image of a known shape under unknown natural lighting, infer its surface reflectance function (BRDF). We solve this by reducing the dimension of the BRDF domain using a new symmetry constraint (ECCV 2008) and exploiting the statistics of natural lighting in a Bayesian framework (ECCV 2010). For a summary, see our recent technical report.
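A common way to picture this kind of dimension reduction is the half-angle/difference-angle parameterization of the BRDF domain, in which isotropy and reciprocity collapse degrees of freedom. The sketch below computes the two angles of a bivariate representation from incoming and outgoing directions; it illustrates the general idea only, not the specific symmetry constraint introduced in the ECCV 2008 paper.

```python
import numpy as np

def half_diff_angles(w_i, w_o):
    """Map unit in/out directions (surface normal along +z) to the
    (theta_h, theta_d) coordinates of a bivariate BRDF representation.
    Illustrative only; the papers' symmetry constraint is different."""
    w_i, w_o = np.asarray(w_i, float), np.asarray(w_o, float)
    h = w_i + w_o
    h = h / np.linalg.norm(h)                             # half vector
    theta_h = np.arccos(np.clip(h[2], -1.0, 1.0))         # half vs. normal
    theta_d = np.arccos(np.clip(np.dot(w_i, h), -1, 1))   # in vs. half
    return theta_h, theta_d

# A mirror-symmetric pair of directions has theta_h = 0, and theta_d
# equals the common angle each direction makes with the normal.
print(half_diff_angles([0, 0, 1], [0, 0, 1]))  # → (0.0, 0.0)
```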
|Shape from Reflectance Symmetries|
When recovering shape from diffuse shading we are faced with an intrinsic shape/lighting ambiguity. We've shown that almost any additive specular reflection component resolves this ambiguity. The basic idea is to exploit symmetries (reciprocity and isotropy) in the BRDF, which induce joint constraints on shape, lighting, and viewpoint. These constraints can be described on the Gaussian sphere (CVPR 2007), or more conveniently on its abstraction, the real projective plane (CVPR 2009). For a summary, see the journal version in PAMI 2011.
|Object Color from Image Color|
The appearance of an object depends on the spectrum of the illuminant, and for object recognition and other tasks it is helpful to "undo" these illumination effects. Toward this goal of "color constancy", we've described the conditions under which linear and diagonal maps suffice to take a color image to its canonical form, i.e., the image that would have been obtained by a standard observer under a standard illuminant (ICCV 2007, SIGGRAPH 2008). We've also shown that spatial image decompositions allow accurate and efficient recovery of the map parameters (PAMI 2012), especially when multiple images of the same object are available as input (CVPR 2011).
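When a diagonal map does suffice, applying it is just a per-channel rescaling (a von Kries-style correction). The helper below is a hypothetical illustration of that final step; the hard part addressed in the papers, namely when such a map is sufficient and how to recover its parameters, is not shown.

```python
import numpy as np

def diagonal_map(image_rgb, illum_observed, illum_canonical):
    """Von Kries-style diagonal correction: scale each channel so the
    observed illuminant color maps to the canonical one. Hypothetical
    helper for illustration; estimating the illuminant and verifying
    that a diagonal map suffices are the actual research problems."""
    d = np.asarray(illum_canonical, float) / np.asarray(illum_observed, float)
    return np.asarray(image_rgb, float) * d  # broadcasts over pixels

# A pixel seen under a warm illuminant (1.0, 0.8, 0.6), mapped to a
# neutral (1, 1, 1) illuminant, comes out as a uniform gray:
print(diagonal_map([[0.5, 0.4, 0.3]], [1.0, 0.8, 0.6], [1.0, 1.0, 1.0]))
```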
|Scene information from spatially-varying blur|
Blur is a form of image degradation caused by scene motion, camera motion, and scene relief. Since it is correlated with scene structure, blur can be used as a source of scene information. To exploit this cue, we need to answer two questions for each small image patch: Is the patch blurry or sharp? If it's blurry, what is the associated blur kernel? Our first crack at this uses a Gaussian scale mixture model for sharp image patches, and it works pretty well, at least for motion blur (CVPR 2010).
|Color-based isolation of diffuse appearance|
Many powerful vision algorithms are predicated on the presence of diffuse (Lambertian) reflectance, and their performance often diminishes when this assumption is violated. It turns out that for dielectric materials under known light color, a "diffuse-only" image can often be obtained just by linearly transforming the RGB color space (IJCV 2008).
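A sketch of that linear transformation, in the spirit of the IJCV 2008 paper: rotate the RGB axes so that one axis aligns with the known source color. For dielectrics, the specular component is then confined to that axis, and the remaining two channels form a specular-free, "diffuse-only" image. The basis construction below is one of many valid choices.

```python
import numpy as np

def suv_transform(image_rgb, source_color):
    """Rotate RGB so the first axis aligns with the known source color.
    For dielectric materials the specular component then lives in the
    first (S) channel, and the remaining (U, V) channels form a
    two-channel diffuse-only image. Sketch of the IJCV 2008 idea."""
    s = np.asarray(source_color, float)
    s = s / np.linalg.norm(s)
    # Build an orthonormal basis whose first vector is the source color,
    # by Gram-Schmidt against the standard basis. Any such rotation works.
    basis = [s]
    for e in np.eye(3):
        v = e - sum(np.dot(e, b) * b for b in basis)
        if np.linalg.norm(v) > 1e-8:
            basis.append(v / np.linalg.norm(v))
    R = np.array(basis[:3])
    return np.asarray(image_rgb, float) @ R.T
```

For a white source, a purely specular pixel like (0.2, 0.2, 0.2) maps entirely into the S channel, leaving its U and V components zero.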
Scene understanding through processes like those above requires structural and statistical knowledge of the visual world. To this end, we're developing approaches to measure real-world shape and material information very efficiently and accurately. In addition to providing good "priors" for computer vision systems, these approaches are useful for "capturing appearance" to enhance physical realism in graphics applications.
What distinguishes appearance capture from the passive scene understanding described above is that we have: 1) the luxury of multiple images and/or active manipulation of lighting to induce additional constraints on the scene; and 2) higher expectations in terms of precision and physical accuracy. We're exploring two complementary types of approaches:
|Active Shape and Reflectance Capture|
For accurate results, reflectance must be obtained along with shape from a single set of images. We've shown that certain imaging configurations allow one to decouple shape and reflectance information in image data, so that each can be inferred without making assumptions about the other. This includes methods that exploit reciprocity (Helmholtz stereopsis), tangent plane symmetries (SIGGRAPH Asia 2008), and reciprocal images with structured lighting (SIGGRAPH 2010). Alternatively, we're exploring methods for inferring reflectance and shape at the same time, while relaxing the restrictions on both as much as possible (CVPR 2008). A general overview of the area can be found in our tutorial and survey: FnT CVG 2008.
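Reciprocity, the symmetry behind Helmholtz stereopsis, says the BRDF is unchanged when the light and camera swap roles: $f(\omega_i, \omega_o) = f(\omega_o, \omega_i)$. Roughly, in the standard Helmholtz stereopsis setup, dividing out the shared BRDF value across a reciprocal image pair yields a constraint on the surface normal that holds for any reflectance:

\[
\left( i_l \,\frac{\mathbf{v}_l}{\lVert \mathbf{o}_l - \mathbf{p} \rVert^2} \;-\; i_r \,\frac{\mathbf{v}_r}{\lVert \mathbf{o}_r - \mathbf{p} \rVert^2} \right) \cdot \mathbf{n} \;=\; 0,
\]

where $i_l, i_r$ are the intensities observed at surface point $\mathbf{p}$ in the two reciprocal images, $\mathbf{o}_l, \mathbf{o}_r$ are the swapped camera/source positions, $\mathbf{v}_l, \mathbf{v}_r$ are unit vectors from $\mathbf{p}$ toward them, and $\mathbf{n}$ is the surface normal. This is how shape can be inferred without any assumption on reflectance.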
|Internet-based Appearance Capture|
An alternative approach is to learn about appearance by exploiting the billions of images and videos being shared online. For example, we've shown that Facebook's social incentives for humans to manually associate names with faces have produced a wealth of labeled face data (Proc. IEEE 2010). We've also shown that radiometrically-calibrated webcam image sequences of outdoor scenes can be used to recover geometry and materials (CVPR 2008). Ideally, we'd like to extract veridical shape and materials from other online imagery as well (Flickr, Google Images, etc.), but as described above, this would require reliable methods for undoing the "blackbox" tone-mapping operators of consumer cameras (CVPR 2012).