TVCG Submission

Video Snapshots: Creating high-quality images from video clips

(Supplementary material)


This webpage shows the results from the paper submission at full resolution. For each video sequence, we show the input video clip and the user-specified reference frame. We then show the results of improving the quality of the reference frame using our weighted multi-image enhancement framework in combination with different sets of weighting criteria (motion confidence, local sharpness, saliency, and time). We compare these results with other image enhancement operations such as Lanczos upsampling, single-image super-resolution, and multi-image super-resolution.

Please move your mouse over the links below each image to switch between different results. This page uses JavaScript to switch between images, so please make sure that JavaScript/ActiveX is enabled in your browser.
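As a rough illustration of what a weighted multi-image combination can look like, the sketch below fuses already-aligned frames with a per-pixel normalized weighted average. This is an assumption made for illustration only: the actual framework is defined in the paper and may differ, and the names `fuse_frames`, `frames`, `weights`, and `eps` are hypothetical.

```python
def fuse_frames(frames, weights, eps=1e-8):
    """Per-pixel normalized weighted average of aligned frames.

    frames  -- list of 2D grids (lists of rows) of pixel intensities
    weights -- list of 2D grids of non-negative importance weights, same shape
    eps     -- small constant that avoids division by zero (assumed value)
    """
    height, width = len(frames[0]), len(frames[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total_weight = sum(w[y][x] for w in weights)
            weighted_sum = sum(f[y][x] * w[y][x] for f, w in zip(frames, weights))
            fused[y][x] = weighted_sum / (total_weight + eps)
    return fused


# Two tiny 1x2 "frames": the second frame is trusted only at the right pixel.
frames = [[[10.0, 10.0]], [[20.0, 30.0]]]
weights = [[[1.0, 1.0]], [[0.0, 3.0]]]
fused = fuse_frames(frames, weights)
```

In this toy example the left pixel keeps the value of the first frame, while the right pixel is pulled toward the second frame because its weight there is three times larger.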


1. bounce (image fusion with saliency and temporal weights)

This is a 25-frame 960X540 video clip of a bouncing ball, shot with a static camera. While the original video is of fairly high quality, each frame only captures the ball at one instant in time. By combining all the frames in the clip using saliency weights, we can produce a snapshot that summarizes the entire video clip in a static image. Because the direction in which the ball is moving might be ambiguous in such a snapshot, we use temporal weighting (linear ramp, overlaying) to encode the perception of motion from left to right. We can also manipulate the temporal weighting to create the perception of the ball moving in the opposite direction.

Reference frame | Video Clip
Video snapshot with saliency weights and temporal overlaying (default view)
Video snapshot with saliency weights and temporal reverse-overlaying
Video snapshot with saliency and linear ramp time weighting
Video snapshot with saliency and time-sampling
Video snapshot with motion confidence
Multi-image fusion
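The linear-ramp temporal weighting mentioned for this clip can be sketched as follows. The ramp range (0 to 1), the function name `linear_ramp_weights`, and the `reverse` flag are assumptions for illustration; the paper defines the weights actually used.

```python
def linear_ramp_weights(num_frames, reverse=False):
    """One weight per frame, growing linearly from 0 to 1 with the frame index.

    With reverse=True the ramp is flipped, so earlier frames dominate instead.
    """
    if num_frames == 1:
        return [1.0]
    ramp = [t / (num_frames - 1) for t in range(num_frames)]
    return ramp[::-1] if reverse else ramp


# The bounce clip has 25 frames.
forward = linear_ramp_weights(25)                 # later frames weigh more
backward = linear_ramp_weights(25, reverse=True)  # earlier frames weigh more
```

Multiplying per-frame saliency weights by `forward` would make later positions of the ball appear stronger than earlier ones; using `backward` instead would suggest motion in the opposite direction.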



2. ditchjump (image fusion with saliency and temporal weights)

This is a 35-frame 960X540 video clip of a bicyclist jumping over a ditch, shot with a static camera. By using saliency weights in our framework, we can produce an "action" snapshot that depicts the motion in the clip. In addition, by applying temporal weighting (sampling, linear ramp, and overlaying), we reinforce the perception of the bicyclist moving from left to right in the snapshot.

Reference frame | Video Clip
Video snapshot with saliency weights, time-sampling and temporal overlaying (default view)
Video snapshot with saliency and linear ramp time weighting
Video snapshot fusion with saliency and time-sampling
Video snapshot with motion confidence
Traditional multi-image fusion



3. mural (super-resolution, denoising, and sharpening with local sharpness weights)

This is a 21-frame 640X360 video clip shot with a moving camera. When photographing a scene with a moving camera, it is often the case that some of the frames, possibly even the frame captured at the right moment, are motion blurred. This result illustrates the problem: the reference frame was chosen because it has the best composition of the scene, but the image itself is blurred. Upsampling the reference frame using Lanczos filtering or the method of Yang et al. [1] cannot recover a sharp image. Multi-image super-resolution [2] produces a sharper result, but the details are still not crisp. Applying local sharpness weights allows us to recover a super-resolved and sharp reference frame by making use of only the sharp pixels in the video clip.

Reference frame | Video Clip
Video snapshot with local sharpness weights (default view)
Lanczos up-sampling
Single-image super-resolution [1]
Multi-image super-resolution [2]
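One common way to score local sharpness is the magnitude of a discrete Laplacian, which is large at crisp edges and near zero in blurred or flat regions. The sketch below uses that measure purely as a stand-in; the sharpness measure used for these results is defined in the paper, and `local_sharpness` is a hypothetical name.

```python
def local_sharpness(image):
    """Absolute 4-neighbour Laplacian response at each interior pixel.

    image -- 2D grid (list of rows) of intensities; border pixels are left at 0.
    """
    height, width = len(image), len(image[0])
    sharpness = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            laplacian = (image[y - 1][x] + image[y + 1][x]
                         + image[y][x - 1] + image[y][x + 1]
                         - 4 * image[y][x])
            sharpness[y][x] = abs(laplacian)
    return sharpness


crisp_patch = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]  # isolated bright detail
flat_patch = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # same energy, fully smeared
crisp_score = local_sharpness(crisp_patch)[1][1]
flat_score = local_sharpness(flat_patch)[1][1]
```

Here the crisp patch scores far higher than the smeared one, so a weight derived from this score would favor pixels from frames that are not motion blurred.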




4. focus (super-resolution, denoising, and sharpening with local sharpness weights)

In this 21-frame 640X360 video clip shot with a moving camera, the focal plane is shifted from back to front. As a result, only parts of the scene are sharp in each frame; for example, only part of the beam on the left is in focus in the reference frame. We make use of the camera motion to super-resolve the reference frame. In addition, by making use of local sharpness weights, we propagate each part of the scene from the frames in which it is sharpest. Doing this allows us to create the best-focused high-resolution snapshot. In comparison, Lanczos upsampling is blurry; single-image super-resolution [1] is only marginally sharper than Lanczos upsampling; and multi-image super-resolution [2] without any weighting does not leverage the sharp frames appropriately.

Reference frame | Video Clip
Multi-image super-resolution with local sharpness weights (default view)
Lanczos up-sampling
Single-image super-resolution [1]
Multi-image super-resolution [2]
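The idea of taking each part of the scene from the frame in which it is sharpest can be illustrated with a hard per-pixel selection. This is a deliberate simplification: the framework described here uses weights rather than a hard choice, and `pick_sharpest` and its inputs are hypothetical names. Frames are assumed to be aligned already.

```python
def pick_sharpest(frames, sharpness_maps):
    """For every pixel, copy the value from the frame with the highest sharpness.

    frames         -- list of aligned 2D grids of intensities
    sharpness_maps -- list of 2D grids of sharpness scores, one per frame
    """
    height, width = len(frames[0]), len(frames[0][0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best = max(range(len(frames)),
                       key=lambda i: sharpness_maps[i][y][x])
            result[y][x] = frames[best][y][x]
    return result


# Frame 0 is in focus on the left, frame 1 is in focus on the right.
focus_frames = [[[1.0, 2.0]], [[5.0, 6.0]]]
focus_scores = [[[0.9, 0.1]], [[0.2, 0.8]]]
all_in_focus = pick_sharpest(focus_frames, focus_scores)
```

A soft version would instead normalize the sharpness scores across frames and blend, which avoids visible seams where the selected frame changes.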




5. dunks (super-resolution, denoising, and sharpening with motion confidence, and saliency weights)

This is a 25-frame 640X360 video clip of a man dunking a basketball. As can be seen in the reference frame (please see the wall at the back), the noise in this video is quite high. As expected, Lanczos upsampling produces a blurry and noisy image, and while the method of Yang et al. [1] improves the resolution, the noise is still an issue. Traditional multi-image super-resolution [2] is able to leverage the frames in the video clip to drastically reduce the noise, but blurs out the moving man. By using our weighted image enhancement framework, we are able to retain the high-quality background of the multi-image super-resolution result while adding a number of other effects: by using motion confidence weights, we can produce a snapshot that retains the man in mid-flight (as in the reference frame); by using saliency weights, we can capture his motion as he dunks the ball; and by combining saliency with temporal weights, we can encode the direction of motion in the snapshot.

Reference frame | Video Clip

Comparisons:
Video snapshot with saliency, time-sampling, and overlaying (default view)
Lanczos up-sampling
Single-image super-resolution [1]
Multi-image super-resolution [2]

Enhancements:
Video snapshot with saliency, time-sampling, and overlaying (default view)
Video snapshot with saliency and time-sampling
Video snapshot with motion confidence
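This page does not define the motion confidence weights, so the sketch below rests on an assumption: that confidence is high where an aligned frame agrees with the reference frame and drops where it disagrees (for example, where the man has moved). The Gaussian falloff, the `sigma` value, and the name `motion_confidence` are all hypothetical.

```python
import math


def motion_confidence(aligned_frame, reference, sigma=10.0):
    """Per-pixel confidence in (0, 1] based on agreement with the reference.

    Returns 1.0 where the aligned frame matches the reference exactly and
    decays with the squared intensity difference (assumed Gaussian falloff).
    """
    return [
        [math.exp(-((a - r) ** 2) / (2.0 * sigma ** 2))
         for a, r in zip(row_a, row_r)]
        for row_a, row_r in zip(aligned_frame, reference)
    ]


reference_row = [[100.0, 100.0]]
other_row = [[100.0, 160.0]]  # right pixel is covered by a moving object
confidence = motion_confidence(other_row, reference_row)
```

With weights of this kind, static background pixels from every frame contribute to denoising, while pixels that changed because of motion are suppressed, so the moving subject stays as it appears in the reference frame.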



6. jump (super-resolution, denoising, and sharpening with motion confidence, and saliency weights)

This is a 9-frame 480X640 video clip shot with a hand-held camera that captures a man jumping off a cliff. As is clear from the Lanczos-upsampled result, the textures on the rocks and the trees in the original video are lost, and there are strong compression artifacts in the water. Single-image super-resolution using the method of Yang et al. [1] sharpens the reference frame marginally, but cannot remove the artifacts (see the blocking artifacts in the water). By leveraging the multiple frames, we are able to super-resolve the reference frame to 960X1280 while reducing the noise and blocking artifacts. In addition, we can use saliency weights to produce snapshots that capture the jumping man's motion over time, or motion confidence weights to freeze the man in mid-air.

Reference frame | Video Clip

Comparisons:
Video snapshot with saliency weights and time-sampling (default view)
Lanczos up-sampling
Single-image super-resolution [1]
Multi-image super-resolution [2]

Enhancements:
Video snapshot with saliency weights and time-sampling (default view)
Video snapshot with motion confidence
Video snapshot with saliency
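For readers unfamiliar with the Lanczos baseline, the sketch below applies the standard Lanczos kernel to upsample a 1D signal by a factor of 2, the same factor as going from 480X640 to 960X1280. The window size `a=3`, the border clamping, the sample-aligned output grid, and the function names are assumptions; they are not necessarily the settings used for the comparison images.

```python
import math


def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_upsample_2x(signal, a=3):
    """Resample a 1D signal at half-sample spacing, clamping at the borders."""
    n = len(signal)
    output = []
    for i in range(2 * n):
        position = i / 2.0            # location in input coordinates
        base = math.floor(position)
        accum = norm = 0.0
        for k in range(base - a + 1, base + a + 1):
            weight = lanczos_kernel(position - k, a)
            accum += weight * signal[min(max(k, 0), n - 1)]
            norm += weight
        output.append(accum / norm)   # normalize so flat regions stay flat
    return output


upsampled = lanczos_upsample_2x([1.0, 2.0, 3.0, 4.0])
flat = lanczos_upsample_2x([5.0, 5.0, 5.0, 5.0])
```

Because this filter only interpolates the samples of a single frame, it cannot add detail that the frame does not contain, which is consistent with the blurry Lanczos results shown for these clips.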




7. dive (super-resolution, denoising, and sharpening with saliency weights)

This is a 28-frame 640X480 video clip of a diving girl. The low resolution and noise of the original video are clear from the Lanczos-upsampled reference frame as well as the single-image super-resolution result created using the method of Yang et al. [1]. Our framework is able to combine all the frames appropriately to produce an "action" snapshot, where the background scene is super-resolved and denoised by combining multiple frames, while the motion of the diving girl is preserved from individual frames. One limitation of our importance measures is that they are not able to distinguish high-frequency motions and stochastic textures like the rippling water in the pool. Because of this, while we reduce the noise in the water by averaging many frames, we also smooth out the ripples to some extent.

Reference frame | Video Clip

Comparisons:
Video snapshot with saliency weights, time-sampling, and overlaying (default view)
Lanczos up-sampling
Single-image super-resolution [1]
Multi-image super-resolution [2]

Enhancements:
Video snapshot with saliency weights, time-sampling, and overlaying (default view)
Video snapshot with saliency and time-sampling
Video snapshot with motion confidence
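The trade-off between denoising and over-smoothing noted for this clip can be made concrete with a simplified noise model. Assume, for illustration only, that each of N perfectly aligned frames observes the same true intensity I plus independent zero-mean noise n_i with standard deviation σ, and that all frames are weighted equally (the symbols I, n_i, σ, and N are introduced here and do not come from the paper):

```latex
\operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N}\bigl(I + n_i\bigr)\right)
  = \frac{\sigma^{2}}{N}
\quad\Longrightarrow\quad
\sigma_{\mathrm{avg}} = \frac{\sigma}{\sqrt{N}},
\qquad
N = 28:\quad \sigma_{\mathrm{avg}} = \frac{\sigma}{\sqrt{28}} \approx 0.19\,\sigma
```

Under these assumptions, averaging all 28 frames would reduce the noise standard deviation to roughly a fifth of its original value. The same averaging also attenuates any real signal that changes from frame to frame, such as ripples, which is why they are smoothed when the weights do not single them out.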



8. basketball (super-resolution, denoising, and sharpening with saliency weights)

This is a 31-frame 640X360 video clip of a basketball game, shot with a hand-held camera. Besides being low resolution, the clip is noisy, as can be seen on the walls at the back in the reference frame, the Lanczos-upsampled version, and the single-image super-resolution result [1]. Our framework leverages all the frames in the clip to produce a high-quality snapshot that has better resolution and noise characteristics, and captures the motion of the basketball and the players in a single image.

Reference frame | Video Clip

Comparisons:
Video snapshot with saliency weights and time-sampling (default view)
Lanczos up-sampling
Single-image super-resolution [1]
Multi-image super-resolution [2]

Enhancements:
Video snapshot with saliency weights and time-sampling (default view)
Video snapshot with motion confidence
Video snapshot with saliency





9. walk (super-resolution, denoising, and sharpening with saliency weights)

This is a 13-frame 640X360 video clip of a person walking, shot with a hand-held camera. Compared to the Lanczos-upsampled result and the single-image super-resolution result [1], our framework leverages all the frames in the clip to produce a high-quality snapshot that has better resolution and noise characteristics, and captures the walk in a single image. Alternatively, by using inverse saliency weights, we can create a high-quality image of the background without the person.

Reference frame | Video Clip

Comparisons:
Video snapshot with saliency weights and time-sampling (default view)
Lanczos up-sampling
Single-image super-resolution [1]
Multi-image super-resolution [2]

Enhancements:
Video snapshot with saliency weights and time-sampling (default view)
Video snapshot with inverse saliency
Video snapshot with motion confidence
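Removing the person with inverse saliency weights can be illustrated as follows. The sketch assumes that saliency maps are normalized to [0, 1], that "inverse" means 1 minus the saliency, and that frames are already aligned; the function name and the toy data are hypothetical.

```python
def background_from_inverse_saliency(frames, saliency_maps, eps=1e-8):
    """Weighted average that favors non-salient (static background) pixels.

    Each pixel is weighted by 1 - saliency, so a pixel covered by the moving
    subject in one frame is filled in from frames where it is uncovered.
    """
    height, width = len(frames[0]), len(frames[0][0])
    background = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            numerator = denominator = 0.0
            for frame, saliency in zip(frames, saliency_maps):
                weight = 1.0 - saliency[y][x]
                numerator += weight * frame[y][x]
                denominator += weight
            background[y][x] = numerator / (denominator + eps)
    return background


# A dark "person" (0) walks across a bright wall (200) over three frames.
walk_frames = [[[0.0, 200.0, 200.0]], [[200.0, 0.0, 200.0]], [[200.0, 200.0, 0.0]]]
walk_saliency = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]]]
clean_wall = background_from_inverse_saliency(walk_frames, walk_saliency)
```

In this toy case every pixel of the result is close to the wall intensity, because each position is uncovered in two of the three frames.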




10. juggle (super-resolution, denoising, and sharpening with saliency weights)

This is a 24-frame 640X360 video clip of a kid juggling two balls. Compared to the single-image super-resolution result, our video snapshot uses all the frames to produce a result that has better resolution (the letters on the blackboard are clearer) and lower noise (note the color noise near the duster in the bottom left). We can create a snapshot that captures the motion of the hands and the balls, use time-sampling to reduce the clutter, or even freeze the motion to capture a single instant in time.

Reference frame | Video Clip

Comparisons:
Video snapshot with saliency weights and time-sampling (default view)
Lanczos up-sampling
Single-image super-resolution [1]
Multi-image super-resolution [2]

Enhancements:
Video snapshot with saliency weights and time-sampling (default view)
Video snapshot with saliency
Video snapshot with motion confidence




11. gokarts (super-resolution, denoising, and sharpening with saliency weights)

This is a 38-frame 640X360 video clip showing go-karts racing at a track. Our snapshot uses saliency and time-sampling to capture how the blue go-kart racer takes the inside track to overtake the black go-kart racer. At the same time, the background has better resolution (the video snapshot resolves the numbers 2 and 1 on the go-karts in the middle) and less noise than the Lanczos-upsampled and single-image super-resolution results.

Reference frame | Video Clip

Comparisons:
Video snapshot with saliency weights and time-sampling (default view)
Lanczos up-sampling
Single-image super-resolution [1]
Multi-image super-resolution [2]

Enhancements:
Video snapshot with saliency weights and time-sampling (default view)
Video snapshot with saliency
Video snapshot with motion confidence




12. traffic (super-resolution, denoising, and sharpening with saliency weights)

This is a 21-frame 640X360 video clip of traffic at a busy roundabout. This is an example of how we can create snapshots even when there are many objects moving in complex ways in the video clip. Creating a snapshot with saliency weights captures the flow of the traffic clearly, but because there are so many vehicles, the result is cluttered. By sampling over time and overlaying later events over earlier ones, we can unclutter the snapshot while still giving a sense of the traffic in the original clip. We can also use inverse saliency weights to remove the moving objects and create a high-quality snapshot of the background, which shows the objects in the scene that do not move at all.

Reference frame | Video Clip

Comparisons:
Video snapshot with saliency weights, time-sampling, and overlaying (default view)
Lanczos up-sampling
Single-image super-resolution [1]
Multi-image super-resolution [2]

Enhancements:
Video snapshot with saliency weights, time-sampling, and overlaying (default view)
Video snapshot with saliency
Video snapshot with inverse saliency
Video snapshot with motion confidence
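Time-sampling and overlaying, as used to unclutter this snapshot, can be sketched with two small helpers. The uniform sampling step, the hard saliency threshold, and the names `sample_frame_indices` and `overlay_in_time_order` are assumptions for illustration rather than the procedure used to produce the results.

```python
def sample_frame_indices(num_frames, step):
    """Keep every `step`-th frame so moving objects appear at fewer positions."""
    return list(range(0, num_frames, step))


def overlay_in_time_order(frames, saliency_maps, background, threshold=0.5):
    """Paint salient pixels frame by frame; later frames cover earlier ones."""
    result = [row[:] for row in background]
    for frame, saliency in zip(frames, saliency_maps):  # earliest frame first
        for y, saliency_row in enumerate(saliency):
            for x, value in enumerate(saliency_row):
                if value > threshold:
                    result[y][x] = frame[y][x]
    return result


kept = sample_frame_indices(21, 5)  # the traffic clip has 21 frames

# Two sampled frames whose moving objects overlap at the middle pixel.
road = [[0.0, 0.0, 0.0]]
car_frames = [[[1.0, 1.0, 0.0]], [[0.0, 2.0, 2.0]]]
car_saliency = [[[1.0, 1.0, 0.0]], [[0.0, 1.0, 1.0]]]
later_on_top = overlay_in_time_order(car_frames, car_saliency, road)
earlier_on_top = overlay_in_time_order(car_frames[::-1], car_saliency[::-1], road)
```

Where the two objects overlap, the forward order shows the later one and the reversed order shows the earlier one, which is the cue that lets a viewer read the direction of motion from a still image.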




13. calendar, foliage

Here we show comparisons between our results and results from a state-of-the-art video super-resolution technique [3].

Calendar dataset, comparisons:
Video snapshot (default view)
Video super-resolution [3]
Bicubic up-sampling

Foliage dataset, comparisons:
Video snapshot with motion confidence (default view)
Video super-resolution [3]
Bicubic up-sampling



References

[1] J. Yang, J. Wright, T. Huang, and Y. Ma, "Image super-resolution via sparse representation", IEEE Transactions on Image Processing, vol. 19, pp. 2861-2873, Nov 2010.

[2] M. Irani and S. Peleg, "Improving resolution by image registration", CVGIP: Graphical Models and Image Processing, vol. 53, pp. 231-239, May 1991.

[3] C. Liu and D. Sun, "A Bayesian approach to adaptive video super resolution", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.