Multi-view approach to motion and stereo

US 6,487,304 B1
Filed: 06/16/1999
Issued: 11/26/2002
Est. Priority Date: 06/16/1999
Status: Expired due to Term

Abstract

A system and process for computing motion or depth estimates from multiple images. In general terms this is accomplished by associating a depth or motion map with each input image (or some subset of the images equal to or greater than two), rather than computing a single map for all the images. In addition, consistency between the estimates associated with different images is ensured. More particularly, this involves minimizing a three-part cost function, which consists of an intensity (or color) compatibility constraint, a motion/depth compatibility constraint, and a flow smoothness constraint. In addition, a visibility term is added to the intensity (or color) compatibility and motion/depth compatibility constraints to prevent the matching of pixels into areas that are occluded. In operation, the cost function is computed in two phases. During an initializing phase, the motion or depth for each image being examined is estimated independently. Since there are not yet any estimates for other frames to employ in the calculation, the motion or depth compatibility term is ignored. In addition, no visibilities are computed and it is assumed all pixels are visible. Once an initial set of motion estimates has been computed, the visibilities are computed and the motion or depth estimates are recalculated using the visibility terms and the motion or depth compatibility constraint. The foregoing process can then be repeated several times using the revised estimates from the previous iteration as the initializing estimates for the new iteration, to obtain better estimates of motion/depth values and visibility.
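
As a reading aid, the three-part cost function described in the abstract could be written roughly as follows. This is a sketch only: every symbol is an assumption introduced here for illustration and is not notation taken from the text above. Here $s$ indexes keyframes, $N(s)$ is the set of images neighboring keyframe $s$, $w_{st}$ is a per-neighbor weight, $v_{st}(x)$ is the visibility of keyframe pixel $x$ in image $t$, $u_s(x)$ is the motion or depth estimate, $x_t$ is the location in image $t$ that $x$ maps to under $u_s(x)$, $N_4(x)$ is the set of pixels adjacent to $x$, and $\rho_I$, $\rho_T$, $\rho_S$ are penalty functions.

```latex
\begin{aligned}
C(\{u_s\}) ={} & \sum_{s}\sum_{t \in N(s)} w_{st} \sum_{x} v_{st}(x)\,
  \Bigl[\rho_I\bigl(I_s(x) - I_t(x_t)\bigr) + \rho_T\bigl(u_s(x) - u_t(x_t)\bigr)\Bigr] \\
& + \sum_{s}\sum_{x}\sum_{x' \in N_4(x)} \rho_S\bigl(u_s(x) - u_s(x')\bigr)
\end{aligned}
```

Under this reading, the initializing phase corresponds to setting $v_{st}(x) = 1$ everywhere and dropping the $\rho_T$ term; later passes restore both.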

105 Claims

1. A computer-implemented process for estimating motion or depth values for multiple images of a 3D scene, comprising using a computer to perform the following acts:
- inputting the multiple images of the 3D scene;
  
  selecting at least two images from the multiple images, hereafter referred to as keyframes;
  
  creating a motion or depth map for each keyframe by estimating a motion or depth value for each pixel of each keyframe using motion or depth information from images neighboring the keyframe in viewpoint or time.
2. The process of claim 1, wherein the act of estimating comprises the acts of:
3. The process of claim 2, wherein the act of computing the initial estimates for the pixels of a keyframe comprises the acts of:
- identifying one or more images which are adjacent in time or viewpoint to the keyframe and designating each of said images as a neighboring image;
  
  generating a series of candidate motion or depth values for the pixel of the keyframe;
  
  for each candidate motion or depth value, computing an indicator for each neighboring image indicative of the difference between a desired characteristic exhibited by a pixel in the neighboring image which corresponds to a pixel of the keyframe and that exhibited by the keyframe's pixel, weighting the difference indicator for each neighboring image based on the degree to which the neighboring image will contribute to the estimation of the motion or depth values associated with the pixels of the keyframe to produce a weighted indicator, summing the weighted indicators associated with the neighboring images to produce a cost factor;
  
  identifying the lowest overall cost factor for each pixel of the keyframe among those produced with each candidate motion or depth value; and
  
  assigning the candidate motion or depth value corresponding to the lowest cost factor as the initial estimate of the motion or depth value for the associated pixel of the keyframe.
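
The acts of claim 3 can be illustrated with a minimal sketch. It assumes 1-D scanlines, integer candidate values, a squared intensity difference as the indicator, and a hypothetical per-neighbor `shifts` multiplier that maps a candidate value to a pixel offset; none of these choices come from the claim itself.

```python
def initial_estimates(keyframe, neighbors, shifts, weights, candidates):
    """Pick, per keyframe pixel, the candidate with the lowest weighted cost.

    keyframe  : list of intensities (one scanline; an assumption for brevity)
    neighbors : list of scanlines from images adjacent in time or viewpoint
    shifts    : hypothetical per-neighbor multiplier turning a candidate
                value d into a pixel offset (e.g. +1 right view, -1 left view)
    weights   : per-neighbor contribution weights
    candidates: series of candidate motion or depth values
    """
    width = len(keyframe)
    estimates = []
    for x in range(width):
        best_cost, best_d = None, None
        for d in candidates:
            cost = 0.0
            for image, shift, w in zip(neighbors, shifts, weights):
                # Location of the corresponding pixel, clamped to the image.
                xn = min(max(x + shift * d, 0), width - 1)
                # Difference indicator (squared intensity difference here).
                indicator = (image[xn] - keyframe[x]) ** 2
                cost += w * indicator  # weighted indicator, summed
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        estimates.append(best_d)
    return estimates


key = [0, 0, 10, 50, 90, 0, 0, 0]
right = [0, 0, 0, 0, 10, 50, 90, 0]   # key shifted by +2
left = [10, 50, 90, 0, 0, 0, 0, 0]    # key shifted by -2
est = initial_estimates(key, [right, left], [1, -1], [0.5, 0.5], [0, 1, 2, 3])
```

In this toy input the textured pixels (indices 2 to 4) recover the shift of 2; textureless pixels are ambiguous, which is the kind of case the later claims address with aggregation and smoothness.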
4. The process of claim 3, wherein the series of candidate motion or depth values includes zero (0) as a baseline value.
5. The process of claim 3, further comprising the act of aggregating the computed first indicator spatially prior to performing the act of applying a weighting factor to the first indicator.
6. The process of claim 5, wherein the act of aggregating the computed first indicator spatially comprises the act of employing a spatial convolution process.
7. The process of claim 3, further comprising, following the act of assigning the motion or depth value associated with the lowest cost factor as the initial estimate, performing the act of refining the initial estimate for the chosen pixel via a fractional motion or depth estimation process.
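
Claim 7 does not say how the fractional estimate is obtained. One common technique, swapped in here purely as an illustration, is to fit a parabola through the cost at the winning candidate and its two neighbors and take the parabola's minimum. The function name and arguments below are hypothetical.

```python
def refine_fractional(d, cost_minus, cost_at, cost_plus):
    """Sub-sample refinement by parabola fitting (an assumed technique).

    d          : winning integer candidate
    cost_minus : cost factor at candidate d - 1
    cost_at    : cost factor at candidate d
    cost_plus  : cost factor at candidate d + 1
    """
    curvature = cost_minus - 2.0 * cost_at + cost_plus
    if curvature <= 0.0:
        # Flat or non-convex neighborhood: keep the integer estimate.
        return float(d)
    return d + 0.5 * (cost_minus - cost_plus) / curvature


# Costs sampled from (d - 2.3)^2 at d = 1, 2, 3.
refined = refine_fractional(2, 1.69, 0.09, 0.49)
```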
8. The process of claim 3, wherein the act of identifying the desired characteristic comprises the act of identifying the intensity exhibited by the matching pixel.
9. The process of claim 3, wherein the act of identifying the desired characteristic comprises the act of identifying the color exhibited by the matching pixel.
10. The process of claim 3, wherein the act of computing the indicator comprises the act of employing a robust penalty function.
11. The process of claim 10, wherein the robust penalty function is based on contaminated Gaussian distribution.
12. The process of claim 10, wherein the robust penalty function is generalized to account for global bias and gain changes between the selected keyframe and the chosen neighboring image.
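
Claims 10 through 12 can be sketched as follows. The specific contaminated-Gaussian form, the parameter names `sigma` and `eps`, and the linear `gain`/`bias` model are assumptions chosen for illustration; the claims name the ideas but give no formula.

```python
import math


def contaminated_gaussian(x, sigma=1.0, eps=0.01):
    """Robust penalty: negative log of a Gaussian mixed with a uniform
    outlier floor `eps` (one assumed form of a contaminated Gaussian).
    Small residuals are penalized quadratically; large ones saturate."""
    return -math.log((1.0 - eps) * math.exp(-(x * x) / (sigma * sigma)) + eps)


def residual(key_value, neighbor_value, gain=1.0, bias=0.0):
    """Intensity residual after a global gain/bias adjustment of the
    keyframe value (hypothetical linear model for claim 12)."""
    return neighbor_value - (gain * key_value + bias)


small = contaminated_gaussian(residual(100.0, 100.0))
large = contaminated_gaussian(residual(100.0, 200.0))
adjusted = contaminated_gaussian(residual(100.0, 120.0, gain=1.1, bias=10.0))
```

The saturation for large residuals is what keeps occluded or mismatched pixels from dominating the cost factor.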
13. The process of claim 3, wherein the act of generating a series of candidate motion or depth values comprises the act of employing a correlation-style search process.
14. The process of claim 3, further comprising the act of aggregating the computed first indicator spatially prior to performing the act of applying a weighting factor to the first indicator.
15. The process of claim 14, wherein the act of aggregating the computed first indicator spatially comprises the act of employing a spatial convolution process.
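
The spatial aggregation of claims 5, 6, 14 and 15 could be as simple as a box filter over the per-pixel indicators. The sketch below assumes a 1-D scanline and a hypothetical `radius` parameter, with the window truncated at the borders; other convolution kernels would fit the claim language equally well.

```python
def box_aggregate(indicators, radius=1):
    """Sum each indicator with its neighbors inside a (2*radius + 1) window."""
    width = len(indicators)
    out = []
    for x in range(width):
        lo = max(0, x - radius)
        hi = min(width, x + radius + 1)
        out.append(sum(indicators[lo:hi]))
    return out


aggregated = box_aggregate([0, 3, 0, 0], radius=1)
```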
16. The process of claim 2, further comprising the act of refining the initial estimate of the motion or depth value for a pixel of a keyframe via a multi-resolution estimation procedure, said multi-resolution estimation procedure comprising the acts of:
- creating a multi-resolution pyramid from each image of the 3D scene;
  
  computing an estimate of the motion or depth value for each pixel of a lowest resolution level of each keyframe;
  
  for each keyframe at a next higher resolution level, modifying the estimates of the motion or depth values computed for the keyframe at the next lower resolution level to compensate for the increase in resolution in the current keyframe resolution level, computing an estimate of the motion or depth value for each pixel of the keyframe at its current resolution level using the modified estimates as initializing values;
  
  repeating the modifying and second computing acts for each keyframe at a prescribed number of next higher resolution levels.
17. The process of claim 16, wherein the last resolution level of the prescribed number of resolution levels corresponds to the highest resolution level of the multi-resolution pyramid for each keyframe.
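
A coarse-to-fine loop in the spirit of claims 16 and 17 might look like the sketch below. It assumes 1-D scanlines, a pyramid built by averaging pixel pairs, estimates that double when the resolution doubles, and a hypothetical `local_search` estimator that only tries values within one step of the initializing value. All function names are illustrative.

```python
def build_pyramid(signal, levels):
    """Index 0 is the finest level; each coarser level halves the width."""
    pyramid = [list(signal)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        pyramid.append([(prev[i] + prev[i + 1]) / 2.0
                        for i in range(0, len(prev) - 1, 2)])
    return pyramid


def upsample_estimates(estimates, new_width):
    """Compensate for the resolution increase: repeat and double each value."""
    return [2 * estimates[min(x // 2, len(estimates) - 1)]
            for x in range(new_width)]


def local_search(key, nbr, init):
    """Hypothetical per-level estimator: try init-1, init, init+1."""
    width = len(key)
    out = []
    for x in range(width):
        best = None
        for d in (init[x] - 1, init[x], init[x] + 1):
            xn = min(max(x + d, 0), width - 1)
            cost = abs(nbr[xn] - key[x])
            if best is None or cost < best[0]:
                best = (cost, d)
        out.append(best[1])
    return out


def coarse_to_fine(key, nbr, levels, estimate_level):
    key_pyr = build_pyramid(key, levels)
    nbr_pyr = build_pyramid(nbr, levels)
    # Lowest resolution level first, initialized with zeros.
    estimates = estimate_level(key_pyr[-1], nbr_pyr[-1], [0] * len(key_pyr[-1]))
    # Then each next higher resolution level, up to the finest.
    for level in range(levels - 2, -1, -1):
        init = upsample_estimates(estimates, len(key_pyr[level]))
        estimates = estimate_level(key_pyr[level], nbr_pyr[level], init)
    return estimates


key = [0, 0, 9, 0, 0, 0, 0, 0]
nbr = [0, 0, 0, 0, 0, 0, 9, 0]   # key shifted by 4 pixels
est = coarse_to_fine(key, nbr, 3, local_search)
```

The point of the design is that a shift of 4 pixels is out of reach for a search of plus or minus one step at full resolution, but becomes a shift of 1 at the coarsest level and is then doubled and corrected on the way up.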
18. The process of claim 2, further comprising the act of refining the initial estimate of the motion or depth value for a pixel of a keyframe via an iterative procedure, said iterative procedure comprising the acts of:
- assigning a number of iterations to be completed to produce the refined estimate of the motion or depth value for the pixel of the keyframe;
  
  for the first iteration, computing a new estimate of the motion or depth value associated with the keyframe pixel using the previously computed initial estimate as an initializing value;
  
  for each subsequent iteration, if any, up to the assigned number, computing a new estimate of the motion or depth value associated with the keyframe pixel using the estimate of the motion or depth value computed in the last preceding iteration as an initializing value; and
  
  assigning the last computed motion or depth value as the refined initial estimate for the pixel of the keyframe.
19. The process of claim 2, wherein the act of computing the final estimates for the pixels of a keyframe comprises the acts of:
- identifying one or more images which are adjacent in time or viewpoint to the keyframe and designating each of said images as a neighboring image;
  
  generating a series of candidate motion or depth values for the pixel of the keyframe using the previously computed initial estimate of the motion or depth value for the keyframe pixel as a baseline value;
  
  for each candidate motion or depth value, starting with the previously computed initial estimate of the motion or depth value for the keyframe pixel, computing an indicator for each neighboring image indicative of the difference between a desired characteristic exhibited by a pixel in the neighboring image which corresponds to a pixel of the keyframe and that exhibited by the keyframe's pixel, weighting the difference indicator for each neighboring image based on the degree to which the neighboring image will contribute to the estimation of the motion or depth values associated with the pixels of the keyframe to produce a weighted indicator, summing the weighted indicators associated with the neighboring images to produce a cost factor;
  
  identifying the lowest cost factor for each pixel of the keyframe among those produced with each candidate motion or depth value; and
  
  assigning the candidate motion or depth value corresponding to the lowest cost factor as the final estimate of the motion or depth value for the associated pixel of the keyframe.
20. The process of claim 19, further comprising, performing the acts of:
- for each neighboring image whose pixels lack a previously estimated motion or depth value, estimating the motion or depth value of a pixel in each of the neighboring images that correspond to a keyframe pixel based on the previously computed estimate of the motion or depth for the keyframe pixel;
  
  determining for each neighboring image whether each keyframe pixel is visible in the neighboring image by comparing the similarity between the motion or depth value previously computed for a keyframe pixel and the motion or depth value associated with the corresponding pixel of the neighboring image, said keyframe pixel being visible in the neighboring image if the compared motion or depth values are similar within a prescribed error threshold; and
  
  whenever it is determined that a keyframe pixel is not visible in a neighboring image, employing other keyframe pixels in the vicinity of the pixel of interest for which the motion or depth value is being estimated to derive any pixel characteristic needed in estimating the motion or depth value for the keyframe pixel of interest, rather than using the characteristic actually exhibited by the pixel determined not to be visible.
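
The visibility test of claim 20 can be sketched as a thresholded comparison of estimates. The fallback for non-visible pixels is only one possible reading of "employing other keyframe pixels in the vicinity": here the needed value is replaced by the mean of nearby visible pixels. The names `match_index`, `threshold` and `radius` are hypothetical.

```python
def visibility_mask(key_values, nbr_values, match_index, threshold):
    """A keyframe pixel counts as visible in the neighboring image when its
    estimate agrees with that of its corresponding pixel within `threshold`.

    match_index[x] is the index of the neighbor pixel corresponding to x.
    """
    return [abs(key_values[x] - nbr_values[match_index[x]]) <= threshold
            for x in range(len(key_values))]


def fill_from_vicinity(values, visible, radius=1):
    """For pixels flagged not visible, derive the value from visible pixels
    within `radius` (assumed rule); keep the original if none are visible."""
    filled = []
    for x, value in enumerate(values):
        if visible[x]:
            filled.append(value)
            continue
        lo, hi = max(0, x - radius), min(len(values), x + radius + 1)
        nearby = [values[i] for i in range(lo, hi) if i != x and visible[i]]
        filled.append(sum(nearby) / len(nearby) if nearby else value)
    return filled


visible = visibility_mask([5, 5, 5, 2], [5, 5, 9, 2], [0, 1, 2, 3], 0.5)
filled = fill_from_vicinity([10, 20, 99, 40], visible)
```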
21. The process of claim 2, wherein the act of computing the final estimates for the pixels of a keyframe comprises the acts of:
- identifying one or more images which are adjacent in time or viewpoint to the keyframe and designating each of said images as a neighboring image;
  
  generating a series of candidate motion or depth values for the pixel of the keyframe using the previously computed initial estimate of the motion or depth value for the keyframe pixel as a baseline value;
  
  for each candidate motion or depth value, starting with the previously computed initial estimate of the motion or depth value for the keyframe pixel, for each neighboring image whose pixels lack a previously estimated motion or depth value, estimating the motion or depth value of a pixel in each of the neighboring images that correspond to a keyframe pixel based on the previously computed estimate of the motion or depth for the keyframe pixel, computing a first indicator for each neighboring image indicative of the difference between a desired characteristic exhibited by a pixel in the neighboring image which corresponds to a pixel of the keyframe and that exhibited by the keyframe's pixel, computing a second indicator for each neighboring image indicative of the difference between the motion or depth value previously estimated for a keyframe pixel and that of its corresponding pixel in the neighboring image, adding the first indicator and the second indicator associated with each keyframe pixel, respectively, for each neighboring image to produce a combined indicator, weighting the combined indicator for each neighboring image based on the degree to which the neighboring image will contribute to the estimation of the motion or depth values associated with the pixels of the keyframe to produce a combined weighted indicator, summing the combined weighted indicators associated with the neighboring images to produce a cost factor, identifying the lowest cost factor for each pixel of the keyframe among those produced using each candidate motion or depth value; and
  
  assigning the candidate motion or depth value corresponding to the lowest cost factor as the final estimate of the motion or depth value for the associated keyframe pixel.
22. The process of claim 21, further comprising, performing the acts of:
- determining for each neighboring image whether each keyframe pixel is visible in the neighboring image by comparing the similarity between the motion or depth value previously computed for a keyframe pixel and the motion or depth value associated with the corresponding pixel of the neighboring image, said keyframe pixel being visible in the neighboring image if the compared motion or depth values are similar within a prescribed error threshold; and
  
  whenever it is determined that a keyframe pixel is not visible in a neighboring image, employing other keyframe pixels in the vicinity of the pixel of interest for which the motion or depth value is being estimated to derive any pixel characteristic needed in estimating the motion or depth value for the keyframe pixel of interest, rather than using the characteristic actually exhibited by the pixel determined not to be visible.
23. The process of claim 2, wherein the act of computing the final estimates for the pixels of a keyframe comprises the acts of:
- identifying one or more images which are adjacent in time or viewpoint to the keyframe and designating each of said images as a neighboring image;
  
  identifying a group of pixels in the keyframe which are physically adjacent to the pixel for which the final estimate is being computed and designating said pixels as neighboring pixels;
  
  generating a series of candidate motion or depth values for the pixel of the keyframe using the previously computed initial estimate of the motion or depth value for the keyframe pixel as a baseline value;
  
  for each candidate motion or depth value, starting with the previously computed initial estimate of the motion or depth value for the keyframe pixel, for each neighboring image whose pixels lack a previously estimated motion or depth value, estimating the motion or depth value of a pixel in each of the neighboring images that correspond to a keyframe pixel based on the previously computed estimate of the motion or depth for the keyframe pixel, computing a first indicator for each neighboring image indicative of the difference between a desired characteristic exhibited by a pixel in the neighboring image which corresponds to a pixel of the keyframe and that exhibited by the keyframe's pixel, computing a second indicator for each neighboring image indicative of the difference between the motion or depth value previously estimated for a keyframe pixel and that of its corresponding pixel in the neighboring image, adding the first indicator and the second indicator associated with each keyframe pixel, respectively, for each neighboring image to produce a combined indicator, weighting the combined indicator for each neighboring image based on the degree to which the neighboring image will contribute to the estimation of the motion or depth values associated with the pixels of the keyframe to produce a combined weighted indicator, summing the combined weighted indicators associated with the neighboring images to produce a first cost factor, computing a third indicator for each neighboring pixel indicative of the difference between the candidate motion or depth value currently associated with a pixel of the keyframe for which the final estimate is being computed and a previously assigned motion or depth value of the neighboring pixel, summing the computed third indicators associated with the neighboring pixels to produce a second cost factor, adding the first and second cost factors for each pixel of the keyframe, respectively, to produce a combined cost for each keyframe pixel;
  
  identifying the lowest combined cost for each pixel of the keyframe among those produced using each candidate motion or depth value; and
  
  assigning the candidate motion or depth value corresponding to the lowest cost as the final estimate of the motion or depth value for the associated keyframe pixel.
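
The combined cost of claim 23 for a single keyframe pixel can be sketched as below. The first and second indicators are assumed to be precomputed per neighboring image, the third indicator is taken to be an absolute difference, and `smooth_weight` is a hypothetical knob (its default of 1.0 reproduces the plain sum the claim describes).

```python
def combined_cost(candidate, first_ind, second_ind, weights,
                  adjacent_values, smooth_weight=1.0):
    """Combined cost of one candidate value at one keyframe pixel.

    first_ind, second_ind : per-neighboring-image indicators (characteristic
                            difference and motion/depth difference)
    weights               : per-neighboring-image contribution weights
    adjacent_values       : values already assigned to adjacent pixels
    """
    # First cost factor: weighted sum of the combined indicators.
    first_cost = sum(w * (a + b)
                     for w, a, b in zip(weights, first_ind, second_ind))
    # Second cost factor: sum of third indicators over adjacent pixels.
    second_cost = sum(abs(candidate - v) for v in adjacent_values)
    return first_cost + smooth_weight * second_cost


def best_candidate(candidates, indicators, weights, adjacent_values):
    """indicators[d] = (first_ind, second_ind) for candidate d."""
    return min(candidates,
               key=lambda d: combined_cost(d, indicators[d][0],
                                           indicators[d][1], weights,
                                           adjacent_values))


indicators = {
    1: ([4, 4], [1, 1]),
    2: ([1, 1], [0, 0]),
    3: ([0, 0], [0, 0]),
}
best = best_candidate([1, 2, 3], indicators, [0.5, 0.5], [2, 2, 2, 2])
```

Candidate 3 matches the neighboring images best on its own, but disagreeing with all four adjacent pixels makes candidate 2 cheaper overall, which is the smoothing effect the third indicator is meant to provide.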
24. The process of claim 23 further comprising, performing the acts of:
- determining for each neighboring image whether each keyframe pixel is visible in the neighboring image by comparing the similarity between the motion or depth value previously computed for a keyframe pixel and the motion or depth value associated with the corresponding pixel of the neighboring image, said keyframe pixel being visible in the neighboring image if the compared motion or depth values are similar within a prescribed error threshold; and
  
  whenever it is determined that a keyframe pixel is not visible in a neighboring image, employing other keyframe pixels in the vicinity of the pixel of interest for which the motion or depth value is being estimated to derive any pixel characteristic needed in estimating the motion or depth value for the keyframe pixel of interest, rather than using the characteristic actually exhibited by the pixel determined not to be visible.
25. The process of claim 23, wherein the act of identifying the desired characteristic comprises the act of identifying the intensity exhibited by the matching pixel.
26. The process of claim 23, wherein the act of identifying the desired characteristic comprises the act of identifying the color exhibited by the matching pixel.
27. The process of claim 23, wherein the act of computing the first indicator comprises the act of employing a robust penalty function.
28. The process of claim 27, wherein the robust penalty function is based on contaminated Gaussian distribution.
29. The process of claim 27, wherein the robust penalty function is generalized to account for global bias and gain changes between the selected keyframe and the chosen neighboring image.
30. The process of claim 23, wherein the act of generating a series of candidate motion or depth values comprises the act of employing a correlation-style search process.
31. The process of claim 23, further comprising the act of aggregating the combined indicator spatially prior to performing the act of weighting the combined indicator.
32. The process of claim 31, wherein the act of aggregating the computed first indicator spatially comprises the act of employing a spatial convolution process.
33. The process of claim 23, further comprising, following the act of assigning the candidate motion or depth value corresponding to the lowest cost as the final estimate, performing the act of refining the final estimate for the keyframe pixel via a fractional motion or depth estimation process.
34. The process of claim 2, further comprising the act of refining the final estimate of the motion or depth value for a pixel of a keyframe via a multi-resolution estimation procedure, said multi-resolution estimation procedure comprising the acts of:
- creating a multi-resolution pyramid from each image of the 3D scene;
  
  computing an estimate of the motion or depth value for each pixel of a lowest resolution level of each keyframe;
  
  for each keyframe at a next higher resolution level, modifying the estimates of the motion or depth values computed for the keyframe at the next lower resolution level to compensate for the increase in resolution in the current keyframe resolution level, computing an estimate of the motion or depth value for each pixel of the keyframe at its current resolution level using the modified estimates as initializing values;
  
  repeating the modifying and second computing acts for each keyframe at a prescribed number of next higher resolution levels.
35. The process of claim 34, wherein the last resolution level of the prescribed number of resolution levels corresponds to the highest resolution level of the multi-resolution pyramid for each keyframe.
36. The process of claim 2, further comprising the act of refining the final estimate of the motion or depth value for a pixel of a keyframe via an iterative procedure, said iterative procedure comprising the acts of:
- assigning a number of iterations to be completed to produce the refined estimate of the motion or depth value for the pixel of the keyframe;
  
  for the first iteration, computing a new estimate of the motion or depth value associated with the keyframe pixel using the previously computed initial estimate as an initializing value;
  
  for each subsequent iteration, if any, up to the assigned number, computing a new estimate of the motion or depth value associated with the keyframe pixel using the estimate of the motion or depth value computed in the last preceding iteration as an initializing value; and
  
  assigning the last computed motion or depth value as the refined final estimate for the pixel of the keyframe.

37. A system for estimating motion or depth values for multiple images of a 3D scene, comprising:
- a general purpose computing device; and
  
  a computer program comprising program modules executable by the computing device, wherein the computing device is directed by the program modules of the computer program to, input the multiple images of the 3D scene, select at least two images from the multiple images, hereafter referred to as keyframes, create a motion or depth map for each keyframe by estimating a motion or depth value for each pixel of each keyframe using motion or depth information from images neighboring the keyframe in viewpoint or time.
38. The system of claim 37, wherein the estimating program module comprises sub-modules for:
39. The system of claim 38, wherein the sub-module for computing the initial estimates for the pixels of a keyframe comprises sub-modules for:
- (a) selecting one of the keyframes;
  
  (b) generating a series of candidate motion or depth values for each pixel of the selected keyframe;
  
  (c) identifying one or more images which are adjacent in time or viewpoint to the selected keyframe and designating each of said images as a neighboring image;
  
  (d) choosing one of the candidate motion or depth values;
  
  (e) choosing one of the neighboring images;
  
  (f) for each pixel of the selected keyframe, computing the location of a matching pixel in the chosen neighboring image which corresponds to a pixel of the selected image;
  
  (g) identifying a desired characteristic exhibited by each matching pixel;
  
  (h) for each pixel of the selected keyframe, computing a first indicator indicative of the difference between said desired characteristic exhibited by a matching pixel and that exhibited by the corresponding pixel of the selected keyframe;
  
  (i) for each pixel of the selected keyframe, applying a weighting factor to the first indicator to produce a weighted first indicator, said weighting factor dictating the degree to which the chosen neighboring image will contribute to the estimation of the motion or depth values associated with the pixels of the selected keyframe;
  
  (j) repeating sub-modules (e) through (i) for each of the remaining neighboring images;
  
  (k) for each pixel of the selected keyframe, summing the weighted first indicators associated with the neighboring images to produce a first cost factor for each keyframe pixel;
  
  (l) repeating sub-modules (d) through (k) for each of the remaining candidate motion or depth values;
  
  (m) identifying the lowest first cost factor for each pixel of the selected keyframe; and
  
  (n) respectively assigning the motion or depth value associated with each lowest first cost factor as the initial estimate of the motion or depth value for the associated pixel of the selected keyframe.
40. The system of claim 39, wherein the series of candidate motion or depth values includes zero (0) as a baseline value.
41. The system of claim 39, further comprising a sub-module for aggregating the computed first indicator spatially prior to performing the sub-module (j).
42. The system of claim 41, wherein the sub-module for aggregating the computed first indicator spatially comprises the sub-module for employing a spatial convolution process.
43. The system of claim 39, further comprising, following sub-module (n), performing a sub-module for refining the initial estimate of the motion or depth value for the chosen pixel via a fractional motion or depth estimation process.
44. The system of claim 39, wherein the sub-module for identifying the desired characteristic comprises a sub-module for identifying the intensity exhibited by the matching pixel.
45. The system of claim 39, wherein the sub-module for identifying the desired characteristic comprises a sub-module for identifying the color exhibited by the matching pixel.
46. The system of claim 39, wherein the sub-module for computing the first indicator comprises a sub-module for employing a robust penalty function.
47. The system of claim 46, wherein the robust penalty function is based on contaminated Gaussian distribution.
48. The system of claim 46, wherein the robust penalty function is generalized to account for global bias and gain changes between the selected keyframe and the chosen neighboring image.
49. The system of claim 39, wherein the sub-module for generating a series of candidate motion or depth values comprises a sub-module for employing a correlation-style search process.
50. The system of claim 38, further comprising a sub-module for refining the initial estimate of the motion or depth value for a pixel of a keyframe via a multi-resolution estimation procedure, said multi-resolution estimation procedure comprising sub-modules for:
- creating a multi-resolution pyramid from each image of the 3D scene;
  
  computing an estimate of the motion or depth value for each pixel of a lowest resolution level of each keyframe;
  
  for each keyframe at a next higher resolution level, modifying the estimates of the motion or depth values computed for the keyframe at the next lower resolution level to compensate for the increase in resolution in the current keyframe resolution level, computing an estimate of the motion or depth value for each pixel of the keyframe at its current resolution level using the modified estimates as initializing values;
  
  repeating the modifying and second computing sub-modules for each keyframe at a prescribed number of next higher resolution levels.
51. The system of claim 50, wherein the last resolution level of the prescribed number of resolution levels corresponds to the highest resolution level of the multi-resolution pyramid for each keyframe.
52. The system of claim 38, further comprising a sub-module for refining the initial estimate of the motion or depth value for a pixel of a keyframe via an iterative procedure, said iterative procedure comprising sub-modules for:
- assigning a number of iterations to be completed to produce the refined estimate of the motion or depth value for the pixel of the keyframe;
  
  for the first iteration, computing a new estimate of the motion or depth value associated with the keyframe pixel using the previously computed initial estimate as an initializing value;
  
  for each subsequent iteration, if any, up to the assigned number, computing a new estimate of the motion or depth value associated with the keyframe pixel using the estimate of the motion or depth value computed in the last preceding iteration as an initializing value; and
  
  assigning the last computed motion or depth value as the refined initial estimate for the pixel of the keyframe.
53. The system of claim 38, wherein the sub-module for computing the final estimates for the pixels of a keyframe comprises sub-modules for:
- (a) selecting one of the keyframes;
  
  (b) generating a series of candidate motion or depth values for each pixel of the selected keyframe;
  
  (c) identifying one or more images which are adjacent in time or viewpoint to the selected keyframe and designating each of said images as a neighboring image;
  
  (d) choosing one of the candidate motion or depth values;
  
  (e) choosing one of the neighboring images;
  
  (f) for each pixel of the selected keyframe, computing the location of a matching pixel in the chosen neighboring image which corresponds to a pixel of the selected image;
  
  (g) identifying a desired characteristic exhibited by each matching pixel;
  
  (h) for each pixel of the selected keyframe, computing a first indicator indicative of the difference between said desired characteristic exhibited by a matching pixel and that exhibited by the corresponding pixel of the selected keyframe;
  
  (i) for each pixel of the selected keyframe, applying a weighting factor to the first indicator to produce a weighted first indicator, said weighting factor dictating the degree to which the chosen neighboring image will contribute to the estimation of the motion or depth values associated with the pixels of the selected keyframe;
  
  (j) repeating sub-modules (e) through (i) for each of the remaining neighboring images;
  
  (k) for each pixel of the selected keyframe, summing the weighted first indicators associated with the neighboring images to produce a first cost factor for each keyframe pixel;
  
  (l) repeating sub-modules (d) through (k) for each of the remaining candidate motion or depth values;
  
  (m) identifying the lowest first cost factor for each pixel of the selected keyframe; and
  
  (n) respectively assigning the motion or depth value associated with each lowest first cost factor as the initial estimate of the motion or depth value for the associated pixel of the selected keyframe.
54. The system of claim 53 further comprising, performing sub-modules for:
- whenever the selected neighboring image pixels lack a previously estimated motion or depth value, computing the motion or depth value of a pixel in each of the neighboring images that correspond to a pixel of the selected keyframe based on the previously computed estimate of the motion or depth for the chosen pixel;
  
  determining for each neighboring image whether each pixel of the selected keyframe is visible in the neighboring image by comparing the similarity between the motion or depth value previously computed for a keyframe pixel and the motion or depth value associated with the corresponding pixel of the neighboring image, said chosen pixel being visible in the neighboring image if the compared motion or depth values are similar within a prescribed error threshold; and
  
  whenever it is determined that a pixel is not visible in a neighboring image, employing other keyframe pixels in the vicinity of the chosen pixel to derive any pixel characteristic needed in estimating the motion or depth value for the keyframe pixel of interest, rather than using the characteristic actually exhibited by the pixel determined not to be visible.
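
The visibility test recited above can be sketched, under stated assumptions, as a per-pixel comparison of value maps. The names `visibility_mask`, `shift`, and `threshold` are hypothetical, and treating a match that falls outside the neighboring image as not visible is a choice made for this sketch rather than something the claim states.

```python
def visibility_mask(key_values, neighbor_values, shift, threshold):
    """Return True for keyframe pixels deemed visible in the neighbor.

    A pixel counts as visible when its own motion or depth value and the
    value stored at its corresponding neighbor pixel agree within
    `threshold` (the prescribed error threshold).
    """
    n = len(key_values)
    visible = []
    for x in range(n):
        xm = x + shift * key_values[x]   # corresponding neighbor pixel
        if 0 <= xm < n:
            visible.append(abs(key_values[x] - neighbor_values[xm]) <= threshold)
        else:
            visible.append(False)        # assumption: off-image means not visible
    return visible


mask = visibility_mask([0, 0, 2, 0, 0], [0, 0, 0, 0, 0], shift=1, threshold=0.5)
```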
55. The system of claim 38, wherein the sub-module for computing the final estimates for the pixels of a keyframe comprises sub-modules for:
- (a) selecting one of the keyframes;
  
  (b) generating a series of candidate motion or depth values for each pixel of the selected keyframe using the previously computed initial estimate of the motion or depth value for each pixel as a baseline value;
  
  (c) identifying one or more images which are adjacent in time or viewpoint to the selected keyframe and designating each of said images as a neighboring image;
  
  (d) choosing one of the candidate motion or depth values starting with the previously computed initial estimate of the motion or depth value for the chosen pixel;
  
  (e) choosing one of the neighboring images;
  
  (f) respectively computing the location of a matching pixel in the chosen neighboring image which corresponds to each pixel of the selected image;
  
  (g) identifying a desired characteristic exhibited by each matching pixel;
  
  (h) whenever the selected neighboring image pixels lack a previously estimated motion or depth value, computing the motion or depth value of a pixel in each of the neighboring images that correspond to a pixel of the selected keyframe based on the previously computed estimate of the motion or depth for the chosen pixel;
  
  (i) for each pixel of the selected keyframe, computing a first indicator indicative of the difference between said desired characteristic exhibited by a matching pixel and that exhibited by the corresponding pixel of the selected keyframe;
  
  (j) for each pixel of the selected keyframe, computing a second indicator indicative of the difference between the motion or depth value previously estimated for a pixel of the selected keyframe and that of its corresponding pixel in the neighboring image;
  
  (k) adding the first indicator and the second indicator associated with each keyframe pixel respectively to produce a combined indicator;
  
  (l) for each pixel of the selected keyframe, applying a weighting factor to the combined indicator to produce a combined weighted indicator, said weighting factor dictating the degree to which the chosen neighboring image will contribute to the estimation of the motion or depth values associated with the pixels of the selected keyframe;
  
  (m) repeating sub-module (e) through (l) for each of the remaining neighboring images;
  
  (n) for each pixel of the selected keyframe, summing the combined weighted indicators associated with the neighboring images to produce a cost factor for each keyframe pixel;
  
  (o) repeating sub-modules (d) through (n) for each of the remaining candidate motion or depth values;
  
  (p) identifying the lowest cost factor for each pixel of the selected keyframe among those produced using each candidate motion or depth value; and
  
  (q) respectively assigning the motion or depth value associated with each lowest cost factor as the final estimate of the motion or depth value for the associated pixel of the selected keyframe.
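
A minimal sketch of sub-modules (i) through (n) for one candidate value follows. It assumes a single scanline, an integer `shift` per unit of candidate value, border clamping, and squared differences for both indicators; these details and all names are illustrative assumptions, not taken from the claim.

```python
def final_cost(key_intensity, key_values, neighbors, candidate):
    """Cost factor per keyframe pixel for one candidate value.

    neighbors: list of (intensities, values, shift, weight) tuples, where
    `values` holds the neighbor's previously estimated motion or depth.
    """
    n = len(key_intensity)
    cost = [0.0] * n
    for intensities, values, shift, weight in neighbors:
        for x in range(n):
            xm = min(max(x + shift * candidate, 0), n - 1)
            # (i) first indicator: difference in the pixel characteristic
            first = (intensities[xm] - key_intensity[x]) ** 2
            # (j) second indicator: difference in motion or depth values
            second = (values[xm] - key_values[x]) ** 2
            # (k)-(l) combine, weight, and (n) sum over neighboring images
            cost[x] += weight * (first + second)
    return cost


costs = final_cost([5, 5], [1, 1], [([5, 7], [1, 3], 0, 0.5)], candidate=0)
```

Repeating this for every candidate and keeping the per-pixel minimum would correspond to sub-modules (o) through (q).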
56. The system of claim 55 further comprising, performing sub-modules for:
- determining for each neighboring image whether each pixel of the selected keyframe is visible in the neighboring image by comparing the similarity between the motion or depth value previously computed for a keyframe pixel and the motion or depth value associated with the corresponding pixel of the neighboring image, said chosen pixel being visible in the neighboring image if the compared motion or depth values are similar within a prescribed error threshold; and
  
  whenever it is determined that a pixel is not visible in a neighboring image, employing other keyframe pixels in the vicinity of the chosen pixel to derive any pixel characteristic needed in estimating the motion or depth value for the keyframe pixel of interest, rather than using the characteristic actually exhibited by the pixel determined not to be visible.
57. The system of claim 55, wherein the sub-module for identifying the desired characteristic comprises a sub-module for identifying the intensity exhibited by the matching pixel.
58. The system of claim 55, wherein the sub-module for identifying the desired characteristic comprises a sub-module for identifying the color exhibited by the matching pixel.
59. The system of claim 55, wherein the sub-module for computing the first indicator comprises a sub-module for employing a robust penalty function.
60. The system of claim 59, wherein the robust penalty function is based on contaminated Gaussian distribution.
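
The claims do not give the formula for the penalty. One commonly used form of a contaminated Gaussian penalty is the negative log of a Gaussian mixed with a constant outlier term; the sketch below assumes that form, and the parameter names `sigma` and `eps` and their defaults are hypothetical.

```python
import math


def contaminated_gaussian_penalty(x, sigma=1.0, eps=0.1):
    """Assumed form: -log((1 - eps) * exp(-x^2 / (2 sigma^2)) + eps).

    Small residuals are penalized roughly quadratically, while large
    residuals saturate near -log(eps), limiting the influence of outliers.
    """
    gaussian = math.exp(-(x * x) / (2.0 * sigma * sigma))
    return -math.log((1.0 - eps) * gaussian + eps)


small = contaminated_gaussian_penalty(0.0)
large = contaminated_gaussian_penalty(100.0)
```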
61. The system of claim 59, wherein the robust penalty function is generalized to account for global bias and gain changes between the selected keyframe and the chosen neighboring image.
62. The system of claim 55, wherein the sub-module for generating a series of candidate motion or depth values comprises a sub-module for employing a correlation-style search process.
63. The system of claim 55, further comprising a sub-module for aggregating the combined indicator spatially prior to performing sub-module (m).
64. The system of claim 63, wherein the sub-module for aggregating the computed first indicator spatially comprises a sub-module for employing a spatial convolution process.
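
As a rough illustration of spatial aggregation by convolution, the sketch below averages a per-pixel indicator over a one-dimensional box window. The box kernel, the `radius` parameter, and the shortened window at the borders are assumptions; the claim only recites "a spatial convolution process".

```python
def aggregate_costs(costs, radius=1):
    """Average each indicator with its neighbors inside a box window."""
    n = len(costs)
    out = []
    for x in range(n):
        lo, hi = max(0, x - radius), min(n, x + radius + 1)
        window = costs[lo:hi]            # window shrinks at the borders
        out.append(sum(window) / len(window))
    return out


smoothed = aggregate_costs([0, 3, 0, 3, 0], radius=1)
```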
65. The system of claim 55, further comprising, following sub-module (s), performing a sub-module for refining the initial estimate of the motion or depth value for the chosen pixel via a fractional motion or depth estimation process.
66. The system of claim 38, further comprising a sub-module for refining the final estimate of the motion or depth value for a pixel of a keyframe via a multi-resolution estimation procedure, said multi-resolution estimation procedure comprising the sub-modules for:
- creating a multi-resolution pyramid from each image of the 3D scene;
  
  computing an estimate of the motion or depth value for each pixel of a lowest resolution level of each keyframe;
  
  for each keyframe at a next higher resolution level, modifying the estimates of the motion or depth values computed for the keyframe at the next lower resolution level to compensate for the increase in resolution in the current keyframe resolution level, computing an estimate of the motion or depth value for each pixel of the keyframe at its current resolution level using the modified estimates as initializing values;
  
  repeating the modifying and second computing sub-modules for each keyframe at a prescribed number of next higher resolution levels.
67. The system of claim 66, wherein the last resolution level of the prescribed number of resolution levels corresponds to the highest resolution level of the multi-resolution pyramid for each keyframe.
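
The multi-resolution procedure of claims 66 and 67 can be sketched as a coarse-to-fine loop. Everything concrete here is an assumption for illustration: a one-dimensional image, a factor-of-two pyramid built by pairwise averaging, estimates doubled when moving to the next finer level, and a caller-supplied `estimate_level` function standing in for the per-level estimator.

```python
def downsample(row):
    """Halve the resolution by averaging adjacent pixel pairs."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def build_pyramid(row, levels):
    """Return pyramid levels, finest first."""
    pyramid = [row]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def upsample_estimates(estimates):
    """Compensate for doubled resolution: repeat and scale each value by 2."""
    out = []
    for v in estimates:
        out.extend([2 * v, 2 * v])
    return out


def coarse_to_fine(pyramid, estimate_level):
    """Estimate at the coarsest level, then refine level by level."""
    estimates = estimate_level(pyramid[-1], None)
    for level in reversed(pyramid[:-1]):
        init = upsample_estimates(estimates)
        # pad with the last value if the finer level has an odd length
        init = (init + [init[-1]] * len(level))[:len(level)]
        estimates = estimate_level(level, init)
    return estimates


pyr = build_pyramid([1, 3, 5, 7], levels=3)
# Hypothetical stand-in estimator: zeros at the coarsest level, +1 elsewhere.
toy = lambda level, init: [0] * len(level) if init is None else [v + 1 for v in init]
fine = coarse_to_fine(pyr, toy)
```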
68. The system of claim 38, further comprising a sub-module for refining the final estimate of the motion or depth value for a pixel of a keyframe via an iterative procedure, said iterative procedure comprising sub-modules for:
- assigning a number of iterations to be completed to produce the refined estimate of the motion or depth value for the pixel of the keyframe;
  
  for the first iteration, computing a new estimate of the motion or depth value associated with the keyframe pixel using the previously computed initial estimate as an initializing value;
  
  for each subsequent iteration, if any, up to the assigned number, computing a new estimate of the motion or depth value associated with the keyframe pixel using the estimate of the motion or depth value computed in the last preceding iteration as an initializing value; and
  
  assigning the last computed motion or depth value as the refined final estimate for the pixel of the keyframe.
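
The iterative procedure amounts to feeding each new estimate back in as the next initializing value. In the sketch below, `update` is a hypothetical stand-in for one pass of the estimator.

```python
def refine_iteratively(initial_estimate, update, iterations):
    """Run `update` the assigned number of times, chaining the results."""
    estimate = initial_estimate
    for _ in range(iterations):
        estimate = update(estimate)   # previous result initializes the next pass
    return estimate                   # last computed value is the refined estimate


refined = refine_iteratively(8.0, lambda v: v / 2, iterations=3)
```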
69. The system of claim 38, wherein the sub-module for computing the final estimates for the pixels of a keyframe comprises sub-modules for:
- (a) selecting one of the keyframes;
  
  (b) generating a series of candidate motion or depth values for each pixel of the selected keyframe using the previously computed initial estimate of the motion or depth value for each pixel as a baseline value;
  
  (c) identifying one or more images which are adjacent in time or viewpoint to the selected keyframe and designating each of said images as a neighboring image;
  
  (d) choosing one of the candidate motion or depth values starting with the previously computed initial estimate of the motion or depth value for the chosen pixel;
  
  (e) choosing one of the neighboring images;
  
  (f) respectively computing the location of a matching pixel in the chosen neighboring image which corresponds to each pixel of the selected image;
  
  (g) identifying a desired characteristic exhibited by each matching pixel;
  
  (h) whenever the selected neighboring image pixels lack a previously estimated motion or depth value, computing the motion or depth value of a pixel in each of the neighboring images that correspond to a pixel of the selected keyframe based on the previously computed estimate of the motion or depth for the chosen pixel;
  
  (i) for each pixel of the selected keyframe, computing a first indicator indicative of the difference between said desired characteristic exhibited by a matching pixel and that exhibited by the corresponding pixel of the selected keyframe;
  
  (j) for each pixel of the selected keyframe, computing a second indicator indicative of the difference between the motion or depth value previously estimated for a pixel of the selected keyframe and that of its corresponding pixel in the neighboring image;
  
  (k) adding the first indicator and the second indicator associated with each keyframe pixel, respectively, to produce a combined indicator;
  
  (l) for each pixel of the selected keyframe, applying a weighting factor to the combined indicator to produce a combined weighted indicator, said weighting factor dictating the degree to which the chosen neighboring image will contribute to the estimation of the motion or depth values associated with the pixels of the selected keyframe;
  
  (m) repeating sub-modules (e) through (l) for each of the remaining neighboring images;
  
  (n) for each pixel of the selected keyframe, summing the combined weighted indicators associated with the neighboring images to produce a first cost factor for each keyframe pixel;
  
  (o) choosing a previously unselected pixel of the selected keyframe;
  
  (p) identifying a group of pixels in the selected keyframe which are physically adjacent to the chosen pixel and designating said pixels as neighboring pixels;
  
  (q) choosing one of the neighboring pixels;
  
  (r) computing a third indicator indicative of the difference between the chosen candidate motion or depth value of the selected keyframe pixel and a previously assigned motion or depth value of the chosen neighboring pixel;
  
  (s) repeating sub-modules (q) and (r) for each of the remaining neighboring pixels;
  
  (t) summing the computed third indicators associated with the neighboring pixels to produce a second cost factor;
  
  (u) adding the second cost factor associated with the chosen pixel of the selected keyframe to the pixel's first cost factor to produce a combined cost;
  
  (v) repeating sub-modules (o) through (u) for each of the remaining pixels of the keyframe image;
  
  (w) repeating sub-modules (d) through (v) for each of the remaining candidate motion or depth values;
  
  (x) identifying the lowest combined cost for each pixel of the selected keyframe among those produced using each candidate motion or depth value; and
  
  (y) respectively assigning the motion or depth value associated with each lowest combined cost as the final estimate of the motion or depth value for the associated pixel of the selected keyframe.
70. The system of claim 69 further comprising, performing sub-modules for:
- determining for each neighboring image whether each pixel of the selected keyframe is visible in the neighboring image by comparing the similarity between the motion or depth value previously computed for a keyframe pixel and the motion or depth value associated with the corresponding pixel of the neighboring image, said chosen pixel being visible in the neighboring image if the compared motion or depth values are similar within a prescribed error threshold; and
  
  whenever it is determined that a pixel is not visible in a neighboring image, employing other keyframe pixels in the vicinity of the chosen pixel to derive any pixel characteristic needed in estimating the motion or depth value for the keyframe pixel of interest, rather than using the characteristic actually exhibited by the pixel determined not to be visible.
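
Sub-modules (o) through (u) of claim 69 add a smoothness term to the first cost factor. The sketch below assumes a scanline whose neighboring pixels are the immediate left and right pixels, an absolute difference as the third indicator, and an optional `smoothness_weight` that is not in the claim (it defaults to 1.0 so the result matches a plain sum).

```python
def combined_costs(first_cost, candidate, previous_values, smoothness_weight=1.0):
    """Combined cost per pixel for one candidate motion or depth value."""
    n = len(first_cost)
    out = []
    for x in range(n):                      # (o), (v): visit each pixel
        second = 0.0
        for nx in (x - 1, x + 1):           # (p)-(s): adjacent pixels
            if 0 <= nx < n:
                # (r) third indicator: candidate vs. neighbor's assigned value
                second += abs(candidate - previous_values[nx])
        # (t)-(u): second cost factor added to the first cost factor
        out.append(first_cost[x] + smoothness_weight * second)
    return out


combined = combined_costs([1.0, 1.0, 1.0], candidate=2, previous_values=[2, 0, 2])
```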

71. A computer-readable memory for estimating motion or depth values for multiple images of a 3D scene, comprising:
- a computer-readable storage medium; and
  
  a computer program comprising program modules stored in the storage medium, wherein the storage medium is so configured by the computer program that it causes a computer to, input the multiple images of the 3D scene, select at least two images from the multiple images, hereafter referred to as keyframes, create a motion or depth map for each keyframe by estimating a motion or depth value for each pixel of each keyframe using motion or depth information from images neighboring the keyframe in viewpoint or time.
72. The computer-readable memory of claim 71, wherein the estimating program module comprises sub-modules for:
73. The computer-readable memory of claim 72, wherein the sub-module for computing the initial estimates for the pixels of a keyframe comprises sub-modules for:
- (a) selecting one of the keyframes;
  
  (b) generating a series of candidate motion or depth values for each pixel of the selected keyframe;
  
  (c) identifying one or more images which are adjacent in time or viewpoint to the selected keyframe and designating each of said images as a neighboring image;
  
  (d) choosing one of the candidate motion or depth values;
  
  (e) choosing one of the neighboring images;
  
  (f) for each pixel of the selected keyframe, computing the location of a matching pixel in the chosen neighboring image which corresponds to a pixel of the selected image;
  
  (g) identifying a desired characteristic exhibited by each matching pixel;
  
  (h) for each pixel of the selected keyframe, computing a first indicator indicative of the difference between said desired characteristic exhibited by a matching pixel and that exhibited by the corresponding pixel of the selected keyframe;
  
  (i) for each pixel of the selected keyframe, applying a weighting factor to the first indicator to produce a weighted first indicator, said weighting factor dictating the degree to which the chosen neighboring image will contribute to the estimation of the motion or depth values associated with the pixels of the selected keyframe;
  
  (j) repeating sub-modules (e) through (i) for each of the remaining neighboring images;
  
  (k) for each pixel of the selected keyframe, summing the weighted first indicators associated with the neighboring images to produce a first cost factor for each keyframe pixel;
  
  (l) repeating sub-modules (d) through (k) for each of the remaining candidate motion or depth values;
  
  (m) identifying the lowest first cost factor for each pixel of the selected keyframe; and
  
  (n) respectively assigning the motion or depth value associated with each lowest first cost factor as the initial estimate of the motion or depth value for the associated pixel of the selected keyframe.
74. The computer-readable memory of claim 73, wherein the series of candidate motion or depth values includes zero (0) as a baseline value.
75. The computer-readable memory of claim 73, further comprising a sub-module for aggregating the computed first indicator spatially prior to performing the sub-module (j).
76. The computer-readable memory of claim 75, wherein the sub-module for aggregating the computed first indicator spatially comprises the sub-module for employing a spatial convolution process.
77. The computer-readable memory of claim 73, further comprising, following sub-module (p), performing a sub-module for refining the initial estimate of the motion or depth value for the chosen pixel via a fractional motion or depth estimation process.
78. The computer-readable memory of claim 73, wherein the sub-module for identifying the desired characteristic comprises a sub-module for identifying the intensity exhibited by the matching pixel.
79. The computer-readable memory of claim 73, wherein the sub-module for identifying the desired characteristic comprises a sub-module for identifying the color exhibited by the matching pixel.
80. The computer-readable memory of claim 73, wherein the sub-module for computing the first indicator comprises a sub-module for employing a robust penalty function.
81. The computer-readable memory of claim 80, wherein the robust penalty function is based on contaminated Gaussian distribution.
82. The computer-readable memory of claim 80, wherein the robust penalty function is generalized to account for global bias and gain changes between the selected keyframe and the chosen neighboring image.
83. The computer-readable memory of claim 73, wherein the sub-module for generating a series of candidate motion or depth values comprises a sub-module for employing a correlation-style search process.
84. The computer-readable memory of claim 72, further comprising a sub-module for refining the initial estimate of the motion or depth value for a pixel of a keyframe via a multi-resolution estimation procedure, said multi-resolution estimation procedure comprising sub-modules for:
- creating a multi-resolution pyramid from each image of the 3D scene;
  
  computing an estimate of the motion or depth value for each pixel of a lowest resolution level of each keyframe;
  
  for each keyframe at a next higher resolution level, modifying the estimates of the motion or depth values computed for the keyframe at the next lower resolution level to compensate for the increase in resolution in the current keyframe resolution level, computing an estimate of the motion or depth value for each pixel of the keyframe at its current resolution level using the modified estimates as initializing values;
  
  repeating the modifying and second computing sub-modules for each keyframe at a prescribed number of next higher resolution levels.
85. The computer-readable memory of claim 84, wherein the last resolution level of the prescribed number of resolution levels corresponds to the highest resolution level of the multi-resolution pyramid for each keyframe.
86. The computer-readable memory of claim 72, further comprising a sub-module for refining the initial estimate of the motion or depth value for a pixel of a keyframe via an iterative procedure, said iterative procedure comprising sub-modules for:
- assigning a number of iterations to be completed to produce the refined estimate of the motion or depth value for the pixel of the keyframe;
  
  for the first iteration, computing a new estimate of the motion or depth value associated with the keyframe pixel using the previously computed initial estimate as an initializing value;
  
  for each subsequent iteration, if any, up to the assigned number, computing a new estimate of the motion or depth value associated with the keyframe pixel using the estimate of the motion or depth value computed in the last preceding iteration as an initializing value; and
  
  assigning the last computed motion or depth value as the refined initial estimate for the pixel of the keyframe.
87. The computer-readable memory of claim 72, wherein the sub-module for computing the final estimates for the pixels of a keyframe comprises sub-modules for:
- (a) selecting one of the keyframes;
  
  (b) generating a series of candidate motion or depth values for each pixel of the selected keyframe;
  
  (c) identifying one or more images which are adjacent in time or viewpoint to the selected keyframe and designating each of said images as a neighboring image;
  
  (d) choosing one of the candidate motion or depth values;
  
  (e) choosing one of the neighboring images;
  
  (f) for each pixel of the selected keyframe, computing the location of a matching pixel in the chosen neighboring image which corresponds to a pixel of the selected image;
  
  (g) identifying a desired characteristic exhibited by each matching pixel;
  
  (h) for each pixel of the selected keyframe, computing a first indicator indicative of the difference between said desired characteristic exhibited by a matching pixel and that exhibited by the corresponding pixel of the selected keyframe;
  
  (i) for each pixel of the selected keyframe, applying a weighting factor to the first indicator to produce a weighted first indicator, said weighting factor dictating the degree to which the chosen neighboring image will contribute to the estimation of the motion or depth values associated with the pixels of the selected keyframe;
  
  (j) repeating sub-modules (e) through (i) for each of the remaining neighboring images;
  
  (k) for each pixel of the selected keyframe, summing the weighted first indicators associated with the neighboring images to produce a first cost factor for each keyframe pixel;
  
  (l) repeating sub-modules (d) through (k) for each of the remaining candidate motion or depth values;
  
  (m) identifying the lowest first cost factor for each pixel of the selected keyframe; and
  
  (n) respectively assigning the motion or depth value associated with each lowest first cost factor as the initial estimate of the motion or depth value for the associated pixel of the selected keyframe.
88. The computer-readable memory of claim 87 further comprising, performing sub-modules for:
- whenever the selected neighboring image pixels lack a previously estimated motion or depth value, estimating the motion or depth value of a pixel in each of the neighboring images that correspond to a pixel of the selected keyframe based on the previously computed estimate of the motion or depth for the chosen pixel;
  
  determining for each neighboring image whether each pixel of the selected keyframe is visible in the neighboring image by comparing the similarity between the motion or depth value previously computed for a keyframe pixel and the motion or depth value associated with the corresponding pixel of the neighboring image, said chosen pixel being visible in the neighboring image if the compared motion or depth values are similar within a prescribed error threshold; and
  
  whenever it is determined that a pixel is not visible in a neighboring image, employing other keyframe pixels in the vicinity of the chosen pixel to derive any pixel characteristic needed in estimating the motion or depth value for the keyframe pixel of interest, rather than using the characteristic actually exhibited by the pixel determined not to be visible.
89. The computer-readable memory of claim 87, wherein the sub-module for identifying the desired characteristic comprises a sub-module for identifying the intensity exhibited by the matching pixel.
90. The computer-readable memory of claim 87, wherein the sub-module for identifying the desired characteristic comprises a sub-module for identifying the color exhibited by the matching pixel.
91. The computer-readable memory of claim 87, wherein the sub-module for computing the first indicator comprises a sub-module for employing a robust penalty function.
92. The computer-readable memory of claim 91, wherein the robust penalty function is based on contaminated Gaussian distribution.
93. The computer-readable memory of claim 91, wherein the robust penalty function is generalized to account for global bias and gain changes between the selected keyframe and the chosen neighboring image.
94. The computer-readable memory of claim 87, wherein the sub-module for generating a series of candidate motion or depth values comprises a sub-module for employing a correlation-style search process.
95. The computer-readable memory of claim 87, further comprising a sub-module for aggregating the combined indicator spatially prior to performing sub-module (j).
96. The computer-readable memory of claim 95, wherein the sub-module for aggregating the computed first indicator spatially comprises a sub-module for employing a spatial convolution process.
97. The computer-readable memory of claim 87, further comprising, following sub-module (n), performing a sub-module for refining the final estimate of the motion or depth value for the chosen pixel via a fractional motion or depth estimation process.
98. The computer-readable memory of claim 72, further comprising a sub-module for refining the final estimate of the motion or depth value for a pixel of a keyframe via a multi-resolution estimation procedure, said multi-resolution estimation procedure comprising the sub-modules for:
- creating a multi-resolution pyramid from each image of the 3D scene;
  
  computing an estimate of the motion or depth value for each pixel of a lowest resolution level of each keyframe;
  
  for each keyframe at a next higher resolution level, modifying the estimates of the motion or depth values computed for the keyframe at the next lower resolution level to compensate for the increase in resolution in the current keyframe resolution level, computing an estimate of the motion or depth value for each pixel of the keyframe at its current resolution level using the modified estimates as initializing values;
  
  repeating the modifying and second computing sub-modules for each keyframe at a prescribed number of next higher resolution levels.
99. The computer-readable memory of claim 98, wherein the last resolution level of the prescribed number of resolution levels corresponds to the highest resolution level of the multi-resolution pyramid for each keyframe.
100. The computer-readable memory of claim 72, further comprising a sub-module for refining the final estimate of the motion or depth value for a pixel of a keyframe via an iterative procedure, said iterative procedure comprising sub-modules for:
- assigning a number of iterations to be completed to produce the refined estimate of the motion or depth value for the pixel of the keyframe;
  
  for the first iteration, computing a new estimate of the motion or depth value associated with the keyframe pixel using the previously computed initial estimate as an initializing value;
  
  for each subsequent iteration, if any, up to the assigned number, computing a new estimate of the motion or depth value associated with the keyframe pixel using the estimate of the motion or depth value computed in the last preceding iteration as an initializing value; and
  
  assigning the last computed motion or depth value as the refined final estimate for the pixel of the keyframe.
101. The computer-readable memory of claim 72, wherein the sub-module for computing the final estimates for the pixels of a keyframe comprises sub-modules for:
- (a) selecting one of the keyframes;
  
  (b) generating a series of candidate motion or depth values for each pixel of the selected keyframe using the previously computed initial estimate of the motion or depth value for each pixel as a baseline value;
  
  (c) identifying one or more images which are adjacent in time or viewpoint to the selected keyframe and designating each of said images as a neighboring image;
  
  (d) choosing one of the candidate motion or depth values starting with the previously computed initial estimate of the motion or depth value for the chosen pixel;
  
  (e) choosing one of the neighboring images;
  
  (f) respectively computing the location of a matching pixel in the chosen neighboring image which corresponds to each pixel of the selected image;
  
  (g) identifying a desired characteristic exhibited by each matching pixel;
  
  (h) whenever the selected neighboring image pixels lack a previously estimated motion or depth value, computing the motion or depth value of a pixel in each of the neighboring images that correspond to a pixel of the selected keyframe based on the previously computed estimate of the motion or depth for the chosen pixel;
  
  (i) for each pixel of the selected keyframe, computing a first indicator indicative of the difference between said desired characteristic exhibited by a matching pixel and that exhibited by the corresponding pixel of the selected keyframe;
  
  (j) for each pixel of the selected keyframe, computing a second indicator indicative of the difference between the motion or depth value previously estimated for a pixel of the selected keyframe and that of its corresponding pixel in the neighboring image;
  
  (k) adding the first indicator and the second indicator associated with each keyframe pixel respectively to produce a combined indicator;
  
  (l) for each pixel of the selected keyframe, applying a weighting factor to the combined indicator to produce a combined weighted indicator, said weighting factor dictating the degree to which the chosen neighboring image will contribute to the estimation of the motion or depth values associated with the pixels of the selected keyframe;
  
  (m) repeating sub-module (e) through (l) for each of the remaining neighboring images;
  
  (n) for each pixel of the selected keyframe, summing the combined weighted indicators associated with the neighboring images to produce a cost factor for each keyframe pixel;
  
  (o) repeating sub-modules (d) through (n) for each of the remaining candidate motion or depth values;
  
  (p) identifying the lowest cost factor for each pixel of the selected keyframe among those produced using each candidate motion or depth value; and
  
  (q) respectively assigning the motion or depth value associated with each lowest cost factor as the final estimate of the motion or depth value for the associated pixel of the selected keyframe.
102. The computer-readable memory of claim 101 further comprising, performing sub-modules for:
- determining for each neighboring image whether each pixel of the selected keyframe is visible in the neighboring image by comparing the similarity between the motion or depth value previously computed for a keyframe pixel and the motion or depth value associated with the corresponding pixel of the neighboring image, said chosen pixel being visible in the neighboring image if the compared motion or depth values are similar within a prescribed error threshold; and
  
  whenever it is determined that a pixel is not visible in a neighboring image, employing other keyframe pixels in the vicinity of the chosen pixel to derive any pixel characteristic needed in estimating the motion or depth value for the keyframe pixel of interest, rather than using the characteristic actually exhibited by the pixel determined not to be visible.
103. The computer-readable memory of claim 72, wherein the sub-module for computing the final estimates for the pixels of a keyframe comprises sub-modules for:
- (a) selecting one of the keyframes;
  
  (b) generating a series of candidate motion or depth values for each pixel of the selected keyframe using the previously computed initial estimate of the motion or depth value for each pixel as a baseline value;
  
  (c) identifying one or more images which are adjacent in time or viewpoint to the selected keyframe and designating each of said images as a neighboring image;
  
  (d) choosing one of the candidate motion or depth values starting with the previously computed initial estimate of the motion or depth value for the chosen pixel;
  
  (e) choosing one of the neighboring images;
  
  (f) respectively computing the location of a matching pixel in the chosen neighboring image which corresponds to each pixel of the selected image;
  
  (g) identifying a desired characteristic exhibited by each matching pixel;
  
  (h) whenever the selected neighboring image pixels lack a previously estimated motion or depth value, computing the motion or depth value of a pixel in each of the neighboring images that correspond to a pixel of the selected keyframe based on the previously computed estimate of the motion or depth for the chosen pixel;
  
  (i) for each pixel of the selected keyframe, computing a first indicator indicative of the difference between said desired characteristic exhibited by a matching pixel and that exhibited by the corresponding pixel of the selected keyframe;
  
  (j) for each pixel of the selected keyframe, computing a second indicator indicative of the difference between the motion or depth value previously estimated for a pixel of the selected keyframe and that of its corresponding pixel in the neighboring image;
  
  (k) adding the first indicator and the second indicator associated with each keyframe pixel, respectively, to produce a combined indicator;
  
  (l) for each pixel of the selected keyframe, applying a weighting factor to the combined indicator to produce a combined weighted indicator, said weighting factor dictating the degree to which the chosen neighboring image will contribute to the estimation of the motion or depth values associated with the pixels of the selected keyframe;
  
  (m) repeating sub-modules (e) through (l) for each of the remaining neighboring images;
  
  (n) for each pixel of the selected keyframe, summing the combined weighted indicators associated with the neighboring images to produce a first cost factor for each keyframe pixel;
  
  (o) choosing a previously unselected pixel of the selected keyframe;
  
  (p) identifying a group of pixels in the selected keyframe which are physically adjacent to the chosen pixel and designating said pixels as neighboring pixels;
  
  (q) choosing one of the neighboring pixels;
  
  (r) computing a third indicator indicative of the difference between the chosen candidate motion or depth value of the selected keyframe pixel and a previously assigned motion or depth value of the chosen neighboring pixel;
  
  (s) repeating sub-modules (q) and (r) for each of the remaining neighboring pixels;
  
  (t) summing the computed third indicators associated with the neighboring pixels to produce a second cost factor;
  
  (u) adding the second cost factor associated with the chosen pixel of the selected keyframe to the pixel's first cost factor to produce a combined cost;
  
  (v) repeating sub-modules (o) through (u) for each of the remaining pixels of the keyframe image;
  
  (w) repeating sub-modules (d) through (v) for each of the remaining candidate motion or depth values;
  
  (x) identifying the lowest combined cost for each pixel of the selected keyframe among those produced using each candidate motion or depth value; and
  
  (y) respectively assigning the motion or depth value associated with each lowest combined cost as the final estimate of the motion or depth value for the associated pixel of the selected keyframe.
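
Sub-modules (o) through (y) add a smoothness cost to the first cost factor before the lowest-cost candidate is chosen. The sketch below shows that combination on one scanline. It makes several assumptions: the third indicator is taken to be an absolute difference, the neighboring pixels are the left and right neighbors only, and all pixels share one candidate list rather than per-pixel candidates built around each initial estimate. All names and numbers are hypothetical.

```python
from typing import List, Sequence


def smoothness_cost(candidate: float, neighbor_values: Sequence[float]) -> float:
    """Sub-modules (r)-(t): sum the third indicators over the neighboring pixels.
    The absolute difference is an assumed choice of indicator."""
    return sum(abs(candidate - v) for v in neighbor_values)


def refine_scanline(
    first_cost: List[List[float]], previous: List[float], candidates: List[float]
) -> List[float]:
    """first_cost[k][i] is the first cost factor of pixel i under candidates[k];
    previous[i] is the previously assigned value of pixel i."""
    n = len(previous)
    best_value = list(previous)
    best_cost = [float("inf")] * n
    for k, cand in enumerate(candidates):            # sub-module (w)
        for i in range(n):                           # sub-modules (o) and (v)
            # Sub-module (p): left and right neighbors that fall inside the scanline.
            nbrs = [previous[j] for j in (i - 1, i + 1) if 0 <= j < n]
            # Sub-module (u): combined cost = first cost factor + second cost factor.
            combined = first_cost[k][i] + smoothness_cost(cand, nbrs)
            if combined < best_cost[i]:              # sub-modules (x) and (y)
                best_cost[i] = combined
                best_value[i] = cand
    return best_value


previous = [1.0, 1.0, 3.0]
candidates = [1.0, 3.0]
first_cost = [
    [0.0, 2.0, 5.0],  # first cost factors under candidate 1.0
    [4.0, 1.5, 0.0],  # first cost factors under candidate 3.0
]
refined = refine_scanline(first_cost, previous, candidates)
print(refined)  # [1.0, 3.0, 3.0]
```

Here the middle pixel switches to 3.0 because its combined cost under that candidate (1.5 + 2.0) is lower than under 1.0 (2.0 + 2.0).
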
104. The computer-readable memory of claim 103 further comprising, performing sub-modules for:
- determining for each neighboring image whether each pixel of the selected keyframe is visible in the neighboring image by comparing the similarity between the motion or depth value previously computed for a keyframe pixel and the motion or depth value associated with the corresponding pixel of the neighboring image, said chosen pixel being visible in the neighboring image if the compared motion or depth values are similar within a prescribed error threshold; and
  
  whenever it is determined that a pixel is not visible in a neighboring image, employing other keyframe pixels in the vicinity of the chosen pixel to derive any pixel characteristic needed in estimating the motion or depth value for the keyframe pixel of interest, rather than using the characteristic actually exhibited by the pixel determined not to be visible.

105. A computer-implemented process for estimating motion or depth values for multiple images of a 3D scene, comprising using a computer to perform the following acts:
- inputting the multiple images of the 3D scene;
  
  selecting at least two images from the multiple images, hereafter referred to as keyframes;
  
  estimating a motion or depth value for each pixel of each keyframe by determining which values produce the minimum cost based on a three-part cost function comprising a pixel intensity compatibility term which characterizes the difference between the intensity exhibited by a pixel of a keyframe and that of a corresponding pixel in neighboring images, a motion or depth value compatibility term which characterizes the difference between the motion or depth estimate for a pixel of a keyframe and that of a corresponding pixel in neighboring images, and a flow smoothness term which characterizes the difference between the motion or depth estimate for a pixel of a keyframe and that of neighboring pixels in the same keyframe.
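
One way to write the three-part cost function of claim 105 is shown below. The notation is an assumption introduced here for illustration and does not appear in the claim: $s$ indexes keyframes, $N(s)$ is the set of images neighboring keyframe $s$, $x$ is a pixel of keyframe $s$, $x_t$ is the pixel of neighboring image $t$ corresponding to $x$ under the current estimate, $I$ denotes intensity, $d$ denotes the motion or depth estimate, $N(x)$ is the set of pixels adjacent to $x$ in the same keyframe, and $\rho_I$, $\rho_D$, $\rho_S$ are unspecified penalty functions.

```latex
C = \sum_{s} \sum_{x}
    \Bigg[
      \sum_{t \in N(s)}
        \Big(
          \underbrace{\rho_I\big(I_s(x) - I_t(x_t)\big)}_{\text{intensity compatibility}}
          +
          \underbrace{\rho_D\big(d_s(x) - d_t(x_t)\big)}_{\text{motion/depth compatibility}}
        \Big)
      +
      \underbrace{\sum_{y \in N(x)} \rho_S\big(d_s(x) - d_s(y)\big)}_{\text{flow smoothness}}
    \Bigg]
```

The estimate for each keyframe pixel is then the value of $d_s(x)$ that minimizes $C$. Per-image weights and visibility factors, which other claims describe, are omitted from this sketch.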

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Szeliski, Richard S.
Primary Examiner(s)
Johns, Andrew W.
Assistant Examiner(s)
Azarian, Seyed H.

Application Number

US09/334,857
Time in Patent Office

1,259 Days
Field of Search

382/167, 382/294, 382/154, 382/107, 382/284, 345/419, 345/502, 725/34
US Class Current

382/107
CPC Class Codes

G06T 2207/10012   Stereo images

G06T 2207/10021   Stereoscopic video; Stereos...

G06T 7/246   using feature-based methods...

G06V 10/10   Image acquisition document ...

G06V 2201/12   Acquisition of 3D measureme...

H04N 13/189   Recording image signals; Re...

H04N 2013/0081   Depth or disparity estimati...

H04N 2013/0085   Motion estimation from ster...
