Scene model generation from video for use in video processing

US 6,738,424 B1
Filed: 07/03/2000
Issued: 05/18/2004
Est. Priority Date: 12/27/1999
Status: Active Grant

First Claim

Patent Images

1. A method of generating and utilizing a three-dimensional cube-based scene model from a sequence of video frames, comprising the steps of:

separating background data from foreground data for each of said frames;

using an estimate of relative observer motion, projecting each frame onto a coordinate system used in generating said scene model;

merging the background data of the frame with the scene model;

detecting whether or not a current focal length in a frame has changed relative to a previous focal length of a previous frame, comprising the sub-steps of;

choosing a point on the right edge of a frame;

converting said point to cube coordinates;

choosing a corresponding point on the left edge of said frame, said corresponding point having a row coordinate in common with said point on the right edge of the frame;

converting said corresponding point to cube coordinates;

determining the size of an angle between vectors formed by connecting said points converted to cube coordinates with the origin of said cube coordinates; and

if said angle differs from that corresponding to said previous frame in an amount exceeding a particular value, determining that the focal length has changed; and

if said detecting step determines that the focal length has changed, creating a new scene model layer corresponding to the current focal length.

View all claims

7 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A method of generating and utilizing a scene model from a sequence of video frames produces a three-dimensional scene model, useful for video processing. The method separates foreground and background data. It uses an estimate of relative motion of an observer to project each frame onto a coordinate system of the three-dimensional scene model. It then merges the background data of a given frame into the scene model.

212 Citations

82 Claims

1. A method of generating and utilizing a three-dimensional cube-based scene model from a sequence of video frames, comprising the steps of:
- separating background data from foreground data for each of said frames;
  
  using an estimate of relative observer motion, projecting each frame onto a coordinate system used in generating said scene model;
  
  merging the background data of the frame with the scene model;
  
  detecting whether or not a current focal length in a frame has changed relative to a previous focal length of a previous frame, comprising the sub-steps of;
  
  choosing a point on the right edge of a frame;
  
  converting said point to cube coordinates;
  
  choosing a corresponding point on the left edge of said frame, said corresponding point having a row coordinate in common with said point on the right edge of the frame;
  
  converting said corresponding point to cube coordinates;
  
  determining the size of an angle between vectors formed by connecting said points converted to cube coordinates with the origin of said cube coordinates; and
  
  if said angle differs from that corresponding to said previous frame in an amount exceeding a particular value, determining that the focal length has changed; and
  
  if said detecting step determines that the focal length has changed, creating a new scene model layer corresponding to the current focal length.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 19)
- - 2. The method according to claim 1, said step of merging comprising the step of:
3. The method according to claim 1, said step of merging comprising the step of:
- updating the scene model by combining those data points of said background data of the frame that differ from corresponding data points of said scene model with said corresponding points of said scene model, said combining comprising at least one of the following methods;
  
  averaging, replacing, and blending.
4. The method according to claim 1, further comprising the steps of:
- compressing said scene model to obtain compressed scene model data;
  
  combining said compressed scene model data with compressed foreground data to obtain combined compressed data; and
  
  transmitting said combined compressed data to a destination.
5. The method according to claim 4, further comprising the steps of:
- receiving said combined compressed data at said destination;
  
  separating said combined compressed data into received compressed scene model data and received compressed foreground data;
  
  decompressing said received compressed scene model data to obtain decompressed scene model data;
  
  decompressing said received compressed foreground data to obtain decompressed foreground data; and
  
  combining said decompressed scene model data with said decompressed foreground data to reconstruct at least one frame of said sequence of video frames.
6. The method according to claim 1, wherein said scene model is a spherical scene model.
7. The method according to claim 1, wherein said merging step comprises the step of:
- storing a row or column of data where two cube faces meet as part of the data associated with both faces.
8. The method according to claim 1, wherein said merging step comprises the step of:
- minimizing storage requirements for the scene model.
9. The method according to claim 8, said step of minimizing storage requirements comprising the following sub-steps:
- allocating memory only to portions of cube faces that are necessary; and
  
  allocating a row of column pointers.
10. The method according to claim 9, said step of minimizing storage requirements further comprising the sub-step of:
- if said observer pans, setting appropriate ones of said column pointers to point to columns of data that need to be allocated as a result of the panning.
11. The method according to claim 9, said step of minimizing storage requirements further comprising the sub-step of:
- if said observer tilts, setting appropriate ones of said column pointers to point to columns of data that need to be allocated and extending columns of data as needed.
12. The method according to claim 8, said step of minimizing storage requirements comprising the following sub-steps:
- allocating memory only to portions of cube faces that are necessary; and
  
  allocating a column of row pointers.
13. The method according to claim 12, said step of minimizing storage requirements further comprising the sub-step of:
- if said observer pans, extending rows of data that need to be extended as a result of the panning.
14. The method according to claim 12, said step of minimizing storage requirements further comprising the sub-step of:
- if said observer tilts, setting appropriate ones of said row pointers to point to rows of data that need to be allocated and extending rows of data as needed.
19. A computer-readable medium containing software implementing the method of claim 1.

15. A method of generating and utilizing a three-dimensional cube-based scene model from a sequence of video frames, comprising the steps of:
- separating background data from foreground data for each of said frames;
  
  using an estimate of relative observer motion, projecting each frame onto a coordinate system used in generating said scene model; and
  
  mapping each point on the edge of a frame, F, through M−
  
  1, the inverse of observer motion estimation matrix M, to convert said point to a point in scene model coordinates; and
  
  for each cube face of the scene model to which an edge point of F is mapped, performing the following sub-steps;
  
  finding a box bounding all edge points of F mapped to said cube face; and
  
  for each point, (x,y) within said box, performing the following sub-sub-steps;
  
  mapping a vector defined by (x,y,FL), where FL is a focal length of said observer, through M to convert the vector to an image coordinate (a,b);
  
  using interpolation to determine a pixel value for (a,b) in F; and
  
  placing said pixel value for (a,b) at (x,y).
- View Dependent Claims (16)
- - 16. The method according to claim 15, wherein said step of using interpolation comprises the step of using bilinear interpolation to determine a pixel value for (a,b) in F.

17. A method of generating and utilizing a three-dimensional cube-based scene model from a sequence of video frames, comprising the steps of:
- separating background data from foreground data for each of said frames;
  
  using an estimate of relative observer motion projecting each frame onto a coordinate system used in generating said scene model;
  
  merging the background data of the frame with the scene model;
  
  reprojecting said scene model to an image, comprising the sub-steps of;
  
  for each point (x,y) in said image, performing the sub-sub-steps of;
  
  mapping a vector (x,y,FL), where FL is a focal length of said observer, through M−
  
  1, the inverse of an observer motion estimation matrix M, to obtain a vector V in the coordinate system of the cube-based scene model;
  
  transforming V to a point (a,b) on a cube face in the scene model;
  
  using interpolation to find a pixel value at the point (a,b); and
  
  placing the pixel value corresponding to point (a,b) at the point (x,y) in said image.
- View Dependent Claims (18)
- - 18. A method according to claim 17, said step of using interpolation comprising the step of using bilinear interpolation to find a pixel value at the point (a,b).

20. A method of generating and utilizing a three-dimensional scene model from a sequence of video frames, comprising the steps of:
- separating background data from foreground data for each of said frame;
  
  using an estimate of relative observer motion, projecting each frame onto a coordinate system used in generating said scene model; and
  
  merging the background data of the frame with the scene model, wherein said scene model is based on a three-dimensional surface;
  
  detecting whether or not a current focal length in a frame has changed relative to a previous focal length of a previous frame, comprising the sub-steps of;
  
  choosing a point on the right edge of a frame;
  
  converting said point to a point in a coordinate system of said three-dimensional surface;
  
  choosing a corresponding point on the left edge of said frame, said corresponding point having a row coordinate in common with said point on the right edge of the frame;
  
  converting said corresponding point to a point in the coordinate system of said three-dimensional surface;
  
  determining the size of an angle between vectors formed by connecting said points converted to points in the coordinate system of said three-dimensional surface with the origin of the coordinate system; and
  
  if said angle differs from that corresponding to said previous frame in an amount exceeding a particular value, determining that the focal length has changed; and
  
  if said detecting step determines that the focal length has changed, creating a new scene model layer corresponding to the current focal length.
- View Dependent Claims (21, 22, 23, 24, 25, 26)
- - 21. The method according to claim 20, said step of merging further comprising the step of mapping points of an image to points in said scene model using a coordinate transformation corresponding to said three-dimensional surface.
  - 22. The method according to claim 20, further comprising the step of reprojecting said scene model back to an image using a coordinate transformation corresponding to said three-dimensional surface.
  - 23. The method according to claim 20, wherein said merging step comprises the step of:
24. The method according to claim 20, wherein said merging step comprises the step of:
- minimizing storage requirements for the scene model.
25. The method according to claim 20, further comprising the steps of:
- compressing said scene model to obtain compressed scene model data;
  
  combining said compressed scene model data with compressed foreground data to obtain combined compressed data; and
  
  storing said combined compressed data on a computer-readable medium.
26. The method according to claim 25, further comprising the steps of:
- retrieving said combined compressed data from said computer-readable medium;
  
  separating said combined compressed data into retrieved compressed scene model data and retrieved compressed foreground data;
  
  decompressing said retrieved compressed scene model data to obtain decompressed scene model data;
  
  decompressing said retrieved compressed foreground data to obtain decompressed foreground data; and
  
  combining said decompressed scene model data with said decompressed foreground data to reconstruct at least one frame of said sequence of video frames.

27. A method of compressing and decompressing digital video data obtained from a video source, comprising the steps of:
- decomposing said digital video data into an integral sequence of frames obtained from said video source and corresponding to a single observer;
  
  computing a relative position and orientation of said observer based on a plurality of corresponding points from a plurality of frames of said sequence of frames;
  
  classifying motion by said observer;
  
  identifying regions of a video image corresponding to said digital video data containing moving objects and designating such objects as foreground objects;
  
  designating the remaining portions of said video image as background;
  
  encoding said foreground objects separately from said background;
  
  encoding said background by generating a three-dimensional cube-based scene model, comprising the following sub-steps;
  
  using an estimate of observer motion, projecting each frame onto a coordinate system used in generating said scene model; and
  
  merging the portion of said digital video data of said frame corresponding to said background with said scene model;
  
  detecting whether or not a current focal length in a frame has changed relative to a previous focal length of a previous frame, comprising the sub-steps of;
  
  choosing a point on the right edge of a frame;
  
  converting said point to cube coordinates;
  
  choosing a corresponding point on the left edge of said frame, said corresponding point having a row coordinate in common with said point on the right edge of the frame;
  
  converting said corresponding point to cube coordinates;
  
  determining the size of an angle between vectors formed by connecting said points converted to cube coordinates with the origin of said cube coordinates; and
  
  if said angle differs from that corresponding to said previous frame in an amount exceeding a particular value, determining that the focal length has changed; and
  
  if said detecting step determines that the focal length has changed, creating a new scene model layer corresponding to the current focal length.
- View Dependent Claims (28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45)
- - 28. The method according to claim 27, said step of merging comprising the step of:
29. The method according to claim 27, said step of merging comprising the step of:
- updating the scene model by combining those data points of said background data of the frame that differ from corresponding data points of said scene model with said corresponding points of said scene model, said combining comprising at least one of the following methods;
  
  averaging, replacing, and blending.
30. The method according to claim 27, further comprising the steps of:
- compressing said scene model to obtain compressed scene model data;
  
  combining said compressed scene model data with compressed foreground data to obtain combined compressed data; and
  
  transmitting said combined compressed data to a destination.
31. The method according to claim 30, further comprising the steps of:
- receiving said combined compressed data at said destination;
  
  separating said combined compressed data into received compressed scene model data and received compressed foreground data;
  
  decompressing said received compressed scene model data to obtain decompressed scene model data;
  
  decompressing said received compressed foreground data to obtain decompressed foreground data; and
  
  combining said decompressed scene model data with said decompressed foreground data to reconstruct at least one frame of said sequence of video frames.
32. The method according to claim 27, wherein said scene model is a spherical scene model.
33. The method according to claim 27, wherein said merging step comprises the step of:
- storing a row or column of data where two cube faces meet as part of the data associated with both faces.
34. The method according to claim 27, wherein said merging step comprises the step of:
- minimizing storage requirements for the scene model.
35. The method according to claim 34, said step of minimizing storage requirements comprising the following sub-steps:
- allocating memory only to portions of cube faces that are necessary; and
  
  allocating a row of column pointers.
36. The method according to claim 35, said step of minimizing storage requirements further comprising the sub-step of:
- if said observer pans, setting appropriate ones of said column pointers to point to columns of data that need to be allocated as a result of the panning.
37. The method according to claim 35, said step of minimizing storage requirements further comprising the sub-step of:
- if said observer tilts, setting appropriate ones of said column pointers to point to columns of data that need to be allocated and extending columns of data as needed.
38. The method according to claim 35, said step of minimizing storage requirements comprising the following sub-steps:
- allocating memory only to portions of cube faces that are necessary; and
  
  allocating a column of row pointers.
39. The method according to claim 38, said step of minimizing storage requirements further comprising the sub-step of:
- if said observer pans, extending rows of data that need to be extended as a result of the panning.
40. The method according to claim 38, said step of minimizing storage requirements further comprising the sub-step of:
- if said observer tilts, setting appropriate ones of said row pointers to point to rows of data that need to be allocated and extending rows of data as needed.
45. A computer-readable medium containing software implementing the method of claim 27.

41. A method of compressing and decompressing digital video data obtained from a video source, comprising the steps of:
- decomposing said digital video data into an integral sequence of frames obtained from said video source and corresponding to a single observer;
  
  computing a relative position and orientation of said observer based on a plurality of corresponding points from a plurality of frames of said sequence of frames;
  
  classifying motion by said observer;
  
  identifying regions of a video image corresponding to said digital video data containing moving objects and designating such objects as foreground objects;
  
  designating the remaining portions of said video image as background;
  
  encoding said foreground objects separately from said background; and
  
  encoding said background by generating a three-dimensional cube-based scene model, comprising the following sub-steps;
  
  using an estimate of observer motion, protecting each frame onto a coordinate system used in generating said scene model; and
  
  mapping each point on the edge of a frame, F, through M−
  
  1, the inverse of observer motion estimation matrix M, to convert said point to a point in scene model coordinates; and
  
  for each cube face of the scene model to which an edge point of F is mapped, performing the following sub-steps;
  
  finding a box bounding all edge points of F mapped to said cube face; and
  
  for each point, (x,y) within said box, performing the following sub-sub-steps;
  
  mapping a vector defined by (x,y,FL), where FL is a focal length of said observer, through M to convert the vector to an image coordinate (a,b);
  
  using interpolation to determine a pixel value for (a,b) in F; and
  
  placing said pixel value for (a,b) at (x,y).
- View Dependent Claims (42)
- - 42. The method according to claim 41, wherein said step of using interpolation comprises the step of using bilinear interpolation to determine a pixel value for (a,b) in F.

43. A method of compressing and decompressing digital video data obtained from a video source, comprising the steps of:
- decomposing said digital video data into an integral sequence of frames obtained from said video source and corresponding to a single observer;
  
  computing a relative position and orientation of said observer based on a plurality of corresponding points from a plurality of frames of said sequence of frames;
  
  classifying motion by said observer;
  
  identifying regions of a video image corresponding to said digital video data containing moving objects and designating such objects as foreground objects;
  
  designating the remaining portions of said video image as background;
  
  encoding said foreground objects separately from said background; and
  
  encoding said background by generating a three-dimensional cube-based scene model, comprising the following sub-steps;
  
  using an estimate of observer motion, projecting each frame onto a coordinate system used in generating said scene model; and
  
  merging the portion of said digital video data of said frame corresponding to said background with said scene model;
  
  reprojecting said scene model to an image, comprising the sub-steps of;
  
  for each point (x,y) in said image, performing the sub-sub-steps of;
  
  mapping a vector (x,y,FL), where FL is a focal length of said observer, through M−
  
  1, the inverse of an observer motion estimation matrix M, to obtain a vector V in the coordinate system of the cube-based scene model;
  
  transforming V to a point (a,b) on a cube face in the scene model;
  
  using interpolation to find a pixel value at the point (a,b); and
  
  placing the pixel value corresponding to point (a,b) at the point (x,y) in said image.
- View Dependent Claims (44)
- - 44. A method according to claim 43, said step of using interpolation comprising the step of using bilinear interpolation to find a pixel value at the point (a,b).

46. A method of compressing and decompressing digital video data obtained from a video source, comprising the steps of:
- decomposing said digital video data into an integral sequence of frames obtained from said video source and corresponding to a single observer;
  
  computing a relative position and orientation of said observer based on a plurality of corresponding points from a plurality of frames of said sequence of frames;
  
  classifying motion by said observer;
  
  identifying regions of a video image corresponding to said digital video data containing moving objects and designating such objects as foreground objects;
  
  designating the remaining portions of said video image as background;
  
  encoding said foreground objects separately from said background;
  
  encoding said background by generating a three-dimensional scene model, comprising the following sub-steps;
  
  using an estimate of observer motion, projecting each frame onto a coordinate system used in generating said scene model; and
  
  merging the portion of said digital video data of said frame corresponding to said background with said scene model, wherein said scene model is based on a three-dimensional surface;
  
  detecting whether or not a current focal length in a frame has changed relative to a previous focal length of a previous frame, comprising the sub-steps of;
  
  choosing a point on the right edge of a frame;
  
  converting said point to a point in a coordinate system of said three-dimensional surface;
  
  choosing a corresponding point on the left edge of said frame, said corresponding point having a row coordinate in common with said point on the right edge of the frame;
  
  converting said corresponding point to a point in the coordinate system of said three-dimensional surface;
  
  determining the size of an angle between vectors formed by connecting said points converted to points in the coordinate system of said three-dimensional surface with the origin of the coordinate system; and
  
  if said angle differs from that corresponding to said previous frame in an amount exceeding a particular value, determining that the focal length has changed; and
  
  if said detecting step determines that the focal length has changed, creating a new scene model layer corresponding to the current focal length.
- View Dependent Claims (47, 48, 49, 50, 51, 52)
- - 47. The method according to claim 46, said step of merging further comprising the step of mapping points of an image to points in said scene model using a coordinate transformation corresponding to said three-dimensional surface.
  - 48. The method according to claim 46, further comprising the step of reprojecting said scene model back to an image using a coordinate transformation corresponding to said three-dimensional surface.
  - 49. The method according to claim 46, wherein said merging step comprises the step of:
50. The method according to claim 46, wherein said merging step comprises the step of:
- minimizing storage requirements for the scene model.
51. The method according to claim 46, further comprising the steps of:
- compressing said scene model to obtain compressed scene model data;
  
  combining said compressed scene model data with compressed foreground data to obtain combined compressed data; and
  
  storing said combined compressed data on a computer-readable medium.
52. The method according to claim 51, further comprising the steps of:
- retrieving said combined compressed data from said computer-readable medium;
  
  separating said combined compressed data into retrieved compressed scene model data and retrieved compressed foreground data;
  
  decompressing said retrieved compressed scene model data to obtain decompressed scene model data;
  
  decompressing said retrieved compressed foreground data to obtain decompressed foreground data; and
  
  combining said decompressed scene model data with said decompressed foreground data to reconstruct at least one frame of said sequence of video frames.

53. A computer system capable of generating and utilizing a three-dimensional cube-based scene model from a sequence of video frames, comprising:
- a computer, including;
  
  storage means;
  
  input/output means; and
  
  processing means; and
  
  software means, programmed on a medium readable by said computer, comprising;
  
  means for separating background data from foreground data for each of said frames;
  
  means for projecting each of said frames onto a coordinate system used in generating said scene model, said means for projecting using an estimate of relative motion of an observer;
  
  means for merging the background data of the frame with the scene model;
  
  means for detecting whether or not a current focal length in a frame has changed relative to a previous focal length of a previous frame, comprising;
  
  means for choosing a point on the right edge of a frame;
  
  means for converting said point to cube coordinates;
  
  means for choosing a corresponding point on the left edge of said frame, said corresponding point having a row coordinate in common with said point on the right edge of the frame;
  
  means for converting said corresponding point to cube coordinates;
  
  means for determining the size of an angle between vectors formed by connecting said points converted to cube coordinates with the origin of said cube coordinates; and
  
  means for determining that the focal length has changed if said angle differs from that corresponding to said previous frame in an amount exceeding a particular value; and
  
  means for creating a new scene model layer corresponding to the current focal length if said means for detecting determines that the focal length has changed.
- View Dependent Claims (54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66)
- - 54. The computer system according to claim 53, said means for merging comprising:
55. The computer system according to claim 53, said means for merging comprising:
- means for updating the scene model by combining those data points of said background data of the frame that differ from corresponding data points of said scene model with said corresponding points of said scene model, said combining comprising at least one of the following;
  
  averaging, replacing, and blending.
56. The computer system according to claim 53, further comprising:
- means for compressing said scene model to obtain compressed scene model data;
  
  means for combining said compressed scene model data with compressed foreground data to obtain combined compressed data; and
  
  means for transmitting said combined compressed data to a destination.
57. The computer system according to claim 56, further comprising:
- means for receiving said combined compressed data at said destination;
  
  means for separating said combined compressed data into received compressed scene model data and received compressed foreground data;
  
  means for decompressing said received compressed scene model data to obtain decompressed scene model data;
  
  means for decompressing said received compressed foreground data to obtain decompressed foreground data; and
  
  means for combining said decompressed scene model data with said decompressed foreground data to reconstruct at least one frame of said sequence of video frames.
58. The computer system according to claim 53, wherein said scene model is a spherical scene model.
59. The computer system according to claims 53, wherein said means for merging comprises:
- means for storing in said storage means a row or column of data where two cube faces meet as part of the data associated with both faces.
60. The computer system according to claim 53, wherein said means for merging comprises:
- means for minimizing storage requirements for the scene model.
61. The computer system according to claim 60, wherein said means for minimizing storage requirements comprises:
- means for allocating memory only to portions of cube faces that are necessary;
  
  means for allocating a row of column pointers.
62. The computer system according to claim 61, wherein said means for minimizing storage requirements further comprises:
- means for setting appropriate ones of said column pointers to point to columns of data that need to be allocated as a result of any panning by said observer.
63. The computer system according to claim 61, wherein said means for minimizing storage requirements further comprises:
- means for setting appropriate ones of said column pointers to point to columns of data that need to be allocated and extending columns of data as needed if said observer tilts.
64. The computer system according to claim 60, wherein said means for minimizing storage requirements comprises:
- means for allocating memory only to portions of cube faces that are necessary; and
  
  means for allocating a row of column pointers.
65. The computer system according to claim 64, wherein said means for minimizing storage requirements further comprises:
- means for extending rows of data that need to be extended if said observer pans.
66. The computer system according to claim 64, wherein said means for minimizing storage requirements further comprises:
- means for setting appropriate ones of said row pointers to point to rows of data that need to be allocated and for extending rows of data as needed, if said observer tilts.

67. A computer system capable of generating and utilizing a three-dimensional cube-based scene model from a sequence of video frames, comprising:
- a computer, including;
  
  storage means;
  
  input/output means; and
  
  processing means; and
  
  software means, programmed on a medium readable by said computer, comprising;
  
  means for separating background data from foreground data for each of said frames;
  
  means for projecting each of said frames onto a coordinate system used in generating said scene model, said means for projecting using an estimate of relative motion of an observer;
  
  means for mapping each point on the edge of a frame, F, through M−
  
  1, the inverse of observer motion estimation matrix M, to convert said point to a point in scene model coordinates; and
  
  means for processing each cube face of the scene model to which an edge point of F is mapped, comprising;
  
  means for finding a box bounding all edge points of F mapped to said cube face; and
  
  means for manipulating each point, (x,y) within said box, comprising;
  
  means for mapping a vector defined by (x,y,FL), where FL is a focal length of said observer, through M to convert the vector to an image coordinate (a,b);
  
  interpolation means for determining a pixel value for (a,b) in F; and
  
  means for placing said pixel value for (a,b) at (x,y).
- View Dependent Claims (68)
- - 68. The computer system according to claim 67, wherein said interpolation means comprises bilinear interpolation means.

69. A computer system capable of generating and utilizing a three-dimensional cube-based scene model from a sequence of video frames, comprising:
- a computer, including;
  
  storage means;
  
  input/output means; and
  
  processing means; and
  
  software means, programmed on a medium readable by said computer, comprising;
  
  means for separating background data from foreground data for each of said frames;
  
  means for projecting each of said frames onto a coordinate system used in generating said scene model, said means for projecting using an estimate of relative motion of an observer; and
  
  means for merging the background data of the frame with the scene model, said software means further comprising;
  
  means for reprojecting said scene model to an image, comprising;
  
  means for processing each point (x,y) in said image, comprising;
  
  means for mapping a vector (x,y,FL), where FL is a focal length of said observer, through M−
  
  1, the inverse of an observer motion estimation matrix M, to obtain a vector V in the coordinate system of the cube-based scene model;
  
  means for transforming V to a point (a,b) on a cube face in the scene model;
  
  interpolation means for finding a pixel value at the point (a,b); and
  
  means for placing the pixel value corresponding to point (a,b) at the point (x,y) in said image.
- View Dependent Claims (70)
- - 70. The computer system according to claim 69, wherein said interpolation means comprises bilinear interpolation means.

71. A computer system capable of generating and utilizing a three-dimensional scene model from a sequence of video frames, comprising:
- a computer, including;
  
  storage means;
  
  input/output means; and
  
  processing means; and
  
  software means, programmed on a medium readable by said computer, comprising;
  
  means for separating background data from foreground data for each of said frames;
  
  means for projecting each of said frames onto a coordinate system used in generating said scene model, said means for projecting using an estimate of relative motion of an observer; and
  
  means for merging the background data of the frame with the scene model, wherein said scene model is based on a three-dimensional surface;
  
  means for detecting whether or not a current focal length in a frame has changed relative to a previous focal length of a previous frame, comprising;
  
  means for choosing a point on the right edge of a frame;
  
  means for converting said point to a point in a coordinate system corresponding to said three-dimensional surface;
  
  means for choosing a corresponding point on the left edge of said frame, said corresponding point having a row coordinate in common with said point on the right edge of the frame;
  
  means for converting said corresponding point to a point in the coordinate system corresponding to said three-dimensional surface;
  
  means for determining the size of an angle between vectors formed by connecting said points converted to points in the coordinate system corresponding to said three-dimensional surface with the origin of the coordinate system corresponding to said three-dimensional surface; and
  
  means for determining that the focal length has changed if said angle differs from that corresponding to said previous frame in an amount exceeding a particular value; and
  
  means for creating a new scene model layer corresponding to the current focal length if said means for detecting determines that the focal length has changed.
- View Dependent Claims (72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82)
- - 72. The computer system according to claim 71, said means for merging comprising:
73. The computer system according to claim 71, further comprising:
- means for reprojecting said scene model back to an image using a coordinate transformation corresponding to said three-dimensional surface.
74. The computer system according to claim 71, wherein said means for merging comprises:
- means for storing in said storage means data where two faces of said three-dimensional surface meet as part of the data associated with both faces.
75. The computer system according to claim 71, wherein said means for merging comprises:
- means for minimizing storage requirements for the scene model.
76. The computer system according to claim 71, further comprising:
- means for compressing said scene model to obtain compressed scene model data;
  
  means for combining said compressed scene model data with compressed foreground data to obtain combined compressed data; and
  
  means for storing said combined compressed data in a computer-readable medium.
77. The computer system according to claim 76, further comprising:
- means for retrieving said combined compressed data from said computer-readable medium;
  
  means for separating said combined compressed data into retrieved compressed scene model data and retrieved compressed foreground data;
  
  means for decompressing said retrieved compressed scene model data to obtain decompressed scene model data;
  
  means for decompressing said retrieved compressed foreground data to obtain decompressed foreground data; and
  
  means for combining said decompressed scene model data with said decompressed foreground data to reconstruct at least one frame of said sequence of video frames.
78. The computer system of claims 71, said software means further comprising:
- means for decomposing a video image into an integral sequence of frames;
  
  means for computing a relative position and orientation of a single observer generating said video data, based on a plurality of corresponding points taken from a plurality of said frames; and
  
  means for classifying motion of said observer; and
  
  means for encoding said foreground data separately from said background data;
  
  wherein said foreground data corresponds to regions of a video image containing moving objects, and said background data corresponds to remaining regions of a video image.
79. A system for compressing and decompressing digital video data obtained from a video source, the system being connected to a communication network, the system comprising:
- the computer system according to claim 71, communicatively connected to said communication network;
  
  at least one viewing system, communicatively connected to said communication network, comprising;
  
  means for decompressing compressed video data; and
  
  means for displaying decompressed video data.
80. The system according to claim 79, further comprising:
- a video server, communicatively connected to said communication network, for uploading compressed video data from said computer system and for downloading said compressed video data to said at least one viewing system.
81. The system according to claim 79, wherein compressed video data from said computer system is transmitted directly to said at least one viewing system.
82. The system according to claim 79, wherein compressed video data is stored on a computer-readable medium by said computer system, and said at least one viewing system further comprises means for reading said computer-readable medium.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Avigilon Fortress Corporation
Original Assignee
ObjectVideo Incorporated (Alarm.com Holdings, Inc.)
Inventors
Allmen, Mark, Severson, William, Debrunner, Chris, Strat, Thomas M.
Primary Examiner(s)
Kelley, Chris
Assistant Examiner(s)
CZEKAJ, DAVID J

Application Number

US09/609,919
Time in Patent Office

1,415 Days
Field of Search

375/240.01, 375/240.08, 382/154
US Class Current

375/240.08
CPC Class Codes

G06T 9/001 Model-based coding, e.g. wi...

H04N 19/23 with coding of regions that...

Scene model generation from video for use in video processing

First Claim

7 Assignments

0 Petitions

Accused Products

Abstract

212 Citations

82 Claims

Specification

Solutions

Use Cases

Quick Links

Scene model generation from video for use in video processing

First Claim

7 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

212 Citations

82 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links