Image insertion in video streams using a combination of physical sensors and pattern recognition
13 Assignments
0 Petitions
Abstract
A live video insertion system (LVIS) is disclosed that allows insertion of static or dynamic images into a live video broadcast in a realistic fashion on a real time basis. Initially, natural landmarks in a scene that are suitable for subsequent detection and tracking are selected. The landmarks are typically distributed throughout the entire scene, such as a ballpark or football stadium. The field of view of the camera at any instant is normally significantly smaller than the full scene that may be panned. The LVIS uses a combination of pattern recognition techniques and camera sensor data (e.g., pan, tilt, zoom, etc.) to locate, verify and track target data. Camera sensors are well suited for the searching requirements of an LVIS, while pattern recognition and landmark tracking techniques are better suited for the image tracking requirements of an LVIS.
282 Citations
29 Claims
1. A method for tracking motion from field to field in a sequence of related video images that are scanned by at least one camera having one or more hardware sensor devices, the method comprising the steps of:
a) establishing an array of idealized x and y coordinates representing a reference array having a plurality of landmarks where each landmark has unique x and y coordinates;
b) mapping x and y coordinates in a current image to said x and y coordinates in said reference array;
c) acquiring camera sensor data from said hardware sensor device, said camera sensor data representing the position and orientation of the camera;
d) predicting the future location of said landmark coordinates, x' and y', using said camera sensor data, wherein prediction errors due to changes between two successive fields are minimized by adding (i) the field to field difference in landmark location calculated from said camera sensor data to (ii) the landmark position x, y previously located.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
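The prediction step of claim 1(d) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the pan/tilt/zoom projection model and all function and variable names are assumptions made for the example.

```python
# Sketch of claim 1(d): the predicted landmark position x', y' is the
# previously located position plus the field-to-field displacement
# derived from camera sensor data. Names and the projection model are
# illustrative assumptions, not taken from the patent.

def sensor_position(pan, tilt, zoom, landmark_xy):
    """Hypothetical projection of a reference landmark into image
    coordinates from pan/tilt/zoom readings (toy linear model in which
    zoom scales the image and pan/tilt shift it)."""
    x_ref, y_ref = landmark_xy
    return (zoom * x_ref + pan, zoom * y_ref + tilt)

def predict_landmark(prev_located_xy, sensor_prev, sensor_curr, landmark_xy):
    """x' = x + (sensor-derived position this field minus last field)."""
    sx0, sy0 = sensor_position(*sensor_prev, landmark_xy)
    sx1, sy1 = sensor_position(*sensor_curr, landmark_xy)
    x, y = prev_located_xy
    return (x + (sx1 - sx0), y + (sy1 - sy0))
```

Because only the sensor-derived *difference* is added to the optically located position x, y, a constant bias in the sensor model cancels out, which is why the claim phrases the correction as a field-to-field delta.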
17. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the orientation and settings of the primary video stream source camera, said method comprising the steps of:
t) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the orientation and settings of the camera,
u) converting the camera sensor data to a format suitable for transmission,
v) transmitting the converted camera sensor data to a live video insertion system,
w) converting the camera sensor data to affine form,
x) predicting where landmarks in the previous field of video will be in the current field of video based upon said camera sensor data,
y) performing correlations to detect landmark positions centered about landmark positions predicted by the camera sensor data, and
z) creating a model relating a reference field of video to the current field of video using a weighted least mean square fit for all located landmarks.
View Dependent Claims (18, 19)
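Steps w) and z) of claim 17 amount to fitting an affine model between reference coordinates and current-field image coordinates by a weighted least mean square fit over the located landmarks. A minimal NumPy sketch, under the assumption of a six-parameter affine model and per-landmark scalar weights (the function name and weighting scheme are illustrative):

```python
import numpy as np

def fit_affine_weighted(ref_pts, img_pts, weights):
    """Fit x_img = a*x_ref + b*y_ref + tx and y_img = c*x_ref + d*y_ref + ty
    by weighted least squares over all located landmarks.  Rows are scaled
    by sqrt(weight) so that the squared residuals are weighted linearly."""
    ref = np.asarray(ref_pts, dtype=float)
    img = np.asarray(img_pts, dtype=float)
    w = np.sqrt(np.asarray(weights, dtype=float))
    A = np.hstack([ref, np.ones((len(ref), 1))])  # columns: x_ref, y_ref, 1
    Aw = A * w[:, None]
    px, *_ = np.linalg.lstsq(Aw, img[:, 0] * w, rcond=None)  # (a, b, tx)
    py, *_ = np.linalg.lstsq(Aw, img[:, 1] * w, rcond=None)  # (c, d, ty)
    return px, py
```

Down-weighting a landmark (e.g. one near an occlusion) reduces its pull on the model without discarding it entirely, which is the role the weights play throughout these claims.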
20. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the orientation and settings of the primary video stream source camera, said method comprising the steps of:
aa) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the orientation and settings of the camera,
bb) converting the camera sensor data to a format suitable for transmission,
cc) transmitting the converted camera sensor data to a live video insertion system,
dd) converting the camera sensor data to affine form,
ee) performing correlations to detect landmark positions centered about landmark positions predicted by the camera sensor data,
ff) creating virtual landmarks using said camera sensor data, said virtual landmarks appropriately weighted for camera sensor data error, and
gg) creating a model relating a reference field of video to the current field of video using a weighted least mean square fit for all located and virtual landmarks.
View Dependent Claims (21, 22)
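Step ff) of claim 20 introduces virtual landmarks: landmarks that correlation could not locate (occluded or out of frame) are instead placed at their sensor-predicted positions and down-weighted to reflect camera sensor error, so the fit in step gg) runs over both kinds. A sketch under assumed data shapes (the weight value and all names are illustrative):

```python
def combined_landmarks(located, missing_refs, sensor_predict, sensor_weight=0.1):
    """Merge optically located and virtual landmarks for a weighted fit.

    located:      list of (ref_xy, img_xy, weight) found by correlation.
    missing_refs: reference landmarks that correlation failed to locate;
                  each becomes a 'virtual' landmark at the image position
                  predicted from camera sensor data, carrying a reduced
                  weight to account for sensor error (value assumed here).
    """
    virtual = [(ref, sensor_predict(ref), sensor_weight) for ref in missing_refs]
    return located + virtual
```

Keeping virtual landmarks in the fit stabilizes the model when few real landmarks are visible, while the low weight prevents noisy sensor data from overriding good optical matches.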
23. A method for tracking motion from field to field in a sequence of related video images that are scanned by at least one camera having one or more hardware sensor devices, the method comprising the steps of:
hh) obtaining a set of image templates from a current video image that meet certain template capturing criteria and storing said image templates in memory;
ii) acquiring camera sensor data from said hardware sensor devices, said camera sensor data representing the position and orientation of the camera;
jj) using said camera sensor data in determining the position of each stored image template with respect to the current image;
kk) calculating a transform model using the determined template position with respect to the current image, said transform model to be used to correspond reference position data to current image position data;
ll) purging image templates from memory that do not meet certain template retention criteria; and
mm) obtaining new image templates from said current image to replace the image templates that were purged.
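Steps ll) and mm) of claim 23 describe a template lifecycle: templates failing a retention criterion are purged and replaced with fresh captures from the current image. A sketch with hypothetical criterion and capture callables (nothing below is from the patent):

```python
def update_templates(templates, retain, capture, target_count):
    """Purge and replenish the tracked template set (claim 23, ll-mm).

    retain:  hypothetical predicate, True if a template still meets the
             retention criteria (e.g. good correlation score, in frame).
    capture: hypothetical factory that grabs a new template from the
             current image meeting the capture criteria.
    """
    kept = [t for t in templates if retain(t)]      # ll) purge failures
    while len(kept) < target_count:                 # mm) replace them
        kept.append(capture())
    return kept
```

Continuously recycling templates this way lets tracking survive a full camera pan: templates that drift out of view are dropped while new ones are captured from whatever is currently visible.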
24. A method for tracking motion from field to field in a sequence of related video images that are scanned by at least one camera having hardware sensor devices, said hardware sensor devices to include an accelerometer, the method comprising the steps of:
nn) establishing an array of idealized x and y coordinates representing a reference array having a plurality of landmarks where each landmark has unique x and y coordinates;
oo) mapping x and y coordinates in a current image to said x and y coordinates in said reference array;
pp) acquiring camera sensor data from said hardware sensor devices, said camera sensor data representing the position, orientation, and oscillation of the camera;
qq) predicting the future location of said landmark coordinates, x' and y', using said camera sensor data, wherein prediction errors due to changes between two successive fields are minimized by adding (i) the field to field difference in landmark location calculated from said camera sensor data to (ii) the landmark position x, y previously located.
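Claim 24 extends the prediction of claim 1 with an accelerometer so that camera oscillation (e.g. platform vibration) can be folded into the predicted landmark position. One plausible sketch is to double-integrate the acceleration over a field interval; the constants, gain, and names are all assumptions for illustration (59.94 Hz is the NTSC field rate):

```python
def predict_with_oscillation(pred_xy, accel_xy, dt=1 / 59.94, gain=1.0):
    """Add an oscillation term, derived from accelerometer readings, to a
    sensor-based landmark prediction (claim 24, pp-qq).  Assumes constant
    acceleration over one field interval; gain maps the physical
    displacement into pixels and is a placeholder here."""
    x, y = pred_xy
    ax, ay = accel_xy
    dx = 0.5 * ax * dt * dt * gain  # displacement under constant acceleration
    dy = 0.5 * ay * dt * dt * gain
    return (x + dx, y + dy)
```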
25. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the orientation and settings of the primary video stream source camera, said method comprising the steps of:
rr) obtaining a set of image templates from a current video image that meet certain template capturing criteria and storing said image templates in memory,
ss) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the orientation and settings of the camera,
tt) converting the camera sensor data to a format suitable for transmission,
uu) transmitting the converted camera sensor data to a live video insertion system,
vv) converting the camera sensor data to affine form,
ww) predicting where image templates in the previous field of video will be in the current field of video based upon said camera sensor data,
xx) performing correlations to detect image template positions centered about image template positions predicted by the camera sensor data,
yy) creating a model relating a reference field of video to the current field of video using a weighted least mean square fit for all image templates,
zz) purging image templates from memory that do not meet certain template retention criteria, and
aaa) obtaining new image templates from said current image to replace the image templates that were purged.
26. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by camera oscillation and changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the motion, orientation and settings of the primary video stream source camera, said method comprising the steps of:
bbb) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the acceleration, orientation and settings of the camera,
ccc) converting the camera sensor data to a format suitable for transmission,
ddd) transmitting the converted camera sensor data to a live video insertion system,
eee) converting the camera sensor data to affine form,
fff) predicting where landmarks in the previous field of video will be in the current field of video based upon said camera sensor data,
ggg) performing correlations to detect landmark positions centered about landmark positions predicted by the camera sensor data, and
hhh) creating a model relating a reference field of video to the current field of video using a weighted least mean square fit for all located landmarks.
27. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the orientation and settings of the primary video stream source camera, said method comprising the steps of:
iii) obtaining a set of image templates from a current video image that meet certain template capturing criteria and storing said image templates in memory,
jjj) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the orientation and settings of the camera,
kkk) converting the camera sensor data to a format suitable for transmission,
lll) transmitting the converted camera sensor data to a live video insertion system,
mmm) converting the camera sensor data to affine form,
nnn) performing correlations to detect image template positions centered about image template positions predicted by the camera sensor data,
ooo) creating virtual image templates using said camera sensor data, said virtual image templates appropriately weighted for camera sensor data error,
ppp) creating a model relating a reference field of video to the current field of video using a weighted least mean square fit for all located and virtual image templates,
qqq) purging image templates from memory that do not meet certain template retention criteria, and
rrr) obtaining new image templates from said current image to replace the image templates that were purged.
28. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by camera oscillation and changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the acceleration, orientation and settings of the primary video stream source camera, said method comprising the steps of:
sss) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the acceleration, orientation and settings of the camera,
ttt) converting the camera sensor data to a format suitable for transmission,
uuu) transmitting the converted camera sensor data to a live video insertion system,
vvv) converting the camera sensor data to affine form,
www) performing correlations to detect landmark positions centered about landmark positions predicted by the camera sensor data,
xxx) creating virtual landmarks using said camera sensor data, said virtual landmarks appropriately weighted for camera sensor data error, and
yyy) creating a model relating a reference field of video to the current field of video using a weighted least mean square fit for all located and virtual landmarks.
29. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the orientation and settings of the primary video stream source camera, said method comprising the steps of:
zzz) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the orientation and settings of the camera,
aaaa) converting the camera sensor data to a format suitable for transmission,
bbbb) transmitting the converted camera sensor data to a live video insertion system,
cccc) converting the camera sensor data to a form and a coordinate system useable by the live video insertion system,
dddd) predicting where landmarks will be in the current field of video based on said camera sensor data,
eeee) creating a model relating a reference field of video to the current field of video using a weighted least mean squares fit for all located landmarks,
ffff) obtaining a set of image templates from a current video image that meet certain template capturing criteria and storing said image templates in memory,
gggg) in subsequent fields of video using the predicted positions of said image templates as a starting point to determine the current position of each stored image template,
hhhh) in subsequent fields of video calculating a transform model using the determined template positions to correspond reference position data to image position data in those subsequent fields,
iiii) purging image templates from memory that do not meet certain template retention criteria, and
jjjj) obtaining new image templates from said current image to replace the image templates that were purged.
Specification