Image insertion in video streams using a combination of physical sensors and pattern recognition
13 Assignments
0 Petitions
Abstract
A live video insertion system (LVIS) is disclosed that allows insertion of static or dynamic images into a live video broadcast in a realistic fashion on a real time basis. Initially, natural landmarks in a scene that are suitable for subsequent detection and tracking are selected. The landmarks are typically distributed throughout the entire scene, such as a ballpark or football stadium. The field of view of the camera at any instant is normally significantly smaller than the full scene that may be panned. The LVIS uses a combination of pattern recognition techniques and camera sensor data (e.g., pan, tilt, zoom, etc.) to locate, verify and track target data. Camera sensors are well suited for the searching requirements of an LVIS, while pattern recognition and landmark tracking techniques are better suited for the image tracking requirements of an LVIS.
282 Citations
29 Claims
1. A method for tracking motion from field to field in a sequence of related video images that are scanned by at least one camera having one or more hardware sensor devices, the method comprising the steps of:
a) establishing an array of idealized x and y coordinates representing a reference array having a plurality of landmarks where each landmark has unique x and y coordinates;
b) mapping x and y coordinates in a current image to said x and y coordinates in said reference array;
c) acquiring camera sensor data from said hardware sensor device, said camera sensor data representing the position and orientation of the camera;
d) predicting the future location of said landmark coordinates, x' and y', using said camera sensor data, wherein prediction errors due to changes between two successive fields are minimized by adding (i) the field to field difference in landmark location calculated from said camera sensor data to (ii) the landmark position x, y previously located.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
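The prediction step of claim 1(d) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the pan/tilt/zoom projection model and all function and variable names are assumptions made for the example.

```python
# Sketch of claim 1(d): the predicted landmark position x', y' is the
# previously located position plus the field-to-field displacement
# derived from camera sensor data. Names and the projection model are
# illustrative assumptions, not taken from the patent.

def sensor_position(pan, tilt, zoom, landmark_xy):
    """Hypothetical projection of a reference landmark into image
    coordinates from pan/tilt/zoom readings (toy linear model in which
    zoom scales the image and pan/tilt shift it)."""
    x_ref, y_ref = landmark_xy
    return (zoom * x_ref + pan, zoom * y_ref + tilt)

def predict_landmark(prev_located_xy, sensor_prev, sensor_curr, landmark_xy):
    """x' = x + (sensor-derived position this field minus last field)."""
    sx0, sy0 = sensor_position(*sensor_prev, landmark_xy)
    sx1, sy1 = sensor_position(*sensor_curr, landmark_xy)
    x, y = prev_located_xy
    return (x + (sx1 - sx0), y + (sy1 - sy0))
```

Because only the sensor-derived *difference* is added to the optically located position x, y, a constant bias in the sensor model cancels out, which is why the claim phrases the correction as a field-to-field delta.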
17. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the orientation and settings of the primary video stream source camera, said method comprising the steps of:
t) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the orientation and settings of the camera,
u) converting the camera sensor data to a format suitable for transmission,
v) transmitting the converted camera sensor data to a live video insertion system,
w) converting the camera sensor data to affine form,
x) predicting where landmarks in the previous field of video will be in the current field of video based upon said camera sensor data,
y) performing correlations to detect landmark positions centered about landmark positions predicted by the camera sensor data, and
z) creating a model relating a reference field of video to the current field of video using a weighted least mean square fit for all located landmarks.
View Dependent Claims (18, 19)
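Steps w) and z) of claim 17 amount to fitting an affine model between reference coordinates and current-field image coordinates by a weighted least mean square fit over the located landmarks. A minimal NumPy sketch, under the assumption of a six-parameter affine model and per-landmark scalar weights (the function name and weighting scheme are illustrative):

```python
import numpy as np

def fit_affine_weighted(ref_pts, img_pts, weights):
    """Fit x_img = a*x_ref + b*y_ref + tx and y_img = c*x_ref + d*y_ref + ty
    by weighted least squares over all located landmarks.  Rows are scaled
    by sqrt(weight) so that the squared residuals are weighted linearly."""
    ref = np.asarray(ref_pts, dtype=float)
    img = np.asarray(img_pts, dtype=float)
    w = np.sqrt(np.asarray(weights, dtype=float))
    A = np.hstack([ref, np.ones((len(ref), 1))])  # columns: x_ref, y_ref, 1
    Aw = A * w[:, None]
    px, *_ = np.linalg.lstsq(Aw, img[:, 0] * w, rcond=None)  # (a, b, tx)
    py, *_ = np.linalg.lstsq(Aw, img[:, 1] * w, rcond=None)  # (c, d, ty)
    return px, py
```

Down-weighting a landmark (e.g. one near an occlusion) reduces its pull on the model without discarding it entirely, which is the role the weights play throughout these claims.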
20. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the orientation and settings of the primary video stream source camera, said method comprising the steps of:
aa) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the orientation and settings of the camera,
bb) converting the camera sensor data to a format suitable for transmission,
cc) transmitting the converted camera sensor data to a live video insertion system,
dd) converting the camera sensor data to affine form,
ee) performing correlations to detect landmark positions centered about landmark positions predicted by the camera sensor data,
ff) creating virtual landmarks using said camera sensor data, said virtual landmarks appropriately weighted for camera sensor data error, and
gg) creating a model relating a reference field of video to the current field of video using a weighted least mean square fit for all located and virtual landmarks.
View Dependent Claims (21, 22)
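Step ff) of claim 20 introduces virtual landmarks: landmarks that correlation could not locate (occluded or out of frame) are instead placed at their sensor-predicted positions and down-weighted to reflect camera sensor error, so the fit in step gg) runs over both kinds. A sketch under assumed data shapes (the weight value and all names are illustrative):

```python
def combined_landmarks(located, missing_refs, sensor_predict, sensor_weight=0.1):
    """Merge optically located and virtual landmarks for a weighted fit.

    located:      list of (ref_xy, img_xy, weight) found by correlation.
    missing_refs: reference landmarks that correlation failed to locate;
                  each becomes a 'virtual' landmark at the image position
                  predicted from camera sensor data, carrying a reduced
                  weight to account for sensor error (value assumed here).
    """
    virtual = [(ref, sensor_predict(ref), sensor_weight) for ref in missing_refs]
    return located + virtual
```

Keeping virtual landmarks in the fit stabilizes the model when few real landmarks are visible, while the low weight prevents noisy sensor data from overriding good optical matches.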
23. A method for tracking motion from field to field in a sequence of related video images that are scanned by at least one camera having one or more hardware sensor devices, the method comprising the steps of:
hh) obtaining a set of image templates from a current video image that meet certain template capturing criteria and storing said image templates in memory;
ii) acquiring camera sensor data from said hardware sensor devices, said camera sensor data representing the position and orientation of the camera;
jj) using said camera sensor data in determining the position of each stored image template with respect to the current image;
kk) calculating a transform model using the determined template position with respect to the current image, said transform model to be used to correspond reference position data to current image position data;
ll) purging image templates from memory that do not meet certain template retention criteria; and
mm) obtaining new image templates from said current image to replace the image templates that were purged.
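Steps ll) and mm) of claim 23 describe a template lifecycle: templates failing a retention criterion are purged and replaced with fresh captures from the current image. A sketch with hypothetical criterion and capture callables (nothing below is from the patent):

```python
def update_templates(templates, retain, capture, target_count):
    """Purge and replenish the tracked template set (claim 23, ll-mm).

    retain:  hypothetical predicate, True if a template still meets the
             retention criteria (e.g. good correlation score, in frame).
    capture: hypothetical factory that grabs a new template from the
             current image meeting the capture criteria.
    """
    kept = [t for t in templates if retain(t)]      # ll) purge failures
    while len(kept) < target_count:                 # mm) replace them
        kept.append(capture())
    return kept
```

Continuously recycling templates this way lets tracking survive a full camera pan: templates that drift out of view are dropped while new ones are captured from whatever is currently visible.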
24. A method for tracking motion from field to field in a sequence of related video images that are scanned by at least one camera having hardware sensor devices, said hardware sensor devices to include an accelerometer, the method comprising the steps of:
nn) establishing an array of idealized x and y coordinates representing a reference array having a plurality of landmarks where each landmark has unique x and y coordinates;
oo) mapping x and y coordinates in a current image to said x and y coordinates in said reference array;
pp) acquiring camera sensor data from said hardware sensor devices, said camera sensor data representing the position, orientation, and oscillation of the camera;
qq) predicting the future location of said landmark coordinates, x' and y', using said camera sensor data, wherein prediction errors due to changes between two successive fields are minimized by adding (i) the field to field difference in landmark location calculated from said camera sensor data to (ii) the landmark position x, y previously located.
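Claim 24 extends the prediction of claim 1 with an accelerometer so that camera oscillation (e.g. platform vibration) can be folded into the predicted landmark position. One plausible sketch is to double-integrate the acceleration over a field interval; the constants, gain, and names are all assumptions for illustration (59.94 Hz is the NTSC field rate):

```python
def predict_with_oscillation(pred_xy, accel_xy, dt=1 / 59.94, gain=1.0):
    """Add an oscillation term, derived from accelerometer readings, to a
    sensor-based landmark prediction (claim 24, pp-qq).  Assumes constant
    acceleration over one field interval; gain maps the physical
    displacement into pixels and is a placeholder here."""
    x, y = pred_xy
    ax, ay = accel_xy
    dx = 0.5 * ax * dt * dt * gain  # displacement under constant acceleration
    dy = 0.5 * ay * dt * dt * gain
    return (x + dx, y + dy)
```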
25. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the orientation and settings of the primary video stream source camera, said method comprising the steps of:
rr) obtaining a set of image templates from a current video image that meet certain template capturing criteria and storing said image templates in memory,
ss) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the orientation and settings of the camera,
tt) converting the camera sensor data to a format suitable for transmission,
uu) transmitting the converted camera sensor data to a live video insertion system,
vv) converting the camera sensor data to affine form,
ww) predicting where image templates in the previous field of video will be in the current field of video based upon said camera sensor data,
xx) performing correlations to detect image template positions centered about image template positions predicted by the camera sensor data,
yy) creating a model relating a reference field of video to the current field of video using a weighted least mean square fit for all image templates,
zz) purging image templates from memory that do not meet certain template retention criteria, and
aaa) obtaining new image templates from said current image to replace the image templates that were purged.
26. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by camera oscillation and changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the motion, orientation and settings of the primary video stream source camera, said method comprising the steps of:
bbb) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the acceleration, orientation and settings of the camera,
ccc) converting the camera sensor data to a format suitable for transmission,
ddd) transmitting the converted camera sensor data to a live video insertion system,
eee) converting the camera sensor data to affine form,
fff) predicting where landmarks in the previous field of video will be in the current field of video based upon said camera sensor data,
ggg) performing correlations to detect landmark positions centered about landmark positions predicted by the camera sensor data, and
hhh) creating a model relating a reference field of video to the current field of video using a weighted least mean square fit for all located landmarks.
27. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the orientation and settings of the primary video stream source camera, said method comprising the steps of:
iii) obtaining a set of image templates from a current video image that meet certain template capturing criteria and storing said image templates in memory,
jjj) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the orientation and settings of the camera,
kkk) converting the camera sensor data to a format suitable for transmission,
lll) transmitting the converted camera sensor data to a live video insertion system,
mmm) converting the camera sensor data to affine form,
nnn) performing correlations to detect image template positions centered about image template positions predicted by the camera sensor data,
ooo) creating virtual image templates using said camera sensor data, said virtual image templates appropriately weighted for camera sensor data error,
ppp) creating a model relating a reference field of video to the current field of video using a weighted least mean square fit for all located and virtual image templates,
qqq) purging image templates from memory that do not meet certain template retention criteria, and
rrr) obtaining new image templates from said current image to replace the image templates that were purged.
28. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by camera oscillation and changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the acceleration, orientation and settings of the primary video stream source camera, said method comprising the steps of:
sss) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the acceleration, orientation and settings of the camera,
ttt) converting the camera sensor data to a format suitable for transmission,
uuu) transmitting the converted camera sensor data to a live video insertion system,
vvv) converting the camera sensor data to affine form,
www) performing correlations to detect landmark positions centered about landmark positions predicted by the camera sensor data,
xxx) creating virtual landmarks using said camera sensor data, said virtual landmarks appropriately weighted for camera sensor data error, and
yyy) creating a model relating a reference field of video to the current field of video using a weighted least mean square fit for all located and virtual landmarks.
29. A method of merging a primary video stream into a secondary video stream so that the combined video stream appears to have a common origin from video field to video field even as the primary video stream is modulated by changes in camera orientation and settings, said apparent common origin achieved by using pattern recognition analysis of the primary video stream to stabilize and refine camera sensor data representing the orientation and settings of the primary video stream source camera, said method comprising the steps of:
zzz) acquiring camera sensor data from at least one camera outfitted with hardware sensors which measure the orientation and settings of the camera,
aaaa) converting the camera sensor data to a format suitable for transmission,
bbbb) transmitting the converted camera sensor data to a live video insertion system,
cccc) converting the camera sensor data to a form and a coordinate system useable by the live video insertion system,
dddd) predicting where landmarks will be in the current field of video based on said camera sensor data,
eeee) creating a model relating a reference field of video to the current field of video using a weighted least mean squares fit for all located landmarks,
ffff) obtaining a set of image templates from a current video image that meet certain template capturing criteria and storing said image templates in memory,
gggg) in subsequent fields of video using the predicted positions of said image templates as a starting point to determine the current position of each stored image template,
hhhh) in subsequent fields of video calculating a transform model using the determined template positions to correspond reference position data to image position data in those subsequent fields,
iiii) purging image templates from memory that do not meet certain template retention criteria, and
jjjj) obtaining new image templates from said current image to replace the image templates that were purged.
Specification