Automated synchronization of video image sequences to new soundtracks
Abstract
The synchronization of an existing video to a new soundtrack is carried out through the phonetic analysis of the original soundtrack and the new soundtrack. Individual speech sounds, such as phones, are identified in the soundtrack for the original video recording, and the images corresponding thereto are stored. The new soundtrack is similarly analyzed to identify individual speech sounds, which are used to select the stored images and create a new video sequence. The images in the sequence are then smoothly fitted to one another to provide a video stream that is synchronized to the new soundtrack. This approach permits a given video sequence to be synchronized to any arbitrary utterance. Furthermore, the matching of the video images to the new speech sounds can be carried out in a highly automated manner, thereby reducing the manual effort required.
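The store-then-select idea in the abstract can be illustrated with a minimal sketch. It assumes that a speech recognizer has already produced one phone label per frame of the original video; the function names, the label strings, and the "take the first stored frame" policy are hypothetical choices made for illustration, and the smoothing step is omitted entirely.

```python
from collections import defaultdict


def build_phone_database(frame_phones):
    """Map each phone label to the indices of the original frames where it occurs."""
    database = defaultdict(list)
    for frame_index, phone in enumerate(frame_phones):
        database[phone].append(frame_index)
    return dict(database)


def select_frames(database, new_phones):
    """Pick one stored frame index for each phone of the new soundtrack.

    Raises KeyError if the new soundtrack contains a phone that never
    occurred in the original recording.
    """
    # Hypothetical policy: always use the first frame stored for a phone.
    return [database[phone][0] for phone in new_phones]


# Assumed per-frame phone labels for the original recording.
original = ["h", "eh", "l", "ow", "sil"]
db = build_phone_database(original)

# Assumed phone labels for the new soundtrack.
sequence = select_frames(db, ["l", "ow", "h", "eh"])
```

Here `sequence` holds indices into the original frames, in the order demanded by the new soundtrack. A practical system would choose among several candidate frames per phone and blend neighboring selections rather than cutting between them.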
237 Citations
39 Claims
1. A method for modifying a video recording having an accompanying audio track to produce a new video presentation with a different audio track, comprising the steps of:
analyzing said accompanying audio track by means of automatic speech recognition techniques to identify video frames in the video recording that are associated with individual speech characteristics in said accompanying audio track, and storing video image information from each of said frames in a database;
analyzing video image information from said frames to identify predetermined features associated with the video image, and annotating the video image information stored in said database with data relating to said features;
analyzing a sound utterance to identify individual speech characteristics in said sound utterance;
selecting video image information stored in said database according to the identified speech characteristics in said sound utterance, and assembling the selected items of image information to form a sequence; and
smoothly fitting the selected items of information in said sequence to one another in accordance with the annotated data to produce a video presentation that is synchronized to said sound utterance.
Dependent claims: 2-14.
15. A method for synchronizing a video sequence having an accompanying audio track with a different audio track, comprising the steps of:
analyzing the audio track accompanying said video sequence by means of automatic speech recognition techniques to identify individual speech characteristics in said accompanying audio track;
analyzing a sound utterance in said different audio track by means of automatic speech recognition techniques to identify individual speech characteristics in said sound utterance; and
temporally modifying said video sequence so that identified individual speech characteristics in said video sequence are temporally aligned with corresponding individual speech characteristics in said sound utterance.
Dependent claims: 16-25.
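One possible reading of the "temporally modifying" step in claim 15 is a piecewise-linear time warp between matching phone boundaries. The sketch below is an illustration under that assumption, not the claimed method itself: the boundary lists (in frame units), the function names, and the floor-based frame choice are all hypothetical.

```python
from bisect import bisect_right


def warp_position(k, new_bounds, orig_bounds):
    """Map output frame k to a (fractional) position in the original video.

    new_bounds and orig_bounds list the frame positions of matching phone
    boundaries in the new soundtrack and in the original video.
    """
    # Locate the phone segment of the new soundtrack that contains k.
    i = bisect_right(new_bounds, k) - 1
    i = max(0, min(i, len(new_bounds) - 2))
    span_new = new_bounds[i + 1] - new_bounds[i]
    frac = (k - new_bounds[i]) / span_new if span_new > 0 else 0.0
    # Move the same fraction of the way through the matching original segment.
    return orig_bounds[i] + frac * (orig_bounds[i + 1] - orig_bounds[i])


def retime_frames(new_bounds, orig_bounds, n_src):
    """Return, for each output frame, the index of the source frame to show."""
    n_out = new_bounds[-1]
    indices = []
    for k in range(n_out):
        src = int(warp_position(k, new_bounds, orig_bounds))
        indices.append(max(0, min(n_src - 1, src)))
    return indices


# Assumed example: the first phone lasts 4 frames in the original video but
# 8 frames in the new soundtrack; the second lasts 8 frames but must fit in 4.
order = retime_frames(new_bounds=[0, 8, 12], orig_bounds=[0, 4, 12], n_src=12)
# order == [0, 0, 1, 1, 2, 2, 3, 3, 4, 6, 8, 10]
```

Stretched segments repeat source frames and compressed segments skip them. A smoother result would interpolate between neighboring frames instead of picking a single one.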
26. A system for modifying a recorded video image stream to synchronize it to a soundtrack which is generated separately from the recorded video image stream, comprising:
means for automatically analyzing the recorded video image stream to identify sequences of images that are associated with individual speech characteristics;
a memory storing a database containing said identified sequences of images;
means for automatically analyzing said soundtrack to identify individual speech characteristics contained therein; and
means for selecting sequences of images contained in said database that correspond to individual speech characteristics that are identified in said soundtrack and assembling the selected sequences of images into a video image stream that is synchronized with said soundtrack.
Dependent claims: 27-34.
35. A system for modifying a recorded video image stream to synchronize it to a soundtrack which is generated separately from the recorded video image stream, comprising:
means for analyzing the recorded video image stream to identify images that are associated with individual speech characteristics;
a memory storing a first database containing subimages, each of which comprises a predetermined portion of one of said identified images;
means for analyzing said identified images to define control features within the subimage portions of said images;
means for annotating said stored subimages with data relating to said defined control features;
a memory storing a second database containing full-frame images from said video image sequence, together with said defined control features;
means for analyzing said soundtrack to identify individual speech characteristics contained therein;
means for selecting subimages contained in said first database that correspond to individual speech characteristics that are identified in said soundtrack; and
means for incorporating the selected subimages into full-frame images stored in said second database, in accordance with the defined control features, to form a video stream that is synchronized with said soundtrack.
Dependent claims: 36-37.
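The final element of claim 35, incorporating a selected subimage into a full-frame image in accordance with control features, can be sketched in its simplest form as a translation that lines up one control point in the subimage with the corresponding point in the frame. This is only an illustrative assumption: the single `(row, col)` anchor stands in for the claim's control features, the function name is hypothetical, and images are plain nested lists so that the example has no dependencies.

```python
def paste_subimage(frame, sub, frame_anchor, sub_anchor):
    """Return a copy of frame with sub pasted so sub_anchor lands on frame_anchor.

    frame, sub: 2-D lists of pixel values (a list of rows).
    frame_anchor, sub_anchor: (row, col) control points, hypothetical
    stand-ins for the control features named in the claim.
    """
    out = [row[:] for row in frame]  # leave the stored frame untouched
    top = frame_anchor[0] - sub_anchor[0]
    left = frame_anchor[1] - sub_anchor[1]
    for r, sub_row in enumerate(sub):
        for c, value in enumerate(sub_row):
            rr, cc = top + r, left + c
            # Skip subimage pixels that fall outside the frame.
            if 0 <= rr < len(out) and 0 <= cc < len(out[0]):
                out[rr][cc] = value
    return out


# Assumed data: a blank 4x5 frame and a 2x2 "mouth" subimage.
frame = [[0] * 5 for _ in range(4)]
mouth = [[7, 8], [9, 6]]
result = paste_subimage(frame, mouth, frame_anchor=(2, 3), sub_anchor=(1, 1))
```

With several control points per image, the translation would give way to a warp fitted to all of them, and the pasted region would normally be blended at its border rather than copied pixel for pixel.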
38. A method for synchronizing a video sequence having an accompanying audio track with a different audio track, comprising the steps of:
analyzing the audio track accompanying said video sequence to identify individual speech characteristics in said audio track;
analyzing a sound utterance in said different audio track by means of automatic speech recognition techniques to identify individual speech characteristics in said sound utterance; and
reordering frames of said video sequence so that identified individual speech characteristics in said video sequence are temporally aligned with corresponding individual speech characteristics in said sound utterance.
39. A method for modifying a video recording that is associated with a first audio track to produce a video presentation corresponding to a second audio track, comprising the steps of:
analyzing said video recording to identify sequences of video frames that are associated with individual features in said first audio track, and storing said sequences of frames in a database in accordance with said identified features;
analyzing said second audio track to identify individual features therein;
selecting sequences of frames stored in said database according to the identified features in said second audio track, and assembling the selected sequences of frames to form a video stream that is synchronized to said second audio track.
Specification