TRANSCRIPT ALIGNMENT
First Claim
1. One or more processor readable storage devices having code embodied on said storage devices, said code for programming one or more processors to perform a method comprising:
accepting a script associated with a multimedia recording, wherein the script includes dialogue, speaker indications and video event indications;
forming a plurality of search terms from the dialogue, each search term associated with a location within the script;
determining zero or more putative locations of each of the search terms in a time interval of the multimedia recording, including for at least some of the search terms, determining multiple putative locations in the time interval of the multimedia recording;
partially aligning the time interval of the multimedia recording and the script using the determined putative locations of the search terms and one or more of the following:
a result of matching audio characteristics of the multimedia recording with the speaker indications, and a result of matching video characteristics of the multimedia recording with the video event indications;
using a result of the partial alignment to generate event-localization information; and
enabling further processing of the generated event-localization information.
Abstract
Some general aspects relate to systems, software, and methods for media processing. In one aspect, a script associated with a multimedia recording is accepted, wherein the script includes dialogue, speaker indications and video event indications. A group of search terms is formed from the dialogue, with each search term being associated with a location within the script. Zero or more putative locations of each of the search terms are identified in a time interval of the multimedia recording. For at least some of the search terms, multiple putative locations are identified in the time interval of the multimedia recording. The time interval of the multimedia recording and the script are partially aligned using the determined putative locations of the search terms and one or more of the following: a result of matching audio characteristics of the multimedia recording with the speaker indications, and a result of matching video characteristics of the multimedia recording with the video event indications. Based on a result of the partial alignment, event-localization information is generated. Further processing of the generated event-localization information is enabled.
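The search-term part of the partial alignment described above can be sketched as follows. This is a minimal illustration, not the patented method: the `Hit` record, its `score` field, and the choice of a weighted increasing-subsequence dynamic program are all assumptions made for the example, and the speaker and video-event matching mentioned in the abstract is left out. Each search term keeps its position in the script, may have zero, one, or several putative hits in the recording, and the sketch selects at most one hit per term so that the selected times increase with script order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    term_index: int  # position of the search term in script order
    time: float      # putative start time in the recording, in seconds
    score: float     # confidence from a hypothetical word spotter


def partial_align(hits):
    """Pick at most one hit per term so that times increase with script order.

    Maximizes the summed score of the chosen hits. Terms with no hit, or with
    no hit consistent with the chosen chain, stay unaligned, so the result is
    only a partial alignment.
    """
    ordered = sorted(hits, key=lambda h: (h.term_index, h.time))
    if not ordered:
        return {}
    best = [h.score for h in ordered]  # best chain score ending at each hit
    prev = [None] * len(ordered)       # back-pointers for chain recovery
    for i, cur in enumerate(ordered):
        for j in range(i):
            cand = ordered[j]
            if cand.term_index < cur.term_index and cand.time < cur.time:
                if best[j] + cur.score > best[i]:
                    best[i] = best[j] + cur.score
                    prev[i] = j
    i = max(range(len(ordered)), key=best.__getitem__)
    chosen = {}
    while i is not None:
        chosen[ordered[i].term_index] = ordered[i].time
        i = prev[i]
    return dict(sorted(chosen.items()))


# Term 1 and term 3 each have two putative locations; term 2 has none.
example_hits = [
    Hit(0, 1.0, 0.9),
    Hit(1, 0.5, 0.8), Hit(1, 4.0, 0.6),
    Hit(3, 3.0, 0.7), Hit(3, 9.0, 0.5),
]
aligned = partial_align(example_hits)
```

In this example the out-of-order hits (term 1 at 0.5 s and term 3 at 3.0 s) are rejected in favor of a chain that is consistent with script order, and term 2 remains unaligned.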
32 Claims
1. One or more processor readable storage devices having code embodied on said storage devices, said code for programming one or more processors to perform a method comprising:
accepting a script associated with a multimedia recording, wherein the script includes dialogue, speaker indications and video event indications;
forming a plurality of search terms from the dialogue, each search term associated with a location within the script;
determining zero or more putative locations of each of the search terms in a time interval of the multimedia recording, including for at least some of the search terms, determining multiple putative locations in the time interval of the multimedia recording;
partially aligning the time interval of the multimedia recording and the script using the determined putative locations of the search terms and one or more of the following:
a result of matching audio characteristics of the multimedia recording with the speaker indications, and a result of matching video characteristics of the multimedia recording with the video event indications;
using a result of the partial alignment to generate event-localization information; and
enabling further processing of the generated event-localization information.
Dependent claims: 2, 3.
5. One or more processor readable storage devices having code embodied on said storage devices, said code for programming one or more processors to perform a method comprising:
accepting a script associated with a multimedia recording, wherein the script includes dialogue-based script elements and non-dialogue-based script elements;
forming a plurality of search terms from the dialogue-based script elements, each search term associated with a location within the script;
determining zero or more putative locations of each of the search terms in a time interval of the multimedia recording, including for at least some of the search terms, determining multiple putative locations in the time interval of the multimedia recording;
generating a model that maps at least some of the script elements onto corresponding media elements of the multimedia recording based at least in part on the determined putative locations of the search terms; and
enabling localization of the multimedia recording using the generated model.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16.
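One simple way to picture a model that maps script elements onto the recording is piecewise-linear interpolation between aligned search terms. The sketch below is an assumption made for illustration only; the claim does not say the model is linear, and the function name `estimate_time` and the anchor representation are hypothetical. Anchors are (script position, time) pairs taken from search terms whose locations have been resolved, and script elements that fall between two anchors, including non-dialogue elements, receive an interpolated time.

```python
from bisect import bisect_left


def estimate_time(anchors, script_pos):
    """Estimate the recording time of a script position.

    anchors: list of (script_pos, time) pairs with strictly increasing
    script positions. Returns None when the position lies outside the
    anchored range, where this sketch declines to extrapolate.
    """
    positions = [p for p, _ in anchors]
    if not anchors or script_pos < positions[0] or script_pos > positions[-1]:
        return None
    i = bisect_left(positions, script_pos)
    p1, t1 = anchors[i]
    if p1 == script_pos:
        return t1  # the position is itself an anchor
    p0, t0 = anchors[i - 1]
    # Linear interpolation between the two surrounding anchors.
    return t0 + (t1 - t0) * (script_pos - p0) / (p1 - p0)


# Script positions are word offsets; times are seconds into the recording.
anchors = [(0, 1.0), (10, 6.0), (30, 26.0)]
stage_direction_time = estimate_time(anchors, 20)
```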
17. One or more processor readable storage devices having code embodied on said storage devices, said code for programming one or more processors to perform a method comprising:
accepting a script that is at least partially aligned to a time interval of a multimedia recording, wherein the script includes a plurality of script segments each associated with a corresponding location in the time interval of the multimedia recording;
processing the script to segment the multimedia recording to form a plurality of multimedia recording segments, including associating each script segment with a corresponding multimedia recording segment; and
forming a visual representation of the script during a presentation of the multimedia recording that includes successive presentations of one or more multimedia recording segments, including, for each one of the successive presentations of one or more multimedia recording segments, forming a respective visual representation of the script segment associated with the corresponding multimedia recording segment.
Dependent claims: 4, 18, 19, 20, 21, 22, 23, 24, 25.
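The segmentation and display steps in this claim can be illustrated with a small lookup structure. The sketch assumes, purely for illustration, that each script segment carries a start time and that a recording segment runs until the next script segment starts; `build_segments` and `segment_at` are hypothetical names, not terms from the patent.

```python
from bisect import bisect_right


def build_segments(script_segments, duration):
    """Turn (start_time, text) pairs into (start, end, text) recording segments.

    Assumption for this sketch: each segment ends where the next one starts,
    and the last segment ends at the end of the recording.
    """
    ordered = sorted(script_segments)
    starts = [s for s, _ in ordered]
    ends = starts[1:] + [duration]
    return [(s, e, text) for (s, text), e in zip(ordered, ends)]


def segment_at(segments, t):
    """Return the script text to display at playback time t, or None."""
    starts = [s for s, _, _ in segments]
    i = bisect_right(starts, t) - 1
    if i < 0 or t >= segments[i][1]:
        return None
    return segments[i][2]


segments = build_segments(
    [(12.0, "B: Fine, thanks."), (0.0, "A: How are you?"), (30.5, "A: Good.")],
    duration=45.0,
)
shown = segment_at(segments, 12.0)
```

A player could call `segment_at` on each time update and redraw the script view only when the returned segment changes.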
26. One or more processor readable storage devices having code embodied on said storage devices, said code for programming one or more processors to perform a method comprising:
accepting a script that is at least partially aligned to a time interval of a first multimedia recording, wherein the script includes a plurality of script segments each associated with a corresponding location in the time interval of the first multimedia recording;
accepting a second multimedia recording associated with the multimedia recording;
forming a plurality of search terms from the script elements in the script, each search term associated with a location within the script;
determining zero or more putative locations of each of the search terms in a time interval of the second multimedia recording, including for at least some of the search terms, determining multiple putative locations in the time interval of the second multimedia recording;
generating a model that maps at least some of the script elements onto corresponding media elements of the second multimedia recording based at least in part on the determined putative locations of the search terms;
associating at least one media element in the first multimedia recording with a corresponding media element in the second multimedia recording according to the generated model and the partial alignment of the script to the first multimedia recording.
Dependent claims: 27.
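The final associating step can be pictured as a join on shared script elements: once the same script is aligned to both recordings, any element located in both gives a pair of corresponding times. The sketch below assumes each alignment is a plain mapping from a script element identifier to a time in seconds; the function name and the identifiers are hypothetical.

```python
def link_recordings(first_alignment, second_alignment):
    """Pair up times in two recordings through their shared script elements.

    Each argument maps a script element id to a time in seconds. Elements
    located in only one recording are skipped, since both alignments may be
    partial.
    """
    shared = sorted(first_alignment.keys() & second_alignment.keys())
    return [(k, first_alignment[k], second_alignment[k]) for k in shared]


first = {"line_1": 3.0, "line_2": 9.5, "line_4": 20.0}
second = {"line_1": 1.2, "line_4": 15.8, "line_5": 22.0}
links = link_recordings(first, second)
```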
28. One or more processor readable storage devices having code embodied on said storage devices, said code for programming one or more processors to perform a method comprising:
accepting, from a source of a first identity, a first script that is at least partially aligned to a time interval of a multimedia recording;
accepting, from a source of a second identity different from the first identity, a second script that is at least partially aligned to the time interval of the multimedia recording;
comparing a quality of alignment of the first script to the multimedia recording with a quality of alignment of the second script to the multimedia recording; and
based on a result of the comparison, selecting one script from the first and the second script for use in a presentation of the multimedia recording.
Dependent claims: 29.
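The comparison and selection steps can be sketched with any scalar quality measure. The claim does not define how alignment quality is computed, so the measure used here (mean per-segment confidence, counting unaligned segments as zero) is an assumption chosen for the example, as are the function names and source labels.

```python
def alignment_quality(segment_scores):
    """Score one script's alignment to the recording.

    segment_scores holds one entry per script segment: a confidence in
    [0, 1] for an aligned segment, or None for an unaligned one.
    Unaligned segments count as zero, so coverage and confidence both matter.
    """
    if not segment_scores:
        return 0.0
    aligned = (s for s in segment_scores if s is not None)
    return sum(aligned) / len(segment_scores)


def select_script(candidates):
    """Return the source identity whose script aligns best."""
    return max(candidates, key=lambda src: alignment_quality(candidates[src]))


candidates = {
    "studio": [0.9, 0.8, None, 0.7],
    "fan_site": [0.6, None, None, 0.5],
}
chosen_source = select_script(candidates)
```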
30. One or more processor readable storage devices having code embodied on said storage devices, said code for programming one or more processors to perform a method comprising:
accepting a script that is at least partially aligned to a time interval of a multimedia recording, wherein the script includes a plurality of script segments each associated with a corresponding location in the time interval of the multimedia recording, and the multimedia recording includes a multimedia segment not represented in the script;
determining a sequential order of the plurality of script segments based on their corresponding locations in the time interval of the multimedia recording; and
identifying, in the sequential order of the plurality of script segments, a location associated with the multimedia not represented in the script, including, for each script element:
computing an actual time lapse from its immediate preceding script element based on their corresponding locations in the time interval of the multimedia recording; and
comparing the actual time lapse with an expected time lapse determined according to a voice characteristic.
Dependent claims: 31, 32.
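The time-lapse comparison in this claim can be illustrated as follows. The sketch assumes the voice characteristic is a speaking rate in words per second and flags a gap when the actual lapse exceeds the expected one by more than a fixed slack; the rate, the slack, and the function name are illustrative assumptions rather than values from the patent.

```python
def find_unscripted_gaps(elements, words_per_second=2.5, slack=2.0):
    """Locate stretches of the recording that the script does not cover.

    elements: (start_time, text) pairs for aligned script elements.
    For each element, the actual lapse since the preceding element is
    compared with the time the preceding text should take to speak at the
    assumed rate. Returns (prev_index, index, excess_seconds) tuples, with
    indices referring to the time-sorted order.
    """
    ordered = sorted(elements)
    gaps = []
    for i in range(1, len(ordered)):
        prev_start, prev_text = ordered[i - 1]
        start, _ = ordered[i]
        actual = start - prev_start
        expected = len(prev_text.split()) / words_per_second
        if actual - expected > slack:
            gaps.append((i - 1, i, actual - expected))
    return gaps


elements = [
    (0.0, "one two three four five"),
    (2.0, "six seven eight nine ten"),
    (14.0, "eleven twelve"),
]
gaps = find_unscripted_gaps(elements)
```

Here the second element should take about two seconds to speak, but twelve seconds pass before the third begins, so the interval between them is reported as likely unscripted material.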
Specification