Methods and Systems for Performing Synchronization of Audio with Corresponding Textual Transcriptions and Determining Confidence Values of the Synchronization
First Claim
1. A method of processing audio signals, comprising:
receiving an audio signal comprising vocal elements;
a processor performing an alignment of the vocal elements with corresponding textual transcriptions of the vocal elements;
based on the alignment, determining timing boundary information associated with an elapsed amount of time for a duration of a portion of the vocal elements; and
outputting a confidence metric indicating a level of certainty for the timing boundary information for the duration of the portion of the vocal elements.
Abstract
Methods and systems for synchronizing audio with a corresponding textual transcription and for determining confidence values of the timing synchronization are provided. Audio and a corresponding text (e.g., a transcript) may be synchronized in a forward and a reverse direction using speech recognition to output time-annotated, synchronized audio-lyrics data. Metrics can be computed to quantify and/or qualify a confidence of the synchronization. Based on the metrics, example embodiments describe methods for enhancing an automated synchronization process, possibly by adapting Hidden Markov Models (HMMs) to the synchronized audio for use during the speech recognition. Other examples describe methods for selecting an appropriate HMM for use.
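The abstract does not say how the forward and reverse passes are combined into a confidence value. As a rough illustration only, the sketch below assumes that the reverse pass runs on time-reversed audio, that its boundaries are mapped back onto the forward timeline, and that confidence falls linearly as the two passes disagree. The names `Boundary`, `reverse_to_forward`, `agreement_confidence`, and the `tolerance` parameter are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Boundary:
    word: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def reverse_to_forward(reverse_pass: List[Boundary], audio_duration: float) -> List[Boundary]:
    """Map boundaries found on time-reversed audio back onto the forward timeline.

    Assumption: the reverse pass lists words in reversed order, with times
    measured from the start of the reversed audio.
    """
    mapped = [
        Boundary(b.word, audio_duration - b.end, audio_duration - b.start)
        for b in reverse_pass
    ]
    mapped.reverse()  # restore transcript order
    return mapped


def agreement_confidence(forward: List[Boundary],
                         reverse_mapped: List[Boundary],
                         tolerance: float = 0.5) -> List[float]:
    """Per-word confidence in [0, 1].

    1.0 means both passes place the word identically; the score drops
    linearly to 0.0 once either boundary differs by `tolerance` seconds.
    This scoring rule is an illustrative assumption, not the patented metric.
    """
    scores = []
    for f, r in zip(forward, reverse_mapped):
        if f.word != r.word:
            raise ValueError("forward and reverse passes cover different words")
        deviation = max(abs(f.start - r.start), abs(f.end - r.end))
        scores.append(max(0.0, 1.0 - deviation / tolerance))
    return scores


forward = [Boundary("hello", 0.0, 1.0), Boundary("world", 1.0, 2.0)]
# Reverse pass over 2.0 s of reversed audio: "world" comes first.
reverse_pass = [Boundary("world", 0.0, 1.0), Boundary("hello", 1.25, 2.0)]
reverse_mapped = reverse_to_forward(reverse_pass, audio_duration=2.0)
confidences = agreement_confidence(forward, reverse_mapped)
```

In this toy input the two passes agree on "world" but differ by a quarter of a second on the end of "hello", so "hello" receives the lower score. A low score of this kind could flag a segment for the model adaptation or model selection steps that the abstract mentions.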
30 Claims
1. A method of processing audio signals, comprising:
receiving an audio signal comprising vocal elements;
a processor performing an alignment of the vocal elements with corresponding textual transcriptions of the vocal elements;
based on the alignment, determining timing boundary information associated with an elapsed amount of time for a duration of a portion of the vocal elements; and
outputting a confidence metric indicating a level of certainty for the timing boundary information for the duration of the portion of the vocal elements.
Dependent claims: 2-18.
19. A computer readable storage medium having stored therein instructions executable by a computing device to cause the computing device to perform functions of:
receiving an audio signal comprising vocal elements;
performing an alignment of the vocal elements with corresponding textual transcriptions of the vocal elements;
based on the alignment, determining timing boundary information associated with an elapsed amount of time for a duration of a portion of the vocal elements; and
outputting a confidence metric indicating a level of certainty for the timing boundary information for the duration of the portion of the vocal elements.
Dependent claims: 20-25.
26. A system comprising:
a Hidden Markov Model (HMM) database that includes phonetic modeling of words;
a pronunciation dictionary database that includes grammars representing words; and
a speech decoder that receives an audio signal and accesses the HMM to map vocal elements in the audio signal to phonetic descriptions and accesses the pronunciation dictionary database to map the phonetic descriptions to grammars, the speech decoder further performing an alignment of the grammars with corresponding textual transcriptions of the vocal elements, wherein the speech decoder determines timing boundary information associated with an elapsed amount of time for a duration of a portion of the vocal elements, and the speech decoder determines a confidence metric indicating a level of certainty for the timing boundary information for the duration of the portion of the vocal elements.
Dependent claims: 27-30.
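The claims recite an alignment that yields timing boundaries but do not fix a particular algorithm. One common way to obtain such boundaries with HMM-style acoustic scores is Viterbi forced alignment, sketched below under strong simplifying assumptions: each transcript unit is modeled as a single left-to-right state, transition probabilities are ignored, and the per-frame log-likelihoods are supplied directly rather than read from an HMM database. The function name `forced_align` and the mean log-likelihood score are hypothetical illustrations, not the claimed decoder or its confidence metric.

```python
import math
from typing import List, Tuple


def forced_align(log_likes: List[List[float]]) -> Tuple[List[Tuple[int, int]], List[float]]:
    """Viterbi forced alignment over a strict left-to-right state sequence.

    log_likes[t][s] is the log-likelihood of audio frame t under transcript
    unit s. Units must be visited in order; each frame either stays in the
    current unit or advances to the next one.

    Returns (segments, mean_scores), where segments[s] is the
    (start_frame, end_frame_exclusive) pair for unit s and mean_scores[s]
    is the average per-frame log-likelihood inside that segment.
    """
    num_frames, num_units = len(log_likes), len(log_likes[0])
    if num_frames < num_units:
        raise ValueError("need at least one frame per transcript unit")

    neg_inf = -math.inf
    best = [[neg_inf] * num_units for _ in range(num_frames)]
    advanced = [[False] * num_units for _ in range(num_frames)]
    best[0][0] = log_likes[0][0]  # alignment must start in the first unit

    for t in range(1, num_frames):
        for s in range(num_units):
            stay = best[t - 1][s]
            advance = best[t - 1][s - 1] if s > 0 else neg_inf
            if advance > stay:
                best[t][s] = advance + log_likes[t][s]
                advanced[t][s] = True
            else:
                best[t][s] = stay + log_likes[t][s]

    # Backtrack from the last frame of the last unit to recover boundaries.
    starts = [0] * num_units
    ends = [0] * num_units
    s = num_units - 1
    ends[s] = num_frames
    for t in range(num_frames - 1, 0, -1):
        if advanced[t][s]:
            starts[s] = t
            s -= 1
            ends[s] = t

    segments = list(zip(starts, ends))
    mean_scores = [
        sum(log_likes[t][u] for t in range(a, b)) / (b - a)
        for u, (a, b) in enumerate(segments)
    ]
    return segments, mean_scores


# Toy scores: frames 0-2 fit unit 0 best, frames 3-4 fit unit 1 best.
toy_scores = [[-1.0, -5.0], [-1.0, -5.0], [-1.0, -5.0], [-5.0, -1.0], [-5.0, -1.0]]
segments, mean_scores = forced_align(toy_scores)
```

Multiplying the frame indices by the frame step (for example, 10 ms per frame, which is an assumed value) converts the segments into elapsed-time boundaries. A practical decoder would use multi-state phone models and pronunciation lookups as recited in claim 26.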
Specification