Method for aligning text with audio signals
First Claim
1. A computerized method for aligning text segments of a text file with audio segments of an audio file, comprising the steps of:
generating a vocabulary and language model from the text file, generation of said model involving determination of relative probabilities of all one, two, and three word sequences in all unaligned text segments of the text file based upon frequencies of occurrences of said sequences in said unaligned text segments, all of said text segments being initially classified as unaligned text segments;
recognizing a word list from the audio segments using the vocabulary and language model but without considering the text file;
aligning the word list with the text segments based upon respective scores for all possible alignments of words in the word list with the text segments, each respective score being weighted to increase each respective score by a relatively greater amount if a respective alignment associated with the respective score involves relatively longer sequences of correctly aligned words;
choosing corresponding anchors in the word list and text segments in accordance with the respective scores;
partitioning the text and the audio segments into unaligned and aligned text and audio segments according to the anchors; and
repeating the generating, recognizing, aligning, choosing, and partitioning steps with the unaligned text and audio segments until a termination condition is reached.
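The generating step can be illustrated with a small sketch. The claim specifies only that relative probabilities of one, two, and three word sequences are derived from their frequencies in the unaligned text segments. Everything else here is an assumption made for illustration: the function name `build_ngram_model`, the lowercase whitespace tokenization, and the choice to normalize each count by the total number of sequences of the same length.

```python
from collections import Counter


def build_ngram_model(segments, max_n=3):
    """Build a vocabulary and n-gram relative frequencies from text segments.

    Returns (vocabulary, probs), where probs[n][ngram] is the count of that
    n-gram divided by the total count of all n-grams of the same order.
    Tokenization and normalization are illustrative assumptions.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for segment in segments:
        words = segment.lower().split()  # assumed tokenizer
        for n in range(1, max_n + 1):
            # Slide a window of n words over the segment.
            for i in range(len(words) - n + 1):
                counts[n][tuple(words[i:i + n])] += 1

    # The vocabulary is every distinct word seen in the unaligned text.
    vocabulary = sorted(w for (w,) in counts[1])

    probs = {}
    for n, counter in counts.items():
        total = sum(counter.values())
        probs[n] = {gram: c / total for gram, c in counter.items()} if total else {}
    return vocabulary, probs
```

Because the model is rebuilt from only the still-unaligned segments on each repetition, the vocabulary shrinks as more text is aligned, which is what lets later recognition passes work with a narrower search space.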
Abstract
In a computerized method, text segments of a text file are aligned with audio segments of an audio file. The text file includes written words, and the audio file includes spoken words. A vocabulary and language model are generated from the text segment. A word list is recognized from the audio segment using the vocabulary and language model. The word list is aligned with the text segment, and corresponding anchors are chosen in the word list and text segment. Using the anchors, the text segment and the audio segment are partitioned into unaligned and aligned segments. These steps are repeated for any unaligned segments until a termination condition is reached.
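The repeat-until-terminated structure described in the abstract can be sketched as a worklist loop. This is a minimal sketch under stated assumptions, not the patented implementation. The `recognize` callback is hypothetical and stands for any speech recognizer that returns `(word, start_time, end_time)` tuples for a time window. A plain set of words stands in for the vocabulary and language model. The standard-library `difflib.SequenceMatcher` is swapped in for the scored alignment. The `min_run` and `max_rounds` parameters are invented termination and anchor-selection knobs.

```python
from difflib import SequenceMatcher


def align_iteratively(text_words, audio_span, recognize, min_run=3, max_rounds=5):
    """Iteratively anchor text words to audio times.

    recognize(vocabulary, t0, t1) is a hypothetical callback returning a list
    of (word, start_time, end_time) tuples for the audio between t0 and t1.
    Returns (anchors, pending): anchors maps text index -> (start, end);
    pending lists regions (text_lo, text_hi, t0, t1) still unaligned.
    """
    anchors = {}
    pending = [(0, len(text_words), audio_span[0], audio_span[1])]
    for _ in range(max_rounds):
        next_pending, found = [], False
        for lo, hi, t0, t1 in pending:
            segment = text_words[lo:hi]
            vocabulary = set(segment)  # stand-in for vocabulary + language model
            hyp = recognize(vocabulary, t0, t1)
            # Short regions cannot contain a run of min_run words.
            needed = max(1, min(min_run, len(segment)))
            matcher = SequenceMatcher(None, [w for w, _, _ in hyp], segment,
                                      autojunk=False)
            cur_lo, cur_t = lo, t0
            for block in matcher.get_matching_blocks():
                if block.size < needed:
                    continue  # too short to trust as an anchor
                found = True
                first = lo + block.b
                for k in range(block.size):
                    _, start, end = hyp[block.a + k]
                    anchors[first + k] = (start, end)
                run_start = hyp[block.a][1]
                run_end = hyp[block.a + block.size - 1][2]
                if first > cur_lo:
                    # Text and audio between two anchors remain unaligned.
                    next_pending.append((cur_lo, first, cur_t, run_start))
                cur_lo, cur_t = first + block.size, run_end
            if cur_lo < hi:
                next_pending.append((cur_lo, hi, cur_t, t1))
        pending = next_pending
        if not pending or not found:
            break  # termination: everything aligned, or no progress this round
    return anchors, pending
```

Each round re-runs recognition only on the regions between anchors, with a vocabulary restricted to the words of that region, so a word that was misrecognized against the full text may be recovered once its surrounding context has been pinned down.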
22 Claims
1. A computerized method for aligning text segments of a text file with audio segments of an audio file, comprising the steps of:
generating a vocabulary and language model from the text file, generation of said model involving determination of relative probabilities of all one, two, and three word sequences in all unaligned text segments of the text file based upon frequencies of occurrences of said sequences in said unaligned text segments, all of said text segments being initially classified as unaligned text segments;
recognizing a word list from the audio segments using the vocabulary and language model but without considering the text file;
aligning the word list with the text segments based upon respective scores for all possible alignments of words in the word list with the text segments, each respective score being weighted to increase each respective score by a relatively greater amount if a respective alignment associated with the respective score involves relatively longer sequences of correctly aligned words;
choosing corresponding anchors in the word list and text segments in accordance with the respective scores;
partitioning the text and the audio segments into unaligned and aligned text and audio segments according to the anchors; and
repeating the generating, recognizing, aligning, choosing, and partitioning steps with the unaligned text and audio segments until a termination condition is reached.
Dependent claims: 2 through 21.
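The aligning step weights scores so that longer sequences of correctly aligned words raise the score by a relatively greater amount. One way to get that behavior is a dynamic program in which a contiguous run of k matching words contributes a superlinear bonus. The sketch below assumes a bonus of k squared. The claim does not specify a weighting function, so the bonus, the function name `run_weighted_alignment`, and the exact-match word comparison are all illustrative assumptions.

```python
def run_weighted_alignment(hyp, ref, bonus=lambda k: k * k):
    """Align recognized words (hyp) with text words (ref).

    The score is the sum of bonus(k) over contiguous matched runs of length k,
    maximized over all order-preserving alignments. With a superlinear bonus,
    one long run outscores the same number of scattered matches.
    Returns (score, runs), where runs is a list of (hyp_start, ref_start, length).
    """
    n, m = len(hyp), len(ref)
    # suf[i][j]: length of the common suffix of hyp[:i] and ref[:j].
    suf = [[0] * (m + 1) for _ in range(n + 1)]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if hyp[i - 1] == ref[j - 1]:
                suf[i][j] = suf[i - 1][j - 1] + 1
            # Option 1: leave one word unmatched on either side.
            if best[i - 1][j] >= best[i][j - 1]:
                best[i][j], back[i][j] = best[i - 1][j], (i - 1, j, 0)
            else:
                best[i][j], back[i][j] = best[i][j - 1], (i, j - 1, 0)
            # Option 2: end a matched run of length k at (i, j).
            for k in range(1, suf[i][j] + 1):
                cand = best[i - k][j - k] + bonus(k)
                if cand > best[i][j]:
                    best[i][j], back[i][j] = cand, (i - k, j - k, k)
    # Trace back to recover the matched runs in order.
    runs, i, j = [], n, m
    while i > 0 and j > 0:
        pi, pj, k = back[i][j]
        if k:
            runs.append((pi, pj, k))
        i, j = pi, pj
    runs.reverse()
    return best[n][m], runs
```

The recovered runs are natural anchor candidates: a later step could keep only runs above a chosen length or score, in keeping with the claim's requirement that anchors be chosen in accordance with the scores.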
22. An apparatus for aligning text segments of a text file with audio segments of an audio file, comprising:
an analyzer for analyzing the text segments to generate a vocabulary and language model, generation of said model involving determination of relative probabilities of all one, two, and three word sequences in all unaligned text segments of the text file based upon frequencies of occurrences of said sequences in said unaligned text segments, all of the text segments of the text file being initially classified as unaligned text segments;
a speech recognizer for generating a word list from the audio segments using the vocabulary and language model but without considering the text file;
an aligner for aligning the word list with the text segments based upon respective scores for all possible alignments of words in the word list with the text segments, each respective score being weighted to increase each respective score by a relatively greater amount if a respective alignment associated with the respective score involves relatively longer sequences of correctly aligned words;
an anchor choosing mechanism for choosing corresponding anchors in the word list and text segments in accordance with the respective scores;
a partitioning mechanism for partitioning the text and the audio segments into unaligned and aligned segments according to the anchors; and
a repetition mechanism for repeating the generating, recognizing, aligning, choosing, and partitioning steps with the unaligned segments until a termination condition is reached.
Specification