Method and apparatus for aligning texts
Abstract
A method and apparatus for aligning texts. The method includes acquiring a target text and a reference text, and aligning the target text and the reference text at a word level based on phoneme similarity. The method can be applied to automatically archiving a multimedia resource and to automatically searching a multimedia resource.
24 Claims
1. A method for aligning texts, said method comprising:
acquiring a target text and a reference text, said target text comprising recognized text acquired by performing speech recognition on speech data, and said reference text being associated with said speech data; and
aligning said target text and said reference text at a word level based on phoneme similarity between phonemes in words in said target text and corresponding phonemes in corresponding words in said reference text, said aligning comprising:
  parsing said phonemes of said words in said target text and said corresponding phonemes of said corresponding words in said reference text; and
  computing a path penalty value by using a dynamic time warping (DTW) algorithm with said phoneme similarity, and finding a best path matching said target text and said reference text with said path penalty value, said aligning of said target text and said reference text being based on said best path, and
said acquiring and said aligning being performed by a programmed data processing system.
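
For illustration only, and not as part of the claim, the parsing, penalty computation, and best-path steps recited in claim 1 might be sketched as follows. The lexicon `LEXICON`, the similarity groups `SIMILAR`, the 1.0/0.5/0.0 similarity scores, and the normalized edit-distance word penalty are all assumptions made for this sketch; the claim does not specify any of them.

```python
from typing import Dict, List, Tuple

# Hypothetical toy pronunciation lexicon (an assumption, not from the patent).
LEXICON: Dict[str, List[str]] = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "cap": ["K", "AE", "P"],
    "sat": ["S", "AE", "T"],
    "down": ["D", "AW", "N"],
}

# Hypothetical groups of acoustically confusable phonemes (an assumption).
SIMILAR = [{"T", "P", "K"}, {"D", "DH"}, {"S", "Z"}]


def phoneme_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical phonemes, 0.5 for same group, else 0.0."""
    if a == b:
        return 1.0
    if any(a in group and b in group for group in SIMILAR):
        return 0.5
    return 0.0


def parse_phonemes(word: str) -> List[str]:
    """Parse a word into phonemes; fall back to letters if not in the lexicon."""
    return LEXICON.get(word.lower(), list(word.lower()))


def word_penalty(w1: str, w2: str) -> float:
    """Phoneme-level edit distance, with substitution cost 1 - similarity."""
    p, q = parse_phonemes(w1), parse_phonemes(w2)
    d = [[0.0] * (len(q) + 1) for _ in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        d[i][0] = float(i)
    for j in range(1, len(q) + 1):
        d[0][j] = float(j)
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            sub = 1.0 - phoneme_similarity(p[i - 1], q[j - 1])
            d[i][j] = min(d[i - 1][j - 1] + sub, d[i - 1][j] + 1.0, d[i][j - 1] + 1.0)
    return d[len(p)][len(q)] / max(len(p), len(q), 1)


def dtw_align(target: List[str], reference: List[str]) -> Tuple[float, List[Tuple[int, int]]]:
    """Word-level DTW: return the path penalty and the best path of index pairs."""
    n, m = len(target), len(reference)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = word_penalty(target[i - 1], reference[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(acc[i - 1][j - 1], i - 1, j - 1), (acc[i - 1][j], i - 1, j), (acc[i][j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return acc[n][m], path


penalty, path = dtw_align(["the", "cap", "sat", "down"], ["the", "cat", "sat", "down"])
print(penalty, path)
```

In this sketch the misrecognized word "cap" still aligns with "cat" because the two differ only in a final phoneme that the assumed similarity groups treat as confusable, so the path penalty stays low.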
12. An apparatus for aligning texts, said apparatus comprising:
an input acquiring a target text and a reference text, said target text comprising recognized text acquired by performing speech recognition on speech data, and said reference text being associated with said speech data; and
a data processing machine operatively connected to said input and comprising a word alignment module aligning said target text and said reference text at a word level based on phoneme similarity between phonemes in words in said target text and corresponding phonemes in corresponding words in said reference text, said word alignment module comprising the following for performing said aligning of said target text and said reference text at said word level based on said phoneme similarity:
  a parsing module parsing said phonemes in said words in said target text and said corresponding phonemes in said corresponding words in said reference text;
  a dynamic time warping (DTW) module computing a path penalty value by using a dynamic time warping (DTW) algorithm with said phoneme similarity, and finding a best path matching said target text and said reference text with said path penalty value; and
  an alignment sub-module aligning said target text and said reference text based on said best path.
23. A method for archiving a multimedia resource, said method comprising:
acquiring an original multimedia resource and a reference text, said original multimedia resource comprising speech data and said reference text being associated with said speech data;
recognizing said speech data to generate a target text, said recognizing comprising performing a speech recognition process on said speech data;
aligning said target text and said reference text at a word level based on phoneme similarity between phonemes in words in said target text and corresponding phonemes in corresponding words in said reference text, said aligning comprising:
  parsing said phonemes of said words in said target text and said corresponding phonemes of said corresponding words in said reference text; and
  computing a path penalty value by using a dynamic time warping (DTW) algorithm with said phoneme similarity, and finding a best path matching said target text and said reference text with said path penalty value, said aligning of said target text and said reference text being based on said best path;
establishing a temporal link between said speech data and said reference text based on alignment of said target text and said reference text; and
adding said temporal link to said original multimedia resource to generate a new multimedia resource archive file,
said acquiring, said recognizing, said aligning, said establishing, and said adding being performed by a programmed data processing machine.
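
For illustration only, and not as part of the claim, the step of establishing a temporal link in claim 23 might be sketched as below. The sketch assumes that the speech recognizer reports a start and end time for each recognized (target) word and that a word-level alignment path of (target index, reference index) pairs is already available; the function name `build_temporal_links` and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def build_temporal_links(
    path: List[Tuple[int, int]],
    target_times: List[Tuple[float, float]],
    reference: List[str],
) -> List[Tuple[str, float, float]]:
    """Give each aligned reference word the time span of its target words."""
    spans: Dict[int, Tuple[float, float]] = {}
    for t_idx, r_idx in path:
        start, end = target_times[t_idx]
        if r_idx in spans:
            # Several target words map to one reference word: merge the spans.
            old_start, old_end = spans[r_idx]
            spans[r_idx] = (min(old_start, start), max(old_end, end))
        else:
            spans[r_idx] = (start, end)
    links = []
    for r_idx in sorted(spans):
        start, end = spans[r_idx]
        links.append((reference[r_idx], start, end))
    return links


# Assumed recognizer output: four target words with times in seconds.
times = [(0.0, 0.3), (0.3, 0.6), (0.6, 0.9), (0.9, 1.4)]
# Assumed alignment path: target words 1 and 2 both map to reference word 1.
links = build_temporal_links([(0, 0), (1, 1), (2, 1), (3, 2)], times, ["the", "cat", "sat"])
print(links)
```

The resulting list of (word, start, end) entries is one possible form of the temporal link that could be stored alongside the original multimedia resource; the claim does not prescribe a storage format.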
24. A method for searching a multimedia resource, said multimedia resource comprising speech data and reference text associated with said speech data and said method comprising:
acquiring a key word for search;
acquiring a reference text and a target text, said target text comprising recognized text acquired by performing speech recognition on speech data from a multimedia resource, said reference text being associated with said speech data, said reference text and said target text being aligned at a word level based on phoneme similarity between phonemes in words in said target text and corresponding phonemes in corresponding words in said reference text, wherein alignment of said reference text and said target text having been performed by:
  parsing said phonemes of said words in said target text and said corresponding phonemes of said corresponding words in said reference text;
  computing a path penalty value by using a dynamic time warping (DTW) algorithm with said phoneme similarity, and finding a best path matching said target text and said reference text with said path penalty value; and
  aligning said target text and said reference text being based on said best path, and said reference text and said speech data further having an established temporal link based on the alignment;
searching for and identifying a location of said key word in said reference text; and
locating a part of said multimedia resource corresponding to said key word based on said location of said key word in said reference text and based on said temporal link;
said acquiring of said key word, said acquiring of said multimedia resource, said searching and identifying, and said locating being performed by a programmed data processing machine.
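
For illustration only, and not as part of the claim, the searching and locating steps of claim 24 might be sketched as below. The sketch assumes the temporal link is held as a list of (reference word, start, end) entries in seconds; that layout and the function name `locate_keyword` are hypothetical.

```python
from typing import List, Tuple


def locate_keyword(
    keyword: str, links: List[Tuple[str, float, float]]
) -> List[Tuple[int, float, float]]:
    """Find the key word in the reference text and return its time spans."""
    key = keyword.lower()
    hits = []
    for position, (word, start, end) in enumerate(links):
        if word.lower() == key:
            # position: location in the reference text; (start, end): media span.
            hits.append((position, start, end))
    return hits


# Assumed temporal link for a short reference text.
links = [("the", 0.0, 0.3), ("cat", 0.3, 0.9), ("sat", 0.9, 1.4)]
print(locate_keyword("Cat", links))
```

Because the search runs over the reference text rather than the recognized text, a word that the recognizer got wrong can still be located in the media, provided the alignment mapped it to the correct reference word.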
Specification