Fast vocabulary independent method and apparatus for spotting words in speech
First Claim
1. A fast vocabulary independent method for spotting words in speech comprising the steps of:
converting a speech waveform into a representation comprising phone-ngrams and a corresponding time interval of occurrence of each of the phone-ngrams;
receiving phone-ngrams of at least one input word;
performing a coarse match by selecting time intervals of the speech waveform having phone-ngrams that correspond to the phone-ngrams of the at least one input word; and
performing a detailed acoustic match at the selected time intervals.
Abstract
A fast vocabulary independent method for spotting words in speech utilizes a preprocessing step and a coarse-to-detailed search strategy for spotting a word/phone sequence in speech. The preprocessing includes a Viterbi-beam phone level decoding using a tree-based phone language model. The coarse search matches phone-ngrams to identify regions of speech as putative word hits, and the detailed search performs an acoustic match at the putative hits with a model of the given word included in the vocabulary of the recognizer.
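The coarse search described in the abstract can be illustrated with a minimal sketch. It assumes the preprocessing step has already produced a list of `(phone, start, end)` tuples; the function names, the trigram order `n=3`, and the ARPAbet-style phone labels are illustrative assumptions, not details taken from the patent.

```python
from collections import defaultdict

def phone_ngrams(phones, n=3):
    """Return the n-grams (as tuples) of a phone sequence."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def build_index(decoded, n=3):
    """Map each phone n-gram to the time intervals where it occurs.

    decoded: list of (phone, start_sec, end_sec), assumed to come from
    a phone-level decoder (hypothetical input format).
    """
    index = defaultdict(list)
    for i in range(len(decoded) - n + 1):
        window = decoded[i:i + n]
        key = tuple(phone for phone, _, _ in window)
        index[key].append((window[0][1], window[-1][2]))
    return index

def coarse_match(index, query_phones, n=3):
    """Collect intervals whose n-grams match the query and merge overlaps."""
    hits = sorted(
        interval
        for gram in phone_ngrams(query_phones, n)
        for interval in index.get(gram, [])
    )
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Toy decoding of "the speech is" with made-up timings.
decoded = [("DH", 0.0, 0.1), ("AH", 0.1, 0.2), ("S", 0.2, 0.3), ("P", 0.3, 0.4),
           ("IY", 0.4, 0.5), ("CH", 0.5, 0.6), ("IH", 0.6, 0.7), ("Z", 0.7, 0.8)]
index = build_index(decoded)
putative_hits = coarse_match(index, ["S", "P", "IY", "CH"])
```

In this toy case the two query trigrams overlap in time, so they merge into a single putative hit that would then be passed to the detailed acoustic match.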
79 Citations
20 Claims
1. A fast vocabulary independent method for spotting words in speech comprising the steps of:
converting a speech waveform into a representation comprising phone-ngrams and a corresponding time interval of occurrence of each of the phone-ngrams;
receiving phone-ngrams of at least one input word;
performing a coarse match by selecting time intervals of the speech waveform having phone-ngrams that correspond to the phone-ngrams of the at least one input word; and
performing a detailed acoustic match at the selected time intervals.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A fast vocabulary independent method for spotting words in speech comprising the steps of:
converting a speech waveform into a table comprising phone-ngrams and the corresponding times of occurrence in the speech waveform of the phone-ngrams;
generating phone-ngrams of at least one input word;
performing a coarse match by implementing the table to identify time intervals of the input waveform having phone-ngrams associated therewith that correspond to the phone-ngrams of the at least one input word; and
performing a detailed acoustic match at each of the identified time intervals of the speech waveform to finally decide whether the at least one input word was actually uttered in the identified time intervals.
Dependent claims: 12, 13
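The final decision step of claim 11 can be sketched as a filter over the candidate intervals. A real detailed match would score acoustic frames against a model of the word; that needs trained acoustic models, so the sketch below substitutes a phone-level edit distance as a stand-in. The function names, the `max_errors` threshold, and the `(phone, start, end)` input format are all assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

def verify_hits(decoded, candidates, query_phones, max_errors=1):
    """Keep candidate intervals whose decoded phones are close to the query.

    Stand-in for the detailed acoustic match: compares phone labels
    rather than acoustic scores.
    """
    confirmed = []
    for start, end in candidates:
        span = [p for p, s, e in decoded if s >= start and e <= end]
        if edit_distance(span, query_phones) <= max_errors:
            confirmed.append((start, end))
    return confirmed

# Toy decoding of "the speech is" with made-up timings.
decoded = [("DH", 0.0, 0.1), ("AH", 0.1, 0.2), ("S", 0.2, 0.3), ("P", 0.3, 0.4),
           ("IY", 0.4, 0.5), ("CH", 0.5, 0.6), ("IH", 0.6, 0.7), ("Z", 0.7, 0.8)]
candidates = [(0.2, 0.6), (0.0, 0.3)]
confirmed = verify_hits(decoded, candidates, ["S", "P", "IY", "CH"])
```

The second candidate is rejected because its phones differ from the query by more than the allowed number of edits. This illustrates the two-stage design: a cheap lookup proposes intervals, and a costlier check runs only on those.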
14. A fast vocabulary independent method for spotting words in speech for use in voice mail retrieval systems and browsing and searching audio/video content, the method comprising the steps of:
receiving a search query from a user;
determining a phonetic baseform for each word of the search query and converting each baseform to phone-ngrams;
identifying the locations of the search query words in an audio/video database by comparing phone-ngrams of the search query words and phone-ngrams of at least one audio waveform in the audio/video database; and
retrieving segments of the at least one audio waveform and corresponding video segments that are relevant to the received query.
Dependent claims: 15, 16, 17
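The query-side steps of claim 14 (baseform lookup followed by conversion to phone-ngrams) might look like the sketch below. The tiny pronunciation dictionary, the function name, and the trigram order are hypothetical. Words missing from the dictionary are skipped here, although a vocabulary independent system would need some other way to derive their baseforms.

```python
# Hypothetical toy pronunciation dictionary (ARPAbet-style labels).
TOY_LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "mail": ["M", "EY", "L"],
}

def query_to_ngrams(query, lexicon=TOY_LEXICON, n=3):
    """Map each query word to the phone n-grams of its baseform."""
    result = {}
    for word in query.lower().split():
        baseform = lexicon.get(word)
        if baseform is None:
            continue  # no baseform available in this sketch; word is skipped
        result[word] = [tuple(baseform[i:i + n])
                        for i in range(len(baseform) - n + 1)]
    return result

ngrams_by_word = query_to_ngrams("Speech mail")
```

The resulting n-grams are the keys that would be compared against the phone-ngrams stored for each audio waveform in the database.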
18. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform a fast vocabulary independent method for spotting words in speech, the method steps comprising:
converting a speech waveform into a table comprising phone-ngrams and the corresponding times of occurrence in the speech waveform of the phone-ngrams;
generating phone-ngrams of at least one input word;
performing a coarse match by implementing the table to identify time intervals of the speech waveform having phone-ngrams associated therewith that correspond to the phone-ngrams of the at least one input word; and
performing a detailed acoustic match at each of the identified time intervals of the speech waveform to finally decide whether the at least one input word was actually uttered in the identified time intervals.
Dependent claims: 19, 20
Specification