Fast out-of-vocabulary search in automatic speech recognition systems
First Claim
1. A method comprising:
- receiving, on a computer system, a text search query;
- searching, on the computer system, a plurality of speech recognition processed audio files for instances of words of the text search query, the speech recognition processed audio files being associated with metadata including representations of one or more words detected in the audio files and one or more sub-words detected in the audio files, the metadata being generated by a speech recognition engine in accordance with a vocabulary, the searching comprising:
- identifying one or more query words from the text search query, the one or more identified query words not being in the vocabulary, each of the one or more query words comprising one or more sub-words;
- identifying segments of the speech recognition processed audio files, each of the segments comprising audio data, the segments being more likely than other portions of the audio file to include at least one of the identified query words, by searching the metadata for instances of the sub-words of the one or more query words to identify one or more anchor segments of the audio files and expanding the one or more anchor segments, each of the anchor segments including a start time and an end time within the audio file, the segments of the audio files comprising the anchor segments;
- performing speech recognition on the audio data of the identified segments for instances of the one or more identified query words using a constrained grammar comprising the one or more identified query words not in the vocabulary; and
- returning one or more search results comprising one or more audio files corresponding to segments containing instances of at least one of the identified query words.
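The first two identifying steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the `SubwordHit` and `Anchor` records, the `subwords_of` decomposition function, and the whitespace tokenization are all hypothetical stand-ins, since the claim does not say how sub-words are represented or how the metadata is indexed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubwordHit:
    """One sub-word occurrence in the metadata (hypothetical layout)."""
    file_id: str
    subword: str
    start: float  # seconds from the start of the audio file
    end: float


@dataclass(frozen=True)
class Anchor:
    """Candidate location for a query word, with start and end times."""
    file_id: str
    start: float
    end: float
    query_word: str


def out_of_vocabulary(query, vocabulary):
    """Return the query words that the recognizer's vocabulary lacks."""
    return [w for w in query.lower().split() if w not in vocabulary]


def find_anchors(query_words, subwords_of, hits):
    """Look up each query word's sub-words in the metadata hits.

    `subwords_of` is an assumed caller-supplied function that maps a
    word to its sequence of sub-words.
    """
    by_subword = {}
    for hit in hits:
        by_subword.setdefault(hit.subword, []).append(hit)

    anchors = []
    for word in query_words:
        for subword in subwords_of(word):
            for hit in by_subword.get(subword, []):
                anchors.append(Anchor(hit.file_id, hit.start, hit.end, word))
    return anchors
```

In this sketch every sub-word match becomes an anchor; a real system would presumably score or filter matches before expanding them.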
Abstract
A method including: receiving, on a computer system, a text search query, the query including one or more query words; generating, on the computer system, for each query word in the query, one or more anchor segments within a plurality of speech recognition processed audio files, the one or more anchor segments identifying possible locations containing the query word; post-processing, on the computer system, the one or more anchor segments, the post-processing including: expanding the one or more anchor segments; sorting the one or more anchor segments; and merging overlapping ones of the one or more anchor segments; and searching, on the computer system, the post-processed one or more anchor segments for instances of at least one of the one or more query words using a constrained grammar.
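The post-processing described in the abstract (expand, sort, merge overlapping anchor segments) resembles a standard interval-merging routine. The sketch below assumes anchor segments for a single audio file are plain `(start, end)` tuples in seconds, and that expansion is a fixed padding `pad` clamped to the file `duration`; the abstract does not specify how far segments are expanded, so these parameters are illustrative only.

```python
def expand(segments, pad, duration):
    """Widen each (start, end) segment by `pad` seconds on both sides,
    clamped to the bounds of the audio file."""
    return [(max(0.0, s - pad), min(duration, e + pad)) for s, e in segments]


def merge_overlapping(segments):
    """Sort segments by start time and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous segment: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def post_process(segments, pad, duration):
    """Expand, sort, and merge anchor segments for one audio file."""
    return merge_overlapping(expand(segments, pad, duration))
```

Merging after expansion keeps the later constrained-grammar search from decoding the same stretch of audio more than once.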
20 Claims
1. A method comprising:
- receiving, on a computer system, a text search query;
- searching, on the computer system, a plurality of speech recognition processed audio files for instances of words of the text search query, the speech recognition processed audio files being associated with metadata including representations of one or more words detected in the audio files and one or more sub-words detected in the audio files, the metadata being generated by a speech recognition engine in accordance with a vocabulary, the searching comprising:
- identifying one or more query words from the text search query, the one or more identified query words not being in the vocabulary, each of the one or more query words comprising one or more sub-words;
- identifying segments of the speech recognition processed audio files, each of the segments comprising audio data, the segments being more likely than other portions of the audio file to include at least one of the identified query words, by searching the metadata for instances of the sub-words of the one or more query words to identify one or more anchor segments of the audio files and expanding the one or more anchor segments, each of the anchor segments including a start time and an end time within the audio file, the segments of the audio files comprising the anchor segments;
- performing speech recognition on the audio data of the identified segments for instances of the one or more identified query words using a constrained grammar comprising the one or more identified query words not in the vocabulary; and
- returning one or more search results comprising one or more audio files corresponding to segments containing instances of at least one of the identified query words.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
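The last two steps of claim 1 (constrained-grammar recognition over the identified segments, then returning the matching audio files) might be wired together as in the following sketch. The `recognize` callback is a hypothetical stand-in for a speech recognition engine restricted to a given grammar; its name, signature, and return type are assumptions for illustration and do not correspond to any particular engine's API.

```python
def search_segments(segments, oov_words, recognize):
    """Run a constrained recognizer over candidate segments.

    segments  -- iterable of (file_id, start, end) tuples
    oov_words -- the out-of-vocabulary query words
    recognize -- assumed callable (file_id, start, end, grammar) that
                 returns the set of grammar words it detected
    Returns the audio files with at least one detection, in first-hit order.
    """
    # The constrained grammar contains only the out-of-vocabulary words.
    grammar = frozenset(w.lower() for w in oov_words)

    matching_files = []
    for file_id, start, end in segments:
        found = recognize(file_id, start, end, grammar)
        if found & grammar and file_id not in matching_files:
            matching_files.append(file_id)
    return matching_files
```

Because the grammar holds only a handful of words and the recognizer sees only short segments, this second pass can be far cheaper than re-decoding whole files with an enlarged vocabulary.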
11. A system comprising:
- a processor; and
- memory, the memory having instructions that, when executed by the processor, cause the processor to:
- receive a text search query;
- search a plurality of speech recognition processed audio files for instances of words of the text search query, the speech recognition processed audio files being associated with metadata including representations of one or more words detected in the audio files and one or more sub-words detected in the audio files, the metadata being generated by a speech recognition engine in accordance with a vocabulary, the searching comprising:
- identifying one or more query words from the text search query, the one or more identified query words not being in the vocabulary, each of the one or more query words comprising one or more sub-words;
- identifying segments of the speech recognition processed audio files, each of the segments comprising audio data, the segments being more likely than other portions of the audio file to include at least one of the identified query words, by searching the metadata for instances of the sub-words of the one or more query words to identify one or more anchor segments of the audio files and expanding the one or more anchor segments, each of the anchor segments including a start time and an end time within the audio file, the segments of the audio files comprising the anchor segments;
- performing speech recognition on the audio data of the identified segments for instances of the identified query words using a constrained grammar comprising the one or more identified query words not in the vocabulary; and
- returning one or more search results comprising one or more audio files corresponding to segments containing instances of at least one of the identified query words.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification