Fast out-of-vocabulary search in automatic speech recognition systems
First Claim
1. A method comprising:
receiving, on a computer system, a text search query, the query comprising one or more query words;
generating, on the computer system, for each query word in the query, a set of one or more anchor segments from searching metadata corresponding to a plurality of speech recognition processed audio files, the metadata including representations of one or more words detected in the audio files, wherein, for each detected word, the metadata includes a reference to each audio file in which the word was detected, a temporal location of the detected word in the audio file, and a confidence measure for the word as detected within the audio file, where each anchor segment includes a query word, an identifier for an audio file, and a temporal location of the query word within the audio file, where generating anchor segments includes, for each query word:
determining, on the computer system, if the query word is included in a vocabulary of a learning model for a speech recognizer engine of the computer system;
on the computer system, when the query word is in the vocabulary, searching the metadata to identify one or more high confidence anchor segments corresponding to the query word; and
on the computer system, when the query word is not in the vocabulary:
generating a search list of one or more sub-words of the query word, searching the metadata to identify one or more audio files containing at least one of the one or more sub-words to identify one or more anchor segments corresponding to one or more of the sub-words;
post-processing, on the computer system, the one or more anchor segments, the post-processing comprising:
expanding the one or more anchor segments;
sorting the one or more anchor segments; and
merging overlapping ones of the one or more anchor segments; and
performing, on the computer system, speech recognition on the post-processed one or more expanded anchor segments for instances of at least one of the one or more query words using a constrained grammar.
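The anchor-generation step of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the metadata is modeled as a hypothetical in-memory index mapping each detected token to `(file_id, time, confidence)` hits, sub-word units are assumed to be indexed in the same table, and both the splitter `split_subwords` (greedy longest match) and the 0.8 threshold `min_conf` are illustrative choices, since the claim specifies neither how sub-words are derived nor what counts as "high confidence".

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Hit:
    """One detection in the metadata: audio file, time in seconds, confidence."""
    file_id: str
    time: float
    confidence: float


@dataclass(frozen=True)
class AnchorSegment:
    """Query word, audio-file identifier, and temporal location (per claim 1)."""
    query_word: str
    file_id: str
    time: float


def split_subwords(word: str, units: Set[str]) -> List[str]:
    """Hypothetical splitter: greedy longest match against known sub-word units."""
    pieces: List[str] = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in units:
                pieces.append(word[i:j])
                i = j
                break
        else:
            i += 1  # no known unit starts at position i; skip one character
    return pieces


def generate_anchors(query: str,
                     metadata: Dict[str, List[Hit]],
                     vocabulary: Set[str],
                     min_conf: float = 0.8) -> List[AnchorSegment]:
    anchors: List[AnchorSegment] = []
    for word in query.lower().split():
        if word in vocabulary:
            # In-vocabulary word: keep only high-confidence detections.
            anchors += [AnchorSegment(word, h.file_id, h.time)
                        for h in metadata.get(word, [])
                        if h.confidence >= min_conf]
        else:
            # Out-of-vocabulary word: look up each of its sub-words instead.
            for sub in split_subwords(word, set(metadata)):
                anchors += [AnchorSegment(word, h.file_id, h.time)
                            for h in metadata[sub]]
    return anchors


# Toy metadata invented for illustration; "zyr" and "tec" are sub-word units.
metadata = {
    "order": [Hit("call1.wav", 12.5, 0.95), Hit("call2.wav", 3.0, 0.4)],
    "zyr": [Hit("call2.wav", 40.0, 0.6)],
    "tec": [Hit("call2.wav", 40.5, 0.7), Hit("call3.wav", 7.0, 0.5)],
}
anchors = generate_anchors("Order zyrtec", metadata, vocabulary={"order", "refill"})
for a in anchors:
    print(a)
```

Here "order" yields one anchor (the 0.4-confidence hit is dropped), while the out-of-vocabulary "zyrtec" yields three lower-certainty anchors from its sub-words; these are the candidates that later stages re-check with the recognizer.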
9 Assignments
0 Petitions
Abstract
A method including: receiving, on a computer system, a text search query, the query including one or more query words; generating, on the computer system, for each query word in the query, one or more anchor segments within a plurality of speech recognition processed audio files, the one or more anchor segments identifying possible locations containing the query word; post-processing, on the computer system, the one or more anchor segments, the post-processing including: expanding the one or more anchor segments; sorting the one or more anchor segments; and merging overlapping ones of the one or more anchor segments; and searching, on the computer system, the post-processed one or more anchor segments for instances of at least one of the one or more query words using a constrained grammar.
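The post-processing described in the abstract (expand, sort, merge overlapping) is essentially interval merging. The sketch below is hedged: it assumes each anchor has already been turned into a hypothetical `Segment` with start and end times in seconds, and the default `pad` of 1.0 s is an illustrative parameter; the text gives no expansion width, and a real system would also clamp `end` to the audio file's duration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    file_id: str
    start: float  # seconds
    end: float    # seconds


def expand(segments: List[Segment], pad: float) -> List[Segment]:
    # Widen each anchor by `pad` seconds on both sides, never before t = 0.
    return [Segment(s.file_id, max(0.0, s.start - pad), s.end + pad)
            for s in segments]


def sort_segments(segments: List[Segment]) -> List[Segment]:
    # Group by file, then order by start time within each file.
    return sorted(segments, key=lambda s: (s.file_id, s.start))


def merge_overlapping(segments: List[Segment]) -> List[Segment]:
    # Expects input sorted by (file_id, start); merges overlapping or touching spans.
    merged: List[Segment] = []
    for s in segments:
        if merged and merged[-1].file_id == s.file_id and s.start <= merged[-1].end:
            last = merged[-1]
            merged[-1] = Segment(last.file_id, last.start, max(last.end, s.end))
        else:
            merged.append(s)
    return merged


def post_process(segments: List[Segment], pad: float = 1.0) -> List[Segment]:
    return merge_overlapping(sort_segments(expand(segments, pad)))


# Point anchors (start == end), such as those produced by anchor generation.
raw = [Segment("call2.wav", 40.5, 40.5), Segment("call1.wav", 12.5, 12.5),
       Segment("call2.wav", 40.0, 40.0), Segment("call3.wav", 0.5, 0.5)]
result = post_process(raw)
print(result)
```

Sorting must precede merging so that overlapping spans from the same file are adjacent; the two anchors in `call2.wav` collapse into one 39.0 to 41.5 s window, so that stretch of audio is decoded once rather than twice.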
83 Citations
19 Claims
10. A system comprising a computer system comprising a processor, memory, and storage, the system being configured to:
receive a text search query, the query comprising one or more query words;
generate, for each query word in the query, a set of one or more anchor segments from searching metadata corresponding to a plurality of speech recognition processed audio files, the metadata including representations of one or more words detected in the audio files, wherein, for each detected word, the metadata includes a reference to each audio file in which the word was detected, a temporal location of the detected word in the audio file, and a confidence measure for the word as detected within the audio file, where each anchor segment includes a query word, an identifier for an audio file, and a temporal location of the query word within the audio file, where generating anchor segments includes, for each query word, the computer system:
determining if the query word is included in a vocabulary of a learning model for a speech recognizer engine of the computer system;
when the query word is in the vocabulary, searching the metadata to identify one or more high confidence anchor segments corresponding to the query word; and
when the query word is not in the vocabulary:
generating a search list of one or more sub-words of the query word, searching the metadata to identify one or more audio files containing at least one of the one or more sub-words to identify one or more anchor segments corresponding to one or more of the sub-words;
post-process the one or more anchor segments, the post-processing comprising:
expanding the one or more anchor segments;
sorting the one or more anchor segments; and
merging overlapping ones of the one or more anchor segments; and
perform speech recognition on the post-processed one or more expanded anchor segments for instances of at least one of the one or more query words using a constrained grammar.
19. A system comprising:
means for receiving a text search query, the query comprising one or more query words;
means for generating, for each query word in the query, a set of one or more anchor segments from searching metadata corresponding to a plurality of speech recognition processed audio files, the metadata including representations of one or more words detected in the audio files, wherein, for each detected word, the metadata includes a reference to each audio file in which the word was detected, a temporal location of the detected word in the audio file, and a confidence measure for the word as detected within the audio file, where each anchor segment includes a query word, an identifier for an audio file, and a temporal location of the query word within the audio file, where the means for generating anchor segments includes, for each query word:
means for determining if the query word is included in a vocabulary of a learning model for a speech recognizer engine of the computer system;
when the query word is in the vocabulary, means for searching the metadata to identify one or more high confidence anchor segments corresponding to the query word; and
means for, when the query word is not in the vocabulary:
generating a search list of one or more sub-words of the query word, searching the metadata to identify one or more audio files containing at least one of the one or more sub-words to identify one or more anchor segments corresponding to one or more of the sub-words;
means for post-processing the one or more anchor segments comprising:
means for expanding the one or more anchor segments;
means for sorting the one or more anchor segments; and
means for merging overlapping ones of the one or more anchor segments; and
means for searching the post-processed one or more expanded anchor segments for instances of at least one of the one or more query words using a constrained grammar.
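The claims leave the "constrained grammar" unspecified. One common way to express such a grammar is the JSGF format (accepted, for example, by CMU Sphinx recognizers). The sketch below assumes that format and builds a grammar whose only sentences are the individual query words, restricting the recognizer's search space when it re-decodes an expanded segment. The function `build_jsgf` and its alphabetic-only token rule are hypothetical choices for illustration. For out-of-vocabulary words a grammar-based recognizer would also need pronunciations, for example from a letter-to-sound model; that part is not shown.

```python
import re
from typing import Iterable, List


def build_jsgf(query_words: Iterable[str], name: str = "query") -> str:
    """Return a JSGF grammar whose only sentences are the individual query words."""
    words: List[str] = []
    for raw in query_words:
        w = raw.strip().lower()
        # Keep this sketch simple: plain alphabetic tokens only.
        if not re.fullmatch(r"[a-z]+", w):
            raise ValueError(f"unsupported token: {raw!r}")
        if w not in words:
            words.append(w)  # drop duplicates, keep first-seen order
    if not words:
        raise ValueError("at least one query word is required")
    return ("#JSGF V1.0;\n"
            f"grammar {name};\n"
            f"public <{name}> = {' | '.join(words)};\n")


grammar = build_jsgf(["Zyrtec", "order", "zyrtec"])
print(grammar)
```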
Specification