Speech index pruning
First Claim
1. A computer-implemented method of searching a speech index comprising:
accessing a speech index of a plurality of spoken documents each comprising a collection of speech signals using a processor, wherein the speech index comprises a plurality of word entries, each entry identifying a plurality of candidate positions for a word in the plurality of speech signals and a probability of the word appearing at each of the candidate positions given the corresponding speech signal;
receiving a search query comprising a target word from a user using the processor;
receiving a first threshold value from the user using the processor;
searching the speech index for one of the word entries that matches the target word of the search query using the processor;
retrieving from the matched entry the plurality of candidate positions for the target word and the probability of the word appearing at each of the candidate positions, using the processor;
eliminating the candidate positions based on a comparison of the probability of the word appearing at a candidate position and the first threshold value;
ranking the speech signals based on the probabilities of the remaining candidate positions relative to each other to form ranked speech signals using the processor; and
returning search results to the user based on the ranked speech signals using the processor, the search results comprising an identification of one or more of the spoken documents.
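The steps of claim 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the `Candidate` record, the dictionary layout of the index, the `search` function, the choice to keep positions whose probability is at or above the threshold, and the choice to score a signal by summing its surviving probabilities are all assumptions, since the claim only requires "a comparison" and a ranking "relative to each other".

```python
from collections import defaultdict
from typing import NamedTuple


class Candidate(NamedTuple):
    """One candidate position in a word entry (hypothetical layout)."""
    doc_id: str      # spoken document that contains the speech signal
    signal_id: str   # speech signal within that document
    position: int    # candidate position of the word in the signal
    prob: float      # probability of the word at this position given the signal


def search(index, target_word, threshold):
    """Look up a word entry, drop low-probability positions, rank the signals."""
    # Find the word entry that matches the target word.
    matched = index.get(target_word, [])
    # Eliminate candidate positions (assumed rule: keep prob >= threshold).
    remaining = [c for c in matched if c.prob >= threshold]
    # Score each signal (assumed rule: sum of its remaining probabilities).
    scores = defaultdict(float)
    for c in remaining:
        scores[(c.doc_id, c.signal_id)] += c.prob
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Each result identifies the spoken document along with the signal.
    return [(doc, sig, score) for (doc, sig), score in ranked]


index = {
    "pruning": [
        Candidate("doc1", "seg0", 3, 0.75),
        Candidate("doc1", "seg0", 17, 0.25),
        Candidate("doc2", "seg4", 5, 0.5),
        Candidate("doc3", "seg1", 8, 0.0625),
    ],
}
results = search(index, "pruning", threshold=0.1)
```

With the sample data, the position in `doc3` falls below the threshold and is eliminated, so only signals from `doc1` and `doc2` are ranked.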
Abstract
A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
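As a rough illustration of the indexing side described in the abstract, the sketch below builds index entries from alternative word sequences and prunes them against a threshold. It rests on assumptions not stated in the abstract: the function name `index_segment` is hypothetical, each alternative sequence is assumed to carry a posterior probability, the probability of a word at a position is assumed to be the summed posterior of the sequences containing it there, and entries at or above the threshold are assumed to be the ones kept.

```python
from collections import defaultdict


def index_segment(index, segment_id, hypotheses, threshold):
    """Add one speech segment to the index, pruning unlikely entries.

    hypotheses: list of (words, posterior) pairs, i.e. the alternative
    word sequences recognized for the segment (assumed input format).
    """
    # Accumulate, per (word, position), the posterior mass of every
    # alternative sequence that places the word at that position.
    mass = defaultdict(float)
    for words, posterior in hypotheses:
        for position, word in enumerate(words):
            mass[(word, position)] += posterior
    # Place information in the entry for each word, skipping entries
    # whose probability falls below the threshold.
    for (word, position), prob in mass.items():
        if prob >= threshold:
            index.setdefault(word, []).append((segment_id, position, prob))
    return index


hypotheses = [
    (["speech", "index"], 0.75),
    (["speech", "in", "decks"], 0.25),
]
index = index_segment({}, "seg0", hypotheses, threshold=0.5)
```

Here "speech" is supported by both alternatives and "index" by the more likely one, while "in" and "decks" fall below the threshold and never enter the index.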
9 Claims
5. A computer-implemented method of searching a speech index comprising:
accessing a speech index of a plurality of spoken documents each comprising a collection of speech signals using a processor, the speech index comprising positions of words in the speech signals, and a probability of the words appearing at each of the positions given the corresponding speech signal;
receiving a search query from a user using the processor;
receiving a first threshold value from the user using the processor;
eliminating the positions of words from the speech index based on a comparison of the probability of a word appearing at a position to the first threshold value to form a pruned speech index using the processor;
searching the pruned speech index for an entry associated with a word in the search query using the processor;
retrieving from the entry candidate positions for the word and their probabilities using the processor;
ranking the speech signals based on the probabilities of the candidate positions to form ranked speech signals using the processor; and
returning search results to the user based on the ranked speech signals using the processor, the search results comprising an identification of one or more of the spoken documents.
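Claim 5 differs from claim 1 in ordering: the whole index is pruned first, and the query is then run against the pruned index. A minimal sketch of that ordering follows. The names `prune_index` and `search_pruned`, the tuple layout `(signal_id, position, probability)`, the keep-if-at-or-above rule, and ranking each signal by its best remaining probability are all assumptions made for illustration.

```python
def prune_index(index, threshold):
    """Return a pruned copy of the index (the original is left untouched)."""
    pruned = {}
    for word, postings in index.items():
        # Keep only positions whose probability meets the threshold (assumed rule).
        kept = [p for p in postings if p[2] >= threshold]
        if kept:
            pruned[word] = kept
    return pruned


def search_pruned(pruned, word):
    """Rank signals for a query word using the pruned index."""
    best = {}
    for signal_id, _position, prob in pruned.get(word, []):
        # Assumed scoring: a signal's score is its highest remaining probability.
        best[signal_id] = max(best.get(signal_id, 0.0), prob)
    return sorted(best, key=best.get, reverse=True)


index = {
    "speech": [("s1", 0, 0.25), ("s2", 4, 0.75), ("s1", 9, 0.5)],
    "rare": [("s3", 2, 0.0625)],
}
pruned = prune_index(index, threshold=0.3)
ranked = search_pruned(pruned, "speech")
```

Because pruning happens before lookup, a word such as "rare" whose positions all fall below the threshold has no entry left in the pruned index at all.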
8. A computer-implemented method of searching a speech index comprising:
accessing a speech index of a plurality of spoken documents each comprising a collection of speech signals using a processor;
receiving a search query from a user using the processor;
modifying a word of the search query using the processor;
receiving a first threshold value from the user using the processor;
searching the speech index for an entry associated with the modified word using the processor;
retrieving from the entry a plurality of candidate positions for the modified word in a plurality of the speech signals, and a probability of the modified word appearing at each of the candidate positions given the corresponding speech signal, using the processor;
eliminating the candidate positions based on a comparison of their probabilities to the first threshold value using the processor;
ranking the speech signals based on the probabilities of the remaining candidate positions relative to each other to form ranked speech signals using the processor; and
returning search results to the user based on the ranked speech signals using the processor, the search results comprising an identification of one or more of the spoken documents.
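Claim 8 adds a step that modifies a query word before the lookup, without saying what the modification is. The sketch below assumes, purely for illustration, that the modification is a simple normalization (lowercasing and stripping surrounding punctuation). The names `modify_word` and `search_modified`, the index layout, and the ranking by highest remaining probability are likewise hypothetical.

```python
def modify_word(word):
    """Assumed modification: strip surrounding punctuation and lowercase."""
    return word.strip(".,;:!?\"'").lower()


def search_modified(index, query_word, threshold):
    """Look up the modified word, prune by threshold, and rank the signals."""
    entry = index.get(modify_word(query_word), [])
    # Eliminate candidate positions below the threshold (assumed rule).
    remaining = [(sig, prob) for sig, _position, prob in entry if prob >= threshold]
    # Order by probability, highest first, then keep each signal once.
    remaining.sort(key=lambda pair: pair[1], reverse=True)
    return list(dict.fromkeys(sig for sig, _prob in remaining))


index = {"pruning": [("s1", 3, 0.5), ("s2", 1, 0.75), ("s1", 8, 0.125)]}
ranked = search_modified(index, "Pruning?", threshold=0.25)
```

Without the modification step, the raw query string "Pruning?" would not match the entry stored under "pruning".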
Specification