Method and apparatus for indexing speech
Abstract
A method of indexing a speech segment includes identifying at least two alternative word sequences based on the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. The information indicates the position of the word in at least one of the alternative sequences.
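The indexing idea in the abstract can be illustrated with a minimal sketch. It assumes the alternative word sequences are already available as plain lists of words and that positions are counted from zero; the function name `index_alternatives` and the `(segment_id, position)` entry layout are hypothetical choices for illustration, not taken from the patent.

```python
from collections import defaultdict

def index_alternatives(index, segment_id, alternatives):
    """Add every word of every alternative sequence to the index.

    Each index entry maps a word to the set of (segment_id, position)
    pairs at which the word occurs in at least one alternative.
    """
    for sequence in alternatives:
        for position, word in enumerate(sequence):
            index[word].add((segment_id, position))
    return index

# Two alternative recognitions of the same (hypothetical) speech segment.
index = defaultdict(set)
index_alternatives(index, "seg0", [["they", "know", "it"], ["they", "no", "it"]])

print(sorted(index["it"]))   # [('seg0', 2)]
print(sorted(index["no"]))   # [('seg0', 1)]
```

Storing positions rather than only word identities is what lets a later search check whether query words occur next to each other.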
8 Claims
1. A method of indexing a speech segment, the method comprising:
identifying a lattice of speech unit sequences by performing speech recognition on the speech segment, the lattice comprising states with transitions between states, each transition representing a speech unit and all transitions into a state representing the same speech unit;
for each transition into each state, determining all possible speech unit positions that the speech unit for the transition occupies in all speech unit sequences that pass through the transition;
for each state in the lattice, defining at least one sub-state with a separate sub-state for each different speech unit position determined from all of the transitions into the state such that sub-states for two different states represent a same speech unit at a same speech unit position;
calculating a position score for each sub-state;
summing the position scores of sub-states for different states, where the sub-states represent the same speech unit at the same speech unit position, to form a posterior probability for the speech unit in the speech unit position;
for each speech unit in the lattice of speech unit sequences, placing information in an entry in the index that indicates a speech unit position of the speech unit in at least one speech unit sequence and the posterior probability of the speech unit in the speech unit position.
Dependent claims: 2, 3, 4, 5.
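The steps of claim 1 can be sketched as a forward-backward pass in which the forward mass of each state is split by position. This is a minimal illustration under stated assumptions, not the patent's implementation: the lattice is a small hand-built acyclic graph, each arc carries an already-normalized probability, the states are supplied in topological order, and all names (`position_posteriors`, `units`, `arcs`) are hypothetical. A sub-state corresponds to a `(state, position)` pair, and its position score is the forward mass at that position times the backward mass of the state, divided by the total lattice mass.

```python
from collections import defaultdict

def position_posteriors(units, arcs, start, order):
    """Return {(unit, position): posterior} for a small acyclic lattice.

    units: state -> speech unit carried by every transition into that state
    arcs:  list of (src, dst, prob) transitions
    start: initial state (carries no unit, position 0)
    order: all states in topological order
    """
    incoming = defaultdict(list)
    outgoing = defaultdict(list)
    for src, dst, prob in arcs:
        incoming[dst].append((src, prob))
        outgoing[src].append((dst, prob))

    # Forward pass split by position: alpha[state][pos] is the mass of all
    # partial paths that reach `state` with its unit at position `pos`.
    alpha = {state: defaultdict(float) for state in order}
    alpha[start][0] = 1.0
    for state in order:
        for src, prob in incoming[state]:
            for pos, mass in alpha[src].items():
                alpha[state][pos + 1] += mass * prob

    # Backward pass: beta[state] is the mass of all paths from `state` to an end.
    beta = {}
    for state in reversed(order):
        nxt = outgoing[state]
        beta[state] = sum(p * beta[d] for d, p in nxt) if nxt else 1.0

    total = beta[start]
    posteriors = defaultdict(float)
    for state in order:
        if state == start:
            continue
        for pos, mass in alpha[state].items():
            # Position score of sub-state (state, pos), summed over all
            # states that carry the same unit at the same position.
            posteriors[(units[state], pos)] += mass * beta[state] / total
    return dict(posteriors)

# Hypothetical lattice: "hi there", "high there", "oh hi there".
units = {1: "hi", 2: "high", 3: "there", 4: "there", 5: "oh"}
arcs = [(0, 5, 0.2), (0, 1, 0.4), (0, 2, 0.4), (5, 1, 1.0), (1, 3, 1.0), (2, 4, 1.0)]
posteriors = position_posteriors(units, arcs, start=0, order=[0, 5, 1, 2, 3, 4])
for key in sorted(posteriors):
    print(key, round(posteriors[key], 3))
```

In this toy lattice the state for "hi" has two sub-states (positions 1 and 2), and the two separate states for "there" both contribute to the same entry at position 2, which is the summation the claim describes.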
6. A computer storage medium having computer-executable instructions for performing steps comprising:
receiving a search query;
searching an index for an entry associated with a word in the search query;
for each of a plurality of speech signals, retrieving from the entry an identifier for the speech signal, an identifier for a segment of the speech signal, a position for the word within the segment, and a probability of the word appearing at the position within the segment given the speech signal;
using the probabilities to rank the speech signals relative to each other to form ranked speech signals by forming a weighted sum for each speech signal as:
[equation image not reproduced]
where $D$ is the identifier for the speech signal, $Q$ is the search query, $W_N$ is the weight associated with a particular N-gram, $K$ is the number of words in the search query and $S_{N\text{-gram}}(D, Q)$ is computed as:
[equation image not reproduced]
where $N$ is the number of words in a particular n-gram, and $S(D, q_i \ldots q_{i+N-1})$ is the score for a single n-gram beginning at point $i$ in the query, which is calculated as:
[equation image not reproduced]
where the inner summation on the right-hand side is performed over the first $k-N$ word positions in a segment, the outer summation is performed across all segments associated with speech signal $D$, and $P(W_{k+1}(s) = q_{i+1} \mid D)$ is the posterior probability stored in the index for the query word $q_{i+1}$ appearing at position $k+1$ within segment $s$ given speech signal $D$; and
returning search results based on the ranked speech signals.
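The three display equations referenced in claim 6 are missing from this text. One reconstruction that is consistent with the definitions given in the claim is sketched below. Several details are assumptions rather than content recoverable from the claim wording: the summation limits, the $\log(1 + \cdot)$ wrapper on the single n-gram score (the form commonly published for position-specific posterior lattice scoring), and the reading of the garbled subscripts "k+1" and "i+1" as a running offset $l$ over the words of the n-gram.

```latex
\begin{align}
S(D, Q) &= \sum_{N=1}^{K} W_N \cdot S_{N\text{-gram}}(D, Q) \\
S_{N\text{-gram}}(D, Q) &= \sum_{i=1}^{K-N+1} S(D, q_i \ldots q_{i+N-1}) \\
S(D, q_i \ldots q_{i+N-1}) &= \log\Bigl[\, 1 + \sum_{s} \sum_{k} \prod_{l=0}^{N-1}
    P\bigl(W_{k+l}(s) = q_{i+l} \mid D\bigr) \Bigr]
\end{align}
```

Under this reading, the product is non-zero only when the $N$ query words occur at $N$ consecutive positions of the same segment, so longer matching n-grams raise the score of a speech signal only when the words are adjacent in the indexed lattice.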
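The ranking step of claim 6 can be sketched in code. This is an illustration under assumptions, not the claimed implementation: index entries are taken to be `(signal_id, segment_id, position, probability)` tuples, the single n-gram score is assumed to be `log(1 + sum of products of posteriors)` over consecutive positions, and the weights default to 1.0 for every n-gram order. The names `build_lookup`, `ngram_score`, and `rank` are hypothetical.

```python
import math
from collections import defaultdict

def build_lookup(entries):
    """Reorganize index entries as post[signal][segment][position][word] = prob."""
    post = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for word, hits in entries.items():
        for signal, segment, position, prob in hits:
            post[signal][segment][position][word] = prob
    return post

def ngram_score(post, signal, ngram):
    """Score one n-gram: log(1 + sum over segments and start positions of
    the product of posteriors at consecutive positions)."""
    total = 0.0
    for positions in post[signal].values():
        for k in positions:
            product = 1.0
            for offset, word in enumerate(ngram):
                product *= positions.get(k + offset, {}).get(word, 0.0)
            total += product
    return math.log1p(total)

def rank(index, query, weights=None):
    """Return (signal ids sorted best first, score per signal)."""
    K = len(query)
    weights = weights or [1.0] * K  # assumption: uniform n-gram weights
    post = build_lookup({w: index.get(w, []) for w in set(query)})
    scores = {}
    for signal in post:
        score = 0.0
        for N in range(1, K + 1):
            s_n = sum(ngram_score(post, signal, query[i:i + N])
                      for i in range(K - N + 1))
            score += weights[N - 1] * s_n
        scores[signal] = score
    return sorted(scores, key=scores.get, reverse=True), scores

# Hypothetical index: both signals contain both words, but only docA has
# them at adjacent positions, so only docA gets a bigram contribution.
index = {
    "speech": [("docA", "s0", 1, 0.9), ("docB", "s0", 1, 0.5)],
    "index":  [("docA", "s0", 2, 0.8), ("docB", "s0", 4, 0.7)],
}
ranking, scores = rank(index, ["speech", "index"])
print(ranking)
```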
7. A method of searching for speech segments, the method comprising:
accessing an index containing probabilities of positions for words generated from a plurality of speech recognition lattices, each lattice being generated from a separate speech segment and each lattice representing a plurality of word sequences;
retrieving a set of probabilities of positions for a word from the index, wherein a probability of a position for a word represents a sum of probabilities of the position for the word determined at multiple states in the lattice; and
returning identifiers for speech segments that contain the word based on the set of probabilities.
Dependent claim: 8.
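A single-word lookup in the spirit of claim 7 might look like the sketch below. It assumes each index entry is a `(segment_id, position, probability)` tuple whose probability has already been summed over lattice states, and it orders segments by the highest probability found at any position. The `threshold` parameter and the function name `search` are hypothetical additions for illustration.

```python
def search(index, word, threshold=0.0):
    """Return ids of segments likely to contain `word`, best first."""
    best = {}
    for segment_id, position, prob in index.get(word, []):
        # Keep the strongest evidence for the word in each segment.
        best[segment_id] = max(best.get(segment_id, 0.0), prob)
    hits = [seg for seg, prob in best.items() if prob > threshold]
    return sorted(hits, key=lambda seg: best[seg], reverse=True)

# Hypothetical index built from three lattices (one per speech segment).
index = {
    "lattice": [("seg1", 3, 0.35), ("seg2", 1, 0.90), ("seg2", 2, 0.05), ("seg3", 7, 0.02)],
}
results = search(index, "lattice", threshold=0.1)
print(results)   # ['seg2', 'seg1']
```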
Specification