Method and apparatus for indexing speech

US 7,634,407 B2
Filed: 05/20/2005
Issued: 12/15/2009
Est. Priority Date: 05/20/2005
Status: Active Grant

Abstract

A method of indexing a speech segment includes identifying at least two alternative word sequences based on the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. The information indicates the position of the word in at least one of the alternative sequences.
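
As a rough illustration of the abstract, the sketch below indexes every word of several alternative recognition hypotheses by position. The function name `index_alternatives`, the segment identifier `"seg0"`, the example word sequences, and the zero-based positions are all assumptions made for this example; they are not taken from the patent.

```python
from collections import defaultdict

def index_alternatives(segment_id, alternatives):
    """Map each word to the set of (segment_id, position) pairs where it occurs
    in at least one of the alternative word sequences."""
    index = defaultdict(set)
    for sequence in alternatives:
        for position, word in enumerate(sequence):  # zero-based positions (assumption)
            index[word].add((segment_id, position))
    return index

# Three hypothetical alternative transcriptions of the same speech segment.
alternatives = [
    ["they", "know", "it"],
    ["they", "no", "it"],
    ["day", "know", "it"],
]
index = index_alternatives("seg0", alternatives)
```

A word that appears in several alternatives at the same position, such as "it" here, yields a single entry for that position.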

8 Claims

1. A method of indexing a speech segment, the method comprising:
- identifying a lattice of speech unit sequences by performing speech recognition on the speech segment, the lattice comprising states with transitions between states, each transition representing a speech unit and all transitions into a state representing the same speech unit;
  
  for each transition into each state, determining all possible speech unit positions that the speech unit for the transition occupies in all speech unit sequences that pass through the transition;
  
  for each state in the lattice, defining at least one sub-state with a separate sub-state for each different speech unit position determined from all of the transitions into the state such that sub-states for two different states represent a same speech unit at a same speech unit position;
  
  calculating a position score for each sub-state;
  
  summing the position scores of sub-states for different states, where the sub-states represent the same speech unit at the same speech unit position, to form a posterior probability for the speech unit in the speech unit position;
  
  for each speech unit in the lattice of speech unit sequences, placing information in an entry in the index that indicates a speech unit position of the speech unit in at least one speech unit sequence and the posterior probability of the speech unit in the speech unit position.
- 2. The method of claim 1 wherein determining a position score for a sub-state comprises determining the probability of reaching the sub-state from the beginning of the lattice and determining the probability of reaching the end of the lattice from the sub-state.
- 3. The method of claim 1 further comprising placing separate information in an entry for a speech unit to indicate a plurality of positions in the at least two alternative speech unit sequences where the speech unit appears.
- 4. The method of claim 1 wherein the speech unit comprises a word.
- 5. The method of claim 1 wherein the speech unit comprises a sub-word.
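
The steps of claims 1 and 2 can be sketched as a forward-backward pass over a small lattice. This is a minimal sketch under stated assumptions, not the patented implementation: the arc tuple layout `(src, dst, unit, prob)`, integer state numbers in topological order, one-based positions, the function name `position_posteriors`, and the toy lattice are all hypothetical. Each `(state, position)` pair plays the role of a sub-state; its position score is the probability of reaching it from the start multiplied by the probability of reaching the end from it, normalized by the total lattice probability.

```python
from collections import defaultdict

def position_posteriors(arcs, start, end):
    """arcs: list of (src, dst, unit, prob), states numbered in topological order.
    Returns {(unit, position): posterior probability}."""
    arcs = sorted(arcs, key=lambda a: a[0])  # visit source states in topological order

    # Forward pass: alpha[state][pos] is the probability mass of partial paths
    # that reach `state` after emitting exactly `pos` speech units.
    alpha = defaultdict(lambda: defaultdict(float))
    alpha[start][0] = 1.0
    unit_of = {}  # all transitions into a state carry the same unit
    for src, dst, unit, prob in arcs:
        unit_of[dst] = unit
        for pos, mass in list(alpha[src].items()):
            alpha[dst][pos + 1] += mass * prob

    # Backward pass: beta[state] is the probability of reaching `end` from `state`.
    beta = defaultdict(float)
    beta[end] = 1.0
    for src, dst, unit, prob in reversed(arcs):
        beta[src] += prob * beta[dst]
    total = beta[start]

    # Sum the position scores of sub-states that share a unit and a position.
    posterior = defaultdict(float)
    for state, unit in unit_of.items():
        for pos, mass in alpha[state].items():
            posterior[(unit, pos)] += mass * beta[state] / total
    return dict(posterior)

# Toy lattice with paths "a d", "a c d" and "b c d" (hypothetical data).
arcs = [
    (0, 1, "a", 0.6), (0, 2, "b", 0.4),
    (1, 3, "c", 0.5), (1, 5, "d", 0.5),
    (2, 4, "c", 1.0), (3, 5, "d", 1.0), (4, 5, "d", 1.0),
]
post = position_posteriors(arcs, start=0, end=5)

# Place the position and posterior of each unit in an index entry.
index = defaultdict(list)
for (unit, pos), prob in sorted(post.items()):
    index[unit].append(("seg0", pos, prob))
```

In this toy lattice the unit "c" enters two different states (3 and 4) at position 2, so its posterior at that position is the sum of the two sub-state scores. The unit "d" receives separate entries for positions 2 and 3 because paths of different lengths pass through its state.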

6. A computer storage medium having computer-executable instructions for performing steps comprising:
- receiving a search query;
  
  searching an index for an entry associated with a word in the search query;
  
  for each of a plurality of speech signals, retrieving from the entry an identifier for the speech signal, an identifier for a segment of the speech signal, a position for the word within the segment, and a probability of the word appearing at the position within the segment given the speech signal;
  
  using the probabilities to rank the speech signals relative to each other to form ranked speech signals by forming a weighted sum for each speech signal as:
  
  $S(D, Q) = \sum_{N=1}^{K} w_{N} \cdot S_{N\text{-gram}}(D, Q)$
  
  where D is the identifier for the speech signal, Q is the search query, $w_{N}$ is the weight associated with a particular N-gram, K is the number of words in the search query and $S_{N\text{-gram}}(D, Q)$ is computed as:
  
  $S_{N\text{-gram}}(D, Q) = \sum_{i=1}^{K-N+1} S(D, q_{i} \dots q_{i+N-1})$
  
  where N is the number of words in a particular n-gram, and $S(D, q_{i} \dots q_{i+N-1})$ is the score for a single n-gram beginning at point i in the query, which is calculated as:
  
  $S(D, q_{i} \dots q_{i+N-1}) = \log [1 + \sum_{s} \sum_{k} \prod_{l=0}^{N-1} P(w_{k+l}(s) = q_{i+l} \mid D)]$
  
  where the inner summation on the right-hand side is performed over the first k−N word positions in a segment, the outer summation is performed across all segments associated with speech signal D, and $P(w_{k+l}(s) = q_{i+l} \mid D)$ is the posterior probability stored in the index for the query word $q_{i+l}$ appearing at position k+l within segment s given speech signal D; and
  
  returning search results based on the ranked speech signals.
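
The ranking formulas of claim 6 can be sketched as follows. The data layout (`{segment_id: {(word, position): probability}}` per speech signal), the function names `ngram_score` and `document_score`, the weights, and the example probabilities are assumptions for illustration only. The inner sum here runs over every start position that has an index entry, and missing entries are treated as probability zero, which is one possible reading of the summation bounds.

```python
import math

def ngram_score(segments, ngram):
    """S(D, q_i ... q_{i+N-1}): log of one plus the summed products of posteriors.
    segments: {segment_id: {(word, position): posterior}} for one speech signal D."""
    total = 0.0
    for entries in segments.values():              # outer sum over segments s
        starts = {pos for (_, pos) in entries}
        for k in starts:                           # inner sum over start positions k
            product = 1.0
            for l, q in enumerate(ngram):          # product over l = 0 .. N-1
                product *= entries.get((q, k + l), 0.0)
            total += product
    return math.log(1.0 + total)

def document_score(segments, query, weights):
    """S(D, Q): weighted sum of the N-gram scores for N = 1 .. K."""
    K = len(query)
    score = 0.0
    for N in range(1, K + 1):
        s_n = sum(ngram_score(segments, query[i:i + N]) for i in range(K - N + 1))
        score += weights[N - 1] * s_n
    return score

# Hypothetical posteriors for two speech signals, one segment each.
docs = {
    "D1": {"s0": {("speech", 1): 0.9, ("indexing", 2): 0.8}},
    "D2": {"s0": {("speech", 1): 0.5, ("indexing", 3): 0.8}},
}
query = ["speech", "indexing"]
weights = [1.0, 2.0]  # w_1, w_2 (assumed values)
ranked = sorted(docs, key=lambda d: document_score(docs[d], query, weights), reverse=True)
```

In this example D1 ranks above D2 because only D1 contains the two query words at adjacent positions, so only D1 receives a nonzero bigram score.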

7. A method of searching for speech segments, the method comprising:
- accessing an index containing probabilities of positions for words generated from a plurality of speech recognition lattices, each lattice being generated from a separate speech segment and each lattice representing a plurality of word sequences;
  
  retrieving a set of probabilities of positions for a word from the index, wherein a probability of a position for a word represents a sum of probabilities of the position for the word determined at multiple states in the lattice;
  
  and returning identifiers for speech segments that contain the word based on the set of probabilities.
- 8. The method of claim 7 further comprising retrieving a set of probabilities for each word of a multi-word query and using the sets of probabilities to determine a multi-word n-gram score.
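
The lookup in claim 7 might look like the sketch below for a single query word. The index layout (a word mapped to `(segment_id, position, probability)` tuples), the function name `search_word`, and the choice to order segments by their summed position probabilities are assumptions; the claim only requires that the returned identifiers be based on the retrieved probabilities.

```python
from collections import defaultdict

def search_word(index, word):
    """Return identifiers of segments whose index entries contain `word`,
    ordered by the sum of the word's position probabilities in each segment."""
    expected = defaultdict(float)
    for segment_id, position, prob in index.get(word, []):
        expected[segment_id] += prob
    # Highest summed probability first; ties are broken by segment identifier.
    return sorted(expected, key=lambda s: (-expected[s], s))

# Hypothetical index built from three recognition lattices.
index = {
    "lattice": [("seg1", 2, 0.4), ("seg2", 1, 0.9), ("seg1", 5, 0.3)],
    "speech": [("seg3", 1, 0.6)],
}
results = search_word(index, "lattice")
```

A word with no index entry yields an empty result list.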

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Acero, Alejandro; Chelba, Ciprian I.
Primary Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US11/133,515
Publication Number: US 20060265222A1
Time in Patent Office: 1,670 Days
Field of Search: 704/251, 704/254, 704/275, 369/25.01, 369/27
US Class Current: 704/251
CPC Class Codes: G10L 15/26 Speech to text systems G10L...
