Time-anchored posterior indexing of speech

US 7,831,425 B2
Filed: 12/15/2005
Issued: 11/09/2010
Est. Priority Date: 12/15/2005
Status: Expired due to Fees

First Claim

1. A computer-implemented method of indexing a speech lattice for search of audio corresponding to the speech lattice, the method comprising:

using a processor to identify at least two speech recognition hypotheses for a particular word which have time ranges satisfying a criteria, each of the at least two speech recognition hypotheses for the particular word having an associated start time, an associated end time, and an associated probability, at least some of the at least two speech recognition hypotheses having different associated start and/or associated finish times from each other, and satisfying the criteria requires that the at least two speech recognition hypotheses for the particular word have start times that are within a predetermined range of each other, and end times that are within a predetermined range of each other;

using the processor to merge the at least two speech recognition hypotheses, at least some of which having different associated start and/or associated finish times from each other, to generate a merged speech recognition hypothesis for the particular word such that start and end times for the merged speech recognition hypothesis are the same as start and end times for a best of the at least two speech recognition hypotheses, wherein merging the at least two speech recognition hypotheses to generate the merged speech recognition hypothesis for the particular word further comprises combining the associated probabilities of the at least two speech recognition hypotheses for the particular word which have time ranges satisfying the criteria; and

storing an index entry to represent the merged speech recognition hypothesis for the particular word.

3 Assignments

0 Petitions

Abstract

A computer-implemented method of indexing a speech lattice for search of audio corresponding to the speech lattice is provided. The method includes identifying at least two speech recognition hypotheses for a word which have time ranges satisfying a criteria. The method further includes merging the at least two speech recognition hypotheses to generate a merged speech recognition hypothesis for the word.
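The merge step summarized above can be illustrated with a minimal sketch. It is not taken from the patent text: the `Hypothesis` class, the `merge_hypotheses` function, and the choice to combine probabilities by summing them (capped at 1.0) are assumptions made for illustration. The claims only require that the probabilities be "combined" and that the merged hypothesis take the start and end times of the best hypothesis, which is read here as the one with the highest probability.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical n-tuple for one word hypothesis in a lattice."""
    start: float    # start time, in seconds
    end: float      # end time, in seconds
    word_id: int    # identifier of the hypothesized word
    prob: float     # probability associated with the hypothesis


def merge_hypotheses(group: List[Hypothesis]) -> Hypothesis:
    """Merge hypotheses for one word that already satisfy the time criteria.

    The merged hypothesis keeps the start and end times of the most
    probable member. Summing the probabilities is an assumption; the
    claims only say the probabilities are combined.
    """
    if len(group) < 2:
        raise ValueError("need at least two hypotheses to merge")
    if len({h.word_id for h in group}) != 1:
        raise ValueError("all hypotheses must be for the same word")
    best = max(group, key=lambda h: h.prob)
    combined = min(1.0, sum(h.prob for h in group))
    return Hypothesis(best.start, best.end, best.word_id, combined)


group = [
    Hypothesis(start=1.00, end=1.40, word_id=7, prob=0.5),
    Hypothesis(start=1.02, end=1.38, word_id=7, prob=0.25),
]
merged = merge_hypotheses(group)
print(merged)
```

In this sketch the caller is responsible for passing only hypotheses whose start and end times already fall within the predetermined ranges of each other.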

77 Citations

16 Claims

1. A computer-implemented method of indexing a speech lattice for search of audio corresponding to the speech lattice, the method comprising:
- using a processor to identify at least two speech recognition hypotheses for a particular word which have time ranges satisfying a criteria, each of the at least two speech recognition hypotheses for the particular word having an associated start time, an associated end time, and an associated probability, at least some of the at least two speech recognition hypotheses having different associated start and/or associated finish times from each other, and satisfying the criteria requires that the at least two speech recognition hypotheses for the particular word have start times that are within a predetermined range of each other, and end times that are within a predetermined range of each other;
  
  using the processor to merge the at least two speech recognition hypotheses, at least some of which having different associated start and/or associated finish times from each other, to generate a merged speech recognition hypothesis for the particular word such that start and end times for the merged speech recognition hypothesis are the same as start and end times for a best of the at least two speech recognition hypotheses, wherein merging the at least two speech recognition hypotheses to generate the merged speech recognition hypothesis for the particular word further comprises combining the associated probabilities of the at least two speech recognition hypotheses for the particular word which have time ranges satisfying the criteria; and
  
  storing an index entry to represent the merged speech recognition hypothesis for the particular word.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The computer-implemented method of claim 1, wherein storing the index entry to represent the merged speech recognition hypothesis for the particular word further comprises encoding the merged speech recognition hypothesis into an integer value represented by a predetermined number of bits, a first plurality of the predetermined number of bits representing a time range for the merged speech recognition hypothesis, a second plurality of the predetermined number of bits representing a quantized time duration and a probability of the merged speech recognition hypothesis.
  - 3. The computer-implemented method of claim 2, wherein the time range for the merged speech recognition hypothesis is a center point between the start and end times of the merged speech recognition hypothesis.
  - 4. The computer-implemented method of claim 2, wherein the time range for the merged speech recognition hypothesis is one of the start and end times of the merged speech recognition hypothesis.
  - 5. The computer-implemented method of claim 2, and further comprising representing audio as a sequence of time ranges with at least one word hypothesis associated with each time range.
  - 6. The computer-implemented method of claim 1, wherein each of the at least two speech recognition hypotheses for the particular word in the speech lattice include a word ID that identifies the particular word.
  - 7. The computer-implemented method of claim 6, wherein each of the at least two speech recognition hypotheses for the particular word in the speech lattice comprise an n-tuple that includes the start time associated with the speech recognition hypothesis, the end time associated with the speech recognition hypothesis, the word ID that identifies the particular word, and the associated probability for the speech recognition hypothesis.
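The integer encoding described in claims 2 and 3 can be sketched as simple bit packing. The layout below is an assumption chosen for illustration, not the layout used in the patent: a 32-bit value with 20 bits for the center time (in frames), 6 bits for a saturating duration, and 6 bits for a quantized probability. The names `encode_entry` and `decode_entry` are hypothetical.

```python
from typing import Tuple

# Assumed field widths (not from the patent): 20 + 6 + 6 = 32 bits.
TIME_BITS = 20
DUR_BITS = 6
PROB_BITS = 6

DUR_MAX = (1 << DUR_BITS) - 1
PROB_MAX = (1 << PROB_BITS) - 1


def encode_entry(start_frame: int, end_frame: int, prob: float) -> int:
    """Pack a merged hypothesis into one integer.

    High bits hold the center point between start and end (claim 3);
    low bits hold a quantized duration and a quantized probability.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be in [0, 1]")
    if end_frame < start_frame:
        raise ValueError("end_frame must not precede start_frame")
    center = (start_frame + end_frame) // 2
    if not 0 <= center < (1 << TIME_BITS):
        raise ValueError("center time does not fit in TIME_BITS")
    duration = min(end_frame - start_frame, DUR_MAX)  # saturate long words
    prob_q = round(prob * PROB_MAX)
    return (center << (DUR_BITS + PROB_BITS)) | (duration << PROB_BITS) | prob_q


def decode_entry(value: int) -> Tuple[int, int, float]:
    """Unpack (center_frame, duration_frames, approximate probability)."""
    prob_q = value & PROB_MAX
    duration = (value >> PROB_BITS) & DUR_MAX
    center = value >> (DUR_BITS + PROB_BITS)
    return center, duration, prob_q / PROB_MAX


entry = encode_entry(start_frame=100, end_frame=140, prob=1.0)
print(entry, decode_entry(entry))
```

Placing the time field in the high bits means entries sort by time when compared as plain integers, which is one possible motivation for such a layout; the claims themselves do not state an ordering of the bit fields.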

8. A computer-implemented method comprising:
- accessing a speech lattice representing a plurality of speech recognition hypotheses for a portion of speech data, the plurality of speech recognition hypotheses including a plurality of word hypotheses for a plurality of words in the portion of speech data, each word hypothesis of the plurality of word hypotheses including an n-tuple representing a start time associated with the word hypothesis, an end time associated with the word hypothesis, a word ID that identifies a particular word represented by the word hypothesis, and an associated probability for the word hypothesis;
  
  selecting a set of word hypotheses, from the plurality of word hypotheses, that are hypotheses for a same word in the portion of speech data and that have start and end times that satisfy a criteria, the set of word hypotheses being selected using the word IDs, start times, and end times of the n-tuples for the plurality of word hypotheses, wherein each word hypothesis in the set that satisfy the criteria has an associated start time within a first predetermined range of the start times of all other word hypotheses in the set and has an associated end time within a second predetermined range of the end times of all other word hypotheses in the set, and at least two word hypotheses in the set have different associated start times and/or different associated end times from each other; and
  
  generating, using a processor of a computer, a merged word hypothesis for the same word in the portion of speech data by merging the set of word hypotheses, wherein generating comprises;
  
  merging the at least two word hypotheses in the set having different associated start times and/or different associated end times from each other;
  
  assigning start and end times to the merged word hypothesis that are the same as the start and end times associated with the word hypothesis in the set having a highest probability; and
  
  assigning a probability to the merged word hypothesis by combining the associated probabilities of the merged set of word hypotheses; and
  
  storing an index entry to represent the merged word hypothesis.
- Dependent Claims (9, 10, 11, 12, 13, 14, 15, 16)
  - 9. The computer-implemented method of claim 8, wherein the portion of speech data comprises a spoken sentence.
  - 10. The computer-implemented method of claim 8, and comprising:
    - accessing speech data and forming the speech lattice by generating the plurality of speech recognition hypotheses from the speech data, wherein generating the plurality of speech recognition hypotheses comprises generating a first text transcript for the portion of speech data and at least one alternative speech recognition hypothesis for the portion of speech data.
  - 11. The computer-implemented method of claim 10, wherein the plurality of speech recognition hypotheses comprise hypotheses for a phrase in the speech data.
  - 12. The computer-implemented method of claim 8, wherein storing the index entry to represent the merged word hypothesis comprises encoding the merged word hypothesis into an integer value represented by a predetermined number of bits, a first plurality of the predetermined number of bits representing a time range for the merged speech recognition hypothesis, a second plurality of the predetermined number of bits representing a quantized time duration and a probability of the merged word hypothesis.
  - 13. The computer-implemented method of claim 12, wherein the time range for the merged word hypothesis is a center point between the start and end times of the merged word hypothesis.
  - 14. The computer-implemented method of claim 12, wherein the time range for the merged word hypothesis is one of the start and end times of the merged word hypothesis.
  - 15. The computer-implemented method of claim 12, and further comprising representing audio as a sequence of time ranges with at least one word hypothesis associated with each time range.
  - 16. The computer-implemented method of claim 8, wherein at least one word hypothesis in the set of word hypotheses has a different associated start time and/or different associated end time than the start and end times assigned to the merged word hypothesis.
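The selection criterion of claim 8, where every hypothesis in a set must lie within the start and end ranges of all other members, can be sketched with a simple greedy grouping. This is only one possible way to form such sets and is not described in the claims; the function name `select_sets`, the tuple layout, and the highest-probability-first visiting order are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed n-tuple layout: (start_frame, end_frame, word_id, probability)
WordHyp = Tuple[int, int, int, float]


def select_sets(hyps: List[WordHyp], start_tol: int, end_tol: int) -> List[List[WordHyp]]:
    """Greedily group same-word hypotheses with mutually close times.

    A hypothesis joins an existing set only if its start is within
    start_tol and its end within end_tol of every current member, so the
    pairwise condition holds for the whole set.
    """
    sets_by_word: Dict[int, List[List[WordHyp]]] = defaultdict(list)
    # Visit the most probable hypotheses first (an assumption).
    for hyp in sorted(hyps, key=lambda h: h[3], reverse=True):
        start, end, word_id, _ = hyp
        for group in sets_by_word[word_id]:
            if all(abs(start - g[0]) <= start_tol and abs(end - g[1]) <= end_tol
                   for g in group):
                group.append(hyp)
                break
        else:
            # No compatible set exists for this word: start a new one.
            sets_by_word[word_id].append([hyp])
    return [group for groups in sets_by_word.values() for group in groups]


hyps = [
    (100, 140, 7, 0.5),
    (102, 138, 7, 0.3),
    (300, 340, 7, 0.2),
    (101, 139, 9, 0.4),
]
sets = select_sets(hyps, start_tol=5, end_tol=5)
print(sets)
```

Sets that end up with a single member have nothing to merge; under claim 8, merging applies to sets containing at least two hypotheses with differing start and/or end times.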


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Seide, Frank Torsten B.; Chelba, Ciprian I.; Yu, Roger Peng; Gunawardana, Asela J.; Acero, Alejandro; Nguyen, Patrick; Selberg, Erik W.
Primary Examiner(s): Wozniak; James S
Assistant Examiner(s): ORTIZ SANCHEZ, MICHAEL
Application Number: US11/300,735
Publication Number: US 20070143110A1
Time in Patent Office: 1,790 Days
Field of Search: 704/231, 704/251
US Class Current: 704/251
CPC Class Codes:
G10L 15/05 Word boundary detection
G10L 15/08 Speech classification or se...
