System and method for likelihood computation in multi-stream HMM based speech recognition

US 20060074654A1
Filed: 09/21/2004
Published: 04/06/2006
Est. Priority Date: 09/21/2004
Status: Active Grant

Abstract

A system and method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. A number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and a number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.
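
The reduction described in the abstract can be illustrated with a minimal sketch. It assumes the co-occurrence information is available as a plain dictionary from first-stream Gaussian indices to sets of second-stream Gaussian indices, and that pruned Gaussians receive a floor log-likelihood. The names `cooccurrence_map`, `second_stream_candidates`, `score_second_stream`, and the `floor` value are hypothetical and do not come from the patent text.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def second_stream_candidates(active_first, cooccurrence_map):
    """Union of second-stream Gaussians co-occurring with active first-stream ones."""
    candidates = set()
    for g in active_first:
        candidates.update(cooccurrence_map.get(g, ()))
    return candidates

def score_second_stream(x2, gaussians2, candidates, floor=-1e10):
    """Evaluate only candidate Gaussians; every other Gaussian gets the floor value."""
    return [
        log_gauss_diag(x2, *gaussians2[j]) if j in candidates else floor
        for j in range(len(gaussians2))
    ]

# Toy one-dimensional second stream with three Gaussians given as (mean, variance).
gaussians2 = [([0.0], [1.0]), ([5.0], [1.0]), ([10.0], [1.0])]
# Assumed map: first-stream Gaussian index -> co-occurring second-stream indices.
cooccurrence_map = {0: {0, 1}, 1: {1}, 2: {2}}
active_first = [0, 1]  # assumed output of labeling the first stream

candidates = second_stream_candidates(active_first, cooccurrence_map)
scores = score_second_stream([0.5], gaussians2, candidates)
```

In this toy case only two of the three second-stream Gaussians are evaluated; the third is skipped because it never co-occurs with an active first-stream Gaussian.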

22 Claims

1. A method for speech recognition, comprising the steps of:
- determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams;
  
  determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability;
  
  reducing a number of Gaussians computed for the second stream based upon Gaussians already computed for the first stream and a number of Gaussians co-occurring in the second stream; and
  
  decoding speech based on the Gaussians computed for the first and second streams.
- Dependent claims (2, 3, 4, 5, 6, 7, 13):
  - 2. The method as recited in claim 1, wherein the step of labeling includes hierarchically labeling by surveying Gaussians in multiple resolutions.
  - 3. The method as recited in claim 1, wherein the step of labeling includes employing a search tree.
  - 4. The method as recited in claim 1, wherein the step of determining a distribution of Gaussians co-occurring includes providing a Gaussian cooccurrence map.
  - 5. The method as recited in claim 1, wherein the first stream includes an audio stream and the second stream includes a video stream and the step of decoding speech includes employing multi-stream hidden Markov models.
  - 6. The method as recited in claim 1, further comprising a plurality of feature streams wherein the step of decoding speech includes employing multi-stream hidden Markov models.
  - 7. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for speech recognition as recited in claim 1.
  - 13. The method as recited in claim 1, further comprising a plurality of feature streams wherein the step of decoding speech includes employing multi-stream hidden Markov models.
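
Claims 2 and 3 mention hierarchical labeling over multiple resolutions and a search tree. One possible reading is a two-level tree in which a small set of coarse Gaussians is scored first and only the fine Gaussians under the best coarse nodes are marked active. The sketch below follows that reading; the tree layout, the `beam` parameter, and the function name `hierarchical_label` are assumptions made for illustration, not details taken from the patent.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def hierarchical_label(x, tree, beam=1):
    """Return ids of fine Gaussians under the `beam` best-scoring coarse nodes.

    tree: list of (coarse_mean, coarse_var, fine_gaussian_ids), assumed layout.
    """
    ranked = sorted(
        ((log_gauss_diag(x, mean, var), children) for mean, var, children in tree),
        key=lambda item: item[0],
        reverse=True,
    )
    active = []
    for _, children in ranked[:beam]:
        active.extend(children)
    return sorted(active)

# Two coarse nodes, each covering two fine Gaussians (toy values).
tree = [
    ([0.0], [4.0], [0, 1]),
    ([10.0], [4.0], [2, 3]),
]
active_near_zero = hierarchical_label([1.0], tree, beam=1)
active_near_ten = hierarchical_label([9.0], tree, beam=1)
```

With `beam=1` only two coarse Gaussians are scored per frame, and the fine Gaussians under the losing node are never evaluated.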

8. A method for speech recognition based upon a plurality of feature streams, comprising the steps of:
- determining active Gaussians related to a first feature stream by hierarchically labeling the first feature stream;
  
  determining active Gaussians co-occurring in the feature streams other than the first feature stream based upon joint probability using cooccurrence statistics such that a number of Gaussians computed for subsequent feature streams are reduced based upon co-occurring Gaussians already computed for at least one other feature stream; and
  
  decoding speech based on the Gaussians computed for the plurality of feature streams.
- Dependent claims (9, 10, 11, 12, 14):
  - 9. The method as recited in claim 8, wherein the step of hierarchically labeling includes surveying Gaussians in multiple resolutions.
  - 10. The method as recited in claim 8, wherein the step of hierarchically labeling includes employing a search tree.
  - 11. The method as recited in claim 8, wherein the step of determining active Gaussians includes providing a Gaussian cooccurrence map.
  - 12. The method as recited in claim 8, wherein the first feature stream includes an audio stream and at least one other stream includes a video stream and the step of decoding speech includes employing multi-stream hidden Markov models.
  - 14. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for speech recognition as recited in claim 8.
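
Claims 8 and 11 refer to cooccurrence statistics and a Gaussian cooccurrence map without saying how the map is obtained. As a rough sketch, assuming each training frame has already been labeled with one Gaussian per stream, the map could be estimated from joint counts and thresholded on the conditional relative frequency. The function `build_cooccurrence_map` and the `min_prob` threshold are hypothetical choices, not the patent's procedure.

```python
from collections import Counter, defaultdict

def build_cooccurrence_map(frame_labels, min_prob=0.1):
    """Map each first-stream Gaussian to second-stream Gaussians seen with it.

    frame_labels: list of (first_stream_gaussian, second_stream_gaussian) pairs,
    one pair per training frame (assumed input format).
    A pair (i, j) is kept when count(i, j) / count(i) >= min_prob.
    """
    joint = Counter(frame_labels)
    marginal = Counter(i for i, _ in frame_labels)
    cmap = defaultdict(set)
    for (i, j), n in joint.items():
        if n / marginal[i] >= min_prob:
            cmap[i].add(j)
    return dict(cmap)

# Toy labels: Gaussian 0 mostly pairs with 0, sometimes 1; Gaussian 1 mostly with 1.
labels = [(0, 0)] * 8 + [(0, 1)] * 2 + [(1, 1)] * 9 + [(1, 2)] * 1
cmap = build_cooccurrence_map(labels, min_prob=0.15)
```

Raising `min_prob` shrinks the candidate sets and saves more computation, at the risk of dropping a Gaussian that would have mattered for a given frame.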

15. A speech recognition system, comprising:
- a first front end, which extracts features from a first stream to generate likelihoods of the features of the first stream;
  
  a second front end, which extracts features from a second stream associated with the first stream for generating likelihoods of the features of the second stream;
  
  a processing module, which determines active Gaussians used to compute the likelihoods of the features of the first stream and finds active Gaussians co-occurring in the second stream to generate the likelihoods of the features of the second stream such that a number of Gaussians computed for the second stream is reduced based upon Gaussians already computed for the first stream; and
  
  a speech decoder which decodes speech based on the Gaussians computed for the first and second streams.
- Dependent claims (16, 17, 18, 19, 20, 21, 22):
  - 16. The speech recognition system as recited in claim 15, further comprising a concatenator which fuses the features associated with the first and second streams to provide a third stream for decoding speech.
  - 17. The speech recognition system as recited in claim 15, wherein the decoder employs multi-stream hidden Markov models.
  - 18. The speech recognition system as recited in claim 15, further comprising a cooccurrence map wherein the decoder employs the cooccurrence map to generate joint probability statistics for the likelihoods associated with the first and second streams.
  - 19. The speech recognition system as recited in claim 15, wherein the first stream includes one of an acoustic stream and the second stream includes one of an acoustic stream and a video stream.
  - 20. The speech recognition system as recited in claim 19, wherein the video stream includes a human image as a region of interest for decoding speech.
  - 21. The speech recognition system as recited in claim 15, wherein the decoder decodes speech in accordance with a plurality of streams.
  - 22. The speech recognition system as recited in claim 15, further comprising a labeler, which determines a set of available Gaussians for at least one stream.
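
The claims state that the decoder combines likelihoods from several streams with multi-stream hidden Markov models, but they give no formula. A common formulation, used here purely as an assumption, scores an HMM state as a weighted sum of per-stream Gaussian-mixture log-likelihoods. The stream weights and the function names below are illustrative and are not taken from the patent.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def state_log_likelihood(stream_component_scores, stream_weights):
    """Weighted sum over streams of the mixture log-likelihood.

    stream_component_scores[s]: list of (log mixture weight + log Gaussian
    density) values for the Gaussians actually computed in stream s.
    stream_weights[s]: assumed exponent applied to stream s.
    """
    return sum(
        weight * log_sum_exp(scores)
        for scores, weight in zip(stream_component_scores, stream_weights)
    )

# Toy component scores for an audio stream and a video stream.
audio_scores = [math.log(0.5), math.log(0.5)]
video_scores = [math.log(0.25)]
combined = state_log_likelihood([audio_scores, video_scores], [0.7, 0.3])
```

Because only the computed (active) Gaussians enter each stream's sum, reducing the number of second-stream Gaussians directly shortens the inner loop of this state score.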

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Goel, Vaibhava; Potamianos, Gerasimos; Marcheret, Etienne; Chu, Stephen Mingyu

Granted Patent: US 7,480,617 B2
US Class Current: 704/242
CPC Class Codes: G10L 15/144 Training of HMMs
