Systems, methods and articles of manufacture for performing high resolution N-best string hypothesization

US 5,805,772 A
Filed: 12/30/1994
Issued: 09/08/1998
Est. Priority Date: 12/30/1994
Status: Expired due to Term

Abstract

Disclosed are systems, methods and articles of manufacture for performing high resolution N-best string hypothesization during speech recognition. A received input signal, representing a speech utterance, is processed utilizing a plurality of recognition models to generate one or more string hypotheses of the received input signal. The plurality of recognition models preferably include one or more inter-word context dependent models and one or more language models. A forward partial path map is produced according to the allophonic specifications of at least one of the inter-word context dependent models and the language models. The forward partial path map is traversed in the backward direction as a function of the allophonic specifications to generate the one or more string hypotheses. One or more of the recognition models may represent one phone words.
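The two-pass idea in the abstract (build a forward map of partial path scores, then walk it backward to read off several hypotheses) can be illustrated with a generic forward-pass plus backward best-first search over a toy word lattice. This is only a sketch under stated assumptions, not the patented method: it ignores allophonic specifications and inter-word context entirely, and the lattice, the scores, and the names `forward_scores`, `n_best` and `LATTICE` are hypothetical.

```python
import heapq
from collections import defaultdict

NEG_INF = float("-inf")

def forward_scores(edges, start, order):
    """Forward pass: best log-score from `start` to each node.

    `edges` maps node -> list of (next_node, word, log_score);
    `order` lists the nodes in topological order.
    """
    best = defaultdict(lambda: NEG_INF)
    best[start] = 0.0
    for node in order:
        for nxt, _word, logp in edges.get(node, []):
            best[nxt] = max(best[nxt], best[node] + logp)
    return best

def n_best(edges, start, end, order, n):
    """Backward best-first search guided by the forward scores.

    Returns up to `n` (total_log_score, word_list) pairs, best first.
    """
    best = forward_scores(edges, start, order)

    # Reverse adjacency so the lattice can be walked from `end` to `start`.
    incoming = defaultdict(list)
    for src, arcs in edges.items():
        for dst, word, logp in arcs:
            incoming[dst].append((src, word, logp))

    # Heap entries: (-estimated_total, tie_breaker, node, backward_score, words).
    # The estimate is exact: forward best score to the node plus the score
    # already accumulated on the backward partial path.
    heap = [(-best[end], 0, end, 0.0, ())]
    counter = 1
    results = []
    while heap and len(results) < n:
        neg_total, _, node, back, words = heapq.heappop(heap)
        if node == start:
            results.append((-neg_total, list(words)))
            continue
        for src, word, logp in incoming[node]:
            if best[src] == NEG_INF:
                continue  # node unreachable in the forward pass
            new_back = back + logp
            heapq.heappush(
                heap,
                (-(best[src] + new_back), counter, src, new_back, (word,) + words),
            )
            counter += 1
    return results

# Hypothetical two-word lattice with made-up log-scores.
LATTICE = {
    0: [(1, "recognize", -1.0), (1, "wreck", -2.0)],
    1: [(2, "speech", -0.5), (2, "beach", -1.75)],
}
hyps = n_best(LATTICE, start=0, end=2, order=[0, 1, 2], n=3)
for score, words in hyps:
    print(score, " ".join(words))
```

Because the forward score is an exact estimate of the remaining path, complete hypotheses leave the priority queue in order of total score, so the search can stop as soon as N of them have been produced.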

35 Claims

1. A speech recognition system comprising:
- means for receiving an input signal representing a speech utterance;
  
  means for storing a plurality of recognition models having allophonic specifications wherein ones of said plurality of recognition models include one or more inter-word context dependent models and one or more language models; and
  
  means for processing said input signal utilizing ones of said plurality of recognition models to generate one or more string hypotheses of said input signal, said processing means including;
  
  means for producing a forward partial path map according to the allophonic specifications of at least one inter-word context dependent model and at least one language model; and
  
  means for traversing said forward partial path map in the backward direction as a function of said allophonic specifications to generate said one or more string hypotheses.
  - 2. The system as set forth in claim 1 wherein said storing means further operates to store a plurality of processing unit instructions.
  - 3. The system as set forth in claim 2 wherein said processing means includes one or more processing units and is further operable to retrieve and execute selected ones of said processing unit instructions, said selected ones of said processing unit instructions directing said processing means to process said input signal utilizing ones of said plurality of recognition models to generate said one or more string hypotheses of said input signal.
  - 4. The system as set forth in claim 1 wherein said producing means operates time synchronously with respect to said input signal.
  - 5. The system as set forth in claim 1 wherein said producing means uses a Viterbi search.
  - 6. The system as set forth in claim 5 wherein said producing means uses a beam search.
  - 7. The system as set forth in claim 1 wherein said traversing means operates time asynchronously with respect to said input signal.
  - 8. The system as set forth in claim 1 wherein ones of said plurality of recognition models are tri-phone recognition models.
  - 9. The system as set forth in claim 1 wherein one or more of said plurality of recognition models represent a one phone word.
  - 10. The system as set forth in claim 1 wherein said processing means utilizes pruning techniques.
  - 11. The system as set forth in claim 1 wherein said traversing means further operates to complete said forward partial path map.
  - 12. The system as set forth in claim 1 wherein one or more of said plurality of recognition models is a Hidden Markov Model.
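Claims 4 to 6 and 10 refer to a time-synchronous Viterbi search with beam pruning. As a rough illustration of those general techniques only, the sketch below runs a frame-by-frame Viterbi pass over a small HMM and drops states whose score falls more than `beam` below the current best. The function name `viterbi_beam`, the toy scores, and the data layout are assumptions made for this example; nothing here reflects the patent's own data structures.

```python
import math

NEG_INF = -math.inf

def viterbi_beam(obs_loglik, log_trans, log_init, beam):
    """Time-synchronous Viterbi search with beam pruning.

    obs_loglik[t][s]: log-score of frame t under state s
    log_trans[s][s2]: log-score of the transition s -> s2
    log_init[s]:      log-score of starting in state s
    Returns (best_state_path, best_log_score).
    Assumes at least one state survives every frame.
    """
    n_states = len(log_init)
    active = {
        s: log_init[s] + obs_loglik[0][s]
        for s in range(n_states)
        if log_init[s] > NEG_INF
    }
    backptrs = []  # one dict per frame after the first: state -> predecessor

    for t in range(1, len(obs_loglik)):
        # Beam pruning: keep only states close to the current best score.
        top = max(active.values())
        active = {s: v for s, v in active.items() if v >= top - beam}

        new_active, bp = {}, {}
        for s, v in active.items():
            for s2 in range(n_states):
                if log_trans[s][s2] == NEG_INF:
                    continue
                cand = v + log_trans[s][s2] + obs_loglik[t][s2]
                if cand > new_active.get(s2, NEG_INF):
                    new_active[s2] = cand
                    bp[s2] = s
        backptrs.append(bp)
        active = new_active

    # Trace back from the best final state.
    last = max(active, key=active.get)
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return path, active[last]

# Toy two-state, three-frame example with made-up log-scores.
TOY_INIT = [0.0, NEG_INF]
TOY_TRANS = [[-1.0, -1.0], [NEG_INF, 0.0]]
TOY_OBS = [[-1.0, -4.0], [-3.0, -1.0], [-4.0, -0.5]]
best_path, best_score = viterbi_beam(TOY_OBS, TOY_TRANS, TOY_INIT, beam=1.5)
print(best_path, best_score)
```

A narrow beam reduces the number of states expanded per frame at the risk of discarding the eventual best path; in this toy case a beam of 1.5 and a much wider one give the same result.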

13. A method for generating one or more string hypotheses from a received input signal, said received input signal representing a speech utterance, said method comprising the steps of:
- utilizing ones of a plurality of recognition models having allophonic specifications to process said received input signal, said plurality of recognition models including at least one or more inter-word context dependent models and one or more language models;
  
  producing a search graph in a first direction according to the allophonic specifications of at least one of [a particular] inter-word context dependent model and [a particular] at least one language model; and
  
  traversing said search graph in a second direction as a function of said allophonic specifications to generate said one or more string hypotheses.
  - 14. The method as set forth in claim 13 wherein said utilizing step is preceded by the step of storing said plurality of recognition models to a storage device.
  - 15. The method as set forth in claim 13 wherein said producing step is performed time synchronously with respect to said received input signal.
  - 16. The method as set forth in claim 13 wherein said utilizing step is preceded by the step of generating one or more feature vectors characterizing said received input signal.
  - 17. The method as set forth in claim 13 wherein said producing step further includes the step of using a Viterbi search.
  - 18. The method as set forth in claim 13 wherein said producing step further includes the step of using a beam search.
  - 19. The method as set forth in claim 13 wherein said traversing step is performed time asynchronously with respect to said received input signal.
  - 20. The method as set forth in claim 13 further including the step of using one or more tri-phone recognition models.
  - 21. The method as set forth in claim 13 wherein one or more of said plurality of recognition models represent a one phone word.
  - 22. The method as set forth in claim 13 further including the step of pruning said search graph.
  - 23. The method as set forth in claim 13 wherein said traversing step further includes the step of completing said search graph.
  - 24. The method as set forth in claim 13 wherein one or more of said plurality of recognition models is a Hidden Markov Model.
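Claims 20 and 21 mention tri-phone models and one phone words. To show why a one phone word is a special case for inter-word context dependence, the following sketch expands a word sequence into triphone labels whose left and right contexts cross word boundaries. The lexicon, the phone symbols, the `sil` boundary symbol, and the HTK-style `left-center+right` label format are all assumptions chosen for illustration; the claims do not specify any of them.

```python
def cross_word_triphones(words, lexicon, boundary="sil"):
    """Expand a word sequence into cross-word triphone labels.

    Each phone is labeled "left-center+right", where the left and right
    contexts may belong to neighboring words. `boundary` pads the ends
    of the utterance.
    """
    phones = [p for w in words for p in lexicon[w]]
    padded = [boundary] + phones + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

# Hypothetical lexicon; "a" is a one phone word.
TOY_LEXICON = {
    "see": ["s", "iy"],
    "a": ["ax"],
    "cat": ["k", "ae", "t"],
}
labels = cross_word_triphones(["see", "a", "cat"], TOY_LEXICON)
print(labels)
```

In the output, the single phone of "a" becomes `iy-ax+k`: both of its contexts come from other words, so the correct model cannot be chosen until the preceding and following words are both known.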

25. A storage medium which is readable by a processing system, said storage medium including a plurality of processing system instructions operable to direct said processing system to perform speech recognition, said storage medium comprising:
- a first instruction set for storing a received input signal representing a speech utterance;
  
  a second instruction set for storing a plurality of recognition models having allophonic specifications wherein ones of said plurality of recognition models include one or more inter-word context dependent models and one or more language models; and
  
  a third instruction set for utilizing ones of said plurality of recognition models to produce a forward partial path map according to the allophonic specifications of at least one of said inter-word context dependent models and at least one of said language models, and to traverse said forward partial path map in the backward direction as a function of said allophonic specifications to generate one or more string hypotheses of said input signal.
  - 26. The storage medium as set forth in claim 25 wherein said storage medium is selected from the group consisting of:
    - a magnetic device operable to utilize patterns of magnetism to store said processing system instructions,
    - a semiconductor chip operable to utilize on-off electric charges to store said processing system instructions, and
    - an optical memory operable to utilize on-off light beams to store said processing system instructions.
  - 27. The storage medium as set forth in claim 25 wherein said third instruction set operates to produce said forward partial path map time synchronously with respect to said input signal.
  - 28. The storage medium as set forth in claim 25 wherein said third instruction set uses a Viterbi search.
  - 29. The storage medium as set forth in claim 25 wherein said third instruction set uses a beam search.
  - 30. The storage medium as set forth in claim 25 wherein said third instruction set operates to traverse said forward partial path map time asynchronously with respect to said input signal.
  - 31. The storage medium as set forth in claim 25 wherein ones of said plurality of recognition models are tri-phone recognition models.
  - 32. The storage medium as set forth in claim 25 wherein one or more of said plurality of recognition models represent a one phone word.
  - 33. The storage medium as set forth in claim 25 wherein said third instruction set uses pruning techniques.
  - 34. The storage medium as set forth in claim 25 wherein said third instruction set completes said forward partial path map.
  - 35. The storage medium as set forth in claim 25 wherein one or more of said plurality of recognition models is a Hidden Markov Model.

Current Assignee
Lucent Technologies, Inc. (Nokia Corporation)
Original Assignee
Lucent Technologies, Inc. (Nokia Corporation)
Inventors
Chou, Wu; Juang, Biing-Hwang; Matsuoka, Tatsuo; Lee, Chin-Hui
Primary Examiner(s)
Zele, Krista
Assistant Examiner(s)
Weaver, Scott Louis

Application Number

US08/366,843
Time in Patent Office

1,348 Days
Field of Search

395/2.64, 395/2.65, 395/2.66, 395/2.61, 395/2.63, 395/2.51, 395/2.62, 395/2.6, 395/2.45, 395/2.75, 395/2.79
US Class Current

704/255
CPC Class Codes

G10L 15/08   Speech classification or se...

G10L 15/197   Probabilistic grammars, e.g...

G10L 2015/025   Phonemes, fenemes or fenone...
