SOLUTION THAT INTEGRATES VOICE ENROLLMENT WITH OTHER TYPES OF RECOGNITION OPERATIONS PERFORMED BY A SPEECH RECOGNITION ENGINE USING A LAYERED GRAMMAR STACK

US 20080154596A1
Filed: 12/22/2006
Published: 06/26/2008
Est. Priority Date: 12/22/2006
Status: Active Grant



Abstract

The present invention can include a speech enrollment system including an ordered stack of grammars and a recognition engine. The ordered stack of grammars can include an application grammars layer, a confusable grammar layer, a personal grammar layer, a phrase enrolled grammar layer, and an enrollment grammar layer. The recognition engine can return recognition results for speech input by processing the input using the ordered stack of grammars. The processing can occur from the topmost layer in the stack to the bottommost layer in the stack. Each layer in the stack can include exit criteria based upon a defined condition. When the exit criteria is satisfied, a result can be returned based upon that layer and lower layers of the ordered stack can be ignored.
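
The top-down processing described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Layer` class, the `process` function, the toy scoring callables, and the 0.5 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Layer:
    """One layer of the ordered grammar stack (hypothetical structure)."""
    name: str
    score: Callable[[str], float]           # stand-in for the recognizer's scoring
    exit_criteria: Callable[[float], bool]  # the layer's defined condition


def process(stack: List[Layer], speech_input: str) -> Optional[Tuple[str, float]]:
    """Walk the stack from the topmost layer to the bottommost layer.

    The first layer whose exit criteria is satisfied produces the result,
    and every lower layer is ignored.
    """
    for layer in stack:
        layer_score = layer.score(speech_input)
        if layer.exit_criteria(layer_score):
            return layer.name, layer_score
    return None  # no layer's exit criteria was satisfied


# Toy two-layer stack: application grammars on top, enrollment at the bottom.
known_commands = {"call home", "play music"}
stack = [
    Layer("application",
          lambda text: 1.0 if text in known_commands else 0.0,
          lambda s: s >= 0.5),
    Layer("enrollment",
          lambda text: 1.0 if text.strip() else 0.0,
          lambda s: s >= 0.5),
]

print(process(stack, "call home"))      # handled by the application layer
print(process(stack, "my new phrase"))  # falls through to the enrollment layer
```

Because the same loop serves both recognition and enrollment, a caller in this sketch needs no separate enrollment entry point, which mirrors the abstract's point that a single stack can cover both kinds of operation.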

20 Claims

1. A speech enrollment system comprising:
- an ordered stack of grammars, wherein a topmost layer in the stack includes application grammars and wherein the bottommost layer in the stack includes an enrollment grammar;
  
  a recognition engine configured to return results for speech input by processing the input using the ordered stack of grammars, wherein the processing occurs from the topmost layer in the stack to the bottommost layer in the stack, wherein each layer in the stack includes exit criteria based upon a defined condition, wherein when the exit criteria is satisfied, a speech recognition result is returned based upon that layer and lower layers of the ordered stack are ignored, whereby the speech enrollment system supports voice enrollment using the ordered stack without relying upon a voice enrollment specific API.
- Dependent claims (2, 3, 4, 5):
  - 2. The system of claim 1, wherein the layered stack includes a confusable grammar layer ordered below the application grammar layer, wherein the layered stack includes a personal grammar layer ordered below the confusable grammar layer, wherein the layered stack includes a phrase enrolled grammar layer ordered below the personal grammar layer, and wherein the phrase enrolled grammar layer is ordered above the enrollment grammar layer.
  - 3. The system of claim 1, wherein the speech recognition engine is a turn based engine, and wherein the processing of the ordered stack occurs in a single turn, regardless of the layer of the stack which returns results for the speech input.
  - 4. The system of claim 1, wherein the processing of the ordered stack provides command recognition, clash detection, consistency determination, and acoustic base form generation.
  - 5. The system of claim 1, wherein the exit criteria for each layer is based upon comparing a recognition matching score against at least one of a confidence threshold, a clash threshold, a consistency threshold, and a quality threshold.

6. A method for creating voice-enrolled grammars comprising:
- receiving speech input;
  
  using entries in an application grammar to determine whether the speech input matches an entry in the application grammar with a sufficient confidence;
  
  when a sufficient confidence is determined returning a result that indicates a recognition match;
  
  when an insufficient confidence is determined, using entries in at least one of an application grammar and a personal grammar to determine whether the speech input matches an entry in the application grammar with a sufficient clash value;
  
  when a sufficient clash value is determined returning a result that indicates a clash with an existing grammar entry;
  
  when an insufficient clash value is determined performing a voice enrollment consistency detection operation; and
  
  depending upon results of the consistency detection operation, voice enrolling the speech input to generate an acoustic base form corresponding to the speech input.
- Dependent claims (7, 8, 9, 10):
  - 7. The method of claim 6, wherein the steps of claim 6 are performed by a turn based speech recognition engine in a single turn.
  - 8. The method of claim 6, wherein the method utilizes an ordered stack of grammars consisting of a plurality of layers, wherein the layers of the ordered stack are processed in order from the topmost layer in the stack to the bottommost layer in the stack, wherein each layer in the stack includes exit criteria based upon a defined condition, wherein when the exit criteria is satisfied, a result is returned based upon that layer and lower layers of the ordered stack are ignored.
  - 9. The method of claim 8, wherein the ordered stack includes an application grammars layer, wherein the layered stack includes a confusable grammar layer ordered below the application grammar layer, wherein the layered stack includes a personal grammar layer ordered below the confusable grammar layer, wherein the layered stack includes a phrase enrolled grammar layer ordered below the personal grammar layer, and wherein the ordered stack includes an enrollment grammar layer ordered below the phrase enrolled grammar layer.
  - 10. The method of claim 6, wherein the steps of claim 6 are steps performed automatically by at least one machine in accordance with at least one computer program having a plurality of code sections that are executable by the at least one machine.
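
The decision sequence recited in claim 6 can be sketched as a simple cascade. This is a hedged illustration only: the `Outcome` enum, the `handle_utterance` function, the default threshold values, and the `INCONSISTENT` outcome are assumptions made for the example, and the three scores are assumed to have been computed already by a recognizer.

```python
from enum import Enum


class Outcome(Enum):
    RECOGNIZED = "recognition match"
    CLASH = "clash with an existing grammar entry"
    ENROLLED = "voice enrolled"
    INCONSISTENT = "not enrolled"  # assumed outcome; claim 6 does not name it


def handle_utterance(confidence: float, clash: float, consistency: float, *,
                     confidence_min: float = 0.8,
                     clash_min: float = 0.6,
                     consistency_min: float = 0.7) -> Outcome:
    """Apply the claim 6 checks in order; the threshold values are made up."""
    if confidence >= confidence_min:
        # Sufficient confidence against the application grammar.
        return Outcome.RECOGNIZED
    if clash >= clash_min:
        # Too close to an existing application or personal grammar entry.
        return Outcome.CLASH
    if consistency >= consistency_min:
        # Consistent input: enroll it (acoustic base form generation would go here).
        return Outcome.ENROLLED
    return Outcome.INCONSISTENT


print(handle_utterance(0.9, 0.0, 0.0))  # Outcome.RECOGNIZED
print(handle_utterance(0.3, 0.7, 0.0))  # Outcome.CLASH
print(handle_utterance(0.3, 0.2, 0.9))  # Outcome.ENROLLED
```

Each check runs only if every earlier check failed, so one call covers recognition, clash detection, and enrollment.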

11. A method of utilizing a layered grammar stack to integrate voice enrollment operations with other types of recognition operations of a speech recognition engine comprising:
- establishing an ordered stack of grammars, wherein a topmost layer in the stack includes at least one application grammar and wherein the bottommost layer in the stack includes an enrollment grammar;
  
  receiving speech input; and
  
  processing the speech input with a speech recognizing engine that utilizes the ordered stack, wherein the processing occurs from the topmost layer in the stack to the bottommost layer in the stack, wherein each layer in the stack includes exit criteria based upon a defined condition, wherein when the exit criteria is satisfied, a speech recognition result is returned based upon that layer, and wherein when a result is returned from a layer, lower layers of the ordered stack are ignored.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20):
  - 12. The method of claim 11, wherein the layered stack includes a confusable grammar layer ordered below the topmost layer, wherein the layered stack includes a personal grammar layer ordered below the confusable grammar layer, wherein the layered stack includes a phrase enrolled grammar ordered below the personal grammar, and wherein the phrase enrolled grammar is ordered above the bottommost layer.
  - 13. The method of claim 11, wherein the speech recognition engine is a turn based engine, and wherein the processing of the ordered stack occurs in a single turn, regardless of the layer of the stack which returns results.
  - 14. The method of claim 11, wherein the processing of the ordered stack provides command recognition, clash detection, consistency determination, and acoustic base form generation.
  - 15. The method of claim 11, wherein the process step of the topmost layer for the application grammars compares a recognition result score against a confidence threshold and when the result score is greater or equal to the confidence threshold the processing step does not continue to lower layers of the ordered stack and a recognition result is returned.
  - 16. The method of claim 11, wherein the processing step of the bottommost layer for the enrollment grammar compares an audible quality received against a quality value and selectively enrolls the speech input depending upon comparison results.
  - 17. The method of claim 11, wherein the ordered stack includes a confusable grammar layer for which the processing step compares a recognition result score obtained from the application grammar against a clash threshold and when the result score is greater or equal to the clash threshold the processing step does not continue to lower layers of the ordered stack and a clash indication is returned.
  - 18. The method of claim 11, wherein the ordered stack includes a personal grammar layer for which the processing step compares a recognition result score obtained from a personal grammar against a clash threshold and when the result score is greater or equal to the clash threshold the processing step does not continue to lower layers of the ordered stack and a clash indication is returned.
  - 19. The method of claim 11, wherein the ordered stack includes a phrase enrolled grammar layer for which the processing step compares a recognition result score obtained from a personal grammar against a consistency threshold and when the result score is greater or equal to the consistency threshold, consistent enrollment results are reported.
  - 20. The method of claim 11, wherein the steps of claim 11 are steps performed automatically by at least one machine in accordance with at least one computer program having a plurality of code sections that are executable by the at least one machine.
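
The per-layer comparisons in claims 15 through 19 can be summarized as a small rule table. The sketch below is illustrative only: `LAYER_RULES`, `first_exit`, the score keys, and every threshold value are hypothetical. The direction of the quality comparison for the enrollment layer (claim 16) is an assumption, since the claim says only that the quality is compared against a quality value.

```python
from typing import Dict, Optional, Tuple

# (layer, score source, threshold name, result reported when score >= threshold)
LAYER_RULES = [
    ("application",     "application", "confidence",  "recognition result"),     # claim 15
    ("confusable",      "application", "clash",       "clash indication"),       # claim 17
    ("personal",        "personal",    "clash",       "clash indication"),       # claim 18
    ("phrase enrolled", "personal",    "consistency", "consistent enrollment"),  # claim 19
    ("enrollment",      "quality",     "quality",     "enrolled"),               # claim 16 (assumed >=)
]


def first_exit(scores: Dict[str, float],
               thresholds: Dict[str, float]) -> Optional[Tuple[str, str]]:
    """Return (layer, result) for the first layer whose score meets its threshold."""
    for layer, source, threshold_name, result in LAYER_RULES:
        if scores[source] >= thresholds[threshold_name]:
            return layer, result  # lower layers are not evaluated
    return None


# Made-up thresholds for illustration.
thresholds = {"confidence": 0.8, "clash": 0.6, "consistency": 0.4, "quality": 0.5}

# Application score is below the confidence threshold but above the clash threshold.
print(first_exit({"application": 0.65, "personal": 0.2, "quality": 0.9}, thresholds))
```

A score exactly equal to a threshold satisfies the exit criteria, matching the "greater or equal" wording of the claims.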

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: MUSCHETT, BRIEN H.; DA PALMA, WILLIAM V.
Granted Patent: US 8,731,925 B2
US Class Current: 704/246
CPC Class Codes:
- G10L 15/06 Creation of reference templ...
- G10L 15/19 Grammatical context, e.g. d...
