VoiceXML language extension for natively supporting voice enrolled grammars

US 7,881,932 B2
Filed: 10/02/2006
Issued: 02/01/2011
Est. Priority Date: 10/02/2006
Status: Expired due to Fees

Abstract

The present invention extends the VoiceXML language model to natively support voice enrolled grammars. Specifically, three VoiceXML tags can be added to the language model to add, modify, and delete acoustically provided phrases to voice enrolled grammars. Once created, the voice enrolled grammars can be used in normal speaker dependent speech recognition operations. That is, the voice enrolled grammars can be referenced and utilized just like text enrolled grammars can be referenced and utilized. For example, using the present invention, voice enrolled grammars can be referenced by standard text-based Speech Recognition Grammar Specification (SRGS) grammars to create more complex, usable grammars.
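
As a rough illustration of the add, modify, and delete operations the abstract describes, the following Python sketch models a voice-enrolled grammar as an in-memory mapping from phrase identifiers to enrolled audio. The class and method names are hypothetical and are not the VoiceXML tags of the patent, whose names this page does not give; raw bytes stand in for the acoustic model a real recognizer would train.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VoiceEnrolledGrammar:
    """Toy stand-in for a voice-enrolled grammar (hypothetical, not the patent's API)."""

    name: str
    # Maps a phrase identifier to the audio the user spoke when enrolling it.
    phrases: Dict[str, bytes] = field(default_factory=dict)

    def add(self, phrase_id: str, audio: bytes) -> None:
        """Enroll a new spoken phrase under the given identifier."""
        if phrase_id in self.phrases:
            raise KeyError(f"phrase {phrase_id!r} is already enrolled")
        self.phrases[phrase_id] = audio

    def update(self, phrase_id: str, new_audio: bytes) -> None:
        """Replace an enrolled phrase with newly spoken audio."""
        if phrase_id not in self.phrases:
            raise KeyError(f"phrase {phrase_id!r} is not enrolled")
        self.phrases[phrase_id] = new_audio

    def delete(self, phrase_id: str) -> None:
        """Remove an enrolled phrase from the grammar."""
        if phrase_id not in self.phrases:
            raise KeyError(f"phrase {phrase_id!r} is not enrolled")
        del self.phrases[phrase_id]
```

In this sketch, a caller would create `VoiceEnrolledGrammar("contacts")`, then call `add`, `update`, and `delete` with a phrase identifier, which loosely mirrors the identifier that the claims associate with each editing tag.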

39 Citations

21 Claims

1. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
- prompting a user to provide speech input; and
  
  processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
  
  wherein the computer-executable instructions are implemented as VoiceXML code, the VoiceXML code comprising at least one VoiceXML tag associated with the editing of the at least one voice-enrolled grammar and an identifier associated with the at least one VoiceXML tag that identifies at least one voice phrase of the at least one voice-enrolled grammar to be edited, and
  
  wherein editing the at least one voice-enrolled grammar comprises editing in accordance with attributes associated with the at least one VoiceXML tag.
- Dependent claims (2, 3, 4, 5, 6):
- - 2. The computer-readable recordable storage medium of claim 1, wherein editing the at least one voice-enrolled grammar comprises creating the at least one voice-enrolled grammar including the at least one word or phrase identified by the speech input and making the at least one voice-enrolled grammar available to be used by a speech recognition engine with standard text-based grammars.
  - 3. The computer-readable recordable storage medium of claim 2, wherein making the at least one voice-enrolled grammar available to be used by the speech recognition engine with said standard text-based grammars comprises making the at least one voice-enrolled grammar available to be used by the speech recognition engine with Speech Recognition Grammar Specification (SRGS)-compliant grammars.
  - 4. The computer-readable recordable storage medium of claim 1, wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
    - adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
      
      deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase.
  - 5. The computer-readable recordable storage medium of claim 4, wherein editing the at least one voice-enrolled grammar based on the speech input further comprises:
    - updating a third voice phrase included in an existing voice-enrolled grammar by replacing the third voice phrase with a new voice phrase, the speech input comprising the third voice phrase and/or the new voice phrase.
  - 6. The computer-readable recordable storage medium of claim 5, wherein editing the at least one voice-enrolled grammar comprises adding phrase-related values to the at least one voice-enrolled grammar, the phrase-related values conforming to naming rules specified by a Speech Recognition Grammar Specification (SRGS)-compliant grammar.

7. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
- prompting a user to provide speech input; and
  
  processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
  
  wherein the computer-executable instructions are implemented as VoiceXML code,
  
  wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
  
  adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
  
  deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase, and
  
  wherein adding the first voice phrase to the voice-enrolled grammar comprises executing a VoiceXML command comprising a plurality of attributes, said plurality of attributes comprising at least three attributes selected from a group of attributes consisting of a first attribute identifying a name of a variable for storing the first voice phrase, a second attribute identifying a first identifier for the first voice phrase, a third attribute identifying a second identifier for an expression to which the first voice phrase relates, a fourth attribute identifying the voice-enrolled grammar to which the first voice phrase is to be added, a fifth attribute identifying an expression of the voice-enrolled grammar to which the first voice phrase relates, a sixth attribute specifying a semantic tag value to be associated with the first voice phrase in the voice-enrolled grammar, a seventh attribute specifying a semantic tag value to be associated with an expression of the voice-enrolled grammar to which the first voice phrase relates, an eighth attribute specifying a phrase weight, a ninth attribute specifying whether audio corresponding to the first voice phrase is to be returned when the first voice phrase is successfully added to the voice-enrolled grammar, a tenth attribute identifying a level of consistency required between the first voice phrase and a potential voice command for the potential voice command to be determined to match the first voice phrase, an eleventh attribute identifying a level of similarity between pronunciations of voice phrases for the voice phrases to be unambiguous, a twelfth attribute identifying a maximum number of training attempts that will be used to achieve a consistent statistical model, a thirteenth attribute identifying a minimum number of consistent speech inputs to be received from a user for the first voice phrase to be added to the voice-enrolled grammar, a fourteenth attribute identifying a set of one or more other voice phrases to which the first voice phrase should be compared to determine whether the first voice phrase clashes with any of the one or more other voice phrases, and a fifteenth attribute specifying a storage location of audio of one or more voice phrases to be added to the voice-enrolled grammar.

8. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
- prompting a user to provide speech input; and
  
  processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
  
  wherein the computer-executable instructions are implemented as VoiceXML code,
  
  wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
  
  adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
  
  deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase, and
  
  wherein adding the first voice phrase to the voice-enrolled grammar comprises receiving, following the adding, return values depending on an outcome of the adding, said return values comprising at least two values selected from a group of values consisting of a first value identifying an identifier for the first voice phrase, a second value identifying a current number of successive failed attempts at adding, a third value identifying a location from which audio of the first voice phrase can be retrieved, a fourth value identifying a duration of the audio of the first voice phrase, and a fifth value identifying a storage size of the audio of the first voice phrase.
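
The five return values listed in claim 8 can be pictured as a small record. The Python sketch below is only an illustration under stated assumptions: the field names, the `summarize_add` helper, and the example location are hypothetical, and the duration is derived by assuming 8 kHz, 16-bit mono PCM audio (16 bytes per millisecond), which the claim does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddResult:
    """Hypothetical record of the values returned after an add attempt."""

    phrase_id: Optional[str] = None          # identifier for the added phrase
    failed_attempts: int = 0                 # current number of successive failed attempts
    audio_location: Optional[str] = None     # where the phrase audio can be retrieved
    audio_duration_ms: Optional[int] = None  # duration of the phrase audio
    audio_size_bytes: Optional[int] = None   # storage size of the phrase audio


def summarize_add(phrase_id: str, audio: bytes, location: str,
                  failed_attempts: int = 0, bytes_per_ms: int = 16) -> AddResult:
    """Build the result record for a successful add.

    bytes_per_ms defaults to 16, which assumes 8 kHz, 16-bit mono PCM.
    """
    return AddResult(
        phrase_id=phrase_id,
        failed_attempts=failed_attempts,
        audio_location=location,
        audio_duration_ms=len(audio) // bytes_per_ms,
        audio_size_bytes=len(audio),
    )
```

A caller might invoke `summarize_add("mom", audio_bytes, "file:///tmp/mom.wav")`, where the location string is a made-up example, and then read whichever fields the application needs.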

9. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
- prompting a user to provide speech input; and
  
  processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
  
  wherein the computer-executable instructions are implemented as VoiceXML code,
  
  wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
  
  adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
  
  deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase, and
  
  wherein the method further comprises detecting an error during the adding, wherein detecting the error comprises detecting at least one of a first error arising when a clash is detected between the first voice phrase to be added and at least one existing voice phrase in the voice-enrolled grammar or between the first voice phrase and another voice phrase contained within a list of bad voice phrases maintained for the voice-enrolled grammar, and a second error arising when consistency of the first voice phrase attempted to be added is not achieved within an established maximum number of attempts.
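
The two error conditions in claim 9 (a clash with an existing or bad phrase, and failure to reach consistency within a maximum number of attempts) can be sketched as an enrollment loop. This Python example is an assumption-laden illustration, not the patented method: text strings stand in for spoken audio, `difflib.SequenceMatcher` stands in for acoustic comparison, and every name, threshold, and default below is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Sequence


class ClashError(Exception):
    """Raised when a new phrase is too similar to an existing or bad phrase."""


class ConsistencyError(Exception):
    """Raised when consistent input is not obtained within the attempt limit."""


def similarity(a: str, b: str) -> float:
    """Text similarity in [0, 1]; a stand-in for acoustic similarity."""
    return SequenceMatcher(None, a, b).ratio()


def enroll(utterances: Sequence[str], existing: Sequence[str],
           bad_phrases: Sequence[str], max_attempts: int = 3,
           min_consistent: int = 2, consistency: float = 0.8,
           clash: float = 0.8) -> str:
    """Return the enrolled phrase, or raise one of the two errors."""
    heard: List[str] = []
    # Only the first max_attempts utterances count as training attempts.
    for utterance in utterances[:max_attempts]:
        # First error: clash with an enrolled phrase or a listed bad phrase.
        for other in list(existing) + list(bad_phrases):
            if similarity(utterance, other) >= clash:
                raise ClashError(f"{utterance!r} clashes with {other!r}")
        heard.append(utterance)
        # Count attempts so far (including this one) that agree with it.
        agreeing = sum(1 for h in heard if similarity(h, utterance) >= consistency)
        if agreeing >= min_consistent:
            return utterance
    # Second error: consistency not achieved within the attempt limit.
    raise ConsistencyError("no consistent phrase within the allowed attempts")
```

The `max_attempts`, `min_consistent`, `consistency`, and `clash` parameters loosely echo the attempt-limit, minimum-consistent-input, consistency-level, and similarity-level attributes recited in claim 7, although the mapping here is only illustrative.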

10. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
- prompting a user to provide speech input; and
  
  processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
  
  wherein the computer-executable instructions are implemented as VoiceXML code,
  
  wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
  
  adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
  
  deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase, and
  
  wherein deleting the second voice phrase from the voice-enrolled grammar comprises executing a VoiceXML command comprising a plurality of attributes, said plurality of attributes comprising at least two attributes selected from a group of attributes consisting of a first attribute identifying a first identifier for the second voice phrase, a second attribute identifying a second identifier for an expression to which the second voice phrase relates, a third attribute identifying the voice-enrolled grammar from which the second voice phrase is to be deleted, and a fourth attribute identifying an expression of the voice-enrolled grammar to which the second voice phrase relates.

11. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
- prompting a user to provide speech input; and
  
  processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
  
  wherein the computer-executable instructions are implemented as VoiceXML code,
  
  wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
  
  adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
  
  deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase, and
  
  wherein the method further comprises detecting an error during the deleting when an attempt to delete the second voice phrase is made and the second voice phrase is not able to be deleted.

12. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
- prompting a user to provide speech input; and
  
  processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
  
  wherein the computer-executable instructions are implemented as VoiceXML code,
  
  wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
  
  adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
  
  deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase,
  
  wherein editing the at least one voice-enrolled grammar based on the speech input further comprises:
  
  updating a third voice phrase included in an existing voice-enrolled grammar by replacing the third voice phrase with a new voice phrase, the speech input comprising the third voice phrase and/or the new voice phrase, and
  
  wherein updating the third voice phrase comprises executing a VoiceXML command comprising a plurality of attributes, said plurality of attributes comprising at least two attributes selected from a group of attributes consisting of a first attribute identifying a first identifier for the third voice phrase, a second attribute identifying a second identifier for a first expression to which the third voice phrase relates, a third attribute identifying a third identifier for the new voice phrase, a fourth attribute identifying a third identifier for a second expression to which the new voice phrase relates, a fifth attribute identifying the voice-enrolled grammar containing the third voice phrase to be updated, a sixth attribute identifying an expression of the voice-enrolled grammar to which the new voice phrase relates, a seventh attribute specifying a semantic tag value to be associated with the new voice phrase in the voice-enrolled grammar, an eighth attribute specifying a semantic tag value to be associated with an expression of the voice-enrolled grammar to which the new voice phrase relates, and a ninth attribute specifying a phrase weight for the new voice phrase.

13. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
- prompting a user to provide speech input; and
  
  processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
  
  wherein the computer-executable instructions are implemented as VoiceXML code,
  
  wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
  
  adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
  
  deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase,
  
  wherein editing the at least one voice-enrolled grammar based on the speech input further comprises:
  
  updating a third voice phrase included in an existing voice-enrolled grammar by replacing the third voice phrase with a new voice phrase, the speech input comprising the third voice phrase and/or the new voice phrase, and
  
  wherein the method further comprises detecting an error during the updating when an attempt to update the third voice phrase is made and the third voice phrase is not able to be updated.

14. A method comprising:
- using a voice browser executing on at least one programmed processor, executing VoiceXML code for manipulating voice phrases contained within a voice-enrolled grammar, the VoiceXML code comprising at least one VoiceXML tag for editing the voice-enrolled grammar and an identifier associated with the at least one VoiceXML tag that identifies a voice phrase to be edited, wherein execution of the VoiceXML code by the voice browser causes the voice browser to:
  
  prompt a user to provide speech input; and
  
  process the speech input, wherein the processing comprises making a change to the voice-enrolled grammar relating to the voice phrase, wherein specifics of the change are determined by attributes associated with the VoiceXML tag, and wherein the at least one voice-enrolled grammar comprises at least one word or phrase added to the voice-enrolled grammar by the user.
- Dependent claims (15, 16, 17, 18, 19, 20):
- - 15. The method of claim 14, wherein making the change comprises adding the voice phrase to the voice-enrolled grammar.
  - 16. The method of claim 14, wherein the adding the voice phrase to the voice-enrolled grammar comprises executing a VoiceXML tag comprising a plurality of attributes, said plurality of attributes comprising at least three attributes selected from a group of attributes consisting of a first attribute identifying a name of a variable for storing the first voice phrase, a second attribute identifying a first identifier for the first voice phrase, a third attribute identifying a second identifier for an expression to which the first voice phrase relates, a fourth attribute identifying the voice-enrolled grammar to which the first voice phrase is to be added, a fifth attribute identifying an expression of the voice-enrolled grammar to which the first voice phrase relates, a sixth attribute specifying a semantic tag value to be associated with the first voice phrase in the voice-enrolled grammar, a seventh attribute specifying a semantic tag value to be associated with an expression of the voice-enrolled grammar to which the first voice phrase relates, an eighth attribute specifying a phrase weight, a ninth attribute specifying whether audio corresponding to the first voice phrase is to be returned when the first voice phrase is successfully added to the voice-enrolled grammar, a tenth attribute identifying a level of consistency required between the first voice phrase and a potential voice command for the potential voice command to be determined to match the first voice phrase, an eleventh attribute identifying a level of similarity between pronunciations of voice phrases for the voice phrases to be unambiguous, a twelfth attribute identifying a maximum number of training attempts that will be used to achieve a consistent statistical model, a thirteenth attribute identifying a minimum number of consistent speech inputs to be received from a user for the first voice phrase to be added to the voice-enrolled grammar, a fourteenth attribute identifying a set of one or more other voice phrases to which the first voice phrase should be compared to determine whether the first voice phrase clashes with any of the one or more other voice phrases, and a fifteenth attribute specifying a storage location of audio of one or more voice phrases to be added to the voice-enrolled grammar.
  - 17. The method of claim 14, wherein making the change comprises deleting the voice phrase from the voice-enrolled grammar.
  - 18. The method of claim 17, wherein deleting the voice phrase from the voice-enrolled grammar comprises executing a VoiceXML tag comprising a plurality of attributes, said plurality of attributes comprising at least two attributes selected from a group of attributes consisting of a first attribute identifying a first identifier for the second voice phrase, a second attribute identifying a second identifier for an expression to which the second voice phrase relates, a third attribute identifying the voice-enrolled grammar from which the second voice phrase is to be deleted, and a fourth attribute identifying an expression of the voice-enrolled grammar to which the second voice phrase relates.
  - 19. The method of claim 14, wherein making the change comprises updating the voice phrase contained in the voice-enrolled grammar.
  - 20. The method of claim 19, wherein updating the voice phrase in the voice-enrolled grammar comprises executing a VoiceXML tag comprising a plurality of attributes, said plurality of attributes comprising at least three attributes selected from a group of attributes consisting of a first attribute identifying a first identifier for the third voice phrase, a second attribute identifying a second identifier for a first expression to which the third voice phrase relates, a third attribute identifying a third identifier for the new voice phrase, a fourth attribute identifying a third identifier for a second expression to which the new voice phrase relates, a fifth attribute identifying the voice-enrolled grammar containing the third voice phrase to be updated, a sixth attribute identifying an expression of the voice-enrolled grammar to which the new voice phrase relates, a seventh attribute specifying a semantic tag value to be associated with the new voice phrase in the voice-enrolled grammar, an eighth attribute specifying a semantic tag value to be associated with an expression of the voice-enrolled grammar to which the new voice phrase relates, and a ninth attribute specifying a phrase weight for the new voice phrase.

21. An apparatus comprising:
- at least one processor programmed with VoiceXML code to:
  
  prompt a user to provide speech input; and
  
  process the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
  
  wherein the VoiceXML code comprises at least one VoiceXML tag for editing of the at least one voice-enrolled grammar and an identifier associated with the at least one VoiceXML tag that identifies at least one voice phrase of the at least one voice-enrolled grammar to be edited, and
  
  wherein the at least one processor is programmed to edit the at least one voice-enrolled grammar in accordance with attributes associated with the at least one VoiceXML tag.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Muschett, Brien H.
Primary Examiner(s): Lerner, Martin
Application Number: US11/537,769
Publication Number: US 20080082963A1
Time in Patent Office: 1,583 Days
Field of Search: 704/244, 704/270, 704/275, 704/270.1, 379/88.03, 715/234
US Class Current: 704/244
CPC Class Codes:
G06F 40/143   Markup, e.g. Standard Gener...
G10L 15/22   Procedures used during a sp...
G10L 2015/0631   Creating reference template...
G10L 2015/228   of application context
