VoiceXML language extension for natively supporting voice enrolled grammars
Abstract
The present invention extends the VoiceXML language model to natively support voice enrolled grammars. Specifically, three VoiceXML tags can be added to the language model to add acoustically provided phrases to voice enrolled grammars, to modify such phrases, and to delete them. Once created, the voice enrolled grammars can be used in normal speaker dependent speech recognition operations. That is, the voice enrolled grammars can be referenced and utilized just as text enrolled grammars can. For example, using the present invention, voice enrolled grammars can be referenced by standard text-based Speech Recognition Grammar Specification (SRGS) grammars to create more complex, usable grammars.
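As a rough illustration of the add, modify, and delete operations the abstract describes, the following Python sketch models a voice enrolled grammar as a table of phrases keyed by an identifier. The class and method names (`VoiceEnrolledGrammar`, `add_phrase`, `update_phrase`, `delete_phrase`) are hypothetical and are not taken from the patent, and the acoustic data for each phrase is represented by placeholder bytes rather than a real statistical model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VoiceEnrolledGrammar:
    """Toy model of a voice enrolled grammar (hypothetical, for illustration)."""

    name: str
    # Maps a phrase identifier to a placeholder for its acoustic data.
    phrases: Dict[str, bytes] = field(default_factory=dict)

    def add_phrase(self, phrase_id: str, audio: bytes) -> None:
        """Enroll a new phrase; refuse to overwrite an existing identifier."""
        if phrase_id in self.phrases:
            raise KeyError(f"phrase {phrase_id!r} is already enrolled")
        self.phrases[phrase_id] = audio

    def update_phrase(self, phrase_id: str, new_audio: bytes) -> None:
        """Replace the acoustic data of an already enrolled phrase."""
        if phrase_id not in self.phrases:
            raise KeyError(f"phrase {phrase_id!r} is not enrolled")
        self.phrases[phrase_id] = new_audio

    def delete_phrase(self, phrase_id: str) -> None:
        """Remove an enrolled phrase from the grammar."""
        if phrase_id not in self.phrases:
            raise KeyError(f"phrase {phrase_id!r} is not enrolled")
        del self.phrases[phrase_id]


grammar = VoiceEnrolledGrammar("contacts")
grammar.add_phrase("mom", b"<audio-1>")
grammar.update_phrase("mom", b"<audio-2>")
grammar.add_phrase("office", b"<audio-3>")
grammar.delete_phrase("office")
print(sorted(grammar.phrases))
```

In the patent these operations are expressed as VoiceXML tags executed by a voice browser; the in-memory dictionary here only stands in for whatever storage an actual platform would use.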
21 Claims
1. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
prompting a user to provide speech input; and
processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
wherein the computer-executable instructions are implemented as VoiceXML code, the VoiceXML code comprising at least one VoiceXML tag associated with the editing of the at least one voice-enrolled grammar and an identifier associated with the at least one VoiceXML tag that identifies at least one voice phrase of the at least one voice-enrolled grammar to be edited, and
wherein editing the at least one voice-enrolled grammar comprises editing in accordance with attributes associated with the at least one VoiceXML tag.

7. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
prompting a user to provide speech input; and
processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
wherein the computer-executable instructions are implemented as VoiceXML code,
wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase, and
wherein adding the first voice phrase to the voice-enrolled grammar comprises executing a VoiceXML command comprising a plurality of attributes, said plurality of attributes comprising at least three attributes selected from a group of attributes consisting of a first attribute identifying a name of a variable for storing the first voice phrase, a second attribute identifying a first identifier for the first voice phrase, a third attribute identifying a second identifier for an expression to which the first voice phrase relates, a fourth attribute identifying the voice-enrolled grammar to which the first voice phrase is to be added, a fifth attribute identifying an expression of the voice-enrolled grammar to which the first voice phrase relates, a sixth attribute specifying a semantic tag value to be associated with the first voice phrase in the voice-enrolled grammar, a seventh attribute specifying a semantic tag value to be associated with an expression of the voice-enrolled grammar to which the first voice phrase relates, an eighth attribute specifying a phrase weight, a ninth attribute specifying whether audio corresponding to the first voice phrase is to be returned when the first voice phrase is successfully added to the voice-enrolled grammar, a tenth attribute identifying a level of consistency required between the first voice phrase and a potential voice command for the potential voice command to be determined to match the first voice phrase, an eleventh attribute identifying a level of similarity between pronunciations of voice phrases for the voice phrases to be unambiguous, a twelfth attribute identifying a maximum number of training attempts that will be used to achieve a consistent statistical model, a thirteenth attribute identifying a minimum number of consistent speech inputs to be received from a user for the first voice phrase to be added to the voice-enrolled grammar, a fourteenth attribute identifying a set of one or more other voice phrases to which the first voice phrase should be compared to determine whether the first voice phrase clashes with any of the one or more other voice phrases, and a fifteenth attribute specifying a storage location of audio of one or more voice phrases to be added to the voice-enrolled grammar.
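
To make the "command comprising a plurality of attributes" language of claim 7 more concrete, the Python sketch below serializes an XML element carrying several attributes. The tag name `addvoicephrase`, the attribute names (`name`, `phraseid`, `src`, `weight`), and the grammar file name are all hypothetical placeholders; the claim describes only what the attributes identify, not how they are spelled.

```python
import xml.etree.ElementTree as ET


def build_add_tag(attributes: dict) -> str:
    """Serialize a hypothetical add-phrase tag; claim 7 recites at least three attributes."""
    if len(attributes) < 3:
        raise ValueError("expected at least three attributes")
    element = ET.Element("addvoicephrase", attributes)
    return ET.tostring(element, encoding="unicode")


markup = build_add_tag(
    {
        "name": "enrolled",       # variable that stores the phrase (assumed name)
        "phraseid": "mom",        # identifier for the phrase (assumed name)
        "src": "contacts.vegr",   # target grammar (assumed name and file)
        "weight": "1.0",          # phrase weight (assumed name)
    }
)
print(markup)
```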

8. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
prompting a user to provide speech input; and
processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
wherein the computer-executable instructions are implemented as VoiceXML code,
wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase, and
wherein adding the first voice phrase to the voice-enrolled grammar comprises receiving, following the adding, return values depending on an outcome of the adding, said return values comprising at least two values selected from a group of values consisting of a first value identifying an identifier for the first voice phrase, a second value identifying a current number of successive failed attempts at adding, a third value identifying a location from which audio of the first voice phrase can be retrieved, a fourth value identifying a duration of the audio of the first voice phrase, and a fifth value identifying a storage size of the audio of the first voice phrase.
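
One way to picture the return values recited in claim 8 is as a record with five optional fields, of which at least two are populated after an add. The Python sketch below is only an illustration under that assumption: the field names, the millisecond and byte units, and the example audio location are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AddPhraseResult:
    """Hypothetical container for the five return values listed in claim 8."""

    phrase_id: Optional[str] = None          # identifier for the added phrase
    failed_attempts: Optional[int] = None    # successive failed attempts so far
    audio_uri: Optional[str] = None          # where the phrase audio can be retrieved
    audio_duration_ms: Optional[int] = None  # duration of the audio (unit assumed)
    audio_size_bytes: Optional[int] = None   # storage size of the audio (unit assumed)

    def populated(self) -> List[str]:
        """Names of the fields that were actually returned."""
        return [name for name, value in vars(self).items() if value is not None]


result = AddPhraseResult(
    phrase_id="mom",
    audio_uri="file:///tmp/mom.wav",  # made-up location for illustration
    audio_size_bytes=20480,
)
print(result.populated())
```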

9. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
prompting a user to provide speech input; and
processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
wherein the computer-executable instructions are implemented as VoiceXML code,
wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase, and
wherein the method further comprises detecting an error during the adding, wherein detecting the error comprises detecting at least one of a first error arising when a clash is detected between the first voice phrase to be added and at least one existing voice phrase in the voice-enrolled grammar or between the first voice phrase and another voice phrase contained within a list of bad voice phrases maintained for the voice-enrolled grammar, and a second error arising when consistency of the first voice phrase attempted to be added is not achieved within an established maximum number of attempts.
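
The two error conditions in claim 9 (a clash with an existing or known-bad phrase, and a failure to reach consistency within a maximum number of attempts) can be sketched as a small enrollment loop. This Python example is a simplification under stated assumptions: utterances are plain strings compared for equality, whereas a real recognizer would compare acoustic models, and the function name, exception names, and default thresholds are hypothetical.

```python
class PhraseClashError(Exception):
    """The new phrase collides with an enrolled or known-bad phrase."""


class PhraseConsistencyError(Exception):
    """No consistent phrase emerged within the allowed attempts."""


def enroll_phrase(utterances, enrolled, bad_phrases, min_consistent=2, max_attempts=4):
    """Return the phrase once it has been heard min_consistent times.

    String equality stands in for acoustic similarity (an assumption).
    """
    counts = {}
    for utterance in utterances[:max_attempts]:
        # First error: clash with an existing phrase or the bad-phrase list.
        if utterance in enrolled or utterance in bad_phrases:
            raise PhraseClashError(utterance)
        counts[utterance] = counts.get(utterance, 0) + 1
        if counts[utterance] >= min_consistent:
            return utterance
    # Second error: consistency not achieved within the attempt limit.
    raise PhraseConsistencyError(f"no consistent phrase in {max_attempts} attempts")


print(enroll_phrase(["call mom", "call mum", "call mom"], {"call dad"}, set()))
```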

10. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
prompting a user to provide speech input; and
processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
wherein the computer-executable instructions are implemented as VoiceXML code,
wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase, and
wherein deleting the second voice phrase from the voice-enrolled grammar comprises executing a VoiceXML command comprising a plurality of attributes, said plurality of attributes comprising at least two attributes selected from a group of attributes consisting of a first attribute identifying a first identifier for the second voice phrase, a second attribute identifying a second identifier for an expression to which the second voice phrase relates, a third attribute identifying the voice-enrolled grammar from which the second voice phrase is to be deleted, and a fourth attribute identifying an expression of the voice-enrolled grammar to which the second voice phrase relates.

11. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
prompting a user to provide speech input; and
processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
wherein the computer-executable instructions are implemented as VoiceXML code,
wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase, and
wherein the method further comprises detecting an error during the deleting when an attempt to delete the second voice phrase is made and the second voice phrase is not able to be deleted.

12. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
prompting a user to provide speech input; and
processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
wherein the computer-executable instructions are implemented as VoiceXML code,
wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase,
wherein editing the at least one voice-enrolled grammar based on the speech input further comprises:
updating a third voice phrase included in an existing voice-enrolled grammar by replacing the third voice phrase with a new voice phrase, the speech input comprising the third voice phrase and/or the new voice phrase, and
wherein updating the third voice phrase comprises executing a VoiceXML command comprising a plurality of attributes, said plurality of attributes comprising at least two attributes selected from a group of attributes consisting of a first attribute identifying a first identifier for the third voice phrase, a second attribute identifying a second identifier for a first expression to which the third voice phrase relates, a third attribute identifying a third identifier for the new voice phrase, a fourth attribute identifying a third identifier for a second expression to which the new voice phrase relates, a fifth attribute identifying the voice-enrolled grammar containing the third voice phrase to be updated, a sixth attribute identifying an expression of the voice-enrolled grammar to which the new voice phrase relates, a seventh attribute specifying a semantic tag value to be associated with the new voice phrase in the voice-enrolled grammar, an eighth attribute specifying a semantic tag value to be associated with an expression of the voice-enrolled grammar to which the new voice phrase relates, and a ninth attribute specifying a phrase weight for the new voice phrase.

13. A computer-readable recordable storage medium having recorded thereon computer-executable instructions that, when executed by a computer, cause the computer to carry out a method, the method comprising:
prompting a user to provide speech input; and
processing the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
wherein the computer-executable instructions are implemented as VoiceXML code,
wherein editing the at least one voice-enrolled grammar based on the speech input comprises:
adding a first voice phrase to a voice-enrolled grammar, the speech input comprising the first voice phrase; and/or
deleting a second voice phrase from an existing voice enrolled grammar, the speech input comprising the second voice phrase,
wherein editing the at least one voice-enrolled grammar based on the speech input further comprises:
updating a third voice phrase included in an existing voice-enrolled grammar by replacing the third voice phrase with a new voice phrase, the speech input comprising the third voice phrase and/or the new voice phrase, and
wherein the method further comprises detecting an error during the updating when an attempt to update the third voice phrase is made and the third voice phrase is not able to be updated.

14. A method comprising:
using a voice browser executing on at least one programmed processor, executing VoiceXML code for manipulating voice phrases contained within a voice-enrolled grammar, the VoiceXML code comprising at least one VoiceXML tag for editing the voice-enrolled grammar and an identifier associated with the at least one VoiceXML tag that identifies a voice phrase to be edited, wherein execution of the VoiceXML code by the voice browser causes the voice browser to:
prompt a user to provide speech input; and
process the speech input, wherein the processing comprises making a change to the voice-enrolled grammar relating to the voice phrase, wherein specifics of the change are determined by attributes associated with the VoiceXML tag, and wherein the at least one voice-enrolled grammar comprises at least one word or phrase added to the voice-enrolled grammar by the user.

21. An apparatus comprising:
at least one processor programmed with VoiceXML code to:
prompt a user to provide speech input; and
process the speech input, wherein the processing comprises editing at least one voice-enrolled grammar based on the speech input, the at least one voice-enrolled grammar comprising at least one word or phrase added to the voice-enrolled grammar by the user,
wherein the VoiceXML code comprises at least one VoiceXML tag for editing of the at least one voice-enrolled grammar and an identifier associated with the at least one VoiceXML tag that identifies at least one voice phrase of the at least one voice-enrolled grammar to be edited, and
wherein the at least one processor is programmed to edit the at least one voice-enrolled grammar in accordance with attributes associated with the at least one VoiceXML tag.
Specification