Turn-taking confidence
First Claim
1. A method for managing interactive dialog between a machine and a user comprising:
- providing audio output comprising speech to the user from the machine, said audio output comprising a sequence of one or more phrases, wherein each phrase is followed by a yield zone, said yield zone characterized by an absence of speech provided from the machine;
- receiving digitized audio data comprising speech audio input at the machine wherein said speech audio input is generated from the user or from an environment of the user;
- determining said audio input comprises speech audio input generated from the user;
- determining a time at which said speech audio input begins;
- determining an onset likelihood value based on the time wherein the onset likelihood has a first value if the time occurs during a given phrase associated with the one or more phrases and a second value if the time occurs during a given yield zone associated with the one or more yield zones;
- determining a confidence value from the audio input, wherein the confidence value is dependent upon the onset likelihood value and a recognition result from a speech recognition module; and
- providing an audio response from the machine to the user based on the confidence value.
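The onset-likelihood and confidence steps of claim 1 can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the names `Prompt`, `onset_likelihood`, `confidence`, and `respond`, the numeric likelihood values, the choice to give yield-zone onsets the higher value, the product used to combine the onset likelihood with the recognition score, and the response thresholds. The claim only requires a first value for onsets during a phrase, a second value for onsets during a yield zone, and a confidence that depends on both the onset likelihood and the recognition result. The sketch also assumes the audio has already been attributed to the user rather than to the environment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prompt:
    """Timeline of the machine's audio output (hypothetical structure).

    Each segment is (phrase_start, phrase_end, yield_end) in seconds.
    The yield zone is the machine-silent interval phrase_end..yield_end.
    """
    segments: List[Tuple[float, float, float]]


def onset_likelihood(prompt: Prompt, onset_time: float,
                     in_phrase: float = 0.3, in_yield: float = 0.9,
                     default: float = 0.5) -> float:
    """Return one value for onsets inside a phrase, another inside a yield zone.

    The numeric values are illustrative assumptions, as is the fallback
    value for onsets that fall outside every segment.
    """
    for start, end, yield_end in prompt.segments:
        if start <= onset_time < end:
            return in_phrase
        if end <= onset_time < yield_end:
            return in_yield
    return default


def confidence(onset_lik: float, recognition_score: float) -> float:
    """Combine onset likelihood with a recognizer score.

    A plain product is assumed here; the claim does not fix the formula.
    """
    return onset_lik * recognition_score


def respond(conf: float, accept: float = 0.6, confirm: float = 0.3) -> str:
    """Choose a response type from the confidence (thresholds are assumptions)."""
    if conf >= accept:
        return "accept"
    if conf >= confirm:
        return "confirm"
    return "reprompt"


# Two phrases, each followed by a yield zone.
prompt = Prompt(segments=[(0.0, 2.0, 3.0), (3.0, 5.0, 6.5)])
barge_in = confidence(onset_likelihood(prompt, 1.0), 0.8)   # onset mid-phrase
in_turn = confidence(onset_likelihood(prompt, 2.5), 0.8)    # onset in yield zone
print(respond(barge_in), respond(in_turn))
```

With the same recognizer score, the utterance that begins in a yield zone ends up with a higher confidence than the one that begins mid-phrase, so the two onsets lead to different responses under these assumed thresholds.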
Abstract
A method for managing interactive dialog between a machine and a user is claimed. In one embodiment, an interaction between the machine and the user is managed by determining at least one likelihood value which is dependent upon a possible speech onset of the user. In another embodiment, the likelihood value can be dependent upon a model of a desire of the user for specific items, a model of an attention of the user to specific items, or a model of turn-taking cues. Further, the likelihood value can be utilized in a voice activity system.
6 Claims
Claim 1, the independent claim, is given in full under First Claim. Dependent claims: 2, 3, 4, 5, 6.
Specification