Household agent learning
First Claim
1. A device comprising:
a profile building component in communication with an electronic data store;
a speech recognition component; and
a sensor configured to detect movement of a user independent of a direction of the user's gaze and without detecting physical contact between the user and the device;
wherein the profile building component is configured to:
receive, from the sensor, an indication that presence of the user was detected;
begin listening for utterances from the user in response to receiving the indication;
detect a first voice signal corresponding to a first utterance of the user;
determine an identity of the user using the first voice signal;
process the first voice signal to determine acoustic information about the user, wherein the acoustic information comprises at least one of an age, a gender, an accent type, a native language, or a type of speech pattern of the user;
perform speech recognition on the first voice signal to obtain a transcript;
process the transcript to determine language information relating to the user, wherein the language information comprises at least one of a name, hobbies, habits, or preferences of the user;
store, in a user profile associated with the identity of the user, the acoustic information and the language information;
determine acoustic model information using at least one of the first voice signal, the acoustic information, or the language information; and
determine language model information using at least one of the transcript, the acoustic information, or the language information; and
wherein the speech recognition component is configured to:
receive a second voice signal corresponding to a second utterance of the user;
determine the identity of the user using the second voice signal;
perform speech recognition on the second voice signal using at least one of the acoustic model information or the language model information to obtain a word sequence that indicates that a third utterance corresponding to a language characteristic will be uttered by a second user different than the user at a time after a current time; and
select a second user acoustic model corresponding to the language characteristic for performing speech recognition at the time after the current time.
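The profile-building flow recited in claim 1 can be illustrated with a short Python sketch. All class and function names below are hypothetical stand-ins for the claimed components, and the extraction logic is a deliberately simplified assumption, not the patented implementation.

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    """Per-user record of the claim 1 profile elements."""
    user_id: str
    acoustic_info: dict = field(default_factory=dict)   # e.g. age, gender, accent type
    language_info: dict = field(default_factory=dict)   # e.g. name, hobbies, preferences

class ProfileBuilder:
    """Hypothetical profile building component backed by an in-memory store."""

    def __init__(self):
        self.store = {}          # stands in for the electronic data store
        self.listening = False

    def on_presence_detected(self):
        # Sensor indication received: begin listening for utterances.
        self.listening = True

    def process_utterance(self, voice_signal, transcript):
        if not self.listening:
            return None
        user_id = self.identify_speaker(voice_signal)
        profile = self.store.setdefault(user_id, UserProfile(user_id))
        # Signal processing yields acoustic information about the speaker.
        profile.acoustic_info.update(self.extract_acoustic_info(voice_signal))
        # The recognized transcript yields language information.
        profile.language_info.update(self.extract_language_info(transcript))
        return profile

    def identify_speaker(self, voice_signal):
        # Placeholder for real speaker identification from the voice signal.
        return "user-" + str(hash(voice_signal) % 1000)

    def extract_acoustic_info(self, voice_signal):
        # Stub values; a real system would estimate these from the audio.
        return {"accent_type": "unknown", "gender": "unknown"}

    def extract_language_info(self, transcript):
        # Toy rule: pull a name out of a self-introduction, if present.
        info = {}
        if "my name is" in transcript.lower():
            info["name"] = transcript.split("my name is")[-1].strip()
        return info
```

Usage follows the claim's ordering: the sensor indication arrives first, and only then does the component listen for and process utterances.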
Abstract
A user profile for a plurality of users may be built for speech recognition purposes and for acting as an agent of the user. In some embodiments, a speech processing device automatically receives an utterance from a user. The utterance may be analyzed using signal processing to identify data associated with the user. The utterance may also be analyzed using speech recognition to identify additional data associated with the user. The identified data may be stored in a profile of the user. Data in the user profile may be used to select an acoustic model and/or a language model for speech recognition or to take actions on behalf of the user.
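The abstract's final step, using profile data to select an acoustic and/or language model, might look like the following sketch. The model inventories and selection rules here are illustrative assumptions, not the patented method.

```python
# Hypothetical model selection driven by a stored user profile:
# profile data chooses the acoustic and language models used for
# subsequent speech recognition.

ACOUSTIC_MODELS = {            # assumed inventory, keyed by accent type
    "british": "am-en-gb",
    "american": "am-en-us",
    "default": "am-generic",
}

LANGUAGE_MODELS = {            # assumed inventory, keyed by a profile topic
    "cooking": "lm-cooking",
    "sports": "lm-sports",
    "default": "lm-general",
}

def select_models(profile: dict) -> tuple:
    """Pick an acoustic and a language model from profile data."""
    accent = profile.get("acoustic_info", {}).get("accent_type", "default")
    hobbies = profile.get("language_info", {}).get("hobbies", [])
    topic = hobbies[0] if hobbies else "default"
    am = ACOUSTIC_MODELS.get(accent, ACOUSTIC_MODELS["default"])
    lm = LANGUAGE_MODELS.get(topic, LANGUAGE_MODELS["default"])
    return am, lm
```

An empty profile falls back to the generic models, mirroring the abstract's point that the profile refines, rather than gates, recognition.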
167 Citations
37 Claims
1. A device comprising: (text set forth above as the First Claim) - View Dependent Claims (2, 3, 32, 36, 37)
4. A device comprising:
a profile building component in communication with an electronic data store;
a sensor configured to detect presence of a user independent of a direction of the user's gaze and without detecting physical contact between the user and the device; and
a speech recognition component;
wherein the profile building component is configured to:
receive, from the sensor, an indication that presence of the user was detected;
begin to listen for utterances from the user in response to receiving the indication;
receive a first voice signal corresponding to a first utterance of a user;
determine an identity of the user using the first voice signal;
process the first voice signal to determine user information and a word sequence that indicates that a second utterance corresponding to a language characteristic is likely to be uttered by a second user different than the user at a time after a current time;
store the user information in a user profile associated with the identity of the user; and
select a second user acoustic model corresponding to the language characteristic for performing speech recognition.
- View Dependent Claims (5, 6, 7, 8, 9, 10, 11, 12, 13, 33)
14. A non-transitory computer-readable medium comprising one or more modules configured to execute in one or more processors of a computing device, the one or more modules being further configured to:
receive, from a sensor configured to detect presence of a user independent of a direction of the user's gaze and without detecting physical contact between the user and the computing device, an indication that presence of the user was detected;
begin to listen for utterances from the user in response to receiving the indication;
detect a first voice signal corresponding to a first utterance of the user;
determine an identity of the user using the first voice signal;
determine speech recognition model information using at least one of the first voice signal and user information stored in a user profile associated with the identity of the user;
perform speech recognition on the first voice signal using the speech recognition model information to obtain speech recognition results that indicate that a second utterance corresponding to a language characteristic is likely to be uttered by a second user different from the user at a time after a current time; and
select a second user acoustic model corresponding to the language characteristic for performing speech recognition.
- View Dependent Claims (15, 16, 17, 18, 19, 20, 21, 34)
22. A computer-implemented method comprising:
as implemented by one or more computing devices configured with specific computer-executable instructions,
receiving, from a sensor configured to detect presence of a user independent of a direction of the user's gaze and without detecting physical contact between the user and the one or more computing devices, an indication that presence of the user was detected;
beginning to listen for utterances from the user in response to receiving the indication;
receiving a first voice signal corresponding to a first utterance of the user, wherein the first utterance is received by the one or more computing devices, and wherein the first utterance is not directed to the one or more computing devices;
determining an identity of the user using the first voice signal;
performing speech recognition on the first voice signal, using information from a user profile associated with the identity of the user, to obtain speech recognition results that indicate that a second utterance corresponding to a language characteristic is likely to be uttered by a second user different than the user at a time after a current time;
performing an action using the speech recognition results; and
selecting a second user acoustic model corresponding to the language characteristic for performing speech recognition.
- View Dependent Claims (23, 24, 25, 26, 27, 28, 29, 30, 31, 35)
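The distinctive limitation shared by the independent claims, anticipating from recognized speech that a different speaker will talk next and pre-selecting an acoustic model matching that speaker's language characteristic, can be sketched as follows. The cue phrases, characteristics, and model names are assumptions made for illustration only.

```python
# Sketch of the predictive model switch: when recognition results indicate
# that a second user will speak at a time after the current time, an
# acoustic model matching that user's language characteristic is selected
# in advance.

CUE_PHRASES = {
    "talk to grandma": "elderly-female",   # predicted language characteristic
    "say hi to the kids": "child",
}

SECOND_USER_MODELS = {
    "elderly-female": "am-elderly-female",
    "child": "am-child",
}

def predict_next_speaker_model(word_sequence: str):
    """Return an acoustic model for the anticipated second user, if any."""
    text = word_sequence.lower()
    for cue, characteristic in CUE_PHRASES.items():
        if cue in text:
            return SECOND_USER_MODELS[characteristic]
    return None   # no upcoming speaker change detected
```

When no cue is present the function returns `None`, i.e. recognition simply continues with the current user's models.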
Specification