Audio, video, simulation, and user interface paradigms
First Claim
1. A method of adaptation of a user-specific acoustic model, comprising:
receiving, by a processor, initial speech input data in the form of a plurality of words from a user, the initial speech input data including less speech data than is necessary to adapt a user-specific acoustic model to identify the user based upon any subsequently received speech input data;
simulating specific speech characteristics of the user based at least in part upon the initial speech input data;
generating additional speech data including one or more simulated words for the user based at least in part upon the initial speech input data and the specific speech characteristics of the user;
combining the initial speech input data from the user and the generated additional speech data;
adapting the user-specific acoustic model for speech recognition of the speaker based at least in part upon the combined initial speech input data and the generated additional speech data; and
refining the user-specific adapted acoustic model until the adapted user-specific acoustic model is sufficiently tuned to the speech of the user to adequately identify the user based upon subsequently received speech input data from the user.
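The claimed flow — receive a little real speech, simulate the user's speech characteristics, generate synthetic "simulated words", combine, and adapt — can be sketched end to end. The sketch below is a toy stand-in, not the patented method: it treats words as numeric feature vectors, uses per-dimension mean and spread as the "specific speech characteristics", samples synthetic vectors around them, and "adapts" a model that is simply the per-dimension mean of the combined data. Every function name and the feature representation are assumptions for illustration.

```python
import random
import statistics

def estimate_characteristics(samples):
    # Per-dimension mean and spread of the user's feature vectors: a toy
    # stand-in for the user's "specific speech characteristics".
    dims = list(zip(*samples))
    return [(statistics.mean(d), statistics.pstdev(d) or 1e-3) for d in dims]

def simulate_words(chars, n, rng):
    # Generate synthetic feature vectors ("simulated words") by sampling
    # around the estimated characteristics.
    return [[rng.gauss(mu, sd) for mu, sd in chars] for _ in range(n)]

def adapt_model(samples):
    # "Adapt" a trivial model: here the model is just the per-dimension mean.
    dims = list(zip(*samples))
    return [statistics.mean(d) for d in dims]

def adapt_user_model(initial, n_synthetic=50, seed=0):
    rng = random.Random(seed)
    chars = estimate_characteristics(initial)
    synthetic = simulate_words(chars, n_synthetic, rng)
    combined = initial + synthetic  # combine real and generated speech data
    return adapt_model(combined)

# Usage: two short utterances -- too little data to adapt a model on their own.
initial = [[1.0, 5.0], [1.2, 4.8]]
model = adapt_user_model(initial)
```

The point of the sketch is only the data flow: the synthetic samples pad out the sparse enrollment data so the adaptation step has enough material to work with.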
Abstract
Consumer electronic devices have been developed with enormous information processing capabilities, high-quality audio and video outputs, large amounts of memory, and wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video cameras, GPS, or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation, and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm includes automatically creating user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.
20 Claims
1. A method of adaptation of a user-specific acoustic model, comprising:
receiving, by a processor, initial speech input data in the form of a plurality of words from a user, the initial speech input data including less speech data than is necessary to adapt a user-specific acoustic model to identify the user based upon any subsequently received speech input data;
simulating specific speech characteristics of the user based at least in part upon the initial speech input data;
generating additional speech data including one or more simulated words for the user based at least in part upon the initial speech input data and the specific speech characteristics of the user;
combining the initial speech input data from the user and the generated additional speech data;
adapting the user-specific acoustic model for speech recognition of the speaker based at least in part upon the combined initial speech input data and the generated additional speech data; and
refining the user-specific adapted acoustic model until the adapted user-specific acoustic model is sufficiently tuned to the speech of the user to adequately identify the user based upon subsequently received speech input data from the user.
2. The method of claim 1, wherein a plurality of user-specific acoustic models are created, each user-specific acoustic model capable of determining one of a plurality of user identities for one of a plurality of users based upon speech data received from one of the users corresponding to one of the user-specific acoustic models.
3. The method of claim 2, further comprising:
receiving subsequent speech input data from an unknown user;
evaluating the subsequent speech input data with at least one of the plurality of user-specific acoustic models; and
associating the unknown user with a selected one of the plurality of user identities corresponding with one of the plurality of user-specific acoustic models in response to the evaluation.
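The identification step in claims 2 and 3 — scoring an unknown user's speech against each user-specific model and selecting the best match — can be illustrated with a minimal sketch. The scoring function below (negative squared distance between a model and a feature vector) is an assumed stand-in for an acoustic-model likelihood; the model representation and all names are hypothetical.

```python
def score(model, sample):
    # Negative squared distance: higher means the sample fits the model
    # better (a toy stand-in for an acoustic-model likelihood).
    return -sum((m - x) ** 2 for m, x in zip(model, sample))

def identify(models, sample):
    # Evaluate the unknown user's speech against each user-specific model
    # and return the best-matching user identity.
    return max(models, key=lambda user: score(models[user], sample))

# Hypothetical adapted models for two enrolled users.
models = {"alice": [1.0, 5.0], "bob": [3.0, 2.0]}
who = identify(models, [1.1, 4.9])
```

Claim 4's step would follow naturally here: once `who` is selected, the system looks up and applies the configuration settings stored for that user identity.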
4. The method of claim 3, further comprising:
applying configuration settings associated with the selected user identity.
5. The method of claim 1, wherein the user-specific acoustic model and simulated specific speech characteristics of the user are iteratively refined until the user-specific acoustic model is sufficiently tuned to the speech of the user.
6. The method of claim 1, wherein the generated additional speech data comprises artificial speech data.
7. The method of claim 1, wherein receiving the initial speech input data from the user is performed automatically and without specific enrollment by the user.
8. The method of claim 1, further comprising utilizing at least one adaptation mechanism to refine the user-specific acoustic model.
9. The method of claim 1, further comprising:
determining an identifier of an object being displayed to the user through a video device; and
adding the identifier to an active speech recognition vocabulary, whereby a system causing the object to be displayed and accepting speech input data from a user will be able to more easily recognize the identifier when spoken by the user.
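Claim 9's idea — biasing recognition toward identifiers of whatever is currently on screen — can be sketched as a vocabulary that is augmented when an object is displayed and trimmed when it is hidden. The class below is a hypothetical illustration: the "recognizer" simply filters spoken words against the active vocabulary, standing in for a real recognizer that would weight those words more heavily.

```python
class ActiveVocabulary:
    # Active recognition vocabulary augmented with identifiers of on-screen
    # objects so they are easier to recognize when spoken.
    def __init__(self, base_words):
        self.words = set(base_words)

    def object_displayed(self, identifier):
        self.words.add(identifier)       # add the displayed object's identifier

    def object_hidden(self, identifier):
        self.words.discard(identifier)   # drop it when the object goes away

    def recognize(self, spoken):
        # Toy recognizer: accept only words currently in the vocabulary.
        return [w for w in spoken if w in self.words]

vocab = ActiveVocabulary({"yes", "no"})
vocab.object_displayed("dragon")  # e.g. a dragon sprite appears on screen
```

A production recognizer would of course not reject out-of-vocabulary words outright; the set here only models the claim's "active speech recognition vocabulary".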
10. The method of claim 1, further comprising:
receiving at least one user speech input word at substantially the time when a user interacts with an object displayed to the user through a video device; and
associating the at least one user speech input word with the object after a minimum number of repetitions of the at least one user speech input word with respect to instances of the object.
11. The method of claim 1, further comprising:
receiving at least one user speech input word at substantially the time when a user performs an interaction with at least one object displayed to the user through a video device; and
associating the at least one user speech input word with a type of interaction after a minimum number of repetitions of the at least one user speech input word with respect to the interaction.
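Claims 10 and 11 share one mechanism: a word spoken at roughly the same time as an interaction is bound to the object (or to the interaction type) only after the pairing has been observed a minimum number of times. A minimal sketch of that counting threshold, with all names assumed for illustration:

```python
from collections import Counter

class WordObjectAssociator:
    # Associate a spoken word with an on-screen object (or interaction type)
    # once the pairing has been observed a minimum number of times.
    def __init__(self, min_repetitions=3):
        self.min_repetitions = min_repetitions
        self.counts = Counter()        # (word, object) -> observation count
        self.associations = {}         # word -> object, once established

    def observe(self, word, obj):
        # Called when `word` is heard at substantially the time the user
        # interacts with `obj`.
        self.counts[(word, obj)] += 1
        if self.counts[(word, obj)] >= self.min_repetitions:
            self.associations[word] = obj
```

The threshold keeps a single coincidental utterance from creating a spurious association, which is the apparent motivation for the "minimum number of repetitions" limitation.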
12. The method of claim 1, further comprising:
processing the initial speech input data using the user-specific acoustic model in order to recognize the initial speech input data; and
refining the adapted user-specific acoustic model by feeding back information of the adapted user-specific acoustic model to the user-specific acoustic model until the adapted user-specific acoustic model is sufficiently adapted to the speech of the speaker to adequately recognize the speaker's speech.
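The feedback loop in claim 12 (and the "refining ... until sufficiently tuned" step of claim 1) amounts to repeated re-adaptation with a stopping criterion. The sketch below is an assumed illustration: the "model" is again a feature-vector mean, each pass feeds the current model back and nudges it toward the data, and "sufficiently adapted" is modeled as the per-pass shift falling below a threshold.

```python
def refine(model, samples, threshold=0.05, max_iters=20, lr=0.5):
    # Iteratively feed the adapted model back and re-adapt until the
    # per-iteration shift is small ("sufficiently adapted").
    for _ in range(max_iters):
        target = [sum(d) / len(d) for d in zip(*samples)]
        new = [m + lr * (t - m) for m, t in zip(model, target)]
        shift = max(abs(a - b) for a, b in zip(new, model))
        model = new
        if shift < threshold:
            break
    return model
```

In a real system the stopping test would be a recognition- or identification-accuracy criterion rather than a parameter-shift bound; the loop shape is the only part taken from the claim.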
13. The method of claim 12, wherein the initial data includes less speech data than is necessary to adapt a user-specific acoustic model to recognize the speaker's speech from any subsequently received speech data.
14. The method of claim 1, further comprising:
generating an estimated production model of the user from the initial speech data; and
refining the estimated production model using any subsequently received speech input data, wherein the estimated production model is used to simulate the specific speech characteristics of the user and generate the additional speech data for the user.
15. A system for adaptation of a user-specific acoustic model, comprising:
a processor; and
a memory device including instructions that, when executed by the processor, cause the processor to:
receive initial speech input data in the form of a plurality of words from a user, the initial speech input data including less speech data than is necessary to adapt a user-specific acoustic model to identify the user based upon any subsequently received speech input data;
simulate specific speech characteristics of the user based at least in part upon the initial speech input data;
generate additional speech data including one or more simulated words for the user based at least in part upon the initial speech input data and the specific speech characteristics of the user;
combine the initial speech input data from the user and the generated additional speech data;
adapt the user-specific acoustic model for speech recognition of the speaker based at least in part upon the combined initial speech input data and the generated additional speech data; and
refine the user-specific adapted acoustic model until the adapted user-specific acoustic model is sufficiently tuned to the speech of the user to adequately identify the user based upon subsequently received speech input data from the user.
16. The system of claim 15, wherein the memory device includes further instructions that, when executed by the processor, cause the processor to:
process the initial speech input data using the user-specific acoustic model in order to recognize the initial speech input data; and
refine the adapted user-specific acoustic model by feeding back information of the adapted user-specific acoustic model to the user-specific acoustic model until the adapted user-specific acoustic model is sufficiently adapted to the speech of the speaker to adequately recognize the speaker's speech.
17. The system of claim 16, wherein the initial data received by the instructions includes less speech data than is necessary to adapt a user-specific acoustic model to recognize the speaker's speech from any subsequently received speech data.
18. A non-transitory computer readable storage medium storing instructions for adaptation of a user-specific acoustic model, the instructions when executed by a processor causing the processor to:
receive initial speech input data in the form of a plurality of words from a user, the initial speech input data including less speech data than is necessary to adapt a user-specific acoustic model to identify the user based upon any subsequently received speech input data;
simulate specific speech characteristics of the user based at least in part upon the initial speech input data;
generate additional speech data including one or more simulated words for the user based at least in part upon the initial speech input data and the specific speech characteristics of the user;
combine the initial speech input data from the user and the generated additional speech data;
adapt the user-specific acoustic model for speech recognition of the speaker based at least in part upon the combined initial speech input data and the generated additional speech data; and
refine the adapted user-specific acoustic model until the adapted user-specific acoustic model is sufficiently tuned to the speech of the user to adequately identify the user based upon subsequently received speech input data from the user.
19. The non-transitory computer readable storage medium of claim 18 further storing instructions for adaptation of a user-specific acoustic model, the instructions when executed by a processor causing the processor to:
process the initial speech input data using the user-specific acoustic model in order to recognize the initial speech input data; and
refine the adapted user-specific acoustic model by feeding back information of the adapted user-specific acoustic model to the user-specific acoustic model until the adapted user-specific acoustic model is sufficiently adapted to the speech of the speaker to adequately recognize the speaker's speech.
20. The non-transitory computer readable storage medium of claim 19, wherein the initial data received by the instructions includes less speech data than is necessary to adapt a user-specific acoustic model to recognize the speaker's speech from any subsequently received speech data.
Specification