Audio, video, simulation, and user interface paradigms
First Claim
1. A method of adaptation of a user-specific acoustic model, comprising:
receiving, by a processor, initial speech input data in the form of a plurality of words from a user, the initial speech input data including less speech data than is necessary to adapt a user-specific acoustic model to identify the user based upon any subsequently received speech input data;
simulating specific speech characteristics of the user based at least in part upon the initial speech input data;
generating additional speech data including one or more simulated words for the user based at least in part upon the initial speech input data and the specific speech characteristics of the user;
combining the initial speech input data from the user and the generated additional speech data;
adapting the user-specific acoustic model for speech recognition of the speaker based at least in part upon the combined initial speech input data and the generated additional speech data; and
refining the user-specific adapted acoustic model until the adapted user-specific acoustic model is sufficiently tuned to the speech of the user to adequately identify the user based upon subsequently received speech input data from the user.
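The claimed flow — receive a little real speech, simulate the user's speech characteristics, generate synthetic "simulated words", combine, and adapt — can be sketched end to end. The sketch below is a toy stand-in, not the patented method: it treats words as numeric feature vectors, uses per-dimension mean and spread as the "specific speech characteristics", samples synthetic vectors around them, and "adapts" a model that is simply the per-dimension mean of the combined data. Every function name and the feature representation are assumptions for illustration.

```python
import random
import statistics

def estimate_characteristics(samples):
    # Per-dimension mean and spread of the user's feature vectors: a toy
    # stand-in for the user's "specific speech characteristics".
    dims = list(zip(*samples))
    return [(statistics.mean(d), statistics.pstdev(d) or 1e-3) for d in dims]

def simulate_words(chars, n, rng):
    # Generate synthetic feature vectors ("simulated words") by sampling
    # around the estimated characteristics.
    return [[rng.gauss(mu, sd) for mu, sd in chars] for _ in range(n)]

def adapt_model(samples):
    # "Adapt" a trivial model: here the model is just the per-dimension mean.
    dims = list(zip(*samples))
    return [statistics.mean(d) for d in dims]

def adapt_user_model(initial, n_synthetic=50, seed=0):
    rng = random.Random(seed)
    chars = estimate_characteristics(initial)
    synthetic = simulate_words(chars, n_synthetic, rng)
    combined = initial + synthetic  # combine real and generated speech data
    return adapt_model(combined)

# Usage: two short utterances -- too little data to adapt a model on their own.
initial = [[1.0, 5.0], [1.2, 4.8]]
model = adapt_user_model(initial)
```

The point of the sketch is only the data flow: the synthetic samples pad out the sparse enrollment data so the adaptation step has enough material to work with.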
Abstract
Consumer electronic devices have been developed with enormous information processing capabilities, high-quality audio and video outputs, large amounts of memory, and wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video cameras, GPS, or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation, and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm includes automatically creating user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.
20 Claims
1. A method of adaptation of a user-specific acoustic model, comprising:
receiving, by a processor, initial speech input data in the form of a plurality of words from a user, the initial speech input data including less speech data than is necessary to adapt a user-specific acoustic model to identify the user based upon any subsequently received speech input data;
simulating specific speech characteristics of the user based at least in part upon the initial speech input data;
generating additional speech data including one or more simulated words for the user based at least in part upon the initial speech input data and the specific speech characteristics of the user;
combining the initial speech input data from the user and the generated additional speech data;
adapting the user-specific acoustic model for speech recognition of the speaker based at least in part upon the combined initial speech input data and the generated additional speech data; and
refining the user-specific adapted acoustic model until the adapted user-specific acoustic model is sufficiently tuned to the speech of the user to adequately identify the user based upon subsequently received speech input data from the user.
2. The method of claim 1, wherein a plurality of user-specific acoustic models are created, each user-specific acoustic model capable of determining one of a plurality of user identities for one of a plurality of users based upon speech data received from one of the users corresponding to one of the user-specific acoustic models.
3. The method of claim 2, further comprising:
receiving subsequent speech input data from an unknown user;
evaluating the subsequent speech input data with at least one of the plurality of user-specific acoustic models; and
associating the unknown user with a selected one of the plurality of user identities corresponding with one of the plurality of user-specific acoustic models in response to the evaluation.
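The identification step in claims 2 and 3 — scoring an unknown user's speech against each user-specific model and selecting the best match — can be illustrated with a minimal sketch. The scoring function below (negative squared distance between a model and a feature vector) is an assumed stand-in for an acoustic-model likelihood; the model representation and all names are hypothetical.

```python
def score(model, sample):
    # Negative squared distance: higher means the sample fits the model
    # better (a toy stand-in for an acoustic-model likelihood).
    return -sum((m - x) ** 2 for m, x in zip(model, sample))

def identify(models, sample):
    # Evaluate the unknown user's speech against each user-specific model
    # and return the best-matching user identity.
    return max(models, key=lambda user: score(models[user], sample))

# Hypothetical adapted models for two enrolled users.
models = {"alice": [1.0, 5.0], "bob": [3.0, 2.0]}
who = identify(models, [1.1, 4.9])
```

Claim 4's step would follow naturally here: once `who` is selected, the system looks up and applies the configuration settings stored for that user identity.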
4. The method of claim 3, further comprising:
applying configuration settings associated with the selected user identity.
5. The method of claim 1, wherein the user-specific acoustic model and simulated specific speech characteristics of the user are iteratively refined until the user-specific acoustic model is sufficiently tuned to the speech of the user.
6. The method of claim 1, wherein the generated additional speech data comprises artificial speech data.
7. The method of claim 1, wherein receiving the initial speech input data from the user is performed automatically and without specific enrollment by the user.
8. The method of claim 1, further comprising utilizing at least one adaptation mechanism to refine the user-specific acoustic model.
9. The method of claim 1, further comprising:
determining an identifier of an object being displayed to the user through a video device; and
adding the identifier to an active speech recognition vocabulary, whereby a system causing the object to be displayed and accepting speech input data from a user will be able to more easily recognize the identifier when spoken by the user.
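Claim 9's idea — biasing recognition toward identifiers of whatever is currently on screen — can be sketched as a vocabulary that is augmented when an object is displayed and trimmed when it is hidden. The class below is a hypothetical illustration: the "recognizer" simply filters spoken words against the active vocabulary, standing in for a real recognizer that would weight those words more heavily.

```python
class ActiveVocabulary:
    # Active recognition vocabulary augmented with identifiers of on-screen
    # objects so they are easier to recognize when spoken.
    def __init__(self, base_words):
        self.words = set(base_words)

    def object_displayed(self, identifier):
        self.words.add(identifier)       # add the displayed object's identifier

    def object_hidden(self, identifier):
        self.words.discard(identifier)   # drop it when the object goes away

    def recognize(self, spoken):
        # Toy recognizer: accept only words currently in the vocabulary.
        return [w for w in spoken if w in self.words]

vocab = ActiveVocabulary({"yes", "no"})
vocab.object_displayed("dragon")  # e.g. a dragon sprite appears on screen
```

A production recognizer would of course not reject out-of-vocabulary words outright; the set here only models the claim's "active speech recognition vocabulary".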
10. The method of claim 1, further comprising:
receiving at least one user speech input word at substantially the time when a user interacts with an object displayed to the user through a video device; and
associating the at least one user speech input word with the object after a minimum number of repetitions of the at least one user speech input word with respect to instances of the object.
11. The method of claim 1, further comprising:
receiving at least one user speech input word at substantially the time when a user performs an interaction with at least one object displayed to the user through a video device; and
associating the at least one user speech input word with a type of interaction after a minimum number of repetitions of the at least one user speech input word with respect to the interaction.
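Claims 10 and 11 share one mechanism: a word spoken at roughly the same time as an interaction is bound to the object (or to the interaction type) only after the pairing has been observed a minimum number of times. A minimal sketch of that counting threshold, with all names assumed for illustration:

```python
from collections import Counter

class WordObjectAssociator:
    # Associate a spoken word with an on-screen object (or interaction type)
    # once the pairing has been observed a minimum number of times.
    def __init__(self, min_repetitions=3):
        self.min_repetitions = min_repetitions
        self.counts = Counter()        # (word, object) -> observation count
        self.associations = {}         # word -> object, once established

    def observe(self, word, obj):
        # Called when `word` is heard at substantially the time the user
        # interacts with `obj`.
        self.counts[(word, obj)] += 1
        if self.counts[(word, obj)] >= self.min_repetitions:
            self.associations[word] = obj
```

The threshold keeps a single coincidental utterance from creating a spurious association, which is the apparent motivation for the "minimum number of repetitions" limitation.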
12. The method of claim 1, further comprising:
processing the initial speech input data using the user-specific acoustic model in order to recognize the initial speech input data; and
refining the adapted user-specific acoustic model by feeding back information of the adapted user-specific acoustic model to the user-specific acoustic model until the adapted user-specific acoustic model is sufficiently adapted to the speech of the speaker to adequately recognize the speaker's speech.
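The feedback loop in claim 12 (and the "refining ... until sufficiently tuned" step of claim 1) amounts to repeated re-adaptation with a stopping criterion. The sketch below is an assumed illustration: the "model" is again a feature-vector mean, each pass feeds the current model back and nudges it toward the data, and "sufficiently adapted" is modeled as the per-pass shift falling below a threshold.

```python
def refine(model, samples, threshold=0.05, max_iters=20, lr=0.5):
    # Iteratively feed the adapted model back and re-adapt until the
    # per-iteration shift is small ("sufficiently adapted").
    for _ in range(max_iters):
        target = [sum(d) / len(d) for d in zip(*samples)]
        new = [m + lr * (t - m) for m, t in zip(model, target)]
        shift = max(abs(a - b) for a, b in zip(new, model))
        model = new
        if shift < threshold:
            break
    return model
```

In a real system the stopping test would be a recognition- or identification-accuracy criterion rather than a parameter-shift bound; the loop shape is the only part taken from the claim.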
13. The method of claim 12, wherein the initial data includes less speech data than is necessary to adapt a user-specific acoustic model to recognize the speaker's speech from any subsequently received speech data.
14. The method of claim 1, further comprising:
generating an estimated production model of the user from the initial speech data; and
refining the estimated production model using any subsequently received speech input data, wherein the estimated production model is used to simulate the specific speech characteristics of the user and generate the additional speech data for the user.
15. A system for adaptation of a user-specific acoustic model, comprising:
a processor; and
a memory device including instructions that, when executed by the processor, cause the processor to:
receive initial speech input data in the form of a plurality of words from a user, the initial speech input data including less speech data than is necessary to adapt a user-specific acoustic model to identify the user based upon any subsequently received speech input data;
simulate specific speech characteristics of the user based at least in part upon the initial speech input data;
generate additional speech data including one or more simulated words for the user based at least in part upon the initial speech input data and the specific speech characteristics of the user;
combine the initial speech input data from the user and the generated additional speech data;
adapt the user-specific acoustic model for speech recognition of the speaker based at least in part upon the combined initial speech input data and the generated additional speech data; and
refine the user-specific adapted acoustic model until the adapted user-specific acoustic model is sufficiently tuned to the speech of the user to adequately identify the user based upon subsequently received speech input data from the user.
16. The system of claim 15, wherein the memory device includes further instructions that, when executed by the processor, cause the processor to:
process the initial speech input data using the user-specific acoustic model in order to recognize the initial speech input data; and
refine the adapted user-specific acoustic model by feeding back information of the adapted user-specific acoustic model to the user-specific acoustic model until the adapted user-specific acoustic model is sufficiently adapted to the speech of the speaker to adequately recognize the speaker's speech.
17. The system of claim 16, wherein the initial data received by the instructions includes less speech data than is necessary to adapt a user-specific acoustic model to recognize the speaker's speech from any subsequently received speech data.
18. A non-transitory computer readable storage medium storing instructions for adaptation of a user-specific acoustic model, the instructions when executed by a processor causing the processor to:
receive initial speech input data in the form of a plurality of words from a user, the initial speech input data including less speech data than is necessary to adapt a user-specific acoustic model to identify the user based upon any subsequently received speech input data;
simulate specific speech characteristics of the user based at least in part upon the initial speech input data;
generate additional speech data including one or more simulated words for the user based at least in part upon the initial speech input data and the specific speech characteristics of the user;
combine the initial speech input data from the user and the generated additional speech data;
adapt the user-specific acoustic model for speech recognition of the speaker based at least in part upon the combined initial speech input data and the generated additional speech data; and
refine the adapted user-specific acoustic model until the adapted user-specific acoustic model is sufficiently tuned to the speech of the user to adequately identify the user based upon subsequently received speech input data from the user.
19. The non-transitory computer readable storage medium of claim 18 further storing instructions for adaptation of a user-specific acoustic model, the instructions when executed by a processor causing the processor to:
process the initial speech input data using the user-specific acoustic model in order to recognize the initial speech input data; and
refine the adapted user-specific acoustic model by feeding back information of the adapted user-specific acoustic model to the user-specific acoustic model until the adapted user-specific acoustic model is sufficiently adapted to the speech of the speaker to adequately recognize the speaker's speech.
20. The non-transitory computer readable storage medium of claim 19, wherein the initial data received by the instructions includes less speech data than is necessary to adapt a user-specific acoustic model to recognize the speaker's speech from any subsequently received speech data.
Specification