Method and apparatus for training a voice recognition model database

US 9,275,638 B2
Filed: 12/03/2013
Issued: 03/01/2016
Est. Priority Date: 03/12/2013
Status: Active Grant

Abstract

An electronic device digitally combines a single voice input with each of a series of noise samples. Each noise sample is taken from a different audio environment (e.g., street noise, babble, interior car noise). The voice input/noise sample combinations are used to train a voice recognition model database without the user having to repeat the voice input in each of the different environments. In one variation, the electronic device transmits the user's voice input to a server that maintains and trains the voice recognition model database.
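
The digital combination step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `mix_at_snr`, the signal-to-noise-ratio scaling, the 10 dB target, and the synthetic sine-wave "voice" and random "noise" signals are all assumptions made for the example. The source says only that the voice input is digitally combined with each noise sample.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise power ratio is snr_db.

    Hypothetical helper: the SNR-based scaling is an assumption, not from the source.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise sample so it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in tiled) / len(tiled)
    if noise_power == 0:
        return list(speech)
    # Choose the gain so that speech_power / (gain^2 * noise_power) = 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, tiled)]


# Synthetic stand-ins for a recorded voice input and stored noise samples.
rng = random.Random(0)
voice_input = [math.sin(2 * math.pi * 220 * t / 8000) for t in range(8000)]
noise_samples = {
    "street": [rng.uniform(-1, 1) for _ in range(3000)],
    "babble": [rng.uniform(-1, 1) for _ in range(5000)],
    "car_interior": [rng.uniform(-1, 1) for _ in range(8000)],
}

# One voice input yields one training example per audio environment.
training_audio = {
    env: mix_at_snr(voice_input, noise, snr_db=10.0)
    for env, noise in noise_samples.items()
}
```

The single `voice_input` is reused for every environment, which is the point of the approach: the user does not have to repeat the utterance in each environment.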

20 Claims

1. A computer-implemented method comprising:
- receiving speech data corresponding to an utterance spoken in a particular noise environment;
  
  for each of a plurality of noise environments that are different than the particular noise environment:
  
  combining the speech data with stored noise data that is associated with the noise environment of the plurality of noise environments, to generate noise-specific, training audio data, and
  
  training a noise-specific, speech recognition model based at least on the noise-specific, training audio data; and
  
  providing the respective, noise-specific, speech recognition models associated with each of the plurality of noise environments, for output.
- - 2. The method of claim 1, comprising:
    - receiving, from a user, data indicating a selection of the stored noise data, wherein the speech data is received from the user.
  - 3. The method of claim 1, wherein the plurality of noise environments comprises:
    - noise associated with a home, noise associated with a car, noise associated with an office, or noise associated with a restaurant.
  - 4. The method of claim 1, comprising:
    - detecting a new noise type; and
      
      storing new noise data that is associated with the new noise type.
  - 5. The method of claim 1, comprising:
    - detecting a new noise type; and
      
      in response to detecting the new noise type:
      
      prompting a user to provide additional speech data; and
      
      training a noise-specific, speech recognition model based at least on the additional speech data.
  - 6. The method of claim 1, comprising:
    - receiving additional speech data;
      
      combining the additional speech data with the stored noise data to generate additional noise-specific, training audio data; and
      
      updating the noise-specific, speech recognition model based on the additional noise-specific, training audio data.
  - 7. The method of claim 1, comprising:
    - receiving additional speech data from a user who provided the speech data; and
      
      after combining the speech data and training the noise-specific, speech recognition model:
      
      combining the additional speech data with the stored noise data to generate additional noise-specific, training audio data; and
      
      updating the noise-specific, speech recognition model based on the additional noise-specific, training audio data.
  - 8. The method of claim 1, comprising:
    - storing the speech data in a speech data database.
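
The flow of claims 1 and 6 can be sketched in a few lines. This is a rough illustration under stated assumptions, not the patented implementation: `NoiseSpecificModel` is a toy stand-in (a running mean of segment energies) for a real speech recognition model, and the names `combine`, `segment_energies`, `train_noise_specific_models`, and `spoken_in`, along with all sample values, are hypothetical.

```python
from dataclasses import dataclass, field


def combine(speech, noise):
    """Additively mix speech with a looped noise sample (assumed mixing rule)."""
    return [s + noise[i % len(noise)] for i, s in enumerate(speech)]


def segment_energies(audio, segments=4):
    """Toy feature vector: mean energy of each of `segments` equal slices."""
    size = max(1, len(audio) // segments)
    return [
        sum(x * x for x in audio[i * size:(i + 1) * size]) / size
        for i in range(segments)
    ]


@dataclass
class NoiseSpecificModel:
    """Toy stand-in for a noise-specific speech recognition model."""
    environment: str
    template: list = field(default_factory=list)
    examples_seen: int = 0

    def train(self, audio):
        features = segment_energies(audio)
        if self.examples_seen == 0:
            self.template = features
        else:
            # Incremental mean, so later speech data updates the same model.
            n = self.examples_seen
            self.template = [
                (t * n + f) / (n + 1) for t, f in zip(self.template, features)
            ]
        self.examples_seen += 1


def train_noise_specific_models(speech, stored_noise, spoken_in, models=None):
    """Train (or update) one model per noise environment other than `spoken_in`."""
    models = {} if models is None else models
    for environment, noise in stored_noise.items():
        if environment == spoken_in:
            continue  # claim 1 covers environments different from the recording one
        training_audio = combine(speech, noise)
        model = models.setdefault(environment, NoiseSpecificModel(environment))
        model.train(training_audio)
    return models


# Made-up stored noise data for the four environments named in claim 3.
stored_noise = {
    "home": [0.01, -0.02, 0.015],
    "car": [0.2, -0.1, 0.15, -0.25],
    "office": [0.05, -0.05],
    "restaurant": [0.3, 0.1, -0.2, -0.3, 0.25],
}
utterance = [0.5, -0.4, 0.3, -0.2, 0.1, 0.0, -0.1, 0.2]
models = train_noise_specific_models(utterance, stored_noise, spoken_in="home")

# Claim 6: additional speech data updates the existing noise-specific models.
additional = [0.4, -0.3, 0.35, -0.1, 0.05, 0.1, -0.2, 0.15]
models = train_noise_specific_models(
    additional, stored_noise, spoken_in="home", models=models
)
```

Passing the existing `models` dictionary back in is what turns the second call into an update rather than a fresh training run.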

9. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving speech data corresponding to an utterance spoken in a particular noise environment;
  
  for each of a plurality of noise environments that are different than the particular noise environment:
  
  combining the speech data with stored noise data that is associated with the noise environment of the plurality of noise environments, to generate noise-specific, training audio data, and
  
  training a noise-specific, speech recognition model based at least on the noise-specific, training audio data; and
  
  providing the respective, noise-specific, speech recognition models associated with each of the plurality of noise environments, for output.
- - 10. The system of claim 9, wherein the operations further comprise:
    - receiving, from a user, data indicating a selection of the stored noise data, wherein the speech data is received from the user.
  - 11. The system of claim 9, wherein the plurality of noise environments comprises:
    - noise associated with a home, noise associated with a car, noise associated with an office, or noise associated with a restaurant.
  - 12. The system of claim 9, wherein the operations further comprise:
    - detecting a new noise type; and
      
      storing new noise data that is associated with the new noise type.
  - 13. The system of claim 9, wherein the operations further comprise:
    - detecting a new noise type; and
      
      in response to detecting the new noise type:
      
      prompting a user to provide additional speech data; and
      
      training a noise-specific, speech recognition model based at least on the additional speech data.
  - 14. The system of claim 9, wherein the operations further comprise:
    - receiving additional speech data;
      
      combining the additional speech data with the stored noise data to generate additional noise-specific, training audio data; and
      
      updating the noise-specific, speech recognition model based on the additional noise-specific, training audio data.
  - 15. The system of claim 9, wherein the operations further comprise:
    - receiving additional speech data from a user who provided the speech data; and
      
      after combining the speech data and training the noise-specific, speech recognition model:
      
      combining the additional speech data with the stored noise data to generate additional noise-specific, training audio data; and
      
      updating the noise-specific, speech recognition model based on the additional noise-specific, training audio data.
  - 16. The system of claim 9, wherein the operations further comprise:
    - storing the speech data in a speech data database.

17. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving speech data corresponding to an utterance spoken in a particular noise environment;
  
  for each of a plurality of noise environments that are different than the particular noise environment:
  
  combining the speech data with stored noise data that is associated with the noise environment of the plurality of noise environments, to generate noise-specific, training audio data, and
  
  training a noise-specific, speech recognition model based at least on the noise-specific, training audio data; and
  
  providing the respective, noise-specific, speech recognition models associated with each of the plurality of noise environments, for output.
- - 18. The medium of claim 17, wherein the operations further comprise:
    - receiving, from a user, data indicating a selection of the stored noise data, wherein the speech data is received from the user.
  - 19. The medium of claim 17, wherein the plurality of noise environments comprises:
    - noise associated with a home, noise associated with a car, noise associated with an office, or noise associated with a restaurant.
  - 20. The medium of claim 17, wherein the operations further comprise:
    - detecting a new noise type; and
      
      storing new noise data that is associated with the new noise type.

Current Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Inventors: Clark, Joel A; Dwyer, Joseph C; Schuster, Adrian M; Singaraju, Snehitha; Zurek, Robert A; Meloney, John R
Primary Examiner(s): Colucci, Michael
Application Number: US14/094,875
Publication Number: US 20140278420A1
Time in Patent Office: 819 Days
Field of Search: 704/249, 704/256.4, 704/244, 704/233, 704/231, 704/229, 704/226, 381/86, 379/406.01
US Class Current: 1/1
CPC Class Codes:
G10L 15/063 Training
G10L 15/20 Speech recognition techniqu...
