Speaker adaptation of neural network acoustic models using I-vectors
Abstract
A method includes providing a deep neural network acoustic model, receiving audio data including one or more utterances of a speaker, extracting a plurality of speech recognition features from the one or more utterances of the speaker, creating a speaker identity vector for the speaker based on the extracted speech recognition features, and adapting the deep neural network acoustic model for automatic speech recognition using the extracted speech recognition features and the speaker identity vector.
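The adaptation described in the abstract, appending a single per-speaker identity vector (an i-vector) to every acoustic feature frame before the frames enter the deep neural network, can be sketched as follows. All dimensions and names below are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def adapt_input(features, ivector):
    """Concatenate a per-speaker identity vector to every per-frame
    feature vector, forming the speaker-adapted DNN input.
    features: (num_frames, feat_dim); ivector: (ivec_dim,)."""
    num_frames = features.shape[0]
    tiled = np.tile(ivector, (num_frames, 1))         # (num_frames, ivec_dim)
    return np.concatenate([features, tiled], axis=1)  # (num_frames, feat_dim + ivec_dim)

# Hypothetical sizes: 40-dim features over 100 frames, 100-dim i-vector.
frames = np.random.randn(100, 40)
ivec = np.random.randn(100)
adapted = adapt_input(frames, ivec)
print(adapted.shape)  # (100, 140)
```

Because the i-vector is constant for a speaker, every frame of that speaker's input carries the same trailing block, which is what lets the network learn a speaker-dependent bias on top of the frame-level features.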
10 Claims
1. A method comprising:
providing a deep neural network acoustic model;
receiving audio data including one or more utterances of a speaker;
extracting a plurality of speech recognition features from the one or more utterances of the speaker;
creating a speaker identity vector for the speaker based on the speech recognition features extracted from the one or more utterances of the speaker;
performing, by a computer system, an automatic speech recognition using the speech recognition features extracted from the one or more utterances of the speaker and the speaker identity vector by executing the deep neural network acoustic model; and
adapting the deep neural network acoustic model executing on the computer system performing the automatic speech recognition using the speech recognition features extracted from the one or more utterances of the speaker and the speaker identity vector, wherein adapting the deep neural network acoustic model further comprises concatenating the speaker identity vector to each of the speech recognition features extracted from the one or more utterances of the speaker to form an input to the deep neural network acoustic model.
(Dependent claims: 2, 3, 4, 5)
6. A computer program product for adapting deep neural network acoustic models for automatic speech recognition, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform a method comprising:
providing a deep neural network acoustic model;
receiving audio data including one or more utterances of a speaker;
extracting a plurality of speech recognition features from the one or more utterances of the speaker;
creating a speaker identity vector for the speaker based on the speech recognition features extracted from the one or more utterances of the speaker;
performing, by the processor, an automatic speech recognition using the speech recognition features extracted from the one or more utterances of the speaker and the speaker identity vector by executing the deep neural network acoustic model; and
adapting the deep neural network acoustic model executing on a computer system, the processor performing the automatic speech recognition using the speech recognition features extracted from the one or more utterances of the speaker and the speaker identity vector, wherein adapting the deep neural network acoustic model further comprises concatenating the speaker identity vector to each of the speech recognition features extracted from the one or more utterances of the speaker to form an input to the deep neural network acoustic model.
(Dependent claims: 7, 8, 9, 10)
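The claimed steps (extract features, create a speaker identity vector, concatenate it to each frame, and run recognition through the DNN) can be sketched end to end. Every component below is a simplified stand-in: the mean-of-features "identity vector" is not a true i-vector (real i-vectors come from a total-variability model), the random projection is not a real feature extractor, and the random-weight network is only a placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(audio, feat_dim=40, frame_len=160):
    """Stand-in feature extractor: chop audio into frames and project
    each frame to feat_dim values. A real system would compute e.g. MFCCs."""
    n = len(audio) // frame_len
    frames = audio[: n * frame_len].reshape(n, frame_len)
    proj = rng.standard_normal((frame_len, feat_dim))
    return frames @ proj                      # (n, feat_dim)

def speaker_identity_vector(features):
    """Stand-in for i-vector extraction: just the per-speaker feature mean."""
    return features.mean(axis=0)              # (feat_dim,)

def dnn_forward(x, weights):
    """Minimal feed-forward acoustic model: ReLU hidden layers, linear output."""
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)
    return x @ weights[-1]                    # per-frame class scores

audio = rng.standard_normal(16000)            # one second of fake audio
feats = extract_features(audio)               # extract features
ivec = speaker_identity_vector(feats)         # create speaker identity vector
inp = np.hstack([feats, np.tile(ivec, (len(feats), 1))])  # concatenate per frame
weights = [rng.standard_normal((80, 64)), rng.standard_normal((64, 10))]
scores = dnn_forward(inp, weights)            # recognize with adapted input
print(scores.shape)  # (100, 10)
```

Note that the network's input layer is sized for the feature dimension plus the identity-vector dimension (40 + 40 = 80 here), which is the structural change the adaptation in claim 1 implies.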
Specification