Neural network training apparatus and method, and speech recognition apparatus and method

US 10,529,317 B2
Filed: 11/04/2016
Issued: 01/07/2020
Est. Priority Date: 11/06/2015
Status: Active Grant

Abstract

A neural network training apparatus includes a primary trainer configured to perform a primary training of a neural network model based on clean training data and target data corresponding to the clean training data; and a secondary trainer configured to perform a secondary training of the neural network model on which the primary training has been performed based on noisy training data and an output probability distribution of an output class for the clean training data calculated during the primary training of the neural network model.

20 Claims

1. A neural network training apparatus comprising:
- a processor comprising a primary trainer configured to perform a primary training of a neural network model using clean training data and target data corresponding to the clean training data, and generate, as an output of the primary training, a probability distribution of an output class for the clean training data;
  
  a mixer configured to create noisy training data by mixing the clean training data and training noise data or adding distorted data to the clean training data; and
  
  a secondary trainer configured to perform a secondary training of the neural network model on which the primary training has been performed using the noisy training data and the probability distribution of the output class for the clean training data that is generated during the primary training of the neural network model.
- - 2. The neural network training apparatus of claim 1, wherein the neural network model is a neural network-based acoustic model.
  - 3. The neural network training apparatus of claim 1, wherein the primary trainer is further configured to perform the primary training using a first objective function that performs training of the neural network model to obtain the target data from the clean training data.
  - 4. The neural network training apparatus of claim 1, wherein the secondary training is performed using a second objective function that is a combination of the probability distribution of the output class for the clean training data and an activation function of an output layer of the neural network model.
  - 5. The neural network training apparatus of claim 1, wherein the secondary trainer is further configured to perform the secondary training using a second objective function that is a weighted sum of an objective function that performs training of the neural network model to obtain the target data from the clean training data and an objective function that is a combination of the probability distribution of the output class for the clean training data that is generated during the primary training of the neural network model and an activation function of an output layer of the neural network model.
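Claims 4 and 5 describe the second objective function only in general terms. As a rough illustration, the sketch below assumes that the "combination" is a cross-entropy between the probability distribution from the primary training and a softmax output layer, and that the weighted sum uses a single scalar weight. The function names and the `weight` parameter are hypothetical and are not taken from the patent.

```python
import math


def softmax(logits):
    """Softmax activation of the output layer (assumed; the claims name no specific activation)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def secondary_objective(noisy_logits, hard_target, soft_target, weight):
    """Weighted sum of a hard-target term and a soft-target term.

    noisy_logits: output-layer pre-activations for one noisy training sample
    hard_target:  one-hot target data for the corresponding clean sample
    soft_target:  class probability distribution produced by the primary training
    weight:       hypothetical mixing weight in [0, 1]; 1.0 uses the soft term only
    """
    predicted = softmax(noisy_logits)
    hard_term = cross_entropy(hard_target, predicted)
    soft_term = cross_entropy(soft_target, predicted)
    return (1.0 - weight) * hard_term + weight * soft_term


example_logits = [2.0, 1.0, 0.1]
example_hard = [1.0, 0.0, 0.0]
example_soft = [0.7, 0.2, 0.1]
example_loss = secondary_objective(example_logits, example_hard, example_soft, weight=0.5)
```

Under these assumptions, setting `weight` to 1.0 corresponds to the soft-target-only objective of claim 4, while intermediate values correspond to the weighted sum of claim 5.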

6. A neural network training method comprising:
- performing, by a processor, a primary training of a neural network model using clean training data and target data corresponding to the clean training data;
  
  generating, by the processor and as an output of the primary training, a probability distribution of an output class for the clean training data;
  
  creating, by the processor, noisy training data by mixing the clean training data and training noise data or adding distorted data to the clean training data; and
  
  performing, by the processor, a secondary training of the neural network model on which the primary training has been performed using the noisy training data and the probability distribution of the output class for the clean training data that is generated during the primary training of the neural network model.
- - 7. The neural network training method of claim 6, wherein the neural network model is a neural network-based acoustic model.
  - 8. The neural network training method of claim 6, wherein the performing of the primary training comprises performing the primary training using a first objective function that performs training of the neural network model to obtain the target data from the clean training data.
  - 9. The neural network training method of claim 6, wherein the performing of the secondary training comprises performing the secondary training using a second objective function that is a combination of the probability distribution of the output class for the clean training data that is generated during the primary training of the neural network model and an activation function of an output layer of the neural network model.
  - 10. The neural network training method of claim 6, wherein the performing of the secondary training comprises performing the secondary training using a second objective function that is a weighted sum of an objective function that performs training of the neural network model to obtain the target data from the clean training data and an objective function that is a combination of the probability distribution of the output class for the clean training data that is generated during the primary training of the neural network model and an activation function of an output layer of the neural network model.
  - 11. A computer-readable storage medium storing instructions, that when executed by a processor, cause the processor to perform the method of claim 6.
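The step of creating noisy training data by mixing clean training data with training noise data can be sketched as follows. The claims do not say how the two signals are scaled, so mixing at a chosen signal-to-noise ratio is an assumption made here for illustration; `mix_at_snr` and `snr_db` are hypothetical names.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean signal so that the result has the requested SNR in dB.

    clean, noise: sequences of samples; noise must be at least as long as clean
    snr_db:       hypothetical target signal-to-noise ratio (not specified in the claims)
    """
    if not clean:
        raise ValueError("clean signal is empty")
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise signal has zero power")
    # Scale the noise so that clean_power / scaled_noise_power == 10 ** (snr_db / 10).
    scale = math.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [c + scale * n for c, n in zip(clean, noise)]


clean_signal = [1.0, -1.0, 1.0, -1.0]
noise_signal = [0.5, 0.5, -0.5, -0.5]
noisy_signal = mix_at_snr(clean_signal, noise_signal, snr_db=10.0)
```

Each noisy sample keeps its position relative to the clean sample it was derived from, which lets the secondary training pair it with the probability distribution generated for that clean sample.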

12. A speech recognition apparatus comprising:
- a processor comprising a feature extractor configured to extract a feature of noisy speech data; and
  
  a phoneme probability calculator configured to calculate a probability of a phoneme corresponding to the extracted feature using an acoustic model;
  
  wherein the processor is further configured to generate the acoustic model by performing a primary training based on speech training data and a phoneme sequence corresponding to the speech training data using a first objective function that obtains the phoneme sequence from the speech training data, generating noisy speech training data by mixing the speech training data and training noise data or adding distorted data to the speech training data, performing a secondary training based on the noisy speech training data and a probability distribution of an output class for the speech training data calculated during the primary training of the acoustic model using a second objective function that obtains the probability distribution of the output class for the speech training data from the noisy speech training data.
- - 13. The speech recognition apparatus of claim 12, wherein the acoustic model is a neural network-based acoustic model.
  - 14. The speech recognition apparatus of claim 12, wherein the first objective function performs training of the acoustic model to obtain a phoneme from the speech training data.
  - 15. The speech recognition apparatus of claim 12, wherein the second objective function is a combination of the probability distribution of the output class for the speech training data calculated during the primary training of the acoustic model and an activation function of an output layer of the acoustic model.
  - 16. The speech recognition apparatus of claim 12, wherein the second objective function is a weighted sum of an objective function that performs training of the acoustic model to obtain a phoneme from the speech training data and an objective function that is a combination of the probability distribution of the output class for the speech training data calculated during the primary training of the acoustic model and an activation function of an output layer of the acoustic model.

17. A neural network training apparatus comprising:
- a processor comprising a primary trainer configured to perform a primary training of a neural network model using clean training data and hard target data;
  
  a mixer configured to generate noisy training data from the clean training data; and
  
  a secondary trainer configured to perform a secondary training of the neural network model on which the primary training has been performed using the noisy training data and soft target data obtained as an output of the primary training of the neural network model.
- - 18. The neural network training apparatus of claim 17, wherein the noisy training data is obtained by distorting the clean training data or mixing the clean training data with noise.
  - 19. The neural network training apparatus of claim 17, wherein the soft target data is a probability distribution of an output class for the clean training data calculated during the primary training of the neural network model.
  - 20. The neural network training apparatus of claim 17, wherein the secondary trainer is further configured to perform the secondary training based on the noisy training data, the soft target data, and an activation function of an output layer of the neural network model.
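To make the relationship between hard target data, soft target data, and the two training stages concrete, here is a minimal end-to-end sketch. It assumes a tiny softmax classifier trained by full-batch gradient descent, Gaussian noise as the distortion, and toy two-dimensional inputs; none of these choices, nor any of the names below, come from the patent.

```python
import math
import random


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, biases, x):
    """Class probability distribution of a single-layer softmax model."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return softmax(logits)


def mean_cross_entropy(weights, biases, inputs, targets):
    total = 0.0
    for x, t in zip(inputs, targets):
        p = predict(weights, biases, x)
        total -= sum(ti * math.log(pi + 1e-12) for ti, pi in zip(t, p))
    return total / len(inputs)


def train(weights, biases, inputs, targets, epochs, lr):
    """Full-batch gradient descent on cross-entropy; updates the model in place."""
    n = len(inputs)
    for _ in range(epochs):
        grad_w = [[0.0] * len(row) for row in weights]
        grad_b = [0.0] * len(biases)
        for x, t in zip(inputs, targets):
            p = predict(weights, biases, x)
            for k in range(len(biases)):
                err = p[k] - t[k]  # gradient of cross-entropy w.r.t. logit k
                grad_b[k] += err / n
                for j in range(len(x)):
                    grad_w[k][j] += err * x[j] / n
        for k in range(len(biases)):
            biases[k] -= lr * grad_b[k]
            for j in range(len(weights[k])):
                weights[k][j] -= lr * grad_w[k][j]


rng = random.Random(0)
clean_inputs = [[1.0, 1.0], [0.8, 1.2], [-1.0, -1.0], [-1.2, -0.8]]
hard_targets = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # one-hot labels
weights = [[0.0, 0.0], [0.0, 0.0]]
biases = [0.0, 0.0]

# Primary training: clean data with hard targets.
train(weights, biases, clean_inputs, hard_targets, epochs=50, lr=0.1)

# Soft targets: the model's output distribution for each clean sample.
soft_targets = [predict(weights, biases, x) for x in clean_inputs]

# Mixer: noisy data derived from the clean data (Gaussian noise is an assumption).
noisy_inputs = [[xi + rng.gauss(0.0, 0.3) for xi in x] for x in clean_inputs]

# Secondary training: the same model, noisy data, soft targets.
loss_before = mean_cross_entropy(weights, biases, noisy_inputs, soft_targets)
train(weights, biases, noisy_inputs, soft_targets, epochs=50, lr=0.1)
loss_after = mean_cross_entropy(weights, biases, noisy_inputs, soft_targets)
```

The secondary stage continues from the weights left by the primary stage rather than starting a new model, which mirrors the claim language "the neural network model on which the primary training has been performed".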

Current Assignee
Samsung Electronics Co. Ltd.
Original Assignee
Samsung Electronics Co. Ltd.
Inventors
Lee, Ho Shik; Choi, Hee Youl
Primary Examiner(s)
Colucci, Michael C

Application Number
US15/344,110
Publication Number
US 20170133006A1
Time in Patent Office
1,159 Days
Field of Search
704 9, 704 4, 704246, 704245, 704239, 704233, 704235, 704232, 704500, 4554561, 381 941
US Class Current
CPC Class Codes

G06N 3/084   Backpropagation, e.g. using...

G10L 15/02   Feature extraction for spee...

G10L 15/063   Training

G10L 15/16   using artificial neural net...

G10L 15/20   Speech recognition techniqu...

G10L 2015/025   Phonemes, fenemes or fenone...
