MODEL LEARNING DEVICE, METHOD THEREFOR, AND PROGRAM

US 20190244604A1
Filed: 09/05/2017
Published: 08/08/2019
Est. Priority Date: 09/16/2016
Status: Active Grant


Abstract

A model learning device comprises: an initial value setting part that uses a parameter of a learned first model including a neural network to set a parameter of a second model including a neural network having a same network structure as the first model; a first output probability distribution calculating part that calculates a first output probability distribution including a distribution of an output probability of each unit on an output layer, using learning features and the first model; a second output probability distribution calculating part that calculates a second output probability distribution including a distribution of an output probability of each unit on the output layer, using learning features and the second model; and a modified model update part that obtains a weighted sum of a second loss function calculated from correct information and from the second output probability distribution, and a cross entropy between the first output probability distribution and the second output probability distribution, and updates the parameter of the second model so as to reduce the weighted sum.
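The weighted sum described in the abstract can be illustrated with a minimal sketch. The source only says "weighted sum" and "second loss function", so several details here are assumptions: the weights are taken as `(1 - alpha)` and `alpha`, the second loss is taken to be a cross entropy against a one-hot correct label, and the first model's distribution is used as the target of the cross entropy. The names `cross_entropy`, `weighted_loss`, and `alpha` are hypothetical.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross entropy -sum_j target[j] * log(predicted[j])."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


def weighted_loss(correct_index, first_probs, second_probs, alpha):
    """Weighted sum of the label loss and the cross entropy to the first model.

    Assumption: weights (1 - alpha) and alpha; the source does not fix them.
    """
    one_hot = [1.0 if j == correct_index else 0.0 for j in range(len(second_probs))]
    # Assumed form of the "second loss function": cross entropy to the correct label.
    second_loss = cross_entropy(one_hot, second_probs)
    # Cross entropy between the first and second output probability distributions.
    first_to_second = cross_entropy(first_probs, second_probs)
    return (1.0 - alpha) * second_loss + alpha * first_to_second


# Made-up distributions over three output units, for illustration only.
first_probs = [0.7, 0.2, 0.1]
second_probs = [0.5, 0.3, 0.2]
loss = weighted_loss(0, first_probs, second_probs, alpha=0.5)
```

With `alpha = 0` the sketch reduces to ordinary supervised training on the correct information; with `alpha = 1` the second model is trained only to match the first model's output distribution.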

17 Citations

9 Claims

1. A model learning device, comprising:
- an initial value setting part that uses a parameter of a learned first model including a neural network to set a parameter of a second model including a neural network having a same network structure as the first model;
  
  a first output probability distribution calculating part that calculates a first output probability distribution including a distribution of an output probability of each unit on an output layer, using features obtained from learning data and the first model;
  
  a second output probability distribution calculating part that calculates a second output probability distribution including a distribution of an output probability of each unit on the output layer, using features obtained from the learning data and the second model; and
  
  a modified model update part that calculates a second loss function from correct information corresponding to the learning data and from the second output probability distribution, calculates a cross entropy between the first output probability distribution and the second output probability distribution, obtains a weighted sum of the second loss function and the cross entropy, and updates the parameter of the second model so as to reduce the weighted sum.
- View Dependent Claims (5, 8, 9)
- - 5. The model learning device according to any one of claims 1 to 4, wherein the second output probability distribution calculating part receives a smoothing parameter that is a real value larger than zero, and obtains the second output probability distribution so as to approach a uniform distribution with increase in the smoothing parameter.
  - 8. A program for causing a computer to function as the model learning device according to any of claims 1 to 4.
  - 9. A program for causing a computer to function as the model learning device according to claim 5.
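One common way to obtain the behaviour recited in claim 5 is a softmax whose logits are divided by the smoothing parameter. The claim itself does not give a formula, so this form is an assumption, and the names `smoothed_softmax` and `temperature` are hypothetical.

```python
import math


def smoothed_softmax(logits, temperature):
    """Softmax over logits divided by a smoothing parameter (assumed form)."""
    if temperature <= 0:
        raise ValueError("the smoothing parameter must be larger than zero")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three output units, for illustration only.
logits = [2.0, 1.0, 0.0]
sharp = smoothed_softmax(logits, 1.0)
flat = smoothed_softmax(logits, 100.0)
```

In this sketch, `flat` is much closer to the uniform distribution over three units than `sharp`, which matches the claim's requirement that the distribution approach uniform as the smoothing parameter increases.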

2. A model learning device, comprising:
- an initial value setting part that uses a parameter of a learned first acoustic model including a neural network to set a parameter of a second acoustic model including a neural network having a same network structure as the first acoustic model;
  
  a first output probability distribution calculating part that calculates a first output probability distribution including a distribution of an output probability of each unit on an output layer, using features obtained from a learning acoustic signal and the first acoustic model;
  
  a second output probability distribution calculating part that calculates a second output probability distribution including a distribution of an output probability of each unit on the output layer, using features obtained from a learning acoustic signal and the second acoustic model; and
  
  a modified model update part that calculates a second loss function from a correct unit number corresponding to the learning acoustic signal and from the second output probability distribution, calculates a cross entropy between the first output probability distribution and the second output probability distribution, obtains a weighted sum of the second loss function and the cross entropy, and updates the parameter of the second acoustic model so as to reduce the weighted sum.

3. A model learning device, comprising:
- an initial value setting part that uses a parameter of a learned first language model including a neural network to set a parameter of a second language model including a neural network having a same network structure as the first language model;
  
  a first output probability distribution calculating part that calculates a first output probability distribution including a distribution of an output probability of each unit on an output layer, using a word history that is a word string obtained from learning text data, and the first language model;
  
  a second output probability distribution calculating part that calculates a second output probability distribution including a distribution of an output probability of each unit on the output layer, using a word history that is a word string obtained from learning text data, and the second language model; and
  
  a modified model update part that calculates a second loss function from a correct word corresponding to the learning word history and from the second output probability distribution, calculates a cross entropy between the first output probability distribution and the second output probability distribution, obtains a weighted sum of the second loss function and the cross entropy, and updates the parameter of the second language model so as to reduce the weighted sum.
- View Dependent Claims (4)
- - 4. The model learning device according to claim 3, wherein the first language model and the second language model are each a class RNN language model, the model learning device comprises:
    - a first class output probability distribution calculating part that calculates a first class output probability distribution including a distribution of a class output probability of each unit on a class output layer, using a word history that is a word string obtained from learning text data, and the first language model; and
      
      a second class output probability distribution calculating part that calculates a second class output probability distribution including a distribution of a class output probability of each unit on a class output layer, using a word history that is a word string obtained from learning text data, and the second language model, and
      
      the modified model update part obtains a cross entropy CW1 on the output layer from the first output probability distribution, obtains a cross entropy CC1 on the class output layer from the first class output probability distribution, obtains a cross entropy CW2 on the output layer from the second output probability distribution, obtains a cross entropy CC2 on the class output layer from the second class output probability distribution, obtains a cross entropy CW between the cross entropy CW1 and the cross entropy CW2, and a cross entropy CC between the cross entropy CC1 and the cross entropy CC2, and updates the parameter of the second language model so as to reduce values of the obtained cross entropy CW and the cross entropy CC.

6. A model learning method, comprising:
- an initial value setting step of using a parameter of a learned first model including a neural network to set a parameter of a second model including a neural network having a same network structure as the first model;
  
  a first output probability distribution calculating step of calculating a first output probability distribution including a distribution of an output probability of each unit on an output layer, using features obtained from learning data and the first model;
  
  a second output probability distribution calculating step of calculating a second output probability distribution including a distribution of an output probability of each unit on the output layer, using features obtained from the learning data and the second model; and
  
  a modified model update step of calculating a second loss function from correct information corresponding to the learning data and from the second output probability distribution, of calculating a cross entropy between the first output probability distribution and the second output probability distribution, of obtaining a weighted sum of the second loss function and the cross entropy, and of updating the parameter of the second model so as to reduce the weighted sum.
- View Dependent Claims (7)
- - 7. The model learning method according to claim 6, wherein the second output probability distribution calculating step receives a smoothing parameter that is a real value larger than zero, and obtains the second output probability distribution so as to approach a uniform distribution with increase in the smoothing parameter.
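The four steps of claim 6 can be sketched end to end on a toy problem. This is an illustration under stated assumptions, not the patented implementation: a single linear layer followed by a softmax stands in for the neural network, the second loss is assumed to be a cross entropy against the correct unit, the weights of the sum are assumed to be `(1 - alpha)` and `alpha`, and plain gradient descent is assumed as the update rule. All names and numbers are hypothetical.

```python
import copy
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, features):
    """Toy stand-in for a network: one linear layer followed by a softmax."""
    return softmax([sum(w * x for w, x in zip(row, features)) for row in weights])


def weighted_sum_loss(first_probs, second_probs, correct, alpha):
    label_loss = -math.log(second_probs[correct])  # assumed second loss function
    first_ce = -sum(p * math.log(q) for p, q in zip(first_probs, second_probs))
    return (1.0 - alpha) * label_loss + alpha * first_ce


def update_step(second_weights, first_probs, features, correct, alpha, lr):
    """One gradient-descent step on the weighted sum (assumed update rule)."""
    second_probs = forward(second_weights, features)
    for j, row in enumerate(second_weights):
        one_hot = 1.0 if j == correct else 0.0
        target = (1.0 - alpha) * one_hot + alpha * first_probs[j]
        grad_logit = second_probs[j] - target  # d(loss)/d(logit j) for softmax
        for i, x in enumerate(features):
            row[i] -= lr * grad_logit * x


# Initial value setting: the second model starts from the first model's parameters.
first_weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
second_weights = copy.deepcopy(first_weights)

features, correct, alpha = [1.0, 2.0], 1, 0.5
first_probs = forward(first_weights, features)  # first output distribution

before = weighted_sum_loss(first_probs, forward(second_weights, features), correct, alpha)
for _ in range(20):
    update_step(second_weights, first_probs, features, correct, alpha, lr=0.1)
after = weighted_sum_loss(first_probs, forward(second_weights, features), correct, alpha)
```

The first model's parameters are never modified in this sketch; only the copy is updated, and the weighted sum on the toy example decreases from `before` to `after`.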


Current Assignee: Nippon Telegraph and Telephone Corporation
Original Assignee: Nippon Telegraph and Telephone Corporation
Inventors: MASATAKI, Hirokazu; ASAMI, Taichi; NAKAMURA, Takashi; MASUMURA, Ryo

Granted Patent: US 11,081,105 B2
CPC Class Codes

G06F 40/216   using statistical methods

G06F 40/30   Semantic analysis

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/049   Temporal neural networks, e...

G06N 3/08   Learning methods

G06N 99/00   Subject matter not provided...

G10L 15/06   Creation of reference templ...

G10L 15/063   Training

G10L 15/065   Adaptation

G10L 15/16   using artificial neural net...

G10L 15/183   using context dependencies,...

G10L 2015/0635   updating or merging of old ...
