Natural language generation through character-based recurrent neural networks with finite-state prior knowledge
First Claim
1. A method comprising:
building a target background model using words occurring in training data, the target background model being adaptable to accept subsequences of an input semantic representation, the training data including training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language;
receiving human-generated utterances in the form of speech or text;
predicting a current dialog state of a natural language dialog between a virtual agent and a user, based on the utterances;
generating a semantic representation of a next utterance, based on the current dialog state, the semantic representation including a sequence of characters; and
generating a target sequence in a natural language from the semantic representation, comprising:
after generating the semantic representation, adapting the target background model to form an adapted background model, which accepts all subsequences of the semantic representation;
representing the semantic representation as a sequence of character embeddings;
with an encoder, encoding the character embeddings to generate a set of character representations; and
with a decoder, generating a target sequence of characters, based on the set of character representations, wherein at a plurality of time steps, a next character in the target sequence is a function of a previously generated character of the target sequence and the adapted background model; and
outputting the target sequence.
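The adaptation step recited above can be illustrated with a minimal sketch: the adapted background model accepts the vocabulary words from the training data plus every contiguous subsequence (substring) of the input semantic representation, and membership in that set can then constrain which next characters the decoder may emit. The function names below are hypothetical illustrations, not the patented implementation.

```python
def all_subsequences(s):
    # every contiguous subsequence (substring) of the semantic representation
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

def adapt_background(vocab, semantic_repr):
    # the adapted background model accepts vocabulary words from the
    # training data plus all subsequences of the input representation
    return set(vocab) | all_subsequences(semantic_repr)

def allowed_next_chars(adapted, prefix):
    # characters that extend `prefix` toward an accepted string; this is
    # how the adapted model can bias the decoder's next character
    return {s[len(prefix)] for s in adapted
            if s.startswith(prefix) and len(s) > len(prefix)}
```

For example, `adapt_background({"hello"}, "ab")` accepts "hello" as well as "a", "b", and "ab", so the decoder is biased towards both known words and copied input subsequences.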
Abstract
A method and a system for generating a target character sequence from a semantic representation including a sequence of characters are provided. The method includes adapting a target background model, built from a vocabulary of words, to form an adapted background model. The adapted background model accepts subsequences of an input semantic representation as well as words from the vocabulary. The input semantic representation is represented as a sequence of character embeddings, which are input to an encoder. The encoder encodes each of the character embeddings to generate a respective character representation. A decoder then generates a target sequence of characters, based on the set of character representations. At a plurality of time steps, a next character in the target sequence is selected as a function of a previously generated character(s) of the target sequence and the adapted background model.
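The character-by-character generation loop summarized above can be pictured as a greedy decoder that repeatedly queries a step function for a next-character distribution. Here `step_fn` is a hypothetical stand-in for the decoder combined with the adapted background model, not the claimed model itself.

```python
def greedy_decode(step_fn, max_len=50, eos="\n"):
    # step_fn maps the characters generated so far to a {char: prob}
    # distribution for the next character (standing in for the decoder
    # together with the adapted background model)
    out = ""
    for _ in range(max_len):
        dist = step_fn(out)
        ch = max(dist, key=dist.get)  # pick the most probable character
        if ch == eos:
            break
        out += ch
    return out
```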
20 Claims
1. A method comprising:
building a target background model using words occurring in training data, the target background model being adaptable to accept subsequences of an input semantic representation, the training data including training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language;
receiving human-generated utterances in the form of speech or text;
predicting a current dialog state of a natural language dialog between a virtual agent and a user, based on the utterances;
generating a semantic representation of a next utterance, based on the current dialog state, the semantic representation including a sequence of characters; and
generating a target sequence in a natural language from the semantic representation, comprising:
after generating the semantic representation, adapting the target background model to form an adapted background model, which accepts all subsequences of the semantic representation;
representing the semantic representation as a sequence of character embeddings;
with an encoder, encoding the character embeddings to generate a set of character representations; and
with a decoder, generating a target sequence of characters, based on the set of character representations, wherein at a plurality of time steps, a next character in the target sequence is a function of a previously generated character of the target sequence and the adapted background model; and
outputting the target sequence.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
16. A method comprising:
training a hierarchical model with training data, the training data including training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language, to minimize, over the training data, an error between a realization output by the model and the reference sequence assigned to the training sample, the hierarchical model including an encoder, a decoder, and a target background model built from a vocabulary of words;
receiving a human-generated utterance in a natural language;
predicting a current dialog state based on the utterance;
generating a semantic representation of a next utterance, based on the current dialog state, the semantic representation including a sequence of characters; and
generating a target sequence in a natural language from the semantic representation, comprising:
adapting the target background model to form an adapted background model, which accepts subsequences of the sequence of characters in the semantic representation;
representing the semantic representation as a sequence of character embeddings;
with the encoder, encoding the character embeddings to generate a set of character representations; and
with the decoder, generating a target sequence of characters, based on the set of character representations, wherein at a plurality of time steps, a next character in the target sequence is a function of a previously generated character of the target sequence and the adapted background model; and
outputting the target sequence,
wherein the decoder includes an adaptor part and a background part, the adaptor part defining a first conditional probability distribution for sampling a next character in the target sequence, given the already generated characters, the background part defining a second conditional probability distribution for sampling the next character in the target sequence, given the already generated characters, which is based on the adapted background model, an overall conditional probability distribution for sampling the next character being a function of the first and second conditional probability distributions.
- View Dependent Claims (17)
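Claim 16 only requires the overall conditional distribution to be "a function of" the adaptor and background distributions; a convex combination is one simple instance of such a function. The sketch below assumes the two distributions are given as plain dictionaries, and the mixing `weight` is illustrative.

```python
def mix_distributions(p_adaptor, p_background, weight=0.5):
    # convex combination of the adaptor and background distributions,
    # one simple instance of "a function of the first and second
    # conditional probability distributions"
    chars = set(p_adaptor) | set(p_background)
    mixed = {c: weight * p_adaptor.get(c, 0.0)
                + (1 - weight) * p_background.get(c, 0.0)
             for c in chars}
    total = sum(mixed.values())
    return {c: p / total for c, p in mixed.items()}  # renormalize
```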
18. A dialog system comprising:
a dialog state tracker which predicts a current dialog state based on received human-generated utterances in the form of speech or text;
a dialog manager which generates a semantic representation of a next utterance, based on the current dialog state, the semantic representation including a sequence of characters; and
a system for generation of a target sequence from the semantic representation comprising:
a hierarchical character sequence-to-character sequence model;
a character embedding component which represents the semantic representation as a sequence of character embeddings;
a natural language generation component which inputs the sequence of character embeddings into the hierarchical model;
an output component which outputs the target sequence;
a learning component for training the hierarchical model with training data, the training data including training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language, to minimize, over the training data, an error between a realization output by the model and the reference sequence assigned to the training sample; and
a processor which implements the learning component, dialog state tracker, dialog manager, character embedding component, natural language generation component, and output component;
the hierarchical model comprising:
a target background model, built from a vocabulary of words, which is adapted by copying subsequences of the generated semantic representation, to form an adapted background model, which accepts subsequences of the input semantic representation;
an encoder, which encodes the character embeddings to generate a set of character representations; and
a decoder, which generates a target sequence of characters, based on the set of character representations, wherein at a plurality of time steps, a next character in the target sequence is a function of a previously generated character of the target sequence, and the adapted background model.
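The encoder recited in claim 18 maps one character embedding to one character representation per time step. A minimal recurrent cell with fixed, purely illustrative weights shows the shape of that computation; it is a sketch of the recurrence, not the patented encoder.

```python
import math

def rnn_encode(embeddings, dim=4):
    # minimal recurrent encoder: one tanh cell with fixed illustrative
    # weights (0.5 recurrent, 0.5 input); emits one representation per
    # input character embedding
    h = [0.0] * dim
    reps = []
    for e in embeddings:
        x = sum(e) / len(e)  # crude scalar summary of the embedding
        h = [math.tanh(0.5 * h[k] + 0.5 * x) for k in range(dim)]
        reps.append(h)
    return reps
```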
19. A method for generating a system for generation of a target sequence of characters from an input semantic representation, the method comprising:
providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language;
building a target background model using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation;
incorporating the target background model into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level, such that the background model, when adapted to accept subsequences of the input semantic representation, biases the decoder towards outputting a target character sequence comprising at least one of: words occurring in the training data, and subsequences of the input semantic representation; and
training the hierarchical model on the training pairs to output a target sequence from an input semantic representation,
wherein the decoder includes an adaptor part and a background part, the adaptor part defining a first conditional probability distribution for sampling a next character in the target sequence, given the already generated characters, the background part defining a second conditional probability distribution for sampling the next character in the target sequence, given the already generated characters, which is based on the adapted background model, an overall conditional probability distribution for sampling the next character being a function of the first and second conditional probability distributions, and
wherein at least one of the building of the target background model, incorporating the target background model, and training the hierarchical model is performed with a processor.
- View Dependent Claims (20)
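Claim 19 trains the hierarchical model to minimize an error between the model's realization and the reference sequence. Character-level edit (Levenshtein) distance is one concrete choice of such an error; `model` below is any hypothetical callable from a semantic representation to a realization, standing in for the trained generator.

```python
def edit_distance(a, b):
    # Levenshtein distance: one concrete choice of "error between a
    # realization output by the model and the reference sequence"
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def training_error(model, pairs):
    # average error over (semantic representation, reference) pairs
    return sum(edit_distance(model(x), y) for x, y in pairs) / len(pairs)
```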
Specification