Text classification and transformation based on author
First Claim
1. A method performed by a system comprising one or more computers to generate an output text in a style of a requested author from an input text, wherein the output text and the input text are written in a same natural language, the system comprising an encoder language model and a decoder language model, wherein:
the encoder and decoder language models have been trained with text from multiple authors, the text from multiple authors comprising a plurality of training texts;
as a result of training, the encoder language model stores data representing words occurring in the plurality of training texts from the multiple authors as respective vectors, wherein each vector represents a respective distribution of contexts in the plurality of training texts of a respective word from the plurality of training texts;
as a result of training, the decoder language model (i) stores the distributions of contexts of words used by particular respective authors in the plurality of training texts and (ii) is configured to perform a transformation of a stream of vectors from the encoder language model to generate text in the natural language according to distributions of contexts of words used by a decoder author, the decoder author being one of the multiple authors;
the encoder and decoder language models have been trained by performing the following operations for each of multiple training input texts each having a respective author:
presenting each training input text to the encoder language model;
receiving from the encoder language model a training vector stream representing the training input text, wherein the training vector stream includes vectors that are each (i) associated with a word from the input text and (ii) based on the distribution of contexts of the word in the plurality of training texts;
presenting the training vector stream, an author of the training input text, and the training input text to the decoder language model;
receiving a respective decoder output training text from the decoder language model based on the author, the training input text, and the training vector stream;
comparing the decoder output of the decoder language model with an expected output for the author and the training input text, wherein the expected output is the training input text;
if the comparing indicates a difference for a particular author, indicating an error; and
in the case of an error, updating the decoder language model, including updating the decoder language model's representation of vectors in the training vector stream, and back-propagating the error to the encoder language model, which updates a representation of the encoder language model;
the method using the encoder language model and the decoder language model after training, the method comprising:
receiving an input text including one or more words and a name of a requested author, wherein the requested author is one of the multiple authors;
generating a vector stream of vectors by the encoder language model, each vector in the vector stream representing the distribution of contexts in which a respective word of the input text appears in training input texts; and
producing an output text from the vector stream by the decoder language model according to the distributions of contexts of words used by the requested author, whereby the output text is a transformation of the input text to a style of the requested author.
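The training procedure recited in claim 1 (encode each training text to a vector stream, decode it conditioned on the text's author, compare the decoder output with the training text itself, and update both models on a mismatch) can be sketched as a toy, gradient-free loop. The table-based "models" and all names below are illustrative assumptions; the claim covers learned neural language models, not lookup tables.

```python
# Toy sketch of the claimed autoencoding-by-author training loop.
class ToyEncoder:
    def __init__(self):
        self.table = {}                      # word -> vector (toy 1-tuple id)

    def encode(self, words):
        for w in words:
            self.table.setdefault(w, (len(self.table),))
        return [self.table[w] for w in words]

class ToyDecoder:
    def __init__(self):
        self.table = {}                      # (author, vector) -> word

    def decode(self, author, vectors):
        return [self.table.get((author, v), "?") for v in vectors]

    def update(self, author, vectors, expected):
        for v, w in zip(vectors, expected):
            self.table[(author, v)] = w

def train(corpus, epochs=2):
    enc, dec = ToyEncoder(), ToyDecoder()
    for _ in range(epochs):
        for author, text in corpus:
            words = text.split()
            stream = enc.encode(words)       # training vector stream
            out = dec.decode(author, stream) # decoder output training text
            if out != words:                 # compare with expected output
                dec.update(author, stream, words)  # update the decoder (the
                                                   # claim also back-propagates
                                                   # the error to the encoder)
    return enc, dec

corpus = [("melville", "the sea was calm"), ("austen", "the ball was merry")]
enc, dec = train(corpus)
```

After training, decoding a text's vector stream under its own author reproduces the text, which is exactly the expected-output condition the claim checks.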
2 Assignments
0 Petitions
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for transforming and classifying text based on analysis of training texts from particular authors. One of the methods includes receiving an input text including one or more words and a requested author; generating a vector stream representing the input text based on an encoder language model and including one or more multi-dimensional vectors associated with associated words of the words of the input text and representing a distribution of contexts in which the associated words occurred in a plurality of training texts; and producing an output text representing a particular transformation of the input text based at least in part on a decoder language model, the generated vector stream, and the requested author.
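The abstract's central representation, a vector per word capturing "a distribution of contexts in which the associated words occurred" in the training texts, can be illustrated with a minimal sketch. Here a "context" is simply the neighboring words within a one-word window; this windowed co-occurrence count is an assumption for illustration, as a trained encoder language model would learn dense vectors.

```python
# Toy context-distribution vectors built from co-occurrence counts.
from collections import Counter

training_texts = [
    "the old sea was calm".split(),
    "the sea was grey and old".split(),
]

def context_distribution(word, texts, window=1):
    """Normalized counts of words appearing within `window` positions of `word`."""
    counts = Counter()
    for text in texts:
        for i, w in enumerate(text):
            if w != word:
                continue
            lo, hi = max(0, i - window), min(len(text), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[text[j]] += 1
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

vec = context_distribution("sea", training_texts)
```

Each such distribution sums to one, and words that occur in similar contexts across the training texts receive similar distributions, which is what lets the decoder later re-realize a vector as a different author's word choice.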
13 Claims
1. (Set forth above under "First Claim"; dependent claims 2-5 not reproduced.)
6. A method performed by a system comprising one or more computers to produce a classification of an input text, the system comprising an encoder language model and a classification decoder, wherein:
the encoder language model and the classification decoder have been trained with text from multiple authors, the text from multiple authors comprising a plurality of training texts;
as a result of training, the encoder language model stores data representing words occurring in the plurality of training texts from the multiple authors as respective vectors, wherein each vector represents a respective distribution of contexts in the plurality of training texts of a respective word from the plurality of training texts;
as a result of training, the classification decoder (i) stores the distributions of contexts of words used by particular respective authors in the plurality of training texts and (ii) is configured to classify the input text based on distributions of contexts of words used by an author of the input text, the author of the input text being one of the multiple authors;
the encoder language model and the classification decoder have been trained by performing the following operations for each of multiple training input texts each having a respective author:
presenting each training input text to the encoder language model;
receiving from the encoder language model a training vector stream representing the training input text, wherein the training vector stream includes vectors that are each (i) associated with a word from the input text and (ii) based on the distribution of contexts of the word in the plurality of training texts;
presenting the training vector stream, an author of the training input text, and the training input text to the classification decoder;
receiving a classification from the classification decoder based on the author, the training input text, and the training vector stream;
comparing the classification of the classification decoder with an expected classification for the author and the training input text;
if the comparing indicates a difference, indicating an error; and
in the case of an error, updating the classification decoder, including updating the classification decoder's representation of vectors in the training vector stream, and back-propagating the error to the encoder language model, which updates a representation of the encoder language model;
the method using the encoder language model and the classification decoder after training, the method comprising:
receiving an input text, wherein the author of the input text is one of the multiple authors;
generating a vector stream of vectors by the encoder language model, each vector in the vector stream representing the distribution of contexts in which a respective word of the input text appears in the training input texts; and
producing a classification of the input text from the vector stream by the classification decoder according to the distributions of contexts of words used by the authors of the training texts.
(Dependent claims 7-11 not reproduced.)
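Claim 6's classification decoder, which attributes an input text to "the distributions of contexts of words used by an author", can be sketched as a nearest-profile classifier: each author is summarized by a word distribution over that author's training texts (a simplified stand-in for the claimed context distributions), and an input text is assigned to the author whose distribution it best matches. The similarity measure and all names are assumptions, not the patent's method.

```python
# Toy author classification by comparing word distributions.
from collections import Counter

def profile(texts):
    """Normalized word frequencies over one author's training texts."""
    counts = Counter(w for t in texts for w in t.split())
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

def classify(text, author_profiles):
    """Return the author whose profile gives the input words the highest mass."""
    words = text.split()
    def score(p):
        return sum(p.get(w, 0.0) for w in words) / len(words)
    return max(author_profiles, key=lambda a: score(author_profiles[a]))

profiles = {
    "melville": profile(["the whale sounded", "the sea was vast"]),
    "austen":   profile(["the ball was lively", "a most agreeable party"]),
}
label = classify("the vast sea", profiles)
```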
12. A system for generating an output text in a style of a requested author from an input text, wherein the output text and the input text are written in a same natural language, the system comprising:
memory for storing data and one or more processors, the memory and processors configured to run an encoder language model and a decoder language model, wherein:
the encoder and decoder language models have been trained with text from multiple authors, the text from multiple authors comprising a plurality of training texts;
as a result of training, the encoder language model stores data representing words occurring in the plurality of training texts from the multiple authors as respective vectors, wherein each vector represents a respective distribution of contexts in the plurality of training texts of a respective word from the plurality of training texts;
as a result of training, the decoder language model (i) stores the distributions of contexts of words used by particular respective authors in the plurality of training texts and (ii) is configured to perform a transformation of a stream of vectors from the encoder language model to generate text in the natural language according to distributions of contexts of words used by a decoder author, the decoder author being one of the multiple authors;
the encoder and decoder language models have been trained by performing the following operations for each of multiple training input texts each having a respective author:
presenting each training input text to the encoder language model;
receiving from the encoder language model a training vector stream representing the training input text, wherein the training vector stream includes vectors that are each (i) associated with a word from the input text and (ii) based on the distribution of contexts of the word in the plurality of training texts;
presenting the training vector stream, an author of the training input text, and the training input text to the decoder language model;
receiving a respective decoder output training text from the decoder language model based on the author, the training input text, and the training vector stream;
comparing the decoder output of the decoder language model with an expected output for the author and the training input text, wherein the expected output is the training input text;
if the comparing indicates a difference for a particular author, indicating an error; and
in the case of an error, updating the decoder language model, including updating the decoder language model's representation of vectors in the training vector stream, and back-propagating the error to the encoder language model, which updates a representation of the encoder language model;
the system operable to perform operations, using the encoder language model and the decoder language model after training, comprising:
receiving an input text including one or more words and a name of a requested author, wherein the requested author is one of the multiple authors;
generating a vector stream of vectors by the encoder language model, each vector in the vector stream representing the distribution of contexts in which a respective word of the input text appears in the training input texts; and
producing an output text from the vector stream by the decoder language model according to the distribution of contexts of words used by the requested author, whereby the output text is a transformation of the input text to a style of the requested author.
(Dependent claim 13 not reproduced.)
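The inference path of claim 12 (receive an input text and a requested author's name, encode each word to a vector, and have the decoder realize each vector as the word that author would use) can be sketched with hand-built tables standing in for the trained models. The shared-concept vectors and author preferences below are assumptions for illustration only.

```python
# Toy style transfer: encode to shared concept vectors, decode per author.
ENCODER = {            # word -> concept vector (toy 1-tuple ids)
    "big": (0,), "large": (0,),
    "happy": (1,), "merry": (1,),
    "dog": (2,),
}
DECODER = {            # (author, concept vector) -> that author's preferred word
    ("hemingway", (0,)): "big",   ("dickens", (0,)): "large",
    ("hemingway", (1,)): "happy", ("dickens", (1,)): "merry",
    ("hemingway", (2,)): "dog",   ("dickens", (2,)): "dog",
}

def transform(text, author):
    """Re-realize `text` word by word in the requested author's style."""
    stream = [ENCODER[w] for w in text.split()]        # vector stream
    return " ".join(DECODER[(author, v)] for v in stream)

out = transform("large happy dog", "hemingway")
```

Because synonymous words map to the same vector, the decoder's per-author table is what changes the surface wording, which is the sense in which the output text is "a transformation of the input text to a style of the requested author."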
Specification