Abstractive sentence summarization
First Claim
1. A method comprising, by one or more computer server devices:
receiving an input sentence comprising a sequence of input words, wherein a set of words in a vocabulary comprises the input words;
encoding each of the input words as an indicator vector, wherein the indicator vector captures features of the input word and a sequence of the indicator vectors captures features of the sequence of the input words;
mapping, using a neural network language model, the sequence of the indicator vectors to a distribution of a contextual probability of a first output word in a sequence of output words, wherein the set of words in the vocabulary comprises the output words;
for each of one or more next output words of the sequence of output words:
encoding the sequence of the indicator vectors with a context, wherein the context comprises a previously mapped contextual probability distribution of a fixed window of previous ones of the output words; and
mapping the encoded sequence of the indicator vectors and the context to the distribution of the contextual probability of the next output word; and
generating a condensed summary using a decoder by maximizing the contextual probability of each of the output words in the sequence of the output words.
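The steps above describe a feed-forward neural network language model paired with an attention-style encoder over the input indicator vectors. A minimal sketch of the claimed scoring step follows; the vocabulary, dimensions, and randomly initialized weights are illustrative assumptions standing in for a trained model, not details from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shared vocabulary for input and output words (assumed).
vocab = ["<s>", "the", "police", "arrested", "a", "man", "held", "suspect"]
V = len(vocab)
word_to_id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Encode a word as an indicator vector over the vocabulary."""
    v = np.zeros(V)
    v[word_to_id[word]] = 1.0
    return v

D = 4  # embedding size (assumed)
C = 2  # fixed window of previous output words (assumed)

# Random parameters stand in for trained weights.
E = rng.normal(size=(D, V))       # input-word embeddings
F = rng.normal(size=(D, V))       # output-word (context) embeddings
U = rng.normal(size=(V, C * D))   # context window -> vocabulary scores
W = rng.normal(size=(V, D))       # encoded input -> vocabulary scores
P = rng.normal(size=(D, C * D))   # maps context into input-embedding space

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def next_word_distribution(input_words, prev_output_words):
    """P(next output word | input sentence, fixed window of previous outputs)."""
    X = np.stack([one_hot(w) for w in input_words], axis=1)  # V x n indicators
    inp = E @ X                                              # D x n embedded input
    ctx = np.concatenate([F @ one_hot(w) for w in prev_output_words])  # C*D context
    attn = softmax(inp.T @ (P @ ctx))  # weight each input position by the context
    enc = inp @ attn                   # D-dim encoding of the input sentence
    return softmax(U @ ctx + W @ enc)  # contextual probability over the vocabulary
```

Each call returns a probability distribution over the vocabulary for the next output word, conditioned on the encoded input sentence and the fixed window of previously generated output words.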
Abstract
In one embodiment, a sequence of input words is received. Each of the input words is encoded as an indicator vector, wherein a sequence of the indicator vectors captures features of the sequence of input words. The sequence of the indicator vectors is then mapped to a distribution of a contextual probability of a first output word in a sequence of output words. For each subsequent output word, the sequence of the indicator vectors is encoded with a context, wherein the context comprises a previously mapped contextual probability distribution of a fixed window of previous output words; and the encoded sequence of the indicator vectors and the context is mapped to the distribution of the contextual probability of the subsequent output word. Finally, a condensed summary is generated using a decoder by maximizing the contextual probability of each of the output words.
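The final step, generating the condensed summary by maximizing the contextual probability of each output word, can be sketched as greedy (argmax) decoding; beam search is the common refinement. This is a self-contained sketch in which `toy_model` is a hypothetical stand-in for a trained neural network language model:

```python
import numpy as np

def greedy_summary(input_words, next_word_distribution, vocab, length, C=2):
    """Condense by greedily maximizing the contextual probability of each word."""
    context = ["<s>"] * C  # pad the fixed window with start symbols
    summary = []
    for _ in range(length):
        dist = next_word_distribution(input_words, context[-C:])
        word = vocab[int(np.argmax(dist))]  # most probable next output word
        summary.append(word)
        context.append(word)
    return summary

# Hypothetical stand-in model: prefers the first input word absent from the
# current context window, so the toy decode copies input words in order.
def toy_model(input_words, prev_window):
    vocab_local = ["<s>", "police", "arrested", "man"]
    scores = np.full(len(vocab_local), 1e-6)
    for w in input_words:
        if w in vocab_local and w not in prev_window:
            scores[vocab_local.index(w)] = 1.0
            break
    return scores / scores.sum()
```

Swapping the argmax for a k-best expansion over partial summaries turns this into the beam-search decoder typically used in practice.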
18 Claims
1. A method comprising, by one or more computer server devices:
receiving an input sentence comprising a sequence of input words, wherein a set of words in a vocabulary comprises the input words;
encoding each of the input words as an indicator vector, wherein the indicator vector captures features of the input word and a sequence of the indicator vectors captures features of the sequence of the input words;
mapping, using a neural network language model, the sequence of the indicator vectors to a distribution of a contextual probability of a first output word in a sequence of output words, wherein the set of words in the vocabulary comprises the output words;
for each of one or more next output words of the sequence of output words:
encoding the sequence of the indicator vectors with a context, wherein the context comprises a previously mapped contextual probability distribution of a fixed window of previous ones of the output words; and
mapping the encoded sequence of the indicator vectors and the context to the distribution of the contextual probability of the next output word; and
generating a condensed summary using a decoder by maximizing the contextual probability of each of the output words in the sequence of the output words.
(Dependent claims: 2, 3, 4, 5, 6)
7. One or more computer-readable non-transitory storage media embodying software that is operable when executed to:
receive an input sentence comprising a sequence of input words, wherein a set of words in a vocabulary comprises the input words;
encode each of the input words as an indicator vector, wherein the indicator vector captures features of the input word and a sequence of the indicator vectors captures features of the sequence of the input words;
map, using a neural network language model, the sequence of the indicator vectors to a distribution of a contextual probability of a first output word in a sequence of output words, wherein the set of words in the vocabulary comprises the output words;
for each of one or more next output words of the sequence of output words:
encode the sequence of the indicator vectors with a context, wherein the context comprises a previously mapped contextual probability distribution of a fixed window of previous ones of the output words; and
map the encoded sequence of the indicator vectors and the context to the distribution of the contextual probability of the next output word; and
generate a condensed summary using a decoder by maximizing the contextual probability of each of the output words in the sequence of the output words.
(Dependent claims: 8, 9, 10, 11, 12)
13. A system comprising:
one or more processors; and
a memory coupled to the processors comprising instructions executable by the processors, the processors being operable when executing the instructions to:
receive an input sentence comprising a sequence of input words, wherein a set of words in a vocabulary comprises the input words;
encode each of the input words as an indicator vector, wherein the indicator vector captures features of the input word and a sequence of the indicator vectors captures features of the sequence of the input words;
map, using a neural network language model, the sequence of the indicator vectors to a distribution of a contextual probability of a first output word in a sequence of output words, wherein the set of words in the vocabulary comprises the output words;
for each of one or more next output words of the sequence of output words:
encode the sequence of the indicator vectors with a context, wherein the context comprises a previously mapped contextual probability distribution of a fixed window of previous ones of the output words; and
map the encoded sequence of the indicator vectors and the context to the distribution of the contextual probability of the next output word; and
generate a condensed summary using a decoder by maximizing the contextual probability of each of the output words in the sequence of the output words.
(Dependent claims: 14, 15, 16, 17, 18)
Specification