ABSTRACTIVE SENTENCE SUMMARIZATION
Abstract
In one embodiment, a sequence of input words is received. Each of the input words is encoded as an indicator vector, wherein a sequence of the indicator vectors captures features of the sequence of input words. The sequence of the indicator vectors is then mapped to a distribution of a contextual probability of a first output word in a sequence of output words. For each subsequent output word, the sequence of the indicator vectors is encoded with a context, wherein the context comprises a previously mapped contextual probability distribution of a fixed window of previous output words; and the encoded sequence of the indicator vectors and the context is mapped to the distribution of the contextual probability of the subsequent output word. Finally, a condensed summary is generated using a decoder by maximizing the contextual probability of each of the output words.
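The abstract's pipeline (indicator vectors, a fixed context window, a contextual probability distribution, and a decoder) can be sketched end to end. The following is a minimal Python sketch, not the patented implementation: the vocabulary, `encode_indicators`, `nnlm_distribution`, and the greedy argmax decoder are all illustrative stand-ins.

```python
# Hedged sketch of the abstract's generation loop. The "language model"
# below is a random placeholder standing in for a trained network.
import numpy as np

VOCAB = ["<s>", "police", "arrest", "man", "after", "protest", "summary"]
V = len(VOCAB)
C = 3  # fixed window of previous output words

def encode_indicators(words):
    """Encode each input word as a one-hot indicator vector."""
    X = np.zeros((len(words), V))
    for i, w in enumerate(words):
        X[i, VOCAB.index(w)] = 1.0
    return X

def nnlm_distribution(X, context_ids, rng):
    """Stand-in for the neural network language model: maps the input
    indicator vectors plus the window of previous output words to a
    distribution over the next output word. (context_ids would feed a
    real model; the toy ignores it.)"""
    logits = rng.standard_normal(V)   # placeholder for the trained net
    logits += X.sum(axis=0)           # crude conditioning on the input
    e = np.exp(logits - logits.max())
    return e / e.sum()

def greedy_summary(words, length, seed=0):
    rng = np.random.default_rng(seed)
    X = encode_indicators(words)
    context = [0] * C                 # start-symbol padding
    out = []
    for _ in range(length):
        p = nnlm_distribution(X, context[-C:], rng)
        y = int(p.argmax())           # maximize each word's contextual probability
        out.append(VOCAB[y])
        context.append(y)
    return " ".join(out)

print(greedy_summary(["police", "arrest", "man", "after", "protest"], 3))
```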
Claims
1. A method comprising, by one or more computer server devices:
receiving an input sentence comprising a sequence of input words, wherein a set of words in a vocabulary comprises the input words;
determining, using a neural network language model, a contextual probability of a first output word in a sequence of output words, wherein the set of words in the vocabulary comprises the output words;
for each of one or more next output words of the sequence of output words:
determining a context of a sequence of the input words, wherein the context comprises a previously mapped contextual probability distribution of a fixed window of previous ones of the output words; and
determining the contextual probability of the next output word; and
generating a condensed summary using a decoder by maximizing the contextual probability of each of the output words in the sequence of the output words.
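Restated compactly (an editorial gloss in assumed notation, not claim language): for input x, an output of N words, and a fixed window of C previous output words y_c, the claimed decoder maximizes the summed contextual log-probabilities.

```latex
% y* is the condensed summary; y_c is the fixed window of previous output words.
\[
  y^{*} \;=\; \arg\max_{y}\; \sum_{i=0}^{N-1} \log p\!\left(y_{i+1} \mid x,\, \mathbf{y}_c;\, \theta\right),
  \qquad \mathbf{y}_c = y_{[i-C+1,\, \dots,\, i]}
\]
```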
2. The method of claim 1, further comprising:
encoding each of the input words as an indicator vector, wherein the indicator vector captures features of the input word and a sequence of the indicator vectors captures features of the sequence of the input words, wherein the determining the context of the sequence of the input words comprises encoding the sequence of the indicator vectors with the context.
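Because an indicator vector is one-hot, encoding words this way makes any downstream linear map a simple row lookup. A minimal sketch follows; the embedding table F and all dimensions are assumptions for illustration.

```python
# One-hot indicator encoding: multiplying the indicator sequence by an
# embedding table is equivalent to selecting rows of that table.
import numpy as np

V, D = 7, 4                          # vocabulary size, feature width (assumed)
F = np.random.default_rng(1).standard_normal((V, D))  # illustrative embeddings

def indicator(i, size):
    v = np.zeros(size)
    v[i] = 1.0
    return v

x = [2, 5, 3]                                  # word ids of the input sentence
X = np.stack([indicator(i, V) for i in x])     # sequence of indicator vectors
E = X @ F                                      # features of the word sequence
assert np.allclose(E, F[x])                    # the indicator form is a lookup
print(E.shape)                                 # (3, 4): one feature row per word
```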
3. The method of claim 2, wherein determining the contextual probability of the first output word comprises mapping the sequence of the indicator vectors to a distribution of the contextual probability of the first output word.
4. The method of claim 3, wherein determining the contextual probability of the next output word comprises mapping the encoded sequence of the indicator vectors and the context to the distribution of the contextual probability of the next output word.
5. The method of claim 2, wherein the encoding comprises using an attention-based encoder which is used to find a latent soft alignment between the indicator vectors and the context, and wherein the latent soft alignment points to a position in the sequence of the indicator vectors where a block of highly relevant information for generating the summary is concentrated.
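A soft alignment of the kind claim 5 describes can be sketched as a softmax over input positions, scored against the output-side context window. The parameter matrix P, the dimensions, and the random values below are assumptions; the argmax at the end picks out the position where relevant information is concentrated.

```python
# Hedged sketch of an attention-based encoder: score each input position
# against the context window, softmax into a latent soft alignment, and
# form a context-weighted encoding of the input.
import numpy as np

rng = np.random.default_rng(0)
M, D, C = 5, 4, 3                        # input length, feature width, window
X_emb = rng.standard_normal((M, D))      # embedded input sequence
y_ctx = rng.standard_normal(C * D)       # embedded context window, flattened
P = rng.standard_normal((D, C * D))      # alignment parameters (assumed trained)

scores = X_emb @ P @ y_ctx               # relevance of each input position
align = np.exp(scores - scores.max())
align /= align.sum()                     # latent soft alignment (softmax)

enc = align @ X_emb                      # context-weighted input encoding
peak = int(align.argmax())               # position of concentrated relevance
print(align.round(3), peak)
```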
6. The method of claim 1, wherein a number of the output words in the sequence of the output words is pre-determined.
7. The method of claim 1, wherein the decoder is a Viterbi decoder that finds an exact solution by searching through an entire distribution of the contextual probability.
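Because the model of claim 1 conditions on only a fixed window of C previous output words, the output process is Markov, which is what makes an exact Viterbi-style dynamic program possible. A compact statement of the recurrence, in assumed notation:

```latex
% pi[i][c]: best log score of any i-word prefix whose last C words are c.
% A new word y extends context c' = (c'_1,...,c'_C) to c = (c'_2,...,c'_C, y).
\[
  \pi[i][c] \;=\; \max_{c',\, y \,:\; c = (c'_2, \dots, c'_C,\, y)}
    \Bigl( \pi[i-1][c'] \;+\; \log p\bigl(y \mid x,\, c'\bigr) \Bigr)
\]
% Exact, but the state space has size V^C, exponential in the window C,
% which is what motivates the approximate beam search of claim 8.
```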
8. The method of claim 1, wherein the decoder is a beam search decoder that finds an approximate solution by searching through a limited distribution of the contextual probability.
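A beam-search decoder of the kind claim 8 describes keeps only the K highest-scoring partial summaries at each step, rather than the full search space claim 7's exact Viterbi decoder would explore. A minimal sketch; `step_logprobs` and the toy fixed distribution are assumptions, not the patented decoder.

```python
# Hedged beam-search sketch: approximate argmax over output sequences by
# pruning to the K best partial hypotheses at every step.
import numpy as np

def beam_search(step_logprobs, length, K=4):
    beams = [([], 0.0)]                       # (output ids, running log prob)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            lp = step_logprobs(seq)           # log-probs over the vocabulary
            for w in np.argsort(lp)[-K:]:     # top-K extensions per beam
                candidates.append((seq + [int(w)], score + float(lp[w])))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:K]                # prune to the K best hypotheses
    return beams[0]

# Toy model: one fixed random distribution, independent of the prefix.
rng = np.random.default_rng(3)
table = np.log(rng.dirichlet(np.ones(7)))
print(beam_search(lambda seq: table, length=3))
```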
9. The method of claim 1, further comprising modifying a scoring function to find extractive word matches from the input sentences by directly estimating the contextual probability using a log-linear model.
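One reading of claim 9 is a log-linear rescoring that mixes the model's contextual estimate with extractive features rewarding words copied from the input sentence. The feature set and weights below are illustrative assumptions, not the patented tuning.

```python
# Hedged sketch of a log-linear scoring function: a weighted sum of the
# NNLM contextual log-probability and an extractive word-match feature.
import math

def loglinear_score(model_logprob, next_word, input_words, weights):
    feats = {
        "lm": model_logprob,                           # contextual estimate
        "unigram_match": 1.0 if next_word in input_words else 0.0,
    }
    return sum(weights[k] * v for k, v in feats.items())

w = {"lm": 1.0, "unigram_match": 0.5}                  # assumed weights
print(loglinear_score(math.log(0.02), "protest",
                      {"police", "arrest", "man", "after", "protest"}, w))
```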
10. One or more computer-readable non-transitory storage media embodying software that is operable when executed to:
receive an input sentence comprising a sequence of input words, wherein a set of words in a vocabulary comprises the input words;
determine, using a neural network language model, a contextual probability of a first output word in a sequence of output words, wherein the set of words in the vocabulary comprises the output words;
for each of one or more next output words of the sequence of output words:
determine a context of a sequence of the input words, wherein the context comprises a previously mapped contextual probability distribution of a fixed window of previous ones of the output words; and
determine the contextual probability of the next output word; and
generate a condensed summary using a decoder by maximizing the contextual probability of each of the output words in the sequence of the output words.
11. The non-transitory storage media of claim 10, wherein the software is further operable when executed to:
encode each of the input words as an indicator vector, wherein the indicator vector captures features of the input word and a sequence of the indicator vectors captures features of the sequence of the input words, wherein the context of the sequence of the input words is determined by encoding the sequence of the indicator vectors with the context.
12. The non-transitory storage media of claim 11, wherein the software operable to determine the contextual probability of the first output word comprises software operable to map the sequence of the indicator vectors to a distribution of the contextual probability of the first output word.
13. The non-transitory storage media of claim 12, wherein the software operable to determine the contextual probability of the next output word comprises the software operable to map the encoded sequence of the indicator vectors and the context to the distribution of the contextual probability of the next output word.
14. The non-transitory storage media of claim 11, wherein the software operable to encode uses an attention-based encoder to find a latent soft alignment between the indicator vectors and the context, and wherein the latent soft alignment points to a position in the sequence of the indicator vectors where a block of highly relevant information for generating the summary is concentrated.
15. The non-transitory storage media of claim 10, wherein a number of the output words in the sequence of the output words is pre-determined.
16. The non-transitory storage media of claim 10, wherein the decoder is a Viterbi decoder that finds an exact solution by searching through an entire distribution of the contextual probability.
17. The non-transitory storage media of claim 10, wherein the decoder is a beam search decoder that finds an approximate solution by searching through a limited distribution of the contextual probability.
18. The non-transitory storage media of claim 10, further comprising software operable to modify a scoring function to find extractive word matches from the input sentences by directly estimating the contextual probability using a log-linear model.
19. A system comprising:
one or more processors; and
a memory coupled to the processors comprising instructions executable by the processors, the processors being operable when executing the instructions to:
receive an input sentence comprising a sequence of input words, wherein a set of words in a vocabulary comprises the input words;
determine, using a neural network language model, a contextual probability of a first output word in a sequence of output words, wherein the set of words in the vocabulary comprises the output words;
for each of one or more next output words of the sequence of output words:
determine a context of a sequence of the input words, wherein the context comprises a previously mapped contextual probability distribution of a fixed window of previous ones of the output words; and
determine the contextual probability of the next output word; and
generate a condensed summary using a decoder by maximizing the contextual probability of each of the output words in the sequence of the output words.
20. The system of claim 19, the processors being further operable when executing the instructions to:
encode each of the input words as an indicator vector, wherein the indicator vector captures features of the input word and a sequence of the indicator vectors captures features of the sequence of the input words, wherein the context of the sequence of the input words is determined by encoding the sequence of the indicator vectors with the context.
Specification