Thematic segmentation of long content using deep learning and contextual cues
First Claim
1. A system comprising:
a memory that stores instructions;
one or more processors configured by the instructions to perform operations comprising:
accessing a plurality of sentences embedded in a file that includes a paragraph change indicator at a position within the plurality of sentences;
generating a plurality of sentence vectors, each sentence vector of the plurality of sentence vectors corresponding to a respective sentence of the plurality of sentences;
providing a subset of the plurality of sentence vectors as an input to a recurrent neural network (RNN);
based on the position of the paragraph change indicator and an output of the RNN responsive to the input, determining that a subset of the plurality of sentences relate to a first topic; and
providing an output comprising the subset of the plurality of sentences related to the first topic.
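The operations recited above (sentence vectors in, RNN output per sentence out) can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the claimed implementation: `sentence_vector` is a toy hashed bag-of-words stand-in for a learned embedding, `make_rnn` builds untrained random weights for a plain Elman-style recurrent cell, and the output is read as a per-sentence split probability.

```python
import math
import random

def sentence_vector(sentence, dim=8):
    """Toy stand-in for a learned sentence embedding (assumption):
    hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in sentence.lower().split():
        # Deterministic bucket from character codes; the built-in hash()
        # is salted per process, so it is avoided here.
        bucket = sum(ord(ch) for ch in word) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def make_rnn(input_dim, hidden_dim, seed=0):
    """Hypothetical untrained parameters for a simple recurrent cell."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_xh": mat(hidden_dim, input_dim),   # input -> hidden
        "W_hh": mat(hidden_dim, hidden_dim),  # hidden -> hidden (recurrence)
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)],
    }

def rnn_split_probabilities(vectors, params):
    """Run the cell over the sentence vectors in order and emit one
    sigmoid score per sentence, read here as a split probability."""
    hidden = [0.0] * len(params["w_out"])
    probs = []
    for x in vectors:
        new_hidden = []
        for i in range(len(hidden)):
            total = sum(w * xi for w, xi in zip(params["W_xh"][i], x))
            total += sum(w * hj for w, hj in zip(params["W_hh"][i], hidden))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden
        logit = sum(w * h for w, h in zip(params["w_out"], hidden))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs

sentences = [
    "The engine uses a turbocharger.",
    "The engine output is 300 horsepower.",
    "The warranty covers five years.",
]
vectors = [sentence_vector(s) for s in sentences]
probs = rnn_split_probabilities(vectors, make_rnn(input_dim=8, hidden_dim=4))
```

Because the weights are random, the scores carry no meaning until the network is trained on labelled split positions; the sketch only shows the data flow from sentences to vectors to one RNN output per sentence.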
Abstract
A recurrent neural network (RNN) is trained to identify split positions in long content, wherein each split position is a position at which the theme of the long content changes. Each sentence in the long content is converted to a vector that corresponds to the meaning of the sentence. The sentence vectors are used as inputs to the RNN. The high-probability split points determined by the RNN may be combined with contextual cues to determine the actual split point to use. The split points are used to generate thematic segments of the long content. The multiple thematic segments may be presented to a user along with a topic label for each thematic segment. Each topic label may be generated based on the words contained in the corresponding thematic segment.
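The last steps of the abstract (combining high-probability split points with contextual cues, cutting the content into thematic segments, and labelling each segment from its words) might look roughly like the sketch below. The combination rule is an assumption: a candidate split is snapped to a paragraph change within `window` sentences if one exists, otherwise it is kept where the RNN placed it. The names `choose_split_points`, `segment`, `topic_label`, the threshold, and the stopword list are all hypothetical, and the probabilities are hand-written rather than produced by a trained network.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "with", "this", "that", "it", "its"}

def choose_split_points(split_probs, paragraph_breaks, threshold=0.5, window=1):
    """split_probs[i]: assumed probability that the theme changes after sentence i.
    paragraph_breaks: indices i where a paragraph change follows sentence i."""
    last = len(split_probs) - 1
    splits = set()
    for i, p in enumerate(split_probs[:-1]):  # never split after the final sentence
        if p < threshold:
            continue
        # Assumed rule: prefer the nearest paragraph change within the window.
        nearby = [b for b in paragraph_breaks if abs(b - i) <= window and b < last]
        splits.add(min(nearby, key=lambda b: (abs(b - i), b)) if nearby else i)
    return sorted(splits)

def segment(sentences, splits):
    """Cut the sentence list after each split index."""
    segments, start = [], 0
    for s in splits:
        segments.append(sentences[start:s + 1])
        start = s + 1
    segments.append(sentences[start:])
    return segments

def topic_label(segment_sentences, top_n=2):
    """Label a segment with its most frequent non-stopword terms."""
    words = [w.strip(".,;:!?").lower() for s in segment_sentences for w in s.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return " ".join(w for w, _ in counts.most_common(top_n))

sentences = [
    "The engine uses a turbocharger.",
    "The engine output is 300 horsepower.",
    "Fuel economy depends on engine tuning.",
    "The warranty covers five years.",
    "Warranty claims need a receipt.",
]
split_probs = [0.1, 0.7, 0.2, 0.1, 0.0]  # illustrative values, not model output
paragraph_breaks = {2}                    # paragraph change after sentence index 2

splits = choose_split_points(split_probs, paragraph_breaks)
segments = segment(sentences, splits)
labels = [topic_label(seg, top_n=1) for seg in segments]
```

Here the high-probability position after sentence 1 sits next to the paragraph change after sentence 2, so the sketch moves the split to the paragraph boundary, which yields one three-sentence segment and one two-sentence segment.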
20 Claims
1. A system comprising:
a memory that stores instructions;
one or more processors configured by the instructions to perform operations comprising:
accessing a plurality of sentences embedded in a file that includes a paragraph change indicator at a position within the plurality of sentences;
generating a plurality of sentence vectors, each sentence vector of the plurality of sentence vectors corresponding to a respective sentence of the plurality of sentences;
providing a subset of the plurality of sentence vectors as an input to a recurrent neural network (RNN);
based on the position of the paragraph change indicator and an output of the RNN responsive to the input, determining that a subset of the plurality of sentences relate to a first topic; and
providing an output comprising the subset of the plurality of sentences related to the first topic.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method comprising:
accessing, by one or more processors, a plurality of sentences embedded in a file that includes a paragraph change indicator at a position within the plurality of sentences;
generating, by the one or more processors, a plurality of sentence vectors, each sentence vector of the plurality of sentence vectors corresponding to a respective sentence of the plurality of sentences;
providing, by the one or more processors, a subset of the plurality of sentence vectors as an input to a recurrent neural network (RNN);
based on the position of the paragraph change indicator and an output of the RNN responsive to the input, determining, by the one or more processors, that a subset of the plurality of sentences relate to a first topic; and
providing, by the one or more processors, an output comprising the subset of the plurality of sentences related to the first topic.
Dependent claims: 9, 10, 11, 12, 13
14. A non-transitory machine-readable storage medium comprising instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising:
accessing, by one or more processors, a plurality of sentences embedded in a file that includes a paragraph change indicator at a position within the plurality of sentences;
generating, by the one or more processors, a plurality of sentence vectors, each sentence vector of the plurality of sentence vectors corresponding to a respective sentence of the plurality of sentences;
providing, by the one or more processors, a subset of the plurality of sentence vectors as an input to a recurrent neural network (RNN);
based on the position of the paragraph change indicator and an output of the RNN responsive to the input, determining, by the one or more processors, that a subset of the plurality of sentences relate to a first topic; and
providing, by the one or more processors, an output comprising the subset of the plurality of sentences related to the first topic.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification