Automatic linear text segmentation
Abstract
An embodiment of the present invention provides a method for automatically subdividing a document into conceptually cohesive segments. The method includes the following steps: subdividing the document into contiguous blocks of text; generating an abstract mathematical space based on the blocks of text, wherein each block of text has a representation in the abstract mathematical space; computing similarity scores for adjacent blocks of text based on the representations of the adjacent blocks of text; and aggregating similar adjacent blocks of text based on the similarity scores.
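The four steps named in the abstract can be illustrated with a minimal sketch. Several choices here are assumptions made for illustration and are not taken from the patent text: blocks are sentences, the "abstract mathematical space" is a plain term-count vector space, similarity is cosine similarity, and adjacent blocks are merged when their score meets a hypothetical `threshold` parameter. The function names are likewise hypothetical.

```python
import math
import re
from collections import Counter


def split_into_blocks(document):
    """Step 1: subdivide the document into contiguous blocks (assumed: sentences)."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def vectorize(block):
    """Step 2: represent a block as a sparse term-count vector.

    Assumption: a bag-of-words space stands in for the abstract
    mathematical space; the source does not prescribe this choice.
    """
    return Counter(re.findall(r"[a-z0-9]+", block.lower()))


def cosine(u, v):
    """Step 3: cosine similarity between two sparse vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def segment(document, threshold=0.2):
    """Step 4: merge adjacent blocks whose similarity meets the threshold."""
    blocks = split_into_blocks(document)
    if not blocks:
        return []
    vectors = [vectorize(b) for b in blocks]
    # One score per pair of neighbouring blocks.
    scores = [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    segments = [[blocks[0]]]
    for block, score in zip(blocks[1:], scores):
        if score >= threshold:
            segments[-1].append(block)   # similar: extend the current segment
        else:
            segments.append([block])     # dissimilar: start a new segment
    return [" ".join(s) for s in segments]


if __name__ == "__main__":
    text = "Cats chase mice. Cats chase birds. Stocks fell sharply. Stocks fell again."
    for seg in segment(text):
        print(seg)
```

On the sample text, the two sentences about cats share most of their terms and are merged, while the boundary between the second and third sentences has no shared terms and starts a new segment. A lower-dimensional space derived from the term counts could be substituted in `vectorize` without changing the other steps.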
24 Claims
1. A method for automatically organizing a document into conceptually cohesive segments, comprising:
(a) subdividing the document into contiguous blocks of text;
(b) generating an abstract mathematical space based on the blocks of text, wherein each block of text has a representation in the abstract mathematical space;
(c) computing similarity scores for adjacent blocks of text based on the representations of the adjacent blocks of text; and
(d) aggregating similar adjacent blocks of text based on the similarity scores.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A computer program product for automatically organizing a document into conceptually cohesive segments, comprising:
a computer usable medium having computer readable program code means embodied in said medium for causing an application program to execute on an operating system of a computer, said computer readable program code means comprising:
a computer readable first program code means for subdividing the document into contiguous blocks of text;
a computer readable second program code means for generating an abstract mathematical space based on the blocks of text, wherein each block of text has a representation in the abstract mathematical space;
a computer readable third program code means for computing similarity scores for adjacent blocks of text based on the representations of the adjacent blocks of text; and
a computer readable fourth program code means for aggregating similar adjacent blocks of text based on the similarity scores.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
Specification