Method of extracting important terms, phrases, and sentences
Abstract
A computer segments a document and extracts important terms, phrases, or sentences from it. The computer generates a square sum matrix from vectors of the term occurrence frequencies in the document segments. It determines the importance of a given term, phrase, or sentence on the basis of the eigenvectors and eigenvalues of this matrix, and thereby selects the important terms, phrases, or sentences related to the central concepts of the document.
16 Citations
29 Claims
1. A method of extracting an important item from an input document including at least one document segment, comprising the steps of:
(a) generating, for the respective at least one document segment, document segment vectors having values relating to occurrence frequencies of terms occurring in the respective at least one document segment as component values;
(b) generating a square sum matrix from the document segment vectors;
(c) calculating eigenvectors and eigenvalues of the square sum matrix; and
(d) selecting the important item of the input document from the calculated eigenvectors and eigenvalues of the square sum matrix.
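Steps (a) to (d) of claim 1 can be sketched in plain Python as follows. This is a minimal illustration under several assumptions, not the patented implementation. Segments are assumed to arrive already tokenized. The square sum matrix is assumed to be S = sum of d dᵀ over the segment vectors d. The eigenpairs are computed with the textbook cyclic Jacobi method, chosen here only so the sketch needs no third-party library. Step (d) uses a hypothetical scoring rule: each term's squared components in the leading num_eigen eigenvectors are weighted by their eigenvalues. All function names are illustrative.

```python
import math
from collections import Counter


def segment_vectors(segments, vocab):
    """Step (a): one term-frequency vector per document segment."""
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for segment in segments:
        vec = [0.0] * len(vocab)
        for term, count in Counter(segment).items():
            vec[index[term]] = float(count)
        vectors.append(vec)
    return vectors


def square_sum_matrix(vectors):
    """Step (b), assumed definition: S = sum over segment vectors d of d d^T."""
    m = len(vectors[0])
    S = [[0.0] * m for _ in range(m)]
    for d in vectors:
        for i in range(m):
            for j in range(m):
                S[i][j] += d[i] * d[j]
    return S


def jacobi_eigen(S, max_sweeps=100, tol=1e-22):
    """Step (c): eigenpairs of the symmetric matrix S by cyclic Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue, where
    vectors[k] is the unit eigenvector belonging to values[k].
    """
    n = len(S)
    A = [row[:] for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in A for x in row) or 1.0  # invariant under rotations
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new A[p][q] is zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P and V <- V P (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    order = sorted(range(n), key=lambda k: A[k][k], reverse=True)
    values = [A[k][k] for k in order]
    vectors = [[V[i][k] for i in range(n)] for k in order]
    return values, vectors


def important_terms(segments, num_eigen=1, top=2):
    """Step (d), hypothetical rule: score term i by the sum over the leading
    num_eigen eigenpairs of lambda_m * phi_m[i]**2, then keep the top terms."""
    vocab = sorted({term for segment in segments for term in segment})
    values, vectors = jacobi_eigen(square_sum_matrix(segment_vectors(segments, vocab)))
    leading = range(min(num_eigen, len(values)))
    scores = {term: sum(values[m] * vectors[m][i] ** 2 for m in leading)
              for i, term in enumerate(vocab)}
    return sorted(vocab, key=lambda term: scores[term], reverse=True)[:top]


# Usage: three pre-tokenized segments.
segments = [["apple", "banana", "apple"], ["apple", "cherry"], ["banana", "apple"]]
print(important_terms(segments))  # ['apple', 'banana']
```

A real system would more likely call a library eigensolver for symmetric matrices, such as NumPy's numpy.linalg.eigh, than a hand-written Jacobi loop. The rule in important_terms is only one way step (d) could use the eigenpairs.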
13. A method of extracting an important item from an input document including at least one document segment, comprising the steps of:
(a) generating document segment vectors having, as components, values relating to occurrence frequencies of terms occurring in the document segments;
(b) generating, with respect to an item in the input document, an item vector in which occurrence numbers of terms contained in the item are assigned to components corresponding to the terms in the item, and the other components are assigned a zero value;
(c) obtaining a measure of the importance of the item by using a sum of squared inner products between the item vector and all the document segment vectors; and
(d) selecting the important item of the input document by using the importance measure.
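Claim 13 computes the importance of an item directly from inner products, without an explicit eigendecomposition. The minimal Python sketch below assumes pre-tokenized segments and treats each candidate item (a term, phrase, or sentence) as a list of terms. The names count_vector, item_importance and most_important_items are hypothetical. Ranking the candidates by the measure is only one possible reading of step (d).

```python
from collections import Counter


def count_vector(terms, vocab):
    """Occurrence counts of `terms` on the vocabulary axes; every other
    component is zero (used for both segment vectors and item vectors)."""
    counts = Counter(terms)
    return [float(counts[term]) for term in vocab]


def item_importance(item_terms, segment_vecs, vocab):
    """Step (c): sum of squared inner products between the item vector
    and every document segment vector."""
    s = count_vector(item_terms, vocab)
    return sum(sum(a * b for a, b in zip(s, d)) ** 2 for d in segment_vecs)


def most_important_items(items, segments, top=1):
    """Step (d), one possible rule: rank candidate items by the measure."""
    vocab = sorted({term for segment in segments for term in segment})
    segment_vecs = [count_vector(segment, vocab) for segment in segments]  # step (a)
    return sorted(items, key=lambda item: item_importance(item, segment_vecs, vocab),
                  reverse=True)[:top]


# Usage: treat each segment (sentence) as a candidate item.
segments = [["apple", "banana", "apple"], ["apple", "cherry"], ["banana", "apple"]]
print(most_important_items(segments, segments))  # [['apple', 'banana', 'apple']]
```

For a single-term item the item vector is a unit vector. In that case the measure reduces to the sum of that term's squared counts over all segments.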
17. Apparatus for extracting an important item from an input document including at least one document segment, the apparatus comprising:
(a) a data processor arrangement;
(b) an input device for supplying the document to the data processor arrangement; and
(c) the data processor arrangement being arranged to be responsive to the document supplied to it by the input device for:
(i) generating, for the respective at least one document segment, document segment vectors having values relating to occurrence frequencies of terms occurring in the respective at least one document segment as component values;
(ii) generating a square sum matrix from the document segment vectors;
(iii) calculating eigenvectors and eigenvalues of the square sum matrix; and
(iv) selecting the important item of the input document from the calculated eigenvectors and eigenvalues of the square sum matrix.
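Claim 17 recites an input device that supplies the document to a data processor arrangement. The sketch below shows one hypothetical front end for such an arrangement, producing the pre-tokenized segments that steps (i) to (iv) operate on. It reads a document from a text stream and splits it into sentence segments at '.', '!' or '?'. It then tokenizes each segment into lowercase ASCII words. The segmentation and tokenization rules, and the names read_segments and segments_from_text, are assumptions for illustration; the claims do not specify how segments are formed.

```python
import io
import re
import sys


def read_segments(stream=None):
    """Read the whole document from `stream` (standard input by default) and
    split it into sentence segments, each a list of lowercase word tokens."""
    if stream is None:
        stream = sys.stdin
    text = stream.read()
    # Hypothetical segmentation rule: a sentence ends at '.', '!' or '?'
    # followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments = [re.findall(r"[a-z0-9]+", sentence.lower()) for sentence in sentences]
    return [segment for segment in segments if segment]  # drop empty segments


def segments_from_text(text):
    """Convenience wrapper that uses an in-memory string as the input device."""
    return read_segments(io.StringIO(text))


# Usage:
# segments_from_text("Apples are red. Bananas are yellow! Apples?")
# -> [['apples', 'are', 'red'], ['bananas', 'are', 'yellow'], ['apples']]
```

Accepting any file-like object keeps the "input device" swappable: standard input, an open file, or an in-memory string all work without changing the processing steps.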
29. Apparatus for extracting an important item from an input document including at least one document segment, the apparatus comprising:
(a) a data processor arrangement;
(b) an input device for supplying the document to the data processor arrangement; and
(c) the data processor arrangement arranged to be responsive to the document supplied to it by the input device for:
(i) generating document segment vectors having, as components, values relating to occurrence frequencies of terms occurring in the document segments;
(ii) generating, with respect to an item in the input document, an item vector in which occurrence numbers of terms contained in the item are assigned to components corresponding to the terms in the item, and the other components are assigned a zero value;
(iii) obtaining a measure of the importance of the item by using a sum of squared inner products between the item vector and all the document segment vectors; and
(iv) selecting the important item of the input document by using the importance measure.
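Suppose the square sum matrix is the sum of the outer products of the segment vectors. The claims reproduced here do not define it, so this is an assumption. Under that assumption, the importance measure of claims 13 and 29 can be written in terms of the eigenvalues and eigenvectors used in claims 1 and 17. The notation is as follows: segment vectors d_1, ..., d_N; item vector s; and orthonormal eigenvectors φ_m of S with eigenvalues λ_m (m = 1, ..., M, where M is the number of distinct terms).

```latex
\begin{align}
S &= \sum_{n=1}^{N} \mathbf{d}_n \mathbf{d}_n^{\mathsf{T}}
   = \sum_{m=1}^{M} \lambda_m \boldsymbol{\phi}_m \boldsymbol{\phi}_m^{\mathsf{T}}, \\
\sum_{n=1}^{N} \bigl(\mathbf{s}^{\mathsf{T}} \mathbf{d}_n\bigr)^2
  &= \mathbf{s}^{\mathsf{T}} \Bigl(\sum_{n=1}^{N} \mathbf{d}_n \mathbf{d}_n^{\mathsf{T}}\Bigr) \mathbf{s}
   = \mathbf{s}^{\mathsf{T}} S \, \mathbf{s}
   = \sum_{m=1}^{M} \lambda_m \bigl(\boldsymbol{\phi}_m^{\mathsf{T}} \mathbf{s}\bigr)^2 .
\end{align}
```

The middle equality only regroups the sum. The last follows from the spectral decomposition of the symmetric positive semidefinite matrix S. As a check, take d_1 = (2, 1, 0), d_2 = (1, 0, 1), d_3 = (1, 1, 0) and s = (2, 1, 0). The squared inner products are 25, 4 and 9, so their sum is 38, and sᵀSs with S = [[6, 3, 1], [3, 2, 0], [1, 0, 1]] also equals 38.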
Specification