Method and system for identifying significant topics of a document
Abstract
A "domain-general" method for representing the "sense" of a document includes the steps of extracting a list of simplex noun phrases representing candidate significant topics in the document, clustering the simplex noun phrases by head, and ranking the simplex noun phrases according to a significance measure to indicate the relative importance of the simplex noun phrases as significant topics of the document. Furthermore, the output can be filtered in a variety of ways, both for automatic processing and for presentation to users.
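The three steps in the abstract (extract, cluster by head, rank) can be illustrated with a minimal Python sketch. It is not the patented implementation. It makes several assumptions: the input is already part-of-speech tagged with hypothetical coarse tags (`DET`, `ADJ`, `NOUN`), a simplex noun phrase is approximated as an optional determiner, then adjectives, then one or more nouns, the head is taken to be the last noun, and the significance measure is assumed to be the number of phrases sharing a head. The function names are illustrative only.

```python
from collections import defaultdict


def extract_simplex_nps(tagged):
    """Return (phrase, head) pairs for maximal DET? ADJ* NOUN+ runs.

    `tagged` is a list of (word, tag) pairs; the tag names are assumptions.
    """
    phrases, current = [], []

    def flush():
        # Drop trailing non-nouns so the phrase ends on its head noun.
        while current and current[-1][1] != "NOUN":
            current.pop()
        if current:
            words = [w for w, _ in current]
            phrases.append((" ".join(words), words[-1].lower()))
        current.clear()

    for word, tag in tagged:
        if tag == "DET":
            flush()  # a determiner always starts a new phrase
            current.append((word, tag))
        elif tag == "ADJ":
            if current and current[-1][1] == "NOUN":
                flush()  # an adjective after a noun starts a new phrase
            current.append((word, tag))
        elif tag == "NOUN":
            current.append((word, tag))
        else:
            flush()  # any other tag ends the current phrase
    flush()
    return phrases


def cluster_by_head(phrases):
    """Group phrases under their (lowercased) head noun."""
    clusters = defaultdict(list)
    for phrase, head in phrases:
        clusters[head].append(phrase)
    return clusters


def rank_clusters(clusters):
    """Assumed significance measure: cluster size, ties broken alphabetically."""
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))


tagged = [
    ("The", "DET"), ("new", "ADJ"), ("filing", "NOUN"), ("system", "NOUN"),
    ("replaces", "VERB"), ("an", "DET"), ("old", "ADJ"), ("system", "NOUN"),
    ("in", "PREP"), ("the", "DET"), ("office", "NOUN"), (".", "PUNCT"),
]
ranked = rank_clusters(cluster_by_head(extract_simplex_nps(tagged)))
for head, members in ranked:
    print(head, members)
```

In this toy input the head "system" ranks first because two phrases share it, while "office" heads only one. A real system would substitute a proper tagger and whatever significance measure the specification defines.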
22 Claims
1. A method for identifying significant topics in a document comprising the steps of:
extracting from said document a complete list of simplex noun phrases and corresponding heads that represent candidate significant topics of said document;
clustering said simplex noun phrases into groups by said heads; and
ranking said clustered simplex noun phrases by said heads in accordance with a significance measure to identify said significant topics of said document from said candidate significant topics.
Dependent claims: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

2. A method for identifying significant topics in a document comprising the steps of:
extracting from said document a complete list of simplex noun phrases and corresponding heads that represent candidate significant topics of said document, said simplex noun phrases being described by a user-specified pattern having a determiner preceding an adjective preceding a noun;
clustering said simplex noun phrases into groups by said heads; and
ranking said clustered simplex noun phrases by said heads in accordance with a significance measure to identify said significant topics of said document from said candidate significant topics.

13. A system for identifying significant topics in a document comprising:
a general purpose computer having a computer usable media;
computer readable program code means embodied in said computer usable media, said program code means comprising;
means for extracting from said document a complete list of simplex noun phrases and corresponding heads that represent candidate significant topics of said document;
means for clustering said simplex noun phrases into groups by said heads; and
means for ranking said clustered simplex noun phrases by said heads in accordance with a significance measure to identify significant topics of said document from said candidate significant topics.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22

Specification