System and method for generating a taxonomy from a plurality of documents
Abstract
A system and method for generating a taxonomy is provided in which the taxonomy is generated based on clusters of phrases and a topical library. The taxonomy permits a user of a text processing system to rapidly search through a database and find relevant documents since the classifications in the taxonomy are narrow enough to limit the number of documents classified in each of the classifications.
20 Claims
1. A computer-implemented method comprising:
inputting text;
extracting phrases from the text;
constructing clusters of the phrases defining connections between the phrases;
selecting leader phrases from the clusters that each have a pre-determined number of connections to other ones of the phrases; and
defining a taxonomy that classifies categories of information within the text, based on the leader phrases.
(Dependent claims: 2, 3, 4, 5, 6, 7)
8. A system comprising:
a computer-based text processing system to generate a multi-level taxonomy that classifies content using linked phrases that have been extracted from the content, wherein a first level of the multi-level taxonomy includes a leader phrase selected from the phrases based on a number of links associated with the leader phrase; and
a user interface operable to output the multi-level taxonomy, such that a user proceeds from the first level to a second level of the multi-level taxonomy, wherein the second level includes connected phrases that are linked to the leader phrase.
(Dependent claims: 9, 10, 11, 12, 13)
14. An apparatus comprising a computer-readable storage medium having instructions stored thereon, wherein a processor performs operations upon reading the instructions, the instructions including:
a first code segment for inputting documents, the documents including text;
a second code segment for extracting phrases from the text;
a third code segment for establishing connections between the phrases;
a fourth code segment for defining a leader phrase from among the phrases, based on a number of the connections associated with the leader phrase; and
a fifth code segment for defining a taxonomy in which the leader phrase is included at a first level.
(Dependent claims: 15, 16, 17, 18, 19, 20)
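The steps recited in the independent claims (input text, extract phrases, connect phrases, select leader phrases by connection count, build a taxonomy with leaders at the first level) can be illustrated with a minimal Python sketch. This is an illustration only, not the patented implementation. Everything specific in it is an assumption that the claims do not state: phrases are taken to be adjacent pairs of non-stopwords, two phrases count as connected when they occur in the same sentence, the "pre-determined number of connections" is treated as a minimum threshold (the hypothetical parameter `min_links`), and the taxonomy is a plain two-level mapping. The function names and the stopword list are likewise hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumed stopword list; the claims do not say how phrases are delimited.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "to", "is", "are", "for", "on", "with"}


def extract_phrases(sentence):
    """Return two-word phrases: adjacent word pairs containing no stopword."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {
        f"{a} {b}"
        for a, b in zip(words, words[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    }


def build_connections(text):
    """Connect two phrases whenever they occur in the same sentence (assumed rule)."""
    links = defaultdict(set)
    for sentence in re.split(r"[.!?]+", text):
        phrases = extract_phrases(sentence)
        for p, q in combinations(sorted(phrases), 2):
            links[p].add(q)
            links[q].add(p)
    return links


def build_taxonomy(text, min_links=2):
    """Map each leader phrase (first level) to its connected phrases (second level).

    A phrase is treated as a leader when it has at least `min_links`
    connections; this threshold reading is an assumption.
    """
    links = build_connections(text)
    return {
        phrase: sorted(neighbours)
        for phrase, neighbours in links.items()
        if len(neighbours) >= min_links
    }
```

For the input "The solar panels of the electric cars. The solar panels for the home batteries. The wind turbines for the home batteries." with `min_links=2`, the sketch selects "solar panels" and "home batteries" as leaders, each with two connected phrases at the second level. A user interface of the kind recited in claim 8 could list the keys of the returned mapping first and show the associated list when a leader is chosen.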
Specification