Computer-implemented system and method for generating document groupings for display
First Claim
1. A computer-implemented system for generating document groupings, comprising:
a database to store a set of documents, a lexicon of terms extracted from the set of documents and comprising a frequency of each extracted term within each document, and concepts each comprising two or more of the extracted terms; and
a server comprising a central processing unit, memory, an input port to receive the documents, lexicon and concepts from the database, and an output port, wherein the central processing unit is configured to:
select a subset of the documents in the set based on the term frequencies;
group the subset of documents into clusters based on the concepts;
calculate a similarity of each document cluster with at least one document based on a distance by summing the frequency of each term in that document and a weight of the cluster for each of the terms; and
update the weights until a rate of change for each cluster becomes constant.
Abstract
A computer-implemented system and method for generating document groupings is provided. A lexicon of terms extracted from a set of documents is generated. The lexicon includes a frequency of each extracted term within each document in the set. Concepts each having two or more of the extracted terms are generated. A subset of the documents in the set is selected based on the term frequencies. The subset of documents is grouped into clusters based on the concepts. A similarity of each document cluster is calculated with one or more documents based on a distance by summing the frequency of each term in that document and a weight of the cluster for each of the terms. The weights are updated until a rate of change for each cluster becomes constant.
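The first steps of the abstract (lexicon, concepts, subset selection) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the regex tokenizer, the co-occurrence rule used to form concepts, and the `min_freq` threshold used to select documents are all assumptions introduced here, since the abstract does not specify them.

```python
import re
from collections import Counter
from itertools import combinations


def build_lexicon(docs):
    """Map each extracted term to {doc_id: frequency of the term in that document}."""
    lexicon = {}
    for doc_id, text in docs.items():
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        for term, freq in counts.items():
            lexicon.setdefault(term, {})[doc_id] = freq
    return lexicon


def build_concepts(lexicon, min_shared_docs=2):
    """Assumed rule: a concept is a pair of terms that co-occur in at least
    min_shared_docs documents. The source only says a concept has two or more terms."""
    concepts = []
    for a, b in combinations(sorted(lexicon), 2):
        if len(lexicon[a].keys() & lexicon[b].keys()) >= min_shared_docs:
            concepts.append((a, b))
    return concepts


def select_subset(lexicon, min_freq=2):
    """Assumed rule: keep documents in which some term occurs at least min_freq times."""
    selected = set()
    for postings in lexicon.values():
        for doc_id, freq in postings.items():
            if freq >= min_freq:
                selected.add(doc_id)
    return selected


docs = {
    "d1": "cluster cluster document",
    "d2": "cluster document term",
    "d3": "weather",
}
lexicon = build_lexicon(docs)
concepts = build_concepts(lexicon)
subset = select_subset(lexicon)
```

Keeping the lexicon as a term-to-postings mapping makes both later steps a single pass over the terms; other layouts (for example a document-by-term matrix) would serve equally well.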
20 Claims
1. A computer-implemented system for generating document groupings, comprising:
a database to store a set of documents, a lexicon of terms extracted from the set of documents and comprising a frequency of each extracted term within each document, and concepts each comprising two or more of the extracted terms; and
a server comprising a central processing unit, memory, an input port to receive the documents, lexicon and concepts from the database, and an output port, wherein the central processing unit is configured to:
select a subset of the documents in the set based on the term frequencies;
group the subset of documents into clusters based on the concepts;
calculate a similarity of each document cluster with at least one document based on a distance by summing the frequency of each term in that document and a weight of the cluster for each of the terms; and
update the weights until a rate of change for each cluster becomes constant.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer-implemented method for generating document groupings, comprising:
generating a lexicon of terms extracted from a set of documents and comprising a frequency of each extracted term within each document;
generating concepts each comprising two or more of the extracted terms;
selecting a subset of the documents in the set based on the term frequencies;
grouping the subset of documents into clusters based on the concepts;
calculating a similarity of each document cluster with at least one document based on a distance by summing the frequency of each term in that document and a weight of the cluster for each of the terms; and
updating the weights until a rate of change for each cluster becomes constant.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
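The similarity and weight-update steps of the method claim can be sketched as below. This is one possible reading, offered under stated assumptions: the similarity is taken to be the sum over terms of term frequency times cluster weight, the weights are recomputed as the length-normalized total term frequencies of the cluster's members, and "rate of change becomes constant" is interpreted as the total weight change being the same in two consecutive iterations. The names `similarity`, `update_weights`, `cluster`, `seeds`, `tol` and `max_iter` are hypothetical and do not come from the claims.

```python
import math


def similarity(doc_tf, weights):
    """Assumed reading of the claim: sum over terms of frequency * cluster weight."""
    return sum(freq * weights.get(term, 0.0) for term, freq in doc_tf.items())


def update_weights(members):
    """Assumed update: total term frequency over member documents, scaled to unit length."""
    totals = {}
    for doc_tf in members:
        for term, freq in doc_tf.items():
            totals[term] = totals.get(term, 0.0) + freq
    norm = math.sqrt(sum(v * v for v in totals.values())) or 1.0
    return {term: v / norm for term, v in totals.items()}


def cluster(docs_tf, seeds, tol=1e-9, max_iter=100):
    """Assign documents to the most similar cluster and update weights until the
    per-iteration change in weights stops changing (assumed stopping rule)."""
    weights = [update_weights([docs_tf[s]]) for s in seeds]
    assignment = {}
    prev_change = None
    for _ in range(max_iter):
        assignment = {
            doc_id: max(range(len(weights)), key=lambda c: similarity(tf, weights[c]))
            for doc_id, tf in docs_tf.items()
        }
        new_weights = []
        for c in range(len(weights)):
            members = [docs_tf[d] for d, a in assignment.items() if a == c]
            # An empty cluster keeps its previous weights.
            new_weights.append(update_weights(members) if members else weights[c])
        change = sum(
            abs(nw.get(t, 0.0) - w.get(t, 0.0))
            for nw, w in zip(new_weights, weights)
            for t in set(nw) | set(w)
        )
        weights = new_weights
        if prev_change is not None and abs(change - prev_change) <= tol:
            break
        prev_change = change
    return assignment, weights


docs_tf = {
    "a": {"cat": 3},
    "b": {"cat": 2, "pet": 1},
    "c": {"stock": 3},
    "d": {"stock": 2, "bond": 1},
}
assignment, weights = cluster(docs_tf, seeds=["a", "c"])
```

Normalizing the weights keeps a cluster with many members from attracting every document simply because its raw totals are larger; the claims do not state whether any such normalization is used.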
Specification