Computer-Implemented System And Method For Generating Document Groupings For Display
Abstract
A computer-implemented system and method for generating document groupings is provided. A lexicon of terms extracted from a set of documents is generated. The lexicon includes a frequency of each extracted term within each document in the set. Concepts each having two or more of the extracted terms are generated. A subset of the documents in the set is selected based on the term frequencies. The subset of documents is grouped into clusters based on the concepts. A similarity of each document cluster is calculated with one or more documents based on a distance by summing the frequency of each term in that document and a weight of the cluster for each of the terms. The weights are updated until a rate of change for each cluster becomes constant.
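The abstract names the steps but does not say how each one is carried out. The Python sketch below is one hypothetical reading of the first four steps: building the lexicon, generating concepts, selecting a subset, and grouping it into clusters. Several details are assumptions made for illustration and are not the patented method: the regex tokenizer, the rule that a concept is a pair of terms co-occurring in at least `min_docs` documents, the `min_total` selection threshold, and every function name.

```python
import re
from collections import Counter
from itertools import combinations


def build_lexicon(docs):
    """Lexicon: {doc_id: Counter(term -> frequency within that document)}."""
    return {doc_id: Counter(re.findall(r"[a-z]+", text.lower()))
            for doc_id, text in docs.items()}


def build_concepts(lexicon, min_docs=2):
    """Assumption: a concept is a pair of terms that co-occur in at least
    min_docs documents (the claims only require two or more terms)."""
    pair_counts = Counter()
    for tf in lexicon.values():
        pair_counts.update(combinations(sorted(tf), 2))
    return [pair for pair, n in pair_counts.items() if n >= min_docs]


def select_subset(lexicon, min_total=3):
    """Hypothetical rule: keep documents whose summed term frequency
    reaches min_total."""
    return [d for d, tf in lexicon.items() if sum(tf.values()) >= min_total]


def group_by_concept(lexicon, subset, concepts):
    """Seed clusters: put each selected document under the concept whose
    terms occur most often in it; documents matching no concept are skipped."""
    clusters = {c: [] for c in concepts}
    for d in subset:
        tf = lexicon[d]
        scores = {c: sum(tf[t] for t in c) for c in concepts}
        best = max(scores, key=scores.get) if scores else None
        if best is not None and scores[best] > 0:
            clusters[best].append(d)
    return clusters


docs = {
    "d1": "tax audit tax return",
    "d2": "audit return filing",
    "d3": "merger contract merger",
    "d4": "contract merger review",
}
lexicon = build_lexicon(docs)
concepts = build_concepts(lexicon)
subset = select_subset(lexicon)
clusters = group_by_concept(lexicon, subset, concepts)
print(concepts)  # [('audit', 'return'), ('contract', 'merger')]
print(clusters)  # {('audit', 'return'): ['d1', 'd2'], ('contract', 'merger'): ['d3', 'd4']}
```

The sketch keeps the lexicon sparse, storing one `Counter` per document. This is enough to drive both the frequency-based selection and the concept scoring without building a full term-document matrix.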
20 Claims
1. A computer-implemented system for generating document groupings, comprising:
a lexicon of terms extracted from a set of documents and comprising a frequency of each extracted term within each document;
concepts each comprising two or more of the extracted terms;
a selection module to select a subset of the documents in the set based on the term frequencies;
a grouping module to group the subset of documents into clusters based on the concepts;
a similarity module to calculate a similarity of each document cluster with at least one document based on a distance by summing the frequency of each term in that document and a weight of the cluster for each of the terms; and
an update module to update the weights until a rate of change for each cluster becomes constant.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
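Claim 1 leaves the arithmetic of the similarity and update modules implicit. The block below gives one possible formalization. It is an assumption, not the patent's own equations. It reads the similarity as a frequency-weighted sum over the set of lexicon terms T. It reads "a rate of change ... becomes constant" as successive per-cluster weight changes agreeing to within a tolerance ε. All symbols are hypothetical notation.

```latex
\begin{aligned}
s^{(k)}(c,d) &= \sum_{t \in T} f_{d,t}\, w_{c,t}^{(k)} \\
\Delta_c^{(k)} &= \sum_{t \in T} \bigl| w_{c,t}^{(k)} - w_{c,t}^{(k-1)} \bigr| \\
\text{stop when}\quad & \bigl| \Delta_c^{(k)} - \Delta_c^{(k-1)} \bigr| < \varepsilon \quad \text{for every cluster } c
\end{aligned}
```

Here f_{d,t} is the frequency of term t in document d, and w_{c,t}^{(k)} is cluster c's weight for term t after update k. Δ_c^{(k)} is the L1 change in that cluster's weights at update k. Under this reading a larger s means a document is closer to the cluster.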
11. A computer-implemented method for generating document groupings, comprising:
generating a lexicon of terms extracted from a set of documents and comprising a frequency of each extracted term within each document;
generating concepts each comprising two or more of the extracted terms;
selecting a subset of the documents in the set based on the term frequencies;
grouping the subset of documents into clusters based on the concepts;
calculating a similarity of each document cluster with at least one document based on a distance by summing the frequency of each term in that document and a weight of the cluster for each of the terms; and
updating the weights until a rate of change for each cluster becomes constant.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20.
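The last two steps of claim 11, calculating similarities and updating weights, can be sketched as an iterative refinement loop. This is an assumption-laden sketch, not the claimed method, and it rests on three readings:

- The similarity is the sum over terms of each term's frequency in the document times the cluster's weight for that term.
- The weights are updated k-means style, as unit-length centroids of each cluster's current members. This is a swapped-in technique.
- "A rate of change ... becomes constant" means the per-cluster L1 change in weights differs by less than `eps` between consecutive iterations.

The names `refine`, `centroid`, `change`, `similarity`, `eps`, and `max_iter` are hypothetical.

```python
import math
from collections import Counter


def similarity(tf, weights):
    """Assumed reading: sum over terms of (frequency in doc) x (cluster weight)."""
    return sum(freq * weights.get(term, 0.0) for term, freq in tf.items())


def centroid(docs, members):
    """Unit-length direction of the summed term frequencies of the members."""
    total = Counter()
    for d in members:
        total.update(docs[d])
    norm = math.sqrt(sum(v * v for v in total.values())) or 1.0
    return {t: v / norm for t, v in total.items()}


def change(old, new):
    """L1 distance between two sparse weight vectors."""
    return sum(abs(new.get(t, 0.0) - old.get(t, 0.0)) for t in set(old) | set(new))


def refine(docs, clusters, eps=1e-9, max_iter=100):
    """Reassign documents and update cluster weights until every cluster's
    rate of change is constant (|delta_k - delta_(k-1)| < eps) or max_iter."""
    weights = {c: centroid(docs, m) for c, m in clusters.items()}
    prev_delta = None
    for iteration in range(1, max_iter + 1):
        # Assign each document to the cluster it is most similar to.
        members = {c: [] for c in weights}
        for d, tf in docs.items():
            best = max(weights, key=lambda c: similarity(tf, weights[c]))
            members[best].append(d)
        # Update weights; a cluster that lost all members keeps its old weights.
        new_weights = {c: centroid(docs, m) if m else weights[c]
                       for c, m in members.items()}
        delta = {c: change(weights[c], new_weights[c]) for c in weights}
        weights = new_weights
        if prev_delta is not None and all(
                abs(delta[c] - prev_delta[c]) < eps for c in delta):
            return members, weights, iteration
        prev_delta = delta
    return members, weights, max_iter


docs = {
    "d1": Counter({"tax": 2, "audit": 1, "return": 1}),
    "d2": Counter({"audit": 1, "return": 1, "filing": 1}),
    "d3": Counter({"merger": 2, "contract": 1}),
    "d4": Counter({"contract": 1, "merger": 1, "review": 1}),
}
# Deliberately imperfect seed: d3 starts in cluster "A".
members, weights, iterations = refine(docs, {"A": ["d1", "d2", "d3"], "B": ["d4"]})
print(members)     # {'A': ['d1', 'd2'], 'B': ['d3', 'd4']}
print(iterations)  # 3
```

In the demo, d3 moves to cluster "B" on the first pass. The weights then stop moving, so the change is zero on the second and third passes. The loop stops once two consecutive changes agree. Normalizing the weights keeps a cluster with many members from winning every comparison merely because its summed frequencies are larger.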
Specification