System and method for grouping similar documents
Abstract
A system and method for grouping similar documents is provided. Frequencies of occurrences are determined for terms and noun phrases within a set of documents. A subset of the documents is selected by removing those documents having terms and noun phrases that fall outside a bounded range of upper and lower conditions for frequency of occurrence. Each of the documents in the subset is mapped to a cluster of documents based on a similarity of the documents to the cluster documents.
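The three steps in the abstract can be sketched in a few lines of Python. This is only an illustrative reading, not the patented implementation. Several points are assumptions: terms are plain word tokens (noun-phrase extraction is omitted), "frequency of occurrence" is taken as the corpus-wide count of a term, a document is removed if any of its terms falls outside the range, similarity is cosine similarity, and each cluster is represented by a single term-count vector. The function names, the thresholds, and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    # Assumption: word tokens only; a fuller system would also extract noun phrases.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def select_subset(docs, lower, upper):
    # docs maps a document id to its text.
    per_doc = {doc_id: term_counts(text) for doc_id, text in docs.items()}
    # Assumption: frequency of occurrence is the total count across all documents.
    corpus = Counter()
    for counts in per_doc.values():
        corpus.update(counts)
    # Keep a document only if every one of its terms lies within [lower, upper].
    return {
        doc_id: counts
        for doc_id, counts in per_doc.items()
        if all(lower <= corpus[term] <= upper for term in counts)
    }


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_to_clusters(subset, clusters):
    # clusters maps a cluster id to a term-count vector for its documents.
    return {
        doc_id: max(clusters, key=lambda cid: cosine(counts, clusters[cid]))
        for doc_id, counts in subset.items()
    }


# Hypothetical sample data.
docs = {
    "d1": "engine oil engine filter",
    "d2": "oil filter pump",
    "d3": "pump engine zebra",
}
clusters = {
    "engines": Counter(engine=5, oil=1),
    "pumps": Counter(pump=4, filter=1),
}

subset = select_subset(docs, lower=2, upper=3)  # "d3" is removed: "zebra" occurs once
print(map_to_clusters(subset, clusters))  # {'d1': 'engines', 'd2': 'pumps'}
```

In this sketch the bounds act on whole documents rather than on individual terms, which follows the wording of the abstract literally; on real text the bounds would need to be chosen so that ordinary documents are not all removed.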
20 Claims
1. A system for grouping similar documents, comprising:
- a frequency determination module to determine frequencies of occurrences for terms and noun phrases within a set of documents;
- a threshold module to select a subset of the documents by removing those documents having terms and noun phrases that fall outside a bounded range of upper and lower conditions for frequency of occurrence;
- a mapping module to map each of the documents in the subset to a cluster of documents based on a similarity of the documents in the subset to the cluster documents; and
- a processor to execute the modules.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method for grouping similar documents, comprising:
- determining frequencies of occurrences for terms and noun phrases within a set of documents;
- selecting a subset of the documents by removing those documents having terms and noun phrases that fall outside a bounded range of upper and lower conditions for frequency of occurrence;
- mapping each of the documents in the subset to a cluster of documents based on a similarity of the documents in the subset to the cluster documents.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification