System and method for performing efficient document scoring and clustering
Abstract
A system and method for providing efficient document scoring of concepts within a document set is described. A frequency of occurrence of at least one concept within a document retrieved from the document set is determined. A concept weight, reflecting the specificity of meaning of the at least one concept within the document, is analyzed. A structural weight, reflecting a degree of significance based on the structural location of the at least one concept within the document, is analyzed. A corpus weight, which inversely weights a reference count of occurrences of the at least one concept within the document, is analyzed. A score associated with the at least one concept is evaluated as a function of the frequency, concept weight, structural weight, and corpus weight.
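The abstract says only that the score is "a function of" the four factors and does not give formulas for them. The following is a minimal sketch under stated assumptions. It combines the factors as a simple product. It uses term count as a stand-in for "specificity of meaning" and a small lookup table for structural locations. It computes the corpus weight in the style of smoothed inverse document frequency, treating `reference_count` as the number of documents that reference the concept. The names `STRUCTURAL_WEIGHTS`, `concept_weight`, `structural_weight`, `corpus_weight`, and `score`, and every numeric value, are hypothetical and are not taken from the patent.

```python
import math

# Hypothetical structural weights keyed by where a concept occurs.
# These values are illustrative assumptions, not the patent's values.
STRUCTURAL_WEIGHTS = {"title": 1.0, "heading": 0.8, "body": 0.5}


def concept_weight(concept: str) -> float:
    """Assumed proxy for specificity of meaning: multi-word concepts are
    treated as more specific, with the weight capped at three terms."""
    n_terms = len(concept.split())
    return min(n_terms, 3) / 3.0


def structural_weight(location: str) -> float:
    """Look up the assumed weight for a structural location (default: body)."""
    return STRUCTURAL_WEIGHTS.get(location, STRUCTURAL_WEIGHTS["body"])


def corpus_weight(reference_count: int, total_documents: int) -> float:
    """Inversely weight how often the concept is referenced across the
    document set (smoothed IDF style; an assumption, not the patent's formula)."""
    return math.log((1 + total_documents) / (1 + reference_count)) + 1.0


def score(concept: str, frequency: int, location: str,
          reference_count: int, total_documents: int) -> float:
    """Combine the four factors; a simple product is assumed here."""
    return (frequency
            * concept_weight(concept)
            * structural_weight(location)
            * corpus_weight(reference_count, total_documents))
```

For example, `score("document scoring", 3, "title", 4, 100)` rates the same concept higher than when it appears only in the body, and a concept referenced by many documents in the set receives a smaller corpus weight than a rare one. A product is a natural choice because any factor near zero suppresses the score, but a weighted sum would fit the abstract's wording equally well.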
53 Claims
1. A system for grouping clusters of semantically scored documents, comprising:
a scoring module determining a score assigned to at least one concept extracted from a plurality of documents based on at least one of a frequency of occurrence of the at least one concept within at least one such document, a concept weight, a structural weight, and a corpus weight; and
a clustering module forming clusters of the documents by applying the score for the at least one concept to a best fit criterion for each such document.

9. A method for grouping clusters of semantically scored documents, comprising:
determining a score assigned to at least one concept extracted from a plurality of documents based on at least one of a frequency of occurrence of the at least one concept within at least one such document, a concept weight, a structural weight, and a corpus weight; and
forming clusters of the documents by applying the score for the at least one concept to a best fit criterion for each such document.

18. A system for providing efficient document scoring of concepts within a document set, comprising:
a frequency module determining a frequency of occurrence of at least one concept within a document retrieved from the document set;
a concept weight module analyzing a concept weight reflecting a specificity of meaning for the at least one concept within the document;
a structural weight module analyzing a structural weight reflecting a degree of significance based on structural location within the document for the at least one concept;
a corpus weight module analyzing a corpus weight inversely weighing a reference count of occurrences for the at least one concept within the document; and
a scoring module evaluating a score associated with the at least one concept as a function of the frequency, concept weight, structural weight, and corpus weight.

35. A method for providing efficient document scoring of concepts within a document set, comprising:
determining a frequency of occurrence of at least one concept within a document retrieved from the document set;
analyzing a concept weight reflecting a specificity of meaning for the at least one concept within the document;
analyzing a structural weight reflecting a degree of significance based on structural location within the document for the at least one concept;
analyzing a corpus weight inversely weighing a reference count of occurrences for the at least one concept within the document; and
evaluating a score associated with the at least one concept as a function of the frequency, concept weight, structural weight, and corpus weight.

53. An apparatus for providing efficient document scoring of concepts within a document set, comprising:
means for determining a frequency of occurrence of at least one concept within a document retrieved from the document set;
means for analyzing a concept weight reflecting a specificity of meaning for the at least one concept within the document;
means for analyzing a structural weight reflecting a degree of significance based on structural location within the document for the at least one concept;
means for analyzing a corpus weight inversely weighing a reference count of occurrences for the at least one concept within the document; and
means for evaluating a score associated with the at least one concept as a function of the frequency, concept weight, structural weight, and corpus weight.
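Claims 1 and 9 recite forming clusters by applying concept scores to a "best fit criterion" for each document, but the claims do not define that criterion. The following is a minimal sketch under stated assumptions. Each document is represented as a sparse mapping from concept to score. The best fit is taken to be the cluster centroid with the highest cosine similarity. A document joins that cluster only if the similarity reaches a threshold; otherwise it starts a new cluster (single-pass clustering). The names `cosine`, `centroid`, `cluster`, and `threshold`, and the choice of cosine similarity, are hypothetical and are not taken from the patent.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # concept -> score for one document


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse concept-score vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(members: List[Vector]) -> Vector:
    """Average the concept scores over the documents in a cluster."""
    total: Vector = {}
    for vec in members:
        for k, v in vec.items():
            total[k] = total.get(k, 0.0) + v
    return {k: v / len(members) for k, v in total.items()}


def cluster(docs: Dict[str, Vector], threshold: float = 0.5) -> List[List[str]]:
    """Single-pass clustering: each document joins the cluster whose centroid
    is the best fit (highest cosine similarity) if that similarity reaches
    `threshold`; otherwise it starts a new cluster."""
    clusters: List[List[str]] = []
    centers: List[Vector] = []
    for doc_id, vec in docs.items():
        best, best_sim = -1, -1.0
        for i, center in enumerate(centers):
            sim = cosine(vec, center)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(doc_id)
            # Recompute the centroid so later documents are compared
            # against the updated cluster.
            centers[best] = centroid([docs[d] for d in clusters[best]])
        else:
            clusters.append([doc_id])
            centers.append(dict(vec))
    return clusters
```

With `docs = {"a": {"x": 1.0}, "b": {"x": 0.9, "y": 0.1}, "c": {"z": 1.0}}`, documents `a` and `b` share a dominant concept and fall into one cluster, while `c` starts its own. Single-pass assignment depends on document order. A k-means style refinement would remove that order dependence at the cost of extra passes.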
Specification