System and method for performing efficient document scoring and clustering
Abstract
A system and method for providing efficient document scoring of concepts within a document set is described. A frequency of occurrence of at least one concept within a document retrieved from the document set is determined. A concept weight is analyzed reflecting a specificity of meaning for the at least one concept within the document. A structural weight is analyzed reflecting a degree of significance based on structural location within the document for the at least one concept. A corpus weight is analyzed inversely weighing a reference count of occurrences for the at least one concept within the document. A score associated with the at least one concept is evaluated as a function of the frequency, concept weight, structural weight, and corpus weight.
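The final step of the abstract, combining the four factors into one score per concept, can be sketched as follows. This is only an illustration under stated assumptions: the text here does not reproduce the scoring formula, so the sketch assumes the score is a sum, over the structural sections in which the concept appears, of the product of the four factors. The `ConceptObservation` type and `score_concept` function are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptObservation:
    """One concept as seen in one structural section of a document (hypothetical type)."""
    frequency: int            # occurrences of the concept in this section
    concept_weight: float     # specificity of meaning of the concept
    structural_weight: float  # significance of the section's location
    corpus_weight: float      # inverse weighting of the reference count


def score_concept(observations):
    """Sum the per-section products of the four factors.

    Assumption: the combination rule (sum of products) is illustrative only;
    the source text elides the actual formula.
    """
    return sum(
        o.frequency * o.concept_weight * o.structural_weight * o.corpus_weight
        for o in observations
    )


# Example: the same concept observed in two sections of one document.
observations = [
    ConceptObservation(frequency=2, concept_weight=0.5, structural_weight=1.0, corpus_weight=0.5),
    ConceptObservation(frequency=4, concept_weight=0.5, structural_weight=0.5, corpus_weight=0.5),
]
print(score_concept(observations))
```

Keeping the four factors as separate fields makes it easy to swap in a different combination rule without touching how each factor is derived.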
22 Claims
1. A system for providing efficient document scoring of concepts within and clustering of documents in an electronically-stored document set, comprising:
a database electronically storing a document set;
a scoring module scoring a document in the electronically-stored document set, comprising:
a frequency submodule determining a frequency of occurrence of at least one concept within a document;
a concept weight submodule analyzing a concept weight reflecting a specificity of meaning for the at least one concept within the document, wherein the concept weight is based on a number of terms for the at least one concept;
a structural weight submodule analyzing a structural weight reflecting a degree of significance based on structural location within the document for the at least one concept;
a corpus weight submodule analyzing a corpus weight inversely weighing a reference count of occurrences for the at least one concept within the document;
a scoring evaluation submodule evaluating a score to be associated with the at least one concept as a function of a summation of the frequency, concept weight, structural weight, and corpus weight in accordance with the formula;
11. A computer-implemented method for providing efficient document scoring of concepts within and clustering of documents in an electronically-stored document set, comprising:
scoring a document in an electronically-stored document set, comprising:
determining a frequency of occurrence of at least one concept within a document;
analyzing a concept weight reflecting a specificity of meaning for the at least one concept within the document, wherein the concept weight is based on a number of terms for the at least one concept;
analyzing a structural weight reflecting a degree of significance based on structural location within the document for the at least one concept;
analyzing a corpus weight inversely weighing a reference count of occurrences for the at least one concept within the document; and
evaluating a score to be associated with the at least one concept as a function of a summation of the frequency, concept weight, structural weight, and corpus weight and in accordance with the formula;
21. A computer-readable storage medium holding code for providing efficient document scoring of concepts within and clustering of documents in an electronically-stored document set, comprising:
code for scoring a document in an electronically-stored document set, comprising:
code for determining a frequency of occurrence of at least one concept within a document;
code for analyzing a concept weight reflecting a specificity of meaning for the at least one concept within the document, wherein the concept weight is based on a number of terms for the at least one concept;
code for analyzing a structural weight reflecting a degree of significance based on structural location within the document for the at least one concept;
code for analyzing a corpus weight inversely weighing a reference count of occurrences for the at least one concept within the document; and
code for evaluating a score to be associated with the at least one concept as a function of a summation of the frequency, concept weight, structural weight, and corpus weight in accordance with the formula;
22. An apparatus for providing efficient document scoring of concepts within and clustering of documents in an electronically-stored document set, comprising:
means for scoring a document in an electronically-stored document set, comprising:
means for determining a frequency of occurrence of at least one concept within a document;
means for analyzing a concept weight reflecting a specificity of meaning for the at least one concept within the document, wherein the concept weight is based on a number of terms for the at least one concept;
means for analyzing a structural weight reflecting a degree of significance based on structural location within the document for the at least one concept;
means for analyzing a corpus weight inversely weighing a reference count of occurrences for the at least one concept within the document; and
means for evaluating a score to be associated with the at least one concept as a function of a summation of the frequency, concept weight, structural weight, and corpus weight in accordance with the formula;
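The four inputs recited in the independent claims (frequency, concept weight, structural weight, corpus weight) could be derived along the following lines. This is a rough sketch, not the claimed implementation: the claims state only what each weight reflects, so every function name, the section weight table, the four-term cap, and the `1 / (1 + count)` inverse are assumptions chosen for illustration.

```python
# Hypothetical section weights: locations assumed more significant get larger values.
STRUCTURAL_WEIGHTS = {"title": 1.0, "heading": 0.75, "body": 0.5}


def count_occurrences(concept, text):
    """Frequency: count whitespace-tokenized, case-insensitive matches of the concept."""
    terms = concept.lower().split()
    tokens = text.lower().split()
    if not terms:
        return 0
    n = len(terms)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == terms)


def concept_weight(concept, max_terms=4):
    """Concept weight based on the number of terms in the concept.

    Assumption: more terms means a more specific meaning, capped at max_terms.
    """
    return min(len(concept.split()), max_terms) / max_terms


def structural_weight(section):
    """Structural weight from the section name; unknown sections fall back to body."""
    return STRUCTURAL_WEIGHTS.get(section, STRUCTURAL_WEIGHTS["body"])


def corpus_weight(reference_count):
    """Corpus weight that shrinks as the reference count grows (assumed form)."""
    return 1.0 / (1 + reference_count)


text = "efficient document scoring and document clustering"
print(count_occurrences("document scoring", text))
print(concept_weight("document scoring"))
print(structural_weight("title"))
print(corpus_weight(3))
```

A production system would likely use a real tokenizer and a concept extractor rather than whitespace splitting; the point here is only that each factor is computed independently before being combined into a score.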
Specification