Method and system for determining relevance of terms in text documents
Abstract
The present invention provides a corpus-independent method for determining relevancy of terms to content of text appearing in a document by analyzing the document itself. Conventional information extraction, or other methods, may be applied to a document to generate a list of terms. The invention analyzes the document using relevancy scoring algorithms to determine a term relevancy score representing the term's relevance to the text contained in the document. The scores, including an aggregate score, may be normalized in the process. Based on relevancy scoring, terms are then ranked and further processed. In this manner relevancy is determined based on the subject document itself and by analyzing the occurrences and locations of the terms within the document. Additional techniques may be applied to relate the relevancy scores generated by the present invention to a corpus or collection of documents.
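The abstract describes scoring a term from its occurrences and locations within a single document, with optional normalization. The following is a minimal Python sketch of that idea, not the patented algorithms: the function names (`term_positions`, `frequency_score`, `position_score`, `normalize`) and the two heuristics (raw occurrence count, and how early the first occurrence appears) are illustrative assumptions, since this section does not specify the actual scoring formulas.

```python
import re


def term_positions(text, term):
    """Return the character offset of every case-insensitive, whole-word match of term.

    Assumes the term starts and ends with a word character, so \\b anchors apply.
    """
    pattern = r"\b" + re.escape(term) + r"\b"
    return [m.start() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


def frequency_score(text, term):
    """Hypothetical occurrence-based score: the number of times the term appears."""
    return float(len(term_positions(text, term)))


def position_score(text, term):
    """Hypothetical location-based score in [0, 1]: earlier first mention scores higher."""
    positions = term_positions(text, term)
    if not positions or not text:
        return 0.0
    return 1.0 - positions[0] / len(text)


def normalize(scores):
    """Scale a {term: score} mapping so the highest score becomes 1.0."""
    top = max(scores.values(), default=0.0)
    if top == 0:
        return {term: 0.0 for term in scores}
    return {term: score / top for term, score in scores.items()}


document = "Acme reported record earnings. Acme shares rose while Globex fell."
terms = ["Acme", "Globex", "Initech"]

# Only the document itself is consulted; no corpus statistics are involved.
frequency = normalize({t: frequency_score(document, t) for t in terms})
```

Both scorers read nothing but the document text, which is the corpus-independent property the abstract emphasizes.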
53 Claims
1. A computer implemented method comprising:
- receiving a list comprising an entity, the entity having been identified as being associated with an electronic document;
- based solely upon a set of characteristics of the document, determining a relevancy score associated with the entity with respect to the document; and
- storing the relevancy score.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A computer-implemented method comprising:
- receiving terms extracted from a first electronic document;
- scoring the extracted terms using two or more term relevancy algorithms based solely upon the first electronic document;
- aggregating for each of the extracted terms the relevancy scores generated by the two or more term relevancy algorithms to produce a term aggregate relevancy score for each of the extracted terms; and
- ranking each of the extracted terms based on the term aggregate relevancy score assigned to the extracted terms to determine a relevance ranking of the extracted terms to the first electronic document.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
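The steps recited in claim 16 (score each extracted term with two or more algorithms, aggregate the per-algorithm scores, then rank) can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the two scorers (`count_score`, `early_mention_score`), the max-based normalization, and the use of a simple mean as the aggregate are all hypothetical choices, since the claim does not name specific algorithms or an aggregation formula.

```python
def count_score(text, term):
    """Hypothetical algorithm 1: case-insensitive substring count of the term."""
    return float(text.lower().count(term.lower()))


def early_mention_score(text, term):
    """Hypothetical algorithm 2: score in [0, 1], higher when the first mention is earlier."""
    index = text.lower().find(term.lower())
    if index < 0 or not text:
        return 0.0
    return 1.0 - index / len(text)


def normalize(scores):
    """Scale a {term: score} mapping so the highest score becomes 1.0."""
    top = max(scores.values(), default=0.0)
    if top == 0:
        return {term: 0.0 for term in scores}
    return {term: score / top for term, score in scores.items()}


def rank_terms(text, terms, scorers):
    """Score terms with every scorer, average the normalized scores, and rank.

    Returns a list of (term, aggregate_score) pairs, highest score first.
    The mean is an assumed aggregation; a weighted sum would fit equally well.
    """
    if len(scorers) < 2:
        raise ValueError("expected two or more scoring algorithms")
    per_algorithm = [
        normalize({term: scorer(text, term) for term in terms}) for scorer in scorers
    ]
    aggregate = {
        term: sum(scores[term] for scores in per_algorithm) / len(per_algorithm)
        for term in terms
    }
    return sorted(aggregate.items(), key=lambda pair: pair[1], reverse=True)


document = "Acme reported record earnings. Acme shares rose while Globex fell."
ranking = rank_terms(document, ["Globex", "Initech", "Acme"], [count_score, early_mention_score])
```

Normalizing each algorithm's output before averaging keeps a scorer with a large numeric range, such as a raw count, from dominating the aggregate.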
29. An article comprising a machine-readable medium, the medium having stored thereon instructions to be executed by a machine to perform operations, the article comprising instructions for:
- receiving a list comprising a term, the term having been identified as being associated with an electronic document;
- based solely upon a set of characteristics of the document, determining and assigning a relevancy score to the term with respect to the document; and
- storing the relevancy score.
Dependent claims: 30, 31, 32, 33, 34, 35, 36, 37, 38, 39
40. A computer-based system comprising memory and a processor for executing instructions to perform operations, the system comprising:
- input adapted to receive a list comprising a term, the term having been identified as being associated with an electronic document;
- relevancy scoring module adapted to determine a relevancy score associated with the term with respect to the document, the relevancy score being based solely upon a set of characteristics of the document; and
- memory for storing the relevancy score.
Dependent claims: 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53
Specification