Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections
Abstract
Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.
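The pipeline summarized in the abstract (extract feature values from the text, score them with a model, then re-rank retrieved documents) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the four features, the `LinearAuthorityModel` class, and the hand-set weights are hypothetical and are not taken from the patent. A fixed linear score stands in for the trained document textual authority model, which in practice would be learned from labeled documents.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


def extract_features(text: str) -> Dict[str, float]:
    """Compute a few non-topical surface features (hypothetical choices)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # guard against empty documents
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "exclamation_rate": text.count("!") / n_words,
        "all_caps_rate": sum(1 for w in words if len(w) > 1 and w.isupper()) / n_words,
        "digit_rate": sum(ch.isdigit() for ch in text) / max(len(text), 1),
    }


@dataclass
class LinearAuthorityModel:
    """Stand-in for a trained model: a weighted sum of feature values."""
    weights: Dict[str, float]
    bias: float = 0.0

    def score(self, features: Dict[str, float]) -> float:
        return self.bias + sum(
            self.weights.get(name, 0.0) * value for name, value in features.items()
        )


def rerank(docs: List[str], model: LinearAuthorityModel) -> List[str]:
    """Order previously retrieved documents by descending authority score."""
    return sorted(docs, key=lambda d: model.score(extract_features(d)), reverse=True)


# Hand-set weights for illustration only; a real model would be trained.
model = LinearAuthorityModel(
    weights={
        "avg_sentence_len": 0.05,
        "exclamation_rate": -5.0,
        "all_caps_rate": -2.0,
        "digit_rate": 1.0,
    }
)
retrieved = [
    "BUY NOW!!! BEST CURE EVER!!!",
    "The trial enrolled 120 patients. Results were reported after 12 weeks.",
]
ranked = rerank(retrieved, model)
```

With these assumed weights, the promotional text is penalized for its exclamation marks and capitalized words, so the more measured document moves to the top of the list.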
25 Claims
1. A method for determining an authoritativeness of a document having a plurality of document content features, the method comprising:
determining a set of document content feature values of a document based on textual contents in the document, the document providing information regarding a subject;
determining an authoritativeness for the document based on the determined set of document content feature values using a trained document textual authority model, wherein determining the authoritativeness comprises determining a reliability of the document, the reliability indicative of whether the information, as provided in the document, is reliable regarding the subject; and
outputting the determined authoritativeness in association with the document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15.
16. A machine-readable medium that provides instructions for determining the authority of a document having a plurality of document content features, instructions, which when executed by a processor, cause the processor to perform operations comprising:
determining a set of document content feature values of a document based on textual contents in the document, the document providing information regarding a subject; and
determining at least one of textual authoritativeness value or textual authority class for the document based on the determined set of document content feature values using a trained document textual authority model, wherein determining the authoritativeness comprises determining a reliability of the document, the reliability indicative of whether the information, as provided in the document, is reliable regarding the subject; and
outputting the determined authoritativeness in association with the document.
Dependent claims: 17, 18, 19, 20.
21. A textual authority determining system that determines an authority of a document having a plurality of document content features, comprising:
a memory; and
a document textual authoritativeness value determination circuit or routine that;
determines at least a textual authoritativeness value for the document based on textual contents in the document by processing a set of document content feature values determined for a subset of document content features extracted from the plurality of document content features using one or more of metric regression or boosted decision tree algorithms or methods, the document providing information regarding a subject, wherein determining the authoritativeness comprises determining a reliability of the document, the authoritativeness indicative of whether the information, as provided in the document, is reliable regarding the subject; and
outputs the determined authoritativeness in association with the document.
Dependent claims: 22, 23, 24, 25.
Specification