Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections
First Claim
1. A method for creating a document textual authority model used to determine an authority of a document having a plurality of document content features, the method comprising:
determining, for each document in a set of documents, a set of document classification attributes;
applying a document attribute evaluation framework to each document in the set of documents to determine a textual authoritativeness value or a textual authority class for the document;
selecting a subset of document content features from the plurality of document content features; and
encoding the subset of document content features into a feature vector x; and
determining a predictive model used to assign the feature vector x to an authority rank or class.
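The steps recited in the claim can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the four content features, the nearest-centroid model, the helper names (`encode_features`, `train_centroids`, `predict_class`) and the sample documents are all assumptions made for the example. The "high"/"low" labels stand in for the output of the document attribute evaluation framework, which is not specified here.

```python
import math
import re


def encode_features(text):
    """Encode a small, hypothetical subset of non-topical content features
    into a feature vector x (a plain list of floats)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    return [
        len(words) / max(len(sentences), 1),    # mean sentence length in words
        sum(len(w) for w in words) / n_words,   # mean word length
        text.count("!") / n_words,              # exclamation marks per word
        sum(w.isupper() and len(w) > 1 for w in words) / n_words,  # all-caps word rate
    ]


def train_centroids(labeled_docs):
    """Fit a nearest-centroid model: one mean feature vector per authority class.
    This stands in for the 'predictive model'; the choice is an assumption."""
    sums, counts = {}, {}
    for text, label in labeled_docs:
        x = encode_features(text)
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_class(model, text):
    """Assign the document's feature vector x to the closest authority class."""
    x = encode_features(text)
    return min(model, key=lambda label: math.dist(x, model[label]))


# Hypothetical training set; labels play the role of the evaluation framework's output.
labeled_docs = [
    ("The study enrolled 120 patients across four clinics. "
     "Outcomes were measured at six and twelve months.", "high"),
    ("Results were consistent with earlier trials. "
     "Limitations include the small sample size.", "high"),
    ("BUY NOW!!! This AMAZING cure works!!!", "low"),
    ("WOW!!! You will not believe it!!! Click here!!!", "low"),
]
model = train_centroids(labeled_docs)
print(predict_class(model, "FREE!!! Get it TODAY!!!"))
```

The features are left unscaled to keep the sketch short; a real model would likely normalize them and select the feature subset by some evaluation procedure rather than by hand.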
Abstract
Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists.
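One of the uses named in the abstract, re-ranking previously retrieved documents, might look like the sketch below. The abstract does not say how the authority estimate is combined with the original search ranking, so the linear blend, the `weight` parameter, the function name `rerank` and the sample scores are all assumptions chosen for illustration.

```python
def rerank(results, authority, weight=0.5):
    """Re-rank retrieved documents by blending relevance with estimated authority.

    results:   list of (doc_id, relevance) pairs, relevance assumed in [0, 1]
    authority: dict mapping doc_id to a textual authoritativeness value in [0, 1]
    weight:    hypothetical mixing weight; 0 keeps the original relevance order
    """
    def blended(item):
        doc_id, relevance = item
        # Documents with no authority estimate are treated as having authority 0.
        return (1 - weight) * relevance + weight * authority.get(doc_id, 0.0)

    return sorted(results, key=blended, reverse=True)


# Hypothetical search results and authority estimates.
results = [("a", 0.90), ("b", 0.85), ("c", 0.60)]
authority = {"a": 0.2, "b": 0.9, "c": 0.7}
reranked = rerank(results, authority)
print([doc_id for doc_id, _ in reranked])
```

With these sample values, the highly relevant but low-authority document "a" drops below "b" and "c" once authority is given equal weight.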
42 Citations
8 Claims
Specification