Detection of junk in search result ranking
Abstract
Embodiments are directed to ranking search results using a junk profile. For a given corpus of documents, one or more junk profiles may be created and maintained. The junk profile provides reference metrics to represent known junk documents. For example, a junk profile may comprise a dictionary of document data that is automatically inserted into documents created using a particular system or template. A junk profile may also comprise one or more representations (e.g., histograms) of a distribution of a particular junk variable for known junk documents. The junk profile provides a usable representation of known junk documents, and the present systems and methods employ the junk profile to predict the likelihood that documents in the corpus are junk. In embodiments, junk scores are calculated and used to rank such documents higher or lower in response to a search query.
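As a rough illustration of the junk profile described in the abstract, the sketch below bundles a dictionary of template-inserted text with reference histograms for known junk documents. Every concrete choice here is an assumption, not the patent's method. The junk variable is taken to be token length in characters, and histograms use ten fixed bins. The dictionary is simply the set of lines shared by every known junk document. The names `JunkProfile`, `token_length_histogram`, and `build_profile`, and the sample "ReportGen" text, are hypothetical.

```python
from dataclasses import dataclass

NUM_BINS = 10  # hypothetical binning: token lengths 1-9, plus one bin for 10+


def token_length_histogram(text: str) -> list[float]:
    """Normalized histogram of token lengths (the assumed junk variable)."""
    counts = [0] * NUM_BINS
    for token in text.split():
        counts[min(len(token), NUM_BINS) - 1] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * NUM_BINS
    return [c / total for c in counts]


@dataclass
class JunkProfile:
    dictionary: set[str]                     # text assumed to be auto-inserted by a template
    reference_histograms: list[list[float]]  # one histogram per known junk document


def build_profile(known_junk_docs: list[str]) -> JunkProfile:
    # Assumption: lines present in every known junk document were inserted automatically.
    line_sets = [
        {line.strip() for line in doc.splitlines() if line.strip()}
        for doc in known_junk_docs
    ]
    dictionary = set.intersection(*line_sets) if line_sets else set()
    histograms = [token_length_histogram(doc) for doc in known_junk_docs]
    return JunkProfile(dictionary=dictionary, reference_histograms=histograms)


junk_docs = [
    "Created with ReportGen v2\nPage 1 of 1\nlorem ipsum",
    "Created with ReportGen v2\nPage 1 of 1\nplaceholder text here",
]
profile = build_profile(junk_docs)
print(sorted(profile.dictionary))  # ['Created with ReportGen v2', 'Page 1 of 1']
```

Intersecting lines is a deliberately crude stand-in for identifying template output; a real system would more likely know the generating system or template directly.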
20 Claims
1. A computer-implemented method for ranking candidate documents in response to a search query, comprising steps of:
creating, by at least a first processor, an index of a plurality of documents in a corpus;
calculating a junk score for at least a first document in the corpus, wherein calculating the junk score comprises:
using a first candidate histogram for the first document in the corpus, wherein the first candidate histogram is specific to the first document; and
using a junk profile, wherein the junk profile comprises:
a first reference histogram for a first known junk document, wherein the first reference histogram is specific to the first known junk document and is based on a first junk variable; and
comparing the first candidate histogram to the first reference histogram;
receiving a search query;
identifying, based on the search query and the index, candidate documents from the plurality of documents in the corpus, wherein the candidate documents include at least the first document;
ranking the candidate documents.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
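Claim 1 does not say how the candidate histogram is compared with the reference histogram. The sketch below uses histogram intersection (the sum of bin-wise minima of two normalized histograms) as one plausible comparison. It takes the junk score to be the best match over all reference histograms in the profile. Token length as the junk variable, the ten-bin layout, and the function names are assumptions for illustration only.

```python
NUM_BINS = 10  # hypothetical binning: token lengths 1-9, plus one bin for 10+


def candidate_histogram(text: str) -> list[float]:
    """Normalized token-length histogram specific to one document."""
    counts = [0] * NUM_BINS
    for token in text.split():
        counts[min(len(token), NUM_BINS) - 1] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * NUM_BINS
    return [c / total for c in counts]


def histogram_intersection(h1: list[float], h2: list[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical shapes."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def junk_score(text: str, reference_histograms: list[list[float]]) -> float:
    """Highest similarity between the document and any known junk document."""
    candidate = candidate_histogram(text)
    return max(
        (histogram_intersection(candidate, ref) for ref in reference_histograms),
        default=0.0,
    )


# Hypothetical known junk document made entirely of one-character tokens.
references = [candidate_histogram("a b c d e f")]
print(junk_score("x y z", references))        # 1.0
print(junk_score("hello world", references))  # 0.0
```

Histogram intersection is only one option; a chi-squared distance or cosine similarity would fit the claim language equally well.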
18. A system for ranking candidate documents in response to a search query, comprising:
at least one processor;
a memory, operatively connected to the at least one processor and containing instructions that, when executed by the at least one processor, perform a method comprising:
creating an index of a plurality of documents in a corpus;
calculating a junk score for at least a first document in the corpus, wherein calculating the junk score comprises:
using a first candidate histogram for the first document in the corpus, wherein the first candidate histogram is specific to the first document; and
using a junk profile, wherein the junk profile comprises:
a first reference histogram for a first known junk document, wherein the first reference histogram is specific to the first known junk document and is based on a first junk variable; and
comparing the first candidate histogram to the first reference histogram;
receiving a search query;
identifying, based on the search query and the index, candidate documents from the plurality of documents in the corpus, wherein the candidate documents include at least the first document;
ranking the candidate documents based at least in part on the junk score for the first document;
wherein creating the index comprises separately delineating document data from the plurality of documents if the document data matches the junk profile.
(Dependent claim: 19)
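Claim 18 adds two requirements: ranking that depends at least partly on the junk score, and an index that keeps document data matching the junk profile separate. The following is a minimal sketch under several assumptions. It uses an inverted index with two hypothetical fields (`body` and `junk`) and matches against the dictionary line by line. Query terms found only in junk-profile text get a small hypothetical weight (`JUNK_FIELD_WEIGHT = 0.1`), and the junk score applies a multiplicative penalty. The junk scores are passed in as precomputed, made-up values to keep the example short.

```python
from collections import defaultdict

JUNK_FIELD_WEIGHT = 0.1  # hypothetical: matches in junk-profile text count for little


def build_index(corpus: dict[str, str], junk_dictionary: set[str]) -> dict:
    """Inverted index that files lines matching the junk dictionary under a separate field."""
    index = {"body": defaultdict(set), "junk": defaultdict(set)}
    for doc_id, text in corpus.items():
        for line in text.splitlines():
            field_name = "junk" if line.strip() in junk_dictionary else "body"
            for token in line.lower().split():
                index[field_name][token].add(doc_id)
    return index


def rank(query: str, index: dict, junk_scores: dict[str, float],
         junk_weight: float = 0.5) -> list[str]:
    """Order candidate documents by term relevance, discounted by junk score."""
    relevance: dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        for doc_id in index["body"].get(term, ()):
            relevance[doc_id] += 1.0
        for doc_id in index["junk"].get(term, ()):
            relevance[doc_id] += JUNK_FIELD_WEIGHT
    # Candidates are documents with any match; the junk score scales their relevance down.
    scores = {
        doc_id: rel * (1.0 - junk_weight * junk_scores.get(doc_id, 0.0))
        for doc_id, rel in relevance.items()
    }
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


corpus = {
    "report": "Quarterly budget summary\nConfidential - internal use",
    "stub": "Confidential - internal use\nbudget",
    "template": "Confidential - internal use",
}
index = build_index(corpus, junk_dictionary={"Confidential - internal use"})
junk_scores = {"report": 0.1, "stub": 0.8, "template": 0.9}  # made-up values
print(rank("budget confidential", index, junk_scores))  # ['report', 'stub', 'template']
```

Keeping the template text in its own field means it stays searchable, but it cannot inflate the relevance of documents that are mostly boilerplate. In the example, "report" and "stub" match the same terms and are separated only by their junk scores.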
20. A computer storage medium including computer-executable instructions that, when executed by at least one processor, perform a method comprising:
creating an index of a plurality of documents in a corpus;
creating, for at least a first document of the plurality of documents, a candidate histogram specific to the first document for at least a first junk variable;
calculating a junk score for at least the first document using a junk profile, wherein:
the junk profile comprises:
a first reference histogram for a first known junk document, wherein the first reference histogram is specific to at least the first known junk document and is based on the first junk variable, and
a dictionary of automatically generated data; and
calculating a junk score comprises at least (a) comparing the candidate histogram to the first reference histogram to determine a first similarity metric and (b) determining a second similarity metric between document data in the first document and the dictionary of automatically generated data;
receiving a search query;
identifying, based on the search query and the index, candidate documents from the plurality of documents in the corpus, wherein the candidate documents include at least the first document;
ranking the candidate documents based at least in part on the junk score for the first document.
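Claim 20 computes the junk score from two similarity metrics. The sketch below makes three assumptions, none of which the claim states. The first metric is histogram intersection. The second is the fraction of a document's non-empty lines found in the dictionary. The two are combined by an equal-weight average. Token length as the junk variable, the sample texts, and all function names are hypothetical.

```python
NUM_BINS = 10  # hypothetical binning: token lengths 1-9, plus one bin for 10+


def candidate_histogram(text: str) -> list[float]:
    """Normalized token-length histogram for the assumed junk variable."""
    counts = [0] * NUM_BINS
    for token in text.split():
        counts[min(len(token), NUM_BINS) - 1] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * NUM_BINS
    return [c / total for c in counts]


def histogram_similarity(h1: list[float], h2: list[float]) -> float:
    """First similarity metric: histogram intersection, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def dictionary_similarity(text: str, dictionary: set[str]) -> float:
    """Second similarity metric: share of non-empty lines found in the dictionary."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    return sum(line in dictionary for line in lines) / len(lines)


def junk_score(text: str, reference_histogram: list[float],
               dictionary: set[str], weight: float = 0.5) -> float:
    """Weighted combination of the two metrics (the weighting is an assumption)."""
    first = histogram_similarity(candidate_histogram(text), reference_histogram)
    second = dictionary_similarity(text, dictionary)
    return weight * first + (1.0 - weight) * second


reference = candidate_histogram("Page 1 of 1")  # hypothetical known junk document
dictionary = {"Page 1 of 1", "Created with ReportGen v2"}
print(junk_score("Page 1 of 1", reference, dictionary))  # 1.0
print(junk_score("Quarterly revenue grew strongly\nPage 1 of 1", reference, dictionary))  # 0.5625
```

In the second call, the histogram overlap is 0.625 and half of the lines match the dictionary, so the equal-weight average is 0.5625.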
Specification