Ranking documents based on a series of document graphs
First Claim
1. A method in a computing device that is programmed for ranking documents with links between the documents, the method comprising:
- providing a first document graph representing links between the documents at a first time and a second document graph representing links between the documents at a second time that is later than the first time;
determining by the computing device a first ranking of the documents based on the first document graph by;
initializing a first jumping vector indicating probabilities of visiting the documents without using a link, the initializing including setting the probability of visiting a suspected spam document without using a link to zero; and
generating a first transition probability matrix indicating probabilities of visiting the documents using links; and
applying a page ranking algorithm to the first document graph to generate a first ranking of the documents based on the first transition probability matrix and the first jumping vector wherein the ranking of the suspected spam document is lowered as a result of setting the probability of visiting the suspect spam document without using a link to zero; and
determining by the computing device a second ranking of the documents based on the second document graph and the first ranking of the documents based on the first document graph by;
initializing a second jumping vector indicating probabilities of visiting the documents without using a link, the second jumping vector being initialized based on the first ranking of documents such that a higher first ranking increases the probability of visiting a document without a link; and
generating a second transition probability matrix indicating probabilities of visiting the documents using links; and
applying a page ranking algorithm to the second document graph to generate a second ranking of the documents based on the second transition probability matrix and the second jumping vector.
2 Assignments
0 Petitions
Accused Products
Abstract
Ranking documents based on a series of web graphs collected over time is provided. A ranking system provides multiple transition probability distributions representing different snapshots or times. Each transition probability distribution represents a probability of transitioning from one document to another document within a collection of documents using a link of the document. The ranking system determines a stationary probability distribution for each snapshot based on the transition probability distributions for that snapshot and the stationary probability distribution of the previous snapshot. The stationary probability distributions represent a ranking of the documents over time.
-
Citations
11 Claims
-
1. A method in a computing device that is programmed for ranking documents with links between the documents, the method comprising:
-
providing a first document graph representing links between the documents at a first time and a second document graph representing links between the documents at a second time that is later than the first time; determining by the computing device a first ranking of the documents based on the first document graph by; initializing a first jumping vector indicating probabilities of visiting the documents without using a link, the initializing including setting the probability of visiting a suspected spam document without using a link to zero; and generating a first transition probability matrix indicating probabilities of visiting the documents using links; and applying a page ranking algorithm to the first document graph to generate a first ranking of the documents based on the first transition probability matrix and the first jumping vector wherein the ranking of the suspected spam document is lowered as a result of setting the probability of visiting the suspect spam document without using a link to zero; and determining by the computing device a second ranking of the documents based on the second document graph and the first ranking of the documents based on the first document graph by; initializing a second jumping vector indicating probabilities of visiting the documents without using a link, the second jumping vector being initialized based on the first ranking of documents such that a higher first ranking increases the probability of visiting a document without a link; and generating a second transition probability matrix indicating probabilities of visiting the documents using links; and applying a page ranking algorithm to the second document graph to generate a second ranking of the documents based on the second transition probability matrix and the second jumping vector. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A computer-readable storage medium containing instructions for controlling a computing device to rank web pages, by a method comprising:
-
providing a first web graph representing links between the web pages at a first time and a second web graph representing links between the web pages at a second time that is later than the first time; determining a first ranking of the web pages based on the first web graph by; initializing a first jumping vector to indicate probabilities of visiting the web pages without using a link, the initializing including setting the probability of visiting a suspected spam web page without using a link based on confidence that the suspected spam web page is actually spam so that the set probability decreases as the confidence increases; and generating a first transition probability matrix indicating probabilities of visiting the web pages using links; and applying a page ranking algorithm to the first web graph to generate a first ranking of the web pages based on the first transition probability matrix and the first jumping vector wherein the ranking of a suspected spam web page is lowered as a result of setting the probability of visiting the suspect spam web page without using a link based on confidence that the suspected web page is actually spam; and determining by the computing device a second ranking of the web pages based on the second web graph and the first ranking of the web pages based on the first web graph by; initializing a second jumping vector indicating probabilities of visiting the web pages without using a link, the second jumping vector being initialized based on the first ranking of web pages such that a higher first ranking increases the probability of visiting a web page without a link; and generating a second transition probability matrix indicating probabilities of visiting the web pages using links; and applying a page ranking algorithm to the second web graph to generate a second ranking of the web pages based on the second transition probability matrix and the second jumping vector. - View Dependent Claims (7, 8)
-
-
9. A computing device for ranking web pages, the computing device comprising:
-
a memory storing computer-executable instructions of; a component that provides access to a first web graph representing links between the web pages at a first time and a second web graph representing links between the web pages at a second time that is later than the first time; a component that determines a first ranking of the web pages based on the first web graph by; initializing a first jumping vector to probabilities of visiting the web pages without using a link, the initializing including setting the probability of visiting a suspected spam web page without using a link based on confidence that the suspected spam web page is actually spam so that the set probability decreases as the confidence increases; and generating a first transition probability matrix indicating probabilities of visiting the web pages using links; and applying a page ranking algorithm to the first web graph to generate a first ranking of the web pages based on the first transition probability matrix and the first jumping vector wherein the ranking of a suspected spam web page is lowered as a result of setting the probability of visiting the suspect spam web page without using a link based on confidence that the suspected web page is actually spam; and a component that determines a second ranking of the web pages based on the second web graph and the first ranking of the web pages based on the first web graph by; initializing a second jumping vector indicating probabilities of visiting the web pages without using a link, the second jumping vector being initialized based on the first ranking of web pages such that a higher first ranking increases the probability of visiting a web page without a link; and generating a second transition probability matrix indicating probabilities of visiting the web pages using links; and applying a page ranking algorithm to the second web graph to generate a second ranking of the web pages based on the second transition probability matrix and the second jumping vector; and a processor that executes the computer-executable instructions stored in the memory. - View Dependent Claims (10, 11)
-
Specification