Methods and apparatus for user-centered web crawling
Abstract
Techniques are provided for user-centered search and crawling on an information network such as the World Wide Web. The techniques identify the nature of the web pages which are most relevant to a given predicate. The behavior of users is used to identify and determine the web pages which are most relevant to a specific crawl. Thus, the techniques are implemented in a web crawling system which can obtain the web pages specific to a given topic by leveraging the nature of the interests of the users in different topics.
24 Claims
1. A computer-based method of performing document retrieval in accordance with an information network, the method comprising the steps of:
obtaining a query comprising at least a user-defined predicate;
determining a group of one or more users for a set of one or more documents that satisfy the predicate, the user group comprising one or more users who have previously accessed at least one of the one or more documents in the set, wherein a determination of whether a user has previously accessed a document is obtained from a log that maintains data representing user document access behavior;
determining a topical inclination value for each user in the user group, the topical inclination value for each user being indicative of a level of interest the user has in the one or more documents in the set;
determining a topical affinity value for each document accessed by the user group based on the topical inclination value determined for each user, the topical affinity value for each document being indicative of the likelihood that each document satisfies the predicate based on the access behavior associated with the one or more users in the user group; and
outputting the one or more documents ranked in accordance with their respective topical affinity values.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. Apparatus for performing document retrieval in accordance with an information network, the apparatus comprising:
a memory; and
at least one processor coupled to the memory and operative to:
(i) obtain a query comprising at least a user-defined predicate;
(ii) determine a group of one or more users for a set of one or more documents that satisfy the predicate, the user group comprising one or more users who have previously accessed at least one of the one or more documents in the set, wherein a determination of whether a user has previously accessed a document is obtained from a log that maintains data representing user document access behavior;
(iii) determine a topical inclination value for each user in the user group, the topical inclination value for each user being indicative of a level of interest the user has in the one or more documents in the set;
(iv) determine a topical affinity value for each document accessed by the user group based on the topical inclination value determined for each user, the topical affinity value for each document being indicative of the likelihood that each document satisfies the predicate based on the access behavior associated with the one or more users in the user group; and
(v) output the one or more documents ranked in accordance with their respective topical affinity values.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. An article of manufacture for performing document retrieval in accordance with an information network, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
obtaining a query comprising at least a user-defined predicate;
determining a group of one or more users for a set of one or more documents that satisfy the predicate, the user group comprising one or more users who have previously accessed at least one of the one or more documents in the set, wherein a determination of whether a user has previously accessed a document is obtained from a log that maintains data representing user document access behavior;
determining a topical inclination value for each user in the user group, the topical inclination value for each user being indicative of a level of interest the user has in the one or more documents in the set;
determining a topical affinity value for each document accessed by the user group based on the topical inclination value determined for each user, the topical affinity value for each document being indicative of the likelihood that each document satisfies the predicate based on the access behavior associated with the one or more users in the user group; and
outputting the one or more documents ranked in accordance with their respective topical affinity values.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
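The steps recited in independent claims 1, 9 and 17 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claims do not state how either value is computed, so the formulas here are assumptions chosen for illustration: a user's topical inclination is taken to be the fraction of that user's accessed documents that satisfy the predicate, and a document's topical affinity is taken to be the sum of the inclinations of the group members who accessed it. The names `rank_by_topical_affinity`, `seed_texts` and `access_log` are hypothetical.

```python
from collections import defaultdict


def rank_by_topical_affinity(predicate, seed_texts, access_log):
    """Rank documents by an assumed topical affinity score.

    predicate  -- callable taking document text and returning True/False
    seed_texts -- dict of doc_id -> text for documents whose content is known
    access_log -- iterable of (user, doc_id) pairs recording document accesses
    """
    # Set of documents that satisfy the user-defined predicate.
    satisfying = {doc for doc, text in seed_texts.items() if predicate(text)}

    # Build each user's set of accessed documents from the access log.
    docs_by_user = defaultdict(set)
    for user, doc in access_log:
        docs_by_user[user].add(doc)

    # User group: users who accessed at least one satisfying document.
    # Assumed inclination: share of the user's documents that satisfy the predicate.
    inclination = {}
    for user, docs in docs_by_user.items():
        hits = len(docs & satisfying)
        if hits:
            inclination[user] = hits / len(docs)

    # Assumed affinity: sum of inclinations of group members who accessed the document.
    affinity = defaultdict(float)
    for user, score in inclination.items():
        for doc in docs_by_user[user]:
            affinity[doc] += score

    # Output documents ranked by affinity, ties broken by document id.
    return sorted(affinity.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    seed_texts = {"a": "jazz guitar lessons", "b": "tax forms"}
    access_log = [
        ("u1", "a"), ("u1", "c"),
        ("u2", "a"), ("u2", "b"), ("u2", "d"), ("u2", "e"),
        ("u3", "b"), ("u3", "d"),
    ]
    for doc, score in rank_by_topical_affinity(
        lambda text: "jazz" in text, seed_texts, access_log
    ):
        print(doc, score)
```

In this toy log, user `u3` never accessed a satisfying document and therefore contributes nothing to the ranking, while documents `c`, `d` and `e` receive scores purely from the access behavior of the user group, without their content being inspected.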
Specification