Decision-theoretic web-crawling and predicting web-page change
First Claim
1. A computer implemented system that facilitates web-crawling, comprising at least a processor, one or more memories with following components stored thereon:
- a managing component that performs a predictive analysis to predict when a web page will change, and determines when, and how to perform web-crawling;
- a server computer component that implements a web-crawling component that crawls subsets of web pages as a function of the predictive analysis, discovers and updates the pages in a catalogue of possible search results; and
- a decision-theoretic component that determines an appropriate time to crawl the at least one web page and makes predictions regarding changes in at least one web page based at least in part on;
- a probability that a particular outcome will occur, Pr; and
- a utility factor associated with each outcome, Utility(O);
- an action, a, selected from a set of possible actions, A, to be performed on the at least one web page, which maximizes the value of;
where o is an outcome selected from a set of possible outcomes, O, wherein the outcome o maximizes the efficiency of the web-crawling component in discovering and updating changed web pages.
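The expression that the selected action maximizes is not reproduced in the claim text above. As a minimal sketch, assuming it is the standard expected-utility sum over outcomes (the sum of Pr(o | a) × Utility(o) for each o in O), action selection could look like the following. The outcome names, the two actions, and all numeric utilities are hypothetical values chosen for illustration; none of them come from the patent.

```python
from typing import Dict

# Hypothetical utilities for four illustrative outcomes (assumed, not from the patent).
UTILITY: Dict[str, float] = {
    "refreshed": 1.0,   # crawled, and the page had changed
    "wasted": -0.2,     # crawled, but the page had not changed
    "stale": -1.0,      # did not crawl, and the page had changed
    "idle": 0.0,        # did not crawl, and the page had not changed
}


def outcome_probs(p_change: float) -> Dict[str, Dict[str, float]]:
    """Return Pr(o | a) for each action a, given a predicted change probability."""
    return {
        "crawl": {"refreshed": p_change, "wasted": 1.0 - p_change},
        "wait": {"stale": p_change, "idle": 1.0 - p_change},
    }


def expected_utility(probs: Dict[str, float], utility: Dict[str, float]) -> float:
    """Sum Pr(o | a) * Utility(o) over the outcomes of a single action."""
    return sum(p * utility[o] for o, p in probs.items())


def best_action(p_change: float) -> str:
    """Pick the action in A with the highest expected utility."""
    pr = outcome_probs(p_change)
    return max(pr, key=lambda a: expected_utility(pr[a], UTILITY))


if __name__ == "__main__":
    for p in (0.7, 0.05):
        print(p, best_action(p))
```

Under these assumed utilities, a page that is likely to have changed is crawled, while a page that is unlikely to have changed is left alone because the cost of a wasted fetch outweighs the small risk of staleness.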
Abstract
Systems and methods are described that facilitate predictive web-crawling in a computer environment. Aspects of the invention provide for predictive, utility-based, and decision-theoretic probability assessments of changes in subsets of web pages, enhancing web-crawling ability and ensuring that web page information is maintained in a fresh state. Additionally, the invention facilitates selective crawling of pages with a high probability of change.
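The selective crawling mentioned in the abstract can be illustrated with a short sketch. It assumes that a change probability has already been estimated for each page by some predictive model, and that the crawler has a fixed fetch budget per cycle. The function name `select_pages`, the `threshold` parameter, and the example URLs are hypothetical and are not taken from the patent.

```python
import heapq
from typing import Dict, List


def select_pages(change_prob: Dict[str, float], budget: int,
                 threshold: float = 0.5) -> List[str]:
    """Return up to `budget` URLs, highest predicted change probability first.

    Pages whose predicted change probability is below `threshold` are skipped.
    """
    candidates = [(p, url) for url, p in change_prob.items() if p >= threshold]
    # nlargest returns the tuples sorted from highest to lowest probability.
    return [url for _, url in heapq.nlargest(budget, candidates)]


if __name__ == "__main__":
    # Hypothetical predictions for four pages.
    probs = {
        "https://example.com/news": 0.9,
        "https://example.com/about": 0.1,
        "https://example.com/blog": 0.6,
        "https://example.com/prices": 0.75,
    }
    print(select_pages(probs, budget=2))
```

A fixed threshold is the simplest possible policy. The claims describe a richer one, in which the probability is weighed against a utility for each outcome rather than compared with a constant.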
18 Claims
1. Recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6
7. A computer readable medium that has computer executable instructions stored thereon to:
- predict when a web page will change in order to determine when, and how to perform web-crawling;
- crawl subsets of web pages based on the predicting when a web page will change, to catalogue possible web page search results; and
- determine an appropriate time to crawl the web page and make predictions regarding changes in at least one web page based at least in part on;
- a probability that a particular outcome will occur, Pr; and
- a utility factor associated with each outcome, Utility(O);
- an action, a, selected from a set of possible actions, A, to be performed on the at least one web page, which maximizes the value of;
where o is an outcome selected from a set of possible outcomes, O, wherein the outcome o maximizes the efficiency of crawling in discovering and updating changed web pages.
Dependent claims: 8, 9, 10, 11, 12
13. A computer readable medium having stored thereon components that facilitate web-crawling, the components comprising:
- a managing component that performs a predictive analysis to predict when a web page will change, and determines when, and how to perform web-crawling;
- a server computer component that implements a web-crawling component that crawls subsets of web pages as a function of the predictive analysis, discovers and updates the pages in a catalogue of possible search results; and
- a decision-theoretic component that determines an appropriate time to crawl the at least one web page and makes predictions regarding changes in at least one web page based at least in part on;
- a probability that a particular outcome will occur, Pr; and
- a utility factor associated with each outcome, Utility(O);
- an action, a, selected from a set of possible actions, A, to be performed on the at least one web page, which maximizes the value of;
where o is an outcome selected from a set of possible outcomes, O, wherein the outcome o maximizes the efficiency of the web-crawling component in discovering and updating changed web pages.
Dependent claims: 14, 15, 16, 17, 18
Specification