Systems and methods for client-based web crawling
Abstract
The present invention provides systems and methods for obtaining information from a networked system utilizing a distributed web crawler. The distributed nature of clients of a server is leveraged to provide fast and accurate web crawling data. Information gathered by a server's web crawler is compared to data retrieved by clients of the server to update the crawler's data. In one instance of the present invention, data comparison is achieved by utilizing information disseminated via a search engine results page. In another instance of the present invention, data validation is accomplished by client dictionaries, emanating from a server, that summarize web crawler data. The present invention also facilitates data analysis by providing a means to resist spoofing of a web crawler to increase data accuracy.
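The comparison described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the record schema (`PageRecord`), the truncated SHA-256 fingerprint, the "newer observation wins" rule, and every function name below are assumptions introduced only for illustration. The sketch assumes the server summarizes its crawl data as a URL-to-fingerprint dictionary, that a client reports a page only when its own observation disagrees with that summary, and that the server folds those reports back into its crawl data. Spoofing resistance is deliberately left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageRecord:
    """One entry in the server's crawl data (hypothetical schema)."""
    url: str
    content_hash: str
    fetched_at: float  # Unix timestamp of the observation


def digest(content: str) -> str:
    """Short fingerprint of page content, used to compare observations."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]


def build_dictionary(crawl: dict) -> dict:
    """Server side: summarize crawl data as url -> fingerprint for clients."""
    return {url: rec.content_hash for url, rec in crawl.items()}


def client_report(dictionary: dict, url: str, content: str,
                  seen_at: float) -> Optional[PageRecord]:
    """Client side: report only when the visited page differs from the summary."""
    fingerprint = digest(content)
    if dictionary.get(url) == fingerprint:
        return None  # the server's data already matches; nothing to send
    return PageRecord(url, fingerprint, seen_at)


def refine(crawl: dict, reports: list) -> list:
    """Server side: fold client reports into crawl data; return changed URLs."""
    changed = []
    for rep in reports:
        cur = crawl.get(rep.url)
        is_new = cur is None
        is_fresher = (not is_new and rep.fetched_at > cur.fetched_at
                      and rep.content_hash != cur.content_hash)
        if is_new or is_fresher:
            crawl[rep.url] = rep
            changed.append(rep.url)
    return changed
```

In this sketch a client that sees unchanged content sends nothing, which keeps traffic proportional to the number of pages that actually changed. A production system would also need to validate reports before trusting them, for example by requiring agreement among several independent clients.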
116 Claims
1. A data analysis system, comprising:
a first component that facilitates generation of a first data set related to web page information obtained via a communication system; and
a second component that coordinates a data set relating to web page information from at least one distributed resource which interacts with the communication system;
the second data set is utilized to refine the first data set.
Dependent claims: 2-36, 114, 116

37. A method for facilitating data analysis, comprising:
generating a first data set relating to a second data set obtained from web pages interactive with a communication system;
receiving a third data set from at least one distributed resource that is interactive with the communication system;
the third data set comprising web page related information generated by the distributed resource; and
refining the second data set to reflect information obtained from the third data set.
Dependent claims: 38-56, 115

57. A data analysis system, comprising:
means for generating at least one first data set from a communication system;
means for receiving and coordinating at least one second data set from at least one distributed resource which interacts with the communication system; and
means for refining the first data set utilizing at least one second data set.
Dependent claims: 58-60

61. A data analysis system, comprising:
a first component that generates web page information from at least one visited web site for utilization in a distributed web crawling system;
the web page information transmitted by the first component to a second component via a communication system.
Dependent claims: 62-91

92. A method for facilitating data analysis, comprising:
compiling a first data set derived from accessing web pages via a communication system; and
transmitting, selectively, the first data set to an entity of a distributed crawling system that is interactive with the communication system.
Dependent claims: 93-112

113. A data packet transmitted between two or more computer components that facilitate information gathering, the data packet comprising, at least in part, information relating to web crawling that utilizes, at least in part, a distributed system for gathering information about web pages.
Specification