System and method for enhancing crawling by extracting requests for webpages in an information flow
Abstract
A method for providing searching and alerting capabilities in traffic content at access points in data networks is disclosed. Typical access points for Internet, intranet and wireless traffic are described. Traffic flow through an Internet Service Provider is used as a preferred embodiment to exemplify the data traffic used as the input source in the invention. The invention teaches how proper privacy and content filters can be applied to the traffic source. The filtered data stream from the traffic flow can be used to improve the quality of existing searching and alerting services. The invention also teaches how a cache optimized for holding fresh searchable information captured in the traffic flow can be developed. It is further disclosed how this cache can be converted to a searchable index and used, either separately or in cooperation with external search indexes, as a basis for improved search services. The invention also discloses how the traffic flow can be analyzed in order to derive added information for measuring document relevance, access similarity between documents, personalized ranking of search results, and regional differences in document accesses.
26 Claims
1. A method for searching and analysing the traffic content at access points in data networks, wherein the data networks are shared network resources on the Internet, wherein the method comprises:
collecting information in the form of data extracted from the information flow at said access points in the data network,
indexing said collected information,
searching said indexed information,
retrieving information based on the searching,
storing collected information by caching in one or more cache means provided at one or more access points,
processing the cached information repeatedly and generating an index thereof, and
using the traffic content observed between two indexing operations on the cached information for generating a temporal search index with fresh traffic content since a last indexing operation performed of said cached information, and
performing searching by combining searching in both said temporal search index and a most recent search index generated by indexing said cached information.
Dependent claims: 2, 3, 4.
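As an illustration of the combined search recited in claim 1, the following is a minimal Python sketch, not the patented implementation. The class `TrafficSearch`, its method names, and the whitespace tokenizer are hypothetical. It also assumes that a document seen again since the last full indexing should be matched only through the temporal index, which the claim does not specify.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> set of document URLs."""
    index = defaultdict(set)
    for url, text in docs.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


class TrafficSearch:
    """Hypothetical two-tier index over cached traffic content."""

    def __init__(self):
        self.cache = {}        # every cached document: url -> text
        self.fresh = {}        # documents observed since the last full indexing
        self.main_index = {}   # most recent index generated from the cache
        self.temporal_index = defaultdict(set)

    def observe(self, url, text):
        """Cache a document seen in the traffic flow and index it as fresh."""
        self.cache[url] = text
        self.fresh[url] = text
        # Rebuilding is acceptable for a sketch; the fresh set stays small.
        self.temporal_index = build_index(self.fresh)

    def reindex(self):
        """Periodic full indexing of the cache; resets the temporal index."""
        self.main_index = build_index(self.cache)
        self.fresh = {}
        self.temporal_index = defaultdict(set)

    def search(self, term):
        """Combine hits from the temporal index and the most recent main index."""
        term = term.lower()
        # Assumption: main-index entries for re-observed URLs may be stale,
        # so those URLs are matched through the temporal index only.
        main_hits = {u for u in self.main_index.get(term, ()) if u not in self.fresh}
        return main_hits | set(self.temporal_index.get(term, ()))
```

A short usage example: after `observe("a", "hello world")`, `reindex()`, and `observe("b", "hello again")`, a call to `search("hello")` returns both URLs, one from each index.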
5. A method for searching and analysing the traffic content at access points in data networks, wherein the data networks are shared network resources on the Internet, wherein the method comprises:
collecting information in the form of data extracted from the information flow at said access points in the data network,
indexing said collected information,
searching said indexed information, and
retrieving information based on the searching,
wherein the searching step is implemented by at least one collaborating search engine, the searching step including substeps for dispatching search requests to said at least one collaborating search engine, collecting search result from a local traffic index, collecting search results from said at least one collaborating search engine and combining said collected search results to provide a unified result to an initial search request.
Dependent claims: 6, 7, 8, 9.
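The dispatch-and-combine substeps of claim 5 can be sketched as below. This is a hedged illustration only: the function `federated_search`, the representation of each engine as a callable returning `(url, score)` pairs, and the rule of keeping the highest score per URL are all assumptions, since the claim does not say how results are combined.

```python
def federated_search(query, local_index, engines):
    """Merge hits from a local traffic index and collaborating engines.

    local_index and every item of engines are hypothetical callables that
    take a query string and return an iterable of (url, score) pairs.
    """
    merged = {}
    for search in [local_index, *engines]:
        # Dispatch the request and collect this source's results.
        for url, score in search(query):
            # Assumed combination rule: keep the best score seen per URL.
            merged[url] = max(score, merged.get(url, float("-inf")))
    # Unified result list: best score first, URL as a stable tie-breaker.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))
```

Keeping one entry per URL means a document returned by several sources appears only once in the unified result.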
10. A method for searching and analysing the traffic content at access points in data networks, wherein the data networks are shared network resources on the Internet, wherein the method comprises:
collecting information in the form of data extracted from the information flow at said access points in the data network,
indexing said collected information,
searching said indexed information,
retrieving information based on the searching, and
collecting document identifiers for requested documents,
annotating said documents identifiers with spatial information about users submitting the requests,
computing access statistics for at least one document including at least the number of document requests from a spatial region and the total number of requests from said spatial region, and
determining which documents are most specific for a given spatial region by comparing the access statistics for said given spatial region with the corresponding access statistics for at least a second spatial region.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.
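One way to read the regional comparison in claim 10 is as a ratio of request shares between two regions. The sketch below takes that reading as an assumption; the function name `region_specific`, the `(doc_id, region)` input format, and the add-one smoothing are hypothetical and are not taken from the patent.

```python
from collections import Counter


def region_specific(requests, region, other, top_n=3):
    """Rank documents by how specific they are to `region` versus `other`.

    requests is an iterable of (doc_id, region) pairs, i.e. document
    identifiers annotated with the requesting user's spatial region.
    """
    counts = {region: Counter(), other: Counter()}
    for doc_id, where in requests:
        if where in counts:
            counts[where][doc_id] += 1

    total_here = sum(counts[region].values())
    total_other = sum(counts[other].values())

    scores = {}
    for doc_id, n in counts[region].items():
        share_here = n / total_here
        # Assumed add-one smoothing: avoids division by zero for documents
        # never requested from the second region.
        share_other = (counts[other][doc_id] + 1) / (total_other + 1)
        scores[doc_id] = share_here / share_other
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]
```

With three requests each for `"fjord"` and `"news"` from region `"NO"`, and five `"news"` requests plus one `"nba"` request from region `"US"`, the document `"fjord"` ranks first for `"NO"` because it is never requested from `"US"`.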
19. A method for searching and analysing the traffic content at access points in data networks, wherein the data networks are shared network resources on the Internet, wherein the method comprises:
collecting information in the form of data extracted from the information flow at said access points in the data network,
indexing said collected information,
searching said indexed information, and
retrieving information based on the searching,
collecting document identifiers for the requested documents,
annotating document requests such that consecutive requests from the same user can be identified, and
computing a document similarity between a document “b” and a reference document “a” by comparing the number of “b” requests in the proximity of “a” requests with an average frequency of “b” requests.
Dependent claims: 20, 21, 22, 23, 24, 25, 26.
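The similarity measure of claim 19 compares how often “b” is requested near “a” with how often “b” is requested overall. A minimal sketch follows, under stated assumptions: "proximity" is taken to mean a fixed window of positions around each “a” request within one user's request sequence, and the comparison is taken to be a ratio. The function name `similarity`, the `sessions` mapping, and the `window` parameter are hypothetical.

```python
def similarity(sessions, a, b, window=2):
    """Ratio of b's frequency near a-requests to b's average frequency.

    sessions maps a user identifier to that user's ordered list of
    requested document identifiers (consecutive requests per user).
    """
    near_total = near_b = all_total = all_b = 0
    for docs in sessions.values():
        all_total += len(docs)
        all_b += docs.count(b)
        near = set()  # positions within `window` requests of any a-request
        for i, doc in enumerate(docs):
            if doc == a:
                lo = max(0, i - window)
                hi = min(len(docs), i + window + 1)
                near.update(j for j in range(lo, hi) if j != i)
        near_total += len(near)
        near_b += sum(1 for j in near if docs[j] == b)
    if near_total == 0 or all_b == 0:
        return 0.0  # no evidence either way
    return (near_b / near_total) / (all_b / all_total)
```

Under this reading, a value above 1 means “b” is requested more often around “a” than its average frequency would predict, and a value below 1 means less often.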
Specification