System and method facilitating page indexing employing reference information
Abstract
A system and method facilitating page indexing employing reference information (e.g., anchor text) is provided. In accordance with an aspect of the present invention, a page index system having a page data store and a crawler component is provided. The page data store stores reference information associated with pages. The crawler component receives a page, retrieves the reference information associated with that page from the page data store, and provides the page and the associated reference information to, for example, an index building system. The system can thus facilitate indexing of pages based, at least in part, upon reference information (e.g., anchor text) associated with the pages.
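The data flow described in the abstract can be sketched in a few lines. This is a minimal in-memory sketch under stated assumptions, not the patented implementation: the names `PageDataStore`, `Crawler`, `MergedPage`, `add_reference`, `get_references` and `receive` are hypothetical, and the index building system is stood in for by any callable that accepts the merged page.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MergedPage:
    """A received page bundled with the reference information stored for it."""
    url: str
    content: str
    reference_info: list[str] = field(default_factory=list)


class PageDataStore:
    """Hypothetical in-memory page data store keyed by URL."""

    def __init__(self) -> None:
        self._refs: dict[str, list[str]] = defaultdict(list)

    def add_reference(self, target_url: str, text: str) -> None:
        # Reference info for target_url is appended as each referencing page is crawled.
        self._refs[target_url].append(text)

    def get_references(self, url: str) -> list[str]:
        # .get() does not trigger the defaultdict factory; return a copy.
        return list(self._refs.get(url, []))


class Crawler:
    """Receives a page, looks up its reference info, and hands both on."""

    def __init__(self, store: PageDataStore,
                 index_builder: Callable[[MergedPage], None]) -> None:
        self.store = store
        self.index_builder = index_builder  # stand-in for the index building system

    def receive(self, url: str, content: str) -> MergedPage:
        merged = MergedPage(url, content, self.store.get_references(url))
        self.index_builder(merged)
        return merged
```

Usage: `Crawler(store, received.append).receive(url, html)` appends one `MergedPage` to `received`. Keeping the reference information in a separate store means it can be collected before the referenced page itself has ever been fetched.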
24 Claims
1. Computer-readable instructions embodied on a computer-readable storage medium that when executed on one or more processors implement a page index system, the system comprising:
a page data store that stores reference information associated with a page, the reference information is obtained from at least one other page and is accumulated incrementally from each other page as each other page is crawled, the reference information comprising descriptive information that is adjacent to anchor text associated with a referencing uniform resource locator that references the page; and
a crawler component that receives the page, retrieves the reference information associated with the page from the page data store, and provides the page and the reference information to at least an index building component;
wherein failure to receive a requested page after a first predetermined period of time causes the URL for the page to be removed from the page data store after a second predetermined period of time.
Dependent claims: 2, 3, 4, 5, 6.
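The two-period removal condition in the wherein clause can be illustrated as follows. This is a sketch under an assumed reading of the claim: the first period is treated as a fetch timeout, and the second period is assumed to run from the first unresolved failure. `ExpiringPageStore`, `record_fetch`, `fetch_timeout` and `removal_grace` are hypothetical names, and time is passed in explicitly so the logic is testable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExpiringPageStore:
    """Hypothetical page data store that expires URLs whose pages stay unreachable.

    fetch_timeout: first period (seconds); a fetch with no response within
                   this time counts as a failure.
    removal_grace: second period (seconds); a URL that has kept failing for
                   this long is removed from the store (assumed interpretation).
    """
    fetch_timeout: float
    removal_grace: float
    references: Dict[str, List[str]] = field(default_factory=dict)
    first_failure: Dict[str, float] = field(default_factory=dict)

    def record_fetch(self, url: str, elapsed: Optional[float], now: float) -> None:
        """Record one fetch attempt made at time `now`.

        `elapsed` is how long the response took, or None if none arrived.
        """
        failed = elapsed is None or elapsed > self.fetch_timeout
        if not failed:
            # Page received in time: forget any earlier failures.
            self.first_failure.pop(url, None)
            return
        # Start the second timer on the first failure; keep it on later ones.
        started = self.first_failure.setdefault(url, now)
        if now - started >= self.removal_grace:
            self.references.pop(url, None)
            self.first_failure.pop(url, None)
```

Resetting the clock on any successful fetch keeps a page that is only intermittently unreachable from being dropped. Whether the original system behaves this way is not stated in the claim.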
7. A crawler embodied on a computer-readable storage medium comprising:
an input component that receives one or more pages;
a parser component that parses the one or more pages for another page referenced on the one or more pages, and accumulatively stores reference information associated with the another page in a page data store, the reference information comprising descriptive information that is in proximity to anchor text associated with a referencing uniform resource locator that references the another page, wherein failure to receive the one or more pages after a first predetermined period of time causes the URL for the page to be removed from the page data store after a second period of time;
a retrieval component that receives the another page and retrieves the reference information associated with the another page from the page data store; and
an output component that provides an output, comprising the another page merged with the reference information associated with the another page, to an index building system.
Dependent claims: 8, 9, 10, 11, 12.
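The parser component's job (finding referenced pages and accumulating anchor text plus nearby descriptive text for each) can be sketched with the standard library's `html.parser`. The fixed word window used to approximate "in proximity to anchor text" is an assumption; the claim does not define proximity. `AnchorContextParser`, `accumulate` and the record keys are hypothetical, and script/style text is not filtered out in this sketch.

```python
from html.parser import HTMLParser
from typing import Dict, Iterator, List, Optional, Tuple
from urllib.parse import urljoin


class AnchorContextParser(HTMLParser):
    """Collects (target URL, anchor text, nearby words) from one HTML page."""

    def __init__(self, base_url: str, window: int = 5) -> None:
        super().__init__()
        self.base_url = base_url
        self.window = window  # hypothetical proximity: words kept on each side
        self.words: List[str] = []  # visible words in document order
        self.anchors: List[Tuple[str, int, int]] = []  # (url, start, end) word indices
        self._open_href: Optional[str] = None
        self._open_start = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the referencing page's URL.
                self._open_href = urljoin(self.base_url, href)
                self._open_start = len(self.words)

    def handle_endtag(self, tag):
        if tag == "a" and self._open_href is not None:
            self.anchors.append((self._open_href, self._open_start, len(self.words)))
            self._open_href = None

    def handle_data(self, data):
        self.words.extend(data.split())

    def references(self) -> Iterator[Tuple[str, str, str]]:
        for url, start, end in self.anchors:
            before = self.words[max(0, start - self.window):start]
            after = self.words[end:end + self.window]
            yield url, " ".join(self.words[start:end]), " ".join(before + after)


def accumulate(store: Dict[str, List[dict]], base_url: str, html: str,
               window: int = 5) -> None:
    """Parse one crawled page and append reference info for every page it links to."""
    parser = AnchorContextParser(base_url, window)
    parser.feed(html)
    parser.close()
    for target, anchor, nearby in parser.references():
        store.setdefault(target, []).append(
            {"source": base_url, "anchor": anchor, "nearby": nearby})
```

Calling `accumulate` once per crawled page gives the incremental behaviour the claims describe: each new referencing page adds one more record to the target's list, whether or not the target has been fetched yet.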
13. A computer implemented method facilitating page indexing comprising:
retrieving reference information associated with a page from at least one other page, the reference information comprising descriptive information that is in proximity to anchor text associated with a referencing uniform resource locator that references the page;
storing the reference information associated with the page in a data store;
incrementally accumulating the reference information from each other page as each other page is crawled;
merging the page with the reference information;
providing an output comprising the page merged with the reference information associated with the page to at least an index building system; and
deleting the information for a page from the data store when the page cannot be retrieved for a predetermined period of time.
Dependent claims: 14, 15, 16.
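To show why providing the merged output to an index building system matters, here is a toy inverted index that indexes terms from both the page body and its reference information. It is an illustration only, not the index building system of the claims. The record layout `(url, body, reference_texts)`, the tokenizer, and the per-field tagging are all assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical merged record: (url, page text, reference texts)
MergedPage = Tuple[str, str, List[str]]


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric terms; deliberately simplistic."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: Iterable[MergedPage]) -> Dict[str, Dict[str, Set[str]]]:
    """Toy inverted index: term -> {url: fields the term occurred in}."""
    index: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for url, body, reference_texts in pages:
        for term in tokenize(body):
            index[term][url].add("body")
        for ref in reference_texts:
            for term in tokenize(ref):
                index[term][url].add("reference")
    return index
```

A page whose own text never says "crawler" can still be retrieved for that term if pages linking to it use it as anchor text. Recording which field a term came from would let a ranking step weight reference terms differently, if desired.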
17. One or more computer readable storage media storing computer executable components of a crawler comprising:
an input component that receives one or more pages;
a parser component that parses the one or more pages for another page referenced on the one or more pages, incrementally accumulates reference information associated with the another page from each of the one or more pages when crawled, and stores such reference information in a page data store, the reference information comprising descriptive information that is in proximity to anchor text associated with a referencing uniform resource locator that references the another page, wherein failure to receive the one or more pages after a first predetermined period of time causes the URL for the one or more pages to be removed from the page data store after a second period of time;
a retrieval component that receives the another page and retrieves the reference information associated with the another page from the page data store; and
an output component that provides an output, comprising the another page merged with the reference information associated with the another page, to at least an index building system.
Dependent claims: 18, 19, 20.
21. A page index system embodied on a computer-readable storage medium, comprising:
means for retrieving reference information associated with a page from at least one other page;
means for incrementally accumulating the reference information from each other page as each other page is crawled;
means for storing the reference information in a data store, the reference information comprising descriptive information that is adjacent to anchor text associated with a referencing uniform resource locator that references the page;
means for receiving the page;
means for retrieving the reference information associated with the page from the means for storing the reference information;
means for providing an output to at least an index building system, the output comprising the page merged with the reference information associated with the page; and
means for removing the page from the data store when the page cannot be received after a predetermined period of time.
Dependent claims: 22, 23, 24.
Specification