PSEUDO-ANCHOR TEXT EXTRACTION
First Claim
1. A computer-implemented method comprising:
identifying an object in content;
identifying information associated with the object, and extracting the information associated with the object;
constructing a pseudo uniform resource locator (pseudo-URL) of the object based at least on the information extracted;
associating the pseudo-URL with the object; and
providing the information extracted for search.
Abstract
A search method uses pseudo-anchor text associated with search objects to improve search performance. The pseudo-anchor text may be extracted in combination with an identifier of the search objects (such as a pseudo-URL) from a digital corpus such as a collection of documents. Pseudo-anchor texts for each object are preferably extracted from candidate anchor blocks using a machine learning based approach. The pseudo-anchor texts are made available for searching and used to help rank the objects in a search result to improve search performance. The method may be used in vertical search of objects such as published articles, products and images that lack explicit URLs and anchor text information.
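The flow summarized in the abstract (identify an object, extract its associated information, build a pseudo-URL from that information, and attach candidate text to the object) can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the `pseudo://paper/<year>/<slug>` scheme, the `SearchObject` class, and the helper names are all assumptions made for the example, and the object type is assumed to be a cited paper identified by title and year.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class SearchObject:
    """Hypothetical record for an object (here, a cited paper) found in content."""
    title: str
    year: str
    pseudo_url: str = ""
    anchor_texts: list = field(default_factory=list)


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so title variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def build_pseudo_url(title: str, year: str) -> str:
    """Assumed scheme: pseudo://paper/<year>/<normalized-title-slug>."""
    slug = quote(normalize(title).replace(" ", "-"))
    return f"pseudo://paper/{year}/{slug}"


def index_object(index: dict, title: str, year: str, context: str) -> SearchObject:
    """Associate the pseudo-URL with the object and record the surrounding text."""
    url = build_pseudo_url(title, year)
    obj = index.setdefault(url, SearchObject(title=title, year=year, pseudo_url=url))
    obj.anchor_texts.append(context)
    return obj


index = {}
index_object(index, "Deep Nets", "2007", "see [3] for training details")
index_object(index, "deep nets.", "2007", "as shown in Deep Nets")
```

Because the pseudo-URL is derived from normalized extracted information, two differently formatted mentions of the same object map to the same key, which is what lets text from multiple mentions accumulate on one object.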
20 Claims
1. A computer-implemented method comprising:
identifying an object in content;
identifying information associated with the object, and extracting the information associated with the object;
constructing a pseudo uniform resource locator (pseudo-URL) of the object based at least on the information extracted;
associating the pseudo-URL with the object; and
providing the information extracted for search.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system comprising:
a processor;
a memory coupled to the processor, the memory storing a plurality of modules, at least one of the plurality of modules upon execution by the processor performing pseudo-anchor text extraction comprising extracting pseudo-anchor text from at least one candidate anchor block.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer-readable media having computer-readable instructions encoded thereon, the computer-readable instructions upon execution configuring a search engine to perform operations comprising:
identifying a plurality of objects in content;
identifying at least one candidate anchor block associated with each of the plurality of objects, and extracting the candidate anchor block;
in an event more than one candidate anchor block is extracted, aggregating candidate anchor blocks that are associated with a common object;
extracting pseudo-anchor text from the candidate anchor blocks that were aggregated; and
providing the pseudo-anchor text for search.
Dependent claims: 16, 17, 18, 19, 20
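The aggregation step recited in claim 15 can be illustrated with a small sketch: candidate anchor blocks that refer to a common object are grouped, and pseudo-anchor text is then pulled from each group. The abstract says extraction is preferably machine-learning based; the term-overlap rule below is a deliberately simple stand-in, not that approach. The function names, the `(pseudo_url, text)` input shape, and the `min_blocks` parameter are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def aggregate_blocks(blocks):
    """Group candidate anchor blocks by the pseudo-URL of the object they refer to."""
    grouped = defaultdict(list)
    for pseudo_url, text in blocks:
        grouped[pseudo_url].append(text)
    return dict(grouped)


def extract_pseudo_anchor_text(texts, min_blocks=2):
    """Stand-in heuristic (not a learned extractor): keep terms that occur in
    at least `min_blocks` of the aggregated blocks, or in every block when
    fewer than `min_blocks` blocks exist."""
    counts = Counter()
    for text in texts:
        # Count each term once per block so long blocks do not dominate.
        counts.update(set(re.findall(r"[a-z0-9]+", text.lower())))
    threshold = min(min_blocks, len(texts))
    return sorted(term for term, n in counts.items() if n >= threshold)


blocks = [
    ("pseudo://paper/2007/deep-nets", "fast training of deep nets"),
    ("pseudo://paper/2007/deep-nets", "deep nets scale well"),
    ("pseudo://paper/2001/other", "unrelated work"),
]
grouped = aggregate_blocks(blocks)
anchor_terms = {url: extract_pseudo_anchor_text(texts) for url, texts in grouped.items()}
```

Terms that recur across independent mentions of the same object are more likely to describe it than terms that appear only once, which is the intuition behind aggregating before extracting.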
Specification