Pseudo-anchor text extraction
First Claim
1. A method implemented by a computer, the method comprising:
identifying an object in content;
identifying a first block of information associated with the object and a second block of information associated with the object;
extracting the first block of information associated with the object and the second block of information associated with the object;
aggregating the first block of information associated with the object and the second block of information associated with the object to obtain aggregated information associated with the object;
constructing, by the computer, a pseudo uniform resource locator (pseudo-URL) of the object based at least on the information being extracted and aggregated;
associating the pseudo-URL with the object;
storing for repeated use, the pseudo-URL being associated with the object and the information being extracted; and
providing, via the pseudo-URL, the information extracted for search.
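The steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the sentence-level block extraction, the `pseudo://` scheme, the SHA-1 digest, and every function and class name below are assumptions made only for illustration.

```python
import hashlib
import re


def normalize(text):
    """Lowercase the text and collapse non-alphanumeric runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def extract_blocks(object_name, documents):
    """Identify and extract blocks (here, whole sentences) that mention the object."""
    key = normalize(object_name)
    blocks = []
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            if key and key in normalize(sentence):
                blocks.append(sentence.strip())
    return blocks


def aggregate(blocks):
    """Aggregate the extracted blocks into one piece of information."""
    return " ".join(blocks)


def build_pseudo_url(object_name, aggregated):
    """Hypothetical scheme: object-name slug plus a short digest of the aggregated information."""
    slug = normalize(object_name).replace(" ", "-")
    digest = hashlib.sha1(normalize(aggregated).encode("utf-8")).hexdigest()[:12]
    return f"pseudo://{slug}/{digest}"


class PseudoUrlStore:
    """In-memory store that keeps the pseudo-URL, the object, and its information together."""

    def __init__(self):
        self._records = {}

    def put(self, pseudo_url, object_name, information):
        self._records[pseudo_url] = {"object": object_name, "information": information}

    def get(self, pseudo_url):
        # Provide the stored information for search via the pseudo-URL.
        return self._records[pseudo_url]["information"]


documents = [
    "We build on Latent Dirichlet Allocation. It models topics in text.",
    "Unlike latent Dirichlet allocation, our model is supervised!",
]
blocks = extract_blocks("Latent Dirichlet Allocation", documents)
aggregated = aggregate(blocks)
url = build_pseudo_url("Latent Dirichlet Allocation", aggregated)
store = PseudoUrlStore()
store.put(url, "Latent Dirichlet Allocation", aggregated)
```

In this sketch the two matching sentences play the role of the first and second blocks of information, and `store.get(url)` returns the aggregated text that a search component could index.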
Abstract
A search method uses pseudo-anchor text associated with search objects to improve search performance. The pseudo-anchor text may be extracted in combination with an identifier of the search objects (such as a pseudo-URL) from a digital corpus such as a collection of documents. Pseudo-anchor texts for each object are preferably extracted from candidate anchor blocks using a machine learning based approach. The pseudo-anchor texts are made available for searching and used to help rank the objects in a search result to improve search performance. The method may be used in vertical search of objects such as published articles, products and images that lack explicit URLs and anchor text information.
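How pseudo-anchor text might help rank objects can be sketched as follows. The abstract does not give a scoring formula, so the term-count score, the `anchor_weight` parameter, and the index layout are all assumptions; the machine-learning-based extraction step is not modeled here.

```python
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, own_text, anchor_text, anchor_weight=2.0):
    """Count query-term hits in the object's own text and, weighted, in its pseudo-anchor text."""
    own = Counter(tokens(own_text))
    anchor = Counter(tokens(anchor_text))
    return sum(own[t] + anchor_weight * anchor[t] for t in tokens(query))


def rank(query, index):
    """index maps a pseudo-URL to (own_text, pseudo_anchor_text); returns matching pseudo-URLs, best first."""
    scored = [(score(query, own, anc), url) for url, (own, anc) in index.items()]
    return [url for s, url in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]


# Hypothetical index of three objects keyed by pseudo-URL.
index = {
    "pseudo://paper-a": ("The anatomy of a search engine",
                         "classic paper on link analysis and ranking"),
    "pseudo://paper-b": ("Link analysis survey", ""),
    "pseudo://paper-c": ("Image compression methods", "a codec overview"),
}
results = rank("link analysis", index)
```

Here `paper-a` never mentions the query terms in its own text, yet it ranks first because the descriptions other documents give of it do. That is the effect the abstract attributes to pseudo-anchor text.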
18 Citations
20 Claims
1. A method implemented by a computer, the method comprising:
identifying an object in content;
identifying a first block of information associated with the object and a second block of information associated with the object;
extracting the first block of information associated with the object and the second block of information associated with the object;
aggregating the first block of information associated with the object and the second block of information associated with the object to obtain aggregated information associated with the object;
constructing, by the computer, a pseudo uniform resource locator (pseudo-URL) of the object based at least on the information being extracted and aggregated;
associating the pseudo-URL with the object;
storing for repeated use, the pseudo-URL being associated with the object and the information being extracted; and
providing, via the pseudo-URL, the information extracted for search.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system comprising:
a processor;
a memory coupled to the processor, the memory storing a plurality of modules, at least one of the plurality of modules upon execution by the processor performing pseudo-anchor text extraction comprising:
identifying an object in content;
identifying a first candidate anchor block associated with the object and a second candidate anchor block associated with the object;
extracting pseudo-anchor text from the first candidate anchor block and the second candidate anchor block, wherein the pseudo-anchor text being extracted includes text including a possible description of the object;
accumulating the pseudo-anchor text extracted from the first candidate anchor block and the pseudo-anchor text extracted from the second candidate anchor block to obtain pseudo-anchor text associated with the object;
constructing a pseudo uniform resource locator (pseudo-URL) of the object based at least on the pseudo-anchor text associated with the object;
associating the pseudo-URL with the object;
storing for repeated use, the pseudo-URL being associated with the object and the pseudo-anchor text associated with the object; and
providing for search, via the pseudo-URL, the pseudo-anchor text associated with the object.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A computer-readable media having computer-readable instructions encoded thereon, the computer-readable instructions upon execution configuring a search engine to perform operations comprising:
identifying an object in content;
identifying a first candidate anchor block associated with the object and a second candidate anchor block associated with the object;
extracting the first candidate anchor block associated with the object and the second candidate anchor block associated with the object;
aggregating the first candidate anchor block associated with the object and the second candidate anchor block associated with the object to obtain pseudo-anchor text associated with the object;
extracting the pseudo-anchor text from the candidate anchor blocks that were aggregated;
constructing a pseudo uniform resource locator (pseudo-URL) of the object based at least on the extracted pseudo-anchor text;
associating the pseudo-URL with the object;
storing for repeated use, the pseudo-URL being associated with the object and the extracted pseudo-anchor text; and
providing, via the pseudo-URL, the pseudo-anchor text extracted for search.
Dependent claims: 16, 17, 18, 19, 20.
Specification