Canonicalized online document sitelink generation
First Claim
1. A system for canonicalized online document sitelink generation, comprising:
- a data processing system comprising at least one processor and memory to:
receive digital information generated by an audio codec that converts spoken information from a user to the digital information;
identify, based on the digital information, a content item associated with a first uniform resource locator (URL) including a campaign parameter;
generate a canonicalized content item URL comprising a canonical form by removing the campaign parameter from the first URL;
generate a content item URL group with the canonicalized content item URL;
receive a sitelink associated with a second URL indexed in a database, the second URL including a URL parameter;
crawl the second URL with the URL parameter to identify a landing page;
crawl the second URL without the URL parameter to identify the same landing page;
generate, responsive to crawl of the second URL with and without the URL parameter and identification of the same landing page, a canonicalized sitelink URL for the second URL by removal of the URL parameter, wherein the canonicalized content item URL is in the canonical form configured to reduce repeated calculations as compared to the first URL not in the canonical form;
match the canonicalized sitelink URL with the content item of the content item URL group based on an indication of similarity between text of the content item and text of the canonicalized sitelink URL;
determine, based on a filter configured to eliminate excluded content items based on a geographic policy, that the content item is compatible with the canonicalized sitelink URL; and
select, in response to receipt of the digital information generated by the audio codec that converts the spoken information from the user to the digital information, the content item matched with the sitelink associated with the canonicalized sitelink URL based on the filter.
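The canonicalization and grouping steps recited above (removing the campaign parameter from the first URL, then forming a content item URL group) can be illustrated with a minimal Python sketch. The claim does not say how a campaign parameter is recognized, so the `utm_` prefix test, the function names `canonicalize` and `group_by_canonical_url`, and the `example.com` URLs are all assumptions made for illustration only.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumption: campaign parameters are recognized by a name prefix.
# The claim itself does not define how they are identified.
CAMPAIGN_PREFIXES = ("utm_",)


def canonicalize(url: str) -> str:
    """Return the URL with campaign parameters removed from its query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(CAMPAIGN_PREFIXES)
    ]
    return parts._replace(query=urlencode(kept)).geturl()


def group_by_canonical_url(items):
    """Group (item_id, url) pairs under their canonicalized URL."""
    groups = defaultdict(list)
    for item_id, url in items:
        groups[canonicalize(url)].append(item_id)
    return dict(groups)
```

In this sketch, two content items whose URLs differ only in campaign parameters land in the same group, so any later per-URL work runs once per group rather than once per item, which is one plausible reading of the "reduce repeated calculations" language.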
Abstract
Methods and systems for improved processor efficiency via reductions in repeated calculations are provided. A plurality of candidate sitelinks are identified in response to a search for online content. Each sitelink has associated with it a plurality of candidate creatives with which the sitelink may be presented to the user. The creatives are canonicalized to form clusters of candidate creatives. The sitelinks are also canonicalized. The creatives are matched to the candidate canonicalized sitelinks so as to provide enhanced sitelinks having increased relevance to the user search.
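The matching of creatives to canonicalized sitelinks described in the abstract could be sketched as follows. The source only refers to an indication of text similarity and does not name a measure, so Jaccard similarity over word tokens is swapped in here as a stand-in; the function names and the `threshold` value are hypothetical.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the token sets of two strings (0.0 if either is empty)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_creative(sitelink_text: str, creatives, threshold: float = 0.2):
    """Return the creative most similar to the sitelink text, or None.

    `threshold` is an illustrative cutoff, not a value from the source.
    """
    best = max(creatives, key=lambda c: jaccard(sitelink_text, c), default=None)
    if best is None or jaccard(sitelink_text, best) < threshold:
        return None
    return best
```

For example, `best_creative("running shoes", ["Winter hats on sale", "Lightweight running shoes"])` picks the second creative, because it shares two of its three tokens with the sitelink text while the first shares none.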
210 Citations
18 Claims
10. A method of canonicalized online document sitelink generation, comprising:
receiving, by a data processing system comprising at least one processor and memory, digital information generated by an audio codec that converts spoken information from a user to the digital information;
identifying, by the data processing system, based on the digital information, a content item associated with a first uniform resource locator (URL) including a campaign parameter;
generating, by the data processing system, the first URL into a canonicalized content item URL comprising a canonical form by removing the campaign parameter from the first URL;
generating, by the data processing system, a content item URL group with the canonicalized content item URL;
receiving, by the data processing system, a sitelink associated with a second URL indexed in a database, the second URL including a URL parameter;
crawling the second URL with the URL parameter to identify a landing page;
crawling the second URL without the URL parameter to identify the same landing page;
generating, by the data processing system responsive to crawling the second URL with and without the URL parameter and identifying the same landing page, a canonicalized sitelink URL for the second URL by removing the URL parameter, wherein the canonicalized content item URL is in the canonical form configured to reduce repeated calculations as compared to the first URL not in the canonical form;
matching, by the data processing system, the canonicalized sitelink URL with the content item of the content item URL group based on an indication of similarity between text of the content item and text of the canonicalized sitelink URL;
determining, by the data processing system, based on a filter configured to eliminate excluded content items based on a geographic policy, that the content item is compatible with the canonicalized sitelink URL; and
selecting, by the data processing system in response to receipt of the digital information generated by the audio codec that converts the spoken information from the user to the digital information, the content item matched with the sitelink associated with the canonicalized sitelink URL based on the filter.
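The two crawling steps and the geographic filter of the method claim can be sketched in Python under a few stated assumptions. The crawler is represented by a caller-supplied `fetch` function (hypothetical; the claim does not describe how pages are retrieved or compared), "same landing page" is approximated by exact equality of the fetched content, and the `excluded_regions` field is an invented stand-in for the geographic policy.

```python
from typing import Callable
from urllib.parse import parse_qsl, urlencode, urlsplit


def drop_param(url: str, name: str) -> str:
    """Return the URL with every occurrence of query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return parts._replace(query=urlencode(kept)).geturl()


def canonical_sitelink(url: str, param: str, fetch: Callable[[str], str]) -> str:
    """Drop `param` only when the page is the same with and without it.

    `fetch` is a hypothetical crawler hook that returns page content for a URL.
    """
    stripped = drop_param(url, param)
    if fetch(url) == fetch(stripped):
        return stripped  # parameter does not change the landing page
    return url  # parameter is significant, so keep it


def passes_geo_filter(item: dict, user_region: str) -> bool:
    """True unless the item's (assumed) `excluded_regions` set contains the region."""
    return user_region not in item.get("excluded_regions", set())
```

Passing the crawler in as a function keeps the sketch runnable without network access: a dictionary lookup can stand in for `fetch` when trying it out. A real system would likely need a looser notion of page equality than exact string comparison.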
Specification