Domain-aware snippets for search results
Abstract
Techniques are disclosed for providing a domain-aware snippet for a search result. With such techniques, a domain classification component is provided for identifying a template used to generate a plurality of web pages of a domain, associating the template and content of the web pages related to the template with a Uniform Resource Locator pattern of the plurality of web pages, and storing the associated template, the related content, and the Uniform Resource Locator pattern in a database. A snippet extraction component is also provided for extracting text from a section of a web page of the plurality of web pages for a snippet of a search result corresponding to a search query, wherein the extracted text is based on a ranking value of the section and the relevance of the extracted text to the search query.
11 Claims
1. One or more computer-readable storage media having computer-usable instructions stored thereon for performing a method of providing a domain-aware snippet for a search result, the method comprising:
identifying source code for a plurality of web pages that belong to a single domain;
identifying one or more tag patterns of a predetermined number of sections within the source code of the plurality of web pages belonging to the single domain, wherein a predetermined number of web pages of the plurality of web pages belonging to the single domain share at least one identical tag pattern;
based on the identifying of the one or more tag patterns of the predetermined number of sections within the source code of the plurality of web pages belonging to the single domain, determining that a template exists;
extracting the template of the predetermined number of web pages of the plurality of web pages belonging to the single domain, wherein the template is a structured layout used to construct each of the plurality of web pages belonging to the single domain, and wherein the template is shared by each of the predetermined number of web pages of the plurality of web pages belonging to the single domain;
associating the template and content of the predetermined number of web pages of the plurality of web pages related to the template with a Uniform Resource Locator pattern of the single domain, wherein the Uniform Resource Locator pattern is a portion of a Uniform Resource Locator that is shared among all of the predetermined number of web pages of the plurality of web pages associated with the template; and
storing the association of the template, the related content, and the Uniform Resource Locator pattern in a database.
Dependent claims: 2, 3, 4, 5, 6, 7.
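The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a "tag pattern" is the chain of nested tag names leading to an element, that a template "exists" when at least `min_pages` pages share a pattern, and that the URL pattern is the longest common character prefix of the page URLs. The names `TagPathCollector`, `detect_template`, `url_pattern`, and `store_association` are hypothetical, and a plain dict stands in for the database.

```python
import os
from collections import Counter
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Collects the chain of open tags (assumed 'tag pattern') at each element."""

    # Elements that never have a closing tag, so they are not pushed on the stack.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.patterns = set()

    def handle_starttag(self, tag, attrs):
        self.patterns.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag, tolerating unclosed children.
            while self.stack.pop() != tag:
                pass


def tag_patterns(html):
    """Return the set of tag patterns found in one page's source code."""
    collector = TagPathCollector()
    collector.feed(html)
    return collector.patterns


def detect_template(pages, min_pages):
    """pages maps URL -> HTML source for one domain.

    Returns the patterns shared by at least min_pages pages, or None
    when no pattern is shared widely enough (no template detected).
    """
    counts = Counter()
    for html in pages.values():
        counts.update(tag_patterns(html))  # each page counts a pattern once
    shared = {pattern for pattern, n in counts.items() if n >= min_pages}
    return shared or None


def url_pattern(urls):
    """Assumed URL pattern: the longest common prefix, compared per character."""
    return os.path.commonprefix(list(urls))


def store_association(database, pages, min_pages):
    """Store template and related content under the URL pattern (dict as database)."""
    template = detect_template(pages, min_pages)
    if template is None:
        return None
    key = url_pattern(pages)
    database[key] = {"template": sorted(template), "content": dict(pages)}
    return key
```

Because the prefix is compared character by character, URLs such as `/item/1` and `/item/12` would yield `/item/1` rather than `/item/`; a production system would more plausibly compare path segments.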
8. A system for providing a domain-aware snippet for a search result, the system comprising:
a computing device having a processor and a memory;
a domain classification component configured to:
identify and extract a template used to generate a plurality of web pages of a single domain, wherein the template is a structured layout used to construct each of the plurality of web pages belonging to the single domain, and wherein the template is shared by each of the plurality of web pages belonging to the single domain,
associate the template and content of the plurality of web pages related to the template with a Uniform Resource Locator pattern of the plurality of web pages, the Uniform Resource Locator pattern being a portion of a Uniform Resource Locator that is shared among all of the plurality of web pages associated with the template, and
store the associated template, the related content, and the Uniform Resource Locator pattern in a database; and
a snippet extraction component configured to extract text from a section of at least one web page of the plurality of web pages to be used in a snippet of a search result corresponding to a search query, wherein the extracted text is based on a ranking value of the section of the template from which the text was extracted and a relevance value of the extracted text to the search query.
Dependent claims: 9, 10, 11.
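The snippet extraction component of claim 8 selects text using two signals: a ranking value for the template section and a relevance value for the text against the query. The sketch below is one possible reading, not the claimed method itself. It assumes relevance is the fraction of query terms present in the section text and that the two values are combined by multiplication; the claim specifies neither. The function names and the `max_chars` parameter are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(text, query):
    """Assumed relevance value: fraction of query terms that appear in text."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(text)) / len(query_terms)


def extract_snippet(sections, section_rank, query, max_chars=160):
    """Pick snippet text from the best-scoring section of a page.

    sections maps section name -> text extracted from that template section.
    section_rank maps section name -> ranking value for that section.
    The score is rank * relevance (an assumed combination); an empty
    string is returned when no section matches the query at all.
    """
    best_text, best_score = "", 0.0
    for name, text in sections.items():
        score = section_rank.get(name, 0.0) * relevance(text, query)
        if score > best_score:
            best_text, best_score = text, score
    return best_text[:max_chars]
```

With this scoring, a highly ranked section such as a product description outranks a footer that happens to contain a query term, because the footer's low ranking value scales its score down.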
Specification