System and method for extracting content for submission to a search engine
First Claim
1. A computer-implemented method, comprising:
analyzing a structure of elements of a template of a first web page;
detecting, using a processor, non-essential information of a second web page based on at least the structure of the first web page; and
generating instructions for extracting content from the second web page, based on at least the detected non-essential information,
wherein the non-essential information is associated with a template, and
wherein the method further comprises generating instructions for extracting content, other than the template, from the second web page.
Abstract
A system and a method for automatically submitting Web pages to a search engine, which is preferably used for submitting dynamic Web pages, but may optionally be used for any type of Web page. The present invention features a gateway server for providing these Web pages to the search engine, either directly or optionally through an autonomous software search program. Optionally and more preferably, the gateway server modifies the Web page before serving it to the autonomous software search program and/or search engine.
23 Claims
1. A computer-implemented method, comprising:
analyzing a structure of elements of a template of a first web page;
detecting, using a processor, non-essential information of a second web page based on at least the structure of the first web page; and
generating instructions for extracting content from the second web page, based on at least the detected non-essential information,
wherein the non-essential information is associated with a template, and
wherein the method further comprises generating instructions for extracting content, other than the template, from the second web page.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer-implemented method, comprising:
analyzing a structure of elements of a template of a first web page;
detecting, using a processor, non-essential information of a second web page based on at least the structure of the first web page; and
generating instructions for extracting content from the second web page, based on at least the detected non-essential information,
wherein the analyzing comprises analyzing structures of a plurality of first web pages to identify a repeating pattern of elements of the template within the plurality of first web pages.
Dependent claims: 9, 10
11. An apparatus, comprising:
a storage device; and
a processor coupled to the storage device,
wherein the storage device stores a program for controlling the processor,
wherein the processor, being operative with the program, is configured to:
analyze a structure of elements of a template of a first web page;
detect non-essential information of a second web page based on at least the structure of the first web page; and
generate instructions for extracting content from the second web page, based on at least the detected non-essential information, and
wherein the non-essential information of the second web page is associated with a template, and
wherein the processor is further configured to generate instructions for extracting content, other than the template, from the second web page.
Dependent claims: 12, 13, 14, 15, 16, 17
18. An apparatus, comprising:
a storage device; and
a processor coupled to the storage device,
wherein the storage device stores a program for controlling the processor,
wherein the processor, being operative with the program, is configured to:
analyze a structure of elements of a template of a first web page;
detect non-essential information of a second web page based on at least the structure of the first web page; and
generate instructions for extracting content from the second web page, based on at least the detected non-essential information, and
wherein the processor configured to analyze is further configured to analyze structures of a plurality of first web pages to identify a repeating pattern of elements of the template within the plurality of first web pages.
Dependent claims: 19, 20
21. A computer storage device storing instructions that, when executed by a processor, perform a method comprising the steps of:
analyzing a structure of elements of a template of a first web page;
detecting, using a processor, non-essential information of a second web page based on at least the structure of the first web page; and
generating instructions for extracting content from the second web page, based on at least the detected non-essential information, whereby the extracted content does not include the detected non-essential information,
wherein the analyzing comprises analyzing structures of a plurality of first web pages to identify a repeating pattern of elements of the template within the plurality of first web pages.
Dependent claims: 22, 23
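The three steps recited in the independent claims (analyze template structure in one or more first pages, detect non-essential information in a second page, generate extraction instructions) can be illustrated with a minimal sketch. This is not the patented implementation. It assumes that a template element can be approximated as a text node whose tag path and text repeat in every sample page, and every name in it (`TextNodeCollector`, `learn_template`, `extraction_instructions`, `extract_content`, the sample pages) is hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextNodeCollector(HTMLParser):
    """Collects (path, text) pairs, where path is the chain of open tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def text_nodes(html):
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_template(first_pages):
    """Assumption: a template element is a (path, text) pair present in every sample page."""
    return set.intersection(*(set(text_nodes(page)) for page in first_pages))


def extraction_instructions(template):
    """Express the detected template as a list of 'skip' rules."""
    return [{"action": "skip", "path": p, "text": t} for p, t in sorted(template)]


def extract_content(second_page, instructions):
    """Apply the rules to a second page, keeping everything that is not template."""
    skip = {(r["path"], r["text"]) for r in instructions if r["action"] == "skip"}
    return [text for path, text in text_nodes(second_page) if (path, text) not in skip]


def make_page(title, price):
    # Hypothetical sample page: shared navigation and footer, varying main content.
    return (
        "<html><body><div id='nav'><a>Home</a><a>About</a></div>"
        f"<div id='main'><h1>{title}</h1><p>Price: {price}</p></div>"
        "<div id='footer'>Copyright Example Inc.</div></body></html>"
    )


rules = extraction_instructions(
    learn_template([make_page("Red shoes", 40), make_page("Blue hat", 15)])
)
content = extract_content(make_page("Green scarf", 22), rules)
print(content)
```

Intersecting the text nodes of several first pages corresponds to the "repeating pattern" limitation of claims 8, 18, and 21. Returning the rules as plain data, rather than extracting directly, mirrors the claims' separation between generating instructions and applying them.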
Specification