Method and apparatus for defining data of interest
Abstract
Some embodiments of the invention include tools for extracting data of interest from the world wide web (WWW). The extraction is accomplished using descriptions of data of interest. The descriptions of data of interest can include computer programs comprising a sequence of instructions and extractor patterns. The extractor patterns can be developed interactively using a web browser integrated into the graphical development environment for creating the descriptions of data of interest. The instructions can be selected from a predetermined list of instructions designed for extracting information from the WWW. The descriptions of data of interest can be grouped into categories sharing common query elements. Multiple descriptions of data of interest in the same category can be executed simultaneously using the same query. The descriptions of data of interest can be accessed by a client computer using a web browser to initiate a query. In some embodiments, the descriptions of data of interest are used to provide information about products available for sale over the WWW.
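As an illustration only, the flow summarized above (a description of data of interest carrying an extraction parameter and a regular-expression extractor pattern, used to query a site, extract matches, and store them) might be sketched as follows. This is not the patented implementation: the `Description` class, the `run_description` function, the `shop.example` URL, the sample page markup, and the in-memory dictionary used for storage are all hypothetical names and choices made for this example. The page fetch is passed in as a function so the sketch runs without network access.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlencode


@dataclass
class Description:
    """Hypothetical 'description of data of interest' (names are assumptions)."""
    name: str        # label under which extracted rows are stored
    base_url: str    # site to query
    parameter: str   # name of the extraction parameter in the query string
    pattern: str     # extractor pattern: a regular expression with named groups


def run_description(
    desc: Description,
    value: str,
    fetch: Callable[[str], str],
    store: Dict[str, List[dict]],
) -> List[dict]:
    # Query: build the request from the extraction parameter and its value.
    url = f"{desc.base_url}?{urlencode({desc.parameter: value})}"
    page = fetch(url)
    # Extract: apply the regular expression to the site's output.
    rows = [m.groupdict() for m in re.finditer(desc.pattern, page)]
    # Store: keep the extracted rows under the description's name.
    store.setdefault(desc.name, []).extend(rows)
    return rows


def fake_fetch(url: str) -> str:
    # Stand-in for an HTTP request; returns canned markup (an assumption).
    return ("<li><b>Red Widget</b> $9.99</li>"
            "<li><b>Blue Widget</b> $12.50</li>")


prices = Description(
    name="widget_prices",
    base_url="https://shop.example/search",
    parameter="q",
    pattern=r"<b>(?P<title>[^<]+)</b>\s*\$(?P<price>\d+\.\d{2})",
)

store: Dict[str, List[dict]] = {}
rows = run_description(prices, "widget", fake_fetch, store)
print(rows)
```

Under these assumptions, several `Description` objects that share the same `parameter` could be run with one query value, which loosely mirrors the grouping into categories described above.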
27 Claims
1. A computer-implemented method executed by one or more computing devices for extracting data of interest to a user from a web site, the method comprising:
receiving, by at least one of the one or more computing devices, a description of data of interest from a user, the description of the data of interest being associated with an extraction parameter;
querying, by at least one of the one or more computing devices, a web site using a value of the extraction parameter and an extraction pattern, the extraction pattern being associated with the description of data of interest, wherein the extraction pattern is adapted to identify at least a portion of an output of a web site and extract information from one or more web pages associated with the web site, and wherein the extraction pattern comprises a regular expression;
extracting, by at least one of the one or more computing devices, the data of interest from the web site based on the query; and
storing, by at least one of the one or more computing devices, the extracted data of interest.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A system for extracting data of interest to a user from a web site, the system comprising:
a processor; and
memory operatively coupled to the processor and containing instructions that, when executed by the processor, cause the processor to carry out the steps of:
receiving a description of data of interest from a user, the description of the data of interest being associated with an extraction parameter;
querying a web site using a value of the extraction parameter and an extraction pattern, the extraction pattern being associated with the description of data of interest, wherein the extraction pattern is adapted to identify at least a portion of an output of a web site and extract information from one or more web pages associated with the web site, and wherein the extraction pattern comprises a regular expression;
extracting the data of interest from the web site based on the query; and
storing the extracted data of interest.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.
19. Non-transitory computer readable media having instructions recorded thereon that, when executed by a processor, cause the processor to carry out a method for extracting data of interest to a user from a web site, the method comprising the steps of:
receiving a description of data of interest from a user, the description of the data of interest being associated with an extraction parameter;
querying a web site using a value of the extraction parameter and an extraction pattern, the extraction pattern being associated with the description of data of interest, wherein the extraction pattern is adapted to identify at least a portion of an output of a web site and extract information from one or more web pages associated with the web site, and wherein the extraction pattern comprises a regular expression;
extracting the data of interest from the web site based on the query; and
storing the extracted data of interest.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27.
Specification