Method and apparatus for defining data of interest
Abstract
Some embodiments of the invention include tools for extracting data of interest from the World Wide Web (WWW). The extraction is accomplished using descriptions of data of interest. The descriptions of data of interest can include computer programs comprising a sequence of instructions and extractor patterns. The extractor patterns can be developed interactively using a web browser integrated into the graphical development environment for creating the descriptions of data of interest. The instructions can be selected from a predetermined list of instructions designed for extracting information from the WWW. The descriptions of data of interest can be grouped into categories sharing common query elements.
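As an illustration only, a description of data of interest might be modeled as a short program whose instructions are drawn from a fixed list and whose arguments include extractor patterns. The sketch below is an assumption, not the patented implementation: the instruction names `load` and `extract`, the `Description` class, the `run` helper, and the example URL are all hypothetical, and page fetching is replaced by a dictionary lookup.

```python
import re
from dataclasses import dataclass

# Hypothetical instruction set. The abstract says only that instructions
# come from a predetermined list, not what that list contains.
ALLOWED_INSTRUCTIONS = {"load", "extract"}


@dataclass
class Description:
    """A description of data of interest: a category plus a small program."""
    category: str
    instructions: list  # list of (instruction name, argument) tuples


def run(description, pages):
    """Execute a description against `pages`, a dict of URL -> page text."""
    text, results = "", []
    for name, arg in description.instructions:
        if name not in ALLOWED_INSTRUCTIONS:
            raise ValueError(f"unknown instruction: {name}")
        if name == "load":
            text = pages[arg]  # stand-in for fetching a web page
        elif name == "extract":
            # `arg` is an extractor pattern written as a regular expression
            results.extend(re.findall(arg, text))
    return results


# Made-up page content used in place of a live web request.
PAGES = {"http://example.com/quote": "<b>ACME</b> last trade: <i>12.50</i>"}

quote = Description(
    category="stock quotes",
    instructions=[
        ("load", "http://example.com/quote"),
        ("extract", r"<i>([\d.]+)</i>"),
    ],
)
prices = run(quote, PAGES)
```

Restricting programs to a small, known instruction list keeps each description easy to validate, and the `category` field gives a place to group descriptions that share query elements.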
27 Claims
1. A computer-implemented method for extracting data from a data source by one or more computing devices, the method comprising:
applying, by at least one of the one or more computing devices, an extractor pattern to a computer-readable data source, wherein the extractor pattern is generated based at least in part on a prior identification of data of interest and wherein the extractor pattern includes one or more regular expressions which are configured to identify additional data of interest in the computer-readable data source;
retrieving, by at least one of the one or more computing devices, the additional data of interest from the computer-readable data source using the one or more regular expressions; and
storing, by at least one of the one or more computing devices, the additional data of interest in a data storage device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. An apparatus for extracting data from a data source, the apparatus comprising:
one or more processors; and
one or more memories operatively coupled to at least one of the one or more processors and having instructions stored thereon that, when executed by at least one of the one or more processors, cause at least one of the one or more processors to:
apply an extractor pattern to a computer-readable data source, wherein the extractor pattern is generated based at least in part on a prior identification of data of interest and wherein the extractor pattern includes one or more regular expressions which are configured to identify additional data of interest in the computer-readable data source;
retrieve the additional data of interest from the computer-readable data source using the one or more regular expressions; and
store the additional data of interest in a data storage device.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. At least one non-transitory computer-readable medium storing computer-readable instructions that, when executed by one or more computing devices, cause at least one of the one or more computing devices to:
apply an extractor pattern to a computer-readable data source, wherein the extractor pattern is generated based at least in part on a prior identification of data of interest and wherein the extractor pattern includes one or more regular expressions which are configured to identify additional data of interest in the computer-readable data source;
retrieve the additional data of interest from the computer-readable data source using the one or more regular expressions; and
store the additional data of interest in a data storage device.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27
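The apply, retrieve, and store steps recited in the independent claims can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the claimed method itself: the claims do not say how the pattern is generated from the prior identification, so `build_extractor_pattern` simply keeps the literal text around a previously identified example value and swaps the value for a capture group. The function names, the HTML snippets, and the use of an in-memory SQLite database as the data storage device are all hypothetical.

```python
import re
import sqlite3


def build_extractor_pattern(sample_snippet, example_value):
    """Derive a regex from a previously identified value.

    Assumption: `sample_snippet` is a short fragment surrounding the
    example. Its literal text is kept and the example itself is replaced
    with a non-greedy capture group.
    """
    start = sample_snippet.index(example_value)
    end = start + len(example_value)
    prefix, suffix = sample_snippet[:start], sample_snippet[end:]
    return re.escape(prefix) + r"(.+?)" + re.escape(suffix)


def extract_and_store(pattern, source_text, connection):
    """Apply the pattern, retrieve the matches, and store them."""
    values = re.findall(pattern, source_text)
    connection.execute("CREATE TABLE IF NOT EXISTS extracted (value TEXT)")
    connection.executemany(
        "INSERT INTO extracted (value) VALUES (?)", [(v,) for v in values]
    )
    connection.commit()
    return values


# Prior identification: a user marked "19.99" in this (made-up) snippet.
pattern = build_extractor_pattern('<td class="p">19.99</td>', "19.99")

# A different (made-up) data source containing additional data of interest.
source = '<tr><td class="p">4.25</td><td class="p">107.00</td></tr>'

conn = sqlite3.connect(":memory:")  # stand-in for a data storage device
values = extract_and_store(pattern, source, conn)
stored = [row[0] for row in conn.execute("SELECT value FROM extracted ORDER BY rowid")]
```

Escaping the surrounding text with `re.escape` means markup characters in the sample are matched literally, so only the capture group varies between matches.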
Specification