Method and apparatus for structuring the querying and interpretation of semistructured information
Abstract
A method is provided for determining how semistructured information is organized in disparate semistructured resources by providing a wrapper to extract information and to provide structured information (e.g., tuples of an SQL database) to a mapper coupled to a standard relational database engine. In a specific embodiment, a querying agent is provided on top of the mapper. Further according to the invention, structured high-level user queries are processed across the disparate semistructured resources using a plurality of wrappers each dedicated to a particular resource.
24 Claims
1. A method of generating a wrapper for accessing semistructured information in order to provide access to at least one of a plurality of attributes in said semistructured information as tuples for a relational database system, said method comprising the steps of:
examining the semistructured information to identify patterns of interest, said patterns of interest including at least one of said attributes;
generating a description file, including regular expressions, for the patterns of interest, said regular expressions specifying at least one of a plurality of locations of the attributes within the semistructured information; and
generating said wrapper based upon the description file.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
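The three steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `DESCRIPTION` dictionary stands in for the description file, and `generate_wrapper`, the record separator, and the sample text are hypothetical names and data chosen only for illustration.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a description file: each attribute name maps to a
# regular expression whose first capture group locates that attribute.
DESCRIPTION: Dict[str, str] = {
    "title": r"<b>(.*?)</b>",
    "price": r"\$(\d+\.\d{2})",
}


def generate_wrapper(
    description: Dict[str, str], record_sep: str
) -> Callable[[str], List[Tuple[str, ...]]]:
    """Build a wrapper function from a description of attribute patterns."""
    compiled = {name: re.compile(rx, re.S) for name, rx in description.items()}
    splitter = re.compile(record_sep)

    def wrapper(text: str) -> List[Tuple[str, ...]]:
        rows = []
        # Treat each chunk between separators as one candidate record.
        for chunk in splitter.split(text):
            matches = [rx.search(chunk) for rx in compiled.values()]
            # Emit a tuple only when every attribute was located in the chunk.
            if all(matches):
                rows.append(tuple(m.group(1) for m in matches))
        return rows

    return wrapper


# Assumed sample of semistructured text (not taken from the patent).
SAMPLE = "<li><b>Widget</b> costs $9.99</li><li><b>Gadget</b> costs $24.50</li>"
wrapper = generate_wrapper(DESCRIPTION, r"</li>")
print(wrapper(SAMPLE))  # [('Widget', '9.99'), ('Gadget', '24.50')]
```

Each returned tuple follows the attribute order of the description, so it could be inserted directly into a table whose columns follow the same order.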
9. A method for generating a parser for parsing semistructured information, said parsing to produce an input to a relational database by structuring, querying and interpreting said semistructured information to form at least one of a plurality of organized tuples, said method comprising the steps of:
a) lexically analyzing the semistructured information to identify patterns of interest;
b) cataloging, without a priori information, the patterns of interest, said cataloging further comprising:
  associating a name and a position with each pattern of interest;
  providing a nested structure;
  incorporating each pattern of interest along with said name and said position into said nested structure;
c) examining the patterns of interest in the nested structure to identify attributes that correspond to fields of a relational schema of a relational database;
d) further examining the patterns of interest in the nested structure to identify embedded patterns of interest upon which to apply the cataloging step;
e) yet further examining the patterns of interest in the nested structure to identify links to other semistructured information sources to examine; thereupon, applying the lexically analyzing, cataloging, examining, further examining and yet further examining steps to said other semistructured information sources;
f) forming a plurality of regular expressions of the semistructured information in the nested structure;
g) providing said plurality of regular expressions in a definition for use by a dedicated program translator; and
h) providing the definition as input to the program translator for building said parser for the relational database.
Dependent claims: 10, 11, 12
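Steps a), b) and e) of claim 9 can be sketched as follows, assuming the semistructured source is HTML-like text. The `Pattern` class, the `TOKEN` expression, and the `catalog` and `links` helpers are hypothetical illustrations, not the patent's own structures, and the sketch ignores void tags such as `<br>` that never close.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pattern:
    name: str        # tag name, or "text" for character data
    position: int    # character offset of the pattern in the source
    text: str = ""   # raw tag text or stripped character data
    children: List["Pattern"] = field(default_factory=list)


# Lexical analysis: a token is either a tag (opening or closing) or a text run.
TOKEN = re.compile(r"<(/?)(\w+)[^>]*>|([^<]+)")


def catalog(source: str) -> Pattern:
    """Record every pattern, with its name and position, in a nested structure."""
    root = Pattern("root", 0)
    stack = [root]
    for m in TOKEN.finditer(source):
        closing, tag, text = m.group(1), m.group(2), m.group(3)
        if text is not None:
            if text.strip():
                stack[-1].children.append(Pattern("text", m.start(), text.strip()))
        elif closing:
            # Leave the current nesting level only on a matching closing tag.
            if len(stack) > 1 and stack[-1].name == tag:
                stack.pop()
        else:
            node = Pattern(tag, m.start(), m.group(0))
            stack[-1].children.append(node)
            stack.append(node)
    return root


def links(node: Pattern) -> List[str]:
    """Collect link targets that point at further sources to examine."""
    found = []
    if node.name == "a":
        href = re.search(r'href="([^"]*)"', node.text)
        if href:
            found.append(href.group(1))
    for child in node.children:
        found.extend(links(child))
    return found


# Assumed sample source (not taken from the patent).
SOURCE = '<ul><li>Widget <a href="widget.html">details</a></li></ul>'
tree = catalog(SOURCE)
print(links(tree))  # ['widget.html']
```

In a fuller version, each target returned by `links` would be fetched and passed back through `catalog`, mirroring the recursive application described in step e).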
13. A method for responding to a single high-level structured user query over a plurality of disparate semistructured information resources, comprising the steps of:
providing a wrapper for each one of the disparate semistructured information resources, each wrapper employing a definition of semistructured information for a specific semistructured information resource, each wrapper created by:
  examining the semistructured information to identify patterns of interest, said patterns of interest including at least one of said attributes;
  generating a description file, including regular expressions, for the patterns of interest, said regular expressions specifying at least one of a plurality of locations of the attributes within the semistructured information; and
  generating said wrapper based upon the description file;
thereupon issuing the user query through the wrappers to the plurality of disparate semistructured information resources;
receiving tuples from each one of the wrappers in response to the query;
providing the tuples to a relational database; and
executing the query on the relational database to return results to the user.
Dependent claims: 14, 15, 16
17. A method for responding to a single high-level structured user query over a plurality of disparate semistructured information resources, comprising the steps of:
providing a wrapper for each one of the disparate semistructured information resources, each wrapper employing a definition of semistructured information for a specific semistructured information resource, each wrapper created by:
  examining the semistructured information to identify patterns of interest, said patterns of interest including at least one of said attributes;
  generating a description file, including regular expressions, for the patterns of interest, said regular expressions specifying at least one of a plurality of locations of the attributes within the semistructured information; and
  generating said wrapper based upon the description file;
thereupon issuing a preselected set of load queries through the wrappers to the plurality of disparate semistructured information resources to collect tuples;
receiving the tuples from each one of the wrappers in response to the query;
storing the tuples in a common relational database; and
executing the single user query on the common relational database to return results to the user.
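The load-then-query flow of claim 17 can be sketched with Python's built-in `sqlite3` module standing in for the common relational database. Everything here is assumed for illustration: the two in-memory sources, their regular expressions, the `offers` table, and the `make_wrapper` helper. The loop that runs each wrapper over its source plays the role of the preselected load queries.

```python
import re
import sqlite3
from typing import Callable, List, Tuple


def make_wrapper(pattern: str) -> Callable[[str], List[Tuple[str, float]]]:
    """Return a wrapper that yields (name, price) tuples from one resource."""
    rx = re.compile(pattern)

    def wrapper(text: str) -> List[Tuple[str, float]]:
        return [(m.group("name"), float(m.group("price"))) for m in rx.finditer(text)]

    return wrapper


# Two hypothetical resources with different layouts, each paired with the
# regular expression its dedicated wrapper uses.
SOURCES = {
    "site_a": (
        "Widget: $9.99\nGadget: $24.50\n",
        r"(?P<name>\w+): \$(?P<price>[\d.]+)",
    ),
    "site_b": (
        "<tr><td>Widget</td><td>8.75</td></tr>",
        r"<td>(?P<name>\w+)</td><td>(?P<price>[\d.]+)</td>",
    ),
}

# Common relational database that receives tuples from every wrapper.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE offers (source TEXT, name TEXT, price REAL)")

# Load step: run each wrapper over its resource and store the tuples.
for source, (text, pattern) in SOURCES.items():
    rows = make_wrapper(pattern)(text)
    db.executemany(
        "INSERT INTO offers VALUES (?, ?, ?)",
        [(source, name, price) for name, price in rows],
    )

# A single structured user query executed over the combined tuples.
result = db.execute(
    "SELECT source, price FROM offers WHERE name = ? ORDER BY price",
    ("Widget",),
).fetchall()
print(result)  # [('site_b', 8.75), ('site_a', 9.99)]
```

Because the query runs against the stored tuples, the user sees one relational view even though the underlying resources use unrelated layouts.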
18. A method of generating a wrapper for accessing information in a web page, comprising the steps of:
examining the web page to identify patterns of interest that include attributes;
generating a description file including regular expressions for the patterns of interest which specify locations of the attributes within the web page; and
generating a wrapper based upon the description file, said wrapper providing access to the attributes in the web page as tuples for a relational database system.
Dependent claims: 19, 20, 21, 22, 23, 24
Specification