Method and system for schema matching of web databases
Abstract
A method and system for identifying schemas of web databases is provided. A schema matching system generates a mapping between an interface schema and a result schema of a web database, which is used to represent the underlying database schema. The schema matching system also generates a mapping of the interface attributes and the result attributes of the web database to global attributes of a global schema whose semantics are known. Using these mappings, a search engine service can formulate queries using the global attributes, map those queries to the corresponding interface attributes, submit the query, and retrieve the values from the result attributes that correspond to the desired global attributes.
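The query flow described in the abstract can be illustrated with a short sketch. It assumes the two mappings (global attribute to interface attribute, and global attribute to result attribute) have already been produced by the schema matching system; the function names, attribute names, and sample record below are hypothetical and are not taken from the patent.

```python
# Hypothetical mappings, as a schema matcher might produce for one book site.
GLOBAL_TO_INTERFACE = {"Title": "q_title", "Author": "q_author"}
GLOBAL_TO_RESULT = {"Title": "Name", "Author": "Writer", "Price": "Cost"}


def translate_query(global_query, global_to_interface):
    """Rewrite a query keyed by global attributes into interface attributes.

    Global attributes that have no interface counterpart are dropped.
    """
    return {
        global_to_interface[g]: value
        for g, value in global_query.items()
        if g in global_to_interface
    }


def extract_global_values(result_record, global_to_result, wanted):
    """Pull the values of the wanted global attributes out of one result record."""
    extracted = {}
    for g in wanted:
        result_attr = global_to_result.get(g)
        if result_attr in result_record:
            extracted[g] = result_record[result_attr]
    return extracted


# A query phrased in global attributes, ready to submit through the site's form.
form_fields = translate_query({"Author": "Jane Austen"}, GLOBAL_TO_INTERFACE)

# One (made-up) record returned by the site, read back in global terms.
record = {"Name": "Emma", "Writer": "Jane Austen", "Cost": "7.99"}
answer = extract_global_values(record, GLOBAL_TO_RESULT, ["Title", "Price"])
```

Keeping the two mappings separate mirrors the abstract: the interface mapping is used only on the way in, and the result mapping only on the way out.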
15 Claims
1. A method in a computer system for generating an occurrence cube, the method comprising:
for each global attribute of a domain of a database, for each interface attribute of the database, submitting queries to the database, each query having a value of the interface attribute of the database set to a global attribute value of the global attribute of the domain of the database; and
for each result of each submitted query, counting the number of times the value of the global attribute occurs within each result attribute of the result; and
for each global attribute, interface attribute, and result attribute combination, storing as an element of the occurrence cube an accumulation of the counts of the number of times the value of the global attribute occurs within each result attribute in a result from a query submitted with the interface attribute set to a global attribute value of the global attribute, wherein the stored elements form the occurrence cube.
Dependent claims: 2, 3, 4, 5, 6
7. A method in a computer system for identifying attributes of a database within a domain, the method comprising:
providing counts of occurrences associated with global attributes of a global schema of the domain and interface attributes of an interface schema and result attributes of a result schema of the database, each count representing, for each global attribute, interface attribute, and result attribute combination, the number of occurrences in which a global attribute value for the global attribute occurs as a value of the result attribute in a result of a query submitted to the database with the interface attribute set to the global attribute value;
estimating mutual information between pairs of schemas based on the provided counts;
identifying from the estimated mutual information which attributes match; and
storing an indication of the matching attributes.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15
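The counting recited in claim 1 and the mutual-information step recited in claim 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy `BOOKS` database, `toy_submit_query`, and all attribute names are hypothetical, occurrences are counted by case-insensitive substring matching, and the per-cell estimate p(x,y) log(p(x,y) / (p(x) p(y))) with a "largest in both its row and its column" matching rule is one plausible reading of "estimating mutual information", assumed here rather than taken from the claims.

```python
import math
from collections import defaultdict

# Hypothetical toy database and search form (not from the patent).
BOOKS = [
    {"Name": "Dune", "Writer": "Frank Herbert"},
    {"Name": "Emma", "Writer": "Jane Austen"},
    {"Name": "Persuasion", "Writer": "Jane Austen"},
]
FIELD_FOR_BOX = {"q_title": "Name", "q_author": "Writer"}


def toy_submit_query(interface_attr, value):
    """Return the records whose searched field contains the value."""
    field = FIELD_FOR_BOX[interface_attr]
    return [rec for rec in BOOKS if value.lower() in rec[field].lower()]


def build_occurrence_cube(global_values, interface_attrs, submit_query):
    """Accumulate counts keyed by (global, interface, result) attribute."""
    cube = defaultdict(int)
    for g, values in global_values.items():
        for i in interface_attrs:
            for v in values:
                for record in submit_query(i, v):
                    for r, text in record.items():
                        cube[(g, i, r)] += text.lower().count(v.lower())
    return dict(cube)


def project(cube, axis_a, axis_b):
    """Sum the cube over the remaining axis to get a 2-D count matrix."""
    matrix = defaultdict(int)
    for key, count in cube.items():
        matrix[(key[axis_a], key[axis_b])] += count
    return dict(matrix)


def estimate_mi(matrix):
    """Per-cell mutual-information estimate from normalized counts."""
    total = sum(matrix.values())
    if total == 0:
        return {}
    row, col = defaultdict(int), defaultdict(int)
    for (a, b), c in matrix.items():
        row[a] += c
        col[b] += c
    mi = {}
    for (a, b), c in matrix.items():
        p = c / total
        mi[(a, b)] = 0.0 if c == 0 else p * math.log(
            p / ((row[a] / total) * (col[b] / total)))
    return mi


def match(mi):
    """Keep pairs whose score is positive and the maximum of its row and column."""
    best_row, best_col = {}, {}
    for (a, b), s in mi.items():
        if a not in best_row or s > mi[(a, best_row[a])]:
            best_row[a] = b
        if b not in best_col or s > mi[(best_col[b], b)]:
            best_col[b] = a
    return sorted((a, b) for a, b in best_row.items()
                  if best_col[b] == a and mi[(a, b)] > 0)


global_values = {"Title": ["Dune", "Emma"],
                 "Author": ["Jane Austen", "Frank Herbert"]}
cube = build_occurrence_cube(global_values, ["q_title", "q_author"], toy_submit_query)
interface_result = match(estimate_mi(project(cube, 1, 2)))
global_interface = match(estimate_mi(project(cube, 0, 1)))
global_result = match(estimate_mi(project(cube, 0, 2)))
```

Projecting the cube onto two axes at a time gives one count matrix per pair of schemas (global/interface, global/result, interface/result), so the same estimate and matching rule can be reused for all three pairs.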
Specification