Similarity and ranking of databases based on database metadata

US 10,303,793 B2
Filed: 11/12/2014
Issued: 05/28/2019
Est. Priority Date: 03/19/2014
Status: Active Grant

Abstract

A processor selects a first database and a second database from a plurality of databases. The processor determines one or more terms found in the first and second database, wherein each term of the one or more terms includes metadata of a database of the plurality of databases. The processor identifies one or more common terms between the first database and the second database and determines the one or more common terms found in each of a plurality of groups of databases of the plurality of databases, wherein each group of databases corresponds to a number of databases which constitute the group of databases. The processor determines a similarity score between the first database and the second database of the plurality of databases based on the one or more common terms found in each group of databases of the plurality of databases.
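The grouping step described in the abstract can be illustrated with a small sketch. This is one possible reading of the text, not the patented implementation: each database is modeled as a set of metadata term strings, and the function name `group_tuple`, the term format, and the sample catalog are all assumptions made for illustration.

```python
from collections import Counter

def group_tuple(first, second, catalog):
    """Count the common terms of two databases per group size.

    catalog maps a database name to the set of metadata terms found in it
    (hypothetical representation). Element i of the result is the number of
    common terms found in exactly i + 2 databases of the catalog, so the
    tuple covers group sizes 2 through len(catalog).
    """
    n = len(catalog)
    common = catalog[first] & catalog[second]
    counts = Counter()
    for term in common:
        # Group size: how many databases in the catalog contain this term.
        size = sum(1 for terms in catalog.values() if term in terms)
        counts[size] += 1
    return tuple(counts[size] for size in range(2, n + 1))

# Illustrative catalog of four databases (made-up terms).
catalog = {
    "A": {"customer.id:int", "customer.name:varchar", "orders.id:int"},
    "B": {"customer.id:int", "customer.name:varchar", "invoice.id:int"},
    "C": {"customer.id:int", "employee.id:int"},
    "D": {"product.sku:char"},
}

result = group_tuple("A", "B", catalog)
print(result)
```

In this sample, A and B share two terms: one is found in two databases and the other in three, so the tuple has one entry in each of those groups and none in the group of four.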

16 Claims

1. A method for determining similarity of databases, the method comprising:
- (a) one or more processors selecting a first database and a second database from a plurality of databases that includes one or more additional databases;
  
  (b) one or more processors identifying one or more terms found in the first database and found in the second database of the plurality of databases as one or more common terms, wherein each term of the one or more terms is comprised of metadata of a structure of a database of the plurality of databases that defines the objects in the database;
  
  (c) one or more processors determining for a common term of the one or more common terms, a quantity of databases of the plurality of databases in which the common term is found, wherein the quantity of databases in which the common term of the one or more common terms is found constitutes a group, and wherein a range of groups includes each quantity of databases, from a group of two databases to a group of a quantity of databases equal to the plurality of databases; and
  
  (d) one or more processors determining a similarity score between the first database and the second database of the plurality of databases based on a tuple formed from the quantity of the one or more common terms found in each group of the range of groups.
- - 2. The method of claim 1, further comprising:
    - performing steps (a) through (d) for each pairing of the first database with each database of the plurality of databases other than the second database.
  - 3. The method of claim 2, further comprising:
    - ranking the similarity scores for each pairing of the first database with each database of the plurality of databases.
  - 4. The method of claim 1, further comprising:
    - performing steps (a) through (d) on all pairings of databases of the plurality of databases, other than pairings with the first database; and
      
      ranking the similarity scores of all the pairings of the plurality of databases, other than pairings with the first database.
  - 5. The method of claim 1, further comprising:
    - creating a histogram based on the quantity of the one or more common terms found in each group of databases of the plurality of groups of databases, wherein the histogram is associated with a similarity of the second database to the first database.
  - 6. The method of claim 1, wherein each of the one or more common terms includes a triplet comprised of a database table name, a database table column name, and a database table column type.
  - 7. The method of claim 1, wherein at least one term of the one or more terms is a hash derived from the metadata of the database of the plurality of databases.
  - 8. The method of claim 1, wherein determining the one or more common terms includes determining a partial match of the one or more common terms between the first database and the second database.
  - 9. The method of claim 1, wherein determining the similarity score between the first database and the second database of the plurality of databases includes generating a tuple in which each element of the tuple corresponds to the quantity of the one or more common terms found in a group of a particular quantity of databases, and applying weighting factors to each element of the tuple.
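Claims 2, 3, 6, 7, and 9 can be combined into a short sketch: terms are hashed (table, column, type) triplets, the per-group tuple is weighted, and the resulting scores are ranked. The claims do not state which weighting factors are used, so the `1 / group size` weights below are purely an assumption (terms shared by fewer databases count more), as are the helper names `term_hash`, `score`, and `rank` and the sample schemas.

```python
import hashlib
from collections import Counter

def term_hash(table, column, col_type):
    """Hash a (table, column, type) triplet into a single term string."""
    key = "\x1f".join((table, column, col_type))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def score(first, second, catalog, weights=None):
    """Weighted sum over the tuple of common-term counts per group size."""
    n = len(catalog)
    common = catalog[first] & catalog[second]
    # For each common term, count the databases that contain it.
    counts = Counter(
        sum(1 for terms in catalog.values() if term in terms)
        for term in common
    )
    tup = tuple(counts[size] for size in range(2, n + 1))
    if weights is None:
        # Assumed weighting, not taken from the patent text.
        weights = [1.0 / size for size in range(2, n + 1)]
    return sum(w * c for w, c in zip(weights, tup))

def rank(first, catalog, weights=None):
    """Score every pairing with `first` and sort from most to least similar."""
    scores = [
        (other, score(first, other, catalog, weights))
        for other in catalog
        if other != first
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Made-up schemas: database name -> list of (table, column, type) triplets.
schemas = {
    "sales": [("customer", "id", "int"), ("customer", "name", "varchar"),
              ("orders", "id", "int")],
    "crm": [("customer", "id", "int"), ("customer", "name", "varchar")],
    "hr": [("customer", "id", "int"), ("employee", "id", "int")],
}
catalog = {name: {term_hash(*t) for t in cols} for name, cols in schemas.items()}

ranking = rank("sales", catalog)
print(ranking)
```

Passing an explicit `weights` list (one factor per group size from 2 up to the number of databases) replaces the assumed default, which is where a different weighting scheme would plug in.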

10. A method for determining a similarity of databases to search criteria, the method comprising:
- (a) one or more processors receiving search criteria, wherein the search criteria includes one or more terms, wherein the one or more terms are comprised of metadata of a structure of a database of the plurality of databases that defines the objects in the database, and wherein the one or more terms are selected from a list presented to a user;
  
  (b) one or more processors determining the one or more terms found in both the search criteria and a first database of a plurality of databases, wherein the one or more terms found in both the search criteria and a first database are one or more common terms;
  
  (c) one or more processors determining, a quantity of the one or more common terms found in each of a plurality of groups of databases of the plurality of databases, wherein a group of databases of the plurality of groups of databases includes a quantity of databases in which a common term of the one or more common terms is found, and wherein a range of the plurality of groups of databases extends from two databases to the quantity of the plurality of databases; and
  
  (d) one or more processors determining a similarity score of the first database of the plurality of databases to the search criteria, based on a tuple formed from the quantity of the one or more common terms found in each group of the range of groups of databases, wherein the similarity of the first database to the search criteria is based on the similarity score.
- - 11. The method of claim 10, wherein determining a similarity score of the first database, further comprises:
    - performing steps (a) through (d) for each pairing of the search criteria and each database of the plurality of databases other than the first database.
  - 12. The method of claim 11, further comprising:
    - ranking the similarity of each database of the plurality of databases to the search criteria, based on the similarity score of each database of the plurality of databases.
  - 13. The method of claim 10, wherein the one or more terms of the search criteria includes a triplet comprised of metadata of the structure of the first database of the plurality of databases, the metadata containing at least one of:
    - a database table name, a database table column name, and a database table column type.
  - 14. The method of claim 10, wherein the search criteria is a hash derived from the one or more terms of the search criteria.
  - 15. The method of claim 10, wherein determining the one or more common terms includes determining a partial match of the one or more common terms between the search criteria and the first database.
  - 16. The method of claim 10, wherein determining the similarity score between the first database and the second database of the plurality of databases includes generating a tuple in which each element of the tuple corresponds to the quantity of the one or more common terms found in a group of a particular quantity of databases, and applying weighting factors to each element of the tuple.
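The search-criteria variant in claims 10 through 12 can be sketched along the same lines. This is a simplified reading under stated assumptions: the search criteria and each database are plain sets of term strings, the weighting of `1 / group size` is assumed rather than taken from the claims, and the function name `rank_by_criteria` and the sample data are hypothetical. Because the claimed range of groups starts at two databases, a matching term found in only one database contributes nothing in this sketch.

```python
from collections import Counter

def rank_by_criteria(criteria, catalog):
    """Rank databases by similarity to a set of search terms.

    Returns (name, tuple, score) triples sorted from highest to lowest score.
    """
    n = len(catalog)
    # Number of databases containing each term, computed once per catalog.
    spread = Counter(term for terms in catalog.values() for term in terms)
    results = []
    for name, terms in catalog.items():
        common = criteria & terms
        counts = Counter(spread[term] for term in common)
        # Group sizes 2..n; terms found in a single database fall outside.
        tup = tuple(counts[size] for size in range(2, n + 1))
        # Assumed weighting: 1 / group size.
        total = sum(c / size for size, c in zip(range(2, n + 1), tup))
        results.append((name, tup, total))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Made-up catalog and search criteria.
catalog = {
    "sales": {"customer.id:int", "customer.name:varchar", "orders.id:int"},
    "crm": {"customer.id:int", "customer.name:varchar"},
    "hr": {"customer.id:int", "employee.id:int"},
}
criteria = {"customer.name:varchar", "employee.id:int"}

results = rank_by_criteria(criteria, catalog)
for name, tup, total in results:
    print(name, tup, total)
```

Here `hr` matches one search term, but that term appears in no other database, so its tuple is empty of counts and it ranks last.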

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Bhagavan, Srini; Kiernan, Gerald G.
Primary Examiner(s): Hershley, Mark E
Application Number: US14/538,882
Publication Number: US 20150269161A1
Time in Patent Office: 1,658 Days
Field of Search: 707750
US Class Current
CPC Class Codes

G06F 16/152   using file content signatur...

G06F 16/24578   using ranking

G06F 16/285   Clustering or classification

G06F 16/951   Indexing; Web crawling tech...

G06F 16/9535   Search customisation based ...

G06F 16/9538   Presentation of query results
