SIMILARITY AND RANKING OF DATABASES BASED ON DATABASE METADATA

US 20150269154A1
Filed: 03/19/2014
Published: 09/24/2015
Est. Priority Date: 03/19/2014
Status: Active Grant

Abstract

A processor selects a first database and a second database from a plurality of databases. The processor determines one or more terms found in the first and second database, wherein each term of the one or more terms includes metadata of a database of the plurality of databases. The processor identifies one or more common terms between the first database and the second database and determines the one or more common terms found in each of a plurality of groups of databases of the plurality of databases, wherein each group of databases corresponds to a number of databases which constitute the group of databases. The processor determines a similarity score between the first database and the second database of the plurality of databases based on the one or more common terms found in each group of databases of the plurality of databases.
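
The abstract's flow can be illustrated with a minimal sketch. It finds the terms common to two databases, groups each common term by how many databases in the collection contain it, and derives a score from the per-group quantities. The abstract does not give a scoring formula, so the 1/k weighting below (a term shared by k databases contributes 1/k) is an assumption made for illustration, as are the names `catalog`, `common_terms`, `group_counts`, and `similarity` and the sample term strings.

```python
from collections import Counter

def common_terms(db_a, db_b):
    """Return the metadata terms present in both databases."""
    return db_a & db_b

def group_counts(common, catalog):
    """Map group size k to the number of common terms found in exactly k databases."""
    counts = Counter()
    for term in common:
        k = sum(1 for terms in catalog.values() if term in terms)
        counts[k] += 1
    return counts

def similarity(counts):
    """Assumed weighting, not taken from the source: each term in group k adds 1/k."""
    return sum(n / k for k, n in counts.items())

# Hypothetical collection: database name -> set of metadata terms.
catalog = {
    "sales": {"customer", "customer.id:int", "order", "order.total:decimal"},
    "crm": {"customer", "customer.id:int", "contact", "contact.email:varchar"},
    "hr": {"employee", "employee.id:int", "customer"},
}

common = common_terms(catalog["sales"], catalog["crm"])
counts = group_counts(common, catalog)
score = similarity(counts)
```

Under this assumed weighting, a term that appears in nearly every database adds little to the score, while a term shared only by the two databases being compared adds the most.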

31 Claims

1-14. (canceled)

15. A computer program product for determining a similarity of databases, the computer program product comprising:
- a computer-readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
  
  (a) computer readable program code configured to select a first database and a second database from a plurality of databases;
  
  (b) computer readable program code configured to determine if one or more terms found in the first database are also found in the second database, wherein each term of the one or more terms includes metadata of a database of the plurality of databases, and wherein the one or more terms found in both databases are one or more common terms;
  
  (c) computer readable program code configured to determine a quantity of the one or more common terms found in each of a plurality of groups of databases of the plurality of databases, wherein each group of databases corresponds to a number of databases which constitute the group of databases; and
  
  (d) computer readable program code configured to determine a similarity score between the first database and the second database of the plurality of databases based on the quantity of the one or more common terms found in each group of databases of the plurality of databases.
- Dependent claims (16, 17, 19, 20, 21):
  - 16. The computer program product of claim 15, further comprising:
    - computer readable program code configured to perform steps (a) through (d) for each pairing of the first database with each database of the plurality of databases other than the second database; and
      
      computer readable program code configured to rank the similarity scores for each pairing of the first database with each database of the plurality of databases.
  - 17. The computer program product of claim 15, further comprising:
    - computer readable program code configured to perform steps (a) through (d) on all pairings of the plurality of databases, other than pairings with the first database; and
      
      computer readable program code configured to rank the similarity scores of pairings of the plurality of databases, other than pairings with the first database.
  - 19. The computer program product of claim 15, wherein the metadata of each of the one or more common terms includes at least one of a database table name, a database table column name, and a database table column type.
  - 20. The computer program product of claim 15, wherein determining the one or more common terms includes determining a partial match of the one or more common terms between the first database and the second database.
  - 21. The computer program product of claim 15, wherein at least one term of the one or more terms is a hash derived from the metadata of the database of the plurality of databases.
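
Claims 19 and 21 describe terms built from table names, column names, and column types, optionally represented as hashes. The following is a minimal sketch of one way such terms might be derived. The `schema` layout, the term prefixes, the lowercase normalization, and the choice of SHA-256 are all assumptions for illustration; the claims do not specify any of them.

```python
import hashlib

def metadata_terms(schema, hashed=False):
    """Build a term set from table names, column names, and column types.

    `schema` is a hypothetical mapping: table name -> {column name: column type}.
    Names are lowercased so that terms compare case-insensitively (an assumption).
    """
    terms = set()
    for table, columns in schema.items():
        table = table.lower()
        terms.add(f"table:{table}")
        for column, col_type in columns.items():
            column = column.lower()
            terms.add(f"column:{column}")
            terms.add(f"type:{table}.{column}:{col_type.lower()}")
    if hashed:
        # Replace each term with a fixed-length digest of its text.
        terms = {hashlib.sha256(t.encode("utf-8")).hexdigest() for t in terms}
    return terms

sales = {"customer": {"id": "int", "name": "varchar"}}
crm = {"Customer": {"id": "INT", "email": "varchar"}}

plain_common = metadata_terms(sales) & metadata_terms(crm)
hashed_common = metadata_terms(sales, hashed=True) & metadata_terms(crm, hashed=True)
```

Because identical term text yields an identical digest, exact-match comparison gives the same common terms with or without hashing. A partial match as in claim 20 would need to operate on the unhashed text.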

18. (canceled)

22. A computer system for determining a similarity of databases, the computer program product comprising:
- one or more computer processors;
  
  one or more computer readable storage media; and
  
  program instructions stored on the computer readable storage media for execution by at least one of the one or more processors, the program instructions comprising:
  
  (a) program instructions to select a first database and a second database from a plurality of databases;
  
  (b) program instructions to determine if one or more terms found in the first database are also found in the second database, wherein each term of the one or more terms includes metadata of a database of the plurality of databases, and wherein the one or more terms found in both databases are one or more common terms;
  
  (c) program instructions to determine a quantity of the one or more common terms found in each of a plurality of groups of databases of the plurality of databases, wherein each group of databases corresponds to a number of databases which constitute the group of databases; and
  
  (d) program instructions to determine a similarity score between the first database and the second database of the plurality of databases based on the quantity of the one or more common terms found in each group of databases of the plurality of databases.
- Dependent claims (23, 24, 25, 26, 27):
  - 23. The computer system of claim 22, further comprising:
    - program instructions to perform steps (a) through (d) for each pairing of the first database with each database of the plurality of databases other than the second database; and
      
      program instructions to rank the similarity scores for each pairing of the first database with each database of the plurality of databases, other than the second database.
  - 24. The computer system of claim 22, further comprising:
    - program instructions to perform steps (a) through (d) on all pairings of databases of the plurality of databases, other than pairings with the first database; and
      
      program instructions to rank the similarity scores of the pairings of the plurality of databases, other than pairings with the first database.
  - 25. The computer system of claim 22, wherein the metadata of each of the one or more common terms includes at least one of a database table name, a database table column name, and a database table column type.
  - 26. The computer system of claim 22, wherein determining the one or more common terms includes determining a partial match of the one or more common terms between the first database and the second database.
  - 27. The computer system of claim 22, further comprising:
    - program instructions to create a graph based on the quantity of the one or more common terms found in each group of databases of the plurality of groups of databases wherein the graph is associated with a similarity of the second database to the first database.
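
The ranking in claims 23 and 24 and the graph in claim 27 could be sketched as follows. The sketch scores every pairing of a first database with the others, sorts the scores, and renders the per-group quantities as a text bar chart. The 1/k weighting, the text rendering of the graph, and all function and variable names are assumptions for illustration and are not stated in the claims.

```python
from collections import Counter

def group_counts(a, b, catalog):
    """Count the terms common to databases a and b, grouped by how many databases hold each."""
    counts = Counter()
    for term in catalog[a] & catalog[b]:
        counts[sum(term in terms for terms in catalog.values())] += 1
    return counts

def score(counts):
    """Assumed weighting, not taken from the source: each term in group k adds 1/k."""
    return sum(n / k for k, n in counts.items())

def rank_against(first, catalog):
    """Score every pairing of `first` with another database, highest score first."""
    scored = [(other, score(group_counts(first, other, catalog)))
              for other in catalog if other != first]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def text_graph(counts):
    """Render the quantity of common terms per group as one bar per group size."""
    return [f"{k} databases | {'#' * counts[k]}" for k in sorted(counts)]

# Hypothetical collection: database name -> set of metadata terms.
catalog = {
    "sales": {"customer", "customer.id", "order", "order.total"},
    "crm": {"customer", "customer.id", "contact"},
    "hr": {"employee", "customer"},
}

ranking = rank_against("sales", catalog)
graph = text_graph(group_counts("sales", "crm", catalog))
```

Ranking all pairings that exclude the first database, as in claim 24, would apply the same scoring to every other pair of names in `catalog`.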

28. A computer program product for determining a similarity of databases to search criteria, the method comprising:
- (a) computer readable program code configured to receive search criteria, wherein the search criteria includes one or more terms;
  
  (b) computer readable program code configured to determine the one or more terms found in both the search criteria and a first database of a plurality of databases, wherein the one or more terms found in both the search criteria and a first database are one or more common terms;
  
  (c) computer readable program code configured to determine a quantity of the one or more common terms found in each of a plurality of groups of databases of the plurality of databases, wherein a group of databases of the plurality of groups of databases corresponds to a number of databases which constitutes the group of databases; and
  
  (d) computer readable program code configured to determine a similarity score of the first database of the plurality of databases based on the quantity of the one or more common terms found in each group of databases of the plurality of databases, wherein the similarity of the first database to the search criteria is based on the similarity score.
- Dependent claims (29, 30, 31):
  - 29. The computer program product of claim 28, wherein determining a similarity score of the first database, further comprises:
    - computer readable program code configured to perform steps (a) through (d) for each pairing of the search criteria and each database of the plurality of databases other than the first database; and
      
      computer readable program code configured to rank the similarity of each database of the plurality of databases to the search criteria, based on the similarity score of each database of the plurality of databases.
  - 30. The computer program product of claim 28, wherein the one or more terms of the search criteria include metadata of one or more databases, the metadata having at least one of a database table name, a database table column name, and a database table column type.
  - 31. The computer program product of claim 28, wherein determining one or more common terms includes determining a partial match of the one or more common terms between the search criteria and the first database.
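
Claims 28 through 31 apply the same idea to search criteria rather than to a second database. The following is a minimal sketch in which partial matching is treated as a case-insensitive substring test and each matched term is weighted by 1/k, where k is the number of databases containing it. Both choices are assumptions for illustration, as are the function names and sample data. Summing 1/k per term is equivalent to first counting the terms in each group and then weighting each group.

```python
def matched_terms(criteria, db_terms, partial=True):
    """Return the database terms matched by any search term.

    With partial=True a search term matches when it is a substring of the
    database term (an assumed interpretation of "partial match").
    """
    if partial:
        return {t for t in db_terms
                if any(q.lower() in t.lower() for q in criteria)}
    return {t for t in db_terms if t in criteria}

def criteria_score(criteria, name, catalog):
    """Score one database against the search criteria."""
    total = 0.0
    for term in matched_terms(criteria, catalog[name]):
        k = sum(term in terms for terms in catalog.values())
        total += 1 / k  # assumed weighting, not taken from the source
    return total

def rank_for_criteria(criteria, catalog):
    """Order database names by descending similarity to the search criteria."""
    return sorted(catalog,
                  key=lambda name: criteria_score(criteria, name, catalog),
                  reverse=True)

# Hypothetical collection: database name -> set of metadata terms.
catalog = {
    "sales": {"customer", "customer_order", "invoice"},
    "crm": {"customer", "contact"},
    "hr": {"employee", "payroll"},
}
criteria = {"cust", "invoice"}

ranked = rank_for_criteria(criteria, catalog)
sales_score = criteria_score(criteria, "sales", catalog)
```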

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Bhagavan, Srini; Kiernan, Gerald G.

Granted Patent: US 9,740,748 B2
US Class Current: 1/1
CPC Class Codes

G06F 16/152   using file content signatur...

G06F 16/24578   using ranking

G06F 16/285   Clustering or classification

G06F 16/951   Indexing; Web crawling tech...

G06F 16/9535   Search customisation based ...

G06F 16/9538   Presentation of query results
