System And Method For Generating An Amalgamated Database

US 20090012928A1
Filed: 07/03/2008
Published: 01/08/2009
Est. Priority Date: 11/06/2002
Status: Abandoned Application

Abstract

A method for creating an amalgamated bioinformatics database from at least a first database and a second database is presented. Concepts are identified in a first field from the records of the first database. A second field from the records of the second database which has data related to the first field is also identified. A first set of concepts is identified by traversing a mediating database using terms associated with the first field and a second set of concepts is also identified by traversing the mediating database using terms associated with the second field. Either the first set of concepts or the second set of concepts, or both, is identified using non-trivial terminological mapping. The set of related concepts in the first set of concepts and the second set of concepts is identified and a record is generated in the amalgamated bioinformatics database.
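
The flow described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MEDIATING` table, the concept IDs, the sample records, and the function names `concepts_for` and `amalgamate` are all hypothetical, and a plain set intersection stands in for "determining a set of related concepts", which the application does not define this narrowly.

```python
# Hypothetical mediating database: term -> made-up concept identifiers.
MEDIATING = {
    "heart attack": {"C001"},
    "myocardial infarction": {"C001"},
    "diabetes": {"C002"},
}

def concepts_for(field_value, mediating=MEDIATING):
    """Return the concepts reached from a field's term (exact match after lowercasing)."""
    return mediating.get(field_value.strip().lower(), set())

def amalgamate(first_db, first_field, second_db, second_field):
    """Build one merged record for every pair of records that shares a concept."""
    merged = []
    for r1 in first_db:
        c1 = concepts_for(r1[first_field])
        for r2 in second_db:
            # Assumption: "related" is approximated by a shared concept.
            shared = c1 & concepts_for(r2[second_field])
            if shared:
                merged.append({"first": r1, "second": r2, "concepts": sorted(shared)})
    return merged

# Toy records; the field names and values are invented for the example.
clinical = [{"patient": "p1", "diagnosis": "Heart attack"}]
genomic = [
    {"gene": "GENE_A", "disorder": "myocardial infarction"},
    {"gene": "GENE_B", "disorder": "diabetes"},
]
records = amalgamate(clinical, "diagnosis", genomic, "disorder")
```

Here the two databases share no key and use different terms, yet the clinical record and the `GENE_A` record are joined because both terms resolve to the same concept in the mediating table.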

84 Citations

64 Claims

1. A method for creating an amalgamated bioinformatics database from at least a first database and a second database comprising the steps of:
- identifying a first field from the records of the first database;
  
  identifying a second field from the records of the second database, the second field having data related to the first field;
  
  identifying a first set of concepts by traversing a mediating database using terms associated with the first field;
  
  identifying a second set of concepts by traversing the mediating database using terms associated with the second field;
  
  wherein at least one of the steps of identifying the first set of concepts or identifying the second set of concepts is performed using non-trivial terminological mapping;
  
  determining a set of related concepts in the first set of concepts and the second set of concepts; and
  
  generating a record in the amalgamated bioinformatics database comprising data from records of the first database, data from records of the second database and at least a portion of the related concepts from the mediating database.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 57, 58, 59, 60, 61, 62, 63, 64):
  - 2. The method of claim 1, wherein at least one of the first database and second database includes clinical data associated with at least one disease.
  - 3. The method of claim 1, wherein at least one of the first database and second database includes genomic data associated with at least one disease.
  - 4. The method of claim 1, wherein one of the first and second database includes clinical data associated with at least one disease, the other of the first and second database includes genomic data associated with the at least one disease and wherein the related concepts associate said clinical data and said genomic data.
  - 5. The method of claim 1 wherein the step of terminological mapping includes at least one term expansion operation.
  - 6. The method of claim 1 wherein the step of terminological mapping includes at least one term normalization operation.
  - 7. The method of claim 1 wherein the step of terminological mapping includes at least one term expansion operation and at least one term normalization operation.
  - 8. The method of claim 1 wherein the step of terminological mapping includes a natural language processing operation.
  - 9. The method of claim 1 wherein the step of terminological mapping includes a semantic processing operation for part of speech identification.
  - 10. The method of claim 1, wherein at least one of the first database and second database is a structured database.
  - 11. The method of claim 1, wherein at least one of the first database and second database is a semi-structured database.
  - 12. The method of claim 1, wherein at least one of the first database and second database is an unstructured database.
  - 57. An amalgamated database produced by the method of claim 1.
  - 58. A method of performing a biomedical informatics analysis comprising: (a) querying the amalgamated database of claim 57 with a classification schema to transform the data into a phenotype/trait format; and (b) exporting the data transformed in step (i) to a biomedical informatics software.
  - 59. The method of claim 58, further comprising the step of performing a clustering analysis selected from the group consisting of self-organizing maps and hierarchical clustering.
  - 60. The method of claim 59, comprising the further step of identifying a statistical correlation between a pair of bioobjects or biodata items.
  - 61. The method of claim 60, wherein one bioobject or biodata item is a phenotypic trait and the other is a gene.
  - 62. The method of claim 60, wherein one bioobject or biodata item is a phenotypic trait and the other is a protein.
  - 63. A map of genomic DNA produced using an amalgamated database of claim 57, showing a genetic linkage between clinical manifestations of disease.
  - 64. A method of identifying linkage between a phenotypic trait and a gene, comprising producing a map according to claim 63 and co-locating the phenotypic trait and the gene.
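
Claims 5 to 7 mention term expansion and term normalization as parts of terminological mapping. A minimal sketch of what such operations might look like follows. The synonym table and the function names are hypothetical, and the normalization shown (lowercasing, dropping punctuation, sorting words) is a simplified stand-in rather than the behavior of any particular tool named in the application.

```python
import re

# Hypothetical synonym table used for term expansion.
SYNONYMS = {
    "mi": ["myocardial infarction"],
    "heart attack": ["myocardial infarction"],
}

def normalize(term):
    """Lowercase, replace punctuation with spaces, and sort the remaining words."""
    words = re.sub(r"[^a-z0-9\s]", " ", term.lower()).split()
    return " ".join(sorted(words))

def expand(term):
    """Return the cleaned term followed by any synonyms listed for it."""
    lowered = " ".join(term.lower().split())
    return [lowered] + SYNONYMS.get(lowered, [])

def candidate_keys(term):
    """Expand first, then normalize each variant into a lookup key."""
    return {normalize(variant) for variant in expand(term)}
```

With these helpers, `normalize("Infarction, Myocardial")` and `normalize("myocardial infarction")` produce the same key, so two fields that spell a term differently can still reach the same entry in a mediating database indexed by normalized keys.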

13. A method for creating an amalgamated bioinformatics database from at least a first database and a second database comprising the steps of:
- identifying a first field from the records of the first database;
  
  identifying a second field from the records of the second database, the second field having data related to the first field;
  
  identifying a first set of concepts by traversing a mediating database using terms associated with the first field;
  
  identifying a second set of concepts by traversing the mediating database using terms associated with the second field;
  
  determining a set of related concepts in the first set of concepts and the second set of concepts;
  
  for at least a portion of the related concepts, inheriting relationships of the related concepts from the mediating database; and
  
  generating a record in the amalgamated bioinformatics database comprising data from records of the first database, data from records of the second database and the related concepts and inherited relationships from the mediating database.
- Dependent claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23):
  - 14. The method of claim 13, wherein at least one of the first database and second database includes clinical data associated with at least one disease.
  - 15. The method of claim 13, wherein at least one of the first database and second database includes genomic data associated with at least one disease.
  - 16. The method of claim 13, wherein one of the first and second database includes clinical data associated with at least one disease, the other of the first and second database includes genomic data associated with the at least one disease and wherein the related concepts associate said clinical data and said genomic data.
  - 17. The method of claim 13, wherein at least one of the first database and second database is a structured database.
  - 18. The method of claim 13, wherein at least one of the first database and second database is a semi-structured database.
  - 19. The method of claim 13, wherein at least one of the first database and second database is an unstructured database.
  - 20. The method of claim 13, wherein the inherited relationships include an “is a” relationship.
  - 21. The method of claim 13, wherein the inherited relationships include a partonomy relationship.
  - 22. The method of claim 13, wherein the inherited relationships includes at least one ancestral relationship.
  - 23. The method of claim 13, wherein the inherited relationships includes at least one relationship selected from the group consisting of an “is a”, a partonomy and an ancestral relationship.
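
Claim 13 and its dependents describe inheriting relationships such as “is a”, partonomy, and ancestral relationships from the mediating database. One way to picture this is a small upward traversal over a concept graph. The graph contents, the relationship labels `is_a` and `part_of`, and the function name below are assumptions made for illustration, not structures taken from the application.

```python
# Hypothetical concept graph: concept -> list of (relationship, parent concept).
EDGES = {
    "myocardial infarction": [("is_a", "heart disease")],
    "heart disease": [("is_a", "cardiovascular disease")],
    "left ventricle": [("part_of", "heart")],
}

def inherited_relationships(concept, edges=EDGES):
    """Collect a concept's direct relationships and every ancestor above it."""
    direct = list(edges.get(concept, []))
    ancestors = []
    seen = set()
    stack = [parent for _, parent in direct]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against cycles and repeated parents
        seen.add(node)
        ancestors.append(node)
        stack.extend(parent for _, parent in edges.get(node, []))
    return {"direct": direct, "ancestors": ancestors}
```

A merged record could then carry the output of `inherited_relationships` alongside its related concepts, so that a query for a broader concept also finds records tagged with a narrower one.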

24. A method for creating an amalgamated bioinformatics database from at least a first database and a second database comprising the steps of:
- identifying a first field from the records of the first database;
  
  identifying a second field from the records of the second database, the second field having data related to the first field;
  
  identifying a first set of concepts by traversing a mediating database using terms associated with the first field;
  
  identifying a second set of concepts by traversing the mediating database using terms associated with the second field;
  
  wherein at least one of the steps of identifying the first set of concepts or identifying the second set of concepts is performed using terminological mapping;
  
  determining a set of related concepts in the first set of concepts and the second set of concepts;
  
  for at least a portion of the related concepts, inheriting relationships of the related concepts from the mediating database; and
  
  generating a record in the amalgamated bioinformatics database comprising data from the records of the first database, data from the records of the second database and the related concepts and inherited relationships from the mediating database.
- Dependent claims (25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39):
  - 25. The method of claim 24, wherein at least one of the first database and second database includes clinical data associated with at least one disease.
  - 26. The method of claim 24, wherein at least one of the first database and second database includes genomic data associated with at least one disease.
  - 27. The method of claim 24, wherein one of the first and second database includes clinical data associated with at least one disease, the other of the first and second database includes genomic data associated with the at least one disease and wherein the related concepts associate said clinical data and said genomic data.
  - 28. The method of claim 24 wherein the step of terminological mapping includes at least one term expansion operation.
  - 29. The method of claim 24 wherein the step of terminological mapping includes at least one term normalization operation.
  - 30. The method of claim 24 wherein the step of terminological mapping includes at least one term expansion operation and at least one term normalization operation.
  - 31. The method of claim 24 wherein the step of terminological mapping includes a natural language processing operation.
  - 32. The method of claim 24 wherein the step of terminological mapping includes a semantic processing operation for part of speech identification.
  - 33. The method of claim 24, wherein the inherited relationships include an “is a” relationship.
  - 34. The method of claim 24, wherein the inherited relationships include a partonomy relationship.
  - 35. The method of claim 24, wherein the inherited relationships includes at least one ancestral relationship.
  - 36. The method of claim 24, wherein the inherited relationships includes at least one relationship selected from the group consisting of an “is a”, a partonomy and an ancestral relationship.
  - 37. The method of claim 24, wherein at least one of the first database and second database is a structured database.
  - 38. The method of claim 24, wherein at least one of the first database and second database is a semi-structured database.
  - 39. The method of claim 24, wherein at least one of the first database and second database is an unstructured database.

40. A method for creating a knowledge base of relationships between at least one biodata item that is a molecule and at least one other biodata item, comprising the steps of:
- (a) using a first database storing at least one biodata item that is a molecule associated with at least one other biodata item, said other biodata item being contained in a first set;
  
  (b) using a second database storing a second set of at least one biodata item and any information associated therewith, wherein the first set and the second set are not identical;
  
  (c) using at least one non-trivial terminological mapping operation in connection with a mediating database for identifying an association between a biodata item of the first set with a biodata item of the second set,
  
  (d) for each association identified in step (c), finding a relationship between the biodata item that is a molecule associated with the other biodata item of the first set of the association and the information associated with the biodata item of the second set of the association;
  
  (e) storing each relationship found in step (d) in a knowledge base.

41. A method of integrating a first and second database which are interoperable heterogeneous databases without a common key, wherein the first database contains a bioobject associated with a first record comprising a biodata item that is a molecule and a first correlating biodata item;
- wherein the second database contains a bioobject associated with a second record comprising a second correlating biodata item and a unique biodata item, where there is no equivalent to the unique biodata item in the first database;
  
  comprising the steps of;
  
  (a) using a mediating database to link the first correlating biodata item in the first database to the second correlating biodata item in the second data base using at least one non-trivial terminological mapping operation;
  
  (b) creating relationships between the biodata items in the first record and the second record, thereby producing an amalgamated third record comprising the biodata item which is the molecule and a plurality of biodata items, including the unique biodata item; and
  
  (c) storing the amalgamated record in an amalgamated database.
- Dependent claims (43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56):
  - 43. The method of claim 41 wherein one database is structured.
  - 44. The method of claim 41 wherein one database is semi-structured.
  - 45. The method of claim 41 wherein one database is a genomic database.
  - 46. The method of claim 41 wherein one database is a genetic database.
  - 47. The method of claim 41 wherein one database is a proteomic database.
  - 48. The method of claim 41 wherein one database is a gene expression database.
  - 49. The method of claim 41 wherein one database comprises a biodata item derived from a non-human organism.
  - 50. The method of claim 41 wherein one of the first or second database is selected from the group consisting of a genomic database, a genetic database, a proteomic database, and a gene expression database and the other of the first or second database is selected from the group consisting of a preclinical database and a clinical database.
  - 51. The method of claim 41 wherein the one of the first or second correlating biodata item is a disorder and the other of the first or second correlating biodata item is a disease.
  - 52. The method of claim 50, wherein the first correlating biodata item is a disorder and the second correlating biodata item is a disease.
  - 53. The method of claim 50, wherein the unique biodata item is a clinical manifestation of a disease.
  - 54. The method of claim 51, wherein the unique biodata item is a clinical manifestation of a disease.
  - 55. The method of claim 52, wherein the unique biodata item is a clinical manifestation of a disease.
  - 56. The method of claim 41 wherein the mediating database is generated using an automated network created from a source selected from the group consisting of related terminologies and databases using a method selected from the group consisting of exact index matching, norm, MMTX and a combination thereof.

42. A method of integrating a first and second database which are interoperable heterogeneous databases without a common key, wherein the first database contains a bioobject associated with a first record comprising a biodata item that is a molecule and a first correlating biodata item;
- wherein the second database contains a bioobject associated with a second record comprising a second correlating biodata item and a unique biodata item, where there is no equivalent to the unique biodata item in the first database;
  
  comprising the steps of;
  
  (a) transforming at least one of the databases into a generic format;
  
  (b) using at least one terminological mapping operation to a mediating database to link the first correlating biodata item in the first database to the second correlating biodata item in the second data base;
  
  (c) creating relationships between the biodata items in the first record and the second record, thereby producing an amalgamated third record comprising the biodata item which is the molecule and a plurality of biodata items, including the unique biodata item; and
  
  (d) storing the amalgamated record in an amalgamated database.
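
Step (a) of claim 42 calls for transforming at least one database into a generic format but does not say what that format is. As one possible reading, the sketch below flattens differently shaped records into (subject, attribute, value) triples. The triple layout, the `to_generic` helper, and the sample rows are all assumptions made for illustration.

```python
def to_generic(record, id_field, source):
    """Flatten one source record into (subject, attribute, value) triples."""
    subject = f"{source}:{record[id_field]}"
    return [
        (subject, attr, value)
        for attr, value in record.items()
        if attr != id_field and value is not None
    ]

# Two hypothetical sources with different shapes and no shared key.
gene_row = {"gene_id": "G1", "molecule": "GENE_A", "disorder": "myocardial infarction"}
clinic_row = {"visit": 17, "disease": "heart attack", "manifestation": "chest pain"}

triples = (
    to_generic(gene_row, "gene_id", "genomic")
    + to_generic(clinic_row, "visit", "clinical")
)
```

Once both sources are in the same shape, the later steps can treat the `disorder` and `disease` values as the correlating items to be linked through the mediating database, and carry `manifestation` over as the unique item in the amalgamated record.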

Current Assignee
Trustees Of Columbia University In The City Of New York (Columbia University)
Original Assignee
Trustees Of Columbia University In The City Of New York (Columbia University)
Inventors
Sarkar, Indra Neil; Cantor, Michael; Lussier, Yves A.

Application Number

US12/167,715
Publication Number

US 20090012928A1
Field of Search
US Class Current

706/59
CPC Class Codes

G06F 16/30   of unstructured textual dat...

G06F 40/242   Dictionaries

G06F 40/30   Semantic analysis

G16B 50/00   ICT programming tools or da...

G16B 50/10   Ontologies; Annotations

G16B 50/30   Data warehousing; Computing...
