Ontology mapper

US 8,856,156 B1
Filed: 10/05/2012
Issued: 10/07/2014
Est. Priority Date: 10/07/2011
Status: Active Grant

Abstract

Systems, methods and computer-readable media are provided for facilitating patient health care by providing discovery, validation, and quality assurance of nomenclatural linkages between pairs of terms or combinations of terms in databases extant on multiple different health information systems that do not share a set of unified codesets, nomenclatures, or ontologies, or that may in part rely upon unstructured free-text narrative content instead of codes or standardized tags. Embodiments discover semantic structures existing naturally in documents and records, including relationships of synonymy and polysemy between terms arising from disparate processes, and maintained by different information systems. In some embodiments, this process is facilitated by applying Latent Semantic Analysis in concert with decision-tree induction and similarity metrics. In some embodiments, data is re-mined and regression testing is applied to new mappings against an existing mapping base, thereby permitting these embodiments to “learn” ontology mappings as clinical, operational, or financial patterns evolve.
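
The regression-testing step mentioned in the abstract could be pictured roughly as follows. This is a minimal sketch and not the patented method: the function name `regression_check`, the dictionary representation of a mapping base, and the sample term names are all assumptions made for illustration.

```python
def regression_check(baseline, candidate):
    """Split candidate mappings into confirmed, conflicting, and new entries.

    Both arguments are hypothetical dicts of source term -> target term.
    """
    confirmed, conflicts, new = {}, {}, {}
    for source, target in candidate.items():
        if source not in baseline:
            new[source] = target                      # not seen before
        elif baseline[source] == target:
            confirmed[source] = target                # agrees with the mapping base
        else:
            conflicts[source] = (baseline[source], target)  # disagrees: needs review
    return confirmed, conflicts, new


# Hypothetical mapping base and a freshly re-mined set of mappings.
baseline = {"GLU": "glucose_serum", "NA": "sodium"}
candidate = {"GLU": "glucose_serum", "NA": "sodium_urine", "K": "potassium"}
confirmed, conflicts, new = regression_check(baseline, candidate)
```

Under these assumptions, conflicting entries would be the ones flagged for review before the mapping base is updated.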

66 Citations

20 Claims

1. A non-transitory Computer-readable media having computer-executable instructions embodied thereon that when executed provide a method for facilitating decision support by determining nomenclature linkages between variables in databases that have different ontologies, the method comprising:
- identifying a first set of documents from a first record system having a first ontology;
  
  identifying a second set of documents from a second record system having a second ontology that is different than the first ontology;
  
  determining a use-case present in the first and second sets of documents;
  
  determining a set of variables relevant to the use-case;
  
  receiving from the first set of documents, a first document containing at least one first-document variable from the set of variables;
  
  wherein each first-document variable has a first-document value associated with it;
  
  receiving from the second set of documents, a second document containing at least one second-document variable from the set of variables;
  
  (1) wherein the second-document variable has a second-document value associated with it, and (2) wherein the second-document variable is also contained in the first document;
  
  based on the determined use-case and set of variables, generating a decision-tree classifier;
  
  for each first-document variable contained in the first document, applying the decision tree classifier to transform the first-document value associated with the first-document variable to a categorical datatype;
  
  for each second-document variable contained in the second document, applying the decision tree classifier to transform the second-document value associated with the second-document variable to a categorical datatype;
  
  based on the categorical datatypes of the first document and the categorical datatypes of the second document, generating a set of textmatrices;
  
  applying latent semantic analysis to the set of textmatrices to determine a latent semantic space associated with the at least one first-document variable and the at least one second document variable;
  
  specifying a threshold of similarity;
  
  for a first comparison-variable, from the at least one first-document variables associated with the latent semantic space;
  
  determining a measure of similarity to a second-comparison variable from the at least one second-document variables associated with the latent semantic space;
  
  performing a comparison of the measure similarity to the threshold; and
  
  based on the comparison, determining that the measure similarity satisfies the threshold, associating the first comparison variable with the second comparison variable, and designating the association as a synonymy, wherein the threshold is satisfied if the measure of similarity is greater than the threshold.
- 2. The computer-readable media of claim 1, wherein the measure of similarity is determined using Salton's cosine.
- 3. The computer-readable media of claim 2, wherein the threshold is specified as 0.62.
- 4. The computer-readable media of claim 2, wherein the threshold is specified as 0.8 and further wherein the association is designated as a strong synonymy.
- 5. The computer-readable media of claim 1, wherein the measure of similarity is determined using Pearson's correlation coefficient.
- 6. The computer-readable media of claim 1, wherein the first record system is a first electronic health record system for a first hospital, and the second record system is a second electronic health record system for a second hospital.
- 7. The computer-readable media of claim 1, wherein the first record system is designated as a “gold standard.”
- 8. The computer-readable media of claim 1, wherein applying latent semantic analysis includes singular value decomposition.
- 9. The computer-readable media of claim 1, wherein the first document comprises a set of records from the first record system, and the second document comprises a set of records from the second record system.
- 10. The computer-readable media of claim 1, further comprising displaying to a user the first comparison variable and the second comparison variable as a designated synonymy.
- 11. The computer-readable media of claim 10, further comprising receiving an indication from the user confirming or rejecting the designated synonymy.
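
The similarity measures named in claims 2 and 5, and the thresholds in claims 3 and 4, can be illustrated with a short sketch. The function names below are hypothetical, and combining the 0.62 and 0.8 thresholds into a single two-tier `designate` helper is an assumption made for illustration; the claims recite them as separate alternatives. The strict greater-than comparison follows the wording of claim 1.

```python
import math


def saltons_cosine(x, y):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson's r, computed as the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return saltons_cosine([a - mean_x for a in x], [b - mean_y for b in y])


def designate(similarity, threshold=0.62, strong_threshold=0.8):
    """Label a similarity score; a threshold is satisfied only if strictly exceeded."""
    if similarity > strong_threshold:
        return "strong synonymy"
    if similarity > threshold:
        return "synonymy"
    return None


# Hypothetical coordinates of two variables in a latent semantic space.
label = designate(saltons_cosine([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]))
```

Here the cosine is roughly 0.707, which exceeds 0.62 but not 0.8, so the pair would be labeled a plain synonymy under the assumed two-tier scheme.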

12. A method for facilitating decision support by determining nomenclature linkages between variables in databases having different ontologies, the method comprising:
- identifying a first set of documents from a first record system having a first ontology;
  
  identifying a second set of documents from a second record system having a second ontology that is different than the first ontology;
  
  determining a use-case present in the first and second sets of documents;
  
  determining a set of variables relevant to the use-case;
  
  receiving from the first set of documents, a first document containing at least one first-document variable from the set of variables;
  
  wherein each first-document variable has a first-document value associated with it;
  
  receiving from the second set of documents, a second document containing at least one second-document variable from the set of variables;
  
  (1) wherein the second-document variable has a second-document value associated with it, and (2) wherein the second-document variable is also contained in the first document;
  
  based on the determined use-case and set of variables, generating a decision-tree classifier;
  
  for each first-document variable contained in the first document, applying the decision tree classifier to transform the first-document value associated with the first-document variable to a categorical datatype;
  
  for each second-document variable contained in the second document, applying the decision tree classifier to transform the second-document value associated with the second-document variable to a categorical datatype;
  
  based on the categorical datatypes of the first document and the categorical datatypes of the second document, generating a set of textmatrices;
  
  applying latent semantic analysis to the set of textmatrices to determine a latent semantic space associated with the at least one first-document variable and the at least one second document variable;
  
  specifying a threshold of similarity;
  
  for a first comparison-variable, from the at least one first-document variables associated with the latent semantic space;
  
  determining a measure of similarity to a second-comparison variable from the at least one second-document variables associated with the latent semantic space;
  
  performing a comparison of the measure similarity to the threshold; and
  
  based on the comparison, determining that the measure similarity satisfies the threshold, associating the first comparison variable with the second comparison variable, and designating the association as a synonymy, wherein the threshold is satisfied if the measure of similarity is greater than the threshold.
- 13. The method of claim 12, wherein the measure of similarity is determined using Salton's cosine.
- 14. The method of claim 13, wherein the threshold is specified as 0.62.
- 15. The method of claim 12, wherein the measure of similarity is determined using Pearson's correlation coefficient.
- 16. The method of claim 12, wherein the first record system is a first electronic health record system for a first hospital, and the second record system is a second electronic health record system for a second hospital.
- 17. The method of claim 12, wherein applying latent semantic analysis includes singular value decomposition.
- 18. The method of claim 12, wherein the first document comprises a set of records from the first record system, and the second document comprises a set of records from the second record system.
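
The two middle steps of claim 12, transforming values to a categorical datatype with a decision-tree classifier and then generating text matrices, might look roughly like the sketch below. Everything here is an assumption for illustration: the tree is hard-coded rather than induced from data, its cut points (70 and 100) are arbitrary and not clinical guidance, and the variable names `GLU` and `glucose_serum` are hypothetical.

```python
from collections import Counter

# A tiny decision tree as nested tuples: (threshold, subtree_if_below, subtree_otherwise).
# A leaf is a plain string holding the category label. Cut points are illustrative only.
TREE = (70.0, "low", (100.0, "normal", "high"))


def classify(tree, value):
    """Walk the tree until a leaf (category label) is reached."""
    while not isinstance(tree, str):
        threshold, below, otherwise = tree
        tree = below if value < threshold else otherwise
    return tree


def to_tokens(document, trees):
    """Turn a document (dict of variable -> numeric value) into 'variable=category' tokens."""
    return [f"{var}={classify(trees[var], val)}"
            for var, val in document.items() if var in trees]


def text_matrix(documents, trees):
    """Build a term-by-document count matrix: rows are tokens, columns are documents."""
    counts = [Counter(to_tokens(doc, trees)) for doc in documents]
    terms = sorted(set().union(*counts))
    return terms, [[c[term] for c in counts] for term in terms]


# Hypothetical documents from two record systems that name the same quantity differently.
trees = {"GLU": TREE, "glucose_serum": TREE}
documents = [{"GLU": 65.0}, {"glucose_serum": 140.0}, {"GLU": 150.0}]
terms, matrix = text_matrix(documents, trees)
```

A matrix of this kind is the sort of input to which latent semantic analysis would then be applied.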

19. A non-transitory Computer-readable media having computer-executable instructions embodied thereon that when executed provide a method for discovering and validating latent relationships in data, the method comprising:
- determining a use-case context;
  
  determining a plurality of variables associated with the use case;
  
  based on the determined context and plurality of variables, generating a decision-tree classifier;
  
  receiving a plurality of documents from two or more record-keeping systems;
  
  wherein each document contains one or more document variables of the determined plurality of variables, and wherein the received plurality of documents comprises a set of documents;
  
  for each document in the set, applying the decision tree classifier to transform a value associated with each variable contained in the document into a categorical datatype;
  
  based on the set of documents, generating a set of textmatricies;
  
  applying latent semantic analysis to the set of textmatrices to determine a latent semantic space;
  
  specifying a threshold of similarity;
  
  for a first-document variable, from a first document, associated with the latent semantic space;
  
  determining a measure of similarity to a second-document variable, from a second document, associated with the latent semantic space;
  
  performing a comparison of the measure similarity to the threshold; and
  
  based on the comparison, determining that the measure similarity satisfies the threshold, associating the first-document variable with the second-document variable, and designating the association as a synonymy, wherein the threshold is satisfied if the measure of similarity is greater than the threshold.
- 20. The computer-readable media of claim 19, wherein the measure of similarity is determined using Salton's cosine.
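
To make the latent semantic analysis step concrete, here is a minimal sketch of a rank-k projection followed by a cosine comparison. It is not taken from the specification. To stay dependency-free it obtains the leading singular directions by power iteration with deflation on the Gram matrix, which stands in for a library singular value decomposition (claims 8 and 17 mention SVD). The variable names, the count matrix, and all function names are hypothetical.

```python
import math


def gram(rows):
    """Gram matrix G = A * A^T for a matrix given as a list of row vectors."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def top_eigenpairs(g, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(g)
    g = [row[:] for row in g]                      # work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(n)]      # fixed, deterministic start vector
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        gv = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * gv[i] for i in range(n))  # Rayleigh quotient
        pairs.append((lam, v))
        for i in range(n):                         # deflate the component just found
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def latent_coordinates(rows, k):
    """Row coordinates U_k * S_k in a k-dimensional latent space."""
    pairs = top_eigenpairs(gram(rows), k)
    return [[vec[i] * math.sqrt(max(lam, 0.0)) for lam, vec in pairs]
            for i in range(len(rows))]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


# Hypothetical counts: rows are variables from two systems, columns are context tokens.
variables = ["sysA.GLU", "sysB.glucose_serum", "sysA.NA", "sysB.sodium"]
matrix = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 1], [0, 0, 2, 1]]

coords = latent_coordinates(matrix, k=2)
raw = cosine(matrix[0], matrix[1])        # about 0.8 in the original space
latent = cosine(coords[0], coords[1])     # about 1.0 after the rank-2 projection
cross = cosine(coords[0], coords[2])      # about 0.0 for unrelated variables
```

In this toy matrix the raw cosine between the two glucose variables is about 0.8, which would not exceed a 0.8 threshold, whereas in the rank-2 latent space the two rows coincide. The example is chosen only to show how the projection can merge variables with similar usage patterns.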

Current Assignee: Cerner Innovation, Inc. (Oracle Corporation)
Original Assignee: Cerner Innovation, Inc. (Oracle Corporation)
Inventors: McNair, Douglas S.; Murrish, John Christopher; Kailasam, Kanakasabha
Primary Examiner(s): LE, DEBBIE M
Application Number: US13/645,896
Time in Patent Office: 732 Days
Field of Search: 707/756, 707/755, 707/791, 707/740
US Class Current: 707/756
CPC Class Codes:
G06F 16/285   Clustering or classification
G06F 16/3347   using vector based model
G06F 16/367   Ontology
G06F 16/93   Document management systems
G06N 5/02   Knowledge representation; S...
