Automatic match tuning

US 20050278139A1
Filed: 05/28/2004
Published: 12/15/2005
Est. Priority Date: 05/28/2004
Status: Abandoned Application

Abstract

Methods and apparatus, including computer program products, for identifying matches between disparate schemas calculate a degree of similarity between elements of two schemas using each of multiple matching processes. The calculated degrees of similarity are combined using a first weighting vector to produce first combined degrees of similarity. The first weighting vector includes multiple weighting coefficients, and each weighting coefficient corresponds to one of the matching processes. The weighting coefficients are tuned using information relating to a predicted degree of matching accuracy associated with the first weighting vector.
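
The first step in the abstract, scoring every element pair with each of several matching processes, might be sketched as follows. This is only an illustration under stated assumptions: the two matchers (`name_similarity`, `type_similarity`), the `(name, type)` element representation, and the sample schemas are hypothetical and do not come from the application.

```python
from difflib import SequenceMatcher

# Hypothetical schema elements as (name, data type); not taken from the application.
source = [("cust_name", "string"), ("order_dt", "date")]
target = [("customer_name", "string"), ("order_date", "date"), ("total", "decimal")]


def name_similarity(a, b):
    """Assumed linguistic-style matcher: character overlap ratio of the names."""
    return SequenceMatcher(None, a[0], b[0]).ratio()


def type_similarity(a, b):
    """Assumed constraint-style matcher: 1.0 when the data types agree, else 0.0."""
    return 1.0 if a[1] == b[1] else 0.0


MATCHERS = [name_similarity, type_similarity]


def similarity_table(source, target, matchers):
    """Map each (source, target) name pair to one score per matching process."""
    return {
        (s[0], t[0]): [m(s, t) for m in matchers]
        for s in source
        for t in target
    }


scores = similarity_table(source, target, MATCHERS)
```

Each entry of `scores` holds one degree of similarity per matching process, in the same order as `MATCHERS`, so a weighting vector can later be aligned with it position by position.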

25 Claims

1. A computer program product, tangibly embodied in an information carrier, for identifying matches between disparate schemas, the computer program product being operable to cause data processing apparatus to:
- calculate a degree of similarity between elements of two schemas using each of a plurality of matching processes;
  
  combine the calculated degrees of similarity using a first weighting vector to produce first combined degrees of similarity, with the first weighting vector including a plurality of weighting coefficients and each weighting coefficient corresponding to one of the plurality of matching processes; and
  
  tune the weighting coefficients using information relating to a predicted degree of matching accuracy associated with the first weighting vector.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The computer program product of claim 1 wherein:
    - the calculated degrees of similarity are combined using each of a plurality of weighting vectors, with each weighting vector including a plurality of weighting coefficients and each weighting coefficient corresponding to one of the plurality of matching processes; and
      
      the weighting coefficients are tuned by determining, using the combined degrees of similarity for each of the plurality of weighting vectors, a predicted degree of matching accuracy associated with each of the plurality of weighting vectors and selecting a second weighting vector to determine possible matches between the elements of the two schemas, with the second weighting vector selected based on a comparison of information relating to the respective predicted degrees of matching accuracy associated with the first weighting vector and the second weighting vector.
  - 3. The computer program product of claim 2 wherein each predicted degree of matching accuracy is determined using at least one quantity selected from the group consisting of a number of ambiguous matches, a number of unambiguous matches, and a number of impossible matches.
  - 4. The computer program product of claim 1 wherein the weighting coefficients are tuned by:
    - identifying a set of possible matches between the elements of the two schemas based on the first combined degrees of similarity;
      
      receiving user feedback relating to a subset of the possible matches and using the user feedback to produce the information relating to a predicted degree of matching accuracy associated with the first weighting vector; and
      
      modifying the first weighting vector based on the information relating to the predicted degree of matching accuracy to produce a second weighting vector.
  - 5. The computer program product of claim 4, with the computer program product being operable to cause data processing apparatus to further:
    - combine the calculated degrees of similarity using the second weighting vector to produce second combined degrees of similarity; and
      
      identify a modified set of possible matches between the elements of the two schemas based on the second combined degrees of similarity.
  - 6. The computer program product of claim 1 wherein the calculated degrees of similarity are combined by multiplying each calculated degree of similarity for each matching process by the corresponding weighting coefficient to obtain weighted degrees of similarity and summing the weighted degrees of similarity.
  - 7. The computer program product of claim 1 wherein a degree of similarity is calculated between multiple pairs of elements, with each pair of elements having one element selected from a source schema and one element selected from a target schema.
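
The combination described in claim 6 (multiply each matcher's degree of similarity by its weighting coefficient, then sum) can be sketched as below. The function names, the sample scores, and the weights are assumptions made for illustration; the claims do not prescribe particular values or a normalization of the weights.

```python
def combine(scores_per_matcher, weights):
    """Weighted sum of one pair's per-matcher similarities (claim 6 style)."""
    if len(scores_per_matcher) != len(weights):
        raise ValueError("need exactly one weighting coefficient per matching process")
    return sum(w * s for w, s in zip(weights, scores_per_matcher))


def combine_all(table, weights):
    """Apply one weighting vector to every (source, target) pair."""
    return {pair: combine(values, weights) for pair, values in table.items()}


# Hypothetical per-matcher scores for two source/target pairs.
table = {
    ("cust_name", "customer_name"): [0.8, 1.0],
    ("cust_name", "total"): [0.2, 0.0],
}
weights = [0.5, 0.5]  # assumed first weighting vector

combined = combine_all(table, weights)
```

With these assumed numbers the first pair combines to roughly 0.9 and the second to roughly 0.1, so the first pair would be the stronger candidate match.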

8. A method for identifying matches between disparate schemas, the method comprising:
- calculating a degree of similarity between elements of two schemas using each of a plurality of matching processes;
  
  combining the calculated degrees of similarity using each of a plurality of weighting vectors, with each weighting vector including a plurality of weighting coefficients and each weighting coefficient corresponding to one of the plurality of matching processes;
  
  determining, using the combined degrees of similarity, a level of ambiguity for each weighting vector; and
  
  selecting a particular weighting vector to determine possible matches between the elements of the two schemas, wherein the particular weighting vector is selected based on the level of ambiguity for each weighting vector.
- View Dependent Claims (9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
- - 9. The method of claim 8 wherein determining a level of ambiguity comprises determining at least one quantity selected from the group consisting of a number of ambiguous matches, a number of unambiguous matches, and a number of impossible matches and the particular weighting vector is selected based on at least one quantity selected from the group consisting of a number of ambiguous matches, a number of unambiguous matches, and a number of impossible matches.
  - 10. The method of claim 9 further comprising:
    - for each weighting vector, calculating a factor using at least one quantity selected from the group consisting of a number of ambiguous matches, a number of unambiguous matches, and a number of impossible matches; and
      
      wherein selecting the particular weighting vector is based on a value of the factor for the particular weighting vector relative to values of the factors for others of the plurality of weighting vectors.
  - 11. The method of claim 10 wherein selecting the particular weighting vector based on the value of the factor for the particular weighting vector comprises selecting, as the particular weighting vector, a weighting vector having a factor that tends to indicate one of a relatively high number of ambiguous matches or a relatively high number of unambiguous matches.
  - 12. The method of claim 10 wherein selecting the particular weighting vector based on the value of the factor for the particular weighting vector comprises selecting, as the particular weighting vector, a weighting vector having a factor that tends to indicate at least one of a relatively low number of ambiguous matches, a relatively low number of impossible matches, or a relatively low number of unambiguous matches.
  - 13. The method of claim 12 wherein selecting the particular weighting vector based on the value of the factor for the particular weighting vector comprises selecting, as the particular weighting vector, a weighting vector having a factor that tends to indicate a relatively low number of ambiguous matches and a relatively low number of impossible matches.
  - 14. The method of claim 10 wherein selecting the particular weighting vector further comprises:
    - selecting a candidate weighting vector; and
      
      tuning the candidate weighting vector by modifying the weighting coefficients for the candidate weighting vector to produce the particular weighting vector, wherein the factor for the particular weighting vector indicates a favorable weighting relative to the factor for the candidate weighting vector.
  - 15. The method of claim 9 wherein determining the number of unambiguous matches comprises one of:
    - identifying, as representing an unambiguous match for a particular element, a maximum combined degree of similarity for the particular element;
      
      or identifying, as representing an unambiguous match for a particular element, a combined degree of similarity for the particular element that exceeds a predetermined threshold and that exceeds all other combined degrees of similarity for the particular element by at least a predetermined amount.
  - 16. The method of claim 9 wherein determining the number of ambiguous matches comprises at least one of:
    - identifying, as representing an ambiguous match for a particular element, a combined degree of similarity for the particular element that exceeds a first threshold and is less than a second threshold;
      
      or identifying, as representing an ambiguous match for a particular element, a combined degree of similarity for the particular element that exceeds a predetermined threshold and that is within a predetermined range of other combined degrees of similarity for the particular element.
  - 17. The method of claim 9 wherein determining the number of impossible matches comprises identifying an impossible match by determining, for a particular element, that no combined degree of similarity for the particular element exceeds a predetermined minimum threshold.
  - 18. The method of claim 8 wherein the plurality of matching processes include matching criteria selected from the group consisting of schema-based criteria, content-based criteria, per-element criteria, structural criteria, linguistic criteria, and constraint-based criteria.
  - 19. The method of claim 8 further comprising:
    - determining a set of possible matches between the elements of the two schemas using the combined degrees of similarity for the particular weighting vector;
      
      receiving user feedback relating to a subset of the possible matches;
      
      tuning the particular weighting vector based on the user feedback;
      
      combining the calculated degrees of similarity using the tuned weighting vector; and
      
      determining a new set of possible matches between the elements of the two schemas using the combined degrees of similarity for the tuned weighting vector.
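
One possible reading of claims 9 through 17 is sketched below: each source element is classified as an unambiguous, ambiguous, or impossible match from its combined scores, and the weighting vector with the most favorable factor is selected. The threshold, the margin, the factor definition (ambiguous plus impossible counts, in the spirit of claim 13), and the sample data are all assumptions; the claims leave these choices open.

```python
from collections import Counter


def combine(scores, weights):
    """Weighted sum of per-matcher similarities for one pair."""
    return sum(w * s for w, s in zip(weights, scores))


def classify(combined_row, threshold=0.5, margin=0.1):
    """Classify one source element from its combined scores against all targets."""
    if not combined_row:
        return "impossible"
    ranked = sorted(combined_row, reverse=True)
    best = ranked[0]
    if best <= threshold:
        return "impossible"      # no score exceeds the minimum threshold
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    if best - runner_up >= margin:
        return "unambiguous"     # clear winner above the threshold
    return "ambiguous"           # above threshold but too close to another score


def ambiguity_counts(per_source, weights, threshold=0.5, margin=0.1):
    """Count match categories over all source elements for one weighting vector."""
    counts = Counter()
    for rows in per_source.values():
        combined = [combine(scores, weights) for scores in rows]
        counts[classify(combined, threshold, margin)] += 1
    return counts


def factor(counts):
    """Assumed factor: lower is better (few ambiguous and impossible matches)."""
    return counts["ambiguous"] + counts["impossible"]


def select_weighting_vector(per_source, candidates):
    """Pick the candidate vector with the lowest factor."""
    return min(candidates, key=lambda w: factor(ambiguity_counts(per_source, w)))


# Hypothetical data: source element -> per-target lists of per-matcher scores.
per_source = {
    "cust_name": [[0.8, 1.0], [0.3, 1.0]],
    "order_dt": [[0.9, 1.0], [0.2, 0.0]],
    "amt": [[0.4, 1.0], [0.3, 0.0]],
}
candidates = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]

best = select_weighting_vector(per_source, candidates)
```

In this made-up data, relying only on the second matcher leaves `cust_name` ambiguous, relying only on the first leaves `amt` without any possible match, and the balanced vector avoids both, so it is the one selected.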

20. A method for identifying matches between disparate schemas, the method comprising:
- calculating a degree of similarity between elements of two schemas using each of a plurality of matching processes;
  
  combining the calculated degrees of similarity using a first weighting vector to produce first combined degrees of similarity, with the first weighting vector including a plurality of weighting coefficients and each weighting coefficient corresponding to one of the plurality of matching processes;
  
  identifying a set of possible matches between the elements of the two schemas based on the first combined degrees of similarity;
  
  receiving user feedback relating to a subset of the possible matches;
  
  modifying the first weighting vector based on the user feedback to produce a second weighting vector;
  
  combining the calculated degrees of similarity using the second weighting vector to produce second combined degrees of similarity; and
  
  identifying a modified set of possible matches between the elements of the two schemas based on the second combined degrees of similarity.
- View Dependent Claims (21)
- - 21. The method of claim 20 wherein the first weighting vector comprises one of a plurality of weighting vectors and modifying the first weighting vector based on the user feedback comprises adjusting the first weighting vector to incorporate weighting features of another of the plurality of weighting vectors selected based on the user feedback.
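
The feedback loop of claims 20 and 21 could look roughly like the sketch below. It assumes that user feedback arrives as confirmed or rejected pairs, that the "other" weighting vector is the candidate agreeing best with that feedback, and that "incorporating weighting features" means a linear blend controlled by a hypothetical `rate` parameter. None of these specifics are stated in the claims.

```python
def combine(scores, weights):
    """Weighted sum of per-matcher similarities for one pair."""
    return sum(w * s for w, s in zip(weights, scores))


def possible_matches(table, weights, threshold=0.5):
    """Pairs whose combined degree of similarity exceeds the threshold."""
    return {pair for pair, scores in table.items()
            if combine(scores, weights) > threshold}


def agreement(table, weights, feedback, threshold=0.5):
    """Number of feedback items that the weighting vector predicts correctly."""
    return sum(
        (combine(table[pair], weights) > threshold) == is_match
        for pair, is_match in feedback.items()
    )


def tune(first, candidates, table, feedback, rate=0.5):
    """Blend the first vector toward the candidate that best fits the feedback."""
    other = max(candidates, key=lambda w: agreement(table, w, feedback))
    return tuple((1 - rate) * f + rate * o for f, o in zip(first, other))


# Hypothetical per-matcher scores and user feedback (True = confirmed match).
table = {("a", "x"): [0.9, 0.2], ("a", "y"): [0.2, 0.7]}
feedback = {("a", "x"): True, ("a", "y"): False}

first = (0.2, 0.8)
candidates = [(0.2, 0.8), (0.8, 0.2)]

second = tune(first, candidates, table, feedback)
first_matches = possible_matches(table, first)
second_matches = possible_matches(table, second)
```

With these assumed values the first vector proposes the pair the user rejected, while the blended second vector proposes the pair the user confirmed, which mirrors the "modified set of possible matches" in claim 20.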

22. A system for identifying matches between disparate schemas, the system comprising:
- means for calculating a degree of similarity between elements of two schemas using each of a plurality of matching processes;
  
  means for combining the calculated degrees of similarity using a first weighting vector to produce first combined degrees of similarity, with the first weighting vector including a plurality of weighting coefficients and each weighting coefficient corresponding to one of the plurality of matching processes; and
  
  means for tuning the weighting coefficients using information relating to a predicted degree of matching accuracy associated with the first weighting vector.
- View Dependent Claims (23, 24, 25)
- - 23. The system of claim 22 wherein the means for combining the calculated degrees of similarity is operable to combine the calculated degrees of similarity using each of a plurality of weighting vectors, with each weighting vector including a plurality of weighting coefficients and each weighting coefficient corresponding to one of the plurality of matching processes, and the means for tuning comprises:
    - means for determining, using the combined degrees of similarity for each of the plurality of weighting vectors, at least one quantity selected from the group consisting of a number of ambiguous matches, a number of unambiguous matches, and a number of impossible matches; and
      
      means for selecting a second weighting vector to determine possible matches between the elements of the two schemas, wherein the second weighting vector is selected based on a comparison of information relating to a predicted degree of accuracy associated with each of the first weighting vector and the second weighting vector, with the information relating to the predicted degree of accuracy determined using at least one quantity selected from the group consisting of a number of ambiguous matches, a number of unambiguous matches, and a number of impossible matches.
  - 24. The system of claim 22 wherein the means for tuning comprises:
    - means for identifying a set of possible matches between the elements of the two schemas based on the first combined degrees of similarity;
      
      means for receiving user feedback relating to a subset of the possible matches and using the user feedback to produce the information relating to a predicted degree of matching accuracy associated with the first weighting vector; and
      
      means for modifying the first weighting vector based on the information relating to the predicted degree of matching accuracy to produce a second weighting vector, the system further comprising:
      
      means for combining the calculated degrees of similarity using the second weighting vector to produce second combined degrees of similarity; and
      
      means for identifying a modified set of possible matches between the elements of the two schemas based on the second combined degrees of similarity.
  - 25. The system of claim 22 wherein the first weighting vector is selected based on at least one selected from the group consisting of a context associated with the two schemas and a similarity of at least one of the schemas to a schema for which the first weighting vector was previously used.

Current Assignee: SAP AG (SAP SE)
Original Assignee: SAP AG (SAP SE)
Inventors: Stuhec, Gunther; Glaenzer, Helmut K.
Application Number: US10/856,694
Publication Number: US 20050278139A1
US Class Current: 702/179
CPC Class Codes:
G06F 18/254   of classification results, ...
G06F 40/16   Automatic learning of trans...
G06F 40/194   Calculation of difference b...