Techniques for similarity analysis and data enrichment using knowledge sources

  • US 10,210,246 B2
  • Filed: 09/24/2015
  • Issued: 02/19/2019
  • Est. Priority Date: 09/26/2014
  • Status: Active Grant
First Claim

1. A method comprising:

  • receiving, by a cloud computing infrastructure system of a data enrichment system, an input data set from one or more input data sources, wherein the data enrichment system comprises a user experience layer configured to provide access to the data enrichment system and a scheduler service configured to manage requests and responses received through the user experience layer and configured to manage the cloud computing infrastructure system;

    comparing, by the cloud computing infrastructure system of the data enrichment system, the input data set to one or more reference data sets obtained from a reference source;

    computing, by the cloud computing infrastructure system, a similarity metric for each of the one or more reference data sets, the similarity metric indicating a measure of similarity of each of the one or more reference data sets in comparison to the input data set, wherein the similarity metric is a matching score computed for each of the one or more reference data sets with respect to the input data set, and wherein the similarity metric is computed as a value based on cardinality of an intersection of the one or more reference data sets in comparison to the input data set wherein the value is normalized by the cardinality, and wherein the value is reduced by a first factor based on a size of the one or more reference data sets, and the value is reduced by a second factor based on a type of the one or more reference data sets;

    identifying, by the cloud computing infrastructure system, a match between the input data set and the one or more reference data sets based on the similarity metric;

    generating, by an interactive visualization system of the cloud computing infrastructure system, an interactive graphical interface that indicates the similarity metric computed for each of the one or more reference data sets and that indicates the match identified between the input data set and the one or more reference data sets in order to visually identify the one or more reference data sets having a highest similarity metric with respect to the input data set; and

    rendering, using the interactive graphical interface, a graphical visualization that indicates the similarity metric computed for each of the one or more reference data sets and that indicates the match identified between the input data set and the one or more reference data sets in order to identify the matching one or more reference data sets in order to perform large scale data enrichment while reducing load on resources of the cloud computing infrastructure system.
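
The computing and identifying steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not specify which cardinality normalizes the intersection, nor the form of the two reduction factors, so the sketch assumes division by the input set's cardinality, a logarithmic size penalty, and a fixed per-type penalty. All names here (similarity, rank_references, identify_match, TYPE_PENALTY, size_weight, threshold) and the sample data are hypothetical.

```python
import math

# Hypothetical per-type penalties: the claim only states that the score is
# reduced by a factor based on the type of the reference data set.
TYPE_PENALTY = {"curated": 0.0, "crowdsourced": 0.05}


def similarity(input_values, reference_values, reference_type, size_weight=0.01):
    """Score one reference data set against the input data set."""
    input_set = set(input_values)
    reference_set = set(reference_values)
    if not input_set:
        return 0.0
    # Cardinality of the intersection, normalized by the input set's cardinality
    # (assumption: the claim does not say which cardinality is the divisor).
    score = len(input_set & reference_set) / len(input_set)
    # First factor: grows with the reference set's size (assumed log form).
    score -= size_weight * math.log10(max(len(reference_set), 1))
    # Second factor: depends on the type of the reference data set.
    score -= TYPE_PENALTY.get(reference_type, 0.0)
    return max(score, 0.0)


def rank_references(input_values, references):
    """Return (name, score) pairs sorted from highest to lowest score."""
    scores = [
        (name, similarity(input_values, values, ref_type))
        for name, (values, ref_type) in references.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def identify_match(input_values, references, threshold=0.5):
    """Return the best-scoring reference name, or None if below the threshold."""
    ranked = rank_references(input_values, references)
    if ranked and ranked[0][1] >= threshold:
        return ranked[0][0]
    return None


# Made-up sample data: one input column and two reference data sets.
column = ["Paris", "London", "Berlin", "Tokyo"]
references = {
    "cities": (["Paris", "London", "Berlin", "Tokyo", "Rome", "Madrid",
                "Oslo", "Cairo", "Lima", "Seoul"], "curated"),
    "first_names": (["Paris", "Alice", "Bob"], "crowdsourced"),
}
ranking = rank_references(column, references)
match = identify_match(column, references)
```

The ranked list corresponds to the per-reference scores that the claimed interface displays, and the thresholded top entry corresponds to the identified match. A subtractive penalty is only one reading of "reduced by"; a multiplicative discount would fit the claim language equally well.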
