AUTOMATED ENTITY CORRELATION AND CLASSIFICATION ACROSS HETEROGENEOUS DATASETS
Abstract
The present disclosure describes techniques for entity classification and data enrichment of data sets. A data enrichment system is disclosed that can extract, repair, and enrich datasets, resulting in more precise entity resolution and classification for purposes of subsequent indexing and clustering. Disclosed techniques may include performing entity recognition to identify segments of interest that relate to an entity. Related data may be analyzed for classification, which can be used to transform the data for enrichment to its users.
20 Claims
1. A method comprising:
receiving, by a computing system, a data set from one or more data sources;
creating, by the computing system, a normalized data set by normalizing the data set based on a format of the data set;
identifying, by the computing system, a set of patterns for a set of entities in the normalized data set;
extracting, by the computing system, based on the set of patterns, entity information corresponding to the set of entities in the normalized data set;
determining, by the computing system, a classification of the set of entities using the entity information;
generating, by the computing system, based on the classification and the entity information, a transformed data set for the set of entities in the normalized data set; and
rendering a graphical interface that displays the transformed data set and information indicating a transformation to the normalized data set to generate the transformed data set.
Dependent claims: 2, 3, 4, 5, 6, 7
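The steps recited in claim 1 can be pictured with a minimal Python sketch. This is an illustration only, not the patented implementation: the regular expressions, the 50% classification threshold, the column-renaming transformation, and every function name below are assumptions made for the example, and a plain-text report stands in for the graphical interface.

```python
import csv
import io
import json
import re
from collections import Counter

# Hypothetical patterns, one per entity type; the claims do not specify any.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_phone": re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}"),
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def normalize(raw, fmt):
    """Parse raw text according to its format into rows of trimmed strings."""
    if fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(raw)))
    elif fmt == "json":
        rows = json.loads(raw)
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return [{str(k).strip(): str(v).strip() for k, v in row.items()} for row in rows]


def extract(rows):
    """Map each column to the names of the patterns its values fully match."""
    hits = {}
    for row in rows:
        for column, value in row.items():
            for name, pattern in PATTERNS.items():
                if pattern.fullmatch(value):
                    hits.setdefault(column, []).append(name)
    return hits


def classify(hits, row_count, threshold=0.5):
    """Label a column with its most frequent entity type if enough rows match."""
    labels = {}
    for column, names in hits.items():
        name, count = Counter(names).most_common(1)[0]
        if count / row_count >= threshold:
            labels[column] = name
    return labels


def transform(rows, labels):
    """Rename classified columns and log each change.

    Two columns that receive the same label would collide; this sketch
    does not handle that case.
    """
    renamed = [{labels.get(c, c): v for c, v in row.items()} for row in rows]
    log = [f"renamed column {c!r} to {t!r}" for c, t in labels.items()]
    return renamed, log


def render(rows, log):
    """Text stand-in for the interface: transformed rows plus the change log."""
    lines = [json.dumps(row) for row in rows]
    return "\n".join(lines + ["Transformations:"] + log)


if __name__ == "__main__":
    raw = "col1,col2\nana@example.com,555-010-4477\nbo@example.org,(555) 010 9921\n"
    rows = normalize(raw, "csv")
    labels = classify(extract(rows), len(rows))
    print(render(*transform(rows, labels)))
```

Returning the change log alongside the transformed rows mirrors the final claim element, which requires displaying both the transformed data set and information indicating the transformation that produced it.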
8. A data enrichment system comprising:
a plurality of data sources; and
a cloud computing infrastructure system comprising:
    one or more processors communicatively coupled to the plurality of data sources over at least one communication network; and
    a memory coupled to the one or more processors, the memory storing instructions to provide a data enrichment service, wherein the instructions, when executed by the one or more processors, cause the one or more processors to:
        receive a data set from one or more data sources of the plurality of data sources;
        create a normalized data set by normalizing the data set based on a format of the data set;
        identify a set of patterns for a set of entities in the normalized data set;
        extract, based on the set of patterns, entity information corresponding to the set of entities in the normalized data set;
        determine a classification of the set of entities using the entity information;
        generate, based on the classification and the entity information, a transformed data set for the set of entities in the normalized data set; and
        render a graphical interface that displays the transformed data set and information indicating a transformation to the normalized data set to generate the transformed data set.
Dependent claims: 9, 10, 11, 12, 13
14. A non-transitory computer readable storage medium including instructions stored thereon which, when executed by one or more processors, cause the one or more processors to:
receive, by a computing system, a data set from one or more data sources;
create, by the computing system, a normalized data set by normalizing the data set based on a format of the data set;
identify, by the computing system, a set of patterns for a set of entities in the normalized data set;
extract, by the computing system, based on the set of patterns, entity information corresponding to the set of entities in the normalized data set;
determine, by the computing system, a classification of the set of entities using the entity information;
generate, by the computing system, based on the classification and the entity information, a transformed data set for the set of entities in the normalized data set; and
render a graphical interface that displays the transformed data set and information indicating a transformation to the normalized data set to generate the transformed data set.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification