System and method for ontology induction through statistical profiling and reference schema matching
First Claim
1. A method for use with a data integration or other computing environment comprising:
receiving input defining one or more schemas;
accessing the one or more schemas to obtain one or more entity definitions associated with entities provided by the reference of one or more schemas;
generating a sample data for the one or more entities from the one or more schemas;
profiling the sample data to determine one or more metrics associated with the sample data;
generating one or more rules based on the entity definitions; and
generating a functional type system based on the generated one or more rules, for use in processing a data input.
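The claimed steps can be illustrated as a small pipeline. This is a minimal sketch under stated assumptions, not the patented implementation. It assumes the following. Entity definitions are plain dictionaries that map attribute names to a handful of declared types. The "rules" are per-type validation predicates selected from the declared types. The "functional type system" is a function that assigns a column the most specific type whose rule accepts every value. All names (`SCHEMAS`, `generate_value`, `sample_entities`, `profile`, `generate_rules`, `build_type_system`) and the type vocabulary are hypothetical.

```python
import random
import string
from typing import Callable, Dict, List

# Hypothetical entity definitions: entity name -> {attribute name: declared type}.
SCHEMAS: Dict[str, Dict[str, str]] = {
    "Customer": {"customer_id": "int", "email": "email", "name": "string"},
    "Order": {"order_id": "int", "amount": "decimal"},
}

# Hypothetical type vocabulary, ordered from most to least specific.
SPECIFICITY = ["int", "decimal", "email", "string"]

Rule = Callable[[str], bool]  # a rule is a predicate over one string value


def generate_value(attr_type: str, rng: random.Random) -> str:
    """Produce one synthetic value matching a declared attribute type."""
    if attr_type == "int":
        return str(rng.randint(1, 10_000))
    if attr_type == "decimal":
        return f"{rng.uniform(0, 1000):.2f}"
    if attr_type == "email":
        return "".join(rng.choices(string.ascii_lowercase, k=6)) + "@example.com"
    return "".join(rng.choices(string.ascii_letters, k=8))


def sample_entities(schemas: Dict[str, Dict[str, str]], n: int, seed: int = 0) -> Dict[str, Dict[str, List[str]]]:
    """Generate n sample values for every attribute of every entity."""
    rng = random.Random(seed)
    return {
        entity: {attr: [generate_value(t, rng) for _ in range(n)] for attr, t in attrs.items()}
        for entity, attrs in schemas.items()
    }


def profile(values: List[str]) -> Dict[str, int]:
    """Compute simple metrics over one column of sample data."""
    lengths = [len(v) for v in values]
    return {
        "count": len(values),
        "distinct": len(set(values)),
        "min_len": min(lengths),
        "max_len": max(lengths),
    }


def _parses(cast: Callable[[str], object], value: str) -> bool:
    """Return True if cast(value) succeeds."""
    try:
        cast(value)
        return True
    except ValueError:
        return False


# Hypothetical rule library: one candidate predicate per type name.
RULE_LIBRARY: Dict[str, Rule] = {
    "int": lambda v: _parses(int, v),
    "decimal": lambda v: _parses(float, v),
    "email": lambda v: "@" in v and "." in v.rsplit("@", 1)[-1],
    "string": lambda v: True,
}


def generate_rules(schemas: Dict[str, Dict[str, str]]) -> Dict[str, Rule]:
    """Keep only the rules for types that the entity definitions declare."""
    declared = {t for attrs in schemas.values() for t in attrs.values()}
    return {t: RULE_LIBRARY[t] for t in declared}


def build_type_system(rules: Dict[str, Rule]) -> Callable[[List[str]], str]:
    """Return a function giving a column the most specific type whose rule accepts every value."""
    ordered = [t for t in SPECIFICITY if t in rules]

    def type_of(values: List[str]) -> str:
        for t in ordered:
            if all(rules[t](v) for v in values):
                return t
        return "unknown"

    return type_of


if __name__ == "__main__":
    samples = sample_entities(SCHEMAS, n=20)
    metrics = {e: {a: profile(v) for a, v in cols.items()} for e, cols in samples.items()}
    type_of = build_type_system(generate_rules(SCHEMAS))
    print(type_of(["12", "7"]), type_of(["3.5", "1"]), type_of(["hello"]))  # int decimal string
```

Trying types from most to least specific is a design choice of this sketch. A value such as `"12"` satisfies the int, decimal, and string rules, so the first rule that accepts the whole column decides its type.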
4 Assignments
0 Petitions
Abstract
In accordance with various embodiments, described herein is a system (Data Artificial Intelligence system, Data AI system), for use with a data integration or other computing environment, that leverages machine learning (ML, DataFlow Machine Learning, DFML) to manage a flow of data (dataflow, DF) and to build complex dataflow software applications (dataflow applications, pipelines). In accordance with an embodiment, the system can perform an ontology analysis of a schema definition to determine the types of data, and the datasets or entities, associated with that schema. It can also generate or update a model from a reference schema that includes an ontology defined by the relationships between datasets or entities and their attributes. A reference HUB that includes one or more schemas can be used to analyze data flows and to classify input data or make recommendations for it. Such recommendations include transformations, enrichments, filtering, or cross-entity data fusion.
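The reference HUB classification and recommendation described above can be pictured with a sketch that compares an incoming dataset against reference entities. It is a heavily hedged stand-in. The abstract describes a machine-learning approach, whereas this example substitutes a plain overlap score over (normalized attribute name, inferred type) pairs. The hub contents, the type inference, and the function names (`REFERENCE_HUB`, `infer_type`, `normalize`, `match_entities`, `recommend_enrichments`) are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical reference HUB: entity name -> {attribute name: type}.
REFERENCE_HUB: Dict[str, Dict[str, str]] = {
    "Customer": {"customer_id": "int", "email": "email", "name": "string"},
    "Order": {"order_id": "int", "customer_id": "int", "amount": "decimal"},
}


def _parses(cast, value: str) -> bool:
    """Return True if cast(value) succeeds."""
    try:
        cast(value)
        return True
    except ValueError:
        return False


def infer_type(values: List[str]) -> str:
    """Assign the most specific of a few simple types that fits every value."""
    if all(_parses(int, v) for v in values):
        return "int"
    if all(_parses(float, v) for v in values):
        return "decimal"
    if all("@" in v for v in values):
        return "email"
    return "string"


def normalize(name: str) -> str:
    """Map a column header such as 'Customer ID' to 'customer_id'."""
    return "_".join(name.strip().lower().split())


def match_entities(columns: Dict[str, List[str]], hub: Dict[str, Dict[str, str]]) -> List[Tuple[str, float]]:
    """Score each reference entity by the share of its (attribute, type) pairs seen in the input."""
    observed = {(normalize(col), infer_type(vals)) for col, vals in columns.items()}
    scores = [
        (entity, len(observed & set(attrs.items())) / len(attrs))
        for entity, attrs in hub.items()
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)


def recommend_enrichments(columns: Dict[str, List[str]], hub: Dict[str, Dict[str, str]], entity: str) -> List[str]:
    """List reference attributes of the chosen entity that the input does not yet provide."""
    present = {normalize(col) for col in columns}
    return sorted(attr for attr in hub[entity] if attr not in present)


if __name__ == "__main__":
    incoming = {"Customer ID": ["1", "2"], "Email": ["a@x.com", "b@y.org"]}
    ranking = match_entities(incoming, REFERENCE_HUB)
    best, _ = ranking[0]
    print(best, recommend_enrichments(incoming, REFERENCE_HUB, best))  # Customer ['name']
```

In this sketch, the best-scoring entity acts as the classification. The reference attributes missing from the input are treated as enrichment candidates. A practical system would likely need fuzzier name matching and richer statistical profiles than exact name-and-type overlap.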
77 Citations
20 Claims
1. A method for use with a data integration or other computing environment comprising:
receiving input defining one or more schemas;
accessing the one or more schemas to obtain one or more entity definitions associated with entities provided by the reference of one or more schemas;
generating a sample data for the one or more entities from the one or more schemas;
profiling the sample data to determine one or more metrics associated with the sample data;
generating one or more rules based on the entity definitions; and
generating a functional type system based on the generated one or more rules, for use in processing a data input.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system for ontology analysis of a schema definition for use with a data integration or other computing environment, comprising:
one or more processors operable to:
receiving input defining one or more schemas;
accessing the one or more schemas to obtain one or more entity definitions associated with entities provided by the reference of one or more schemas;
generating a sample data for the one or more entities from the one or more schemas;
profiling the sample data to determine one or more metrics associated with the sample data;
generating one or more rules based on the entity definitions; and
generating a functional type system based on the generated one or more rules, for use in processing a data input.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A non-transitory computer readable storage medium, including instructions stored thereon which when read and executed by one or more computers cause the one or more computers to perform a method comprising:
receiving input defining one or more schemas;
accessing the one or more schemas to obtain one or more entity definitions associated with entities provided by the reference of one or more schemas;
generating a sample data for the one or more entities from the one or more schemas;
profiling the sample data to determine one or more metrics associated with the sample data;
generating one or more rules based on the entity definitions; and
generating a functional type system based on the generated one or more rules, for use in processing a data input.
Dependent claims: 16, 17, 18, 19, 20.
Specification