Quality analysis on object notation data sources
Abstract
Determination of a degree of similarity among and between a set of text notation schema instances. One type of text notation schema instance is the JSON type. In some embodiments, the degree of similarity is expressed as a schema variance value, which is determined by individually comparing each schema instance of the set to a representative majority schema. Also described is determining a quality of a data source associated with the set of text notation schema instances based, at least in part, upon the similarity value.
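Read together with the claims, the schema variance value can be written as a weighted sum. The following is only a sketch of that reading: the symbols $p_i$, $s_i$, $m$ and $V$, and the numbers in the worked example, are illustrative assumptions and do not appear in the patent text.

```latex
% p_i : proportion of stored JSON objects that conform to schema i
% m   : index of the majority schema (largest p_i)
% s_i : similarity score between schema i and the majority schema, with s_m = 1
\[
  m = \arg\max_{i} \, p_i ,
  \qquad
  V = \sum_{i=1}^{n} p_i \, s_i ,
  \qquad
  s_m = 1 .
\]
% Worked example with three hypothetical schemas:
%   p = (0.6, 0.3, 0.1), s = (1, 0.5, 0.2)
\[
  V = 0.6 \cdot 1 + 0.3 \cdot 0.5 + 0.1 \cdot 0.2 = 0.77 .
\]
```

Under this reading, a store whose objects all conform to a single schema would have $V = 1$, and the value would fall as more objects follow schemas that differ from the majority schema.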
3 Claims
1. A method comprising:
- receiving a plurality of JSON schemas used to store some of the data in a first data store, with each JSON schema of the plurality of JSON schemas respectively corresponding to a set of attributes and attribute types for parsing JSON data objects according to the respectively corresponding JSON schema;
- receiving a first JSON data store data set including information indicative of attribute values, attribute types and attribute organization of a first plurality of JSON data objects stored in a first JSON data store;
- for each given JSON schema of the plurality of JSON schemas, determining a proportion value corresponding to a proportion of the first plurality of JSON objects that conform to the given JSON schema;
- determining a first majority JSON schema as a JSON schema of the first plurality of JSON schemas that has the largest respectively corresponding proportion value;
- for each given JSON schema of the plurality of JSON schemas, determining a first similarity score between the given JSON schema and the first majority JSON schema, with:
  (i) the first similarity score of the first majority JSON schema being one, and
  (ii) first similarity scores for each given JSON schema other than the first majority JSON schema being based upon a degree of similarity between the given JSON schema and the first majority JSON schema;
- for each given JSON schema of the plurality of JSON schemas, determining a product corresponding to the given JSON schema, with the product being a product of:
  (i) the proportion value of the given JSON schema, multiplied by
  (ii) the first similarity score of the given JSON schema;
- determining a first data store variance value by summing the products respectively corresponding to the JSON schemas of the plurality of JSON schemas; and
- selecting the first data store to supply data based, at least in part, upon the first data store variance value.
2. A computer program product (CPP) comprising:
- a machine readable storage device; and
- computer code stored on the machine readable storage device, with the computer code including instructions for causing a processor(s) set to perform operations including the following:
  - receiving a plurality of JSON schemas used to store some of the data in a first data store, with each JSON schema of the plurality of JSON schemas respectively corresponding to a set of attributes and attribute types for parsing JSON data objects according to the respectively corresponding JSON schema,
  - receiving a first JSON data store data set including information indicative of attribute values, attribute types and attribute organization of a first plurality of JSON data objects stored in a first JSON data store,
  - for each given JSON schema of the plurality of JSON schemas, determining a proportion value corresponding to a proportion of the first plurality of JSON objects that conform to the given JSON schema,
  - determining a first majority JSON schema as a JSON schema of the first plurality of JSON schemas that has the largest respectively corresponding proportion value,
  - for each given JSON schema of the plurality of JSON schemas, determining a first similarity score between the given JSON schema and the first majority JSON schema, with:
    (i) the first similarity score of the first majority JSON schema being one, and
    (ii) first similarity scores for each given JSON schema other than the first majority JSON schema being based upon a degree of similarity between the given JSON schema and the first majority JSON schema,
  - for each given JSON schema of the plurality of JSON schemas, determining a product corresponding to the given JSON schema, with the product being a product of:
    (i) the proportion value of the given JSON schema, multiplied by
    (ii) the first similarity score of the given JSON schema,
  - determining a first data store variance value by summing the products respectively corresponding to the JSON schemas of the plurality of JSON schemas, and
  - selecting the first data store to supply data based, at least in part, upon the first data store variance value.
3. A computer system (CS) comprising:
- a processor(s) set;
- a machine readable storage device; and
- computer code stored on the machine readable storage device, with the computer code including instructions for causing the processor(s) set to perform operations including the following:
  - receiving a plurality of JSON schemas used to store some of the data in a first data store, with each JSON schema of the plurality of JSON schemas respectively corresponding to a set of attributes and attribute types for parsing JSON data objects according to the respectively corresponding JSON schema,
  - receiving a first JSON data store data set including information indicative of attribute values, attribute types and attribute organization of a first plurality of JSON data objects stored in a first JSON data store,
  - for each given JSON schema of the plurality of JSON schemas, determining a proportion value corresponding to a proportion of the first plurality of JSON objects that conform to the given JSON schema,
  - determining a first majority JSON schema as a JSON schema of the first plurality of JSON schemas that has the largest respectively corresponding proportion value,
  - for each given JSON schema of the plurality of JSON schemas, determining a first similarity score between the given JSON schema and the first majority JSON schema, with:
    (i) the first similarity score of the first majority JSON schema being one, and
    (ii) first similarity scores for each given JSON schema other than the first majority JSON schema being based upon a degree of similarity between the given JSON schema and the first majority JSON schema,
  - for each given JSON schema of the plurality of JSON schemas, determining a product corresponding to the given JSON schema, with the product being a product of:
    (i) the proportion value of the given JSON schema, multiplied by
    (ii) the first similarity score of the given JSON schema,
  - determining a first data store variance value by summing the products respectively corresponding to the JSON schemas of the plurality of JSON schemas, and
  - selecting the first data store to supply data based, at least in part, upon the first data store variance value.
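The operations recited in the claims above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the claims do not say how conformance or the "degree of similarity" is computed, so the sketch assumes flat JSON objects, treats a schema as a hypothetical mapping of attribute name to JSON type name, tests conformance by exact match, and uses a Jaccard index over (attribute, type) pairs as the similarity measure. All function names are hypothetical.

```python
def json_type(value):
    """Map a decoded JSON value to a JSON type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    return "object"


def conforms(obj, schema):
    """Assumed conformance test: same attributes, each with the declared type."""
    return {key: json_type(val) for key, val in obj.items()} == schema


def similarity(schema, majority):
    """Assumed similarity measure: Jaccard index over (attribute, type) pairs.

    Identical schemas score 1, as the claims require for the majority schema.
    """
    a, b = set(schema.items()), set(majority.items())
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def variance_value(schemas, objects):
    """Sum, over all schemas, of proportion value times similarity score."""
    if not schemas or not objects:
        raise ValueError("need at least one schema and one object")
    # Proportion of stored objects that conform to each schema.
    proportions = {
        name: sum(conforms(obj, schema) for obj in objects) / len(objects)
        for name, schema in schemas.items()
    }
    # Majority schema: the one with the largest proportion value.
    majority = schemas[max(proportions, key=proportions.get)]
    return sum(
        p * similarity(schemas[name], majority) for name, p in proportions.items()
    )


def select_store(stores):
    """Pick the store with the highest value (assumes higher means more uniform)."""
    return max(stores, key=lambda name: variance_value(*stores[name]))
```

As a usage example, with a schema `{"id": "number", "name": "string"}` matched by three of four objects and a second schema that adds an `"email": "string"` attribute matched by the fourth, the proportions are 0.75 and 0.25, the assumed similarity of the second schema to the majority schema is 2/3, and `variance_value` returns 0.75 + 0.25 × 2/3. The claims only say the store is selected "based, at least in part" on this value; preferring the highest value in `select_store` is a design choice of the sketch.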
Specification