Quality analysis on object notation data sources

US 10,324,981 B2
Filed: 10/13/2015
Issued: 06/18/2019
Est. Priority Date: 10/13/2015
Status: Active Grant

Abstract

Determination of a degree of similarity among and between a set of text notation schema instances. One type of text notation schema instance is the JSON type. In some embodiments, the degree of similarity is expressed as a schema variance value which is determined by individually comparing the schema instances of the set of text notation schema instances to a representative majority schema. Also, determining a quality of a data source associated with the plurality of text notation schema instances based, at least in part, upon the similarity value.

3 Claims

1. A method comprising:
- receiving a plurality of JSON schemas used to store some of the data in a first data store, with each JSON schema of the plurality JSON schemas respectively corresponding to a set of attributes and attribute types for parsing JSON data objects according the respectively corresponding JSON schema;
  
  receiving a first JSON data store data set including information indicative of attribute values, attribute types and attribute organization of first plurality of JSON data objects stored in a first JSON data store;
  
  for each given JSON schema of the plurality of JSON schemas, determining a proportion value corresponding to a proportion of the first plurality of JSON objects that conform to the given JSON schema;
  
  determining a first majority JSON schema as a JSON schema of the first plurality of JSON schemas that has the largest respectively corresponding proportion value;
  
  for each given JSON schema of the plurality of JSON schemas, determining a first similarity score between the given JSON schema and the first majority JSON schema, with:
  
  (i) the first similarity score of the first majority JSON schema being one, and (ii) first similarity scores for each given JSON schema other than the first majority JSON schema is based upon a degree of similarity between the given JSON schema and the first majority JSON schema;
  
  for each given JSON schema of the plurality of JSON schemas, determining a product corresponding to the given JSON schema, with the product being a product of:
  
  (i) the proportion value of the given JSON schema, multiplied by (ii) the first similarity score of the given JSON schema;
  
  determining a first data store variance value by summing the products respectively corresponding to the JSON schema of the plurality of JSON schemas; and
  
  selecting the first data store to supply data based, at least in part, upon the first data store variance value.

2. A computer program product (CPP) comprising:
- a machine readable storage device; and
  
  computer code stored on the machine readable storage device, with the computer code including instructions for causing a processor(s) set to perform operations including the following:
  
  receiving a plurality of JSON schemas used to store some of the data in a first data store, with each JSON schema of the plurality JSON schemas respectively corresponding to a set of attributes and attribute types for parsing JSON data objects according the respectively corresponding JSON schema,
  
  receiving a first JSON data store data set including information indicative of attribute values, attribute types and attribute organization of first plurality of JSON data objects stored in a first JSON data store,
  
  for each given JSON schema of the plurality of JSON schemas, determining a proportion value corresponding to a proportion of the first plurality of JSON objects that conform to the given JSON schema,
  
  determining a first majority JSON schema as a JSON schema of the first plurality of JSON schemas that has the largest respectively corresponding proportion value,
  
  for each given JSON schema of the plurality of JSON schemas, determining a first similarity score between the given JSON schema and the first majority JSON schema, with:
  
  (i) the first similarity score of the first majority JSON schema being one, and (ii) first similarity scores for each given JSON schema other than the first majority JSON schema is based upon a degree of similarity between the given JSON schema and the first majority JSON schema,
  
  for each given JSON schema of the plurality of JSON schemas, determining a product corresponding to the given JSON schema, with the product being a product of:
  
  (i) the proportion value of the given JSON schema, multiplied by (ii) the first similarity score of the given JSON schema,
  
  determining a first data store variance value by summing the products respectively corresponding to the JSON schema of the plurality of JSON schemas, and
  
  selecting the first data store to supply data based, at least in part, upon the first data store variance value.

3. A computer system (CS) comprising:
- a processor(s) set;
  
  a machine readable storage device; and
  
  computer code stored on the machine readable storage device, with the computer code including instructions for causing the processor(s) set to perform operations including the following:
  
  receiving a plurality of JSON schemas used to store some of the data in a first data store, with each JSON schema of the plurality JSON schemas respectively corresponding to a set of attributes and attribute types for parsing JSON data objects according the respectively corresponding JSON schema,
  
  receiving a first JSON data store data set including information indicative of attribute values, attribute types and attribute organization of first plurality of JSON data objects stored in a first JSON data store,
  
  for each given JSON schema of the plurality of JSON schemas, determining a proportion value corresponding to a proportion of the first plurality of JSON objects that conform to the given JSON schema,
  
  determining a first majority JSON schema as a JSON schema of the first plurality of JSON schemas that has the largest respectively corresponding proportion value,
  
  for each given JSON schema of the plurality of JSON schemas, determining a first similarity score between the given JSON schema and the first majority JSON schema, with:
  
  (i) the first similarity score of the first majority JSON schema being one, and (ii) first similarity scores for each given JSON schema other than the first majority JSON schema is based upon a degree of similarity between the given JSON schema and the first majority JSON schema,
  
  for each given JSON schema of the plurality of JSON schemas, determining a product corresponding to the given JSON schema, with the product being a product of:
  
  (i) the proportion value of the given JSON schema, multiplied by (ii) the first similarity score of the given JSON schema,
  
  determining a first data store variance value by summing the products respectively corresponding to the JSON schema of the plurality of JSON schemas, and
  
  selecting the first data store to supply data based, at least in part, upon the first data store variance value.
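
The operations recited in the claims above can be sketched in a few lines of Python. This is only an illustrative reading, not the patented implementation. The claims do not say how conformance or the "degree of similarity" is measured, so the sketch assumes a flat schema form (attribute name mapped to a type name), exact-match conformance, and Jaccard similarity over (attribute, type) pairs. The names `Schema`, `conforms`, `similarity`, `variance_value` and `select_store` are hypothetical, and picking the store with the highest value is likewise an assumption, since the claims only require that selection be based "at least in part" on the value.

```python
from typing import Any, Dict, List

# Hypothetical simplified schema form: attribute name -> Python type name.
Schema = Dict[str, str]


def conforms(obj: Dict[str, Any], schema: Schema) -> bool:
    """Assumed conformance test: same attributes, same type names."""
    return {k: type(v).__name__ for k, v in obj.items()} == schema


def similarity(a: Schema, b: Schema) -> float:
    """Assumed metric: Jaccard similarity over (attribute, type) pairs."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    if not pairs_a and not pairs_b:
        return 1.0
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)


def variance_value(schemas: List[Schema], objects: List[Dict[str, Any]]) -> float:
    """Sum over schemas of (proportion value) * (similarity to majority schema)."""
    if not schemas or not objects:
        raise ValueError("need at least one schema and one object")
    # Proportion of objects that conform to each schema.
    proportions = [
        sum(conforms(o, s) for o in objects) / len(objects) for s in schemas
    ]
    # Majority schema: the one with the largest proportion value.
    majority_idx = max(range(len(schemas)), key=proportions.__getitem__)
    majority = schemas[majority_idx]
    # The majority schema scores exactly one; the others are compared to it.
    scores = [
        1.0 if i == majority_idx else similarity(s, majority)
        for i, s in enumerate(schemas)
    ]
    return sum(p * s for p, s in zip(proportions, scores))


def select_store(values_by_store: Dict[str, float]) -> str:
    """Assumed selection rule: prefer the store with the highest value."""
    return max(values_by_store, key=values_by_store.get)


schemas = [
    {"id": "int", "name": "str"},
    {"id": "int", "name": "str", "email": "str"},
    {"id": "str"},
]
objects = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
    {"id": 4, "name": "d", "email": "d@example.com"},
]
print(round(variance_value(schemas, objects), 4))
```

With these sample inputs the proportion values are 0.75, 0.25 and 0, the first schema is the majority schema, and the similarity scores are 1, 2/3 and 0, so the sum is 0.75 + 0.25 × 2/3 ≈ 0.9167. Under the assumed metric the value lies between 0 and 1, and it equals 1 when every object conforms to the majority schema.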

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Ayyagari, Phani Kumar V. U.; Bhide, Manish A.; Eshwar, Bhavani K.; Jasti, Purnachandra R.
Primary Examiner(s): Ly, Cheyne D
Application Number: US14/881,202
Publication Number: US 20170102923A1
Time in Patent Office: 1,344 Days
Field of Search: 707769, 707791, 707792, 707803, 717116
US Class Current
CPC Class Codes

G06F 16/80 of semi-structured data, e....

G06F 9/45504 Abstract machines for progr...
