COLLABORATIVE DATASET CONSOLIDATION VIA DISTRIBUTED COMPUTER NETWORKS

US 20170364569A1
Filed: 06/19/2016
Published: 12/21/2017
Est. Priority Date: 06/19/2016
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Various embodiments relate generally to data science and data analysis, computer software and systems, and wired and wireless network communications to provide an interface between repositories of disparate datasets and computing machine-based entities that seek access to the datasets, and, more specifically, to a computing and data storage platform that facilitates consolidation of one or more datasets, whereby a collaborative data layer and associated logic facilitate, for example, efficient access to, and implementation of, collaborative datasets. In some examples, a method may include receiving data representing a query into a collaborative dataset consolidation system, identifying datasets relevant to the query, generating one or more queries to access disparate data repositories, and retrieving data representing query results. In some cases, one or more queries are applied (e.g., as a federated query) to atomized datasets stored in one or more atomized data stores, at least two of which may be different.
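The abstract describes applying a query, for example as a federated query, to atomized datasets held in atomized data stores, at least two of which may be different. The minimal Python sketch below illustrates that idea under stated assumptions: an atomized dataset is modeled as a set of (subject, predicate, object) triples, and "different" stores are modeled as two storage layouts behind a common `match` interface. The names `SetStore`, `IndexedStore`, and `federated_match` are hypothetical illustrations, not the system described in the patent.

```python
from typing import Optional, Protocol

Triple = tuple[str, str, str]


def _ok(value: str, wanted: Optional[str]) -> bool:
    """A pattern position matches if it is a wildcard (None) or equals the value."""
    return wanted is None or value == wanted


class AtomizedStore(Protocol):
    """Hypothetical common interface every atomized data store exposes."""

    def match(self, s: Optional[str], p: Optional[str], o: Optional[str]) -> list[Triple]: ...


class SetStore:
    """Atomized data kept as a flat set of (subject, predicate, object) triples."""

    def __init__(self, triples: list[Triple]):
        self.triples = set(triples)

    def match(self, s, p, o):
        return [t for t in self.triples if _ok(t[0], s) and _ok(t[1], p) and _ok(t[2], o)]


class IndexedStore:
    """The same triple model stored differently: subject -> predicate -> objects."""

    def __init__(self):
        self.index: dict[str, dict[str, set[str]]] = {}

    def add(self, s: str, p: str, o: str) -> None:
        self.index.setdefault(s, {}).setdefault(p, set()).add(o)

    def match(self, s, p, o):
        # A bound subject is a direct lookup; a wildcard subject scans all subjects.
        subjects = [s] if s is not None else list(self.index)
        out = []
        for subj in subjects:
            for pred, objs in self.index.get(subj, {}).items():
                if _ok(pred, p):
                    out.extend((subj, pred, obj) for obj in objs if _ok(obj, o))
        return out


def federated_match(stores: list[AtomizedStore], s=None, p=None, o=None) -> list[Triple]:
    """Apply one triple pattern to every store and return the de-duplicated union."""
    results: set[Triple] = set()
    for store in stores:
        results.update(store.match(s, p, o))
    return sorted(results)


# Two stores with different layouts; the dataset prefixes are illustrative only.
a = SetStore([("ds1:row1", "ds1:city", "Austin"), ("ds1:row1", "ds1:revenue", "10")])
b = IndexedStore()
b.add("ds2:Austin", "ds2:name", "Austin")
b.add("ds2:Austin", "ds2:temp", "30")
hits = federated_match([a, b], o="Austin")
```

Routing the same pattern through a shared interface keeps the federating step unaware of how each store lays out its data, and collecting results in a set means a triple held by more than one store is returned once.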

18 Claims

1. A method comprising:
- receiving data representing a query into a collaborative dataset consolidation system, the dataset being associated with an identifier;
- identifying datasets relevant to the query, the datasets being disposed in disparate data repositories;
- determining a level of authorization associated with the identifier to access each of the datasets;
- generating one or more queries based on the query to access the disparate data repositories;
- retrieving data representing query results from the accessed disparate data repositories.

2. The method of claim 1 wherein the datasets comprise atomized datasets.

3. The method of claim 1 wherein the atomized datasets include subsets of linked data points.

4. The method of claim 1 wherein retrieving the data representing the query results from the accessed disparate data comprises:
- accessing an external repository that is external to the collaborative dataset consolidation system.

5. The method of claim 1 wherein identifying the datasets relevant to the query comprises:
- determining a subset of data attributes associated with the query; and
- retrieving a subset of atomized datasets that include data associated with one or more of the data attributes.

6. The method of claim 5 wherein determining the subset of data attributes associated with the query comprises:
- searching for a derived attribute as at least one of data attributes.

7. The method of claim 6 further comprising:
- analyzing a plurality of datasets associated with the collaborative dataset consolidation system to infer data representing the derived attribute.

8. The method of claim 1 further comprising:
- receiving data representing another query into the collaborative dataset consolidation system, the another query being associated with another identifier;
- identifying the datasets relevant to the another query; and
- denying access to datasets to perform the another query if the level of authorization is absent.

9. The method of claim 1 further comprising:
- receiving data representing another query into the collaborative dataset consolidation system, the another query being associated with another identifier;
- identifying the datasets relevant to the another query; and
- granting access to at least one dataset to perform the another query if the level of authorization is present.

10. The method of claim 1 wherein generating the one or more queries comprises:
- generating a federated query.

11. The method of claim 10 wherein generating the federated query comprises:
- querying disparate data stores.

12. The method of claim 11 wherein querying the disparate data stores comprises:
- querying different triplestores.
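Claims 1 through 12 describe a query flow: identify the datasets relevant to a query by data attribute, check the authorization level tied to the query's identifier, deny or grant access, and fan the query out to the repositories that hold the permitted datasets. The Python sketch below walks through that flow under simplifying assumptions: an in-memory catalog, an authorization table keyed by (identifier, dataset name) with integer levels, and one loop iteration standing in for each per-repository sub-query. Derived attributes (claims 6 and 7) are not modeled, and all names (`Dataset`, `Query`, `run_query`, `acl`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    repository: str             # hypothetical label of the repository holding it
    attributes: frozenset[str]  # data attributes the dataset covers
    rows: list[dict]            # toy records standing in for stored data


@dataclass
class Query:
    identifier: str             # identifier the query is associated with
    attributes: frozenset[str]  # data attributes the query asks about


def identify_relevant(query: Query, catalog: list[Dataset]) -> list[Dataset]:
    """Relevant datasets share at least one data attribute with the query (cf. claim 5)."""
    return [d for d in catalog if d.attributes & query.attributes]


def authorization_level(identifier: str, dataset: Dataset, acl: dict) -> int:
    """Access level granted to `identifier` for `dataset`; 0 means none."""
    return acl.get((identifier, dataset.name), 0)


def plan_sub_queries(datasets: list[Dataset]) -> dict[str, list[Dataset]]:
    """Group datasets by repository: one (simulated) sub-query per repository."""
    plan: dict[str, list[Dataset]] = {}
    for d in datasets:
        plan.setdefault(d.repository, []).append(d)
    return plan


def run_query(query: Query, catalog: list[Dataset], acl: dict, required: int = 1) -> list:
    relevant = identify_relevant(query, catalog)
    permitted = [d for d in relevant
                 if authorization_level(query.identifier, d, acl) >= required]
    if relevant and not permitted:
        # Deny when no sufficient authorization level is present (cf. claim 8).
        raise PermissionError(f"{query.identifier!r} may not access the relevant datasets")
    results = []
    for repository, datasets in sorted(plan_sub_queries(permitted).items()):
        # Each iteration stands in for one sub-query sent to one repository.
        for d in datasets:
            for row in d.rows:
                projected = {k: v for k, v in row.items() if k in query.attributes}
                results.append((repository, d.name, projected))
    return results


catalog = [
    Dataset("sales", "repo-a", frozenset({"city", "revenue"}),
            [{"city": "Austin", "revenue": 10}]),
    Dataset("weather", "repo-b", frozenset({"city", "temp"}),
            [{"city": "Austin", "temp": 30}]),
    Dataset("staff", "repo-b", frozenset({"name"}), [{"name": "Ann"}]),
]
acl = {("alice", "sales"): 1, ("alice", "weather"): 1, ("bob", "weather"): 1}
```

With this catalog, a query from `alice` on `city` and `revenue` returns rows from both repositories, the same query from `bob` returns only the `weather` rows in `repo-b` (access granted to at least one dataset, as in claim 9), and an identifier with no entries in `acl` is refused.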

13. A method comprising:
- receiving a data file including a dataset into a collaborative dataset consolidation system;
- formatting the dataset to form a first atomized dataset including atomized data points each including data representing at least two objects and an association between the two objects;
- forming a second atomized dataset including the first atomized dataset and one or more other atomized datasets;
- receiving data representing a query into the collaborative dataset consolidation system, the query being associated with an identifier;
- identifying a subset of the second atomized dataset relevant to the query, wherein portions of the second atomized dataset are disposed in different data repositories;
- generating a plurality of sub-queries each of which is configured to access at least one of the different data repositories; and
- retrieving data representing query results from the at least one of the different data repositories.

14. The method of claim 13 wherein generating the plurality of sub-queries comprises:
- classifying query portions.

15. The method of claim 14 wherein classifying the query portions comprises:
- identifying a classification type for a portion of the query.

16. The method of claim 13 wherein the datasets comprise linked data points.

17. The method of claim 16 wherein linked data points comprise triples.

18. The method of claim 17 wherein at least one triple of the triples are formatted to comply with a Resource Description Framework (“RDF”) data model.
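Claims 13 through 18 add an ingestion step (formatting a data file into atomized data points, each pairing two objects with an association, i.e. triples) and a query step that classifies query portions and issues one sub-query per repository. The Python sketch below illustrates both under stated assumptions: the data file is CSV, the "dataset:value" naming scheme is a hypothetical stand-in for real RDF IRIs, the second atomized dataset is modeled as a dict from a hypothetical repository name to its share of triples, and a query portion's "classification type" is taken to be the repository that holds its predicate. Plain tuples keep the sketch dependency-free; the RDF data model named in claim 18 and the triplestores of claim 12 would typically be queried with SPARQL instead. All function names are illustrative.

```python
import csv
import io
from typing import Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]  # None acts as a wildcard


def atomize(csv_text: str, dataset: str, key: str) -> set[Triple]:
    """Format each non-key cell of a CSV data file as a (subject, predicate, object) triple."""
    triples: set[Triple] = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"{dataset}:{row[key]}"
        for column, value in row.items():
            if column != key:
                triples.add((subject, f"{dataset}:{column}", value))
    return triples


def classify(pattern: Pattern, predicate_index: dict[str, str]) -> Optional[str]:
    """Classification type of a query portion: the repository holding its predicate.

    Unknown or wildcard predicates classify as None (unroutable in this sketch)."""
    return predicate_index.get(pattern[1])


def split_into_sub_queries(patterns: list[Pattern],
                           repositories: dict[str, set[Triple]]) -> dict:
    """Group query portions by classification type: one sub-query per repository."""
    index = {p: repo for repo, triples in repositories.items() for (_, p, _) in triples}
    sub_queries: dict[Optional[str], list[Pattern]] = {}
    for pattern in patterns:
        sub_queries.setdefault(classify(pattern, index), []).append(pattern)
    return sub_queries


def matches(triple: Triple, pattern: Pattern) -> bool:
    return all(want is None or have == want for have, want in zip(triple, pattern))


def execute(sub_queries: dict, repositories: dict[str, set[Triple]]) -> list[Triple]:
    """Run each sub-query against its repository and merge the matching triples."""
    results: set[Triple] = set()
    for repo, patterns in sub_queries.items():
        store = repositories.get(repo, set())  # unclassified portions match nothing
        for pattern in patterns:
            results.update(t for t in store if matches(t, pattern))
    return sorted(results)


sales = atomize("id,city,revenue\n1,Austin,10\n2,Boston,7\n", "sales", "id")
weather = atomize("city,temp\nAustin,30\n", "weather", "city")
# Second atomized dataset whose portions sit in two different repositories.
repositories = {"repo-a": sales, "repo-b": weather}
patterns = [(None, "sales:city", "Austin"), (None, "weather:temp", None)]
sub_queries = split_into_sub_queries(patterns, repositories)
results = execute(sub_queries, repositories)
```

Here the two query portions are classified into `repo-a` and `repo-b` respectively, so each repository only receives the part of the query it can answer, and the merged result holds one triple from each.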

Current Assignee
Data.World, Inc.
Original Assignee
Data.World, Inc.
Inventors
Jacob, Bryon Kristen; Griffith, David Lee; Le, Triet Minh; Keen, Arthur Albert; Zelenak, Alexander John; Loyens, Jon; Hurt, Brett A.; Reynolds, Shad William; Boutros, Joseph

Granted Patent

US 10,102,258 B2

CPC Class Codes

G06F 16/215 Improving data quality; Dat...

G06F 16/2471 Distributed queries
