Pluggable storage system for parallel query engines across non-native file systems

US 9,984,083 B1
Filed: 03/29/2013
Issued: 05/29/2018
Est. Priority Date: 02/25/2013
Status: Active Grant

First Claim

1. A method for managing data, comprising:

receiving, by one or more processors, a query from a client via one or more networks;

based on the received query, analyzing a catalog, which stores mappings of file names and file locations, for location information, wherein the catalog is associated with a universal namenode that provides a single namespace for accessing a plurality of files stored across a plurality of storage systems, and wherein the location information stored in connection with the catalog indicates a storage system on which a file is located among the plurality of storage systems;

based on the analysis, determining, by one or more processors, a first storage system of the plurality of storage systems, an associated first file system, an associated first protocol translator to use in connection with communication with the first storage system, a second storage system of the plurality of storage systems, an associated second file system, and an associated second protocol translator to use in connection with communication with the second storage system;

identifying, by one or more processors, a first data and a second data, wherein the first data is stored on the first storage system, and the second data is stored on the second storage system, and wherein a first portion of the query is performed on the first storage system and a second portion of the query is performed on the second storage system, wherein the first storage system is different from the second storage system, and wherein a first protocol used in connection with communication with the first storage system is different from a second protocol used in connection with communication with the second storage system;

running, by one or more processors, a first job on the first data using the associated first protocol translator, wherein the first job is not a native job of the first file system; and

running, by one or more processors, a second job on the second data using the associated second protocol translator, wherein the second job is not a native job of the second file system.
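
The catalog lookup at the core of claim 1 can be pictured with a minimal Python sketch. It assumes the catalog is a plain dictionary from file name to storage-system name, and that each registered storage system records its file system and the protocol its translator speaks. `StorageSystem`, `UniversalNamenode`, `add_file` and `plan` are hypothetical names chosen for illustration; the claim does not prescribe any implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StorageSystem:
    # Hypothetical descriptor: a backing store, its file system, and the
    # protocol its translator speaks (e.g. "hdfs" or "nfs").
    name: str
    file_system: str
    protocol: str


@dataclass
class UniversalNamenode:
    # Catalog: file name -> name of the storage system that holds the file.
    catalog: dict[str, str] = field(default_factory=dict)
    # Registered storage systems, keyed by name.
    systems: dict[str, StorageSystem] = field(default_factory=dict)

    def register(self, system: StorageSystem) -> None:
        self.systems[system.name] = system

    def add_file(self, path: str, system_name: str) -> None:
        if system_name not in self.systems:
            raise KeyError(f"unknown storage system: {system_name}")
        self.catalog[path] = system_name

    def plan(self, paths: list[str]) -> dict[StorageSystem, list[str]]:
        """Split a query's input files into one portion per storage system.

        The client names only paths in the single namespace; the catalog
        supplies the storage system, and with it the file system and the
        protocol translator to use for that portion of the query.
        """
        portions: dict[StorageSystem, list[str]] = {}
        for path in paths:
            system = self.systems[self.catalog[path]]  # KeyError if unknown
            portions.setdefault(system, []).append(path)
        return portions


if __name__ == "__main__":
    nn = UniversalNamenode()
    nn.register(StorageSystem("hdfs1", "HDFS", "hdfs"))
    nn.register(StorageSystem("nas1", "NFS", "nfs"))
    nn.add_file("/sales/2012.csv", "hdfs1")
    nn.add_file("/sales/2013.csv", "nas1")
    for system, files in nn.plan(["/sales/2012.csv", "/sales/2013.csv"]).items():
        print(system.name, system.protocol, files)
```

The query here touches two storage systems that speak different protocols, so the plan yields two portions, one per system, mirroring the "first portion" and "second portion" of the claim.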

9 Assignments

0 Petitions

Abstract

A method, article of manufacture, and apparatus for managing data. In some embodiments, this includes receiving a query from a client, based on the received query, analyzing a catalog for location information, based on the analysis, determining a first storage system, an associated first file system, an associated first protocol translator, a second storage system, an associated second file system, and an associated second protocol translator, identifying a first data and a second data, wherein the first data is stored on the first storage system, and the second data is stored on the second storage system, running a first job on the first data using the associated first protocol translator, wherein the first job is not a native job of the first file system, and running a second job on the second data using the associated second protocol translator, wherein the second job is not a native job of the second file system.
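
The abstract's key limitation is that each job is not native to the file system holding its data; the protocol translator bridges that gap. One way to picture this, sketched in Python under stated assumptions: a job written only against a common translator interface runs unchanged over two differently addressed stores. `ProtocolTranslator`, the two in-memory translators and `count_lines_job` are hypothetical stand-ins for real storage backends, not anything the patent defines.

```python
from abc import ABC, abstractmethod


class ProtocolTranslator(ABC):
    """Hypothetical common interface that hides a store's native protocol."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the contents of `path`, given in the single namespace."""


class ObjectStoreTranslator(ProtocolTranslator):
    # Stand-in for a store addressed by (bucket, key) pairs; an in-memory
    # dict replaces real network calls for illustration.
    def __init__(self, objects: dict[tuple[str, str], bytes]) -> None:
        self._objects = objects

    def read(self, path: str) -> bytes:
        # Translate "/bucket/key" into the store's own addressing scheme.
        bucket, _, key = path.lstrip("/").partition("/")
        return self._objects[(bucket, key)]


class FileTreeTranslator(ProtocolTranslator):
    # Stand-in for a hierarchical file system addressed by full path.
    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = files

    def read(self, path: str) -> bytes:
        return self._files[path]


def count_lines_job(translator: ProtocolTranslator, paths: list[str]) -> int:
    """A job written only against the translator interface, so it is not a
    native job of whichever file system actually holds the data."""
    return sum(translator.read(path).count(b"\n") for path in paths)


if __name__ == "__main__":
    first = ObjectStoreTranslator({("logs", "a.txt"): b"x\ny\n"})
    second = FileTreeTranslator({"/data/b.txt": b"z\n"})
    total = count_lines_job(first, ["/logs/a.txt"]) + count_lines_job(second, ["/data/b.txt"])
    print(total)  # 3
```

Because `count_lines_job` depends only on `read`, supporting another storage system in this sketch means writing one more translator rather than a new version of the job, which appears to be the "pluggable" property the title refers to.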

147 Citations

29 Claims

1. A method for managing data, comprising:
- receiving, by one or more processors, a query from a client via one or more networks;
  
  based on the received query, analyzing a catalog, which stores mappings of file names and file locations, for location information, wherein the catalog is associated with a universal namenode that provides a single namespace for accessing a plurality of files stored across a plurality of storage systems, and wherein the location information stored in connection with the catalog indicates a storage system on which a file is located among the plurality of storage systems;
  
  based on the analysis, determining, by one or more processors, a first storage system of the plurality of storage systems, an associated first file system, an associated first protocol translator to use in connection with communication with the first storage system, a second storage system of the plurality of storage systems, an associated second file system, and an associated second protocol translator to use in connection with communication with the second storage system;
  
  identifying, by one or more processors, a first data and a second data, wherein the first data is stored on the first storage system, and the second data is stored on the second storage system, and wherein a first portion of the query is performed on the first storage system and a second portion of the query is performed on the second storage system, wherein the first storage system is different from the second storage system, and wherein a first protocol used in connection with communication with the first storage system is different from a second protocol used in connection with communication with the second storage system;
  
  running, by one or more processors, a first job on the first data using the associated first protocol translator, wherein the first job is not a native job of the first file system; and
  
  running, by one or more processors, a second job on the second data using the associated second protocol translator, wherein the second job is not a native job of the second file system.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
  - 2. The method as recited in claim 1, wherein the associated first protocol translator is stored on the first storage system.
  - 3. The method as recited in claim 1, wherein the associated second protocol translator is stored on the second storage system.
  - 4. The method as recited in claim 1, further comprising running the first job on the second data.
  - 5. The method as recited in claim 4, further comprising running the second job on the first data.
  - 6. The method as recited in claim 5, wherein the second job is not a native job of the first file system.
  - 7. The method as recited in claim 4, wherein the first job is not a native job of the second file system.
  - 8. The method of claim 1, wherein the query queries the universal namenode that is associated with the plurality of storage systems.
  - 9. The method of claim 8, wherein the universal namenode serves as a domain that unifies respective domains of the plurality of storage systems, and wherein the query does not specify the respective domains of the corresponding ones of the plurality of storage systems associated with data relating to the query.
  - 10. The method of claim 9, wherein a response to the query is provided to the client, wherein the response to the query is presented as the single namespace corresponding to a namespace of the universal namenode.
  - 11. The method of claim 9, wherein a first file stored on the first storage system and a second file stored on the second storage system are identified as having a location in the single namespace in a manner in which a location of the first file on the first storage system and location of the second file on the second storage system are transparent to the client.
  - 12. The method of claim 1, wherein the first portion of the query includes running the first job on the first data, and wherein the second portion of the query includes running the second job on the second data.
  - 13. The method of claim 1, wherein the first storage system and the second storage system reside under the universal namenode.
  - 14. The method of claim 1, further comprising:
    - in the event that the file is moved from the first storage system to the second storage system, updating an entry in the catalog corresponding to the file to indicate a location of the file as being the second storage system.
  - 15. The method of claim 1, wherein the first protocol translator and the second protocol translator are used by the universal namenode to respectively communicate with the first storage system and the second storage system, and wherein the universal namenode is associated with the plurality of storage systems and is used in connection with processing the query.
  - 16. The method of claim 15, wherein the universal namenode tracks a status of the first job and the second job that are respectively associated with the query.

17. A system for managing data, comprising a processor configured to:
- receive a query from a client via one or more networks;
  
  based on the received query, analyze a catalog, which stores mappings of file names and file locations, for location information, wherein the catalog is associated with a universal namenode that provides a single namespace for accessing a plurality of files stored across a plurality of storage systems, and wherein the location information stored in connection with the catalog indicates a storage system on which a file is located among the plurality of storage systems;
  
  based on the analysis, determine a first storage system of the plurality of storage systems, an associated first file system, an associated first protocol translator to use in connection with communication with the first storage system, a second storage system of the plurality of storage systems, an associated second file system, and an associated second protocol translator to use in connection with communication with the second storage system;
  
  identify a first data and a second data, wherein the first data is stored on the first storage system, and the second data is stored on the second storage system, and wherein a first portion of the query is performed on the first storage system and a second portion of the query is performed on the second storage system, wherein the first storage system is different from the second storage system;
  
  run a first job on the first data using the associated first protocol translator, wherein the first job is not a native job of the first file system; and
  
  run a second job on the second data using the associated second protocol translator, wherein the second job is not a native job of the second file system.
- Dependent Claims (18, 19, 20, 21, 22, 23)
  - 18. The system as recited in claim 17, wherein the associated first protocol translator is stored on the first storage system.
  - 19. The system as recited in claim 17, wherein the associated second protocol translator is stored on the second storage system.
  - 20. The system as recited in claim 17, the processor further configured to run the first job on the second data.
  - 21. The system as recited in claim 18, the processor further configured to run the second job on the first data.
  - 22. The system as recited in claim 21, wherein the second job is not a native job of the first file system.
  - 23. The system as recited in claim 17, wherein the first job is not a native job of the second file system.

24. A computer program product for processing data, comprising a non-transitory computer readable medium having program instructions embodied therein for:
- receiving, by one or more processors, a query from a client via one or more networks;
  
  based on the received query, analyzing a catalog, which stores mappings of file names and file locations, for location information, wherein the catalog is associated with a universal namenode that provides a single namespace for accessing a plurality of files stored across a plurality of storage systems, and wherein the location information stored in connection with the catalog indicates a storage system on which a file is located among the plurality of storage systems;
  
  based on the analysis, determining, by one or more processors, a first storage system of the plurality of storage systems, an associated first file system, an associated first protocol translator to use in connection with communication with the first storage system, a second storage system of the plurality of storage systems, an associated second file system, and an associated second protocol translator to use in connection with communication with the second storage system;
  
  identifying, by one or more processors, a first data and a second data, wherein the first data is stored on the first storage system, and the second data is stored on the second storage system, and wherein a first portion of the query is performed on the first storage system and a second portion of the query is performed on the second storage system, wherein the first storage system is different from the second storage system, and wherein a first protocol used in connection with communication with the first storage system is different from a second protocol used in connection with communication with the second storage system;
  
  running, by one or more processors, a first job on the first data using the associated first protocol translator, wherein the first job is not a native job of the first file system; and
  
  running, by one or more processors, a second job on the second data using the associated second protocol translator, wherein the second job is not a native job of the second file system.
- Dependent Claims (25, 26, 27, 28, 29)
  - 25. The computer program product as recited in claim 24, wherein the associated first protocol translator is stored on the first storage system.
  - 26. The computer program product as recited in claim 24, wherein the associated second protocol translator is stored on the second storage system.
  - 27. The computer program product as recited in claim 24, further comprising instructions for running the first job on the second data.
  - 28. The computer program product as recited in claim 27, further comprising instructions for running the second job on the first data.
  - 29. The computer program product as recited in claim 27, wherein the first job is not a native job of the second file system.
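
Dependent claims 14 and 16 give the universal namenode two bookkeeping duties: keeping the catalog current when a file moves between storage systems, and tracking the status of the jobs a query spawns. A minimal Python sketch follows. `StatusTrackingNamenode`, `JobStatus` and the rule for rolling per-job states up into a single query status are illustrative assumptions, since the claims do not say how status is represented.

```python
from enum import Enum


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


class StatusTrackingNamenode:
    def __init__(self) -> None:
        # Catalog: file name -> storage system currently holding the file.
        self.catalog: dict[str, str] = {}
        # Per-query job table: query id -> job id -> status.
        self.jobs: dict[str, dict[str, JobStatus]] = {}

    def move_file(self, path: str, new_system: str) -> None:
        # Claim 14: once the file has been moved, repoint its catalog entry
        # so later queries resolve it on the new storage system.
        if path not in self.catalog:
            raise KeyError(path)
        self.catalog[path] = new_system

    def start_job(self, query_id: str, job_id: str) -> None:
        self.jobs.setdefault(query_id, {})[job_id] = JobStatus.RUNNING

    def finish_job(self, query_id: str, job_id: str, ok: bool = True) -> None:
        self.jobs[query_id][job_id] = JobStatus.DONE if ok else JobStatus.FAILED

    def query_status(self, query_id: str) -> JobStatus:
        # Claim 16 only says status is tracked; this roll-up rule is an
        # assumption: any failure fails the query, all done means done.
        statuses = list(self.jobs.get(query_id, {}).values())
        if not statuses:
            return JobStatus.PENDING
        if JobStatus.FAILED in statuses:
            return JobStatus.FAILED
        if all(s is JobStatus.DONE for s in statuses):
            return JobStatus.DONE
        return JobStatus.RUNNING


if __name__ == "__main__":
    nn = StatusTrackingNamenode()
    nn.catalog["/sales/2013.csv"] = "first"
    nn.move_file("/sales/2013.csv", "second")
    nn.start_job("q1", "first-job")
    nn.start_job("q1", "second-job")
    nn.finish_job("q1", "first-job")
    print(nn.catalog["/sales/2013.csv"], nn.query_status("q1").value)  # second running
```

Keeping job status per query, rather than per storage system, lets the namenode report on a query as a whole even though its portions run on different storage systems through different translators.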

Current Assignee: EMC IP Holding Company LLC (Dell Technologies Inc.)
Original Assignee: EMC IP Holding Company LLC (Dell Technologies Inc.)
Inventors: Tiwari, Sameer; Bhandarkar, Milind Arun; Mogal, Bhooshan Deepak
Primary Examiner(s): Burke, Jeffrey A
Assistant Examiner(s): Vu, Thong

Application Number: US13/853,479
Time in Patent Office: 1,887 days
Field of Search: 707/722, 707/794, 707/737, 707/644, 707/651, 707/966, 707/791, 707/721, 707/723, 707/769, 707/765, 707/741, 707/707, 707/9, 707/3, 707/693, 707/781, 709/214, 709/220, 709/230, 711/209

CPC Class Codes

G06F 16/10   File systems; File servers

G06F 16/148   File search processing

G06F 16/182   Distributed file systems

G06F 16/24524   Access plan code generation...

G06F 16/24534   Query rewriting; Transforma...
