Outputting map-reduce jobs to an archive file

US 10,146,779 B2
Filed: 06/26/2015
Issued: 12/04/2018
Est. Priority Date: 09/10/2014
Status: Expired due to Fees

Abstract

A method and system are provided for writing output from map-reduce jobs to an archive file. The method may include providing an archive manager and exposing an interface to be called from map-reduce jobs to output to an archive file in a map-reduce distributed file system. The method may also include using a buffering database as a temporary cache to buffer updates to the archive file. The archive manager's handling of calls from map-reduce jobs may allow: reading directly from an archive file or from a job index at the buffering database; writing to a job index at the buffering database used as a temporary cache to buffer updates; and serializing updates from the buffering database to the archive file.
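
The buffer-then-serialize flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: a plain dictionary stands in for the buffering database, an in-memory `io.BytesIO` stands in for the archive file in the distributed file system, and the names `job_index`, `buffer_update`, and `serialize_to_archive` are hypothetical.

```python
import io
import zipfile

# Hypothetical stand-in for the buffering database: entry name -> bytes.
job_index: dict[str, bytes] = {}


def buffer_update(entry_name: str, data: bytes) -> None:
    """Buffer one update instead of touching the archive file directly."""
    job_index[entry_name] = data


def serialize_to_archive(archive: io.BytesIO) -> None:
    """Flush all buffered updates into a single zip-formatted archive."""
    with zipfile.ZipFile(archive, mode="w") as zf:
        for name in sorted(job_index):
            zf.writestr(name, job_index[name])
    job_index.clear()  # the cache is only temporary


archive = io.BytesIO()  # assumption: stands in for the archive file
buffer_update("part-00000.txt", b"alpha\n")
buffer_update("part-00001.txt", b"beta\n")
serialize_to_archive(archive)

with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
    first = zf.read("part-00000.txt")
```

Buffering first means callers never hold the zip file open themselves; only the serialization step writes to it.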

18 Claims

1. A processor-implemented method for outputting map-reduce jobs to an archive file, comprising:
- providing, by a processor, an archive manager and exposing an interface to be called from map-reduce jobs to output to the archive file in a map-reduce distributed file system;
- using a buffering database as a temporary cache to buffer updates to the archive file;
- handling by the archive manager calls from map-reduce jobs to allow:
  - reading directly from the archive file or from a job index in the buffering database; and
  - writing to the job index in the buffering database used as a temporary cache to buffer the updates;
- outputting the updates from the job index to the archive file, wherein the archive file is a single, zip formatted file, and the updates are concurrently written to the archive file from a plurality of map-reduced tasks running within a single map-reduced job while the single map-reduced job is running; and
- wherein handling by the archive manager calls from map-reduce jobs further comprises:
  - receiving a write call for a task of a map-reduce job;
  - connecting to the buffering database;
  - looking up a unique token for a map-reduce job at a pending index provided at the buffering database; and
  - writing to the job index provided at the buffering database.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method as claimed in claim 1, wherein handling by the archive manager calls from map-reduce jobs further comprises:
    - receiving a read call for a task of a map-reduce job;
    - connecting to the buffering database;
    - looking up a unique token for the map-reduce job at a pending index and a committed index provided at the buffering database; and
    - depending on the status of the job, either reading from the archive file or reading from the job index provided at the buffering database.
  - 3. The method as claimed in claim 1, wherein the archive manager manages access to the archive file, further comprises:
    - allowing only one map-reduce job to open the archive file for writing the updates at a time and committing the updates on completion of a job; and
    - allowing reading from the archive file by all jobs but without allowing reading of uncommitted writes.
  - 4. The method as claimed in claim 1, further comprising:
    - maintaining a pending index at the buffering database to be accessed by a map-reduce job, wherein the pending index includes keys of archive file paths and values of unique tokens, wherein a unique token is allocated to a map-reduce job that has opened the archive file for writing; and
    - the pending index including entries for archive files containing uncommitted updates buffered in the buffering database.
  - 5. The method as claimed in claim 1, further comprising:
    - maintaining a committed index at the buffering database to be accessed by a map-reduce job, wherein the committed index includes keys of archive file paths and values of unique tokens, wherein a unique token is allocated to a map-reduce job that has opened the archive file for writing; and
    - the committed index including entries for archive files for which updates have been committed but not yet serialized to the archive file.
  - 6. The method as claimed in claim 1, further comprising:
    - serializing any committed updates buffered in the buffering database to the archive file, including mapping an archive file path name and the job index containing updates to the archive file.
  - 7. The method as claimed in claim 1, wherein handling calls from map-reduce jobs by the archive manager includes a map-reduce job for opening the archive file for writing further comprises:
    - connecting to the buffering database;
    - creating a new unique token for the job and associating it with a path to the archive file; and
    - creating a job index at the buffering database for the archive file to buffer updates to the archive file.
  - 8. The method as claimed in claim 1, wherein handling calls from map-reduce jobs by the archive manager includes a map-reduce job for committing changes to the archive file further comprises:
    - connecting to the buffering database;
    - creating a serializing job to serialize updates buffered in the job index at the buffering database, to the archive file; and
    - moving an entry for the archive path and unique job token to a committed index at the buffering database.
  - 9. The method as claimed in claim 1, wherein handling calls from map-reduce jobs by the archive manager includes a map-reduce job for rollback of changes to an archive file further comprises:
    - connecting to the buffering database; and
    - removing an entry for the archive path and unique job token from a pending index at the buffering database.
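
The open, write, commit, and rollback steps recited in claims 1, 3, 4, 7, 8, and 9 can be sketched as a small state machine over the pending and committed indexes. This is an illustrative sketch only: the class name `ArchiveManager`, its method names, and the use of in-memory dictionaries in place of a real buffering database are all assumptions, and the serializing job of claim 8 is omitted. Discarding the job index on rollback is also an assumption; claim 9 only recites removing the pending entry.

```python
import uuid


class ArchiveManager:
    """Toy archive manager; plain dicts stand in for the buffering database."""

    def __init__(self) -> None:
        self.pending: dict[str, str] = {}    # archive path -> unique token
        self.committed: dict[str, str] = {}  # archive path -> unique token
        self.job_indexes: dict[str, dict[str, bytes]] = {}  # token -> updates

    def open_for_write(self, archive_path: str) -> str:
        # Only one job may hold the archive open for writing at a time.
        if archive_path in self.pending:
            raise RuntimeError(f"{archive_path} is already open for writing")
        token = uuid.uuid4().hex
        self.pending[archive_path] = token
        self.job_indexes[token] = {}  # job index that buffers this job's updates
        return token

    def write(self, archive_path: str, entry: str, data: bytes) -> None:
        token = self.pending[archive_path]  # look up the token in the pending index
        self.job_indexes[token][entry] = data

    def commit(self, archive_path: str) -> str:
        # Move the (path, token) entry from the pending to the committed index.
        token = self.pending.pop(archive_path)
        self.committed[archive_path] = token
        return token

    def rollback(self, archive_path: str) -> None:
        # Remove the pending entry; dropping the buffered updates is an assumption.
        token = self.pending.pop(archive_path)
        self.job_indexes.pop(token, None)


manager = ArchiveManager()
manager.open_for_write("/data/out.zip")
manager.write("/data/out.zip", "part-00000", b"rows")
token = manager.commit("/data/out.zip")
```

Keeping the token as the only link between an archive path and its buffered updates lets commit and rollback be single index operations.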

10. A computer system for outputting map-reduce jobs to an archive file, the computer system comprising:
- one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising:
  - an archive manager including an interface to be called from map-reduce jobs to output to the archive file in a map-reduce distributed file system;
  - a buffering database providing a temporary cache to buffer updates to the archive file;
  - wherein the archive manager handles calls from map-reduce jobs to:
    - read directly from the archive file or from a job index at the buffering database;
    - write to the job index at the buffering database used as a temporary cache to buffer the updates;
  - outputting the updates from the job index to the archive file, wherein the archive file is a single, zip formatted file, and the updates are concurrently written to the archive file from a plurality of map-reduced tasks running within a single map-reduced job while the single map-reduced job is running; and
  - wherein handling by the archive manager calls from map-reduce jobs further comprises:
    - receiving a write call for a task of a map-reduce job;
    - connecting to the buffering database;
    - looking up a unique token for a map-reduce job at a pending index provided at the buffering database; and
    - writing to the job index provided at the buffering database.
- Dependent claims (11, 12, 13, 14):
  - 11. The system as claimed in claim 10, wherein the buffering database includes the job index to buffer updates to the archive file for a job, and wherein the name of the job index is a unique token of an updating job.
  - 12. The system as claimed in claim 10, further comprising:
    - maintaining a pending index at the buffering database to be accessed by a map-reduce job, wherein the pending index includes keys of archive file paths and values of unique tokens, wherein a unique token is allocated to a map-reduce job that has opened the archive file for writing; and
    - the pending index including entries for archive files containing uncommitted updates buffered in the buffering database.
  - 13. The system as claimed in claim 10, further comprising:
    - maintaining a committed index at the buffering database to be accessed by a map-reduce job, wherein the committed index includes keys of archive file paths and values of unique tokens, wherein a unique token is allocated to a map-reduce job that has opened the archive file for writing; and
    - the committed index including entries for archive files for which updates have been committed but not yet serialized to the archive file.
  - 14. The system as claimed in claim 10, further comprising:
    - serializing any committed updates buffered in the buffering database to the archive file, including mapping an archive file path name and the job index containing updates to the archive file.
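
One way to picture claims 10, 11, and 14 together is several tasks of one job buffering output concurrently into a job index named by the job's token, followed by a serialization step that maps the archive path to that job index and writes a single zip file. The sketch below makes simplifying assumptions: threads stand in for map-reduce tasks, dictionaries stand in for the buffering database, a local temporary file stands in for the distributed file system, and the names `task_write`, `serialize`, and the token value are hypothetical. Here the tasks write concurrently to the job index and a single serializer writes the zip file, which is only one possible reading of the claim language.

```python
import os
import tempfile
import threading
import zipfile
from concurrent.futures import ThreadPoolExecutor

token = "job-token-0001"   # hypothetical unique token of the updating job
job_indexes = {token: {}}  # the job index is named by the token (claim 11)
committed = {}             # committed index: archive path -> token
lock = threading.Lock()


def task_write(task_id: int) -> None:
    """One task buffering its output in the job's shared job index."""
    with lock:
        job_indexes[token][f"part-{task_id:05d}"] = f"task {task_id}\n".encode()


def serialize(archive_path: str) -> None:
    """Map the archive path to its job index and copy the updates into the zip."""
    tok = committed.pop(archive_path)
    updates = job_indexes.pop(tok)
    # Mode "a" appends to an existing archive or creates it if it is missing.
    with zipfile.ZipFile(archive_path, mode="a") as zf:
        for name in sorted(updates):
            zf.writestr(name, updates[name])


archive_path = os.path.join(tempfile.mkdtemp(), "output.zip")
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(task_write, range(4)))  # four tasks write concurrently

committed[archive_path] = token  # the job has committed its updates
serialize(archive_path)

with zipfile.ZipFile(archive_path) as zf:
    archived = zf.namelist()
    sample = zf.read("part-00002")
```

Funnelling all zip writes through one serializer avoids concurrent writers corrupting the archive's central directory, which is the practical reason for buffering in the first place.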

15. A computer program stored on a non-transitory computer readable medium and loadable into the internal memory of a digital computer, comprising software code portions, when said program is run on a computer, for performing a method for outputting map-reduce jobs to an archive file comprising:
- program instructions to provide an archive manager and expose an interface to be called from map-reduce jobs to output to the archive file in a map-reduce distributed file system;
- program instructions to use a buffering database as a temporary cache to buffer updates to the archive file;
- program instructions to handle by the archive manager calls from map-reduce jobs to allow:
  - program instructions to read directly from the archive file or from a job index in the buffering database; and
  - program instructions to write to the job index in the buffering database used as a temporary cache to buffer the updates;
- program instructions to output the updates from the job index to the archive file, wherein the archive file is a single, zip formatted file, and the updates are concurrently written to the archive file from a plurality of map-reduced tasks running within a single map-reduced job while the single map-reduced job is running; and
- wherein handling by the archive manager calls from map-reduce jobs further comprises:
  - program instructions to receive a write call for a task of a map-reduce job;
  - program instructions to connect to the buffering database;
  - program instructions to look up a unique token for a map-reduce job at a pending index provided at the buffering database; and
  - program instructions to write to the job index provided at the buffering database.
- Dependent claims (16, 17, 18):
  - 16. The computer program as claimed in claim 15, wherein handling by the archive manager calls from map-reduce jobs further comprises:
    - program instructions to receive a read call for a task of a map-reduce job;
    - program instructions to connect to the buffering database;
    - program instructions to look up a unique token for the map-reduce job at a pending index and a committed index provided at the buffering database; and
    - depending on the status of the job, program instructions to either read from the archive file or read from the job index provided at the buffering database.
  - 17. The computer program as claimed in claim 15, wherein the archive manager manages access to the archive file, further comprises:
    - program instructions to allow only one map-reduce job to open the archive file for writing the updates at a time and committing the updates on completion of a job; and
    - program instructions to allow reading from the archive file by all jobs but without allowing reading of uncommitted writes.
  - 18. The computer program as claimed in claim 15, further comprising:
    - program instructions to maintain a pending index at the buffering database to be accessed by a map-reduce job, wherein the pending index includes keys of archive file paths and values of unique tokens, wherein a unique token is allocated to a map-reduce job that has opened the archive file for writing; and
    - the pending index including entries for archive files containing uncommitted updates buffered in the buffering database.
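
The read path of claims 16 and 17 can be sketched as a lookup that consults the committed index first and otherwise falls back to the archive file, so that uncommitted writes are never returned. The sketch is an assumption-laden illustration: the dictionaries `archives`, `pending`, `committed`, and `job_indexes`, the helper `make_zip`, and the function `read_entry` are hypothetical names, and in-memory zip files stand in for archives in the distributed file system.

```python
import io
import zipfile


def make_zip(entries: dict[str, bytes]) -> io.BytesIO:
    """Build an in-memory zip archive (stand-in for a stored archive file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w") as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf


archives = {
    "/data/a.zip": make_zip({"x.txt": b"old"}),
    "/data/b.zip": make_zip({"y.txt": b"stable"}),
}
committed = {"/data/a.zip": "t1"}  # committed, not yet serialized
pending = {"/data/b.zip": "t2"}    # open for writing, uncommitted
job_indexes = {"t1": {"x.txt": b"new"}, "t2": {"y.txt": b"draft"}}


def read_entry(archive_path: str, entry: str) -> tuple[str, bytes]:
    """Return (source, data), choosing the source from the job's status."""
    token = committed.get(archive_path)
    if token is not None and entry in job_indexes.get(token, {}):
        # Committed but not yet serialized: serve from the job index.
        return ("job index", job_indexes[token][entry])
    # A token found only in the pending index marks uncommitted writes,
    # which readers must not see, so the archive file is read instead.
    with zipfile.ZipFile(archives[archive_path]) as zf:
        return ("archive", zf.read(entry))


result_committed = read_entry("/data/a.zip", "x.txt")
result_pending = read_entry("/data/b.zip", "y.txt")
```

Whether a writing job may read back its own uncommitted updates is not settled by the claims quoted here; this sketch simply hides all pending updates from every reader.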

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Browning, Curtis N.; McCarroll, Niall F.
Primary Examiner(s): Leroux, Etienne P
Assistant Examiner(s): Glasser, Dara J
Application Number: US14/752,137
Publication Number: US 20160070711A1
Time in Patent Office: 1,257 Days

CPC Class Codes
- G06F 16/113   Details of archiving lifecy...
- G06F 16/182   Distributed file systems
- G06F 16/1865   Transactional file systems
