Outputting map-reduce jobs to an archive file
Abstract
A method and system are provided for writing output from map-reduce jobs to an archive file. The method may include providing an archive manager and exposing an interface to be called from map-reduce jobs to output to an archive file in a map-reduce distributed file system. The method may also include using a buffering database as a temporary cache to buffer updates to the archive file. The archive manager may handle calls from map-reduce jobs to allow: reading directly from an archive file or from a job index at the buffering database; writing to a job index at the buffering database, which is used as a temporary cache to buffer updates; and serializing updates from the buffering database to the archive file.
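The abstract splits the archive manager's work into three operations: reads served from the archive or from the job index, writes that go only to the buffering database, and serialization of buffered updates into the archive. The following is a minimal Python sketch of that split, not the patented implementation. It assumes SQLite (through the standard `sqlite3` module) as the buffering database and a local zip file standing in for a file on the distributed file system; the class `ArchiveManager`, its methods, and the `job_index` schema are hypothetical.

```python
import sqlite3
import zipfile


class ArchiveManager:
    """Hypothetical archive manager: buffers updates in SQLite, serializes them to one zip."""

    def __init__(self, archive_path: str, db_path: str) -> None:
        self.archive_path = archive_path
        self.db_path = db_path
        # Job index (assumed schema): updates buffered per job until serialized.
        self._execute(
            "CREATE TABLE IF NOT EXISTS job_index ("
            "job_token TEXT, name TEXT, data BLOB, PRIMARY KEY (job_token, name))"
        )

    def _execute(self, sql: str, params: tuple = ()) -> list:
        # Open a short-lived connection per call, commit, and always close it.
        db = sqlite3.connect(self.db_path)
        try:
            rows = db.execute(sql, params).fetchall()
            db.commit()
            return rows
        finally:
            db.close()

    def write(self, job_token: str, name: str, data: bytes) -> None:
        # Writes go only to the buffering database, never straight to the zip.
        self._execute(
            "INSERT OR REPLACE INTO job_index VALUES (?, ?, ?)", (job_token, name, data)
        )

    def read(self, job_token: str, name: str) -> bytes:
        # Serve a buffered (not yet serialized) copy first, else read the archive directly.
        rows = self._execute(
            "SELECT data FROM job_index WHERE job_token = ? AND name = ?", (job_token, name)
        )
        if rows:
            return rows[0][0]
        with zipfile.ZipFile(self.archive_path) as zf:
            return zf.read(name)

    def serialize(self, job_token: str) -> int:
        # Append every buffered entry for the job to the single zip, then clear the buffer.
        rows = self._execute(
            "SELECT name, data FROM job_index WHERE job_token = ? ORDER BY name", (job_token,)
        )
        with zipfile.ZipFile(self.archive_path, "a") as zf:
            for name, data in rows:
                zf.writestr(name, data)
        self._execute("DELETE FROM job_index WHERE job_token = ?", (job_token,))
        return len(rows)
```

Checking the buffer before the archive lets a caller see its own pending updates before they are serialized; that read order is a choice made for this sketch, not something the abstract specifies. Serializing a name that is already in the archive would add a duplicate zip entry (Python's `zipfile` warns, and later reads return the newest entry), so a real system would need an explicit policy for updates to existing entries.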
18 Claims
1. A processor-implemented method for outputting map-reduce jobs to an archive file, comprising:
providing, by a processor, an archive manager and exposing an interface to be called from map-reduce jobs to output to the archive file in a map-reduce distributed file system;
using a buffering database as a temporary cache to buffer updates to the archive file;
handling by the archive manager calls from map-reduce jobs to allow:
reading directly from the archive file or from a job index in the buffering database; and
writing to the job index in the buffering database used as a temporary cache to buffer the updates;
outputting the updates from the job index to the archive file, wherein the archive file is a single, zip formatted file, and the updates are concurrently written to the archive file from a plurality of map-reduce tasks running within a single map-reduce job while the single map-reduce job is running; and
wherein handling by the archive manager calls from map-reduce jobs further comprises:
receiving a write call for a task of a map-reduce job;
connecting to the buffering database;
looking up a unique token for a map-reduce job at a pending index provided at the buffering database; and
writing to the job index provided at the buffering database.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
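The last four elements of claim 1 describe a write path: receive a write call for a task, connect to the buffering database, look up the job's unique token in the pending index, and write to the job index. The sketch below walks through those steps with Python's `sqlite3`. The claim does not say how the token is issued or what either index stores, so `register_job`, the `uuid4`-based token, and the `pending_index` and `job_index` tables are assumptions made for illustration.

```python
import sqlite3
import uuid

# Assumed schema; the claim names the two indexes but not their contents.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pending_index (job_id TEXT PRIMARY KEY, token TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS job_index (
    token TEXT, task_id TEXT, name TEXT, data BLOB, PRIMARY KEY (token, name)
);
"""


def init_buffer(db_path: str) -> None:
    db = sqlite3.connect(db_path)
    try:
        db.executescript(SCHEMA)
    finally:
        db.close()


def register_job(db_path: str, job_id: str) -> str:
    """Hypothetical: issue a unique token for a running job and record it as pending."""
    token = uuid.uuid4().hex
    db = sqlite3.connect(db_path)
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO pending_index VALUES (?, ?)", (job_id, token))
    finally:
        db.close()
    return token


def handle_write_call(db_path: str, job_id: str, task_id: str, name: str, data: bytes) -> str:
    # 1. A write call arrives for one task of a map-reduce job (this function's arguments).
    # 2. Connect to the buffering database.
    db = sqlite3.connect(db_path)
    try:
        # 3. Look up the job's unique token in the pending index.
        row = db.execute(
            "SELECT token FROM pending_index WHERE job_id = ?", (job_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no pending token for job {job_id!r}")
        token = row[0]
        # 4. Write the update to the job index under that token.
        with db:
            db.execute(
                "INSERT OR REPLACE INTO job_index VALUES (?, ?, ?, ?)",
                (token, task_id, name, data),
            )
        return token
    finally:
        db.close()
```

Keying job-index rows by the token rather than by the job name is one way to keep buffered updates from separate runs of the same job apart; the claim itself only requires the lookup.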
10. A computer system for outputting map-reduce jobs to an archive file, the computer system comprising:
one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising:
an archive manager including an interface to be called from map-reduce jobs to output to the archive file in a map-reduce distributed file system;
a buffering database providing a temporary cache to buffer updates to the archive file;
wherein the archive manager handles calls from map-reduce jobs to:
read directly from the archive file or from a job index at the buffering database;
write to the job index at the buffering database used as a temporary cache to buffer the updates;
outputting the updates from the job index to the archive file, wherein the archive file is a single, zip formatted file, and the updates are concurrently written to the archive file from a plurality of map-reduce tasks running within a single map-reduce job while the single map-reduce job is running; and
wherein handling by the archive manager calls from map-reduce jobs further comprises:
receiving a write call for a task of a map-reduce job;
connecting to the buffering database;
looking up a unique token for a map-reduce job at a pending index provided at the buffering database; and
writing to the job index provided at the buffering database.
Dependent claims: 11, 12, 13, 14.
15. A computer program stored on a non-transitory computer readable medium and loadable into the internal memory of a digital computer, comprising software code portions, when said program is run on a computer, for performing a method for outputting map-reduce jobs to an archive file comprising:
program instructions to provide an archive manager and expose an interface to be called from map-reduce jobs to output to the archive file in a map-reduce distributed file system;
program instructions to use a buffering database as a temporary cache to buffer updates to the archive file;
program instructions to handle by the archive manager calls from map-reduce jobs to allow:
program instructions to read directly from the archive file or from a job index in the buffering database; and
program instructions to write to the job index in the buffering database used as a temporary cache to buffer the updates;
program instructions to output the updates from the job index to the archive file, wherein the archive file is a single, zip formatted file, and the updates are concurrently written to the archive file from a plurality of map-reduce tasks running within a single map-reduce job while the single map-reduce job is running; and
wherein handling by the archive manager calls from map-reduce jobs further comprises:
program instructions to receive a write call for a task of a map-reduce job;
program instructions to connect to the buffering database;
program instructions to look up a unique token for a map-reduce job at a pending index provided at the buffering database; and
program instructions to write to the job index provided at the buffering database.
Dependent claims: 16, 17, 18.
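Each independent claim requires that a plurality of tasks within one running job write updates concurrently, yet the output is a single zip file. A zip archive has one central directory, so the sketch below assumes one plausible arrangement: tasks write concurrently into the buffering database, and a single flusher appends buffered entries to the zip, both while tasks are running and once they finish. Threads stand in for map-reduce tasks on separate nodes, a local file stands in for the distributed file system, and `run_task`, `flush_to_archive`, `run_job` and the `job_index` table are hypothetical names.

```python
import sqlite3
import zipfile
from concurrent.futures import ThreadPoolExecutor


def init_buffer(db_path: str) -> None:
    # Create the hypothetical job index once, before any task starts writing.
    db = sqlite3.connect(db_path)
    try:
        db.execute(
            "CREATE TABLE IF NOT EXISTS job_index ("
            "token TEXT, name TEXT, data BLOB, PRIMARY KEY (token, name))"
        )
        db.commit()
    finally:
        db.close()


def run_task(db_path: str, token: str, task_id: int) -> None:
    # Stand-in for one task: it writes its output part to the buffer, not the zip.
    db = sqlite3.connect(db_path, timeout=30)  # wait on other writers' locks
    try:
        with db:
            db.execute(
                "INSERT INTO job_index VALUES (?, ?, ?)",
                (token, f"part-{task_id:05d}", f"output of task {task_id}\n".encode()),
            )
    finally:
        db.close()


def flush_to_archive(db_path: str, token: str, archive_path: str) -> int:
    # Single writer: move the currently buffered entries into the one zip file.
    db = sqlite3.connect(db_path, timeout=30)
    try:
        rows = db.execute(
            "SELECT name, data FROM job_index WHERE token = ? ORDER BY name", (token,)
        ).fetchall()
        with zipfile.ZipFile(archive_path, "a") as zf:
            for name, data in rows:
                zf.writestr(name, data)
        # Delete only the rows just archived; rows inserted meanwhile stay buffered.
        with db:
            db.executemany(
                "DELETE FROM job_index WHERE token = ? AND name = ?",
                [(token, name) for name, _ in rows],
            )
    finally:
        db.close()
    return len(rows)


def run_job(db_path: str, archive_path: str, token: str, n_tasks: int) -> int:
    init_buffer(db_path)
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_task, db_path, token, i) for i in range(n_tasks)]
        archived = flush_to_archive(db_path, token, archive_path)  # tasks may still be running
        for future in futures:
            future.result()  # re-raise any task failure
    archived += flush_to_archive(db_path, token, archive_path)  # pick up whatever is left
    return archived
```

How many entries the first flush picks up depends on thread timing, but because each flush deletes exactly the rows it archived, every task's output lands in the zip exactly once.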
Specification