Systems and methods for handling data
First Claim
1. A method for handling a plurality of files, the method comprising:
receiving a service request comprising a filter to exclude a portion of the plurality of files from processing;
enumerating at least a portion of the files not excluded by the filter using at least one crawler of a set of crawlers, the set of crawlers including a first crawler and a second crawler, wherein enumerating includes using the first crawler or the second crawler to determine a number of files in the portion of the files not excluded by the filter and to determine an amount of processing work to process the number of files in the portion of the files not excluded by the filter;
identifying a first set of files of the portion of the files not excluded by the filter;
excluding the first set of files from a second set of files, the second set of files to be processed, the second set of files including a first batch of the files and a second batch of the files;
submitting a first set of file identifiers associated with the first batch to a first queue;
spawning a first set of service providers in a first set of nodes according to first workload associated with the first queue;
processing, using the first set of service providers, the first batch of the portion of the files not excluded by the filter;
submitting a second set of file identifiers associated with the second batch to at least one of the first queue and a second queue;
spawning a second set of service providers in a second set of nodes according to second workload associated with the at least one of the first queue and the second queue; and
processing, using the second set of service providers, the second batch of the portion of the files not excluded by the filter.
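The steps recited in claim 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `FileEntry`, `crawl`, `split_batches`, and `providers_needed` are hypothetical, processing work is assumed to be proportional to file size, and "spawning service providers" is reduced to computing how many workers a queue's workload would call for.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    # Hypothetical record for one enumerated file.
    path: str
    size_bytes: int


def crawl(files, is_excluded):
    """Enumerate files not excluded by the filter.

    Returns the kept entries, their count, and an estimate of the
    processing work (assumption: work is proportional to total bytes).
    """
    kept = [f for f in files if not is_excluded(f)]
    work = sum(f.size_bytes for f in kept)
    return kept, len(kept), work


def split_batches(entries, skip_paths, batch_size):
    """Exclude an identified first set of files, then batch the remainder."""
    remaining = [f for f in entries if f.path not in skip_paths]
    return [remaining[i:i + batch_size]
            for i in range(0, len(remaining), batch_size)]


def providers_needed(queue, files_per_provider):
    """Size a set of service providers from the workload in a queue."""
    return math.ceil(len(queue) / files_per_provider)


files = [
    FileEntry("a.txt", 100),
    FileEntry("b.tmp", 50),
    FileEntry("c.txt", 200),
    FileEntry("d.txt", 300),
    FileEntry("e.txt", 400),
]

# The service request carries a filter; here it excludes temporary files.
kept, count, work = crawl(files, lambda f: f.path.endswith(".tmp"))

# "c.txt" stands in for the identified first set that is left out of processing.
batches = split_batches(kept, skip_paths={"c.txt"}, batch_size=2)

# File identifiers (paths here) go to queues, one queue per batch in this sketch.
first_queue = deque(f.path for f in batches[0])
second_queue = deque(f.path for f in batches[1])

first_providers = providers_needed(first_queue, files_per_provider=1)
second_providers = providers_needed(second_queue, files_per_provider=1)
```

In this sketch each batch gets its own queue; the claim also allows the second batch's identifiers to be submitted to the first queue instead.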
Abstract
A method for handling files to timely provide reports concerning the files is disclosed. The method may include crawling (or enumerating) the files, to figure out how many files/data are to be processed and/or how much processing work is to be performed. The method may also include processing the files in batches. Identification information (e.g., filenames, file paths, and/or object identifiers) pertaining to the files may be sent to one or more queues for batch processing of the files. The method may further include generating a report after processing of a batch among the batches is completed. The report may be generated before subsequent processing of a subsequent batch is completed.
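The reporting behavior described in the abstract, where a report for one batch is available before the next batch finishes, can be illustrated with a Python generator. This is a hedged sketch only: `process_in_batches`, `process_file`, and the report dictionary layout are hypothetical names chosen for illustration, and batches are processed one after another rather than on separate nodes.

```python
def process_in_batches(batches, process_file):
    """Process batches in order, yielding a report as soon as each one finishes.

    Because this is a generator, the caller receives the report for batch N
    before any file in batch N+1 has been processed.
    """
    for index, batch in enumerate(batches, start=1):
        results = [process_file(path) for path in batch]
        yield {"batch": index, "files": len(batch), "results": results}


processed = []


def record_length(path):
    # Stand-in for real processing work: log the path, return its length.
    processed.append(path)
    return len(path)


reports = process_in_batches([["a", "bb"], ["ccc"]], record_length)
first_report = next(reports)      # report for the first batch
seen_after_first = list(processed)  # only first-batch files processed so far
second_report = next(reports)     # second batch is processed only now
```

The design choice here is laziness: the consumer can act on `first_report` while the second batch is still pending, which is the timeliness property the abstract emphasizes.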
71 Claims
38. A computer program product, comprising a computer usable medium having a computer readable program code embodied therein, the computer readable program code adapted to be executed to implement a method for handling a plurality of files, the method comprising:
a set of queues including at least a first queue and a second queue;
a job manager configured to receive a service request comprising a filter to exclude a portion of the files from being processed;
a set of crawlers configured to:
enumerate at least a portion of the files not excluded by the filter, the set of crawlers including at least a first crawler and a second crawler, the set of crawlers further configured to determine a number of files in the portion of the files not excluded by the filter and to determine an amount of processing work to process the number of files in the portion of the files not excluded by the filter;
identifying a first set of files of the portion of the files not excluded by the filter;
excluding the first set of files from a second set of files, the second set of files to be processed, the second set of files including a first batch of the files and a second batch of the files;
submit a first set of file identifiers associated with the first batch of the files to the first queue; and
submit a second set of file identifiers associated with the second batch of the files to at least one of the first queue and the second queue;
a first set of nodes configured to spawn a first set of service providers according to first workload associated with the first queue, the first set of service providers configured to process the first batch of the files; and
a second set of nodes configured to spawn a second set of service providers according to second workload associated with the at least one of the first queue and the second queue, the second set of service providers configured to process the second batch of the files.
Specification