Repartitioning parallel SVM computations using dynamic timeout
First Claim
1. A method for reducing execution time of a parallel support vector machine (SVM) application, comprising:
partitioning an input data set into chunks of data;
distributing the partitioned chunks of data across a plurality of available computing nodes;
executing the parallel SVM application on the chunks of data in parallel across the plurality of available computing nodes;
computing a mean of completion times for a portion of the plurality of available computing nodes that have completed processing their respective chunks of data;
setting a first timeout period equal to a constant factor times the mean of the completion times minus a current elapsed time;
determining if the first timeout period has been exceeded before all of the plurality of available computing nodes have finished processing their respective chunks of data; and
if so, repartitioning the input data set into chunks of data that are different from the partitioned chunks of data;
redistributing the repartitioned chunks of data across some or all of the plurality of available computing nodes; and
executing the parallel SVM application on the repartitioned chunks of data in parallel across some or all of the available computing nodes.
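The timeout rule in the claim (a constant factor times the mean completion time of the nodes that have already finished, minus the current elapsed time) can be sketched as a small function. The function name and the factor value 1.5 are illustrative assumptions; the claim requires only some constant factor:

```python
def dynamic_timeout(completion_times, elapsed, factor=1.5):
    """First timeout period = factor * mean(completion times of the
    finished nodes) - current elapsed time."""
    mean_completion = sum(completion_times) / len(completion_times)
    return factor * mean_completion - elapsed
```

For example, if three nodes finished in 10, 12, and 14 seconds and 5 seconds have elapsed, the remaining timeout would be 1.5 × 12 − 5 = 13 seconds.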
Abstract
A system that reduces execution time of a parallel SVM application. During operation, the system partitions an input data set into chunks of data. Next, the system distributes the partitioned chunks of data across a plurality of available computing nodes and executes the parallel SVM application on the chunks of data in parallel across the plurality of available computing nodes. The system then determines if a first timeout period has been exceeded before all of the plurality of available computing nodes have finished processing their respective chunks of data. If so, the system (1) repartitions the input data set into different chunks of data; (2) redistributes the repartitioned chunks of data across some or all of the plurality of available computing nodes; and (3) executes the parallel SVM application on the repartitioned chunks of data in parallel across some or all of the available computing nodes.
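The partition-and-repartition step described above can be illustrated with a minimal sketch. The equal-size chunking scheme and the node counts are assumptions for illustration only; the patent does not mandate a particular split:

```python
import math

def partition(data, n_chunks):
    """Split the input data set into n_chunks roughly equal chunks
    (one simple scheme among many possible)."""
    size = math.ceil(len(data) / n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]

data = list(range(10))
first = partition(data, 4)   # initial distribution across 4 available nodes
# If the timeout fires before every node finishes, repartition into
# different chunks, e.g. for the 3 nodes still responsive.
second = partition(data, 3)
```

Redistribution then assigns the new chunks across some or all of the nodes and re-executes the parallel SVM application on them.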
20 Claims
1. A method for reducing execution time of a parallel support vector machine (SVM) application, comprising:
partitioning an input data set into chunks of data;
distributing the partitioned chunks of data across a plurality of available computing nodes;
executing the parallel SVM application on the chunks of data in parallel across the plurality of available computing nodes;
computing a mean of completion times for a portion of the plurality of available computing nodes that have completed processing their respective chunks of data;
setting a first timeout period equal to a constant factor times the mean of the completion times minus a current elapsed time;
determining if the first timeout period has been exceeded before all of the plurality of available computing nodes have finished processing their respective chunks of data; and
if so, repartitioning the input data set into chunks of data that are different from the partitioned chunks of data;
redistributing the repartitioned chunks of data across some or all of the plurality of available computing nodes; and
executing the parallel SVM application on the repartitioned chunks of data in parallel across some or all of the available computing nodes. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
9. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for reducing execution time of a parallel support vector machine (SVM) application, the method comprising:
partitioning an input data set into chunks of data;
distributing the partitioned chunks of data across a plurality of available computing nodes;
executing the parallel SVM application on the chunks of data in parallel across the plurality of available computing nodes;
computing a mean of completion times for a portion of the plurality of available computing nodes that have completed processing their respective chunks of data;
setting a first timeout period equal to a constant factor times the mean of the completion times minus a current elapsed time;
determining if the first timeout period has been exceeded before all of the plurality of available computing nodes have finished processing their respective chunks of data; and
if so, repartitioning the input data set into chunks of data that are different from the partitioned chunks of data;
redistributing the repartitioned chunks of data across some or all of the plurality of available computing nodes; and
executing the parallel SVM application on the repartitioned chunks of data in parallel across some or all of the available computing nodes. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
17. An apparatus that reduces execution time of a parallel support vector machine (SVM) application, comprising:
a processor; and
a memory coupled to the processor;
wherein the processor is configured to:
partition an input data set into chunks of data;
distribute the partitioned chunks of data across a plurality of available computing nodes;
execute the parallel SVM application on the chunks of data in parallel across the plurality of available computing nodes;
compute a mean of completion times for a portion of the plurality of available computing nodes that have completed processing their respective chunks of data;
set a first timeout period equal to a constant factor times the mean of the completion times minus a current elapsed time;
determine if the first timeout period has been exceeded before all of the plurality of available computing nodes have finished processing their respective chunks of data; and
if so, repartition the input data set into chunks of data that are different from the partitioned chunks of data;
redistribute the repartitioned chunks of data across some or all of the plurality of available computing nodes; and
execute the parallel SVM application on the repartitioned chunks of data in parallel across some or all of the available computing nodes. - View Dependent Claims (18, 19, 20)
Specification