Spread kernel support vector machine
Abstract
Disclosed is a parallel support vector machine technique for solving problems with a large set of training data where the kernel computation, as well as the kernel cache and the training data, are spread over a number of distributed machines or processors. A plurality of processing nodes are used to train a support vector machine based on a set of training data. Each of the processing nodes selects a local working set of training data based on data local to the processing node, for example a local subset of gradients. Each node transmits selected data related to the working set (e.g., gradients having a maximum value) and receives an identification of a global working set of training data. The processing node optimizes the global working set of training data and updates a portion of the gradients of the global working set of training data. The updating of a portion of the gradients may include generating a portion of a kernel matrix. These steps are repeated until a convergence condition is met. Each of the local processing nodes may store all, or only a portion of, the training data. While the steps of optimizing the global working set of training data, and updating a portion of the gradients of the global working set, are performed in each of the local processing nodes, the function of generating a global working set of training data is performed in a centralized fashion based on the selected data (e.g., gradients of the local working set) received from the individual processing nodes.
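The last two sentences of the abstract describe the key division of labor: each node only ever materializes the block of the kernel matrix that pairs its locally stored samples with the current working set, and uses that block to refresh its local gradients. Below is a minimal NumPy sketch of that partial-kernel gradient update. The RBF kernel, the helper names, and the parameter gamma are illustrative assumptions, not taken from the patent; the update follows the standard SVM dual gradient, g_i += y_i * sum over j in W of (delta_alpha_j * y_j * K(x_i, x_j)).

```python
import numpy as np

def rbf_kernel_block(X_rows, X_cols, gamma=0.1):
    """Kernel values between this node's local rows and the working-set
    columns. Only this block of the full kernel matrix is generated: the
    'portion of a kernel matrix' the abstract refers to."""
    d2 = ((X_rows[:, None, :] - X_cols[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def update_local_gradients(grad, X_local, y_local, X_ws, y_ws,
                           delta_alpha, gamma=0.1):
    """Refresh the gradients of the locally stored training data after the
    working-set alphas changed by delta_alpha (hypothetical helper names)."""
    K_block = rbf_kernel_block(X_local, X_ws, gamma)   # local rows x |W| cols
    grad += y_local * (K_block @ (delta_alpha * y_ws))
    return grad
```

Because the kernel block is recomputed (or served from a local kernel cache) on each node, no node ever needs the full kernel matrix in memory, which is what lets the technique scale to large training sets.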
Claims (18)
1. A method for training a support vector machine, comprising the steps of:
a) selecting, via a processor of a first processing node, a local working set of training data based on local training data stored in a memory of the first processing node;
b) transmitting, via a network interface of the first processing node, certain gradients to a second processing node, the certain gradients selected from gradients of the working set of training data;
c) receiving at the network interface of the first processing node an identification of a global working set of training data;
d) executing, via the processor of the first processing node, a quadratic function stored in a storage device of the first processing node to optimize said global working set of training data;
e) updating gradients of the training data stored in the memory of the first processing node; and
f) repeating said steps a) through e) until a convergence condition is met.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, and 18.
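For readers tracing steps a) through f), the following sketch shows what one node's iteration might look like. It is illustrative only: the messaging object `comm`, the stand-in optimizer `solve_qp_subproblem`, and the working-set size `q` are hypothetical names, not claim elements; the node is assumed to hold a full copy of the training data, which the abstract notes is one permitted configuration; and the gradient-update helper from the sketch under the abstract is reused for step e).

```python
import numpy as np

def solve_qp_subproblem(K_ww, y_ws, alpha_ws, grad_ws, C=1.0,
                        steps=50, lr=0.1):
    """Stand-in for the stored quadratic function of step d): a few
    projected-gradient steps on the working-set alphas. The equality
    constraint of the SVM dual is omitted to keep the sketch short."""
    a = alpha_ws.copy()
    for _ in range(steps):
        # gradient restricted to W as the working-set alphas move
        g = grad_ws + y_ws * (K_ww @ (y_ws * (a - alpha_ws)))
        a = np.clip(a - lr * g, 0.0, C)     # project onto the box [0, C]
    return a

def train_on_node(X, y, alpha, grad, comm, q=2, gamma=0.1):
    """One node's view of the claim 1 loop (illustrative sketch only)."""
    while True:
        # a) select a local working set from the gradients in local memory:
        #    here simply the q samples with the most negative gradients
        local_ws = np.argsort(grad)[:q]

        # b) transmit the selected gradients (with their indices) to the
        #    coordinating (second) processing node
        comm.send(local_ws, grad[local_ws])

        # c) receive the identification of the global working set
        ws = comm.receive_global_working_set()

        # d) run the quadratic optimizer on the global working set
        K_ww = rbf_kernel_block(X[ws], X[ws], gamma)
        new_alpha_ws = solve_qp_subproblem(K_ww, y[ws], alpha[ws], grad[ws])

        # e) update the gradients of the locally stored training data,
        #    generating only the needed block of the kernel matrix
        delta = new_alpha_ws - alpha[ws]
        grad = update_local_gradients(grad, X, y, X[ws], y[ws], delta, gamma)
        alpha[ws] = new_alpha_ws

        # f) repeat until the globally agreed convergence condition is met
        if comm.converged(grad):
            return alpha
```

Note that only steps b) and c) touch the network, and the messages are tiny (a few indices and gradient values), which is why the expensive kernel and gradient work parallelizes well.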
9. A method for training a support vector machine, comprising the steps of:
a) selecting, at each of a plurality of processing nodes, via a processor of each of the processing nodes, a local working set of training data based on local training data stored in a memory of each of the processing nodes;
b) generating, via a processor of a network machine, a global working set of training data using certain gradients selected from gradients of each of the working sets of training data;
c) executing, at each of said plurality of processing nodes, via the processor of each of the processing nodes, a quadratic function stored in a storage device of each of the processing nodes to optimize said global working set of training data;
d) updating, at each of said plurality of processing nodes, gradients of the training data stored in the memory of each of the processing nodes; and
e) repeating steps a) through d) until a convergence condition is met.

Dependent claims: 10, 11, 12, 13, 14, 15, 16, and 17.
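Step b) of claim 9 is the one centralized operation: a network machine merges each node's locally selected candidates into a single global working set. Below is a sketch of one plausible merge, assuming each node sends (index, gradient) pairs and that "most violating" means smallest gradient value, consistent with the per-node selection sketched above; the size `q` and the ordering rule are assumptions for illustration, not claim language.

```python
def form_global_working_set(per_node_candidates, q=2):
    """Merge the candidates sent by every node (step b) of claim 9).

    per_node_candidates: one list of (index, gradient) pairs per node.
    Returns the indices of the q most violating samples overall."""
    merged = sorted(
        (pair for node in per_node_candidates for pair in node),
        key=lambda pair: pair[1],    # smallest gradient = most violating here
    )
    ws, seen = [], set()
    for idx, _g in merged:
        if idx not in seen:          # the same sample may be proposed twice
            seen.add(idx)
            ws.append(idx)
        if len(ws) == q:
            break
    return ws
```

The resulting indices are then broadcast back to every node; this broadcast is the "identification of a global working set" each node receives before running its local quadratic optimization.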
Specification