Efficient polynomial mapping of data for use with linear support vector machines
2 Assignments
0 Petitions
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for polynomial mapping of data for linear SVMs. In one aspect, a method includes training a linear classifier by receiving feature vectors and generating a condensed representation of a mapped vector corresponding to a polynomial mapping of each feature vector, the condensed representation including an index into a weight vector for each non-zero component of the mapped vector. A linear classifier is trained on the condensed representations. In another aspect, a method includes receiving a feature vector, identifying non-zero components resulting from a polynomial mapping of the feature vector, and mapping the combination of one or more elements of each non-zero component to a weight in a weight vector to determine a set of weights. The feature vector is classified according to a classification score derived by summing the set of weights.
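The mapping summarized in the abstract can be illustrated with a short sketch. This is an illustrative example only, not the patented implementation, and all names (`condensed_degree2`, `index_of`) are invented for the example: for a sparse binary feature vector and a degree-2 polynomial mapping, the non-zero mapped components are exactly the single active features and the products of pairs of active features, so a condensed representation needs only the weight-vector index of each such combination.

```python
from itertools import combinations

def condensed_degree2(active, index_of):
    """Return weight-vector indices for the non-zero components of a
    degree-2 polynomial mapping of a sparse binary feature vector.

    active:   feature ids whose value is 1 in the feature vector
    index_of: dict mapping a combination of feature ids (a tuple) to
              its position in the weight vector (assumed to be given)
    """
    indices = []
    # Single elements: x_i is non-zero whenever feature i is active.
    for i in active:
        indices.append(index_of[(i,)])
    # Pairwise products: x_i * x_j is non-zero only when both are active.
    for i, j in combinations(sorted(active), 2):
        indices.append(index_of[(i, j)])
    return indices
```

With three active features, the condensed representation holds three single-feature indices and three pairwise indices; the full mapped vector is never materialized.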
26 Claims
1. A computer-implemented method, comprising:

receiving, in a computing system comprising one or more computers, original training data including a plurality of training feature vectors and a respective decision for each training feature vector, each training feature vector representing a parse state of a parser, each element of each training feature vector indicating a presence or absence of a feature of the parse state, the parser constructing a dependency graph for a sentence having words and punctuation, the parser input comprising tokens corresponding to the words and punctuation of the sentence;

generating, by the computing system, for each training feature vector, a condensed representation of a mapped vector corresponding to a degree-d polynomial mapping of the training feature vector, the mapping defining each component of the mapped vector as either a single element of the training feature vector or a product of up to d elements of the training feature vector, the condensed representation of the mapped vector including an index for each non-zero component of the mapped vector, each index indicating a position in a weight vector of a weight for the corresponding non-zero mapped vector component; and

using a linear support vector machine to train a classifier on the condensed representations, the linear support vector machine configured to receive as training data the condensed representations and decisions for the training feature vectors corresponding to the condensed representations, and determine the weights in the weight vector from the condensed training data, where the classifier corresponds to a given parsing transition and uses the weight vector that was trained for the given parsing transition to generate a score for the given parsing transition, where the score for the given parsing transition is used to determine whether to perform the given parsing transition in building a dependency graph, the dependency graph representing syntactic modifiers for words in the sentence through labeled directed edges.

Dependent claims: 2, 3, 4, 5
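The training step recited in claim 1 can be sketched as follows. This is a hypothetical illustration, not the patented implementation: a perceptron-style update stands in for the linear-SVM solver the claim recites, and all names are invented for the example. Each condensed example is a list of weight-vector indices plus a +1/-1 decision, so training only ever touches the indexed weights.

```python
def train_on_condensed(examples, num_weights, epochs=5, lr=0.1):
    """Train a linear scorer for one parsing transition from condensed
    training data.

    examples:    list of (indices, decision) pairs, where `indices` are
                 the weight-vector positions of the non-zero components
                 of the polynomially mapped feature vector and
                 `decision` is +1 (perform the transition) or -1 (do not)
    num_weights: length of the weight vector
    """
    w = [0.0] * num_weights
    for _ in range(epochs):
        for indices, decision in examples:
            # Score = sum of the weights at the condensed indices.
            score = sum(w[i] for i in indices)
            if decision * score <= 0:  # misclassified: adjust weights
                for i in indices:
                    w[i] += lr * decision
    return w
```

Because each update only visits the condensed indices, cost per example scales with the number of non-zero mapped components, not with the length of the full mapped vector.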
6. A computer-implemented method, comprising:
receiving a computer-readable list of tokens representing a sentence having words and punctuation, the tokens corresponding to the words and punctuation of the sentence;

identifying, in a computing system comprising one or more computers, from the list of tokens, a feature vector corresponding to features in a current parse state of a parser, where each element of the feature vector indicates a presence or absence of a respective feature of the current parse state;

identifying, by operation of the computing system, non-zero components resulting from a degree-d polynomial mapping of the feature vector to a set of polynomial components, the polynomial mapping associating each polynomial component with either a single element of the feature vector or a product of up to d elements of the feature vector;

for each non-zero component, mapping, by operation of the computing system, the combination of one or more elements of the non-zero polynomial component to a single weight in a weight vector to determine a set of weights, where each position in the weight vector corresponds to a distinct combination of elements of the feature vector;

deriving, by operation of the computing system, a classification score for the current parse state for a particular transition in the parser, by summing the set of weights; and

using the transition to build a dependency graph, the dependency graph representing syntactic modifiers for words in the sentence through labeled directed edges.

Dependent claims: 7, 8, 9, 10, 11, 12
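The scoring step recited in claim 6 can be sketched as follows. Again an illustrative example with invented names, assuming a degree-2 mapping of a binary feature vector: instead of materializing the mapped vector, look up a weight for each non-zero component (each single active feature and each pair of active features) and sum them; combinations unseen in training contribute zero.

```python
from itertools import combinations

def transition_score(active_features, index_of, weights):
    """Score one parsing transition for the current parse state.

    active_features: feature ids that are 1 in the feature vector
    index_of:        dict mapping a feature-id combination (tuple) to
                     its position in the weight vector
    weights:         the trained weight vector for this transition
    """
    singles = [(f,) for f in active_features]
    pairs = list(combinations(sorted(active_features), 2))
    score = 0.0
    for combo in singles + pairs:
        pos = index_of.get(combo)
        if pos is not None:       # combinations unseen in training score 0
            score += weights[pos]
    return score
```

In a transition-based parser, this score would be computed for each candidate transition at the current state, and the highest-scoring transition applied to extend the dependency graph.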
13. A computer storage medium encoded with a computer program, the computer program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform actions comprising:
receiving original training data including a plurality of training feature vectors and a respective decision for each training feature vector, each training feature vector representing a parse state of a parser, each element of each training feature vector indicating a presence or absence of a feature of the parse state, the parser constructing a dependency graph for a sentence having words and punctuation, the parser input comprising tokens corresponding to the words and punctuation of the sentence;

generating, for each training feature vector, a condensed representation of a mapped vector corresponding to a degree-d polynomial mapping of the training feature vector, the mapping defining each component of the mapped vector as either a single element of the training feature vector or a product of up to d elements of the training feature vector, the condensed representation of the mapped vector including an index for each non-zero component of the mapped vector, each index indicating a position in a weight vector of a weight for the corresponding non-zero mapped vector component; and

using a linear support vector machine to train a classifier on the condensed representations, the linear support vector machine configured to receive as training data the condensed representations and decisions for the training feature vectors corresponding to the condensed representations, and determine the weights in the weight vector from the condensed training data, where the classifier corresponds to a given parsing transition and uses the weight vector that was trained for the given parsing transition to generate a score for the given parsing transition, where the score for the given parsing transition is used to determine whether to perform the given parsing transition in building a dependency graph, the dependency graph representing syntactic modifiers for words in the sentence through labeled directed edges.
14. A computer storage medium encoded with a computer program, the computer program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform actions comprising:
receiving a list of tokens representing a sentence having words and punctuation, the tokens corresponding to the words and punctuation of the sentence;

identifying, from the list of tokens, a feature vector corresponding to features in a current parse state of a parser, where each element of the feature vector indicates a presence or absence of a respective feature of the current parse state;

identifying non-zero components resulting from a degree-d polynomial mapping of the feature vector to a set of polynomial components, the polynomial mapping associating each polynomial component with either a single element of the feature vector or a product of up to d elements of the feature vector;

for each non-zero component, mapping the combination of one or more elements of the non-zero polynomial component to a single weight in a weight vector to determine a set of weights, where each position in the weight vector corresponds to a distinct combination of elements of the feature vector;

deriving a classification score for the current parse state for a particular transition in the parser, by summing the set of weights; and

using the transition to build a dependency graph, the dependency graph representing syntactic modifiers for words in the sentence through labeled directed edges.
15. A system, comprising:
one or more computers programmed to perform actions comprising:

receiving original training data including a plurality of training feature vectors and a respective decision for each training feature vector, each training feature vector representing a parse state of a parser, each element of each training feature vector indicating a presence or absence of a feature of the parse state, the parser configured to construct a dependency graph for a sentence having words and punctuation, the parser input comprising tokens corresponding to the words and punctuation of the sentence;

generating, for each training feature vector, a condensed representation of a mapped vector corresponding to a degree-d polynomial mapping of the training feature vector, the mapping defining each component of the mapped vector as either a single element of the training feature vector or a product of up to d elements of the training feature vector, the condensed representation of the mapped vector including an index for each non-zero component of the mapped vector, each index indicating a position in a weight vector of a weight for the corresponding non-zero mapped vector component; and

using a linear support vector machine to train a classifier on the condensed representations, the linear support vector machine configured to receive as training data the condensed representations and decisions for the training feature vectors corresponding to the condensed representations, and determine the weights in the weight vector from the condensed training data, where the classifier corresponds to a given parsing transition and uses the weight vector that was trained for the given parsing transition to generate a score for the given parsing transition, where the score for the given parsing transition is used to determine whether to perform the given parsing transition in building a dependency graph, the dependency graph representing syntactic modifiers for words in the sentence through labeled directed edges.

Dependent claims: 16, 17, 18, 19
20. A system, comprising:
one or more computers programmed to perform actions comprising:

receiving a list of tokens representing a sentence having words and punctuation, the tokens corresponding to the words and punctuation of the sentence;

identifying, from the list of tokens, a feature vector corresponding to features in a current parse state of a parser, where each element of the feature vector indicates a presence or absence of a respective feature of the current parse state;

identifying non-zero components resulting from a degree-d polynomial mapping of the feature vector to a set of polynomial components, the polynomial mapping associating each polynomial component with either a single element of the feature vector or a product of up to d elements of the feature vector;

for each non-zero component, mapping the combination of one or more elements of the non-zero polynomial component to a single weight in a weight vector to determine a set of weights, where each position in the weight vector corresponds to a distinct combination of elements of the feature vector;

deriving a classification score for the current parse state for a particular transition in the parser, by summing the set of weights; and

using the transition to build a dependency graph, the dependency graph representing syntactic modifiers for words in the sentence through labeled directed edges.

Dependent claims: 21, 22, 23, 24, 25, 26
Specification