Efficient polynomial mapping of data for use with linear support vector machines
2 Assignments
0 Petitions
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for polynomial mapping of data for linear SVMs. In one aspect, a method includes training a linear classifier by receiving feature vectors and generating a condensed representation of a mapped vector corresponding to a polynomial mapping of each feature vector, the condensed representation including an index into a weight vector for each non-zero component of the mapped vector. A linear classifier is trained on the condensed representations. In another aspect, a method includes receiving a feature vector, identifying non-zero components resulting from a polynomial mapping of the feature vector, and mapping the combination of one or more elements of each non-zero component to a weight in a weight vector to determine a set of weights. The feature vector is classified according to a classification score derived by summing the set of weights.
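The mapping summarized in the abstract can be illustrated with a short sketch. This is an illustrative example only, not the patented implementation, and all names (`condensed_degree2`, `index_of`) are invented for the example: for a sparse binary feature vector and a degree-2 polynomial mapping, the non-zero mapped components are exactly the single active features and the products of pairs of active features, so a condensed representation needs only the weight-vector index of each such combination.

```python
from itertools import combinations

def condensed_degree2(active, index_of):
    """Return weight-vector indices for the non-zero components of a
    degree-2 polynomial mapping of a sparse binary feature vector.

    active:   feature ids whose value is 1 in the feature vector
    index_of: dict mapping a combination of feature ids (a tuple) to
              its position in the weight vector (assumed to be given)
    """
    indices = []
    # Single elements: x_i is non-zero whenever feature i is active.
    for i in active:
        indices.append(index_of[(i,)])
    # Pairwise products: x_i * x_j is non-zero only when both are active.
    for i, j in combinations(sorted(active), 2):
        indices.append(index_of[(i, j)])
    return indices
```

With three active features, the condensed representation holds three single-feature indices and three pairwise indices; the full mapped vector is never materialized.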
26 Claims
1. A computer-implemented method, comprising:

receiving, in a computing system comprising one or more computers, original training data including a plurality of training feature vectors and a respective decision for each training feature vector, each training feature vector representing a parse state of a parser, each element of each training feature vector indicating a presence or absence of a feature of the parse state, the parser constructing a dependency graph for a sentence having words and punctuation, the parser input comprising tokens corresponding to the words and punctuation of the sentence;

generating, by the computing system, for each training feature vector, a condensed representation of a mapped vector corresponding to a degree-d polynomial mapping of the training feature vector, the mapping defining each component of the mapped vector as either a single element of the training feature vector or a product of up to d elements of the training feature vector, the condensed representation of the mapped vector including an index for each non-zero component of the mapped vector, each index indicating a position in a weight vector of a weight for the corresponding non-zero mapped vector component; and

using a linear support vector machine to train a classifier on the condensed representations, the linear support vector machine configured to receive as training data the condensed representations and decisions for the training feature vectors corresponding to the condensed representations, and determine the weights in the weight vector from the condensed training data, where the classifier corresponds to a given parsing transition and uses the weight vector that was trained for the given parsing transition to generate a score for the given parsing transition, where the score for the given parsing transition is used to determine whether to perform the given parsing transition in building a dependency graph, the dependency graph representing syntactic modifiers for words in the sentence through labeled directed edges.

Dependent claims: 2, 3, 4, 5
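The training step recited in claim 1 can be sketched as follows. This is a hypothetical illustration, not the patented implementation: a perceptron-style update stands in for the linear-SVM solver the claim recites, and all names are invented for the example. Each condensed example is a list of weight-vector indices plus a +1/-1 decision, so training only ever touches the indexed weights.

```python
def train_on_condensed(examples, num_weights, epochs=5, lr=0.1):
    """Train a linear scorer for one parsing transition from condensed
    training data.

    examples:    list of (indices, decision) pairs, where `indices` are
                 the weight-vector positions of the non-zero components
                 of the polynomially mapped feature vector and
                 `decision` is +1 (perform the transition) or -1 (do not)
    num_weights: length of the weight vector
    """
    w = [0.0] * num_weights
    for _ in range(epochs):
        for indices, decision in examples:
            # Score = sum of the weights at the condensed indices.
            score = sum(w[i] for i in indices)
            if decision * score <= 0:  # misclassified: adjust weights
                for i in indices:
                    w[i] += lr * decision
    return w
```

Because each update only visits the condensed indices, cost per example scales with the number of non-zero mapped components, not with the length of the full mapped vector.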
6. A computer-implemented method, comprising:
receiving a computer-readable list of tokens representing a sentence having words and punctuation, the tokens corresponding to the words and punctuation of the sentence;

identifying, in a computing system comprising one or more computers, from the list of tokens, a feature vector corresponding to features in a current parse state of a parser, where each element of the feature vector indicates a presence or absence of a respective feature of the current parse state;

identifying, by operation of the computing system, non-zero components resulting from a degree-d polynomial mapping of the feature vector to a set of polynomial components, the polynomial mapping associating each polynomial component with either a single element of the feature vector or a product of up to d elements of the feature vector;

for each non-zero component, mapping, by operation of the computing system, the combination of one or more elements of the non-zero polynomial component to a single weight in a weight vector to determine a set of weights, where each position in the weight vector corresponds to a distinct combination of elements of the feature vector;

deriving, by operation of the computing system, a classification score for the current parse state for a particular transition in the parser, by summing the set of weights; and

using the transition to build a dependency graph, the dependency graph representing syntactic modifiers for words in the sentence through labeled directed edges.

Dependent claims: 7, 8, 9, 10, 11, 12
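The scoring step recited in claim 6 can be sketched as follows. Again an illustrative example with invented names, assuming a degree-2 mapping of a binary feature vector: instead of materializing the mapped vector, look up a weight for each non-zero component (each single active feature and each pair of active features) and sum them; combinations unseen in training contribute zero.

```python
from itertools import combinations

def transition_score(active_features, index_of, weights):
    """Score one parsing transition for the current parse state.

    active_features: feature ids that are 1 in the feature vector
    index_of:        dict mapping a feature-id combination (tuple) to
                     its position in the weight vector
    weights:         the trained weight vector for this transition
    """
    singles = [(f,) for f in active_features]
    pairs = list(combinations(sorted(active_features), 2))
    score = 0.0
    for combo in singles + pairs:
        pos = index_of.get(combo)
        if pos is not None:       # combinations unseen in training score 0
            score += weights[pos]
    return score
```

In a transition-based parser, this score would be computed for each candidate transition at the current state, and the highest-scoring transition applied to extend the dependency graph.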
13. A computer storage medium encoded with a computer program, the computer program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform actions comprising:
receiving original training data including a plurality of training feature vectors and a respective decision for each training feature vector, each training feature vector representing a parse state of a parser, each element of each training feature vector indicating a presence or absence of a feature of the parse state, the parser constructing a dependency graph for a sentence having words and punctuation, the parser input comprising tokens corresponding to the words and punctuation of the sentence;

generating, for each training feature vector, a condensed representation of a mapped vector corresponding to a degree-d polynomial mapping of the training feature vector, the mapping defining each component of the mapped vector as either a single element of the training feature vector or a product of up to d elements of the training feature vector, the condensed representation of the mapped vector including an index for each non-zero component of the mapped vector, each index indicating a position in a weight vector of a weight for the corresponding non-zero mapped vector component; and

using a linear support vector machine to train a classifier on the condensed representations, the linear support vector machine configured to receive as training data the condensed representations and decisions for the training feature vectors corresponding to the condensed representations, and determine the weights in the weight vector from the condensed training data, where the classifier corresponds to a given parsing transition and uses the weight vector that was trained for the given parsing transition to generate a score for the given parsing transition, where the score for the given parsing transition is used to determine whether to perform the given parsing transition in building a dependency graph, the dependency graph representing syntactic modifiers for words in the sentence through labeled directed edges.
14. A computer storage medium encoded with a computer program, the computer program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform actions comprising:
receiving a list of tokens representing a sentence having words and punctuation, the tokens corresponding to the words and punctuation of the sentence;

identifying, from the list of tokens, a feature vector corresponding to features in a current parse state of a parser, where each element of the feature vector indicates a presence or absence of a respective feature of the current parse state;

identifying non-zero components resulting from a degree-d polynomial mapping of the feature vector to a set of polynomial components, the polynomial mapping associating each polynomial component with either a single element of the feature vector or a product of up to d elements of the feature vector;

for each non-zero component, mapping the combination of one or more elements of the non-zero polynomial component to a single weight in a weight vector to determine a set of weights, where each position in the weight vector corresponds to a distinct combination of elements of the feature vector;

deriving a classification score for the current parse state for a particular transition in the parser, by summing the set of weights; and

using the transition to build a dependency graph, the dependency graph representing syntactic modifiers for words in the sentence through labeled directed edges.
15. A system, comprising:
one or more computers programmed to perform actions comprising:

receiving original training data including a plurality of training feature vectors and a respective decision for each training feature vector, each training feature vector representing a parse state of a parser, each element of each training feature vector indicating a presence or absence of a feature of the parse state, the parser configured to construct a dependency graph for a sentence having words and punctuation, the parser input comprising tokens corresponding to the words and punctuation of the sentence;

generating, for each training feature vector, a condensed representation of a mapped vector corresponding to a degree-d polynomial mapping of the training feature vector, the mapping defining each component of the mapped vector as either a single element of the training feature vector or a product of up to d elements of the training feature vector, the condensed representation of the mapped vector including an index for each non-zero component of the mapped vector, each index indicating a position in a weight vector of a weight for the corresponding non-zero mapped vector component; and

using a linear support vector machine to train a classifier on the condensed representations, the linear support vector machine configured to receive as training data the condensed representations and decisions for the training feature vectors corresponding to the condensed representations, and determine the weights in the weight vector from the condensed training data, where the classifier corresponds to a given parsing transition and uses the weight vector that was trained for the given parsing transition to generate a score for the given parsing transition, where the score for the given parsing transition is used to determine whether to perform the given parsing transition in building a dependency graph, the dependency graph representing syntactic modifiers for words in the sentence through labeled directed edges.

Dependent claims: 16, 17, 18, 19
20. A system, comprising:
one or more computers programmed to perform actions comprising:

receiving a list of tokens representing a sentence having words and punctuation, the tokens corresponding to the words and punctuation of the sentence;

identifying, from the list of tokens, a feature vector corresponding to features in a current parse state of a parser, where each element of the feature vector indicates a presence or absence of a respective feature of the current parse state;

identifying non-zero components resulting from a degree-d polynomial mapping of the feature vector to a set of polynomial components, the polynomial mapping associating each polynomial component with either a single element of the feature vector or a product of up to d elements of the feature vector;

for each non-zero component, mapping the combination of one or more elements of the non-zero polynomial component to a single weight in a weight vector to determine a set of weights, where each position in the weight vector corresponds to a distinct combination of elements of the feature vector;

deriving a classification score for the current parse state for a particular transition in the parser, by summing the set of weights; and

using the transition to build a dependency graph, the dependency graph representing syntactic modifiers for words in the sentence through labeled directed edges.

Dependent claims: 21, 22, 23, 24, 25, 26
Specification