Efficient parsing with structured prediction cascades

US 8,914,279 B1
Filed: 09/21/2012
Issued: 12/16/2014
Est. Priority Date: 09/23/2011
Status: Active Grant

Abstract

A dependency parsing method can include determining an index set of possible head-modifier dependencies for a sentence. The index set can include inner arcs and outer arcs, the inner arcs representing possible dependencies between words in the sentence separated by a distance less than or equal to a threshold and the outer arcs representing possible dependencies between words in the sentence separated by a distance greater than the threshold. The index set can be pruned to include: (i) each specific inner arc when a likelihood that the specific inner arc is appropriate is greater than a first threshold, and (ii) the outer arcs when a likelihood that there exists any possible outer arc that is appropriate is greater than the first threshold. The method can include further pruning the pruned index set based on a second parsing algorithm, and determining a most-likely parse for the sentence from the pruned index set.
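
The split of the index set into inner and outer arcs can be illustrated with a minimal Python sketch. The function name `build_index_set`, the 0-based word positions, and the omission of an artificial root symbol are assumptions made for illustration; the patent text does not prescribe any of them.

```python
def build_index_set(num_words, distance_threshold):
    """Enumerate candidate (head, modifier) arcs and split them by distance.

    Illustrative only: words are numbered 0..num_words-1 and no artificial
    root symbol is modelled.
    """
    inner, outer = [], []
    for head in range(num_words):
        for modifier in range(num_words):
            if head == modifier:
                continue  # a word cannot be its own head
            arc = (head, modifier)
            if abs(head - modifier) <= distance_threshold:
                inner.append(arc)   # short arc, scored individually
            else:
                outer.append(arc)   # long arc, handled collectively
    return inner, outer


inner_arcs, outer_arcs = build_index_set(num_words=5, distance_threshold=2)
```

For a five-word sentence and a threshold of 2, every ordered pair of distinct words lands in exactly one of the two lists, so the two lists together cover all 20 candidate arcs.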

20 Claims

1. A computer-implemented method, comprising:
- receiving, at a computing device having one or more processors, a sentence including one or more words;
  
  determining, at the computing device, an index set of possible head-modifier dependencies for the sentence, the index set including inner arcs and outer arcs, the inner arcs representing possible head-modifier dependency between words in the sentence separated by a distance less than or equal to a first distance threshold and outer arcs representing possible head-modifier dependency between words in the sentence separated by a distance greater than the first distance threshold;
  
  pruning, at the computing device, the outer arcs to exclude arcs representing possible head-modifier dependency between words in the sentence separated by a distance greater than a second distance threshold to obtain a first pruned index set, the second distance threshold being based on a determination of a longest head-modifier dependency distance observed in training data;
  
  pruning, at the computing device, the first pruned index set based on an augmented vine parsing algorithm to obtain a second pruned index set, the second pruned index set including:
  
  (i) each specific inner arc when a likelihood that the specific inner arc is appropriate is greater than a first threshold, and (ii) the outer arcs in the first pruned index set when a likelihood that there exists a possible outer arc that is appropriate is greater than the first threshold, wherein each specific inner arc corresponds to a specific index and wherein the likelihood that the specific inner arc is appropriate is determined based on a max-marginal value of its corresponding specific index;
  
  pruning, at the computing device, the second pruned index set based on a second parsing algorithm to obtain a third pruned index set, the second parsing algorithm being a first-order parsing model;
  
  pruning, at the computing device, the third pruned index set based on a third parsing algorithm to obtain a fourth pruned index set, the third parsing algorithm being a second-order parsing model;
  
  pruning, at the computing device, the fourth pruned index set based on a fourth parsing algorithm to obtain a fifth pruned index set, the fourth parsing algorithm being a third-order parsing model;
  
  determining, at the computing device, a most-likely parse for the sentence from the fifth pruned index set; and
  
  outputting, from the computing device, the most-likely parse.
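
The staged pruning recited in claim 1 can be sketched as a chain of filters over the index set. This is a rough sketch under stated assumptions, not the patented implementation: the treebank format, the function names, and the idea of passing each stage as a plain callable are all hypothetical, and the vine, first-order, second-order, and third-order models are represented only as placeholder stages.

```python
def longest_training_dependency(treebank):
    """Longest head-modifier distance seen in a toy treebank.

    Assumed format: one list per sentence, where heads[m] is the position
    of the head of word m, or None when word m is the root.
    """
    longest = 0
    for heads in treebank:
        for modifier, head in enumerate(heads):
            if head is not None:
                longest = max(longest, abs(head - modifier))
    return longest


def prune_by_distance(arcs, max_distance):
    """Drop arcs longer than the second distance threshold."""
    return [(h, m) for (h, m) in arcs if abs(h - m) <= max_distance]


def run_cascade(arcs, stages):
    """Apply each pruning stage in order; every stage sees only survivors."""
    for stage in stages:
        arcs = stage(arcs)
    return arcs


treebank = [[1, None, 1], [None, 0, 0, 0, 3]]
max_distance = longest_training_dependency(treebank)

# All candidate arcs for a six-word sentence (0-based positions, no root).
all_arcs = [(h, m) for h in range(6) for m in range(6) if h != m]
first_pruned = prune_by_distance(all_arcs, max_distance)
```

In a fuller version, the list passed to `run_cascade` would hold one callable per model in the cascade, ordered from cheapest to most expensive, so that the costly higher-order models only score arcs that survived the earlier stages.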

2. A computer-implemented method, comprising:
- receiving, at a computing device, a sentence including one or more words;
  
  determining, at the computing device, an index set of possible head-modifier dependencies for the sentence, the index set including inner arcs and outer arcs, the inner arcs representing possible head-modifier dependency between words in the sentence separated by a first distance less than or equal to a distance threshold and outer arcs representing possible head-modifier dependency between words in the sentence separated by a second distance greater than the distance threshold;
  
  pruning, at the computing device, the index set based on an augmented vine parsing algorithm to obtain a first pruned index set, the first pruned index set including:
  
  (i) each specific inner arc when a likelihood that the specific inner arc is appropriate is greater than a first threshold, and (ii) the outer arcs when a likelihood that there exists any possible outer arc that is appropriate is greater than the first threshold;
  
  pruning, at the computing device, the first pruned index set based on a second parsing algorithm to obtain a second pruned index set;
  
  determining, at the computing device, a most-likely parse for the sentence from the second pruned index set; and
  
  outputting, from the computing device, the most-likely parse.
  - 3. The method of claim 2, wherein each specific inner arc corresponds to a specific index, and wherein the likelihood that the specific inner arc is appropriate is determined based on a max-marginal value of its corresponding specific index.
  - 4. The method of claim 2, further comprising pruning, at the computing device, the outer arcs to exclude arcs representing possible head-modifier dependency between words in the sentence separated by a distance greater than a second distance threshold before pruning the index set based on the augmented vine parsing algorithm.
  - 5. The method of claim 4, wherein the second distance threshold is based on a determination of a longest head-modifier dependency distance observed in training data.
  - 6. The method of claim 2, wherein the second parsing algorithm is a second-order parsing model.
  - 7. The method of claim 2, wherein the first threshold is determined based on the equation:
  - 8. The method of claim 2, wherein determining the most-likely parse for the sentence from the second pruned index set is based on a margin infused relaxed algorithm.
  - 9. The method of claim 2, wherein the first threshold is determined based on analysis of training data utilizing support vector machines.
  - 10. The method of claim 2, wherein:
    - each specific inner arc of the inner arcs corresponds to a specific index including a specific modifier word and a specific potential head word; and
      
      the likelihood that the specific inner arc is appropriate is based on the specific modifier word and the specific potential head word.
  - 11. The method of claim 2, wherein:
    - each specific outer arc of the outer arcs corresponds to a specific index including a specific modifier word and a specific potential head word; and
      
      the likelihood that there exists any possible outer arc that is appropriate is based on the specific modifier word.
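
Max-marginal pruning of inner arcs, with outer arcs kept or dropped per modifier word, can be illustrated on a tiny sentence by brute force. Everything in this sketch is an assumption made for illustration: the function names, the artificial root at position 0, the exhaustive tree enumeration (a real parser would use dynamic programming), and in particular the threshold formula. The equation referenced in claim 7 is not reproduced in this text, so the sketch substitutes a threshold commonly used in the structured prediction cascades literature, a convex combination of the best score and the mean max-marginal, which may differ from the claimed equation. It also keeps arcs whose max-marginal is greater than or equal to the threshold so that the best parse always survives, whereas the claims say "greater than".

```python
from itertools import product


def reaches_root(heads):
    """heads[m-1] is the head of word m (1-based); 0 is an assumed root."""
    for m in range(1, len(heads) + 1):
        seen, node = set(), m
        while node != 0:
            if node in seen:
                return False  # cycle: word m never reaches the root
            seen.add(node)
            node = heads[node - 1]
    return True


def max_marginals(scores, n):
    """Best tree score among all trees containing each arc (brute force)."""
    best = {}
    for heads in product(range(n + 1), repeat=n):
        if not reaches_root(heads):
            continue
        arcs = [(h, m) for m, h in enumerate(heads, start=1)]
        total = sum(scores.get(arc, 0.0) for arc in arcs)
        for arc in arcs:
            best[arc] = max(best.get(arc, float("-inf")), total)
    return best


def prune(scores, n, alpha, inner_distance):
    mm = max_marginals(scores, n)
    # Assumed threshold: mix of best score and mean max-marginal.
    threshold = alpha * max(mm.values()) + (1 - alpha) * sum(mm.values()) / len(mm)
    kept = set()
    for m in range(1, n + 1):
        inner = [a for a in mm if a[1] == m and abs(a[0] - m) <= inner_distance]
        outer = [a for a in mm if a[1] == m and abs(a[0] - m) > inner_distance]
        # Inner arcs are judged one by one.
        kept.update(a for a in inner if mm[a] >= threshold)
        # Outer arcs of a modifier survive together if any one of them qualifies.
        if outer and max(mm[a] for a in outer) >= threshold:
            kept.update(outer)
    return kept, threshold


# Hypothetical arc scores for a three-word sentence; unlisted arcs score 0.
scores = {(0, 2): 5, (2, 1): 4, (2, 3): 3, (0, 3): 1}
kept, threshold = prune(scores, n=3, alpha=0.25, inner_distance=1)
```

In this toy run the outer arc from word 1 to word 3 falls below the threshold on its own, yet it is retained because another outer arc into the same modifier clears the threshold. That mirrors the claim language in which the outer arcs are kept as a group when any possible outer arc is likely enough.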

12. A computing device, comprising:
- at least one processor; and
  
  a non-transitory computer-readable storage medium storing executable computer program code, the at least one processor configured to execute the executable computer program code to perform operations including:
  
  receiving a sentence including one or more words;
  
  determining an index set of possible head-modifier dependencies for the sentence, the index set including inner arcs and outer arcs, the inner arcs representing possible head-modifier dependency between words in the sentence separated by a first distance less than or equal to a distance threshold and outer arcs representing possible head-modifier dependency between words in the sentence separated by a second distance greater than the distance threshold;
  
  pruning the index set based on an augmented vine parsing algorithm to obtain a first pruned index set, the first pruned index set including:
  
  (i) each specific inner arc when a likelihood that the specific inner arc is appropriate is greater than a first threshold, and (ii) the outer arcs when a likelihood that there exists any possible outer arc that is appropriate is greater than the first threshold;
  
  pruning the first pruned index set based on a second parsing algorithm to obtain a second pruned index set;
  
  determining a most-likely parse for the sentence from the second pruned index set; and
  
  outputting the most-likely parse.
  - 13. The computing device of claim 12, wherein each specific inner arc corresponds to a specific index, and wherein the likelihood that the specific inner arc is appropriate is determined based on a max-marginal value of its corresponding specific index.
  - 14. The computing device of claim 12, wherein the operations further include pruning the outer arcs to exclude arcs representing possible head-modifier dependency between words in the sentence separated by a distance greater than a second distance threshold before pruning the index set based on the augmented vine parsing algorithm.
  - 15. The computing device of claim 14, wherein the second distance threshold is based on a determination of a longest head-modifier dependency distance observed in training data.
  - 16. The computing device of claim 12, wherein the second parsing algorithm is a second-order parsing model.
  - 17. The computing device of claim 12, wherein the first threshold is determined based on the equation:
  - 18. The computing device of claim 12, wherein determining the most-likely parse for the sentence from the second pruned index set is based on a margin infused relaxed algorithm.
  - 19. The computing device of claim 12, wherein:
    - each specific inner arc of the inner arcs corresponds to a specific index including a specific modifier word and a specific potential head word; and
      
      the likelihood that the specific inner arc is appropriate is based on the specific modifier word and the specific potential head word.
  - 20. The computing device of claim 12, wherein:
    - each specific outer arc of the outer arcs corresponds to a specific index including a specific modifier word and a specific potential head word; and
      
      the likelihood that there exists any possible outer arc that is appropriate is based on the specific modifier word.
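
The final step recited in the independent claims, determining a most-likely parse from the pruned index set, can be sketched as a search restricted to surviving arcs. This is a hedged illustration only: the names `best_parse` and `reaches_root`, the artificial root at position 0, the additive arc scores, and the exhaustive search are assumptions, and a practical parser would replace the enumeration with a dynamic program.

```python
from itertools import product


def reaches_root(heads):
    """heads[m-1] is the head of word m (1-based); 0 is an assumed root."""
    for m in range(1, len(heads) + 1):
        seen, node = set(), m
        while node != 0:
            if node in seen:
                return False  # cycle detected
            seen.add(node)
            node = heads[node - 1]
    return True


def best_parse(scores, allowed_arcs, n):
    """Highest-scoring tree that uses only arcs left in the pruned index set."""
    # For each modifier, list the heads that survived pruning.
    candidates = [
        [h for h in range(n + 1) if (h, m) in allowed_arcs]
        for m in range(1, n + 1)
    ]
    best_heads, best_score = None, float("-inf")
    for heads in product(*candidates):
        if not reaches_root(heads):
            continue
        total = sum(scores.get((h, m), 0.0) for m, h in enumerate(heads, start=1))
        if total > best_score:
            best_heads, best_score = heads, total
    return best_heads, best_score


# Hypothetical scores and a hypothetical pruned index set for three words.
scores = {(0, 2): 5, (2, 1): 4, (2, 3): 3, (0, 3): 1}
pruned = {(2, 1), (0, 2), (2, 3), (0, 3), (1, 3)}
heads, score = best_parse(scores, pruned, n=3)
```

Because each modifier only iterates over its surviving heads, the search space shrinks with every arc removed in earlier stages, which is the efficiency argument behind pruning before running the final model.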

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Petrov, Slav; Rush, Alexander
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US13/624,280
Time in Patent Office: 816 Days
Field of Search: 704/1, 704/9, 704/10
US Class Current: 704/9
CPC Class Codes:
G06F 40/211 Syntactic parsing, e.g. bas...
G06F 40/216 using statistical methods
