Ranking Parser for a Natural Language Processing System

US 20060184353A1
Filed: 03/31/2006
Published: 08/17/2006
Est. Priority Date: 07/20/2000
Status: Active Grant

Abstract

A natural language parse ranker of a natural language processing (NLP) system employs a goodness function to rank the possible grammatically valid parses of an utterance. The goodness function generates a statistical goodness measure (SGM) for each valid parse. The parse ranker orders the parses based upon their SGM values and presents the parse with the greatest SGM value as the one that most likely represents the intended meaning of the speaker. The goodness function of this parse ranker is highly accurate in representing the intended meaning of a speaker, and it has reasonable training data requirements. With this parse ranker, the SGM of a particular parse is the combination of the probabilities of all of the nodes within the parse tree of that parse. The probability at a given node is the probability of taking a transition ("grammar rule") at that point. The probability at a node is conditioned on highly predictive linguistic phenomena, such as "phrase levels," "null transitions," and "syntactic history."
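The abstract says only that the SGM "combines" the node probabilities. The following is a minimal Python sketch, assuming that combination is a product (accumulated here as a sum of logarithms so that long trees do not underflow) and that every node probability has already been smoothed to be strictly positive. `ParseNode`, `log_sgm`, and `rank_parses` are hypothetical names, not taken from the patent.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ParseNode:
    """Hypothetical parse-tree node. `prob` is the probability of the
    transition (grammar rule) taken at this node; assumed to be > 0."""
    prob: float
    children: list = field(default_factory=list)


def log_sgm(root: ParseNode) -> float:
    """Log of the SGM, assuming SGM = product of all node probabilities."""
    total = 0.0
    stack = [root]
    while stack:  # iterative walk over every node in the tree
        node = stack.pop()
        total += math.log(node.prob)
        stack.extend(node.children)
    return total


def rank_parses(parses: list) -> list:
    """Order candidate parses from highest to lowest SGM."""
    return sorted(parses, key=log_sgm, reverse=True)


# Two candidate parses of the same utterance (probabilities are made up).
parse_a = ParseNode(0.5, [ParseNode(0.9), ParseNode(0.8)])  # SGM = 0.36
parse_b = ParseNode(0.6, [ParseNode(0.4), ParseNode(0.7)])  # SGM = 0.168
best = rank_parses([parse_b, parse_a])[0]  # parse_a ranks first
```

Working in log space leaves the ranking unchanged because the logarithm is monotonic.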

20 Claims

1. One or more computer-readable storage media having computer-executable instructions that, when executed by a computer, determine language usage probabilities of a natural language based upon a training corpus, the method comprising:
- examining a training corpus, wherein such corpus includes phrases parsed in accordance with a set of grammar rules;
- computing probabilities of usage of combinations of linguistic features based upon empirical tracking of appearances of instances of such combinations in phrases within the training corpus.
- 2. One or more media as recited in claim 1, wherein the combinations of linguistic features comprises:
    - (transition, headword, phrase level, syntactic history, segtype);
    - (headword, phrase level, syntactic history, segtype);
    - (modifying headword, transition, headword); and
    - (transition, headword).
- 3. One or more media as recited in claim 1, wherein the computing comprises counting appearances of instances of combinations of linguistic features within the training corpus.
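Claims 1 to 3 estimate probabilities by counting how often each feature combination appears in the parsed training corpus. The following is a minimal Python sketch of that counting step. It assumes that each node of a parsed training phrase has already been reduced to the features named in claim 2. The `NodeFeatures` record, its field names, the rule name `VPwNPobj`, and the string tags used as key prefixes are all hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class NodeFeatures(NamedTuple):
    """Hypothetical features read off one node of a parsed training phrase."""
    transition: str         # grammar rule taken at the node
    headword: str
    phrase_level: int
    syn_history: frozenset  # syntactic phenomena recorded for the node
    segtype: str            # segtype label, as named in claim 2
    mod_headword: str       # headword of the modifier the rule attaches


def count_combinations(nodes: Iterable[NodeFeatures]) -> Counter:
    """Tally the four feature combinations listed in claim 2.

    Each key is prefixed with a tag so the four kinds of tuple
    cannot collide inside one Counter.
    """
    counts = Counter()
    for n in nodes:
        ctx = (n.headword, n.phrase_level, n.syn_history, n.segtype)
        counts[("trans+ctx", n.transition, *ctx)] += 1
        counts[("ctx", *ctx)] += 1
        counts[("mod+trans+head", n.mod_headword, n.transition, n.headword)] += 1
        counts[("trans+head", n.transition, n.headword)] += 1
    return counts


# Two hypothetical corpus nodes: "saw" takes an object rule twice.
corpus_nodes = [
    NodeFeatures("VPwNPobj", "saw", 1, frozenset(), "VP", "dog"),
    NodeFeatures("VPwNPobj", "saw", 1, frozenset(), "VP", "cat"),
]
counts = count_combinations(corpus_nodes)
```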

4. One or more computer-readable storage media having computer-executable instructions that, when executed by a computer, determine language usage probabilities of a natural language based upon a training corpus, the method comprising:
- examining a training corpus, wherein such corpus includes phrases parsed in accordance with a set of grammar rules, the phrases having been parsed, at least partially, automatically and without human intervention;
- computing probabilities of usage of combinations of linguistic features based upon empirical tracking of appearances of instances of such combinations in phrases within the training corpus.
- 5. One or more media as recited in claim 4, wherein the combinations of linguistic features comprises:
    - (transition, headword, phrase level, syntactic history, segtype);
    - (headword, phrase level, syntactic history, segtype);
    - (modifying headword, transition, headword); and
    - (transition, headword).
- 6. One or more media as recited in claim 4, wherein the combinations of linguistic features consist of:
    - (transition, headword, phrase level, syntactic history, segtype);
    - (headword, phrase level, syntactic history, segtype);
    - (modifying headword, transition, headword); or
    - (transition, headword).
- 7. One or more media as recited in claim 4, wherein the computing comprises counting appearances of instances of combinations of linguistic features within the training corpus.

8. One or more computer-readable storage media having computer-executable instructions that, when executed by a computer, perform a method to parse a phrase, the method comprising:
- generating at least one parse tree representing a syntactically valid parse of the phrase, wherein the parse tree has hierarchical nodes;
- calculating a syntactic history for each node;
- computing the probability for a node based upon the syntactic history calculated for that node.
- 9. One or more media as recited in claim 8 further comprising storing the syntactic history for each node.
- 10. One or more media as recited in claim 8, wherein the syntactic history may indicate one or more of the following syntactic phenomena:
    - passive verb phrase;
    - negative polarity;
    - domodal fronting;
    - comparative;
    - imperative;
    - topicalization of verb object.
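Claims 8 to 10 calculate a syntactic history for each node and condition the node's probability on that history. The claims do not say how the history is derived, so the following Python sketch simply assumes it accumulates from the top of the tree downward: each node inherits the phenomena recorded by its ancestors and adds any it introduces itself. The phenomenon tags, the `Node` class, the rule names, and the `(rule, history)` probability table are all hypothetical.

```python
from dataclasses import dataclass, field

# Short tags for the phenomena listed in claim 10 (tag names are hypothetical).
PHENOMENA = frozenset({"passive", "negative_polarity", "domodal_fronting",
                       "comparative", "imperative", "topicalized_object"})


@dataclass
class Node:
    rule: str                              # grammar rule (transition) at this node
    triggers: frozenset = frozenset()      # phenomena this node introduces
    children: list = field(default_factory=list)
    syn_history: frozenset = frozenset()   # stored per node (cf. claim 9)
    prob: float = 0.0


def assign_history(node: Node, inherited: frozenset = frozenset()) -> None:
    """Assumed top-down rule: history = ancestors' phenomena | own triggers."""
    unknown = node.triggers - PHENOMENA
    if unknown:
        raise ValueError(f"unrecognized phenomena: {sorted(unknown)}")
    node.syn_history = inherited | node.triggers
    for child in node.children:
        assign_history(child, node.syn_history)


def assign_probs(node: Node, table: dict, default: float = 1e-6) -> None:
    """Set each node's probability from a table keyed by (rule, syn_history).
    Unseen pairs fall back to a small default in place of real smoothing."""
    node.prob = table.get((node.rule, node.syn_history), default)
    for child in node.children:
        assign_probs(child, table, default)


# A passive clause: the history set at the root reaches the verb phrase below it.
vp = Node("VP->Verb")
root = Node("S->NP VP", triggers=frozenset({"passive"}), children=[vp])
assign_history(root)
assign_probs(root, {("VP->Verb", frozenset({"passive"})): 0.3})
```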

11. An apparatus comprising:
- a processor;
- a natural-language-usage parser executable on the processor to:
    - generating at least one parse tree representing a syntactically valid parse of the phrase, wherein the parse tree has hierarchical nodes;
    - calculating a syntactic history for each node;
    - computing the probability for a node based upon the syntactic history calculated for that node.
- 12. An apparatus as recited in claim 11, wherein the syntactic history may indicate one or more of the following syntactic phenomena:
    - passive verb phrase;
    - negative polarity;
    - domodal fronting;
    - comparative;
    - imperative;
    - topicalization of verb object.

13. A natural-language-usage probability determiner comprising:
- data-acquisition device is configured to receive language-usage probabilities based upon appearances of instances of combinations of linguistic features within a training corpus;
- probability calculator is configured to calculate a probability at a node of a parse tree based upon linguistic features of the node and the language-usage probabilities.
- 14. A natural-language-usage probability determiner as recited in claim 13, wherein the combinations of linguistic features comprises:
    - (transition, headword, phrase level, syntactic history, segtype);
    - (headword, phrase level, syntactic history, segtype);
    - (modifying headword, transition, headword); and
    - (transition, headword).
- 15. A natural-language-usage probability determiner as recited in claim 13, wherein the probability calculator is further configured to count appearances of instances of combinations of linguistic features within the training corpus.
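Claims 13 to 15 leave the exact estimator open. One natural reading of the four combinations in claim 14 is that they pair up as numerators and denominators of two conditional probabilities: P(transition | headword, phrase level, syntactic history, segtype) and P(modifying headword | transition, headword). Their product gives the node probability. The following Python sketch assumes that reading, and also assumes that counts are stored in a `Counter` keyed by hypothetical tag-prefixed tuples. It is an illustration, not the patent's own formula.

```python
from collections import Counter


def node_probability(counts: Counter, transition, headword, phrase_level,
                     syn_history, segtype, mod_headword) -> float:
    """Estimate a node's probability from corpus counts (assumed model):

        P(transition | ctx) * P(mod_headword | transition, headword)

    where ctx = (headword, phrase_level, syn_history, segtype). Returns 0.0
    when a conditioning context was never seen; a real system would smooth.
    """
    ctx = (headword, phrase_level, syn_history, segtype)
    ctx_total = counts[("ctx", *ctx)]
    head_total = counts[("trans+head", transition, headword)]
    if ctx_total == 0 or head_total == 0:
        return 0.0
    p_transition = counts[("trans+ctx", transition, *ctx)] / ctx_total
    p_modifier = counts[("mod+trans+head", mod_headword, transition, headword)] / head_total
    return p_transition * p_modifier


# Hypothetical counts: "saw" took rule VPwNPobj 3 times out of 4 in this
# context, and 1 of those 3 attachments had "dog" as the modifier headword.
counts = Counter({
    ("ctx", "saw", 1, frozenset(), "VP"): 4,
    ("trans+ctx", "VPwNPobj", "saw", 1, frozenset(), "VP"): 3,
    ("trans+head", "VPwNPobj", "saw"): 3,
    ("mod+trans+head", "dog", "VPwNPobj", "saw"): 1,
})
p = node_probability(counts, "VPwNPobj", "saw", 1, frozenset(), "VP", "dog")  # 0.75 * 1/3, about 0.25
```

Returning 0.0 for an unseen context keeps the sketch short. In practice, some form of smoothing or backoff would be needed so that a single unseen event does not zero out a whole parse.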

16. A natural-language-usage probability determiner comprising:
- data-acquisition device is configured to receive language-usage probabilities based upon appearances of instances of combinations of linguistic features within a training corpus, wherein the training corpus includes phrases parsed in accordance with a set of grammar rules, the phrases having been parsed, at least partially, automatically and without human intervention;
- probability calculator is configured for calculating a probability at a node of a parse tree based upon linguistic features of the node and the language-usage probabilities.
- 17. A natural-language-usage probability determiner as recited in claim 16, wherein the combinations of linguistic features comprises:
    - (transition, headword, phrase level, syntactic history, segtype);
    - (headword, phrase level, syntactic history, segtype);
    - (modifying headword, transition, headword); and
    - (transition, headword).
- 18. A natural-language-usage probability determiner as recited in claim 16, wherein the probability calculator is further configured to count appearances of instances of combinations of linguistic features within the training corpus.

19. A data structure for use with a computer having a processor and a memory, the data structure comprising:
- a corpus comprising one or more phrases in a natural language;
- parse trees having hierarchical nodes, each tree representing at least one syntactically valid parse of each phrase in a subset of the corpus;
- wherein each of one or more nodes have a syntactic history and a probability associated therewith, wherein the probability of a node is based on a node's associated syntactic history.
- 20. The structure as recited in claim 19, wherein the subset of the corpus includes all phrases in the corpus.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Weise, David N.
Granted Patent: US 7,610,188 B2
US Class Current: 704/4
CPC Class Codes:
G06F 40/211 Syntactic parsing, e.g. bas...
G06F 40/216 using statistical methods
