Identification and Rejection of Meaningless Input During Natural Language Classification

US 20070244692A1
Filed: 04/13/2006
Published: 10/18/2007
Est. Priority Date: 04/13/2006
Status: Active Grant

Abstract

A method for identifying data that is meaningless and generating a natural language statistical model which can reject meaningless input. The method can include identifying unigrams that are individually meaningless from a set of training data. At least a portion of the unigrams identified as being meaningless can be assigned to a first n-gram class. The method also can include identifying bigrams that are entirely composed of meaningless unigrams and determining whether the identified bigrams are individually meaningless. At least a portion of the bigrams identified as being individually meaningless can be assigned to the first n-gram class.
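
The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how an n-gram is judged meaningless, so the sketch takes that judgment as a caller-supplied predicate. The names `assign_meaningless_ngrams`, `is_meaningless`, `LABELED`, and the class label `REJECT` are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[Ngram]:
    """Return the contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def assign_meaningless_ngrams(
    sentences: Iterable[str],
    is_meaningless: Callable[[Ngram], bool],
    reject_class: str = "REJECT",  # hypothetical name for the "first n-gram class"
) -> Dict[Ngram, str]:
    """Map meaningless unigrams and bigrams to a single n-gram class."""
    tokenized = [s.lower().split() for s in sentences]
    assignments: Dict[Ngram, str] = {}

    # Step 1: identify unigrams that are individually meaningless.
    meaningless_unigrams: Set[Ngram] = set()
    for tokens in tokenized:
        for gram in ngrams(tokens, 1):
            if is_meaningless(gram):
                meaningless_unigrams.add(gram)
                assignments[gram] = reject_class

    # Step 2: consider only bigrams composed entirely of meaningless
    # unigrams, then check whether each bigram is itself meaningless.
    for tokens in tokenized:
        for gram in ngrams(tokens, 2):
            if all((word,) in meaningless_unigrams for word in gram):
                if is_meaningless(gram):
                    assignments[gram] = reject_class

    return assignments


# Assumption: a hand-labeled set stands in for the meaninglessness judgment.
LABELED = {("um",), ("uh",), ("well",), ("um", "uh")}
training = ["um uh I want to check my balance", "uh well transfer funds"]
result = assign_meaningless_ngrams(training, lambda gram: gram in LABELED)
```

In this toy run, the bigram `("uh", "well")` is built entirely from meaningless unigrams but is not itself labeled meaningless, so it is left unassigned, which mirrors the two-stage check in the abstract.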

20 Claims

1. A method for generating a natural language statistical model comprising:
- from a set of training data, identifying unigrams that are individually meaningless; and
  
  assigning at least a portion of the unigrams identified as being meaningless to a first n-gram class selected from a plurality of n-gram classes.
- Dependent claims (2 to 12):
  - 2. The method according to claim 1, wherein assigning the unigrams that are individually meaningless comprises categorizing the unigrams into at least one class selected from the group consisting of a nonsensical class and at least one ambiguous class.
  - 3. The method according to claim 2, wherein assigning at least a portion of the unigrams comprises assigning unigrams that are categorized as nonsensical into the first n-gram class and assigning unigrams that are categorized as ambiguous into at least a second n-gram class selected from the plurality of n-gram classes.
  - 4. The method according to claim 1, further comprising:
    - identifying bigrams that are entirely composed of meaningless unigrams;
      
      determining whether the identified bigrams are individually meaningless; and
      
      assigning at least a portion of the bigrams identified as being individually meaningless to the first n-gram class.
  - 5. The method according to claim 4, further comprising categorizing the identified bigrams into at least one class selected from the group consisting of a nonsensical class and at least one ambiguous class.
  - 6. The method according to claim 4, further comprising:
    - identifying trigrams that are entirely composed of the meaningless bigrams;
      
      determining whether the identified trigrams are individually meaningless; and
      
      assigning at least a portion of the trigrams identified as being individually meaningless to the first n-gram class.
  - 7. The method according to claim 6, further comprising categorizing the identified trigrams into at least one class selected from the group consisting of a nonsensical class and at least one ambiguous class.
  - 8. The method according to claim 1, further comprising:
    - identifying unigrams that are individually meaningful;
      
      identifying bigrams that comprise at least one of the unigrams identified as being meaningful; and
      
      categorizing the identified bigrams as meaningful.
  - 9. The method according to claim 8, further comprising assigning at least a portion of the bigrams categorized as meaningful into a second n-gram class to which unigrams comprising the bigrams are assigned.
  - 10. The method according to claim 1, further comprising:
    - identifying bigrams that are individually meaningful;
      
      identifying trigrams that comprise at least one of the bigrams identified as being meaningful; and
      
      categorizing the identified trigrams as meaningful.
  - 11. The method according to claim 10, further comprising assigning at least a portion of the trigrams categorized as meaningful into a second n-gram class to which bigrams comprising the trigrams are assigned.
  - 12. The method according to claim 1, further comprising processing the training data to generate at least one statistical model.
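
Claims 8 to 11 describe the opposite direction from claim 1: an n-gram that contains a meaningful lower-order n-gram is categorized as meaningful and can be assigned to that component's class. The following Python sketch illustrates one way to read those claims. The function name `promote_meaningful`, the class labels `BALANCE` and `TRANSFER`, and the tie-breaking rule (the first meaningful component wins) are assumptions; the claims do not specify them.

```python
from typing import Dict, List, Tuple

Ngram = Tuple[str, ...]


def promote_meaningful(
    candidates: List[Ngram],
    lower_order_classes: Dict[Ngram, str],
) -> Dict[Ngram, str]:
    """Categorize n-grams (n >= 2) as meaningful when they contain a
    meaningful (n-1)-gram, and give them that component's class."""
    promoted: Dict[Ngram, str] = {}
    for gram in candidates:
        n = len(gram)
        # A contiguous n-gram contains exactly two (n-1)-grams:
        # its prefix and its suffix.
        parts = [gram[i:i + n - 1] for i in range(2)]
        for part in parts:
            if part in lower_order_classes:
                # Assumption: the first meaningful component decides the class.
                promoted[gram] = lower_order_classes[part]
                break
    return promoted


# Hypothetical meaningful unigrams and their classes.
unigram_classes = {("balance",): "BALANCE", ("transfer",): "TRANSFER"}

bigrams = [("my", "balance"), ("um", "uh"), ("transfer", "funds")]
bigram_classes = promote_meaningful(bigrams, unigram_classes)

# The same helper covers the bigram-to-trigram step of claims 10 and 11.
trigrams = [("check", "my", "balance"), ("um", "uh", "well")]
trigram_classes = promote_meaningful(trigrams, bigram_classes)
```

Reusing one helper for both orders is a design convenience of the sketch; the claims state the unigram-to-bigram and bigram-to-trigram steps separately.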

13. A method for generating a natural language statistical model comprising:
- from a set of training data, identifying unigrams that are individually meaningless;
  
  assigning at least a portion of the unigrams identified as being meaningless to a first n-gram class selected from a plurality of n-gram classes;
  
  identifying bigrams that are entirely composed of meaningless unigrams;
  
  determining whether the identified bigrams are individually meaningless; and
  
  assigning at least a portion of the bigrams identified as being individually meaningless to the first n-gram class.

14. A machine readable storage having stored thereon a computer program having a plurality of code sections comprising:
- code for identifying unigrams that are individually meaningless from a set of training data; and
  
  code for assigning at least a portion of the unigrams identified as being meaningless to a first n-gram class selected from a plurality of n-gram classes.
- Dependent claims (15 to 20):
  - 15. The machine readable storage of claim 14, wherein the code for assigning the unigrams that are individually meaningless comprises code for categorizing the unigrams into at least one class selected from the group consisting of a nonsensical class and at least one ambiguous class.
  - 16. The machine readable storage of claim 15, wherein the code for assigning at least a portion of the unigrams comprises code for assigning unigrams that are categorized as nonsensical into the first n-gram class and assigning unigrams that are categorized as ambiguous into at least a second n-gram class selected from the plurality of n-gram classes.
  - 17. The machine readable storage of claim 14, further comprising:
    - code for identifying bigrams that are entirely composed of meaningless unigrams;
      
      code for determining whether the identified bigrams are individually meaningless; and
      
      code for assigning at least a portion of the bigrams identified as being individually meaningless to the first n-gram class.
  - 18. The machine readable storage of claim 17, further comprising code for categorizing the identified bigrams into at least one class selected from the group consisting of a nonsensical class and at least one ambiguous class.
  - 19. The machine readable storage of claim 17, further comprising:
    - code for identifying trigrams that are entirely composed of the meaningless bigrams;
      
      code for determining whether the identified trigrams are individually meaningless; and
      
      code for assigning at least a portion of the trigrams identified as being individually meaningless to the first n-gram class.
  - 20. The machine readable storage of claim 19, further comprising code for categorizing the identified trigrams into at least one class selected from the group consisting of a nonsensical class and at least one ambiguous class.
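
The code sections recited in claims 14 to 20 can be pictured with a short Python sketch that walks from unigrams up to trigrams and separates nonsensical from ambiguous n-grams. It rests on several assumptions that the claims leave open: the category labels come from a hypothetical hand-built `labels` table, an n-gram missing from that table is treated as not meaningless, the class names `REJECT` and `AMBIGUOUS` are invented, and the nonsensical/ambiguous split of claim 16 is applied at every order rather than only to unigrams.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple

Ngram = Tuple[str, ...]


class Category(Enum):
    NONSENSICAL = "nonsensical"
    AMBIGUOUS = "ambiguous"


# Hypothetical class names: nonsensical n-grams go to the first n-gram
# class, ambiguous n-grams to a second one.
CLASS_FOR = {Category.NONSENSICAL: "REJECT", Category.AMBIGUOUS: "AMBIGUOUS"}


def classify_up_to_trigrams(
    tokenized: List[List[str]],
    labels: Dict[Ngram, Category],
) -> Dict[Ngram, str]:
    """Assign meaningless unigrams, bigrams, and trigrams to n-gram classes."""
    assignments: Dict[Ngram, str] = {}
    meaningless_prev: Set[Ngram] = set()
    for n in (1, 2, 3):
        meaningless_now: Set[Ngram] = set()
        for tokens in tokenized:
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                # Above unigrams, keep only n-grams composed entirely of
                # meaningless (n-1)-grams (their prefix and suffix).
                if n > 1 and not all(
                    part in meaningless_prev for part in (gram[:-1], gram[1:])
                ):
                    continue
                category = labels.get(gram)
                if category is None:
                    continue  # assumption: unlabeled means not meaningless
                meaningless_now.add(gram)
                assignments[gram] = CLASS_FOR[category]
        meaningless_prev = meaningless_now
    return assignments


# Hypothetical annotations for a one-sentence training set.
labels = {
    ("um",): Category.NONSENSICAL,
    ("uh",): Category.NONSENSICAL,
    ("it",): Category.AMBIGUOUS,
    ("um", "uh"): Category.NONSENSICAL,
    ("uh", "it"): Category.AMBIGUOUS,
    ("um", "uh", "it"): Category.AMBIGUOUS,
}
assignments = classify_up_to_trigrams([["um", "uh", "it", "broke"]], labels)
```

Because `("broke",)` is not labeled meaningless, the bigram `("it", "broke")` and the trigram `("uh", "it", "broke")` are never considered, which follows the "entirely composed of" condition in claims 17 and 19.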

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Balchandran, Rajesh; Boyer, Linda
Granted Patent: US 7,707,027 B2

Field of Search
US Class Current: 704/9
CPC Class Codes:
G06F 16/353   into predefined classes
G06F 40/216   using statistical methods
G10L 15/183   using context dependencies,...
