Identification and Rejection of Meaningless Input During Natural Language Classification
Abstract
A method for identifying data that is meaningless and generating a natural language statistical model which can reject meaningless input. The method can include identifying unigrams that are individually meaningless from a set of training data. At least a portion of the unigrams identified as being meaningless can be assigned to a first n-gram class. The method also can include identifying bigrams that are entirely composed of meaningless unigrams and determining whether the identified bigrams are individually meaningless. At least a portion of the bigrams identified as being individually meaningless can be assigned to the first n-gram class.
20 Claims
1. A method for generating a natural language statistical model comprising:
from a set of training data, identifying unigrams that are individually meaningless; and
assigning at least a portion of the unigrams identified as being meaningless to a first n-gram class selected from a plurality of n-gram classes.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
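The claim does not state how a unigram is judged "individually meaningless". The sketch below assumes training data given as (sentence, category) pairs and a hypothetical criterion: a word counts as meaningless when its occurrences are spread almost evenly across the target categories (high normalized entropy), so it carries no evidence for any one category. The criterion, the names `MEANINGLESS_CLASS`, `find_meaningless_unigrams`, `assign_unigrams_to_class`, and the `threshold`/`min_count` values are illustrative assumptions, not taken from the patent.

```python
import math
from collections import Counter, defaultdict

# Hypothetical label standing in for the "first n-gram class" of the claim.
MEANINGLESS_CLASS = "<MEANINGLESS>"


def normalized_entropy(counts, num_categories):
    """Entropy of a word's category distribution, scaled to the range 0..1."""
    total = sum(counts.values())
    if num_categories < 2 or total == 0:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)
    return h / math.log(num_categories)


def find_meaningless_unigrams(training_data, threshold=0.9, min_count=2):
    """Return words whose usage does not discriminate between categories.

    training_data is a list of (sentence, category) pairs (an assumed format).
    """
    per_word = defaultdict(Counter)
    categories = set()
    for sentence, category in training_data:
        categories.add(category)
        for word in sentence.lower().split():
            per_word[word][category] += 1
    return {
        word
        for word, counts in per_word.items()
        # Rare words are skipped: too little evidence to call them meaningless.
        if sum(counts.values()) >= min_count
        and normalized_entropy(counts, len(categories)) >= threshold
    }


def assign_unigrams_to_class(sentence, meaningless):
    """Replace each meaningless word with the shared class label."""
    return [MEANINGLESS_CLASS if w in meaningless else w for w in sentence.lower().split()]


training = [
    ("i want to check my balance", "balance"),
    ("i want to pay a bill", "pay"),
    ("i want to transfer money", "transfer"),
    ("check balance please", "balance"),
]
meaningless = find_meaningless_unigrams(training)
print(sorted(meaningless))  # ['i', 'to', 'want']
print(assign_unigrams_to_class("I want to check my balance", meaningless))
# ['<MEANINGLESS>', '<MEANINGLESS>', '<MEANINGLESS>', 'check', 'my', 'balance']
```

Collapsing the meaningless words into one class token means a statistical model trained on the rewritten data sees a single shared symbol instead of many uninformative words; other entropy or frequency criteria could be substituted without changing the class-assignment step.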
13. A method for generating a natural language statistical model comprising:
from a set of training data, identifying unigrams that are individually meaningless;
assigning at least a portion of the unigrams identified as being meaningless to a first n-gram class selected from a plurality of n-gram classes;
identifying bigrams that are entirely composed of meaningless unigrams;
determining whether the identified bigrams are individually meaningless; and
assigning at least a portion of the bigrams identified as being individually meaningless to the first n-gram class.
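The bigram steps of this claim can be sketched in Python as follows. This is a minimal sketch under stated assumptions: training data is a list of (sentence, category) pairs, the set of meaningless unigrams is already known, and a candidate bigram is judged "individually meaningless" by the same hypothetical entropy test applied to the bigram's own category counts. `MEANINGLESS_CLASS`, `find_meaningless_bigrams`, `assign_bigrams_to_class`, and the thresholds are illustrative names and values, not the patent's.

```python
import math
from collections import Counter, defaultdict

# Hypothetical label for the "first n-gram class", shared with meaningless unigrams.
MEANINGLESS_CLASS = "<MEANINGLESS>"


def normalized_entropy(counts, num_categories):
    """Entropy of a category distribution, scaled to the range 0..1."""
    total = sum(counts.values())
    if num_categories < 2 or total == 0:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)
    return h / math.log(num_categories)


def find_meaningless_bigrams(training_data, meaningless_unigrams, threshold=0.9, min_count=2):
    """Return bigrams built only from meaningless unigrams that are themselves meaningless."""
    per_bigram = defaultdict(Counter)
    categories = set()
    for sentence, category in training_data:
        categories.add(category)
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            # Only bigrams composed entirely of meaningless unigrams are candidates.
            if a in meaningless_unigrams and b in meaningless_unigrams:
                per_bigram[(a, b)][category] += 1
    # A candidate is kept only if the bigram as a whole is also non-discriminative.
    return {
        bigram
        for bigram, counts in per_bigram.items()
        if sum(counts.values()) >= min_count
        and normalized_entropy(counts, len(categories)) >= threshold
    }


def assign_bigrams_to_class(words, bigrams):
    """Greedily replace meaningless bigrams, left to right, with the class label."""
    out, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in bigrams:
            out.append(MEANINGLESS_CLASS)
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out


training = [
    ("i want to check my balance", "balance"),
    ("i want to pay a bill", "pay"),
    ("i want to transfer money", "transfer"),
    ("check balance please", "balance"),
]
unigrams = {"i", "want", "to"}  # assumed output of the unigram step
bigrams = find_meaningless_bigrams(training, unigrams)
print(sorted(bigrams))  # [('i', 'want'), ('want', 'to')]
print(assign_bigrams_to_class("i want to check my balance".split(), bigrams))
# ['<MEANINGLESS>', 'to', 'check', 'my', 'balance']
```

Restricting candidates to bigrams made only of meaningless unigrams keeps the search small, and the separate check allows a pair of individually empty words to be kept out of the class if, together, they do predict a category. The leftover "to" in the example would still be handled by the unigram assignment.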
14. A machine readable storage having stored thereon a computer program having a plurality of code sections comprising:
code for identifying unigrams that are individually meaningless from a set of training data; and
code for assigning at least a portion of the unigrams identified as being meaningless to a first n-gram class selected from a plurality of n-gram classes.
(Dependent claims: 15, 16, 17, 18, 19, 20)
Specification