Identification and rejection of meaningless input during natural language classification
3 Assignments
0 Petitions
Abstract
A method for identifying meaningless data and generating a natural language statistical model that can reject meaningless input. The method can include identifying unigrams that are individually meaningless from a set of training data. At least a portion of the unigrams identified as being meaningless can be assigned to a first n-gram class. The method can also include identifying bigrams that are composed entirely of meaningless unigrams and determining whether the identified bigrams are individually meaningless. At least a portion of the bigrams identified as being individually meaningless can be assigned to the first n-gram class.
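The classification steps in the abstract can be sketched in Python. This is a minimal illustration, not the patented implementation: the function name `assign_meaningless_class`, the label `MEANINGLESS_CLASS`, the sample data, and the `is_meaningless_bigram` callback are all hypothetical. The text above does not say how unigrams or bigrams are judged meaningless, so both judgments are passed in as inputs, and the final step of generating the statistical model is left out.

```python
MEANINGLESS_CLASS = "meaningless"  # hypothetical label for the "first n-gram class"


def assign_meaningless_class(sentences, meaningless_unigrams, is_meaningless_bigram):
    """Return a dict mapping n-gram tuples to the meaningless class.

    sentences: tokenized training data (a list of token lists).
    meaningless_unigrams: tokens already identified as individually meaningless.
    is_meaningless_bigram: callable that decides whether a candidate bigram is
        itself meaningless (assumed to come from a human or a heuristic).
    """
    assigned = {}
    for tokens in sentences:
        # Step 1: assign meaningless unigrams to the first n-gram class.
        for tok in tokens:
            if tok in meaningless_unigrams:
                assigned[(tok,)] = MEANINGLESS_CLASS
        # Step 2: only bigrams built entirely from meaningless unigrams are
        # candidates; each candidate is then judged on its own.
        for bigram in zip(tokens, tokens[1:]):
            if all(tok in meaningless_unigrams for tok in bigram):
                if is_meaningless_bigram(bigram):
                    assigned[bigram] = MEANINGLESS_CLASS
    return assigned


# Hypothetical training data and judgments, for illustration only.
training = [
    ["um", "how", "much", "do", "i", "owe"],
    ["uh", "um", "check", "my", "balance"],
]
unigrams = {"um", "uh", "how", "much"}
meaningful_pairs = {("how", "much")}  # meaningful despite meaningless parts

assigned = assign_meaningless_class(
    training, unigrams, lambda bigram: bigram not in meaningful_pairs
)
```

In this made-up data, "how" and "much" are each treated as meaningless alone, yet the bigram "how much" is kept out of the meaningless class, which is the reason the bigram check is a separate step rather than a consequence of the unigram labels.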
382 Citations
19 Claims
1. A method for generating a natural language statistical model comprising:
receiving, at at least one system comprising a combination of hardware and software, a set of training data comprising unigrams identified as being individually meaningless;
assigning, via the at least one system, at least a portion of the unigrams identified as being meaningless to a first n-gram class selected from a plurality of n-gram classes; and
processing the classified training data via the at least one system to generate the natural language statistical model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method for generating a natural language statistical model comprising:
receiving, at at least one system comprising a combination of hardware and software, a set of training data comprising unigrams identified as being individually meaningless;
assigning, via the at least one system, at least a portion of the unigrams identified as being meaningless to a first n-gram class selected from a plurality of n-gram classes;
identifying bigrams that are entirely composed of meaningless unigrams;
determining whether the identified bigrams are individually meaningless;
assigning, via the at least one system, at least a portion of the bigrams identified as being individually meaningless to the first n-gram class; and
processing the classified training data via the at least one system to generate the natural language statistical model.
13. A machine readable storage having stored thereon a computer program having a plurality of code sections comprising:
code for identifying unigrams that are individually meaningless from a set of training data;
code for assigning at least a portion of the unigrams identified as being meaningless to a first n-gram class selected from a plurality of n-gram classes; and
code for processing the classified training data to generate at least one statistical model.
Dependent claims: 14, 15, 16, 17, 18, 19
Specification