Evaluating text classifier parameters based on semantic features
Abstract
Systems and methods for evaluating text classifier parameters based on semantic features. An example method comprises: performing a semantico-syntactic analysis of a natural language text of a corpus of natural language texts to produce a semantic structure representing a set of semantic classes; identifying a natural language text feature to be extracted using a set of values of a plurality of feature extraction parameters; partitioning the corpus of natural language texts into a training data set comprising a first plurality of natural language texts and a validation data set comprising a second plurality of natural language texts; determining, in view of the category of the training data set, the set of values of the feature extraction parameters; and validating the set of values of the feature extraction parameters using the validation data set.
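The partition-then-validate flow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `partition`, `count_correct`, and `select_parameters`, the 25% validation fraction, the exhaustive grid search, and the `fit` callback (which trains a classifier for one candidate set of parameter values) are all assumptions introduced for the example.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (natural language text, category); hypothetical layout


def partition(corpus: Sequence[Example], validation_fraction: float = 0.25,
              seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Shuffle the corpus and split it into (training set, validation set)."""
    items = list(corpus)
    random.Random(seed).shuffle(items)
    n_val = max(1, int(len(items) * validation_fraction))
    return items[n_val:], items[:n_val]


def count_correct(classify: Callable[[str], str],
                  validation: Sequence[Example]) -> int:
    """Number of validation texts whose predicted category matches the label."""
    return sum(1 for text, category in validation if classify(text) == category)


def select_parameters(grid: Dict[str, list], train: Sequence[Example],
                      validation: Sequence[Example], fit: Callable):
    """Try every combination of candidate values (assumed grid search) and keep
    the one that classifies the most validation texts correctly.

    `fit(train, params)` is a hypothetical callback returning a function
    that maps a text to a category.
    """
    best_params, best_count = None, -1
    names = sorted(grid)
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        correct = count_correct(fit(train, params), validation)
        if correct > best_count:
            best_params, best_count = params, correct
    return best_params, best_count
```

Any search strategy could replace the exhaustive loop; the only property taken from the text is that the chosen values are the ones that score best on the validation data set.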
16 Claims
1. A method, comprising:
identifying a plurality of feature extraction parameters of a text classifier model, wherein the plurality of feature extraction parameters comprises a first attribute of a first semantic class and a second attribute of a second semantic class, wherein a value of the second attribute is produced by applying a pre-defined transformation to a value of the first attribute;
partitioning a corpus of natural language texts into a training data set comprising a first plurality of natural language texts and a validation data set comprising a second plurality of natural language texts;
determining, in view of the training data set, a set of values of the feature extraction parameters, which maximizes a number of natural language texts of the validation data set that are classified correctly by the text classifier model using the set of values of the feature extraction parameters;
performing, by a processing device, a semantico-syntactic analysis of an input natural language text to produce a semantic structure representing a set of semantic classes;
producing a plurality of values by applying, to the semantic structure representing the input natural language text, the text classifier model using the set of values of the feature extraction parameters, wherein each value of the plurality of values reflects a degree of association of the input natural language text with a particular category of natural language texts;
associating the input natural language text with a category corresponding to an optimal value among the plurality of values; and
utilizing the category to perform a natural language processing task.
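The last steps of claim 1 (producing one value per category and associating the text with the category of the optimal value) might look like the sketch below. Everything concrete here is an assumption made for illustration: the semantic structure is reduced to a mapping from semantic class names to counts, the model is a hypothetical table of per-category weights, and "optimal" is taken to mean "maximal".

```python
from typing import Dict, Mapping

# Hypothetical stand-in for a semantic structure: semantic class -> count.
SemanticStructure = Mapping[str, float]


def category_scores(structure: SemanticStructure,
                    weights: Mapping[str, Mapping[str, float]]) -> Dict[str, float]:
    """Return one value per category, reflecting the degree of association.

    `weights` maps category -> {semantic class -> weight}; this linear
    scoring rule is an assumption, not taken from the claims.
    """
    return {
        category: sum(w * structure.get(cls, 0.0) for cls, w in class_weights.items())
        for category, class_weights in weights.items()
    }


def associate(scores: Mapping[str, float]) -> str:
    """Pick the category with the optimal (here: largest) value."""
    return max(scores, key=scores.get)
```

With weights `{"finance": {"MONEY": 1.0, "COMPANY": 0.5}, "sports": {"SPORT_EVENT": 1.0}}` and a structure `{"MONEY": 2, "COMPANY": 1}`, the scores are 2.5 for finance and 0.0 for sports, so the text is associated with finance.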
6. A method, comprising:
identifying a plurality of hyper-parameters of a text classifier model, wherein the plurality of hyper-parameters include a number of nearest neighbors to be analyzed by the text classifier model;
partitioning a corpus of natural language texts into a training data set comprising a first plurality of natural language texts and a validation data set comprising a second plurality of natural language texts;
determining, in view of the training data set, a set of values of the hyper-parameters of the text classifier model, which maximizes a number of natural language texts of the validation data set that are classified correctly by the text classifier model using the set of values of the hyper-parameters;
performing, by a processing device, a semantico-syntactic analysis of an input natural language text to produce a semantic structure representing a set of semantic classes;
producing a plurality of values by applying, to the semantic structure representing the input natural language text, the text classifier model using the set of values of the hyper-parameters, wherein each value of the plurality of values reflects a degree of association of the input natural language text with a particular category of natural language texts;
associating the input natural language text with a category corresponding to an optimal value among the plurality of values; and
utilizing the category to perform a natural language processing task.
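Claim 6 names the number of nearest neighbors as a hyper-parameter. A small sketch of how such a value could be chosen against a validation set is given below. The representation of each text as a set of semantic class names, the Jaccard similarity, the majority vote, and the function names are all assumptions made for the example; the claim itself does not specify them.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Sequence, Tuple

# Hypothetical labeled example: (set of semantic classes, category).
Labeled = Tuple[FrozenSet[str], str]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Similarity of two sets of semantic classes (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_classify(features: FrozenSet[str], train: Sequence[Labeled], k: int) -> str:
    """Majority vote among the k training texts most similar to `features`."""
    neighbours = sorted(train, key=lambda ex: jaccard(features, ex[0]), reverse=True)[:k]
    votes = Counter(category for _, category in neighbours)
    return votes.most_common(1)[0][0]


def choose_k(train: Sequence[Labeled], validation: Sequence[Labeled],
             candidates: Iterable[int]) -> int:
    """Return the candidate k that classifies the most validation texts correctly."""
    def correct(k: int) -> int:
        return sum(knn_classify(f, train, k) == c for f, c in validation)
    return max(candidates, key=correct)
```

Ties between candidate values are resolved in favor of the earliest candidate, which is a property of Python's `max` rather than anything stated in the claim.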
11. A system, comprising:
a memory;
a processor, coupled to the memory, the processor configured to:
identify a plurality of feature extraction parameters of a text classifier model, wherein the plurality of feature extraction parameters comprises a first attribute of a first semantic class and a second attribute of a second semantic class, wherein a value of the second attribute is produced by applying a pre-defined transformation to a value of the first attribute;
partition a corpus of natural language texts into a training data set comprising a first plurality of natural language texts and a validation data set comprising a second plurality of natural language texts;
determine, in view of the training data set, a set of values of the feature extraction parameters, which maximizes a number of natural language texts of the validation data set that are classified correctly by the text classifier model using the set of values of the feature extraction parameters;
perform a semantico-syntactic analysis of an input natural language text to produce a semantic structure representing a set of semantic classes;
produce a plurality of values by applying, to the semantic structure representing the input natural language text, the text classifier model using the set of values of the feature extraction parameters, wherein each value of the plurality of values reflects a degree of association of the input natural language text with a particular category of natural language text;
associate the input natural language text with a category corresponding to an optimal value among the plurality of values; and
utilize the category to perform a natural language processing task.
14. A computer-readable non-transitory storage medium comprising executable instructions that, when executed by a computer system, cause the computer system to:
identify a plurality of hyper-parameters of a text classifier model, wherein the plurality of hyper-parameters include a number of nearest neighbors to be analyzed by the text classifier model;
partition a corpus of natural language texts into a training data set comprising a first plurality of natural language texts and a validation data set comprising a second plurality of natural language texts;
determine, in view of the training data set, a set of values of the hyper-parameters of the text classifier model, which maximizes a number of natural language texts of the validation data set that are classified correctly by the text classifier model using the set of values of the hyper-parameters;
perform a semantico-syntactic analysis of an input natural language text to produce a semantic structure representing a set of semantic classes; and
produce a plurality of values by applying, to the semantic structure representing the input natural language text, the text classifier model using the set of values of the hyper-parameters, wherein each value of the plurality of values reflects a degree of association of the input natural language text with a particular category of natural language texts;
associate the input natural language text with a category corresponding to an optimal value among the plurality of values; and
utilize the category to perform a natural language processing task.
Specification