Phrase-based data classification system
Abstract
A method of classifying data is disclosed. Text data items are received. A set of classes into which the text data items are to be classified is received. A phrase-based classifier to classify the text data items into the set of classes is selected. The phrase-based classifier is applied to classify the text data items into the classes. Here, the applying includes creating a controlled vocabulary pertaining to classifying the text data items into the set of classes, building phrases based on the text data items and the controlled vocabulary, and classifying the text data items into the set of classes based on the phrases.
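The three steps named in the abstract (controlled vocabulary, phrase building, phrase-based classification) can be illustrated with a minimal Python sketch. It is not the patented implementation: the record does not say how the vocabulary is created or how phrases are scored. The sketch assumes the vocabulary is the set of words that occur in labeled seed examples of exactly one class, that a phrase is a contiguous run of vocabulary words, and that longer phrases carry more weight. The names create_vocabulary, build_phrases, classify, and the seed data are all hypothetical.

```python
from collections import Counter


def create_vocabulary(seed_examples):
    """Map each word to its class if it occurs in seed examples of only one class.

    Assumed rule for illustration; the source does not specify one.
    """
    owners = {}
    for label, texts in seed_examples.items():
        for text in texts:
            for word in text.lower().split():
                owners.setdefault(word, set()).add(label)
    return {word: next(iter(labels))
            for word, labels in owners.items() if len(labels) == 1}


def build_phrases(text, vocabulary):
    """Return contiguous runs of vocabulary words found in the text."""
    phrases, run = [], []
    for word in text.lower().split():
        if word in vocabulary:
            run.append(word)
        elif run:
            phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return phrases


def classify(text, vocabulary):
    """Vote for a class per phrase word, weighted by phrase length (assumed scoring)."""
    votes = Counter()
    for phrase in build_phrases(text, vocabulary):
        words = phrase.split()
        for word in words:
            votes[vocabulary[word]] += len(words)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical seed data: class name -> example text data items.
seeds = {
    "footwear": ["running shoes", "leather boots"],
    "electronics": ["wireless headphones", "usb charger"],
}
vocab = create_vocabulary(seeds)
print(build_phrases("Mens running shoes size 10", vocab))
print(classify("wireless headphones with usb charger", vocab))
```

An item that contains no vocabulary words yields no phrases, and classify returns None for it rather than forcing a class.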
14 Claims
1. A method comprising:
receiving text data items;
receiving a set of classes into which the text data items are to be classified;
selecting a phrase-based classifier to classify the text data items into the set of classes; and
applying the phrase-based classifier to classify the text data items into the classes, the applying including:
  creating a controlled vocabulary pertaining to classifying the text data items into the set of classes;
  building phrases based on the text data items and the controlled vocabulary;
  classifying, using at least one processor, the text data items into the set of classes based on the phrases; and
  reclassifying a text data item of the text data items into the set of classes based on a consistency error.
Dependent claims: 4, 5, 6, 12.
2. A system comprising:
at least one processor; and
a data classification system implemented by the at least one processor and including:
  an online module configured to:
    receive text data items;
    receive a set of classes into which the text data items are to be classified;
    select a phrase-based classifier to classify the text data items into the set of classes; and
    apply the phrase-based classifier to classify the text data items into the classes, the applying including:
      receiving a set of classes into which text data items are to be classified;
      creating a controlled vocabulary pertaining to classifying the text data items into the set of classes;
      building phrases based on the text data items and the controlled vocabulary; and
      classifying the text data items into the set of classes based on the phrases; and
      reclassifying a text data item of the text data items into the set of classes based on a consistency error.
Dependent claims: 7, 8, 9, 13.
3. A non-transitory machine readable medium embodying a set of instructions that, when executed by a processor, cause the processor to perform a method, the method comprising:
receiving text data items;
receiving a set of classes into which the text data items are to be classified;
selecting a phrase-based classifier to classify the text data items into the set of classes; and
applying the phrase-based classifier to classify the text data items into the classes, the applying including:
  creating a controlled vocabulary pertaining to classifying the text data items into the set of classes;
  building phrases based on the text data items and the controlled vocabulary; and
  classifying, using at least one processor, the text data items into the set of classes based on the phrases; and
  reclassifying a text data item of the text data items into the set of classes based on a consistency error.
Dependent claims: 10, 11, 14.
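Each independent claim ends with a reclassifying step triggered by a "consistency error", which the claims do not define. As a hedged illustration only, the sketch below assumes one possible reading: a consistency error occurs when text data items with the same set of phrases were assigned different classes, and the fix is to move every such item to the majority class of its group. The function name reclassify_inconsistent and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def reclassify_inconsistent(assignments):
    """Reassign items so that items with identical phrase sets share one class.

    assignments: list of (item_id, phrases, label) tuples.
    Returns a dict mapping item_id to its (possibly corrected) label.
    Assumed definition of "consistency error"; the claims do not specify one.
    """
    groups = defaultdict(list)
    for item_id, phrases, label in assignments:
        groups[frozenset(phrases)].append((item_id, label))

    result = {}
    for members in groups.values():
        # Majority label within the group of items sharing the same phrases.
        majority = Counter(label for _, label in members).most_common(1)[0][0]
        for item_id, _ in members:
            result[item_id] = majority
    return result


# Hypothetical classifier output: item "c" disagrees with items "a" and "b".
assignments = [
    ("a", ["running shoes"], "footwear"),
    ("b", ["running shoes"], "footwear"),
    ("c", ["running shoes"], "electronics"),
    ("d", ["usb charger"], "electronics"),
]
fixed = reclassify_inconsistent(assignments)
print(fixed)
```

Under this reading only item "c" changes class; items whose phrase set is unique are left as classified.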
Specification