Fuzzy text categorizer
Abstract
A text categorizer classifies a text object into one or more classes. The text categorizer includes a pre-processing module, a knowledge base, and an approximate reasoning module. The pre-processing module performs feature extraction, feature reduction, and fuzzy set generation to represent an unlabelled text object in terms of one or more fuzzy sets. The approximate reasoning module measures the degree of match between those fuzzy sets and the categories represented by fuzzy rules in the knowledge base, and assigns the labels of the categories that satisfy a selected decision making rule.
26 Claims
1. A method for classifying a text object, comprising:
extracting a set of features from the text object;
the set of features having a plurality of features;
constructing a document class fuzzy set with a plurality of ones of the set of features extracted from the text object;
each of the ones of the features extracted from the text object having a degree of membership in the document class fuzzy set and a plurality of class fuzzy sets of a knowledge base;
measuring a degree of match between each of the plurality of class fuzzy sets and the document class fuzzy set; and
using the measured degree of match to assign the text object a label that satisfies a selected decision making rule;
wherein the document class fuzzy set is computed by:
calculating a frequency of occurrence for each feature in the set of features in the text object;
normalizing the frequency of occurrence of each feature in the set of features; and
transforming the normalized frequency of occurrence of each feature in the set of features to define the document class fuzzy set.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
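The steps recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not name a tokenizer, a transformation, a match measure, or a decision rule, so everything below is an assumption chosen for illustration: `extract_features` is a hypothetical word tokenizer, the transform is the sum-of-minimums probability-to-possibility mapping, the match is the max-min height of the intersection, and the decision rule picks the single best-matching class.

```python
import re
from collections import Counter


def extract_features(text):
    """Hypothetical feature extractor: lowercase alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_fuzzy_set(features):
    """Frequency -> normalized frequency -> fuzzy membership."""
    counts = Counter(features)                 # frequency of occurrence
    total = sum(counts.values())
    if total == 0:
        return {}
    probs = {f: c / total for f, c in counts.items()}   # normalization
    # Assumed transform: mu(f) = sum over all features g of min(p(f), p(g)).
    # The most frequent feature always ends up with membership 1.
    return {f: sum(min(p, q) for q in probs.values()) for f, p in probs.items()}


def degree_of_match(doc_set, class_set):
    """Assumed match measure: max over shared features of the min membership."""
    shared = doc_set.keys() & class_set.keys()
    return max((min(doc_set[f], class_set[f]) for f in shared), default=0.0)


def classify(text, knowledge_base):
    """Assumed decision rule: return the label with the highest match."""
    doc_set = document_fuzzy_set(extract_features(text))
    scores = {label: degree_of_match(doc_set, class_set)
              for label, class_set in knowledge_base.items()}
    return max(scores, key=scores.get), scores


# Illustrative knowledge base; the memberships are made up for the example.
knowledge_base = {
    "sports": {"goal": 1.0, "match": 0.8},
    "finance": {"bank": 1.0, "rate": 0.7},
}
label, scores = classify("The bank raised the rate", knowledge_base)
```

In this example the document shares features only with the "finance" class, so that label wins. A different decision rule, such as a threshold that admits several labels, would slot into `classify` without touching the fuzzy set construction.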
11. A method for classifying a text object, comprising:
extracting a set of granule features from the text object;
each granule feature being represented by a plurality of fuzzy sets and associated labels;
constructing a document granule feature fuzzy set using a plurality of ones of the granule features extracted from the text object;
each of the ones of the granule features extracted from the text object having a degree of membership in a corresponding granule feature fuzzy set of the document granule feature fuzzy set and a plurality of class granule feature fuzzy sets of a knowledge base;
computing a degree of match between each of the plurality of class granule feature fuzzy sets and the document granule feature fuzzy set to provide a degree of match for each of the ones of the granule features;
aggregating each degree of match of the ones of the granule features to define an overall degree of match for each feature; and
using the overall degree of match for each feature to assign the text object a class label that satisfies a selected decision making rule.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20.
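The per-granule matching and aggregation recited in claim 11 can be sketched as follows. The claim leaves the aggregation operator open; the weighted average used here, the max-min match, the granule feature names ("length", "tone"), and all membership values are assumptions made only for illustration.

```python
def max_min_match(doc_set, class_set):
    """Assumed match measure for one granule feature (max-min)."""
    shared = doc_set.keys() & class_set.keys()
    return max((min(doc_set[k], class_set[k]) for k in shared), default=0.0)


def overall_match(doc_granules, class_granules, weights):
    """Aggregate per-granule matches with a weighted average (an assumption)."""
    total_weight = sum(weights[name] for name in class_granules)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        weights[name] * max_min_match(doc_granules.get(name, {}), class_set)
        for name, class_set in class_granules.items()
    )
    return weighted / total_weight


def classify_granules(doc_granules, knowledge_base, weights):
    """Assumed decision rule: the class with the highest overall match."""
    scores = {label: overall_match(doc_granules, class_granules, weights)
              for label, class_granules in knowledge_base.items()}
    return max(scores, key=scores.get), scores


# Each granule feature is a fuzzy set over labelled granules (hypothetical data).
doc_granules = {
    "length": {"short": 0.2, "long": 0.9},
    "tone": {"formal": 0.7, "casual": 0.3},
}
knowledge_base = {
    "report": {"length": {"long": 1.0}, "tone": {"formal": 1.0}},
    "memo": {"length": {"short": 1.0}, "tone": {"casual": 1.0, "formal": 0.5}},
}
weights = {"length": 1.0, "tone": 1.0}

label, scores = classify_granules(doc_granules, knowledge_base, weights)
```

With equal weights the "report" class scores the average of 0.9 and 0.7, while "memo" scores the average of 0.2 and 0.5, so "report" is assigned. Unequal weights would let one granule feature dominate the overall match.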
21. A text categorizer for classifying a text object, comprising:
a knowledge base for storing categories represented by class fuzzy sets and associated class labels;
a pre-processing module for representing a plurality of extracted features from the text object as a document class fuzzy set; and
an approximate reasoning module for using a measured degree of match between the class fuzzy sets in the knowledge base and the document class fuzzy set to assign to the text object the associated class labels of those categories that satisfy a selected decision making rule;
wherein the pre-processing module further comprises a fuzzy set generator for:
calculating a frequency of occurrence for the plurality of features extracted from the text object;
normalizing the frequency of occurrence of each feature of the plurality of features extracted from the text object; and
transforming the normalized frequency of occurrence of each of the plurality of features extracted from the text object to define the document class fuzzy set.
Dependent claims: 22, 23, 24.
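One way to picture how the three components of claim 21 fit together is the sketch below. The names `KnowledgeBase`, `preprocess`, `ApproximateReasoner`, and `threshold_rule` are hypothetical, the peak-rescaling transform is a simple stand-in rather than the transformation the specification describes, and the threshold rule is just one possible decision making rule that can return several labels.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Stores each category as a class label mapped to a class fuzzy set."""
    classes: dict = field(default_factory=dict)


def preprocess(features):
    """Stand-in fuzzy set generator: count, normalize, rescale so the peak is 1."""
    counts = Counter(features)
    total = sum(counts.values())
    if total == 0:
        return {}
    freqs = {f: c / total for f, c in counts.items()}
    peak = max(freqs.values())
    return {f: p / peak for f, p in freqs.items()}


def threshold_rule(scores, threshold=0.5):
    """Assumed decision rule: keep every label whose match meets the threshold."""
    return sorted(label for label, s in scores.items() if s >= threshold)


class ApproximateReasoner:
    """Matches a document fuzzy set against every class in the knowledge base."""

    def __init__(self, knowledge_base, decision_rule=threshold_rule):
        self.knowledge_base = knowledge_base
        self.decision_rule = decision_rule

    def assign_labels(self, doc_set):
        scores = {}
        for label, class_set in self.knowledge_base.classes.items():
            shared = doc_set.keys() & class_set.keys()
            # Max-min match, again an assumption rather than the claimed measure.
            scores[label] = max(
                (min(doc_set[f], class_set[f]) for f in shared), default=0.0
            )
        return self.decision_rule(scores)


kb = KnowledgeBase({
    "sports": {"goal": 0.9, "team": 0.6},
    "finance": {"bank": 1.0},
    "travel": {"team": 0.3},
})
reasoner = ApproximateReasoner(kb)
labels = reasoner.assign_labels(preprocess(["goal", "goal", "team"]))
```

Passing the decision rule as a parameter mirrors the claim's wording of a "selected" rule: swapping in a top-one rule or a different threshold changes which labels are assigned without changing the knowledge base or the pre-processing.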
25. A text categorizer for classifying a text object, comprising:
a feature extractor for extracting a set of granule features from the text object;
each granule feature being represented by a plurality of fuzzy sets and associated labels;
a fuzzy set generator for constructing a document granule feature fuzzy set using a plurality of ones of the granule features extracted from the text object;
each of the ones of the granule features extracted from the text object having a degree of membership in a corresponding granule feature fuzzy set of the document granule feature fuzzy set and a plurality of class granule feature fuzzy sets of a knowledge base; and
an approximate reasoning module for:
computing a degree of match between each of the plurality of class granule feature fuzzy sets and the document granule feature fuzzy set to provide a degree of match for each of the ones of the granule features;
aggregating each degree of match of the ones of the granule features to define an overall degree of match for each feature; and
using the overall degree of match for each feature to assign the text object a class label that satisfies a selected decision making rule.
Dependent claim: 26.
Specification