Method and system for classifying electronic text messages and spam messages
First Claim
1. A method of identifying electronic text messages as spam, the method comprising:
(a) creating a hierarchic list of spam message categories and sub-categories, wherein the hierarchic list defines properties of key terms within the spam message categories and sub-categories;
(b) composing a database of the key terms and a database of sample messages in a human language for each of the spam message categories and message templates for sub-categories, wherein the key terms are identified using human language-specific variants of a combination of separate words in a particular human language;
(c) defining at least one spam message category from the hierarchic list of the spam message categories for which (i) a weight factor of a morphologically transformed text message exceeds a first pre-determined threshold or (ii) a similarity score of the text message exceeds a second pre-determined threshold, wherein the weight factor value and the similarity score value are compared against the respective threshold values using a precise matching comparison; and
(d) associating with the at least one spam message category the text message having (i) the weight factor value exceeding the first threshold or (ii) the similarity score value exceeding the second threshold, wherein the properties of the key terms within the spam message categories are any of:
a frequency of occurrence of the key term within the message;
a location of the key term within the message; and
a number of separate words in the key term.
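The decision in steps (c) and (d) can be illustrated with a minimal Python sketch. It assumes the weight factor and similarity score have already been computed for each category; the names `CategoryScores` and `matching_categories`, the category labels, and the numeric values are hypothetical and do not come from the patent. Reading "exceeds" as a strict greater-than comparison is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryScores:
    """Scores of one message against one category (hypothetical structure)."""
    category: str            # path in the hierarchic list, e.g. "finance/loans"
    weight_factor: float     # computed from the morphologically transformed text
    similarity_score: float  # computed against the category's sample messages


def matching_categories(scores, weight_threshold, similarity_threshold):
    """Return the categories for which (i) the weight factor exceeds the first
    threshold or (ii) the similarity score exceeds the second threshold."""
    return [
        s.category
        for s in scores
        if s.weight_factor > weight_threshold
        or s.similarity_score > similarity_threshold
    ]


# Illustrative values only.
scores = [
    CategoryScores("finance/loans", weight_factor=7.5, similarity_score=0.20),
    CategoryScores("pharma/pills", weight_factor=1.0, similarity_score=0.91),
    CategoryScores("travel/offers", weight_factor=0.5, similarity_score=0.10),
]
result = matching_categories(scores, weight_threshold=5.0, similarity_threshold=0.8)
print(result)
```

Either condition alone is enough to associate the message with a category, so the first entry qualifies on its weight factor and the second on its similarity score.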
Abstract
Techniques for classifying electronic text messages include creating a hierarchical list of message categories, composing databases of key terms and sample phrases for each of such categories, and, based on the number and features of the key terms detected in an analyzed text message, determining whether the text message is associated with at least one message category of interest. Variants of the key terms can be produced using fuzzy text object generation algorithms. Weight factors for the key terms, and similarity scores of a text message compared to previously identified sample messages for a particular message category, are calculated based on properties of the key terms detected in the text message, such as frequency of use, location or appearance in the text message, and the number of words in the respective key terms.
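As a rough illustration of how a weight factor could depend on the key-term properties named above (frequency, location, and number of words), the following Python sketch multiplies the three together. The formula, the `early_fraction` cutoff, the doubling bonus, and the function names are all assumptions made for illustration; the abstract does not state how the properties are combined. Morphological transformation and fuzzy variant generation are not modelled here.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def key_term_weight(message, key_term, early_fraction=0.25):
    """Hypothetical weight of one key term in a message.

    Combines frequency of occurrence, location (a bonus when the first
    occurrence falls in the leading part of the message), and the number
    of separate words in the key term.
    """
    words = tokenize(message)
    term = tokenize(key_term)
    n = len(term)
    if n == 0 or len(words) < n:
        return 0.0
    # Start indices at which the whole key term occurs as a word sequence.
    positions = [i for i in range(len(words) - n + 1) if words[i:i + n] == term]
    if not positions:
        return 0.0
    frequency = len(positions)
    location_bonus = 2.0 if positions[0] < early_fraction * len(words) else 1.0
    return frequency * location_bonus * n


def message_weight(message, key_terms):
    """Sum the per-term weights over a category's key terms."""
    return sum(key_term_weight(message, term) for term in key_terms)


message = "Cheap loans today! Apply now for cheap loans with no credit check."
total = message_weight(message, ["cheap loans", "credit check", "viagra"])
print(total)
```

In this example "cheap loans" occurs twice, starts the message, and has two words, so it dominates the total; "credit check" occurs once near the end, and "viagra" contributes nothing.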
20 Claims
1. A method of identifying electronic text messages as spam, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 18, 19, 20.
13. A system for classifying electronic text messages, the system comprising:
a processor; and
a non-transitory memory device storing instructions of an operating system and an application program that, when executed by the processor, is adapted to provide:
(a) a hierarchic list of spam message categories and sub-categories, wherein the hierarchic list defines properties of key terms within the spam message categories;
(b) a database of the key terms and a database of sample messages in a human language for each of the spam message categories and message templates for sub-categories, wherein the key terms are identified using human language-specific variants of a combination of separate words in a particular human language;
(c) wherein the system defines at least one spam message category from the hierarchic list of the spam message categories for which (i) a weight factor of a morphologically transformed text message exceeds a first pre-determined threshold or (ii) a similarity score of the text message exceeds a second pre-determined threshold, wherein the weight factor value and the similarity score value are compared against the respective threshold values using precise matching comparison; and
(d) the at least one spam message category is associated with the text message having (i) the weight factor value exceeding the first threshold or (ii) the similarity score value exceeding the second threshold, wherein the properties of the key terms within the spam message categories are any of:
a frequency of occurrence of the key term within the message;
a location of the key term within the message; and
a number of separate words in the key term.
Dependent claims: 14, 15, 16, 17.
Specification