SYSTEM AND METHOD FOR EMAIL CLASSIFICATION
First Claim
1. A system for providing simplified end-to-end security for computing devices in standalone, LAN, WAN or Internet architectures, said system comprising:
an email processing module, comprising computer-executable code stored in non-volatile memory,
a machine learning module, comprising computer-executable code stored in non-volatile memory,
a processor, and
a communications means,
wherein said email processing module, said machine learning module, said processor, and said communications means are operably connected and are configured to:
receive an email;
remove hypertext markup language (HTML) from said email;
remove white space, new line, carriage returns (CR) and tabs from said email;
convert all text contained in said email to lowercase characters;
compare text to relationship terms stored in a relationship term database;
tag text matching one or more of said relationship terms;
tag text comprising dates, numbers, indicators of time, measurement units, and currency symbols;
tag text comprising parts of speech;
compare text to lemmatize terms stored in a lemmatize dictionary database;
tag text matching one or more lemmatize terms;
remove non-essential punctuation from said text;
calculate and weigh term frequency in said text using term frequency inverse document frequency;
eliminate one or more terms with the lowest calculated weight; and
classify said email based on remaining tags and terms.
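The cleaning and tagging steps recited above can be illustrated with a minimal sketch. It is not the patented implementation. The contents of `RELATIONSHIP_TERMS` and `LEMMA_DICTIONARY`, the tag labels `REL`, `NUM` and `LEMMA`, and the function names are hypothetical stand-ins for the claimed databases and tags. Part-of-speech tagging is omitted because it would require a trained tagger, and punctuation is stripped per token for simplicity rather than as a separate final pass.

```python
import html
import re

# Hypothetical stand-ins for the claimed relationship term database and
# lemmatize dictionary database; the claims do not give their contents.
RELATIONSHIP_TERMS = {"invoice", "meeting", "contract"}
LEMMA_DICTIONARY = {"invoices": "invoice", "meetings": "meeting", "sent": "send"}

TAG_RE = re.compile(r"<[^>]+>")
# Rough pattern for numbers, dates, times, and currency amounts (an assumption).
NUMERIC_RE = re.compile(r"^[$€£]?\d[\d,./:-]*$")


def clean(raw: str) -> str:
    """Strip HTML, collapse white space/new lines/CRs/tabs, and lowercase."""
    text = TAG_RE.sub(" ", raw)        # remove HTML tags
    text = html.unescape(text)         # decode entities such as &amp;
    text = re.sub(r"\s+", " ", text)   # collapse all white space runs
    return text.strip().lower()


def tag_tokens(text: str) -> list[tuple[str, list[str]]]:
    """Return (term, tags) pairs for each token of already-cleaned text."""
    tagged = []
    for raw_token in text.split():
        # Drop non-essential punctuation at the token edges.
        token = raw_token.strip(".,;:!?\"'()")
        if not token:
            continue
        tags = []
        if token in RELATIONSHIP_TERMS:
            tags.append("REL")
        if NUMERIC_RE.match(token):
            tags.append("NUM")
        if token in LEMMA_DICTIONARY:
            tags.append("LEMMA")
            token = LEMMA_DICTIONARY[token]  # replace with dictionary form
        tagged.append((token, tags))
    return tagged
```

For example, `tag_tokens(clean("<p>Invoices sent</p>"))` yields the dictionary forms `invoice` and `send`, each carrying the `LEMMA` tag.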
Abstract
The present invention generally relates to an improved system and method for providing email classification. Specifically, the present invention relates to an email classification system and method for analyzing the signature of an email for proper classification.
18 Claims
11. A method for classifying emails, said method comprising the steps of:
receiving an email at an email processing module, comprising computer-executable code stored in non-volatile memory;
removing hypertext markup language (HTML) from said email;
removing multiple white space, and tabs from said email;
converting all text contained in said email to lowercase characters;
comparing text to relationship terms stored in a relationship term database;
tagging text matching one or more of said relationship terms;
tagging text comprising dates, numbers, indicators of time, measurement units, and currency symbols;
tagging text comprising parts of speech;
comparing text to lemmatize terms stored in a lemmatize dictionary database;
tagging text matching one or more lemmatize terms;
removing non-essential punctuation from said text;
calculating and weighing term frequency in said text using term frequency inverse document frequency;
eliminating one or more terms with the lowest calculated weight; and
classifying said email based on remaining tags and terms.
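The term-weighting and elimination steps of the method can be sketched as follows. The claims name term frequency inverse document frequency but do not specify a variant, so this sketch assumes the plain unsmoothed form, tf = count / document length and idf = ln(N / document frequency). The function names `tfidf_weights` and `prune_lowest` and the `drop` parameter are hypothetical. The final classification step is not shown, since the claims do not identify the classifier.

```python
import math
from collections import Counter


def tfidf_weights(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute unsmoothed TF-IDF weights for each tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def prune_lowest(weights: dict[str, float], drop: int = 1) -> dict[str, float]:
    """Eliminate the `drop` terms with the lowest calculated weight."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])
    return dict(ranked[drop:])


# Three small, already-tokenized emails (illustrative data only).
emails = [
    ["invoice", "due", "the"],
    ["meeting", "the"],
    ["invoice", "paid", "the"],
]
all_weights = tfidf_weights(emails)
kept = prune_lowest(all_weights[0], drop=1)
```

In this illustrative corpus the term `the` occurs in every email, so its weight is zero and it is the first term eliminated from the first email.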
Specification