Phonetic Filtering of Undesired Email Messages

US 20100077051A1
Filed: 11/30/2009
Published: 03/25/2010
Est. Priority Date: 10/14/2003
Status: Active Grant

Abstract

Several embodiments, among others, provided in the present disclosure teach a filtering of email messages for spam based on phonetic equivalents of words found in the email message. In some embodiments, an email message having a word is received, and a phonetic equivalent of the word is generated. Thereafter, the phonetic equivalent of the word is tokenized to generate a token representative of the phonetic equivalent. The generated token is then used to determine a spam probability.
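
As a rough illustration of the flow described in the abstract, the sketch below normalizes an obfuscated word and tokenizes the result. It rests on assumptions: the names `phonetic_equivalent` and `tokenize` are hypothetical, and the only normalization shown is removal of non-alphabetic characters (the step recited in claim 3) followed by lowercasing. The disclosure may cover richer phonetic matching that is not reproduced here.

```python
import re


def phonetic_equivalent(word: str) -> str:
    """Assumed normalization: drop non-alphabetic characters, then lowercase."""
    return re.sub(r"[^A-Za-z]", "", word).lower()


def tokenize(message: str) -> list[str]:
    """Split on whitespace, normalize each word, and skip words left empty."""
    tokens = []
    for word in message.split():
        normalized = phonetic_equivalent(word)
        if normalized:
            tokens.append(normalized)
    return tokens


print(tokenize("Buy V.I.A.G.R.A n-o-w !!!"))  # ['buy', 'viagra', 'now']
```

Under this assumed normalization, an obfuscated spelling such as "V.I.A.G.R.A" collapses to the same token as the plain word, so both spellings share one spam probability.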

20 Claims

1. A method comprising:
- training an email system for determining spam, where training includes at least the following:
  
  tokenizing at least a portion of a first email message to create a token;
  
  determining, from the token, a spam probability for the first email message;
  
  in response to a determination that a spam probability from the token indicates that the first email message is likely spam, determining whether the generated token is present in a database of tokens, in response to a determination the generated token is not present in the database of tokens, assigning a probability value for the generated token as spam; and
  
  in response to a determination that the spam probability from the generated token indicates that the first email message is not likely spam, determining whether the generated token is present in a database of tokens; and
  
  filtering a second email message according to the training.
- - 2. The method of claim 1, wherein tokenizing at least a portion of a first email message includes tokenizing at least one of the following:
    - at least one word in the first email message, at least one email address associated with the first email message, at least one domain name associated with the first email message, and at least one attachment of the first email message.
  - 3. The method of claim 1, further comprising generating a phonetic equivalent of a word in the first email message, wherein generating a phonetic equivalent of a word comprises:
    - identifying a string of characters, the string of characters including a non-alphabetic character; and
      
      removing the non-alphabetic character from the string of characters.
  - 4. The method of claim 3, wherein removing the non-alphabetic character comprises:
    - locating a non-alphabetic character within the string of characters.
  - 5. The method of claim 1, wherein determining the spam probability comprises:
    - assigning a spam probability value to the token; and
      
      generating a Bayesian probability value using the spam probability value assigned to the token.
  - 6. The method of claim 5, wherein determining the spam probability further comprises:
    - comparing the generated Bayesian probability value with a predefined threshold value.
  - 7. The method of claim 6, wherein determining the spam probability further comprises:
    - categorizing the first email message as spam in response to the Bayesian probability value being greater than the predefined threshold value.
  - 8. The method of claim 6, wherein determining the spam probability further comprises:
    - categorizing the first email message as non-spam in response to the Bayesian probability value being not greater than the predefined threshold value.
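
The training step of claim 1 and the threshold test of claims 5 to 8 can be pictured with the following minimal sketch. Everything numeric here is an assumption made for illustration: the claims do not state a combination formula, a threshold, or the value assigned to a new token. The sketch uses a naive-Bayes style product, and the names `bayesian_probability`, `is_spam`, and `train` are hypothetical.

```python
from math import prod

SPAM_THRESHOLD = 0.9        # assumed "predefined threshold value"
UNKNOWN_TOKEN_PROB = 0.4    # assumed probability for tokens not in the database
NEW_SPAM_TOKEN_PROB = 0.99  # assumed value assigned to new tokens seen in spam


def bayesian_probability(token_probs):
    """Combine per-token spam probabilities (assumed formula).

    Probabilities are expected to lie strictly between 0 and 1.
    """
    p_spam = prod(token_probs)
    p_ham = prod(1.0 - p for p in token_probs)
    return p_spam / (p_spam + p_ham)


def is_spam(tokens, database):
    """Categorize as spam when the combined value exceeds the threshold."""
    probs = [database.get(t, UNKNOWN_TOKEN_PROB) for t in tokens]
    return bayesian_probability(probs) > SPAM_THRESHOLD


def train(tokens, database):
    """If the message is likely spam, store tokens missing from the database."""
    if is_spam(tokens, database):
        for token in tokens:
            if token not in database:
                database[token] = NEW_SPAM_TOKEN_PROB


database = {"viagra": 0.99, "free": 0.95, "meeting": 0.05}
second_message = ["pillz", "now"]

print(is_spam(second_message, database))   # False: both tokens are unknown
train(["viagra", "free", "pillz"], database)
print(is_spam(second_message, database))   # True: "pillz" was learned as spam
```

The second message is filtered differently before and after training, which is the effect the final step of claim 1 relies on.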

9. A system comprising:
- a memory that stores:
  
  first tokenize logic configured to tokenize a phonetic equivalent of a word in a received email message;
  
  second tokenize logic configured to tokenize an attachment of the received email message;
  
  spam-determination logic configured to determine a spam probability value from the generated tokens; and
  
  sorting logic configured to sort generated tokens in accordance with the corresponding determined spam probability value.
- - 10. The system of claim 9, the memory further storing:
    - string-identification logic configured to identify a string of characters, the string of characters including a non-alphabetic character; and
      
      character-removal logic configured to remove the non-alphabetic character from the string of characters.
  - 11. The system of claim 10, the memory further storing:
    - spam-probability logic configured to assign a spam probability value to the token; and
      
      Bayesian logic configured to generate a Bayesian probability value using the spam probability value assigned to the token.
  - 12. The system of claim 11, the memory further storing:
    - compare logic configured to compare the generated Bayesian probability value with a predefined threshold value.
  - 13. The system of claim 12, the memory further storing:
    - spam-categorization logic configured to categorize the received email message as spam in response to the Bayesian probability value being greater than the predefined threshold value.
  - 14. The system of claim 12, the memory further storing:
    - spam-categorization logic configured to categorize the received email message as non-spam in response to the Bayesian probability value being not greater than the predefined threshold value.
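
The sorting logic of claim 9 could look roughly like the sketch below. The claim only says tokens are sorted in accordance with their spam probability values, so the descending order, the dictionary layout, and the name `sort_tokens_by_spam_probability` are all assumptions made for illustration.

```python
def sort_tokens_by_spam_probability(token_probs: dict[str, float]) -> list[tuple[str, float]]:
    """Return (token, probability) pairs, highest spam probability first (assumed order)."""
    return sorted(token_probs.items(), key=lambda item: item[1], reverse=True)


ranked = sort_tokens_by_spam_probability(
    {"meeting": 0.05, "viagra": 0.99, "free": 0.80}
)
print(ranked)  # [('viagra', 0.99), ('free', 0.8), ('meeting', 0.05)]
```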

15. A computer-readable medium that includes a program that, when executed by a computer, causes the computer to perform at least the following:
- generate a phonetic equivalent of a word from a received email message;
  
  tokenize the phonetic equivalent of the word to create a token;
  
  determine a spam probability from the token; and
  
  sort the generated token in accordance with the corresponding determined spam probability value.
- - 16. The computer-readable medium of claim 15, the program further causing the computer to perform at least the following:
    - identify a string of characters, the string of characters including a non-alphabetic character; and
      
      remove the non-alphabetic character from the string of characters.
  - 17. The computer-readable medium of claim 15, the program further causing the computer to perform at least the following:
    - assign a spam probability value to the token; and
      
      generate a Bayesian probability value using the spam probability value assigned to the token.
  - 18. The computer-readable medium of claim 17, the program further causing the computer to perform at least the following:
    - compare the generated Bayesian probability value with a predefined threshold value.
  - 19. The computer-readable medium of claim 18, the program further causing the computer to perform at least the following:
    - categorize the received email message as spam in response to the Bayesian probability value being greater than the predefined threshold value.
  - 20. The computer-readable medium of claim 18, the program further causing the computer to perform at least the following:
    - categorize the received email message as non-spam in response to the Bayesian probability value being not greater than the predefined threshold value.

Current Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Malik, Dale W.; Daniell, W. Todd
Granted Patent: US 7,949,718 B2
US Class Current: 709/206
CPC Class Codes:
G06Q 10/107 Computer-aided management o...
H04L 51/212 using filtering or selectiv...
