Spam filtering based on statistics and token frequency modeling

  • US 8,364,766 B2
  • Filed: 12/04/2008
  • Issued: 01/29/2013
  • Est. Priority Date: 12/04/2008
  • Status: Active Grant
First Claim

1. A network device, comprising:

    a transceiver device that is operative to send and receive data over a network; and

    a processor device that is operative to perform actions, comprising:

    receiving a message;

    determining a plurality of tokens from the received message based in part on a text body of the received message;

    analyzing the plurality of tokens to assign probability values that the received message is classifiable as one of a plurality of message classes, including a spam message and a non-spam message;

    selecting a message class for the received message based on a comparison of the assigned probability values, wherein a probability value is associated with each of the plurality of message classes, wherein the assigned probability values represent a plurality of complement probability values for each of the plurality of message classes, and wherein selecting the message class further comprises selecting a message class having a lowest complement probability value;

    providing the message class selected, a list of tokens with associated token frequencies, and the plurality of tokens to a token frequency component that is configured for the selected message class, wherein the list of tokens are determined for the message class selected; and

    using the token frequency component to determine a number of tokens in the plurality of tokens that result in an associated token frequency for each matching token in the list of tokens exceeding a token frequency threshold, wherein each number of tokens resulting in the associated token frequency and is selectively decremented over time as a period of time expires for each corresponding token; and

    based on a comparison between a number of matching tokens in the received message for the selected message class to a matched token threshold provided by the frequency threshold component, identifying whether the received message is a spam message or a non-spam message.
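The actions recited in claim 1 can be loosely sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the regex tokenizer, the per-token complement probability tables, the neutral value of 0.5 for unseen tokens, the decay policy (one decrement per expired period), and every name used (`tokenize`, `select_class`, `TokenFrequencyComponent`, `classify`) are hypothetical. The claim also does not say how the final comparison maps to a verdict, so the sketch assumes a message is flagged as spam only when the selected class is `"spam"` and the matched-token count exceeds the matched token threshold.

```python
import re
from dataclasses import dataclass, field


def tokenize(body):
    """Split a message body into lowercase word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", body.lower())


def select_class(tokens, complement_probs):
    """Return the class with the lowest complement score, plus all scores.

    complement_probs[cls][token] is an assumed probability that the token
    appears in messages that do NOT belong to class cls.
    """
    scores = {}
    for cls, table in complement_probs.items():
        score = 1.0
        for tok in tokens:
            score *= table.get(tok, 0.5)  # unseen tokens are neutral (assumption)
        scores[cls] = score
    return min(scores, key=scores.get), scores


@dataclass
class TokenFrequencyComponent:
    """Per-class token counts that decay as time periods expire."""
    frequency_threshold: int
    matched_token_threshold: int
    period: float  # seconds per decay period (assumed unit)
    counts: dict = field(default_factory=dict)
    last_seen: dict = field(default_factory=dict)

    def observe(self, tokens, now):
        """Update counts; return how many distinct tokens exceed the frequency threshold."""
        matched = 0
        for tok in set(tokens):
            # Decrement once for each full period since the token was last seen.
            expired = int((now - self.last_seen.get(tok, now)) // self.period)
            count = max(0, self.counts.get(tok, 0) - expired) + 1
            self.counts[tok] = count
            self.last_seen[tok] = now
            if count > self.frequency_threshold:
                matched += 1
        return matched


def classify(body, complement_probs, components, now):
    """Run the sketched pipeline; return (selected class, matched count, is_spam)."""
    tokens = tokenize(body)
    cls, _ = select_class(tokens, complement_probs)
    component = components[cls]
    matched = component.observe(tokens, now)
    # Assumed verdict rule: spam class selected and enough high-frequency tokens.
    is_spam = cls == "spam" and matched > component.matched_token_threshold
    return cls, matched, is_spam
```

In this sketch, a repeated message only trips the matched token threshold while its tokens are still "hot"; once enough periods expire, the counts decay and the same text stops matching.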
