Spam filtering using feature relevance assignment in neural networks

US 8,131,655 B1
Filed: 05/30/2008
Issued: 03/06/2012
Est. Priority Date: 05/30/2008
Status: Active Grant



Abstract

In some embodiments, a spam filtering method includes computing a pattern relevance for each of a set of message feature patterns, and using a neural network filter to classify incoming messages as spam or ham according to the pattern relevancies. Each message feature pattern is characterized by the simultaneous presence within a message of a specific set of message features (e.g., the presence of certain keywords within the message body, various message header heuristics, various message layout features, etc.). Each message feature may be spam- or ham-identifying, and may receive a tunable feature relevance weight from an external source (e.g., a data file and/or a human operator). The external feature relevance weights modulate the set of neuronal weights calculated through a training process of the neural network.
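
As a rough illustration of the abstract, the sketch below builds a binary input vector marking which message features are present, then derives a pattern relevance from externally supplied feature relevance weights. The feature names, the weight values, and the averaging rule are all assumptions made for illustration; the abstract does not state the formula.

```python
from typing import Dict, List, Set

# Hypothetical external feature relevance weights, e.g. as loaded from a data file.
FEATURE_RELEVANCE: Dict[str, float] = {
    "body_keyword_free_offer": 0.9,
    "header_forged_sender": 0.7,
    "layout_single_image": 0.4,
    "sender_in_address_book": 0.8,
}
FEATURES: List[str] = list(FEATURE_RELEVANCE)


def input_vector(present: Set[str]) -> List[int]:
    """Binary vector: 1 where the named feature occurs in the message."""
    return [1 if name in present else 0 for name in FEATURES]


def pattern_relevance(pattern: List[int]) -> float:
    """Assumed rule (not from the source): mean external weight of the
    features that make up the pattern."""
    weights = [FEATURE_RELEVANCE[n] for n, bit in zip(FEATURES, pattern) if bit]
    return sum(weights) / len(weights) if weights else 0.0


# A feature pattern is the simultaneous presence of several features.
pattern = input_vector({"body_keyword_free_offer", "layout_single_image"})
relevance = pattern_relevance(pattern)
```

Because the weights live outside the network, an operator could retune them without retraining, which is the property the abstract emphasizes.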


16 Claims

1. A spam filtering method comprising employing a computer system to perform the steps of:
- computing a set of pattern relevancies for a set of feature patterns, wherein at least one pattern relevance of the set of pattern relevancies is computed according to a set of feature relevance weights determined through a process external to neuronal training; and
  
  classifying a target message as spam or ham according to a result of a processing of the target message by a neural network filter according to the set of pattern relevancies by:
  
  assigning a pattern relevance of the set of pattern relevancies to each neuron of a subset of neurons of the neural network filter;
  
  computing a target input vector characterizing the presence of a set of spam/ham identifying message features within the target message;
  
  selecting an active neuron of the subset of neurons according to a scalar product between the target input vector and a first set of neuronal weights of the neural network filter;
  
  computing a recognition score according to a scalar product between the target input vector and a second set of neuronal weights of the neural network filter, and according to the pattern relevance corresponding to the active neuron; and
  
  comparing the recognition score to a predefined vigilance threshold.
  - 2. The method of claim 1, comprising computing the at least one pattern relevance according to a training input vector characterizing the presence of the set of spam/ham identifying message features within a training message, and wherein the at least one pattern relevance is substantially equal to:
  - 3. The method of claim 1, comprising computing the at least one pattern relevance according to a training input vector characterizing the presence of the set of spam/ham identifying message features within a training message, and wherein the at least one pattern relevance is substantially equal to:
  - 4. The method of claim 1, wherein computing the at least one pattern relevance comprises:
    - assigning a training pattern relevance to each neuron of the subset of neurons of the neural network filter;
      
      for each training message of a message corpus, computing a training input vector characterizing a presence of a set of spam/ham identifying message features within said each training message;
      
      selecting a training active neuron of the subset of neurons according to a scalar product between the training input vector and a third set of neuronal weights of the neural network filter;
      
      computing a training recognition score according to a scalar product between the target input vector and a fourth set of neuronal weights of the neural network filter, and according to the training pattern relevance corresponding to the training active neuron;
      
      performing a test classification of said each training message as spam or ham;
      
      modifying the training pattern relevance of the training active neuron according to the training recognition score and a result of the test classification; and
      
      setting each pattern relevance of the set of pattern relevancies substantially equal to a corresponding training pattern relevance.
  - 5. The method of claim 4, wherein the training recognition score is substantially equal to:
    - $\Sigma_1 = \alpha_1 R_{NN} + \alpha_2 R_{act}$, wherein $R_{NN}$ is substantially equal to
  - 6. The method of claim 4, wherein the training recognition score is substantially equal to:
    - $\Sigma_2 = \alpha_1 R_{NN} + \alpha_2 R_{act} + \alpha_3 R_F$, wherein $R_{NN}$ is substantially equal to
  - 7. The method of claim 1, wherein the recognition score is substantially equal to:
    - $\Sigma_1 = \alpha_1 R_{NN} + \alpha_2 R_{act}$, wherein $R_{NN}$ is substantially equal to
  - 8. The method of claim 1, wherein the recognition score is substantially equal to:
    - $\Sigma_2 = \alpha_1 R_{NN} + \alpha_2 R_{act} + \alpha_3 R_F$, wherein $R_{NN}$ is substantially equal to
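
The classification steps of claim 1, combined with the two-term score of claim 7, can be sketched as follows. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, R_NN is assumed to be the scalar product with the second weight set normalized by the number of features present, and R_act is assumed to be the pattern relevance of the active neuron. The claim text as reproduced here does not define either term in full, and it does not say how the threshold comparison maps to a spam or ham label, so the sketch only reports whether the score passes.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Scalar product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def classify(
    x: Sequence[int],
    first_weights: Sequence[Sequence[float]],
    second_weights: Sequence[Sequence[float]],
    relevancies: Sequence[float],
    vigilance: float = 0.5,
    alpha1: float = 0.5,
    alpha2: float = 0.5,
) -> Tuple[int, float, bool]:
    # Select the active neuron: largest scalar product with the first weight set.
    activations: List[float] = [dot(x, w) for w in first_weights]
    active = max(range(len(activations)), key=activations.__getitem__)

    # Assumed R_NN: match against the second weight set, normalized by
    # the number of features present in the message.
    n_on = sum(x)
    r_nn = dot(x, second_weights[active]) / n_on if n_on else 0.0

    # Assumed R_act: the pattern relevance assigned to the active neuron.
    r_act = relevancies[active]

    # Two-term recognition score, then the vigilance comparison.
    score = alpha1 * r_nn + alpha2 * r_act
    return active, score, score >= vigilance


active, score, passes = classify(
    x=[1, 0, 1, 0],
    first_weights=[[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]],
    second_weights=[[1, 0, 1, 0], [0, 1, 0, 1]],
    relevancies=[0.65, 0.8],
)
```

Raising `alpha2` relative to `alpha1` in this sketch gives the externally tuned relevance more influence over the outcome than the trained weights.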

9. A non-transitory computer-readable medium storing instructions, which, when executed by a computer system, cause the computer system to form:
- a training classifier configured to compute a set of pattern relevancies for a set of feature patterns, wherein at least one pattern relevance of the set of pattern relevancies is computed according to a set of feature relevance weights determined through a process external to neuronal training; and
  
  a neural network filter configured to classify a target message as spam or ham according to the set of pattern relevancies by:
  
  assigning a pattern relevance of the set of pattern relevancies to each neuron of a subset of neurons of the neural network filter;
  
  computing a target input vector characterizing the presence of a set of spam/ham identifying message features within the target message;
  
  selecting an active neuron of the subset of neurons according to a scalar product between the target input vector and a first set of neuronal weights of the neural network filter;
  
  computing a recognition score according to a scalar product between the target input vector and a second set of neuronal weights of the neural network filter, and according to the pattern relevance corresponding to the active neuron; and
  
  comparing the recognition score to a predefined vigilance threshold.
  - 10. The system of claim 9, wherein the training classifier is configured to compute the at least one pattern relevance according to a training input vector characterizing the presence of the set of spam/ham identifying message features within a training message, and wherein the at least one pattern relevance is substantially equal to:
  - 11. The system of claim 9, wherein the training classifier is configured to compute the at least one pattern relevance according to a training input vector characterizing the presence of the set of spam/ham identifying message features within a training message, and wherein the at least one pattern relevance is substantially equal to:
  - 12. The system of claim 9, wherein the training classifier is configured to:
    - assign a training pattern relevance to each neuron of a subset of training neurons of the training classifier;
      
      for each training message of a message corpus, compute a training input vector characterizing the presence of a set of spam/ham identifying message features within the each training message;
      
      select an active training neuron of the subset of training neurons according to a scalar product between the training input vector and a first set of neuronal weights of the training classifier;
      
      compute a training recognition score according to a scalar product between the target input vector and a second set of neuronal weights of the training classifier, and according to the initial training pattern relevance corresponding to the active training neuron;
      
      perform a test classification of the each training message as spam or ham;
      
      modify the training pattern relevance of the active training neuron according to the training recognition score and a result of the test classification; and
      
      set the pattern relevancies substantially equal to the training pattern relevancies, respectively.
  - 13. The system of claim 12, wherein the training classifier is configured to compute the recognition score according to:
    - $\Sigma_1 = \alpha_1 R_{NN} + \alpha_2 R_{act}$, wherein $\Sigma_1$ is the training recognition score, wherein $R_{NN}$ is substantially equal to
  - 14. The system of claim 12, wherein the training classifier is configured to compute the training recognition score according to:
    - $\Sigma_2 = \alpha_1 R_{NN} + \alpha_2 R_{act} + \alpha_3 R_F$, wherein $\Sigma_2$ is the training recognition score, wherein $R_{NN}$ is substantially equal to
  - 15. The system of claim 9, wherein the neural network filter is configured to compute the recognition score according to:
    - $\Sigma_1 = \alpha_1 R_{NN} + \alpha_2 R_{act}$, wherein $\Sigma_1$ is the recognition score, wherein $R_{NN}$ is substantially equal to
  - 16. The system of claim 9, wherein the neural network filter is configured to compute the recognition score according to:
    - $\Sigma_2 = \alpha_1 R_{NN} + \alpha_2 R_{act} + \alpha_3 R_F$, wherein $\Sigma_2$ is the recognition score, wherein $R_{NN}$ is substantially equal to
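
The training procedure recited in claims 4 and 12 (score each training message, perform a test classification, then adjust the training pattern relevance of the active neuron) might look roughly like the loop below. Everything specific is an assumption for illustration: the per-neuron labels, the initial relevance, the step size, the form of R_NN, and above all the update rule, which the claims describe only as depending on the training recognition score and the test result.

```python
from typing import List, Optional, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Scalar product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_relevancies(
    corpus: Sequence[Tuple[Sequence[int], str]],
    first_weights: Sequence[Sequence[float]],
    second_weights: Sequence[Sequence[float]],
    neuron_labels: Sequence[str],  # hypothetical: "spam" or "ham" per neuron
    initial: float = 0.5,
    vigilance: float = 0.5,
    alpha1: float = 0.5,
    alpha2: float = 0.5,
    step: float = 0.1,
) -> List[float]:
    # Assign a training pattern relevance to each neuron.
    relevancies = [initial] * len(first_weights)
    for x, true_label in corpus:
        # Select the active training neuron by scalar product.
        activations = [dot(x, w) for w in first_weights]
        active = max(range(len(activations)), key=activations.__getitem__)

        # Training recognition score (assumed two-term form).
        n_on = sum(x)
        r_nn = dot(x, second_weights[active]) / n_on if n_on else 0.0
        score = alpha1 * r_nn + alpha2 * relevancies[active]

        # Test classification: assumed to be the active neuron's label
        # when the score passes the vigilance threshold, else undecided.
        predicted: Optional[str] = (
            neuron_labels[active] if score >= vigilance else None
        )

        # Assumed update: reward a correct result, penalize a wrong one,
        # scaled by the score and clamped to [0, 1].
        delta = step * score if predicted == true_label else -step * score
        relevancies[active] = min(1.0, max(0.0, relevancies[active] + delta))
    return relevancies


trained = train_relevancies(
    corpus=[([1, 0], "spam"), ([0, 1], "spam")],
    first_weights=[[1.0, 0.0], [0.0, 1.0]],
    second_weights=[[1.0, 0.0], [0.0, 1.0]],
    neuron_labels=["spam", "ham"],
)
```

After training, the final step of the claims sets each pattern relevance of the filter equal to the corresponding training pattern relevance, which here would mean copying `trained` into the classifier.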


Current Assignee: Bitdefender IPR Management Limited (Bitdefender LLC)
Original Assignee: Bitdefender IPR Management Limited (Bitdefender LLC)
Inventors: Cosoi, Alexandru C; Vlad, Madalin S; Sgarciu, Valentin
Primary Examiner(s): Holmes, Michael B
Application Number: US12/130,630
Time in Patent Office: 1,376 Days
Field of Search: 706/12
US Class Current: 706/12
CPC Class Codes:
G06N 3/0409   Adaptive resonance theory [...
G06N 3/045   Combinations of networks
H04L 51/212   using filtering or selectiv...
