AUTOMATED DETECTION OF DECEPTION IN SHORT AND MULTILINGUAL ELECTRONIC MESSAGES

US 20120254333A1
Filed: 04/25/2012
Published: 10/04/2012
Est. Priority Date: 01/07/2010
Status: Abandoned Application

Abstract

A method and apparatus for automatically identifying harmful electronic messages, such as those presented in emails, on Craigslist, or on Twitter, Facebook and other social media websites, features methodology for discriminating unwanted garbage communications (spam) and unwanted deceptive messages (scam) from wanted, truthful communications based upon patterns discernible from samples of each type of electronic communication. Methods are proposed that enable discrimination of wanted from unwanted communications in short electronic messages, such as those on Twitter, and in multilingual applications.

407 Citations

14 Claims

1. A method of detecting deception in electronic messages, comprising:
- (a) obtaining a first set of electronic messages;
- (b) subjecting the first set to model-based clustering analysis to identify training data;
- (c) building a first suffix tree using the training data for deceptive messages;
- (d) building a second suffix tree using the training data for non-deceptive messages;
- (e) assessing an electronic message to be evaluated via comparison of the message to the first and second suffix trees and scoring the degree of matching to both to classify the message as deceptive or non-deceptive based upon the respective scores.
  - 2. The method of claim 1, wherein the subjecting step (b) results in a diverse sample training set of messages from the first set by clustering the first set of messages and then applying model selection to select a message sample set and categorizing each message in the sample as either deceptive or not based upon expert evaluation, then labeling each message to yield a training set of data.
  - 3. The method of claim 2, further comprising the step of filtering the message by removing punctuation, removing stop words, and stemming, prior to the step of clustering.
  - 4. The method of claim 3, further comprising the step of representing the words of a message as a feature vector and setting the value of the feature as the normalized frequency of the word in the message, prior to the step of clustering.
  - 5. The method of claim 4, further comprising the step of reducing the feature space by Latent Semantic Analysis (LSA) prior to clustering.
  - 6. The method of claim 1, wherein the clustering is done by K-means clustering.
  - 7. The method of claim 1, wherein the best models are selected from the clusters generated by the step of clustering by (AIC and/or BIC).
  - 8. The method of claim 1, further comprising the step of utilizing the classification of the message to be evaluated to update one of the first and second suffix trees depending upon the classification as deceptive or non-deceptive.
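
The text preparation recited in claims 3 and 4 (punctuation removal, stop-word removal, stemming, then a normalized word-frequency vector) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the application: the tiny `STOP_WORDS` list, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), and the function names `preprocess` and `feature_vector`.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "you", "your"}


def crude_stem(word):
    """Very rough suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(message):
    """Lowercase, replace punctuation with spaces, drop stop words, then stem."""
    text = re.sub(r"[^\w\s]", " ", message.lower())
    return [crude_stem(w) for w in text.split() if w not in STOP_WORDS]


def feature_vector(tokens, vocabulary):
    """Normalized frequency of each vocabulary word within one message."""
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[term] / total for term in vocabulary]


tokens = preprocess("Claim your prize now, claim today!")
vocabulary = sorted(set(tokens))
vector = feature_vector(tokens, vocabulary)
```

Vectors of this kind are what a clustering step such as the K-means clustering of claim 6 would consume; the dimensionality reduction of claim 5 is not shown.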

9. A method of detecting deception in an electronic message M, comprising the steps of:
- (a) building training files D of deceptive messages and T of truthful messages;
- (b) building suffix trees SD and ST for files D and T, respectively;
- (c) traversing suffix trees SD and ST and determining different combinations and adaptive context;
- (d) determining the cross-entropy ED and ET between the electronic message M and each of the suffix trees SD and ST, respectively; then if ED > ET, classify message M as deceptive; or if ET > ED, classify message M as truthful.
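
As a rough illustration of steps (a) through (d), the sketch below builds a depth-limited character suffix trie for each training file and computes a per-character cross-entropy between a message and each trie. Several things are assumptions not found in this excerpt: the depth limit `max_depth`, add-one smoothing, backing off to the longest context found in the trie (one possible reading of "adaptive context"), and all class, function, and variable names. Under the usual definition used here, a lower value means the message fits that model better. This excerpt does not define the sign convention behind ED and ET, so the sketch returns the two values and leaves the comparison rule of step (d) to the caller.

```python
import math


class SuffixTrie:
    """Depth-limited character suffix trie with occurrence counts (illustrative)."""

    def __init__(self, max_depth=4):
        self.max_depth = max_depth
        self.root = {"count": 0, "children": {}}
        self.alphabet = set()

    def add(self, text):
        """Insert every suffix of text, truncated to max_depth characters."""
        self.alphabet.update(text)
        for i in range(len(text)):
            node = self.root
            node["count"] += 1
            for ch in text[i : i + self.max_depth]:
                node = node["children"].setdefault(ch, {"count": 0, "children": {}})
                node["count"] += 1

    def _find(self, s):
        node = self.root
        for ch in s:
            node = node["children"].get(ch)
            if node is None:
                return None
        return node

    def prob(self, context, ch):
        """P(ch | context), add-one smoothed, backing off to shorter contexts."""
        vocab = len(self.alphabet) + 1  # one extra slot for unseen characters
        keep = self.max_depth - 1
        context = context[-keep:] if keep > 0 else ""
        while True:
            node = self._find(context)
            # Use this context if it has continuations, or if it is already empty.
            if node is not None and (node["children"] or not context):
                child = node["children"].get(ch)
                seen = child["count"] if child else 0
                total = sum(c["count"] for c in node["children"].values())
                return (seen + 1) / (total + vocab)
            context = context[1:]


def cross_entropy(trie, message):
    """Average negative log2-probability per character of message under trie."""
    if not message:
        raise ValueError("message must be non-empty")
    bits = 0.0
    for i, ch in enumerate(message):
        bits -= math.log2(trie.prob(message[:i], ch))
    return bits / len(message)


# Hypothetical toy training files D (deceptive) and T (truthful).
D = ["win free money now", "free prize claim now"]
T = ["see you at lunch", "meeting moved to noon"]
S_D, S_T = SuffixTrie(), SuffixTrie()
for m in D:
    S_D.add(m)
for m in T:
    S_T.add(m)

M = "free money"
E_D, E_T = cross_entropy(S_D, M), cross_entropy(S_T, M)
```

With this toy data, the message shares most of its substrings with file D, so its cross-entropy against `S_D` comes out lower than against `S_T`.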

10. A method for automatically categorizing an electronic message in a foreign language as wanted or unwanted, comprising the steps of:
- (a) collecting a sample corpus of a plurality of wanted and unwanted messages in a domestic language with known categorization as wanted or unwanted;
- (b) testing the corpus in the domestic language by an automated testing method to discern wanted and unwanted messages and scoring detection effectiveness associated with the automated testing method by comparing the automatic testing categorization results to the known categorization;
- (c) translating the corpus into a foreign language with a translation tool;
- (d) testing the corpus in the foreign language by the automated testing method and scoring detection effectiveness associated with the automated testing method;
- (e) if the detection effectiveness score in the foreign language indicates acceptable detection accuracy, then using the testing method and the translation tool to categorize electronic messages as wanted or unwanted.
  - 11. The method of claim 10, wherein a plurality of automated testing methods are available and further comprising the steps of testing in steps (b) and steps (d) with each of the plurality of automated testing methods and selecting an automated testing method with the best detection accuracy.
  - 12. The method of claim 10, wherein there are a plurality of translation tools available and further comprising the steps of translating in step (c) using each of the plurality of translation tools and then executing steps (d) and (e) for each of the different translation tools and then selecting a translation tool of the plurality that results in the best detection accuracy.
  - 13. The method of claim 10 wherein there are a plurality of automated testing methods available and further comprising the steps of testing in steps (b) and steps (d) with each of the plurality of automated testing methods and wherein there are a plurality of translation tools available and further comprising the steps of translating in step (c) using each of the plurality of translation tools and then executing steps (d) and (e) for each of the different translation tools, such that all the possible combinations of automated testing methods and translation tools are exercised and then selecting a combination of automated testing method and translation tool that results in the best detection accuracy.
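
The exhaustive selection described in claim 13 amounts to scoring every pairing of testing method and translation tool and keeping the best one, provided it clears an acceptability threshold. The following Python sketch shows that loop under loudly stated assumptions: the two "testing methods" are toy classifiers, the two "translation tools" are string transformations that merely stand in for real translators, and the corpus, the threshold, and all names are hypothetical.

```python
from itertools import product


def keyword_method(messages, labels):
    """Toy testing method: flags words seen only in unwanted messages."""
    wanted = {w for m, y in zip(messages, labels) if y == "wanted" for w in m.split()}
    unwanted = {w for m, y in zip(messages, labels) if y == "unwanted" for w in m.split()}
    unwanted_only = unwanted - wanted
    return lambda m: "unwanted" if unwanted_only & set(m.split()) else "wanted"


def length_method(messages, labels):
    """Toy testing method: treats longer-than-average messages as unwanted."""
    average = sum(len(m) for m in messages) / len(messages)
    return lambda m: "unwanted" if len(m) > average else "wanted"


def accuracy(classifier, messages, labels):
    """Fraction of messages whose predicted category matches the known one."""
    return sum(classifier(m) == y for m, y in zip(messages, labels)) / len(labels)


def select_best_pair(corpus, labels, methods, tools, threshold):
    """Score every (method, tool) pair on the translated corpus; return the best."""
    scores = {}
    for (m_name, method), (t_name, tool) in product(methods.items(), tools.items()):
        translated = [tool(message) for message in corpus]      # step (c)
        classifier = method(translated, labels)
        scores[(m_name, t_name)] = accuracy(classifier, translated, labels)  # step (d)
    best = max(scores, key=scores.get)
    # Step (e): only accept the pair if its accuracy is deemed acceptable.
    return (best if scores[best] >= threshold else None), scores


corpus = ["win free money", "free prize", "lunch at noon today", "see you soon"]
labels = ["unwanted", "unwanted", "wanted", "wanted"]
methods = {"keyword": keyword_method, "length": length_method}
tools = {
    # Stand-ins for translators: one keeps words distinguishable, one destroys them.
    "word_reverse": lambda m: " ".join(w[::-1] for w in m.split()),
    "blank_out": lambda m: " ".join("x" for _ in m.split()),
}
best, scores = select_best_pair(corpus, labels, methods, tools, threshold=0.9)
```

Claims 11 and 12 correspond to the special cases where only `methods` or only `tools` has more than one entry. This sketch scores each method on the same corpus it was fitted to, which keeps the example short but would overstate accuracy in practice.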

14. A system for detecting deception in communications, comprising:
- a computer programmed with software that automatically analyzes a text message in digital form for deceptiveness by at least one of statistical analysis of text content to ascertain and evaluate psycho-linguistic cues that are present in the text message, authorship similarity analysis, and analysis to detect coded/camouflaged messages, and a computer having means to obtain the text message in digital form and store the text message within a memory of said computer, and the computer having means to access truth data against which the veracity of the text message can be compared and a graphical user interface through which a user of said system can control said system and receive results concerning the deceptiveness of the text message analyzed by said system.

Current Assignee
The Trustees of The Stevens Institute of Technology (Stevens Institute Of Technology)
Original Assignee
The Trustees of The Stevens Institute of Technology (Stevens Institute Of Technology)
Inventors
Chandramouli, Rajarathnam; Chen, Xiaoling; Subbalakshmi, Koduvayur P.; Hao, Peng; Cheng, Na; Perera, Rohan

Application Number

US13/455,862
Publication Number

US 20120254333A1
US Class Current

709/206
CPC Class Codes

G06F 40/10   Text processing natural lan...

G06F 40/20   Natural language analysis s...

G06F 40/40   Processing or translation o...

G06N 20/00   Machine learning

G06N 20/10   using kernel methods, e.g. ...

G06N 5/04   Inference or reasoning models

G06Q 10/107   Computer-aided management o...
