Machine-learning based classification of user accounts based on email addresses and other account information

US 9,189,746 B2
Filed: 01/12/2012
Issued: 11/17/2015
Est. Priority Date: 01/12/2012
Status: Active Grant

First Claim


1. A method performed by one or more processors configured with computer-executable instructions, the method comprising:

receiving an account associated with information including an email address;

extracting one or more features from the information associated with the account, wherein at least one of the one or more features is based on memorability of the email address, the memorability relating to a pattern of symmetry, anti-symmetry, or uniformly distanced characters in the email address; and

determining a trust level of the account at least partly based on the extracted features.
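Claim 1 names three memorability patterns (symmetry, anti-symmetry, and uniformly distanced characters), but the text reproduced here does not say how they are scored. The Python sketch below shows one hypothetical way to turn them into numeric features for the local part of an address. Symmetry is scored by mirrored-character matching. Anti-symmetry uses an assumed definition: mirrored pairs whose code points add up to the same constant. Uniform spacing is the longest run of characters with a constant, non-zero step. All function names and scoring formulas are illustrative assumptions, not the patented method.

```python
from collections import Counter


def local_part(email: str) -> str:
    """Return the lower-cased part of the address before the '@'."""
    return email.split("@", 1)[0].lower()


def symmetry_score(s: str) -> float:
    """Fraction of mirrored character pairs that match (1.0 for a palindrome)."""
    pairs = len(s) // 2
    if pairs == 0:
        return 0.0
    matches = sum(1 for i in range(pairs) if s[i] == s[-1 - i])
    return matches / pairs


def anti_symmetry_score(s: str) -> float:
    """Assumed definition: share of mirrored pairs whose code points have the
    most common sum, so '1289' scores 1.0 because 1+9 == 2+8."""
    pairs = len(s) // 2
    if pairs < 2:
        return 0.0
    sums = Counter(ord(s[i]) + ord(s[-1 - i]) for i in range(pairs))
    return sums.most_common(1)[0][1] / pairs


def uniform_spacing_score(s: str) -> float:
    """Longest run with a constant, non-zero code-point step (e.g. 'abcd',
    '13579') as a fraction of the string length; runs shorter than 3 score 0."""
    best, run = 0, 2
    for i in range(2, len(s)):
        step = ord(s[i]) - ord(s[i - 1])
        if step != 0 and step == ord(s[i - 1]) - ord(s[i - 2]):
            run += 1
        else:
            run = 2  # the pair s[i-1], s[i] may start a new run
        best = max(best, run)
    return best / len(s) if best >= 3 else 0.0


def memorability_features(email: str) -> dict:
    """Hypothetical feature dictionary for one email address."""
    s = local_part(email)
    return {
        "symmetry": symmetry_score(s),
        "anti_symmetry": anti_symmetry_score(s),
        "uniform_spacing": uniform_spacing_score(s),
    }


print(memorability_features("abccba@example.com"))
```

Each score lies in [0, 1], so the three values can be fed directly into a classifier alongside other features.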


2 Assignments

0 Petitions

Abstract

A trust level of an account is determined at least partly based on the degree of memorability of an email address associated with the account. Additional features, such as those based on the domain of the email address and those derived from additional information associated with the account (such as a name, phone number, and address), may also be used to determine the trust level of the account. A machine learning process may be used to learn, from training data, a classification model based on one or more features that distinguish a malicious account from a benign account. The classification model is used to determine the trust level of the account and/or whether the account is malicious or benign, and it may be continuously improved by incrementally adapting or improving the model with new accounts.
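The abstract describes learning a classification model from labeled accounts and then adapting it incrementally. The sketch below makes two assumptions: each account has already been reduced to a numeric feature vector, and it carries a label of +1 (benign) or -1 (malicious). Under those assumptions, a linear SVM (the model family named in claims 11 and 17) can be trained by stochastic sub-gradient descent on the L2-regularized hinge loss. Because updates are made one sample at a time, calling `partial_fit` again with newly labeled accounts continues training from the current weights. The class name `LinearSVM`, the learning-rate and regularization defaults, and the logistic squashing of the margin into a 0-to-1 `trust_level` are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, +1 benign / -1 malicious)


class LinearSVM:
    """Linear SVM trained by stochastic sub-gradient descent on the
    L2-regularized hinge loss; repeated partial_fit calls keep adapting it."""

    def __init__(self, n_features: int, lr: float = 0.1, lam: float = 0.01):
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0
        self.lr = lr    # fixed step size (assumed)
        self.lam = lam  # L2 regularization strength (assumed)

    def decision(self, x: Sequence[float]) -> float:
        """Signed margin w.x + b; positive means the benign side."""
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def partial_fit(self, samples: Sequence[Sample], epochs: int = 1) -> None:
        for _ in range(epochs):
            for x, y in samples:
                inside_margin = y * self.decision(x) < 1
                # Regularizer gradient lam*w always applies; the hinge term
                # -y*x only applies when the sample violates the margin.
                self.w = [
                    wi - self.lr * (self.lam * wi - (y * xi if inside_margin else 0.0))
                    for wi, xi in zip(self.w, x)
                ]
                if inside_margin:
                    self.b += self.lr * y

    def trust_level(self, x: Sequence[float]) -> float:
        """Squash the margin into (0, 1); higher means more trusted."""
        d = max(-50.0, min(50.0, self.decision(x)))  # avoid exp overflow
        return 1.0 / (1.0 + math.exp(-d))


data = [([0.9, 0.8], 1), ([0.1, 0.2], -1), ([0.8, 0.9], 1),
        ([0.2, 0.1], -1), ([1.0, 0.7], 1), ([0.0, 0.3], -1)]
model = LinearSVM(n_features=2)
model.partial_fit(data, epochs=200)
model.partial_fit([([0.85, 0.85], 1)])  # incremental update with one new labeled account
```

Passing the raw margin through the logistic function is only a convenience. A calibrated mapping, such as fitting the sigmoid's parameters on held-out data as in Platt scaling, would give trust levels that behave more like probabilities.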

19 Claims

1. A method performed by one or more processors configured with computer-executable instructions, the method comprising:
- receiving an account associated with information including an email address;
  
  extracting one or more features from the information associated with the account, wherein at least one of the one or more features is based on memorability of the email address, the memorability relating to a pattern of symmetry, anti-symmetry, or uniformly distanced characters in the email address; and
  
  determining a trust level of the account at least partly based on the extracted features.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
- - 2. The method as recited in claim 1, further comprising:
    - determining that the account is benign if the determined trust level is higher than a first preset threshold; and/or
      
      determining that the account is malicious if the determined trust level is lower than the first preset threshold or a second preset threshold that is different from the first preset threshold.
  - 3. The method as recited in claim 1, wherein the at least one of the one or more features comprises one or more features related to meaningful strings in the email address.
  - 4. The method as recited in claim 3, wherein the meaningful strings comprise one or more numbers that are convertible into one or more letters according to a set of rules.
  - 5. The method as recited in claim 1, wherein the at least one of the one or more features comprises one or more features related to pronounceable strings in the email address.
  - 6. The method as recited in claim 1, wherein at least one of the one or more features is based on a domain of the email address.
  - 7. The method as recited in claim 6, further comprising computing a trust level for the domain of the email address using at least one of a white list of domains, a black list of domains, a malicious list of domains, or a benign list of domains, wherein:
    - the white list of domains includes one or more domains that are assumed to be associated with benign accounts;
      
      the black list of domains includes one or more domains that are assumed to be associated with malicious accounts;
      
      the benign list of domains includes one or more domains and a count for each of the one or more domains that a respective domain is associated with accounts labeled as benign;
      
      or the malicious list of domains includes one or more domains and a count for each of the one or more domains that a respective domain is associated with accounts labeled as malicious.
  - 8. The method as recited in claim 1, wherein:
    - the information further comprises additional information associated with the account and/or the email address, the additional information including a name, a phone number, an IP address of a source of the request, and/or an address associated with the account and/or the email address; and
      
      at least one of the one or more features is based on the additional information of the account.
  - 9. The method as recited in claim 8, further comprising computing a trust level for the additional information using at least one of a white list of additional information, a black list of additional information, a malicious list of additional information, or a benign list of additional information, wherein:
    - the white list of additional information includes one or more additional information that are assumed to be associated with benign accounts;
      
      the black list of additional information includes one or more additional information that are assumed to be associated with malicious accounts;
      
      the benign list of additional information includes one or more additional information and a count for each of the one or more additional information that respective additional information is associated with accounts labeled as benign; and
      
      the malicious list of additional information includes one or more additional information and a count for each of the one or more additional information that respective additional information is associated with accounts labeled as malicious.
  - 10. The method as recited in claim 1, wherein the determining the trust level of the account comprises:
    - analyzing a plurality of labeled accounts from one or more sources, each of the plurality of labeled accounts indicating that a respective labeled account is malicious or benign;
      
      determining one or more features extracted from the plurality of accounts that distinguish a respective labeled account that is malicious and a respective labeled account that is benign;
      
      applying one or more machine learning methods to build a classification model based on the obtained one or more features; and
      
      using the classification model to calculate a score of the trust level of the account.
  - 11. The method as recited in claim 10, wherein the one or more machine learning methods comprise a support vector machine (SVM) method.
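Claims 2 and 7 leave the scoring formulas open. The minimal sketch below assumes that the white and black lists act as hard overrides and that the benign and malicious lists are per-domain label counts. On that basis, a domain is scored by the Laplace-smoothed share of benign labels. An account's overall trust level is then turned into a decision with two thresholds, in the manner of claim 2. The function names, the smoothing prior, and the threshold values 0.7 and 0.3 are hypothetical.

```python
from typing import Mapping, Set


def domain_trust(
    domain: str,
    white: Set[str],
    black: Set[str],
    benign_counts: Mapping[str, int],
    malicious_counts: Mapping[str, int],
    prior: float = 1.0,
) -> float:
    """Trust score in [0, 1] for an email domain (hypothetical scheme).

    White and black lists override the counts; otherwise return the
    Laplace-smoothed share of benign labels, which is 0.5 for an unseen domain."""
    domain = domain.lower()
    if domain in white:
        return 1.0
    if domain in black:
        return 0.0
    b = benign_counts.get(domain, 0)
    m = malicious_counts.get(domain, 0)
    return (b + prior) / (b + m + 2 * prior)


def classify(trust: float, benign_threshold: float = 0.7,
             malicious_threshold: float = 0.3) -> str:
    """Two-threshold decision; trust levels between the thresholds stay undecided."""
    if trust > benign_threshold:
        return "benign"
    if trust < malicious_threshold:
        return "malicious"
    return "undecided"


benign_counts = {"example.org": 8}
malicious_counts = {"example.org": 2, "spam.example": 9}
print(domain_trust("example.org", set(), set(), benign_counts, malicious_counts))
```

Setting both thresholds to the same value gives the single-threshold variant of claim 2. The same scoring function could serve the additional-information lists of claim 9 if the domain string is replaced by another normalized key, such as a phone number.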

12. One or more computing devices having stored thereupon a plurality of computer-executable instructions that, when executed by a processor, cause the processor to perform operations comprising:
- analyzing a plurality of labeled accounts from one or more sources, each of the plurality of labeled accounts indicating that a respective labeled account is malicious or benign;
  
  determining one or more features extracted from the plurality of accounts that distinguish a respective labeled account that is malicious and a respective labeled account that is benign, wherein at least one of the one or more features is based on memorability of email addresses, the memorability relating to a pattern of symmetry, anti-symmetry, or uniformly distanced characters in the email address;
  
  applying one or more machine learning methods to build a classification model based on the obtained one or more features; and
  
  using the classification model to determine a trust level of an incoming account.
- Dependent claims (13, 14, 15, 16, 17, 18)
- - 13. The one or more computing devices as recited in claim 12, wherein one or more of the plurality of labeled accounts are associated with the email addresses.
  - 14. The one or more computing devices as recited in claim 12, wherein the one or more operations further comprise:
    - comparing the trust level of the incoming account determined by the classification model with a ground truth of the trust level of the incoming account; and
      
      improving the classification model based on a result of the comparison.
  - 15. The one or more computing devices as recited in claim 14, wherein the improving the classification model comprises:
    - receiving the ground truth of the trust level of the incoming account;
      
      comparing the ground truth with the trust level determined by the classification model to check an accuracy of determination of the classification model; and
      
      training the classification model incrementally at least partly based on a result of comparison.
  - 16. The one or more computing devices as recited in claim 12, wherein the one or more operations further comprise:
    - receiving one or more new labeled accounts; and
      
      adapting the classification model based on the one or more new labeled accounts.
  - 17. The one or more computing devices as recited in claim 12, wherein the one or more machine learning methods comprise a support vector machine (SVM) method.
  - 18. The one or more computing devices as recited in claim 12, wherein at least one of the one or more features comprises one or more features related to meaningful strings in the email address, and wherein the meaningful strings further comprise one or more numbers that are convertible into one or more letters according to a set of rules.
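Claim 18 refers to numbers that are convertible into letters according to a set of rules, but the text reproduced here does not state those rules. The hypothetical sketch below uses common "leetspeak" substitutions (0 to o, 3 to e, and so on). It first rewrites the local part, then measures what fraction of its characters fall inside dictionary words. The substitution table, the `min_len` cutoff, and the tiny word set in the example are assumptions; a practical version would need a real dictionary.

```python
# Hypothetical digit-to-letter rules; the claim does not list its own.
LEET_RULES = {"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t", "8": "b"}


def deleet(text: str) -> str:
    """Apply the digit-to-letter rules, e.g. 'h3ll0' -> 'hello'."""
    return "".join(LEET_RULES.get(c, c) for c in text.lower())


def meaningful_coverage(local_part: str, words: set, min_len: int = 3) -> float:
    """Fraction of characters, after de-leeting, covered by dictionary words."""
    s = deleet(local_part)
    covered = [False] * len(s)
    # Brute-force scan of every substring of length >= min_len; fine for
    # short email local parts.
    for i in range(len(s)):
        for j in range(i + min_len, len(s) + 1):
            if s[i:j] in words:
                for k in range(i, j):
                    covered[k] = True
    return sum(covered) / len(s) if s else 0.0


print(meaningful_coverage("h3ll0w0rld", {"hello", "world"}))
```

A high coverage suggests a human-chosen, memorable local part. A random-looking string such as `xq9kz` scores 0.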

19. A system comprising:
- memory storing one or more modules;
  
  one or more processors operably coupled to the memory to execute one or more modules, the one or more modules including:
  
  a receiving module that receives an account, the account associated with an email address and/or additional information;
  
  a training module that uses one or more labeled data including a plurality of labeled accounts to learn a classification model based on one or more features from email addresses and/or additional information associated with the labeled accounts that distinguish a malicious account from a benign account at least partly based on memorability of the email addresses, the features including at least one of the following:
  
  one or more features related to meaningful strings, the meaningful strings including one or more letters or numbers that are convertible according to a set of rules;
  
  one or more features related to pronounceable strings;
  
  one or more features related to a pattern including symmetry, anti-symmetry or uniformly distanced characters in the email address;
  
  one or more features related to a domain of the email address;
  
  or one or more features related to additional information associated with the account and/or the email address, the additional information including a name, a phone number, or an address associated with the account and/or the email address; and
  
  a determination module that uses the classification model to determine a trust level of the account.
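Claim 19 lists pronounceable strings as one feature family without defining pronounceability. One simple heuristic, offered only as an assumption, treats an alphabetic token as pronounceable if it contains a vowel and has no run of more than three consonants. The local part is then scored by the share of its letters that sit in such tokens. The names `is_pronounceable` and `pronounceable_score`, the treatment of "y" as a vowel, and the three-consonant limit are all hypothetical.

```python
import re

VOWELS = set("aeiouy")  # counting 'y' as a vowel is a simplifying assumption


def is_pronounceable(token: str, max_consonant_run: int = 3) -> bool:
    """Heuristic: has a vowel and never more than max_consonant_run consonants in a row."""
    if not any(c in VOWELS for c in token):
        return False
    run = 0
    for c in token:
        run = 0 if c in VOWELS else run + 1
        if run > max_consonant_run:
            return False
    return True


def pronounceable_score(email: str) -> float:
    """Share of alphabetic characters in the local part that sit in pronounceable tokens."""
    local = email.split("@", 1)[0].lower()
    tokens = re.findall(r"[a-z]+", local)  # digits and punctuation split tokens
    total = sum(len(t) for t in tokens)
    if total == 0:
        return 0.0
    good = sum(len(t) for t in tokens if is_pronounceable(t))
    return good / total


print(pronounceable_score("maria.lopez@example.com"))
```

Letter n-gram statistics learned from a corpus of real names would likely separate pronounceable from random strings more reliably than this fixed rule. The rule-based version simply keeps the sketch self-contained.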

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors
Zhu, Bin Benjamin; Xue, Fei
Primary Examiner(s)
Hill, Stanley K
Assistant Examiner(s)
Misir, Dave

Application Number

US13/349,306
Publication Number

US 20130185230A1
Time in Patent Office

1,405 Days
Field of Search

726/22, 726/4, 726/21, 726/26, 706/12, 706/13, 706/20, 706/52, 709/204, 709/206, 709/223, 709/224
US Class Current

1/1
CPC Class Codes

G06F 21/31   User authentication

G06F 21/41   where a single sign-on prov...

G06F 21/552   involving long-term monitor...

G06F 21/57   Certifying or maintaining t...

G06F 2221/2113   Multi-level security, e.g. ...

G06F 2221/2117   User registration

G06N 20/00   Machine learning

G06Q 10/107   Computer-aided management o...

H04L 51/212   using filtering or selectiv...
