Machine-learning based classification of user accounts based on email addresses and other account information
First Claim
1. A method performed by one or more processors configured with computer-executable instructions, the method comprising:
receiving an account associated with information including an email address;
extracting one or more features from the information associated with the account, wherein at least one of the one or more features is based on memorability of the email address, the memorability relating to a pattern of symmetry, anti-symmetry, or uniformly distanced characters in the email address; and
determining a trust level of the account at least partly based on the extracted features.
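The three memorability patterns named in the claim can be illustrated with a short Python sketch. It makes several assumptions that the claim does not state. Patterns are measured on the local part of the address (the text before "@", letters and digits only). "Symmetry" is read as a palindrome. "Anti-symmetry" is read as mirrored characters whose code points sum to a constant (e.g. "1289"). "Uniformly distanced characters" is read as a run with a constant code-point step (e.g. "abcd" or "1357"). All function names are hypothetical. This is not the patented implementation.

```python
def local_part(email: str) -> str:
    """Lower-cased text before '@', keeping only letters and digits (assumption)."""
    return "".join(c for c in email.split("@", 1)[0].lower() if c.isalnum())


def is_symmetric(s: str) -> bool:
    """Palindrome check: character i mirrors character n-1-i (e.g. "abba")."""
    return len(s) >= 2 and s == s[::-1]


def is_anti_symmetric(s: str) -> bool:
    """ASSUMED reading of anti-symmetry: mirrored code points sum to one
    constant, e.g. "1289" because '1'+'9' == '2'+'8'."""
    if len(s) < 2:
        return False
    codes = [ord(c) for c in s]
    target = codes[0] + codes[-1]
    return all(codes[i] + codes[-1 - i] == target for i in range(len(codes)))


def longest_uniform_run(s: str) -> int:
    """Length of the longest run whose consecutive code points differ by the
    same non-zero step, e.g. "abcd" (step 1) or "1357" (step 2)."""
    if not s:
        return 0
    best = run = 1
    prev_step = None
    for a, b in zip(s, s[1:]):
        step = ord(b) - ord(a)
        if step == 0:
            run = 1          # repeated character: no uniform progression
        elif step == prev_step:
            run += 1         # the progression continues
        else:
            run = 2          # a new progression starts with this pair
        prev_step = step
        best = max(best, run)
    return best


def pattern_features(email: str) -> dict:
    """Hypothetical memorability-pattern features for one address."""
    s = local_part(email)
    run = longest_uniform_run(s)
    return {
        "symmetric": is_symmetric(s),
        "anti_symmetric": is_anti_symmetric(s),
        "uniform_run": run,
        "uniform_run_ratio": run / len(s) if s else 0.0,
    }
```

Under this reading, any arithmetic progression such as "abc" also counts as anti-symmetric, so the two features overlap. A different definition of anti-symmetry would change that.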
Abstract
A trust level of an account is determined at least partly based on the degree of memorability of an email address associated with the account. Additional features, such as those based on the domain of the email address and those derived from additional information associated with the account (name, phone number, and address), may also be used to determine the trust level. A machine learning process may be used to learn, from training data, a classification model based on one or more features that distinguish a malicious account from a benign account. The classification model is used to determine the trust level of the account and/or whether the account is malicious or benign, and it may be continuously improved by incrementally updating the model with new accounts.
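Domain and account-information features of the kind the abstract mentions might be computed as follows. The `Account` record, the `FREE_MAIL_DOMAINS` set, and each specific feature (for example, whether a name token appears in the local part) are hypothetical choices made for illustration. The abstract does not specify which domain or profile features are used.

```python
from dataclasses import dataclass

# HYPOTHETICAL list of free webmail domains, for illustration only.
FREE_MAIL_DOMAINS = {"gmail.com", "outlook.com", "hotmail.com", "yahoo.com"}


@dataclass
class Account:
    """Hypothetical account record: email address plus optional profile fields."""
    email: str
    name: str = ""
    phone: str = ""
    address: str = ""


def domain_and_profile_features(acct: Account) -> dict:
    local, _, domain = acct.email.lower().partition("@")
    # Name tokens of two or more characters, e.g. "john", "smith".
    name_tokens = [t for t in acct.name.lower().split() if len(t) >= 2]
    phone_digits = "".join(c for c in acct.phone if c.isdigit())
    return {
        # Domain features
        "free_mail_domain": domain in FREE_MAIL_DOMAINS,
        "domain_labels": domain.count(".") + 1 if domain else 0,
        # Consistency between the address and the additional information
        "name_in_local_part": any(t in local for t in name_tokens),
        "phone_suffix_in_local_part": len(phone_digits) >= 4 and phone_digits[-4:] in local,
        "has_address": bool(acct.address.strip()),
        # Share of digits in the local part
        "local_digit_ratio": sum(c.isdigit() for c in local) / len(local) if local else 0.0,
    }
```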
19 Claims
1. A method performed by one or more processors configured with computer-executable instructions, the method comprising:
receiving an account associated with information including an email address; extracting one or more features from the information associated with the account, wherein at least one of the one or more features is based on memorability of the email address, the memorability relating to a pattern of symmetry, anti-symmetry, or uniformly distanced characters in the email address; and determining a trust level of the account at least partly based on the extracted features.
12. One or more computing devices having stored thereupon a plurality of computer-executable instructions that, when executed by a processor, cause the processor to perform operations comprising:
analyzing a plurality of labeled accounts from one or more sources, each of the plurality of labeled accounts indicating that a respective labeled account is malicious or benign; determining one or more features extracted from the plurality of accounts that distinguish a respective labeled account that is malicious from a respective labeled account that is benign, wherein at least one of the one or more features is based on memorability of email addresses, the memorability relating to a pattern of symmetry, anti-symmetry, or uniformly distanced characters in the email addresses; applying one or more machine learning methods to build a classification model based on the obtained one or more features; and using the classification model to determine a trust level of an incoming account.
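The training and scoring steps of claim 12 can be sketched with a plain logistic-regression classifier trained by batch gradient descent, using only the Python standard library. Logistic regression is a stand-in chosen here, since the claim only says "one or more machine learning methods". Two further choices are assumptions: treating the trust level as the model's probability that an account is benign, and the `update` helper, which is one way to realize the incremental adaptation mentioned in the abstract.

```python
import math
from typing import Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _score(w: Sequence[float], x: Sequence[float]) -> float:
    # zip stops at len(x), so the last weight (the bias) is added separately.
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 500) -> list[float]:
    """Fit weights (bias stored last) by batch gradient descent on log-loss.
    y[i] is 1 for a labeled malicious account and 0 for a benign one."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = _sigmoid(_score(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def trust_level(w: Sequence[float], x: Sequence[float]) -> float:
    """Trust in [0, 1], taken here as the probability that the account is benign."""
    return 1.0 - _sigmoid(_score(w, x))


def update(w: Sequence[float], x: Sequence[float], y: int,
           lr: float = 0.1) -> list[float]:
    """One stochastic-gradient step on a newly labeled account."""
    err = _sigmoid(_score(w, x)) - y
    new_w = [wj - lr * err * xj for wj, xj in zip(w, x)]
    new_w.append(w[-1] - lr * err)
    return new_w
```

Each row of `X` would hold numeric features extracted from an account's email address and additional information, with boolean features encoded as 0 or 1.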
19. A system comprising:
memory storing one or more modules; one or more processors operably coupled to the memory to execute the one or more modules, the one or more modules including: a receiving module that receives an account, the account associated with an email address and/or additional information; a training module that uses one or more labeled data including a plurality of labeled accounts to learn a classification model based on one or more features from email addresses and/or additional information associated with the labeled accounts that distinguish a malicious account from a benign account at least partly based on memorability of the email addresses, the features including at least one of the following: one or more features related to meaningful strings, the meaningful strings including one or more letters or numbers that are convertible according to a set of rules; one or more features related to pronounceable strings; one or more features related to a pattern including symmetry, anti-symmetry, or uniformly distanced characters in the email address; one or more features related to a domain of the email address; or one or more features related to additional information associated with the account and/or the email address, the additional information including a name, a phone number, or an address associated with the account and/or the email address; and a determination module that uses the classification model to determine a trust level of the account.
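The meaningful-string and pronounceable-string features of claim 19 might be approximated as follows. Several parts are hypothetical. The digit-to-letter table `LEET` and the tiny word list `WORDS` are invented for the example. Treating "y" as a vowel and the pronounceability thresholds are also assumptions. The claim only says the strings are "convertible according to a set of rules." Word coverage uses a simple greedy longest-match scan, which is one possible technique among many.

```python
import re

# HYPOTHETICAL conversion rules: digits that resemble letters.
LEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"})

# Tiny illustrative word list; a real system would need a much larger one.
WORDS = {"love", "star", "cool", "john", "smith", "sunset"}

VOWELS = set("aeiouy")  # treating "y" as a vowel is an assumption


def normalize(local: str) -> str:
    """Lower-case, drop non-alphanumerics, then apply the conversion rules."""
    return re.sub(r"[^a-z0-9]", "", local.lower()).translate(LEET)


def meaningful_coverage(local: str, words: set = WORDS) -> float:
    """Fraction of characters covered by dictionary words after conversion,
    using a greedy longest-match scan from left to right."""
    s = normalize(local)
    if not s or not words:
        return 0.0
    max_len = max(len(w) for w in words)
    covered = i = 0
    while i < len(s):
        for n in range(min(max_len, len(s) - i), 2, -1):  # words of 3+ chars
            if s[i:i + n] in words:
                covered += n
                i += n
                break
        else:
            i += 1  # no word starts here; advance one character
    return covered / len(s)


def pronounceability(local: str) -> dict:
    """Consonant-run and vowel-ratio features; thresholds are assumptions."""
    s = normalize(local)
    run = best = 0
    for c in s:
        if c.isalpha() and c not in VOWELS:
            run += 1
            best = max(best, run)
        else:
            run = 0  # a vowel or digit breaks the consonant run
    letters = [c for c in s if c.isalpha()]
    vowel_ratio = sum(c in VOWELS for c in letters) / len(letters) if letters else 0.0
    return {
        "longest_consonant_run": best,
        "vowel_ratio": vowel_ratio,
        "pronounceable": best <= 3 and vowel_ratio >= 0.25,
    }
```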
Specification