Method and system for discovering suspicious account groups
Abstract
In one exemplary embodiment, a system for discovering suspicious account groups establishes a language model for each account of a first group of accounts. Each model is built from that account's post contents during a first time interval and describes the linguistic style of the account. The system compares the similarity among the language models of the first group of accounts to cluster those accounts. For the data newly added during a second time interval, the system discovers near-synonyms of at least one monitored vocabulary set and updates a plurality of language models of a second group of accounts with those near-synonyms. The system further integrates the first and second groups of accounts and re-clusters the integrated group of accounts.
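The abstract leaves two things open: the form of the language model and the similarity measure. Claim 1 only requires that the model express, at least in part, the probability that monitored vocabulary elements occur in an account.

The sketch below is a minimal illustration built on three assumptions, none of which is taken from the patent:

- the model is the relative frequency of each monitored word among all tokens an account posted;
- similarity is measured by cosine similarity;
- accounts are joined by single-link grouping above a threshold.

The function names, the 0.8 threshold and the sample accounts are hypothetical.

```python
from collections import Counter
from math import sqrt


def build_language_model(posts, vocabulary):
    """Hypothetical per-account model: relative frequency of each monitored
    vocabulary element among all tokens the account posted."""
    tokens = [tok for post in posts for tok in post.lower().split()]
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero for accounts with no posts
    return [counts[w] / n for w in vocabulary]


def cosine(u, v):
    """Cosine similarity of two model vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_accounts(models, threshold=0.8):
    """Single-link grouping via union-find: any two accounts whose models
    reach the similarity threshold end up in the same cluster."""
    names = list(models)
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(models[a], models[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


vocab = ["pills", "cheap", "discount"]
posts = {
    "acct_a": ["cheap pills here", "discount pills today"],
    "acct_b": ["buy cheap pills", "pills discount now"],
    "acct_c": ["lovely weather today"],
}
models = {acct: build_language_model(p, vocab) for acct, p in posts.items()}
print(cluster_accounts(models))
```

Single-link grouping is used here only because it needs no preset number of clusters. Any clustering method that consumes pairwise similarities would fit the described step.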
17 Claims
1. A method for discovering suspicious account groups, comprising:
under a control of at least one hardware processor, receiving a monitoring website table and at least one monitored vocabulary set containing a plurality of elements;
downloading a first group of accounts and one or more post contents corresponding to each account of the first group of accounts from the monitoring website during a first time interval;
establishing a language model, for each account of the first group of accounts, according to the one or more post contents from each account of the first group of accounts during the first time interval, to describe a linguistic fashion for each account, the language model being expressed at least partly as a probability of an occurrence of at least one element of the at least one monitored vocabulary set in an account;
comparing a similarity among a first group of language models of the first group of accounts to cluster the first group of accounts;
downloading newly added data including a second group of accounts and one or more post contents corresponding to each account of the second group of accounts from the monitoring website during a second time interval;
obtaining one or more homonyms synonyms in the newly added data of at least one element of the at least one monitored vocabulary set corresponding to the first group of accounts, comprising the sub-steps of fetching one or more features through a previous feature window and a next feature window of each monitored vocabulary in the at least one monitored vocabulary set; and
converting a weight of an original word of the at least one monitored vocabulary set into a corresponding weight of a homonym synonym;
updating the first group of language models with the one or more homonyms synonyms;
integrating the first and the second groups of accounts to create an integrated group of accounts;
rebuilding a language model for each of the integrated group of accounts to create a second group of language models based on the step of updating the first group of language models with the one or more homonyms synonyms;
clustering the integrated group of accounts according to the determined similarity among the integrated group of accounts based on the second group of language models;
determining at least one suspicious account group after the step of clustering according to a level of homogeneity among at least account groups of the integrated group of accounts; and
determining interaction connection among accounts of the integrated group of accounts based on a result of the step of identifying at least one suspicious account group.
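Claim 1 describes near-synonym discovery in only two sub-steps:

- fetching features from a previous feature window and a next feature window around each monitored word;
- converting the original word's weight into a weight for the synonym.

One possible reading is sketched below, under explicit assumptions. The features are the words inside each window, tagged by side. Each candidate word in the newly added data is compared with a monitored word by the cosine similarity of those context counts. The converted weight is the original weight multiplied by that similarity. The window size, the 0.6 threshold, the scaling rule and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_features(docs, window=1):
    """Map each word to counts of the words in its previous and next windows.
    Features are tagged 'prev:' or 'next:' so the two windows stay distinct."""
    feats = defaultdict(Counter)
    for doc in docs:
        toks = doc.lower().split()
        for i, word in enumerate(toks):
            for t in toks[max(0, i - window):i]:
                feats[word]["prev:" + t] += 1
            for t in toks[i + 1:i + 1 + window]:
                feats[word]["next:" + t] += 1
    return feats


def cosine(c1, c2):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    n1 = sqrt(sum(v * v for v in c1.values()))
    n2 = sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def find_near_synonyms(new_docs, weights, window=1, threshold=0.6):
    """Return {candidate: (original_word, converted_weight)} for words in the
    newly added data whose context profile resembles a monitored word's.
    Hypothetical conversion rule: converted weight = original weight * similarity."""
    feats = context_features(new_docs, window)
    found = {}
    for original, weight in weights.items():
        if original not in feats:
            continue  # monitored word does not appear in the new data
        for cand, cand_feats in feats.items():
            if cand in weights:
                continue  # already a monitored word
            sim = cosine(feats[original], cand_feats)
            if sim >= threshold and sim * weight > found.get(cand, ("", 0.0))[1]:
                found[cand] = (original, sim * weight)
    return found


def update_language_model(model, synonyms):
    """Add each near-synonym to a {word: weight} model with its converted weight."""
    updated = dict(model)
    for cand, (_, w) in synonyms.items():
        updated[cand] = max(updated.get(cand, 0.0), w)
    return updated


new_docs = ["buy cheap pills now", "buy cheap pilz now", "nice weather now"]
synonyms = find_near_synonyms(new_docs, {"pills": 0.4})
print(update_language_model({"pills": 0.4}, synonyms))
```

In this toy data the obfuscated spelling `pilz` shares both context windows with `pills`, so it inherits almost the full weight. `weather` shares only the next-window feature and falls below the threshold.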
11. A system for discovering suspicious account groups, comprising:
a language model training device receiving a monitoring website table and at least one monitored vocabulary set containing a plurality of elements, receiving a first group of accounts and one or more post contents corresponding to each account of the first group of accounts downloaded from the monitoring website during a first time interval, and establishing a language model, for each account of the first group of accounts, according to the one or more post contents from each account of the first group of accounts during the first time interval, to describe a linguistic fashion for each account, the language model being expressed at least partly as a probability of an occurrence of at least one element of the at least one monitored vocabulary set in an account, the language model training device further receiving newly added data including a second group of accounts and one or more post contents corresponding to each account of the second group of accounts downloaded from the monitoring website during a second time interval;
an account clustering device clustering the first group of accounts according to a similarity of a first group of language models of the first group of accounts;
a near-synonym identification device discovering one or more near-synonyms of at least one element of the at least one monitored vocabulary set in the newly added data during a second time interval, and updating the one or more near-synonyms to a second group of language models of a second group of accounts; and
an incremental account clustering device updating the first group of language models with the one or more homonyms synonyms, integrating the first and the second groups of accounts to create an integrated group of accounts, rebuilding a language model for each of the integrated group of accounts to create a second group of language models based on the step of updating the first group of language models with the one or more homonyms synonyms and re-clustering the integrated group of accounts according to the determined similarity among the integrated group of accounts based on the second group of language models;
wherein to discover the one or more synonyms in the newly added data the system is configured to fetch one or more features through a previous feature window and a next feature window of each monitored vocabulary in the at least one monitored vocabulary set; and convert a weight of an original word of the at least one monitored vocabulary set into a corresponding weight of a homonym synonym; and
wherein the system is further configured to determine at least one suspicious account group after the step of clustering according to a level of homogeneity among at least account groups of the integrated group of accounts, and determine interaction connection among accounts of the integrated group of accounts based on a result of the step of identifying at least one suspicious account group.
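Claim 11 does not define the level of homogeneity or the form of the interaction connection. The sketch below covers the final stage of the incremental pipeline and rests on four assumptions:

- the second-interval language models have already been rebuilt;
- an account seen in both intervals keeps its newer model;
- homogeneity is the mean pairwise cosine similarity within a cluster;
- interactions arrive as hypothetical (source, target) reply pairs.

Thresholds, function names and sample data are illustrative only.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def integrate(first_models, second_models):
    """Integrated group: union of both intervals. An account present in both
    keeps its second-interval (rebuilt) model. Hypothetical merge rule."""
    merged = dict(first_models)
    merged.update(second_models)
    return merged


def recluster(models, threshold=0.9):
    """Connected components of the graph linking accounts whose models
    have similarity >= threshold."""
    unvisited = set(models)
    clusters = []
    while unvisited:
        stack = [unvisited.pop()]
        component = []
        while stack:
            acct = stack.pop()
            component.append(acct)
            linked = {o for o in unvisited if cosine(models[acct], models[o]) >= threshold}
            unvisited -= linked
            stack.extend(linked)
        clusters.append(sorted(component))
    return sorted(clusters)


def homogeneity(cluster, models):
    """Mean pairwise similarity inside a cluster (1.0 for a single account)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0
    return sum(cosine(models[a], models[b]) for a, b in pairs) / len(pairs)


def suspicious_groups(clusters, models, min_size=2, min_homogeneity=0.9):
    """Flag clusters that are both large enough and homogeneous enough."""
    return [c for c in clusters
            if len(c) >= min_size and homogeneity(c, models) >= min_homogeneity]


def interaction_connections(group, interactions):
    """Keep only (source, target) interactions with both endpoints in the group."""
    members = set(group)
    return [(s, t) for s, t in interactions if s in members and t in members]


first = {"a": [1.0, 0.0], "c": [0.0, 1.0], "e": [1.0, 1.0]}
second = {"b": [1.0, 0.1], "d": [0.05, 1.0]}
models = integrate(first, second)
clusters = recluster(models)
flagged = suspicious_groups(clusters, models)
replies = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
for group in flagged:
    print(group, interaction_connections(group, replies))
```

Requiring at least two members keeps an isolated account such as `e` from being flagged, since a single account is trivially homogeneous.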
Specification