Identifying terms

  • US 9,123,046 B1
  • Filed: 04/27/2012
  • Issued: 09/01/2015
  • Est. Priority Date: 04/29/2011
  • Status: Active Grant
First Claim

1. A method comprising:

    receiving, for each of multiple accounts, a document associated with the account;

    identifying the accounts that have been designated as spam accounts;

    merging the documents that are associated with the accounts that have been designated as spam accounts, into a single, merged document;

    determining, for each of one or more terms that occur in the merged document, a blacklist term frequency (BTF) that represents a number of times that the term occurs in the merged document;

    determining a number of accounts that have not been designated as spam accounts;

    determining, for each of the terms, a number of the documents that are associated with accounts that have not been designated as spam accounts, in which the term occurs;

    determining, for each of the terms, an inverse document frequency (IDF) for the term based on the number of accounts that have not been designated as spam accounts, and the number of documents that are associated with the accounts that have not been designated as spam accounts, in which the term occurs;

    determining, for each of the one or more terms, a blacklist term frequency-inverse document frequency (BTF-IDF) score by multiplying the blacklist term frequency for the term by the inverse document frequency for the term;

    selecting, as spam terms, one or more of the terms whose respective BTF-IDF score satisfies a threshold; and

    automatically determining whether to designate a new account as a spam account based at least on identifying an occurrence of one or more of the spam terms in a document associated with the new account.
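The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim only says the IDF is "based on" the two non-spam counts, so the smoothed logarithmic formula below is an assumption, as are whitespace tokenization, reading "satisfies a threshold" as "greater than or equal to", and the hypothetical names `score_terms`, `select_spam_terms`, `flag_new_account`, and `min_hits`.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the claim does not define a "term".
    return text.lower().split()


def score_terms(documents, spam_accounts):
    """Return a BTF-IDF score per term.

    documents: dict mapping account id -> document text.
    spam_accounts: set of account ids designated as spam accounts.
    """
    # Merge all documents of spam accounts into a single merged document.
    merged = []
    for account, text in documents.items():
        if account in spam_accounts:
            merged.extend(tokenize(text))

    # Blacklist term frequency: occurrences of each term in the merged document.
    btf = Counter(merged)

    # Term sets of the documents belonging to non-spam accounts.
    non_spam_docs = [
        set(tokenize(text))
        for account, text in documents.items()
        if account not in spam_accounts
    ]
    n_non_spam = len(non_spam_docs)

    scores = {}
    for term, freq in btf.items():
        # Number of non-spam documents in which the term occurs.
        doc_freq = sum(1 for doc in non_spam_docs if term in doc)
        # Assumed IDF formula (add-one smoothing avoids division by zero).
        idf = math.log((1 + n_non_spam) / (1 + doc_freq))
        scores[term] = freq * idf
    return scores


def select_spam_terms(scores, threshold):
    # Assumption: a score "satisfies" the threshold when it is >= threshold.
    return {term for term, score in scores.items() if score >= threshold}


def flag_new_account(text, spam_terms, min_hits=1):
    # Hypothetical decision rule: flag when at least min_hits spam terms occur.
    hits = set(tokenize(text)) & spam_terms
    return len(hits) >= min_hits


if __name__ == "__main__":
    docs = {
        "a": "cheap pills cheap pills buy now",
        "b": "cheap pills win prize",
        "c": "hello friends buy coffee",
        "d": "photos of my cat",
        "e": "weekend hiking photos",
    }
    terms = select_spam_terms(score_terms(docs, {"a", "b"}), threshold=3.0)
    print(sorted(terms))
    print(flag_new_account("get cheap watches", terms))
```

Under these assumptions, a term that is frequent in the merged spam document but rare in non-spam documents scores highest, while a term that also appears in non-spam documents (such as "buy" in the sample data) is penalized by the lower IDF.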
