
Identifying malicious text in advertisement content

  • US 10,445,770 B2
  • Filed: 08/01/2014
  • Issued: 10/15/2019
  • Est. Priority Date: 08/01/2014
  • Status: Active Grant
First Claim

1. A method comprising:

  • retrieving, by a processor of an online system, text included in advertisement content of an advertisement (“ad”) request for presentation to a user of the online system;

    identifying, by the processor of the online system, one or more words included in the advertisement content;

    identifying, by the processor of the online system, one or more Unicode characters comprising each of the one or more words, each of the one or more Unicode characters being associated with a range of Unicode characters that comprise to a Unicode block of a plurality of Unicode blocks;

    determining, for each Unicode character of the one or more Unicode characters included in each of the one or more words, a Unicode block associated with the Unicode character;

    determining, by the processor of the online system, a score for each word of the one or more words by:

    determining, for each of the identified one or more words, a most common Unicode block associated with the one or more Unicode characters in the word;

    determining a conditional probability of the one or more Unicode characters being included in the word belonging to a specific Unicode block based at least in part on a number of Unicode characters in the word and a number of Unicode characters in the word associated with the most common Unicode block associated with the Unicode characters in the word; and

    determining the score for the word based at least in part on the determined conditional probability, a word of the one or more words comprising Unicode characters associated with a same Unicode block having a higher determined score relative to a word comprising Unicode characters associated with two or more different Unicode blocks;

    generating, by the processor of the online system, a combined score for the advertisement based on the determined scores of each word of the one or more words;

    determining, by the processor of the online system, that the advertisement content is offensive based at least in part on the combined score for the advertisement being less than a threshold value; and

    responsive to the combined score for the advertisement being less than the threshold value, determining, by the processor of the online system, that the advertisement content is ineligible for presentation to the user of the online system based at least in part on the determination that the advertisement content is offensive.
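
The scoring steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the block table is a small hand-picked subset of Unicode blocks, the per-word score is taken to be the fraction of characters that fall in the word's most common block, the combined score is the mean of the word scores, and the names `BLOCKS`, `block_of`, `word_score`, `ad_score`, `is_ineligible`, and the value of `THRESHOLD` are hypothetical.

```python
from collections import Counter

# Assumption: a small illustrative subset of Unicode blocks.
# The code point ranges themselves follow the Unicode Standard.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
]

THRESHOLD = 0.9  # hypothetical cutoff; the claim does not state a value


def block_of(ch):
    """Return the name of the block containing ch, or "Other" if not in the table."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "Other"


def word_score(word):
    """Fraction of the word's characters that lie in its most common block.

    Assumption: this ratio stands in for the claimed conditional probability,
    which depends on the word length and the most-common-block count.
    """
    counts = Counter(block_of(ch) for ch in word)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(word)


def ad_score(text):
    """Combine word scores; the mean is an assumed choice of combination."""
    words = text.split()
    if not words:
        return 1.0
    scores = [word_score(w) for w in words]
    return sum(scores) / len(scores)


def is_ineligible(text, threshold=THRESHOLD):
    """Flag the ad text when its combined score falls below the threshold."""
    return ad_score(text) < threshold
```

Under these assumptions, a word written entirely in one block scores 1.0, while a look-alike such as `"fr\u0435e"` (with Cyrillic U+0435 substituted for the Latin "e") scores 0.75, so mixed-block words pull the combined score of the ad down toward the threshold.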
