Identifying malicious text in advertisement content

US 10,445,770 B2
Filed: 08/01/2014
Issued: 10/15/2019
Est. Priority Date: 08/01/2014
Status: Active Grant

Abstract

An online system receives advertisement requests from one or more advertisers and determines whether an advertisement request includes malicious content before presenting content from the advertisement request to a user. To determine whether the advertisement request includes malicious content, the online system identifies text in the advertisement request, identifies words in the text, and identifies characters in each word. The online system identifies a most common type of character in each word and generates a score for each word based on its constituent characters. For example, a word's score is based on the combination of characters in the word, such as a conditional probability of a word including a type of character given that the word includes a given number of the most common type of character. The scores are analyzed to determine if text in the advertisement request includes malicious content.
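
As a rough illustration of the scoring described in the abstract, the Python sketch below scores each word by the share of its characters that fall in the word's most common Unicode block, averages the word scores, and flags text whose average falls below a threshold. The share is only a simple stand-in for the conditional probability the abstract mentions. The function names, the use of a mean as the combining rule, the 0.9 threshold, and the abbreviated block table are assumptions made for this example, not details taken from the patent.

```python
from collections import Counter

# Abbreviated block table: the ranges are real Unicode blocks,
# but the list is far from complete (assumption for brevity).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
]


def block_of(ch):
    """Return the name of the listed block containing ch, or "Other"."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "Other"


def word_score(word):
    """Share of characters in the word's most common block (assumed rule)."""
    counts = Counter(block_of(ch) for ch in word)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(word)


def ad_score(text):
    """Mean of the per-word scores; empty text is treated as clean."""
    words = text.split()
    if not words:
        return 1.0
    return sum(word_score(w) for w in words) / len(words)


def is_suspicious(text, threshold=0.9):
    """Flag text whose combined score is below the (hypothetical) threshold."""
    return ad_score(text) < threshold
```

Under these assumptions, a word written entirely in one block scores 1.0, while a word such as `"fr\u0435e"` (a Cyrillic letter substituted into a Latin word) scores 0.75 and pulls the combined score down.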

14 Claims

1. A method comprising:
- retrieving, by a processor of an online system, text included in advertisement content of an advertisement (“ad”) request for presentation to a user of the online system;
- identifying, by the processor of the online system, one or more words included in the advertisement content;
- identifying, by the processor of the online system, one or more Unicode characters comprising each of the one or more words, each of the one or more Unicode characters being associated with a range of Unicode characters that comprise to a Unicode block of a plurality of Unicode blocks;
- determining, for each Unicode character of the one or more Unicode characters included in each of the one or more words, a Unicode block associated with the Unicode character;
- determining, by the processor of the online system, a score for each word of the one or more words by:
  - determining, for each of the identified one or more words, a most common Unicode block associated with the one or more Unicode characters in the word;
  - determining a conditional probability of the one or more Unicode characters being included in the word belonging to a specific Unicode block based at least in part on a number of Unicode characters in the word and a number of Unicode characters in the word associated with the most common Unicode block associated with the Unicode characters in the word; and
  - determining the score for the word based at least in part on the determined conditional probability, a word of the one or more words comprising Unicode characters associated with a same Unicode block having a higher determined score relative to a word comprising Unicode characters associated with two or more different Unicode blocks;
- generating, by the processor of the online system, a combined score for the advertisement based on the determined scores of each word of the one or more words;
- determining, by the processor of the online system, that the advertisement content is offensive based at least in part on the combined score for the advertisement being less than a threshold value; and
- responsive to the combined score for the advertisement being less than the threshold value, determining, by the processor of the online system, that the advertisement content is ineligible for presentation to the user of the online system based at least in part on the determination that the advertisement content is offensive.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein determining the conditional probability further comprises:
    - determining probabilities of each Unicode character in the word being followed by a subsequent Unicode character in the one or more Unicode characters included in the word being associated with a same Unicode block.
  - 3. The method of claim 2, wherein determining the score for the word based at least in part on the determined probabilities comprises:
    - determining a sum of the determined probabilities.
  - 4. The method of claim 2, wherein determining the score for the word based at least in part on the determined probabilities comprises:
    - determining an average of the determined probabilities.
  - 5. The method of claim 1, wherein the threshold value is determined based at least in part on a number of the identified one or more words in the text.
  - 6. The method of claim 1, wherein a character in the identified one or more characters is selected from a group consisting of:
    - a letter, a number, a text symbol, and any combination thereof.
  - 7. The method of claim 1, wherein determining the Unicode block associated with each Unicode character included in each of the one or more words comprises:
    - analyzing a hexadecimal value used to encode each of the Unicode characters in each of the one or more words, each hexadecimal value corresponding to a Unicode block.
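
One possible reading of claims 2 through 4 is sketched below in Python: estimate, from a reference word list, how often a character is followed by a character from the same Unicode block, then score a word by the sum (claim 3) or the average (claim 4) of the probabilities of its observed character-to-character transitions. The claims do not specify how the probabilities are obtained, so the corpus-based estimate, the 0.5 default for unseen blocks, the coarse block lookup, and all function names are assumptions for illustration only.

```python
from collections import Counter


def block_of(ch):
    """Coarse block lookup; only a few real block ranges are listed."""
    cp = ord(ch)
    if cp <= 0x007F:
        return "Basic Latin"
    if 0x0370 <= cp <= 0x03FF:
        return "Greek and Coptic"
    if 0x0400 <= cp <= 0x04FF:
        return "Cyrillic"
    return "Other"


def fit_same_block_probs(corpus_words):
    """Estimate P(next char is in the same block | block of current char)."""
    same, total = Counter(), Counter()
    for word in corpus_words:
        for cur, nxt in zip(word, word[1:]):
            b = block_of(cur)
            total[b] += 1
            same[b] += int(block_of(nxt) == b)
    return {b: same[b] / total[b] for b in total}


def pair_probs(word, probs, default=0.5):
    """Probability of each observed adjacent-character transition."""
    out = []
    for cur, nxt in zip(word, word[1:]):
        p_same = probs.get(block_of(cur), default)
        stays = block_of(nxt) == block_of(cur)
        out.append(p_same if stays else 1.0 - p_same)
    return out


def score_sum(word, probs):
    """Claim 3 style: sum of the transition probabilities."""
    return sum(pair_probs(word, probs))


def score_avg(word, probs):
    """Claim 4 style: average; single-character words count as clean."""
    ps = pair_probs(word, probs)
    return sum(ps) / len(ps) if ps else 1.0
```

The average has the practical advantage of not favoring long words, whereas the sum grows with word length and would need a length-aware threshold.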

8. A method comprising:
- retrieving, by a processor of an online system, text included in advertisement content of an advertisement (“ad”) request for presentation to a user of the online system;
- identifying, by the processor of the online system, one or more words included in the advertisement content;
- identifying a Unicode block associated with each of one or more characters in each of the identified one or more words, each of the one or more characters being associated with a range of characters that comprise to a Unicode block of a plurality of Unicode blocks;
- scoring, by the processor of the online system, each word from the identified one or more words by:
  - determining, for each of the identified one or more words, a most common Unicode block associated with the one or more characters in the word;
  - determining a conditional probability of the one or more characters being included in the word belonging to a specific Unicode block based at least in part on a number of characters in the word and a number of characters in the word associated with the most common Unicode block associated with the characters in the word; and
  - determining a score for the word based at least in part on the determined conditional probability, wherein a word of the one or more words comprising characters associated with a same Unicode block having a higher determined score relative to a word comprising characters associated with two or more different Unicode blocks;
- generating, by the processor of the online system, a combined score for the advertisement based on the determined scores of each word of the one or more words;
- determining, by the processor of the online system, that the advertisement content includes offensive content based at least in part on the combined score for the advertisement being less than a threshold value; and
- responsive to the combined score for the advertisement being less than the threshold value, determining, by the processor of the online system, that the advertisement content is ineligible for presentation to the user of the online system based at least in part on the determination that the advertisement content includes offensive content.
- Dependent claims (9, 10, 11):
  - 9. The method of claim 8, wherein determining the conditional probability further comprises:
    - determining probabilities of each character in the word being followed by a subsequent character in the one or more characters included in the word being associated with a same Unicode block.
  - 10. The method of claim 8, wherein the threshold value is based at least in part on a number of the identified one or more words in the text.
  - 11. The method of claim 8, wherein determining the Unicode block associated with each character included in each of the one or more words comprises:
    - analyzing a hexadecimal value used to encode each of the characters in each of the one or more words, each hexadecimal value corresponding to a Unicode block.
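
The lookup in claims 7 and 11 (mapping a character's hexadecimal value to a Unicode block) can be sketched as a range search over a sorted block table. The Python example below is a minimal illustration, assuming a hand-written table that lists only a handful of real block ranges; the names `BLOCK_RANGES`, `block_for_code_point`, and `describe` are hypothetical.

```python
from bisect import bisect_right

# A few real Unicode block ranges, sorted by start; not a complete table.
BLOCK_RANGES = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
]
STARTS = [lo for lo, _, _ in BLOCK_RANGES]


def block_for_code_point(cp):
    """Binary-search the block whose range contains the code point."""
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        lo, hi, name = BLOCK_RANGES[i]
        if cp <= hi:
            return name
    return "Unlisted"


def describe(word):
    """Pair each character's hexadecimal code point with its block name."""
    return [(f"U+{ord(ch):04X}", block_for_code_point(ord(ch))) for ch in word]
```

With this table, the Latin letter `a` (U+0061) and the visually similar Cyrillic letter `а` (U+0430) resolve to different blocks, which is the distinction the per-word scoring relies on.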

12. A computer program product comprising a non-transitory computer-readable storage medium having instructions encoded thereon that, when executed by a processor, cause the processor to:
- retrieve text included in advertisement content of an advertisement (“ad”) request for presentation to a user of an online system;
- identify one or more words included in the advertisement content;
- identify a Unicode block associated with each of one or more characters in each of the identified one or more words, each of the one or more characters being associated with a range of characters that comprise to a Unicode block of a plurality of Unicode blocks;
- score each word from the identified one or more words by:
  - determining, for each of the identified one or more words, a most common Unicode block associated with the one or more characters in the word;
  - determining a conditional probability of the one or more characters being included in the word belonging to a specific Unicode block based at least in part on a number of characters in the word and a number of characters in the word associated with the most common Unicode block associated with the characters in the word; and
  - determining the score associated with the word based at least in part on the determined conditional probability, wherein a word of the one or more words comprising characters associated with a same Unicode block having a higher determined score relative to a word comprising characters associated with two or more different Unicode blocks;
- generate a combined score for the advertisement based on the determined scores of each word of the one or more words;
- determine that the advertisement content includes offensive content based at least in part on the combined score for the advertisement being less than a threshold value; and
- responsive to the combined score for the advertisement being less than the threshold value, determine that the advertisement content is ineligible for presentation to the user of the online system based at least in part on the determination that the advertisement content includes offensive content.
- Dependent claims (13, 14):
  - 13. The computer program product of claim 12, wherein determining the conditional probability further comprises:
    - determining probabilities of each character in the word being followed by a subsequent character in the one or more characters included in the word being associated with a same Unicode block.
  - 14. The computer program product of claim 12, wherein the threshold value is based at least in part on a number of the identified one or more words in the text.
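
Claims 5, 10, and 14 tie the threshold to the number of identified words without saying how. One simple way to do that, sketched below in Python, is to combine word scores by summation and scale a per-word floor by the word count. The summation, the 0.9 floor, and the function names are assumptions chosen for the example rather than anything stated in the claims.

```python
def threshold_for(num_words, per_word_floor=0.9):
    """Threshold that grows linearly with the number of words (assumed form)."""
    return per_word_floor * num_words


def is_ineligible(word_scores, per_word_floor=0.9):
    """Compare the summed word scores against the word-count-based threshold."""
    combined = sum(word_scores)
    return combined < threshold_for(len(word_scores), per_word_floor)
```

Scaling the threshold this way keeps a long advertisement from passing merely because it accumulates score over many words.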

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Schroeder, Andrew Joseph; Dowling, Benjamin Mark
Primary Examiner(s): Pyo, Monica M
Application Number: US14/450,184
Publication Number: US 20160034950A1
Time in Patent Office: 1,901 Days
Field of Search: 707731, 707748, 715773, 709219
CPC Class Codes:
G06F 16/63   Querying
G06Q 30/0248   Avoiding fraud
H04L 63/1441   Countermeasures against mal...
H04L 63/145   the attack involving the pr...
