IDENTIFYING MALICIOUS QUERIES

US 20110283360A1
Filed: 05/17/2010
Published: 11/17/2011
Est. Priority Date: 05/17/2010
Status: Active Grant

Abstract

A framework identifies malicious queries contained in search logs to uncover relationships between the malicious queries and the potential attacks launched by attackers submitting the malicious queries. A small seed set of malicious queries may be used to identify an IP address in the search logs that submitted the malicious queries. The seed set may be expanded by examining all queries in the search logs submitted by the identified IP address. Regular expressions may be generated from the expanded set of queries and used for detecting yet new malicious queries. Upon identifying the malicious queries, the framework may be used to detect attacks on vulnerable websites, spamming attacks, and phishing attacks.
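The seed-expansion step described in the abstract can be illustrated with a minimal sketch. The `LogEntry` type, the `expand_seed_queries` function, and the sample log entries are hypothetical names and data chosen for illustration; the exact-match comparison against the seed set is an assumption, not something the abstract specifies.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class LogEntry(NamedTuple):
    """Hypothetical, reduced view of one search log record."""
    ip: str
    query: str


def expand_seed_queries(
    logs: Iterable[LogEntry], seed_queries: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (suspect IPs, expanded query set) for a seed set.

    Assumption: a log query "matches" a seed only when the strings are equal.
    """
    logs = list(logs)
    seeds = set(seed_queries)
    # Step 1: find every IP that submitted at least one seed query.
    suspect_ips = {entry.ip for entry in logs if entry.query in seeds}
    # Step 2: collect all queries those IPs submitted.
    expanded = {entry.query for entry in logs if entry.ip in suspect_ips}
    return suspect_ips, expanded


# Illustrative data only; the addresses come from documentation ranges.
logs = [
    LogEntry("203.0.113.7", "inurl:index.php?id="),
    LogEntry("203.0.113.7", "inurl:view.php?id="),
    LogEntry("198.51.100.2", "weather today"),
]
suspect_ips, expanded = expand_seed_queries(logs, ["inurl:index.php?id="])
```

In this sketch the second query from the suspect address enters the expanded set even though it was not in the seed set, which is the effect the abstract describes.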

20 Claims

1. A method of identifying malicious queries, comprising:
- applying a set of seed malicious queries to search logs to extract an Internet protocol (IP) address associated with a query in the search logs that matches one of the set of seed malicious queries;
  
  creating an expanded malicious query set by applying the IP address to the search logs to identify other queries submitted from the IP address;
  
  capturing variations of queries in the expanded malicious query set that are contained in the search logs;
  
  extracting an enlarged set of malicious queries from the search logs; and
  
  outputting the enlarged set of malicious queries and the IP address.
  - 2. The method of claim 1, further comprising:
    - generating regular expressions at a regular expression generator;
      
      extracting malicious queries from the search logs that match the regular expressions to determine the enlarged set of malicious queries; and
      
      outputting the regular expressions.
  - 3. The method of claim 2, further comprising eliminating redundant regular expressions that match similar queries in the search logs.
  - 4. The method of claim 2, further comprising performing an attack analysis using the regular expressions and the enlarged set of malicious queries to determine a type of attack associated with the enlarged set of malicious queries.
  - 5. The method of claim 4, further comprising analyzing the regular expressions to determine the type of attack as one of an exploitation of a website vulnerability, a spamming attack, or a phishing attack.
  - 6. The method of claim 1, further comprising profiling a behavior associated with the IP address to determine if the IP address is associated with a proxy.
  - 7. The method of claim 6, further comprising:
    - defining a granularity to be applied to a geographic profile associated with the behavior;
      
      defining popular queries within the granularity; and
      
      determining a difference between the IP address and the geographic profile.
  - 8. The method of claim 1, further comprising identifying previously unknown malicious queries within the enlarged set of malicious queries.
  - 9. The method of claim 1, further comprising iteratively processing by feeding-back the enlarged set of malicious queries as the set of seed malicious queries.
  - 10. The method of claim 1, further comprising verifying by examining features of the enlarged set of malicious queries to determine if the enlarged set of malicious queries were generated by a script or generated having botnet group properties.
  - 11. The method of claim 1, further comprising providing information regarding the malicious queries to security applications for remedial actions to be taken by vulnerable servers.
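Claims 2 and 3 can be illustrated with a rough sketch of generating regular expressions, dropping redundant ones, and applying the rest to the logs. The claims do not say how the regular expression generator works, so `generalize` below is a deliberately simple stand-in (it only replaces digit runs with `\d+`), and treating two expressions as redundant when they match the same log queries is likewise an assumption. All function names and sample queries are hypothetical.

```python
import re
from typing import Iterable, List, Set, Tuple


def generalize(query: str) -> str:
    """Toy stand-in for a regex generator: escape the query literally,
    but replace each run of digits with \\d+."""
    parts = re.split(r"(\d+)", query)
    # With one capture group, odd indices hold the captured digit runs.
    return "".join(
        r"\d+" if i % 2 else re.escape(part) for i, part in enumerate(parts)
    )


def drop_redundant(patterns: Iterable[str], log_queries: List[str]) -> List[str]:
    """Keep one pattern per distinct set of matched log queries (assumption)."""
    kept = {}
    for pattern in sorted(patterns):
        matched = frozenset(q for q in log_queries if re.fullmatch(pattern, q))
        kept.setdefault(matched, pattern)
    return sorted(kept.values())


def enlarge(
    log_queries: List[str], expanded_queries: Iterable[str]
) -> Tuple[List[str], Set[str]]:
    """Generate patterns from the expanded set and apply them to the logs."""
    patterns = drop_redundant({generalize(q) for q in expanded_queries}, log_queries)
    enlarged = {
        q for q in log_queries if any(re.fullmatch(p, q) for p in patterns)
    }
    return patterns, enlarged


# Illustrative data only.
log_queries = [
    "index.php?id=12",
    "index.php?id=977",
    "view.php?item=3",
    "weather today",
]
patterns, enlarged = enlarge(log_queries, ["index.php?id=12", "index.php?id=34"])
```

Here both expanded queries generalize to the same expression, and that expression pulls in a log query that neither the seed set nor the expanded set contained. Feeding `enlarged` back in as the next seed set would correspond to the iteration in claim 9.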

12. A system for identifying malicious queries, comprising:
- a storage server that stores a plurality of search logs, the search logs including records each containing a query, a time at which the query was issued, a set of results returned, a user agent, and an IP (Internet protocol) address that issued the query;
  
  a proxy filter that profiles behaviors associated with the IP address to determine if the IP address is associated with a proxy; and
  
  a regular expression generator that generates regular expressions that are applied to the search logs, wherein a set of seed malicious queries are input to the storage server to extract the IP address from the search logs that submitted a query that matches one of the set of seed malicious queries to create an expanded query set that is used by the regular expression generator to generate the regular expressions, and wherein the regular expressions are applied to the search logs to output an enlarged set of malicious queries.
  - 13. The system of claim 12, wherein the expanded query set is generated by applying the IP address to the search logs to identify other queries submitted from the IP address.
  - 14. The system of claim 12, wherein the regular expression generator determines a score that measures the likelihood that a particular regular expression matches a random string, and wherein the regular expression generator discards the regular expressions that exceed the score.
  - 15. The system of claim 12, further comprising a feedback loop where the enlarged set of malicious queries is iteratively provided as the set of seed malicious queries.
  - 16. The system of claim 12, further comprising an attack analysis engine that uses the regular expressions and the enlarged set of malicious queries to determine a type of attack associated with the enlarged set of malicious queries, wherein the type of attack is an exploitation of a website vulnerability, a spamming attack, or a phishing attack.
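The record layout recited in claim 12 and the geographic profiling of claims 6 and 7 might be sketched as follows. The field names, the `popular_queries` and `profile_difference` helpers, and the choice of "fraction of an IP's queries that fall outside the region's popular queries" as the difference measure are all assumptions made for illustration. How the difference is thresholded to label an IP address as a proxy is not stated in the claims and is left open here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List, Set, Tuple


@dataclass(frozen=True)
class SearchLogRecord:
    """One search log record with the five fields listed in claim 12."""
    query: str
    issued_at: datetime
    results: Tuple[str, ...]
    user_agent: str
    ip: str


def popular_queries(records: List[SearchLogRecord], top_n: int) -> Set[str]:
    """Most frequent queries among records assumed to share one region."""
    counts = Counter(r.query for r in records)
    return {query for query, _ in counts.most_common(top_n)}


def profile_difference(
    records: List[SearchLogRecord], ip: str, popular: Set[str]
) -> float:
    """Fraction of the IP's queries that are not popular in its region."""
    own = [r.query for r in records if r.ip == ip]
    if not own:
        return 0.0
    return sum(1 for q in own if q not in popular) / len(own)


def _rec(ip: str, query: str) -> SearchLogRecord:
    # Helper for the example; timestamp, results, and agent are placeholders.
    return SearchLogRecord(query, datetime(2010, 5, 17), (), "example-agent", ip)


# Illustrative records, assumed to be pre-filtered to one region (granularity).
region_records = [
    _rec("198.51.100.1", "news"),
    _rec("198.51.100.2", "news"),
    _rec("198.51.100.2", "weather"),
    _rec("198.51.100.3", "news"),
    _rec("198.51.100.3", "weather"),
    _rec("203.0.113.7", "inurl:admin.php"),
    _rec("203.0.113.7", "inurl:login.php"),
]
popular = popular_queries(region_records, top_n=2)
outlier = profile_difference(region_records, "203.0.113.7", popular)
typical = profile_difference(region_records, "198.51.100.3", popular)
```

In this toy data the address that issues only unusual queries has the largest possible difference from the regional profile, while an address whose queries mirror the region has none.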

17. A method for identifying attack-related queries made to a search engine, comprising:
- extracting a set of queries associated with suspect IP (Internet protocol) address from a plurality of search logs stored in a storage server;
  
  generating a plurality of regular expressions at a regular expression generator that capture variations of the set of queries, the regular expressions being assigned a score that measures the likelihood that a particular regular expression matches a random string;
  
  discarding the regular expressions that exceed the score;
  
  applying the regular expressions to the search logs to extract the attack-related queries from the search logs; and
  
  outputting the attack-related queries, the regular expressions, and the IP address.
  - 18. The method of claim 17, further comprising feeding back the attack-related queries to extract a subsequent suspect IP address or a subsequent set of attack-related queries from the search logs.
  - 19. The method of claim 17, further comprising profiling a behavior associated with the IP address to determine if the IP address is associated with a proxy.
  - 20. The method of claim 17, further comprising performing an attack analysis using the regular expressions and the attack-related queries to determine a type of attack.
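The scoring and discarding steps of claim 17 can be approximated with a small sketch. The claim does not state how the score is computed, so the version below estimates it empirically by testing each expression against randomly generated strings; that sampling approach, the alphabet, the string length, and the cutoff value are all assumptions, and "exceed the score" is read here as exceeding a chosen threshold.

```python
import random
import re
import string
from typing import List


def random_match_score(
    pattern: str, trials: int = 2000, length: int = 12, seed: int = 0
) -> float:
    """Estimate how often the pattern matches somewhere in a random string."""
    rng = random.Random(seed)  # fixed seed keeps the estimate repeatable
    alphabet = string.ascii_lowercase + string.digits + ".?=/ "
    compiled = re.compile(pattern)
    hits = 0
    for _ in range(trials):
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        if compiled.search(candidate):
            hits += 1
    return hits / trials


def keep_specific(patterns: List[str], max_score: float) -> List[str]:
    """Discard patterns whose estimated score exceeds the threshold."""
    return [p for p in patterns if random_match_score(p) <= max_score]


# Illustrative patterns: one specific, two overly general.
patterns = [r"index\.php\?id=\d+", r"\d+", r".*"]
kept = keep_specific(patterns, max_score=0.05)
```

Overly general expressions such as `.*` match nearly any random string and are discarded, while an expression tied to a specific query shape survives. This keeps benign queries from being swept into the attack-related set.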

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: John, John Payyappillil; Abadi, Martin; Yu, Fang; Xie, Yinglian

Granted Patent: US 8,495,742 B2
US Class Current: 726/24
CPC Class Codes:
H04L 63/0227 Filtering policies mail mes...
H04L 63/1416 Event detection, e.g. attac...
