IDENTIFYING MALICIOUS QUERIES
First Claim
1. A method of identifying malicious queries, comprising:
applying a set of seed malicious queries to search logs to extract an Internet protocol (IP) address associated with a query in the search logs that matches one of the set of seed malicious queries;
creating an expanded malicious query set by applying the IP address to the search logs to identify other queries submitted from the IP address;
capturing variations of queries in the expanded malicious query set that are contained in the search logs;
extracting an enlarged set of malicious queries from the search logs; and
outputting the enlarged set of malicious queries and the IP address.
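The steps of claim 1 can be illustrated with a small sketch. It makes three assumptions that the claim does not: each log record carries only a query string and an IP address, a "match" against a seed is an exact case-insensitive string comparison, and variations are captured by a deliberately simple rule that lets any run of digits vary. `LogRecord`, `generalize`, and `identify_malicious` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical minimal log record: only the query text and submitting IP."""
    query: str
    ip: str


def generalize(query):
    """Hypothetical variation capture: escape the query, then let digit runs vary."""
    escaped = re.escape(query)
    return re.compile(re.sub(r"\d+", r"\\d+", escaped), re.IGNORECASE)


def identify_malicious(logs, seeds):
    seed_set = {s.lower() for s in seeds}
    # Step 1: IPs that submitted a seed query (exact, case-insensitive match assumed).
    suspect_ips = {r.ip for r in logs if r.query.lower() in seed_set}
    # Step 2: expanded set = every query submitted from those IPs.
    expanded = {r.query for r in logs if r.ip in suspect_ips}
    # Step 3: capture variations of the expanded queries as compiled patterns.
    patterns = [generalize(q) for q in expanded]
    # Step 4: enlarged set = every logged query that fully matches any pattern.
    enlarged = {r.query for r in logs if any(p.fullmatch(r.query) for p in patterns)}
    return enlarged, suspect_ips


logs = [
    LogRecord('inurl:"index.php?id=1"', "10.0.0.1"),
    LogRecord('"powered by phpBB" 2', "10.0.0.1"),
    LogRecord("weather seattle", "10.0.0.2"),
    LogRecord('"powered by phpBB" 3', "10.0.0.3"),
    LogRecord('inurl:"index.php?id=7"', "10.0.0.4"),
]
enlarged, ips = identify_malicious(logs, ['inurl:"index.php?id=1"'])
print(sorted(enlarged), ips)
```

In this toy data, the `id=7` query comes from an IP that never submitted a seed, yet it lands in the enlarged set because it is a variation of a query from the suspect IP.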
Abstract
A framework identifies malicious queries contained in search logs to uncover relationships between those queries and the potential attacks launched by the attackers who submit them. A small seed set of malicious queries may be used to identify an IP address in the search logs from which the malicious queries were submitted. The seed set may then be expanded by examining all queries in the search logs submitted from the identified IP address. Regular expressions may be generated from the expanded set of queries and used to detect additional malicious queries that were not in the seed set. Once the malicious queries are identified, the framework may be used to detect attacks on vulnerable websites, spamming attacks, and phishing attacks.
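The abstract says regular expressions are generated from the expanded query set but not how. One simple possibility, shown here as an assumption rather than the framework's actual generator, aligns queries that have the same number of whitespace-separated tokens, keeps tokens that agree across every query as literals, and turns positions that differ into a wildcard. `generalize_group` is a hypothetical helper.

```python
import re


def generalize_group(queries):
    """Hypothetical generator: one regex from queries with equal token counts.

    Tokens identical in every query stay literal; differing positions
    become a non-whitespace wildcard.
    """
    token_lists = [q.split() for q in queries]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("queries must be non-empty and share a token count")
    parts = []
    for column in zip(*token_lists):
        if len(set(column)) == 1:
            parts.append(re.escape(column[0]))  # token agrees everywhere: keep literal
        else:
            parts.append(r"\S+")  # token varies: allow any single token
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


pattern = generalize_group(["powered by phpBB forum", "powered by vBulletin forum"])
print(pattern.pattern)
print(bool(pattern.fullmatch("powered by SMF forum")))
```

The resulting pattern also matches a forum product that never appeared in the expanded set, which is the sense in which such expressions can surface new malicious queries.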
20 Claims
1. A method of identifying malicious queries, comprising:
applying a set of seed malicious queries to search logs to extract an Internet protocol (IP) address associated with a query in the search logs that matches one of the set of seed malicious queries;
creating an expanded malicious query set by applying the IP address to the search logs to identify other queries submitted from the IP address;
capturing variations of queries in the expanded malicious query set that are contained in the search logs;
extracting an enlarged set of malicious queries from the search logs; and
outputting the enlarged set of malicious queries and the IP address.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
12. A system for identifying malicious queries, comprising:
a storage server that stores a plurality of search logs, the search logs including records each containing a query, a time at which the query was issued, a set of results returned, a user agent, and an IP (Internet protocol) address that issued the query;
a proxy filter that profiles behaviors associated with the IP address to determine whether the IP address is associated with a proxy; and
a regular expression generator that generates regular expressions that are applied to the search logs,
wherein a set of seed malicious queries is input to the storage server to extract, from the search logs, the IP address that submitted a query matching one of the set of seed malicious queries, to create an expanded query set that is used by the regular expression generator to generate the regular expressions, and
wherein the regular expressions are applied to the search logs to output an enlarged set of malicious queries.
Dependent claims: 13, 14, 15, 16.
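Claim 12 lists the fields of each search-log record and a proxy filter that profiles an IP address's behavior. The sketch below models the record as a dataclass and uses one assumed heuristic for the filter: an IP seen with at least `min_user_agents` distinct user agents is treated as a likely proxy. `SearchLogRecord`, `find_proxies`, and the threshold are hypothetical; the claim does not say which behaviors are profiled.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class SearchLogRecord:
    """Hypothetical record holding the fields listed in claim 12."""
    query: str
    issued_at: datetime       # time at which the query was issued
    results: Tuple[str, ...]  # set of results returned (e.g., URLs)
    user_agent: str
    ip: str                   # IP address that issued the query


def find_proxies(records, min_user_agents=3):
    """Hypothetical proxy heuristic: flag IPs seen with many distinct user agents."""
    agents_by_ip = defaultdict(set)
    for record in records:
        agents_by_ip[record.ip].add(record.user_agent)
    return {ip for ip, agents in agents_by_ip.items() if len(agents) >= min_user_agents}


t0 = datetime(2010, 1, 1)
records = [
    SearchLogRecord("cheap flights", t0, (), f"Browser/{i}", "192.0.2.1")
    for i in range(5)
]
records.append(SearchLogRecord('inurl:"index.php?id=1"', t0, (), "Bot/1.0", "192.0.2.2"))
proxies = find_proxies(records)
print(proxies)
```

One plausible reason for such a filter is that a proxy or shared gateway mixes many users' traffic, so expanding the seed set through every query from that IP would sweep in benign searches.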
17. A method for identifying attack-related queries made to a search engine, comprising:
extracting a set of queries associated with a suspect IP (Internet protocol) address from a plurality of search logs stored in a storage server;
generating, at a regular expression generator, a plurality of regular expressions that capture variations of the set of queries, each regular expression being assigned a score that measures the likelihood that it matches a random string;
discarding the regular expressions whose score exceeds a threshold;
applying the regular expressions to the search logs to extract the attack-related queries; and
outputting the attack-related queries, the regular expressions, and the IP address.
Dependent claims: 18, 19, 20.
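Claim 17 scores each regular expression by how likely it is to match a random string and discards those whose score is too high, since an overly general pattern would flag ordinary queries. The claim does not say how the score is computed; this sketch substitutes a Monte Carlo estimate over random strings drawn from a small assumed alphabet. `random_match_score`, `filter_regexes`, the alphabet, and the 0.01 threshold are all hypothetical.

```python
import random
import re
import string

# Hypothetical alphabet for random strings: lowercase letters, digits, some punctuation.
ALPHABET = string.ascii_lowercase + string.digits + " .:?=/"


def random_match_score(pattern, trials=2000, max_len=20, seed=0):
    """Estimate the fraction of random strings that fully match `pattern`."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    compiled = re.compile(pattern)
    hits = 0
    for _ in range(trials):
        length = rng.randint(1, max_len)
        candidate = "".join(rng.choice(ALPHABET) for _ in range(length))
        if compiled.fullmatch(candidate):
            hits += 1
    return hits / trials


def filter_regexes(patterns, threshold=0.01):
    """Keep only patterns whose random-match score does not exceed `threshold`."""
    return [p for p in patterns if random_match_score(p) <= threshold]


candidates = [r"inurl:index\.php\?id=\d+", r".*", r"\S+"]
kept = filter_regexes(candidates)
print(kept)
```

`.*` matches every sampled string and `\S+` matches most of them, so both are discarded, while the specific `inurl:` pattern essentially never matches random input and is kept. An analytic estimate would avoid sampling noise, but the sampling version works for any pattern Python's `re` module accepts.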
Specification