Synopsis of a search log that respects user privacy
Abstract
Described is a technology for releasing output data that represents a search log. The data remains suitable for most data mining and analysis applications, yet is safe to publish because user privacy is preserved. The search log is processed such that a query is included only if a sufficient count of that query is present; noise may be added. The number of queries each user contributes may be limited to a maximum. The output may indicate how often each query appeared (possibly with noise added). Other output may comprise a query-action graph, a query-inaction graph and/or a query-reformulation graph, with nodes representing queries and nodes representing actions, inactions or reformulations (e.g., clicked URLs, skipped URLs, or selected related queries), and edges between nodes representing action, skip or selection counts (possibly with noise added). The output may correspond to the top results or related queries returned from a search.
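The query-count release summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, each user's contribution is capped by keeping that user's first `max_per_user` queries in log order, and the optional noise is drawn from a Laplace distribution (the abstract says only that noise may be added). A query is released only if its possibly noisy count exceeds a hypothetical `threshold` parameter standing in for the "sufficient count".

```python
import random
from collections import Counter, defaultdict


def laplace_noise(scale, rng):
    """Zero noise when scale <= 0; otherwise Laplace(0, scale) noise (assumed distribution)."""
    if scale <= 0:
        return 0.0
    # The difference of two independent exponential draws with mean `scale`
    # is Laplace(0, scale)-distributed, so it can be negative or positive.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_query_counts(log, max_per_user, threshold, noise_scale=0.0, seed=None):
    """Return {query: published_count} for queries deemed safe to publish.

    `log` is an iterable of (user_id, query) pairs in log order (assumed format).
    """
    rng = random.Random(seed)
    used = defaultdict(int)  # number of queries taken so far from each user
    counts = Counter()
    for user, query in log:
        # Cap each user's contribution; queries beyond the cap are ignored.
        if used[user] < max_per_user:
            used[user] += 1
            counts[query] += 1
    released = {}
    for query, count in counts.items():
        noisy = count + laplace_noise(noise_scale, rng)
        # Publish only queries whose (possibly noisy) count is large enough.
        if noisy > threshold:
            released[query] = noisy
    return released


log = [("u1", "weather"), ("u1", "weather"), ("u1", "news"),
       ("u2", "weather"), ("u3", "weather"), ("u3", "secret")]
print(release_query_counts(log, max_per_user=2, threshold=2))  # {'weather': 4.0}
```

Here user `u1`'s third query ("news") is dropped by the cap, and "secret" is withheld because its count falls below the threshold. With `noise_scale=0` the release is deterministic (the zero-noise case); a larger scale perturbs the published counts more, trading accuracy for privacy.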
13 Citations
19 Claims
1. In a computing environment, a method comprising:
processing a search log, including determining which queries from the search log correspond to information that is safe to publish; and
for each of the queries having information that is safe to publish, publishing the information as output data,
wherein determining which queries from the search log correspond to information that is safe to publish comprises limiting how many queries in the search log each user can contribute to a set of queries for processing,
wherein publishing the information as output data comprises outputting a query-action graph having nodes representing queries and nodes representing actions taken, with each edge between a query node and an action node having a weight that indicates how many times that action was taken following that query, wherein the weight has zero noise, a negative noise or a positive noise added thereto, or outputting a query-inaction graph having nodes representing queries and nodes representing actions skipped, with each edge between a query node and an inaction node having a weight that indicates how many times that action was not taken following that query, and wherein the weight has zero noise, a negative noise or a positive noise added thereto.
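The two graph outputs recited in claim 1 can be illustrated with the following hypothetical Python sketch. It rests on assumptions not stated in the claim: the input is a list of `(query, url, clicked)` impressions already restricted to contributions that survived the per-user limit, actions and inactions are clicked and skipped URLs (the examples the abstract gives), and the optional noise is Laplace-distributed. Each graph is stored as a map from a (query node, action or inaction node) edge to its weight; the function names are illustrative only.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Zero noise when scale <= 0; otherwise Laplace(0, scale), which may be negative or positive."""
    if scale <= 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_click_graphs(events, safe_queries, noise_scale=0.0, seed=None):
    """Build query-action and query-inaction graphs as {(query, url): weight} edge maps.

    `events` holds one (query, url, clicked) tuple per displayed result (assumed format);
    only queries in `safe_queries` contribute edges.
    """
    rng = random.Random(seed)
    actions, inactions = Counter(), Counter()
    for query, url, clicked in events:
        if query not in safe_queries:
            continue
        # A clicked result counts as an action; a displayed but unclicked one as an inaction.
        target = actions if clicked else inactions
        target[(query, url)] += 1

    def publish(edges):
        # Each edge weight is its count plus zero, negative or positive noise.
        return {edge: count + laplace_noise(noise_scale, rng) for edge, count in edges.items()}

    return publish(actions), publish(inactions)


events = [("weather", "a.example", True), ("weather", "a.example", True),
          ("weather", "b.example", False), ("secret", "c.example", True)]
action_graph, inaction_graph = build_click_graphs(events, {"weather"})
print(action_graph)    # {('weather', 'a.example'): 2.0}
print(inaction_graph)  # {('weather', 'b.example'): 1.0}
```

Setting `noise_scale=0` gives the zero-noise case the claim allows; a positive scale yields edge weights that may be perturbed up or down.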
10. In a computing environment, a system comprising:
at least one processing unit;
a transformation mechanism, implemented on the at least one processing unit, configured to process a search log, including by determining which queries from the search log correspond to information that is safe to publish, and for each of the queries having information that is safe to publish, configured to publish the information as output data,
wherein determining which queries from the search log correspond to information that is safe to publish comprises limiting how many queries in the search log each user can contribute to a set of queries for processing,
wherein publishing the information as output data comprises outputting a query-action graph having nodes representing queries and nodes representing actions taken, with each edge between a query node and an action node having a weight that indicates how many times that action was taken following that query, wherein the weight has zero noise, a negative noise or a positive noise added thereto, or outputting a query-inaction graph having nodes representing queries and nodes representing actions skipped, with each edge between a query node and an inaction node having a weight that indicates how many times that action was not taken following that query, and wherein the weight has zero noise, a negative noise or a positive noise added thereto.
18. One or more computer storage devices storing computer executable instructions, which in response to execution by a computer, cause the computer to perform steps comprising:
processing a search log, including determining which queries from the search log correspond to information that is safe to publish; and
for each of the queries having information that is safe to publish, publishing the information as output data,
wherein determining which queries from the search log correspond to information that is safe to publish comprises limiting how many queries in the search log each user can contribute to a set of queries for processing,
wherein publishing the information as output data comprises outputting a query-action graph having nodes representing queries and nodes representing actions taken, with each edge between a query node and an action node having a weight that indicates how many times that action was taken following that query, wherein the weight has zero noise, a negative noise or a positive noise added thereto, or outputting a query-inaction graph having nodes representing queries and nodes representing actions skipped, with each edge between a query node and an inaction node having a weight that indicates how many times that action was not taken following that query, and wherein the weight has zero noise, a negative noise or a positive noise added thereto.
Specification