Systems and methods for generating concept units from search queries
10 Assignments
0 Petitions
Abstract
Systems and methods for enhancing search functionality provided to a user. In certain aspects, a query processing engine automatically decomposes queries into constituent units that are related to concepts in which a user may be interested. The engine decomposes each query into one or more constituent units using statistical methods. In certain aspects, no real-world knowledge is used in determining units. In other aspects, world and content knowledge is introduced to enhance and optimize performance, for example manually by a team of one or more information engineers.
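The decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a query is split once units are known, so the greedy left-to-right longest-match strategy below is an assumption, as are the names `decompose`, `known_units`, and `max_len`.

```python
def decompose(query, units, max_len=4):
    """Split a query into constituent units by greedy longest match.

    `units` is a set of known multi-word units, each stored as a tuple of
    words. Any word not covered by a known unit becomes a one-word unit.
    Greedy matching is an illustrative assumption, not the patent's method.
    """
    words = query.lower().split()
    result = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, falling back to a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if n == 1 or candidate in units:
                result.append(" ".join(candidate))
                i += n
                break
    return result


known_units = {("new", "york"), ("new", "york", "city")}
print(decompose("cheap hotels new york city", known_units))
# ['cheap', 'hotels', 'new york city']
```

Longest match is chosen here only because it is simple to follow; a statistical scorer over candidate segmentations would fit the abstract equally well.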
355 Citations
34 Claims
1. A computer-implemented method of generating concept units from user search queries, the method comprising:
receiving a plurality of queries, each query comprising a string of one or more words;
tokenizing each query string to produce one or more tokens for each query, wherein said tokens for said queries form an initial set of units;
combining units from the initial set of units that appear adjacent each other in a query to form a second set of units;
validating the second set of units;
repeating the steps of combining and validating one or more times using the second set of units in place of the initial set of units until a convergence condition is satisfied, wherein a final set of units is formed once the convergence condition has been satisfied;
storing the final set of units to a memory;
wherein the second set of units comprises a plurality of units, and wherein validating the second set of units comprises:
detecting (a) how often each of the plurality of the units appears by itself separate from others of the plurality of units;
detecting (b) how often two or more of the plurality of units appear next to each other across an entire set of queries; and
comparing a result of (a) with a result of (b).
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 31, 32)
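The steps of claim 1 can be sketched as a short loop. This is one possible reading, not the patented implementation: the comparison of (a) with (b) is modeled as a ratio of the adjacency count to the smaller of the two unit counts, and convergence is taken to mean that no pair passes validation. The thresholds `min_ratio` and `min_count`, the cap `max_iters`, and all function names are assumptions.

```python
from collections import Counter


def tokenize(query):
    """Whitespace tokenization; the tokens form the initial units."""
    return tuple(query.lower().split())


def count_units(segmented):
    """(a): how often each unit occurs across all segmented queries."""
    return Counter(unit for q in segmented for unit in q)


def count_adjacent(segmented):
    """(b): how often two units occur next to each other across all queries."""
    return Counter(pair for q in segmented for pair in zip(q, q[1:]))


def validate(pair_counts, unit_counts, min_ratio, min_count):
    """Keep pairs seen together often relative to their rarer member."""
    valid = set()
    for (left, right), together in pair_counts.items():
        rarer = min(unit_counts[left], unit_counts[right])
        if together >= min_count and together / rarer >= min_ratio:
            valid.add((left, right))
    return valid


def merge(units, valid):
    """Combine adjacent units that form a validated pair, left to right."""
    out, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) in valid:
            out.append(units[i] + " " + units[i + 1])
            i += 2
        else:
            out.append(units[i])
            i += 1
    return tuple(out)


def generate_units(queries, min_ratio=0.6, min_count=2, max_iters=10):
    segmented = [tokenize(q) for q in queries]
    for _ in range(max_iters):
        unit_counts = count_units(segmented)
        valid = validate(count_adjacent(segmented), unit_counts,
                         min_ratio, min_count)
        if not valid:  # assumed convergence condition: nothing left to merge
            break
        segmented = [merge(q, valid) for q in segmented]
    return set(count_units(segmented))  # the final set of units


sample = ["new york hotels", "new york pizza", "new york weather",
          "cheap hotels", "pizza recipe"]
print(sorted(generate_units(sample)))
# ['cheap', 'hotels', 'new york', 'pizza', 'recipe', 'weather']
```

In the sample, "new" and "york" each occur three times and always together, so the pair is merged on the first pass; no pair reaches `min_count` on the second pass, and the loop stops.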
21. A system for generating concept units from user search queries, the system comprising:
a memory unit; and
a processing module configured to receive one or more query log files, each query log file including a plurality of queries, each query including a string of one or more words, and wherein the processing module is further configured to:
tokenize each query from the query log files to produce an initial set of units; and
thereafter, iteratively, until a convergence condition is satisfied:
combine units from the initial set of units that appear adjacent each other in a query to form a second set of units; and
validate the second set of units, wherein the second set of units is used for each iteration; and
once the convergence condition has been satisfied, store a final set of units to the memory unit;
wherein the second set of units comprises a plurality of units, and wherein validating the second set of units comprises:
detecting (a) how often each of the plurality of the units appears by itself separate from others of the plurality of units;
detecting (b) how often two or more of the plurality of units appear next to each other across an entire set of queries; and
comparing a result of (a) with a result of (b).
View Dependent Claims (22, 23, 24, 25, 26, 27, 33)
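Claim 21 adds query log files as the input. The claim does not describe the log format, so the sketch below assumes plain UTF-8 text with one query per line; `read_queries` and `initial_units` are hypothetical helper names.

```python
from collections import Counter
from pathlib import Path


def read_queries(log_paths):
    """Yield non-empty query strings from each log file.

    Assumes one query per line; real query logs often carry extra
    fields (timestamps, user ids) that would need to be parsed out.
    """
    for path in log_paths:
        with Path(path).open(encoding="utf-8") as fh:
            for line in fh:
                query = line.strip()
                if query:
                    yield query


def initial_units(log_paths):
    """Tokenize every logged query on whitespace and count each token."""
    counts = Counter()
    for query in read_queries(log_paths):
        counts.update(query.lower().split())
    return counts
```

Streaming the files line by line keeps memory use proportional to the vocabulary rather than to the size of the logs, which matters when the logs are large.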
28. A computer readable medium including code for causing a processor to generate concept units from a plurality of user search queries, each query comprising a string of one or more words wherein the code includes instructions to:
a) tokenize each query string to produce one or more tokens for each query, wherein said tokens for said queries form an initial set of units;
b) combine units from the initial set of units that appear adjacent each other in a query to form a second set of units;
c) validate the second set of units;
d) repeat b) and c) one or more times using the second set of units in place of the initial set of units until a convergence condition is satisfied, wherein a final set of units is formed once the convergence condition has been satisfied; and
store the final set of units to a memory module;
wherein the second set of units comprises a plurality of units, and wherein validating the second set of units comprises:
detecting (e) how often each of the plurality of the units appears by itself separate from others of the plurality of units;
detecting (f) how often two or more of the plurality of units appear next to each other across an entire set of queries; and
comparing a result of (e) with a result of (f).
View Dependent Claims (29, 30, 34)
Specification