System and Method for Generating Queries
Abstract
The system utilizes (gets) training data that comprises a plurality of training documents. Each of the plurality of training documents comprises a training token(s). The plurality of training documents are clustered into a plurality of clusters based on at least one training token in the plurality of training documents. Each cluster contains at least one training document. A Boolean query(s) is generated for a cluster based on an occurrence of the at least one training token in a training document in the plurality of training documents. The system gets production data that comprises a plurality of production documents. Each of the plurality of production documents comprises a production token(s). The Boolean query(s) is then executed on the production data.
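The flow described in the abstract (get training data, cluster it, generate a Boolean query per cluster, execute the query on production data) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: whitespace tokenization, a greedy Jaccard-overlap clustering with a hypothetical `threshold` parameter, and a query formed by AND-ing the tokens that occur in every training document of a cluster.

```python
def tokenize(text):
    """Split a document into a set of lowercase tokens (assumed tokenization)."""
    return set(text.lower().split())


def cluster_documents(docs, threshold=0.25):
    """Greedy single-pass clustering by Jaccard overlap with each cluster's first document.

    This is an assumed stand-in; the source does not name a clustering algorithm.
    """
    clusters = []  # each cluster is a list of token sets
    for doc in docs:
        tokens = tokenize(doc)
        for cluster in clusters:
            seed = cluster[0]
            union = tokens | seed
            if union and len(tokens & seed) / len(union) >= threshold:
                cluster.append(tokens)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([tokens])
    return clusters


def generate_query(cluster):
    """Build an AND query from the tokens occurring in every document of the cluster."""
    return sorted(set.intersection(*cluster))


def execute_query(query, production_docs):
    """Return the production documents that contain every token of the query."""
    return [d for d in production_docs if query and all(t in tokenize(d) for t in query)]


training = [
    "reset my password",
    "password reset failed",
    "refund for order",
    "order refund request",
]
production = [
    "please reset the password now",
    "where is my order",
    "refund my order please",
]

clusters = cluster_documents(training)
queries = [generate_query(c) for c in clusters]
hits = [execute_query(q, production) for q in queries]
```

With this toy data the two queries are `password AND reset` and `order AND refund`, and each matches one production document.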
47 Claims
1. A system for generating a Boolean query comprising:
a. a data manager configured to get training data and production data, wherein the training data comprises a plurality of training documents and each of the plurality of training documents comprises at least one training token, and wherein the production data comprises a plurality of production documents and each of the plurality of production documents comprises at least one production token;
b. a clustering manager configured to cluster the plurality of training documents into a plurality of clusters based on at least one training token in the plurality of training documents, wherein each cluster comprises at least one training document; and
c. a query manager configured to generate the Boolean query for a cluster of the plurality of clusters based on an occurrence of the at least one training token in at least one training document of the plurality of training documents, and to execute the Boolean query on the plurality of production documents in the production data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for generating a Boolean query comprising:
a. a data manager configured to get training data and production data, wherein the training data comprises a plurality of training documents and each of the plurality of training documents comprises at least one training token, wherein the production data comprises a plurality of production documents and each of the plurality of production documents comprises at least one production token, and wherein data manager is configured to clean the training data, and identify at least one salient token from the at least one training token in each of the plurality of training documents;
b. a clustering manager configured to cluster the plurality of training documents into a plurality of clusters based on at least one training token in the plurality of training documents or the at least one salient token, wherein each cluster comprises at least one training document; and
c. a query manager configured to generate the Boolean query for a cluster of the plurality of clusters based on an occurrence of the at least one salient token in at least one training document of the plurality of training documents, and to execute the Boolean query on the plurality of production documents in the production data.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
22. A computer implemented method for generating a Boolean query comprising:
a. getting training data, wherein the training data comprises a plurality of training documents and wherein each of the plurality of training documents comprises at least one training token;
b. clustering the plurality of training documents into a plurality of clusters based on at least one training token of the plurality of training documents, wherein each cluster comprises at least one training document;
c. generating the Boolean query for a cluster of the plurality of clusters based on an occurrence of the at least one training token in at least one training document of the plurality of training documents;
d. getting production data, wherein the production data comprises a plurality of production documents and wherein each of the plurality of production documents comprises at least one production token; and
e. executing the Boolean query on the plurality of production documents in the production data.
Dependent claims: 23, 24, 25, 26, 27, 28, 29, 30, 31, 43
32. A computer implemented method for generating a Boolean query comprising:
a. getting training data, wherein the training data comprises a plurality of training documents and wherein each of the plurality of training documents comprises at least one training token;
b. cleaning the training data;
c. identifying at least one salient token from the at least one training token in each of the plurality of training documents;
d. clustering the plurality of training documents into a plurality of clusters based on at least one training token in the plurality of training documents or the at least one salient token, wherein each cluster comprises at least one training document;
e. generating the Boolean query for a cluster of the plurality of clusters based on an occurrence of at least one salient token in at least one training document of the plurality of training documents;
f. getting production data, wherein the production data comprises a plurality of production documents and wherein each of the plurality of production documents comprises at least one production token; and
g. executing the Boolean query on the plurality of production documents in the production data.
Dependent claims: 33, 34, 35, 36, 37, 38, 39, 40, 41, 42
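The cleaning and salient-token steps recited in claim 32 can be sketched as follows. The claim text shown here does not say how cleaning is done or how salience is measured, so this sketch assumes lowercasing, punctuation stripping, and a small hypothetical stop-word list for cleaning, and a TF-IDF style score for salience. The names `STOP_WORDS`, `clean`, `salient_tokens`, and `top_k` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would likely use a larger one.
STOP_WORDS = {"a", "an", "the", "is", "my", "to", "for", "of", "and", "i"}


def clean(text):
    """Lowercase, drop punctuation, and remove stop words (assumed cleaning rules)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def salient_tokens(docs_tokens, top_k=2):
    """Pick the top_k tokens per document by a smoothed TF-IDF score (assumed measure)."""
    n = len(docs_tokens)
    # Document frequency: in how many documents each token appears.
    df = Counter(t for tokens in docs_tokens for t in set(tokens))
    result = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        scores = {t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf}
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_k])
    return result


raw_docs = [
    "The printer is jammed, the PRINTER!",
    "My printer is offline",
    "Refund for my order",
]
cleaned = [clean(d) for d in raw_docs]
salient = salient_tokens(cleaned)
```

In this toy run "jammed" outranks "printer" in the first document even though "printer" appears twice, because "printer" also occurs in a second document and is therefore less distinctive.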
44. An apparatus for generating a Boolean query comprising:
a. means for getting training data, wherein the training data comprises a plurality of training documents and wherein each of the plurality of training documents comprises at least one training token;
b. means for clustering the plurality of training documents into a plurality of clusters based on at least one training token in the plurality of training documents, wherein each cluster comprises at least one training document;
c. means for generating the Boolean query for a cluster in the plurality of clusters based on an occurrence of the at least one training token in at least one training document in the plurality of training documents;
d. means for getting production data, wherein the production data comprises a plurality of production documents and wherein each of the plurality of production documents comprises at least one production token; and
e. means for executing the Boolean query on the plurality of production documents in the production data.
45. An apparatus for generating a Boolean query:
a. means for getting training data, wherein the training data comprises a plurality of training documents and wherein each of the plurality of training documents comprises at least one training token;
b. means for cleaning the training data;
c. means for identifying at least one salient token from the at least one training token in each of the plurality of training documents;
d. means for clustering the plurality of training documents into a plurality of clusters based on at least one training token in the plurality of training documents or the at least one salient token, wherein each cluster comprises at least one training document;
e. means for generating the Boolean query for a cluster in the plurality of clusters based on an occurrence of the at least one salient token in at least one training document in the plurality of training documents;
f. means for getting production data, wherein the production data comprises a plurality of production documents and wherein each of the plurality of production documents comprises at least one production token; and
g. means for executing the Boolean query on the plurality of production documents in the production data.
46. A system for generating a Boolean query comprising:
a. a data manager configured to get training data and production data, wherein the training data comprises a plurality of training documents and each of the plurality of training documents comprises at least one training token, wherein the production data comprises a plurality of production documents and each of the plurality of production documents comprises at least one production token, and wherein the data manager is configured to clean the training data, identify a plurality of salient tokens from the at least one training token in each of the plurality of training documents, calculate a salient token/cluster weight matrix for the plurality of salient tokens for the cluster, to rank the plurality of salient tokens for the cluster, and to select a list of the top N salient tokens from the plurality of training tokens for the cluster;
b. a clustering manager configured to cluster the plurality of training documents into a plurality of clusters based on at least one training token in the plurality of training documents, wherein each cluster comprises at least one training document; and
c. a query manager configured to generate the Boolean query for a cluster in the plurality of clusters based on an occurrence of the plurality of salient tokens in at least one training document in the plurality of training documents, and to execute the Boolean query on the plurality of production documents in the production data.
47. A computer implemented method for generating a Boolean query comprising:
a. getting training data, wherein the training data comprises a plurality of training documents and wherein each of the plurality of training documents comprises at least one training token;
b. cleaning the training data;
c. identifying a plurality of salient tokens from the at least one training token in each of the plurality of training documents;
d. clustering the plurality of training documents into a plurality of clusters based on at least one training token in the plurality of training documents, wherein each cluster comprises at least one training document;
e. calculating a salient token/cluster weight matrix for the plurality of salient tokens for the cluster;
k. ranking the plurality of salient tokens for the cluster; and
l. selecting a list of the top N salient tokens from the plurality of salient tokens for the cluster;
m. generating the Boolean query for a cluster of the plurality of clusters based on an occurrence of the plurality of salient tokens in the at least one training document in the plurality of training documents;
n. getting production data, wherein the production data comprises a plurality of production documents and wherein each of the plurality of production documents comprises at least one production token; and
o. executing the Boolean query on the plurality of production documents in the production data.
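Claims 46 and 47 recite a salient token/cluster weight matrix, a ranking of salient tokens per cluster, and selection of the top N tokens before the query is generated. The claims as shown do not define the weight, so the sketch below assumes one simple choice: the fraction of a cluster's training documents that contain the token. Joining the top N tokens with OR is likewise an assumption, and all function names are hypothetical.

```python
def weight_matrix(clusters, salient):
    """Map each salient token to a list of weights, one per cluster.

    Assumed weight: share of the cluster's documents that contain the token.
    Each cluster is a non-empty list of token sets.
    """
    return {
        tok: [sum(tok in doc for doc in cluster) / len(cluster) for cluster in clusters]
        for tok in salient
    }


def top_n_tokens(matrix, cluster_index, n):
    """Rank tokens for one cluster by descending weight and keep the top n non-zero ones."""
    ranked = sorted(matrix, key=lambda tok: (-matrix[tok][cluster_index], tok))
    return [tok for tok in ranked[:n] if matrix[tok][cluster_index] > 0]


def build_query(tokens):
    """Join the selected tokens into a Boolean OR query string (assumed query form)."""
    return " OR ".join(tokens)


def matches(query, doc_tokens):
    """Evaluate an OR query against the token set of one production document."""
    return any(tok in doc_tokens for tok in query.split(" OR "))


clusters = [
    [{"printer", "jammed"}, {"printer", "offline"}],
    [{"refund", "order"}, {"order", "late"}],
]
salient = ["printer", "jammed", "order", "refund"]

matrix = weight_matrix(clusters, salient)
queries = [build_query(top_n_tokens(matrix, i, 2)) for i in range(len(clusters))]
production_doc = {"my", "printer", "broke"}
matched = [matches(q, production_doc) for q in queries]
```

Here the matrix rows are `printer: [1.0, 0.0]`, `jammed: [0.5, 0.0]`, `order: [0.0, 1.0]`, and `refund: [0.0, 0.5]`, so the production document matches only the first cluster's query.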
Specification