Algorithm for fast disk based text mining
First Claim
1. A method of executing a query on a computer for at least one document similar to a specified document, the method comprising:
- receiving the query;
- forming a reduced query document based on ranks of terms in the specified document, the forming comprising:
  - calculating a rank of at least one term in the specified query document,
  - calculating a square of each rank,
  - calculating a normalized rank for each square,
  - sorting a list of said normalized ranks,
  - calculating a partial sum for each normalized rank in the list of normalized ranks, and
  - including, in the reduced query document, terms corresponding to a partial sum above a threshold value;
- generating a modified query based on the query and the reduced query document;
- executing the modified query on a data repository to generate a set of results; and
- providing a result from said generated set of results to a user interface.
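The "forming" step of the claim can be illustrated with a minimal Python sketch. The claim does not say how a term's rank is computed, in which direction the list is sorted, or what the threshold is, so the following are assumptions made only for illustration: the rank is the raw term frequency, the normalized squares are sorted in ascending order, and the threshold defaults to 0.2. The function name `reduce_query_document` is hypothetical and does not come from the patent.

```python
from collections import Counter
from itertools import accumulate


def reduce_query_document(text, threshold=0.2):
    """Return the terms kept in the reduced query document (hypothetical helper)."""
    # Assumption: a term's "rank" is its raw frequency in the document.
    ranks = Counter(text.lower().split())
    if not ranks:
        return []

    # Square each rank, then normalize so that the squares sum to 1.
    squares = {term: rank * rank for term, rank in ranks.items()}
    total = sum(squares.values())
    normalized = {term: square / total for term, square in squares.items()}

    # Assumption: sort ascending, so the lowest-ranked terms come first.
    # Ties are broken alphabetically to keep the output deterministic.
    ordered = sorted(normalized.items(), key=lambda item: (item[1], item[0]))

    # Partial (running) sum for each entry of the sorted list.
    partial_sums = accumulate(value for _, value in ordered)

    # Keep the terms whose partial sum is above the threshold.
    return [term for (term, _), s in zip(ordered, partial_sums) if s > threshold]


doc = "disk disk disk disk text text text mining mining index"
print(reduce_query_document(doc, threshold=0.2))
```

Under these assumptions, the low-ranked terms at the head of the sorted list are dropped until their combined share of the squared ranks exceeds the threshold, so the reduced document retains only the dominant terms.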
Abstract
Methods and apparatus, including computer systems and program products, for executing a query, for example, a query for a document similar to another document. In one general aspect, the techniques feature a method of executing a query for at least one document similar to a specified document. That method includes receiving the query; forming a reduced query document based on ranks of terms in the specified document; generating a modified query based on the query and the reduced query document; executing the modified query on a data repository to generate a set of results; and providing a result to a user interface.
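The remaining steps of the method (generating a modified query, executing it on a data repository, and returning results) might be sketched as follows. The abstract does not specify how the query and the reduced query document are combined or how the repository scores documents, so this sketch assumes an ordered union of terms and a simple count of matching terms. The in-memory dictionary stands in for the data repository, and the names `build_modified_query` and `execute_query` are hypothetical.

```python
def build_modified_query(query_terms, reduced_terms):
    """Combine the original query with the reduced query document."""
    # Assumption: the modified query is the ordered union of both term lists.
    return list(dict.fromkeys(list(query_terms) + list(reduced_terms)))


def execute_query(modified_query, repository, top_n=3):
    """Score each document by how many modified-query terms it contains."""
    scored = []
    for doc_id, text in repository.items():
        words = set(text.lower().split())
        score = sum(1 for term in modified_query if term in words)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]


# Toy stand-in for the data repository.
repository = {
    "d1": "fast disk index",
    "d2": "text mining on disk",
    "d3": "cooking recipes",
}
modified = build_modified_query(["mining"], ["text", "disk"])
print(execute_query(modified, repository))
```

The returned list is what would be handed to the user interface in the final step; a real repository would use its own index and relevance scoring in place of the term count used here.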
16 Claims
1. A method of executing a query on a computer for at least one document similar to a specified document, the method comprising:
- receiving the query;
- forming a reduced query document based on ranks of terms in the specified document, the forming comprising:
  - calculating a rank of at least one term in the specified query document,
  - calculating a square of each rank,
  - calculating a normalized rank for each square,
  - sorting a list of said normalized ranks,
  - calculating a partial sum for each normalized rank in the list of normalized ranks, and
  - including, in the reduced query document, terms corresponding to a partial sum above a threshold value;
- generating a modified query based on the query and the reduced query document;
- executing the modified query on a data repository to generate a set of results; and
- providing a result from said generated set of results to a user interface.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-readable storage media embodying instructions to cause data processor to perform operations comprising:
- receiving a query for at least one document similar to a specified document;
- forming a reduced query document based on ranks of terms in the specified document, the forming comprising:
  - calculating a rank of at least one term in the specified query document,
  - calculating a square of each rank,
  - calculating a normalized rank for each square,
  - sorting a list of said normalized ranks,
  - calculating a partial sum for each normalized rank in the list of normalized ranks, and
  - including, in the reduced query document, terms corresponding to a partial sum above a threshold value;
- generating a modified query based on the query and the reduced query document;
- executing the modified query on a data repository to generate a set of results; and
- providing a result from said generated set of results to a user interface.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
Specification