Methods for iteratively and interactively performing collection selection in full text searches
Abstract
A method of selecting the database collections most likely to be relevant for document searching based on an ad hoc query, where each of the databases includes a plurality of documents. Iterative collection selection processing of the databases is performed to obtain consistent relative-ranking collection selection results for each iteration. The method uses a collection selection query and repeats the steps of determining an inverse collection frequency and a document frequency for each database; determining a ranking value for each database; and selecting a subset of the set of databases based on predetermined criteria dependent on the ranking value for each database. The method provides for automated and manual descriptions and boolean selection terms combined with soft terms, and uses term proximity, capitalization, phraseology and other information in establishing a relevance ranking of the collections with respect to the ad hoc query.
20 Claims
1. A method of permitting iterative performance of collection selection relative to a set of databases, where each said database includes a plurality of documents, to obtain consistent relative-ranking collection selection results for each iteration, said method comprising the steps of:
a) obtaining a collection selection query including a set of predetermined search terms;
b) determining an inverse collection frequency for each member of said set of predetermined search terms with respect to each said database and said set of databases, and determining a document frequency for each member of said set of predetermined search terms with respect to each said database;
c) determining a ranking value for each said database based on a sum of the products of said inverse collection frequencies for said set of predetermined search terms and said document frequencies for respective members of said set of search terms;
d) selecting a subset of said set of databases based on predetermined criteria dependent on said ranking value for each said database; and
e) selectively repeating portions of said steps (b) through (d) with respect to each member of said set of predetermined search terms for each iteration of said method.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
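For illustration only, steps (b) through (d) of claim 1 can be sketched in Python. The claim does not fix a formula for the inverse collection frequency or the selection criteria, so the logarithmic form used here, the top-k cutoff, and every function name and sample database are assumptions made for this sketch.

```python
import math

def document_frequency(term, documents):
    # Number of documents in one database that contain the term.
    return sum(1 for doc in documents if term in doc)

def inverse_collection_frequency(term, databases):
    # Assumed idf-style form: log(N / n_t) + 1, where N is the number of
    # databases and n_t the number of databases containing the term.
    containing = sum(1 for docs in databases.values()
                     if document_frequency(term, docs) > 0)
    if containing == 0:
        return 0.0
    return math.log(len(databases) / containing) + 1.0

def rank_databases(query_terms, databases):
    # Step (c): sum over query terms of ICF(term) * DF(term, database).
    icf = {t: inverse_collection_frequency(t, databases) for t in query_terms}
    return {name: sum(icf[t] * document_frequency(t, docs) for t in query_terms)
            for name, docs in databases.items()}

def select_databases(ranking, top_k):
    # Step (d): assumed criterion, keep the top_k highest-ranked databases.
    ordered = sorted(ranking, key=lambda name: (-ranking[name], name))
    return ordered[:top_k]

# Hypothetical data: each document is modeled as a set of terms.
databases = {
    "law": [{"patent", "claim"}, {"patent", "court"}, {"statute"}],
    "medicine": [{"drug", "trial"}, {"patent", "drug"}],
    "sports": [{"match", "score"}],
}
ranking = rank_databases(["patent", "drug"], databases)
selected = select_databases(ranking, top_k=2)
```

Because the frequencies in this sketch are always computed against the full set of databases, repeating the calculation for the same query yields the same relative ordering, which is one plausible reading of the "consistent relative-ranking" language in the preamble.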
10. A method of executing a query selectively against a collection of databases, each including a plurality of documents, said method comprising the steps of:
a) receiving a first predetermined query including a set of predetermined search terms from a user;
b) utilizing said first predetermined query to select a set of databases from said collection of databases;
c) optionally reporting said set of databases to said user and permitting said first predetermined query to be modified and substituted for said first predetermined query, said steps of utilizing and optionally reporting then being repeated;
d) optionally receiving a second predetermined query from said user and substituting said second predetermined query for said first predetermined query; and
e) searching said set of databases to select a set of documents that are responsive to said first predetermined query.
Dependent claims: 11, 12, 13, 14
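The control flow of claim 10 can be sketched as a short Python session loop. This is a minimal sketch under stated assumptions: the user interaction of step (c) is modeled as a hypothetical `revise` callback, collection selection is reduced to a simple term-overlap test, and all names and sample data are invented for illustration.

```python
def select_collections(query_terms, collections):
    # Step (b), simplified assumption: keep any collection that has at
    # least one document sharing a term with the query.
    terms = set(query_terms)
    return [name for name, docs in collections.items()
            if any(terms & doc for doc in docs)]

def search_documents(query_terms, collections, selected):
    # Step (e): search only the selected collections.
    terms = set(query_terms)
    return [(name, index)
            for name in selected
            for index, doc in enumerate(collections[name])
            if terms & doc]

def run_session(first_query, collections, revise=None, search_query=None):
    query = list(first_query)                      # step (a)
    selected = select_collections(query, collections)
    while revise is not None:                      # step (c), optional
        revised = revise(query, selected)          # report and let user modify
        if revised is None:                        # user accepts the selection
            break
        query = list(revised)
        selected = select_collections(query, collections)
    if search_query is not None:                   # step (d), optional
        query = list(search_query)
    return selected, search_documents(query, collections, selected)

# Hypothetical data and a hypothetical user who narrows the query once.
collections = {
    "law": [{"patent", "claim"}, {"court"}],
    "medicine": [{"drug", "patent"}],
    "sports": [{"score"}],
}

def narrow_once(query, selected):
    return ["drug"] if "drug" not in query else None

selected, hits = run_session(["patent"], collections, revise=narrow_once)
```

The loop ends only when the callback returns `None`, so a real interface would need its own way for the user to accept the reported selection.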
15. A method of supporting a search of a plurality of databases in response to a predetermined query, said method comprising the steps of:
a) establishing a meta-index database containing collection records corresponding to the members of said plurality of databases, each said collection record including a collection term list, including terms that may include word terms and phrase terms, statistical data, and fielded data descriptive of a respective member of said plurality of databases;
b) supporting a first search of said collection records with respect to said fielded data to identify a set of said plurality of databases by satisfaction of logical relationships between a set of conditional values, provided in conjunction with a set of search terms as part of said predetermined query, and said fielded data of respective said collection records;
c) supporting a second search of said collection records corresponding to said set of databases with respect to said collection term lists and statistical data to develop term rank calculation data on a per said term, per said collection record, and per said set of databases basis;
d) selecting a subset of said set of databases based on said rank calculation data for use in a subsequent search.
Dependent claims: 16, 17, 18, 19, 20
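The two-stage search of claim 15 can be sketched in Python as a fielded filter followed by a term-rank calculation over the surviving collection records. The record layout, the field names, the predicate-based conditions, the logarithmic weighting, and the top-k cutoff are all assumptions made for this sketch; the claim itself does not specify them.

```python
import math
from dataclasses import dataclass

@dataclass
class CollectionRecord:
    # Hypothetical meta-index record for one database.
    name: str
    fields: dict          # fielded data, e.g. {"language": "en"}
    term_doc_freq: dict   # term list with per-term document frequency

def fielded_search(records, conditions):
    # Step (b): keep records whose fielded data satisfies every condition.
    # `conditions` maps a field name to a predicate on that field's value.
    return [r for r in records
            if all(f in r.fields and pred(r.fields[f])
                   for f, pred in conditions.items())]

def term_rank_data(records, terms):
    # Step (c): per-term weight over the candidate set plus per-record
    # document frequencies. The log(N / n_t) + 1 weight is an assumption.
    data = {}
    for t in terms:
        containing = sum(1 for r in records if r.term_doc_freq.get(t, 0) > 0)
        weight = math.log(len(records) / containing) + 1.0 if containing else 0.0
        data[t] = {"weight": weight,
                   "df": {r.name: r.term_doc_freq.get(t, 0) for r in records}}
    return data

def select_subset(records, rank_data, top_k):
    # Step (d): score each record and keep the top_k for the later search.
    scores = {r.name: sum(d["weight"] * d["df"][r.name]
                          for d in rank_data.values())
              for r in records}
    return sorted(scores, key=lambda name: (-scores[name], name))[:top_k]

# Hypothetical meta-index contents.
records = [
    CollectionRecord("law-en", {"language": "en", "year": 1995}, {"patent": 40, "claim": 25}),
    CollectionRecord("law-de", {"language": "de", "year": 1996}, {"patent": 50}),
    CollectionRecord("med-en", {"language": "en", "year": 1997}, {"patent": 5, "drug": 60}),
    CollectionRecord("news-en", {"language": "en", "year": 1985}, {"patent": 2}),
]
candidates = fielded_search(records, {"language": lambda v: v == "en",
                                      "year": lambda v: v >= 1990})
chosen = select_subset(candidates, term_rank_data(candidates, ["patent", "claim"]), top_k=1)
```

Running the fielded filter first means the term statistics are computed only over records that already satisfy the hard conditions, which mirrors the ordering of steps (b) and (c).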
Specification