Performing automated document collection and selection by providing a meta-index with meta-index values identifying corresponding document collections
Abstract
A method of performing automated collection selection relative to a plurality of document collections, each including one or more documents, using a list of qualified terms developed from an input query text. The method comprises the steps of: (a) parsing the input query text to select single-word terms and multiple-word phrase terms from the query text by exclusion of predetermined context-free single-word terms and punctuation; (b) applying each such selected term against a meta-index descriptive of the document collections; (c) determining cumulative rankings for the document collections relative to each such selected term normalized against the plurality of document collections; and (d) selecting a set of the document collections having the highest relative cumulative rankings.
448 Citations
18 Claims
1. A method of performing automated document collection selection and document selection relative to a plurality of independently maintained document collections, each including a plurality of documents, using a list of qualified terms developed from an input query text, said method comprising the steps of:
providing a meta-index having meta-index values identifying corresponding ones of the document collections and information about documents in the corresponding ones of the document collections;
parsing said input query text to select single-word terms and multiple-word phrase terms from said query text by exclusion of predetermined context-free single-word terms and punctuation;
applying each such selected term against the meta-index values in said meta-index to determine correlation between the selected terms and the meta-index values;
determining cumulative rankings for said document collections based upon said correlation relative to each such selected term normalized against said plurality of document collections; and
selecting a subset of said document collections having the highest relative cumulative rankings whereby said subset of said document collections is established to be the most appropriate subset of said plurality of document collections to search using said input query text, searching each of said subset of document collections with said input query text to select documents correlating to said query text.
Dependent claims: 2, 3, 4, 5, 6, 7.
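The final step of claim 1, selecting the highest-ranked collections and then searching only those, can be sketched as follows. This is a minimal illustration, not the patented implementation: the cutoff `k`, the tie-breaking by name, and the substring-based `search_subset` stand-in for a real per-collection search engine are all assumptions, as are the function names.

```python
from typing import Dict, List


def select_subset(rankings: Dict[str, float], k: int) -> List[str]:
    """Return the k collections with the highest cumulative ranking.

    Ties are broken alphabetically so the result is deterministic
    (an assumption; the claim does not say how ties are handled).
    """
    ordered = sorted(rankings, key=lambda name: (-rankings[name], name))
    return ordered[:k]


def search_subset(
    subset: List[str],
    collections: Dict[str, Dict[str, str]],
    terms: List[str],
) -> Dict[str, List[str]]:
    """Search only the selected collections.

    Hypothetical placeholder search: a document matches if its lowercased
    text contains at least one query term as a substring.
    """
    hits: Dict[str, List[str]] = {}
    for name in subset:
        docs = collections[name]
        hits[name] = [
            doc_id for doc_id, text in docs.items()
            if any(term in text.lower() for term in terms)
        ]
    return hits
```

For example, with rankings `{"law": 2.5, "medicine": 0.4, "patents": 3.1}` and `k = 2`, only `patents` and `law` are searched; `medicine` is skipped entirely, which is the point of selecting collections before selecting documents.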
8. A method of performing automated collection selection relative to a plurality of document collections, each including one or more documents, using a list of qualified terms developed from an input query text, said method comprising the steps of:
a) parsing said input query text to select single-word terms and multiple-word phrase terms from said query text by exclusion of predetermined context-free single-word terms and punctuation;
b) applying each such selected term against a meta-index wherein said document collections are represented as respective collection records in said meta-index, wherein a predetermined collection record stores fielded data descriptive of said document collection represented by said predetermined collection record, and a predetermined term with corresponding statistical data, said predetermined term being qualified for entry into said predetermined collection record where:
i) the number of occurrences of said predetermined term within a predetermined document collection is in excess of a first predetermined number;
ii) the number of occurrences of said predetermined term within at least one document within said predetermined document collection is in excess of a second predetermined number; or
iii) said predetermined term occurs within a number of documents within said predetermined document collection in excess of a third predetermined number,
wherein at least one of said first, second, and third predetermined numbers is in excess of one, and wherein said statistical data includes the number of occurrences of said predetermined term within the documents of said predetermined document collection, the number of documents within said predetermined document collection containing said predetermined term, the number of qualified terms that occur in said predetermined document collection, and the number of documents within said predetermined document collection;
c) determining cumulative rankings for said document collections relative to each such selected term normalized against said plurality of document collections; and
d) selecting a set of said document collections having the highest relative cumulative rankings.
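The term-qualification rule of claim 8 can be illustrated with a short sketch that builds one collection record. It assumes documents arrive already tokenized into terms, treats "in excess of" as strictly greater than, and uses hypothetical names (`TermStats`, `CollectionRecord`, `build_record`, `t1`/`t2`/`t3` for the first, second, and third predetermined numbers) that do not come from the patent.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermStats:
    occurrences: int  # occurrences of the term across the collection's documents
    doc_count: int    # documents in the collection that contain the term


@dataclass
class CollectionRecord:
    name: str
    fields: Dict[str, str]  # fielded descriptive data (hypothetical keys)
    terms: Dict[str, TermStats] = field(default_factory=dict)
    num_qualified_terms: int = 0
    num_documents: int = 0


def build_record(name: str, fields: Dict[str, str], docs: List[List[str]],
                 t1: int, t2: int, t3: int) -> CollectionRecord:
    # The claim requires at least one threshold to exceed one.
    if max(t1, t2, t3) <= 1:
        raise ValueError("at least one threshold must exceed one")
    total: Counter = Counter()        # occurrences across the whole collection
    per_doc_max: Counter = Counter()  # highest count within any single document
    doc_freq: Counter = Counter()     # number of documents containing the term
    for tokens in docs:
        counts = Counter(tokens)
        total.update(counts)
        doc_freq.update(counts.keys())
        for term, c in counts.items():
            per_doc_max[term] = max(per_doc_max[term], c)
    record = CollectionRecord(name, fields, num_documents=len(docs))
    for term in total:
        # Conditions i, ii and iii are alternatives: any one qualifies the term.
        if total[term] > t1 or per_doc_max[term] > t2 or doc_freq[term] > t3:
            record.terms[term] = TermStats(total[term], doc_freq[term])
    record.num_qualified_terms = len(record.terms)
    return record
```

With documents `[["meta", "index", "meta"], ["index", "query"], ["query"]]` and thresholds `t1=2, t2=1, t3=2`, only `meta` qualifies, because it occurs twice within a single document; `index` and `query` meet none of the three conditions.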
9. A method of performing automated collection selection relative to a plurality of document collections, each including one or more documents, using a list of qualified terms developed from an input query text, said method comprising the steps of:
a) parsing said input query text to select single-word terms and multiple-word phrase terms from said query text by exclusion of predetermined context-free single-word terms and punctuation;
b) applying each such selected term against a meta-index descriptive of said document collections;
c) determining cumulative rankings for said document collections relative to each such selected term normalized against said plurality of document collections by, for each of said document collections and for each of said terms relative to a respective one of said document collections, performing the steps of:
i) calculating an initial term ranking for a predetermined term relative to a predetermined document collection based on a ratio of the number of documents having a qualified number of occurrences of said predetermined term in said predetermined document collection and a qualified number of documents within said predetermined document collection;
ii) scaling said initial term ranking;
iii) calculating a normalizing factor based on the ratio of the total number of documents in said document collections and the total number of documents in said document collections having a qualified number of occurrences of said predetermined term;
iv) scaling said normalizing factor;
v) calculating a product of said scaled initial term ranking and said scaled normalizing factor to provide a term ranking for said predetermined term; and
vi) summing said products corresponding to each of said terms relative to said predetermined document collection to provide said cumulative term ranking for said predetermined document collection; and
d) selecting a set of said document collections having the highest relative cumulative rankings.
Dependent claims: 10, 11, 12, 13.
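Steps i through vi of claim 9 amount to a TF-IDF-like score computed at the collection level. The sketch below follows the claimed ratios, but the claim does not say how either value is scaled, so the defaults here (identity for the initial ranking, natural logarithm for the normalizing factor, giving an IDF-style weight) are assumptions. It also assumes "qualified number of documents" means all documents in the collection, and that the meta-index can report, per collection, how many documents qualify for each term; `cumulative_rankings` and its parameters are hypothetical names.

```python
import math
from typing import Callable, Dict, List


def cumulative_rankings(
    doc_counts: Dict[str, int],
    term_doc_counts: Dict[str, Dict[str, int]],
    terms: List[str],
    scale_initial: Callable[[float], float] = lambda x: x,
    scale_norm: Callable[[float], float] = math.log,
) -> Dict[str, float]:
    """Rank collections following claim 9, under assumed scaling functions.

    doc_counts[c]          -- number of documents in collection c (assumed > 0)
    term_doc_counts[c][t]  -- documents in c with a qualified number of occurrences of t
    """
    total_docs = sum(doc_counts.values())
    rankings = {c: 0.0 for c in doc_counts}
    for term in terms:
        # Documents across all collections that qualify for this term.
        docs_with_term = sum(term_doc_counts.get(c, {}).get(term, 0) for c in doc_counts)
        if docs_with_term == 0:
            continue  # term unknown to the meta-index: no contribution
        # Steps iii and iv: normalizing factor N / n_t, then scaled.
        norm = scale_norm(total_docs / docs_with_term)
        for c, n_docs in doc_counts.items():
            # Step i: fraction of this collection's documents that qualify.
            initial = term_doc_counts.get(c, {}).get(term, 0) / n_docs
            # Steps ii, v and vi: scale, multiply, and accumulate per collection.
            rankings[c] += scale_initial(initial) * norm
    return rankings
```

For two 10-document collections where 5 documents of `A` qualify for `x` and 2 documents of `B` qualify for `y`, the query `["x", "y"]` gives `A` a score of 0.5 · ln 4 ≈ 0.69 and `B` a score of 0.2 · ln 10 ≈ 0.46. With the logarithmic choice, a term present in every collection's documents contributes nothing, since ln(N/N) = 0.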
14. A method of selecting a subset of a set of document collections to search based on an input query text from a query source in advance of selecting a plurality of documents from said subset to identify to said query source in response to said input query text, said method comprising the steps of:
a) parsing said input query text to select predetermined single-word terms and multiple-word phrase terms from said query text by exclusion of predetermined context-free single-word terms and punctuation and determining each remaining word to be a single-word term and each set of two successive remaining words being a multiple-word phrase term;
b) applying said predetermined single-word terms and multiple-word phrase terms against a meta-index including a plurality of collection records wherein each of said collection records is descriptive of a corresponding one of said document collections;
c) determining cumulative rankings for each of said document collections relative to the set of said predetermined single-word terms and multiple-word phrase terms, wherein the ranking of each of said predetermined single-word terms and multiple-word phrase terms for each of said document collections is normalized against said plurality of document collections; and
d) selecting said subset of said document collections based on the respective cumulative rankings of said document collections.
Dependent claims: 15, 16, 17, 18.
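The parsing step of claim 14 (drop context-free words and punctuation, keep each remaining word as a single-word term, and pair successive remaining words into two-word phrase terms) can be sketched as below. The stop-word list is a small hypothetical stand-in for the patent's "predetermined context-free single-word terms", and treating every non-alphanumeric character as a separator is an assumption, so `meta-index` becomes `meta` and `index`.

```python
import re
from typing import FrozenSet, List, Tuple

# Hypothetical stand-in for the predetermined context-free single-word terms.
STOP_WORDS: FrozenSet[str] = frozenset(
    {"a", "an", "and", "the", "of", "for", "in", "to", "on", "with"}
)


def parse_query(text: str, stop_words: FrozenSet[str] = STOP_WORDS) -> Tuple[List[str], List[str]]:
    # Punctuation acts as a separator and is discarded.
    words = re.findall(r"[a-z0-9]+", text.lower())
    remaining = [w for w in words if w not in stop_words]
    singles = remaining
    # Two successive *remaining* words form a phrase term, so a phrase may
    # bridge a removed stop word (the literal reading of the claim).
    phrases = [f"{a} {b}" for a, b in zip(remaining, remaining[1:])]
    return singles, phrases
```

For the query "Selection of document collections, using a meta-index." the sketch yields the single-word terms `selection`, `document`, `collections`, `using`, `meta`, `index` and the phrase terms `selection document`, `document collections`, `collections using`, `using meta`, `meta index`. Pairing only words that were adjacent in the original text would be a reasonable alternative design, but it is not what the claim wording describes.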
Specification