Method and system for processing a text search query in a collection of documents
Abstract
A method, system and computer program product implementing the method are provided to process a text search query in a collection of documents. A full posting index is generated for the documents in the collection. The full posting index comprises one or more first index terms and a full posting list for each first index term, enumerating the occurrences of the first index term in the documents. In addition to the full posting index, at least one additional posting index is generated for the documents. The additional posting index is related to a defined document part and comprises one or more second index terms and a restricted posting list for each second index term, enumerating all occurrences of the second index term in the document part of the documents of the collection. The text search query is performed using the additional posting index.
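The two index structures described in the abstract can be illustrated with a minimal sketch. It assumes that each document is a list of (field name, text) pairs, that index terms are lower-cased whitespace tokens, and that the "defined document part" is a single named field. The names `tokenize`, `build_indexes`, and the sample documents are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: index terms are lower-cased whitespace-separated tokens.
    return text.lower().split()


def build_indexes(docs, part):
    """Build a full posting index and one additional (restricted) index.

    docs: {doc_id: [(field_name, text), ...]}  (hypothetical layout)
    part: name of the field that the additional index is restricted to.
    Each posting is a (doc_id, position) pair, where position is the
    token offset counted across the whole document.
    """
    full = defaultdict(list)        # term -> all occurrences in all documents
    restricted = defaultdict(list)  # term -> occurrences inside `part` only
    for doc_id, fields in docs.items():
        pos = 0
        for field, text in fields:
            for term in tokenize(text):
                full[term].append((doc_id, pos))
                if field == part:
                    restricted[term].append((doc_id, pos))
                pos += 1
    return dict(full), dict(restricted)


docs = {
    1: [("title", "posting index"),
        ("body", "an index maps terms to documents")],
    2: [("title", "query rewriting"),
        ("body", "the posting index speeds up search")],
}
full_index, title_index = build_indexes(docs, "title")
print(full_index["index"])   # every occurrence of "index"
print(title_index["index"])  # only the occurrences inside titles
```

In this sketch the restricted posting list is always a subset of the full posting list for the same term, so a query limited to titles can be answered from the smaller index.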
13 Claims
-
1. A computer system for processing a text search query in a collection of documents, comprising:
a computer;
a full posting index for the documents of the collection, said full posting index comprising a first set of index terms and a full posting list for said index terms of said first set, enumerating all occurrences of said index terms of said first set in all documents of the collection;
search conditions on search terms of a given text search query that are translated into conditions on said index terms to provide translated conditions;
an additional posting index for said documents of the collection, said additional posting index being related to a defined document part and comprising a second set of index terms and a restricted posting list for said index terms of said second set, enumerating all occurrences of said index terms of said second set in said document part of said documents of said collection;
identified conditions of said translated conditions, which are restricted to defined document parts, for which an additional posting index is available;
said identified conditions with part restriction rewritten as pair conditions on index terms of said additional posting index and corresponding document part; and
a query result based on said additional posting index and said pair conditions.
-
2. The system according to claim 1, further comprising:
a field posting index for said documents of the collection, said field posting index comprising a set of fields and a field posting list for each field of said set of fields, enumerating start and end positions of continuous parts of said field in said documents of said collection.
-
3. The system according to claim 1 wherein said additional posting index is generated for sub-queries defining document parts comprising at least one and any combination of a field condition, phrase and proximity condition.
-
4. The system according to claim 1 wherein said additional posting index is generated for sub-queries defining document parts with low coverage.
-
5. The system according to claim 1 wherein said additional posting index is generated for sub-queries which are frequently used.
-
6. The system according to claim 1 wherein said additional posting index is generated based on said full posting index.
-
7. A computer program product stored on a computer readable storage medium, comprising computer readable program means for causing a computer to perform a method of processing a text search query in a collection of documents,
wherein the documents of the collection are associated with a full posting index, said full posting index comprising one or more first index terms and a full posting list, enumerating occurrences of said one or more first index terms in said documents of said collection; and
wherein said text search query comprises one or more search conditions on one or more search terms, said one or more search conditions being translated into one or more conditions on said one or more first index terms to provide one or more translated conditions;
said method comprising:
generating at least one additional posting index, said additional posting index being related to a defined document part and comprising one or more second index terms and a restricted posting list for said one or more second index terms, enumerating occurrences of said one or more second index terms in said document part of said documents of said collection;
identifying one or more conditions of said one or more translated conditions for which an additional posting index is available to provide one or more identified conditions;
re-writing said one or more identified conditions as one or more pair conditions on said one or more second index terms of said additional posting index and corresponding document part; and
processing said one or more pair conditions using said additional posting index to provide a query result.
-
8. The computer program product according to claim 7 wherein said additional posting index is generated for sub-queries defining document parts comprising at least one or any combination of a field condition, phrase and proximity condition.
-
9. The computer program product according to claim 7 wherein said additional posting index is generated for sub-queries defining document parts with low coverage.
-
10. The computer program product according to claim 7 wherein said additional posting index is generated for sub-queries which are frequently used.
-
11. The computer program product according to claim 7 wherein said additional posting index is generated using said full posting index to compute one or more restricted posting lists for said one or more second index terms.
-
12. The computer program product according to claim 7,
wherein a field posting index is generated for each document added to said collection, said field posting index comprising a set of fields and a field posting list for each field of said set of fields, enumerating start and end positions of continuous parts of said field in said documents of said collection, wherein said additional posting indexes are generated together with said full posting index and said field posting index.
-
13. The computer program product according to claim 7 wherein said additional posting index comprises ranking information about a weighted index term frequency in each document as a whole and wherein said ranking information is extracted from said full posting index.
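The processing steps recited in the claims (deriving a restricted posting list from the full posting index with the help of field start and end positions, identifying conditions for which an additional index exists, rewriting them as pair conditions, and evaluating those pairs) can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: postings are (doc_id, position) pairs, field spans use inclusive positions, a translated condition is a (term, part) tuple, and all conditions are combined with AND. The function names `restrict`, `rewrite`, and `evaluate` and the sample data are hypothetical.

```python
def restrict(full_index, field_spans, part):
    """Derive restricted posting lists from the full posting index.

    field_spans: {part: {doc_id: [(start, end), ...]}} with inclusive
    token positions, standing in for a field posting index (assumption).
    """
    spans = field_spans[part]
    restricted = {}
    for term, postings in full_index.items():
        hits = [(d, p) for d, p in postings
                if any(s <= p <= e for s, e in spans.get(d, []))]
        if hits:
            restricted[term] = hits
    return restricted


def rewrite(conditions, additional):
    """Identify part-restricted conditions that have an additional index
    and rewrite them as (part, term) pair conditions; return the rest
    unchanged so they could be handled by the full index (not shown)."""
    pairs, rest = [], []
    for term, part in conditions:
        if part is not None and part in additional:
            pairs.append((part, term))
        else:
            rest.append((term, part))
    return pairs, rest


def evaluate(pairs, additional):
    """AND-combine pair conditions using only the additional indexes."""
    result = None
    for part, term in pairs:
        docs = {d for d, _ in additional[part].get(term, [])}
        result = docs if result is None else result & docs
    return result if result is not None else set()


# Hypothetical sample data: two documents whose titles span positions 0..1.
full_index = {
    "posting": [(1, 0), (2, 3)],
    "index": [(1, 1), (1, 3), (2, 4)],
    "query": [(2, 0)],
}
field_spans = {"title": {1: [(0, 1)], 2: [(0, 1)]}}
additional = {"title": restrict(full_index, field_spans, "title")}

conditions = [("posting", "title"), ("index", "title"), ("search", "body")]
pairs, rest = rewrite(conditions, additional)
matches = evaluate(pairs, additional)
print(pairs, rest, matches)
```

Here the two title conditions become pair conditions and are answered from the title index alone, while the body condition stays in `rest` because no additional index exists for that part in this example.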
Specification