Efficient multifaceted search in information retrieval systems
Abstract
A method and system for querying multifaceted information. An inverted index is constructed to include unique indexed tokens associated with posting lists of one or more documents. An indexed token is either a facet token included in a document as an annotation or a path prefix of the facet token. The annotation indicates a path within a tree structure representing a facet that includes the document. The tree structure includes nodes representing categories of documents. A query is received that includes constraints on documents. The constraints are associated with indexed tokens and corresponding posting lists. An execution of the query includes identifying the corresponding posting lists by utilizing the constraints and the inverted index and intersecting the posting lists to obtain a query result.
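The indexing and query steps summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `/` path separator, the integer document IDs, the sample annotations, and every function name (`path_prefixes`, `build_index`, `intersect`, `query`) are assumptions chosen for illustration. Each facet annotation and each of its path prefixes becomes an indexed token with its own posting list, and a query with several constraints is answered by intersecting the corresponding posting lists.

```python
from collections import defaultdict


def path_prefixes(facet_path):
    """Yield every prefix of a facet path: 'a/b/c' -> 'a', 'a/b', 'a/b/c'."""
    parts = facet_path.split("/")  # "/" as separator is an assumption
    for i in range(1, len(parts) + 1):
        yield "/".join(parts[:i])


def build_index(annotations):
    """Map each indexed token (facet token or path prefix) to one sorted posting list."""
    postings = defaultdict(set)
    for doc_id, facet_paths in annotations.items():
        for facet_path in facet_paths:
            for token in path_prefixes(facet_path):
                postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}


def intersect(a, b):
    """Two-pointer intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def query(index, constraints):
    """Look up the posting list for each constraint token and intersect them all."""
    # Starting from the shortest list keeps intermediate results small.
    lists = sorted((index.get(token, []) for token in constraints), key=len)
    if not lists:
        return []
    result = lists[0]
    for posting_list in lists[1:]:
        result = intersect(result, posting_list)
    return result


# Hypothetical documents, each annotated with paths in two facet trees.
annotations = {
    1: ["color/red", "location/europe/france"],
    2: ["color/red/crimson", "location/europe/germany"],
    3: ["color/blue", "location/europe/france"],
}
index = build_index(annotations)

print(query(index, ["color/red", "location/europe"]))         # [1, 2]
print(query(index, ["color/red", "location/europe/france"]))  # [1]
```

Because path prefixes are indexed as tokens in their own right, a constraint on a category such as `color/red` also matches documents annotated only with a sub-category such as `color/red/crimson`, without walking the tree at query time.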
39 Citations
20 Claims
1. A computer-implemented method of querying multifaceted information in an information retrieval system, comprising:
- constructing, by said information retrieval (IR) system, an inverted index having a plurality of unique indexed tokens associated with a plurality of posting lists in a one-to-one correspondence, each posting list including one or more documents of a plurality of documents, wherein an indexed token of said plurality of unique indexed tokens is one of a facet token included as an annotation in a document of said plurality of documents and a path prefix of said facet token, wherein said annotation indicates a path within a tree structure representing a facet that includes said document, said tree structure including a plurality of nodes representing a category and one or more sub-categories that categorize said document;
- receiving, by said IR system, a query that includes a plurality of constraints on said plurality of documents, said plurality of constraints being associated with multiple indexed tokens of said plurality of unique indexed tokens and multiple posting lists corresponding to said multiple indexed tokens; and
- executing said query by said IR system, said executing including:
  - identifying said multiple posting lists via a utilization of said plurality of constraints and said inverted index, and
  - intersecting said multiple posting lists to obtain a result of said query.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer system comprising:
- a central processing unit (CPU);
- a memory coupled to said CPU;
- a computer-readable, tangible storage device coupled to said CPU, said storage device containing instructions that are carried out by said CPU via said memory to implement a method of querying multifaceted information, said method comprising:
  - constructing an inverted index having a plurality of unique indexed tokens associated with a plurality of posting lists in a one-to-one correspondence, each posting list including one or more documents of a plurality of documents, wherein an indexed token of said plurality of unique indexed tokens is one of a facet token included as an annotation in a document of said plurality of documents and a path prefix of said facet token, wherein said annotation indicates a path within a tree structure representing a facet that includes said document, said tree structure including a plurality of nodes representing a category and one or more sub-categories that categorize said document;
  - receiving a query that includes a plurality of constraints on said plurality of documents, said plurality of constraints being associated with multiple indexed tokens of said plurality of unique indexed tokens and multiple posting lists corresponding to said multiple indexed tokens; and
  - executing said query by identifying said multiple posting lists via a utilization of said plurality of constraints and said inverted index, and intersecting said multiple posting lists to obtain a result of said query.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer program product comprising a computer-readable, tangible storage device having a computer-readable program code stored therein, said computer-readable program code containing instructions that are carried out by a processor of a computer system to implement a method of querying multifaceted information in an information retrieval system, said method comprising:
- constructing, by said information retrieval (IR) system, an inverted index having a plurality of unique indexed tokens associated with a plurality of posting lists in a one-to-one correspondence, each posting list including one or more documents of a plurality of documents, wherein an indexed token of said plurality of unique indexed tokens is one of a facet token included as an annotation in a document of said plurality of documents and a path prefix of said facet token, wherein said annotation indicates a path within a tree structure representing a facet that includes said document, said tree structure including a plurality of nodes representing a category and one or more sub-categories that categorize said document;
- receiving, by said IR system, a query that includes a plurality of constraints on said plurality of documents, said plurality of constraints being associated with multiple indexed tokens of said plurality of unique indexed tokens and multiple posting lists corresponding to said multiple indexed tokens; and
- executing said query by said IR system by:
  - identifying said multiple posting lists via a utilization of said plurality of constraints and said inverted index, and
  - intersecting said multiple posting lists to obtain a result of said query.
Dependent claims: 18, 19, 20
Specification