Supporting web-query expansion efficiently using multi-granularity indexing and query processing
3 Assignments
0 Petitions
Abstract
A method and apparatus for efficient query expansion using reduced size indices and for progressive query processing. Queries are expanded conceptually, using words that are semantically similar and syntactically related to those specified by the user in the query, to reduce the chances of missing relevant documents. The notion of a multi-granularity information and processing structure is used to support efficient query expansion, which involves an indexing phase, a query processing phase, and a ranking phase. In the indexing phase, semantically similar words are grouped into a concept, which results in a substantial index size reduction due to the coarser granularity of semantic concepts. During query processing, the words in a query are mapped into their corresponding semantic concepts and syntactic extensions, resulting in a logical expansion of the original query. This avoids the processing overhead otherwise associated with query expansion. The initial query words can then be used to rank the documents in the answer set on the basis of exact, semantic and syntactic matches and also to perform progressive query processing.
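The indexing phase described in the abstract can be illustrated with a minimal sketch. The lexical dictionary `LEXICON`, the concept identifiers such as `C_VEHICLE`, and the function name `coarsen_index` are hypothetical and chosen only for illustration; keeping words that are absent from the dictionary under their original key is likewise an assumption, not something the text specifies.

```python
from collections import defaultdict

# Hypothetical lexical dictionary: word -> semantic concept identifier.
LEXICON = {
    "car": "C_VEHICLE",
    "automobile": "C_VEHICLE",
    "auto": "C_VEHICLE",
    "doctor": "C_DOCTOR",
    "physician": "C_DOCTOR",
}


def coarsen_index(word_index, lexicon):
    """Merge the posting sets of words that share a concept.

    Words not found in the lexicon keep their original key (an assumption).
    """
    coarse = defaultdict(set)
    for word, doc_ids in word_index.items():
        coarse[lexicon.get(word, word)].update(doc_ids)
    return dict(coarse)


# Preliminary index at the original (word) granularity: word -> document ids.
word_index = {
    "car": {1, 2},
    "automobile": {2, 3},
    "auto": {4},
    "doctor": {5},
    "physician": {5, 6},
    "xyzzy": {7},
}

coarse_index = coarsen_index(word_index, LEXICON)
print(len(word_index), "entries reduced to", len(coarse_index))
```

In this toy data, six word entries collapse into three entries, which is the kind of index size reduction the abstract attributes to the coarser granularity of semantic concepts.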
290 Citations
50 Claims
1. A method of querying a database of documents, the database including a preliminary index of the documents, words contained in the documents and associations therebetween, the words in the preliminary index being of an original granularity, the method comprising the steps of:
a) replacing the words in the preliminary index with corresponding higher granularity concepts, resulting in a coarser granularity index of reduced index size;
b) logically expanding a query applied to the database of documents by replacing only the words of the query, being of the original granularity, meeting a predetermined criterion, which is whether the words can be found in a lexical dictionary, with corresponding ones of the higher granularity concepts,
b)(i) wherein the higher granularity concepts are higher granularity semantic concepts,
b)(ii) further logically expanding the query by adding syntactically related words for each of the corresponding ones of the higher granularity concepts;
b)(iii) further logically expanding the query by adding syntactically related words for each of the words in the query failing to meet the predetermined criterion;
b)(iv) replacing ones of the syntactically related words meeting the predetermined criterion with associated ones of the higher granularity concepts; and
b)(v) removing any redundant ones of the syntactically related words and higher granularity concepts from the expanded query;
c) executing the logically expanded query to retrieve ones of the documents associated, through the coarser granularity index, with the corresponding ones of the higher granularity concepts; and
d) retrieving ones of the documents in order of relevance until a predetermined number of ones of the documents associated with the corresponding ones of the higher granularity concepts are retrieved, wherein the order of relevance is an exact match, a semantic match, a syntactical match and no match between the words of the query and the words contained in the retrieved ones of the documents.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
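The query expansion recited in claim 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: the dictionary `lexicon`, the table `related` of syntactically related words, and the function `expand_query` are hypothetical, and the "syntactically related words for each concept" are taken here to be the relatives of the words grouped under that concept.

```python
def expand_query(words, lexicon, related):
    """Logically expand a query into concepts and syntactic relatives."""
    # Invert the lexicon: concept -> words grouped under that concept.
    members = {}
    for word, concept in lexicon.items():
        members.setdefault(concept, []).append(word)

    expanded = []
    for word in words:
        if word in lexicon:
            # Criterion met: replace the word with its concept.
            concept = lexicon[word]
            expanded.append(concept)
            # Add syntactic relatives for the concept (assumed: of its member words).
            for member in members[concept]:
                expanded.extend(related.get(member, []))
        else:
            # Criterion not met: keep the word and add its syntactic relatives.
            expanded.append(word)
            expanded.extend(related.get(word, []))

    # Replace any added relative that is itself in the lexicon with its concept.
    expanded = [lexicon.get(term, term) for term in expanded]
    # Remove redundant terms while preserving first-seen order.
    return list(dict.fromkeys(expanded))


# Hypothetical data for illustration only.
lexicon = {"car": "C_VEHICLE", "automobile": "C_VEHICLE", "carport": "C_SHELTER"}
related = {"car": ["carport", "carpool"], "xml": ["xmlns"]}

expanded = expand_query(["car", "automobile", "xml"], lexicon, related)
print(expanded)
```

Here "car" and "automobile" both collapse into the single concept `C_VEHICLE`, the relative "carport" is itself lifted to a concept, and "xml", which is not in the dictionary, is kept together with its syntactic relative.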
14. A method of querying a database of documents, the database including an index of reduced index size of the documents, higher granularity concepts and associations therebetween, the higher granularity concepts corresponding to words of original granularity contained in the documents, the method comprising the steps of:
a) logically expanding a query applied to the database of documents by replacing only words of the query, being of the original granularity, meeting a predetermined criterion, which is whether the words can be found in a lexical dictionary, with corresponding ones of the higher granularity concepts,
a)(i) wherein the higher granularity concepts are higher granularity semantic concepts,
a)(ii) further logically expanding the query by adding syntactically related words for each of the corresponding ones of the higher granularity concepts;
a)(iii) further logically expanding the query by adding syntactically related words for each of the words in the query failing to meet the predetermined criterion;
a)(iv) replacing ones of the syntactically related words meeting the predetermined criterion with associated ones of the higher granularity concepts; and
a)(v) removing any redundant ones of the syntactically related words and higher granularity concepts from the expanded query;
b) executing the logically expanded query to retrieve documents associated, through the index, with the corresponding ones of the higher granularity concepts; and
c) retrieving ones of the documents associated with the corresponding ones of the higher granularity concepts, wherein the retrieved ones of the documents are ranked using the words of the query, being of the original granularity.
Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
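Ranking the answer set with the original query words might look like the sketch below. It applies the exact, semantic, syntactic ordering named in the abstract; the level constants, the helper names `match_level` and `rank`, and the sample data are assumptions made for illustration.

```python
EXACT, SEMANTIC, SYNTACTIC, NO_MATCH = 0, 1, 2, 3


def match_level(doc_words, query_words, lexicon, related):
    """Classify one document against the original (word-granularity) query."""
    doc_words = set(doc_words)
    if doc_words & set(query_words):
        return EXACT
    query_concepts = {lexicon[w] for w in query_words if w in lexicon}
    doc_concepts = {lexicon[w] for w in doc_words if w in lexicon}
    if query_concepts & doc_concepts:
        return SEMANTIC
    relatives = {r for w in query_words for r in related.get(w, [])}
    if doc_words & relatives:
        return SYNTACTIC
    return NO_MATCH


def rank(docs, query_words, lexicon, related):
    """Order document ids by match level; sorted() is stable, so ties keep input order."""
    return sorted(docs, key=lambda d: match_level(docs[d], query_words, lexicon, related))


# Hypothetical data for illustration only.
lexicon = {"car": "C_VEHICLE", "automobile": "C_VEHICLE"}
related = {"car": ["carpool"]}
docs = {
    "d1": ["automobile", "repair"],
    "d2": ["carpool", "lanes"],
    "d3": ["car", "dealer"],
    "d4": ["weather"],
}

ranking = rank(docs, ["car"], lexicon, related)
print(ranking)
```

The coarse index alone cannot distinguish "car" from "automobile", which is why the original query words are needed again at ranking time.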
26. A system for querying a database of documents, the database including a preliminary index of the documents, words contained in the documents and associations therebetween, the words in the preliminary index being of an original granularity, the system comprising:
a) an indexer for replacing the words in the preliminary index with corresponding higher granularity concepts, resulting in a coarser granularity index of reduced index size;
b) a user interface for providing a query to be applied to the database of documents; and
c) a processor for logically expanding the query by replacing only the words of the query, being of the original granularity, meeting a predetermined criterion, which is whether the words can be found in a lexical dictionary, with corresponding ones of the higher granularity concepts, whereupon the processor executes the logically expanded query to retrieve ones of the documents associated, through the coarser granularity index, with the corresponding ones of the higher granularity concepts, wherein the processor retrieves ones of the documents in order of relevance until a predetermined number of ones of the documents associated with the corresponding ones of the higher granularity concepts are retrieved, using the words of the query, being of the original granularity, and wherein the order of relevance is an exact match, a semantic match, a syntactical match and no match between the words of the query and the words contained in the retrieved ones of the documents,
c)(i) wherein the higher granularity concepts are higher granularity semantic concepts, and wherein logically expanding the query further comprises:
c)(ii) adding syntactically related words for each of the corresponding ones of the higher granularity concepts;
c)(iii) adding syntactically related words for each of the words in the query failing to meet the predetermined criterion;
c)(iv) replacing ones of the syntactically related words meeting the predetermined criterion with associated ones of the higher granularity concepts; and
c)(v) removing any redundant ones of the syntactically related words and higher granularity concepts from the expanded query.
Dependent Claims (27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38)
39. A system for querying a database of documents, the database including an index of reduced index size of the documents, higher granularity concepts and associations therebetween, the higher granularity concepts corresponding to words of original granularity contained in the documents, the system comprising:
a) a user interface for providing a query to be applied to the database of documents; and
b) a processor for logically expanding the query by replacing only words of the query meeting a predetermined criterion, which is whether the words can be found in a lexical dictionary, being of the original granularity, with corresponding ones of the higher granularity concepts, whereupon the processor executes the logically expanded query to retrieve documents associated, through the index, with the corresponding ones of the higher granularity concepts,
b)(i) wherein the higher granularity concepts are higher granularity semantic concepts, and wherein the processor logically expands the query by further:
b)(ii) adding syntactically related words for each of the corresponding ones of the higher granularity concepts;
b)(iii) adding syntactically related words for each of the words in the query failing to meet the predetermined criterion;
b)(iv) replacing ones of the syntactically related words meeting the predetermined criterion with associated ones of the higher granularity concepts; and
b)(v) removing any redundant ones of the syntactically related words and higher granularity concepts from the expanded query, wherein the processor further retrieves ones of the documents in order of relevance until a predetermined number of ones of the documents associated with the corresponding ones of the higher granularity concepts are retrieved.
Dependent Claims (40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50)
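Retrieval "in order of relevance until a predetermined number" of documents is reached can be sketched as a loop over relevance tiers that stops early. Modelling each tier as a lazily evaluated callable is an assumption made here to show how later tiers need not be processed at all; the names `progressive_retrieve` and `tier` are hypothetical.

```python
def progressive_retrieve(tiers, limit):
    """Collect document ids tier by tier, most relevant tier first, up to `limit`.

    `tiers` is a list of zero-argument callables, each returning an iterable
    of document ids. A tier is only evaluated if earlier tiers did not
    already supply enough documents.
    """
    if limit <= 0:
        return []
    seen = set()
    results = []
    for fetch in tiers:
        for doc_id in fetch():
            if doc_id in seen:
                continue  # already returned by a more relevant tier
            seen.add(doc_id)
            results.append(doc_id)
            if len(results) == limit:
                return results
    return results


calls = []  # records which tiers were actually evaluated


def tier(name, doc_ids):
    def fetch():
        calls.append(name)
        return doc_ids
    return fetch


tiers = [
    tier("exact", [3, 8]),
    tier("semantic", [8, 1, 5]),
    tier("syntactic", [2]),
]

top3 = progressive_retrieve(tiers, 3)
print(top3, calls)
```

In this run the limit is reached inside the semantic tier, so the syntactic tier is never evaluated.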
Specification