Method for using dual indices to support query expansion, relevance/non-relevance models, blind/relevance feedback and an intelligent search interface

US 8,620,900 B2
Filed: 02/09/2009
Issued: 12/31/2013
Est. Priority Date: 02/09/2009
Status: Active Grant

Abstract

A method for using dual indices to support query expansion, relevance/non-relevance models, blind/relevance feedback and an intelligent search interface, comprising using a computing device to: access an inverted index to obtain an initial retrieval of results in response to a query, and to generate a rank list of the results, the results referring to information units (IUs) where the query occurs; and determine a number of “N” IUs in the results that are regarded by the computing device to be relevant by accessing a forward index; and use the forward index to perform any one from the group consisting of: computing query expansion weights for top “N” retrieved IUs, building the relevance models by the contexts of query terms in the top “N” retrieved IUs, and finding the longest contiguous sequences of query terms in a query found in an IU; wherein the forward index and inverted index have pointers to locations in the IUs where terms of the query occur, and the forward index retrieves a term frequency vector of the IU or a set of contexts of the IU.
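The dual-index arrangement described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names `build_dual_indices` and `initial_retrieval`, the toy IUs, and the summed term-frequency score are hypothetical and are not taken from the patent, which does not prescribe a scoring function here.

```python
from collections import Counter, defaultdict

def build_dual_indices(ius):
    """Build both indices from tokenized IUs (hypothetical helper).

    inverted: term -> {iid: [positions where the term occurs]}
    forward:  iid  -> term-frequency vector (Counter)
    """
    inverted = defaultdict(dict)
    forward = {}
    for iid, tokens in ius.items():
        forward[iid] = Counter(tokens)
        for pos, term in enumerate(tokens):
            inverted[term].setdefault(iid, []).append(pos)
    return inverted, forward

def initial_retrieval(inverted, query_terms):
    """Return a rank list of IIDs using the inverted index only.

    The score (summed query-term frequency) is an assumption chosen
    for brevity; any ranking function could take its place.
    """
    scores = Counter()
    for term in query_terms:
        for iid, positions in inverted.get(term, {}).items():
            scores[iid] += len(positions)
    # Highest score first; ties broken by IID for determinism.
    return [iid for iid, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

# Toy collection of three IUs (illustrative data only).
ius = {
    "d1": "dual index search engine index".split(),
    "d2": "inverted index for search".split(),
    "d3": "cooking recipes".split(),
}
inverted, forward = build_dual_indices(ius)
ranked = initial_retrieval(inverted, ["index", "search"])
```

After the initial retrieval, `forward[iid]` returns the term frequency vector of a retrieved IU without re-reading the IU itself, which is the role the abstract assigns to the forward index.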

25 Citations

13 Claims

1. A method for using dual indices to support query expansion, relevance models, non-relevance models and an intelligent search interface, comprising using a computing device to:
- access an inverted index to obtain an initial retrieval of results in response to a query, and to generate a rank list of the results, the results referring to information units (IUs) where terms of the query occur;
- determine a number of “N” IUs in the results that are regarded by the computing device as relevant by accessing a forward index;
- determine at least one non-relevant IU in the results that are regarded by the computing device as not relevant by accessing the forward index; and
- using the forward index to perform any one from the group consisting of: computing query expansion weights, building the relevance models by the contexts of query terms in a top “N” retrieved IUs within the number of “N” IUs, building the non-relevance models using the at least one non-relevant IU, and finding the longest contiguous sequences of query terms in the query found in an IU;
- wherein the forward index and inverted index have pointers to locations in the IUs where terms of the query occur more than once, and a forward index and inverted index pointer storage stores the locations in the IUs where the query term occurs only once in the IUs, and the forward index retrieves a term frequency vector of the IU or a set of contexts of the IU; and
- wherein computing query expansion weights for the top “N” retrieved IUs utilizes the forward index to compute query expansion by:
  - computing at least one relevance query expansion term weight using the top “N” retrieved IUs in the results and the forward index;
  - computing at least one non-relevance query expansion term weight using the at least one non-relevant IU in the results and the forward index; and
  - selecting query expansion terms using the results, the at least one relevance query expansion term weight, and the at least one non-relevance query expansion term weight.

2. The method according to claim 1, wherein “N” is from 1 to 10000.

3. The method according to claim 1, wherein “N” is query-dependent, and the forward index is accessed one-by-one to determine the number “N”.

4. The method according to claim 1, wherein the forward index is searched by an information unit identifier (IID) of the IU, and is searched by terms of the query, and the frequencies of the query terms are collected.

5. The method according to claim 1, wherein the forward index is a set of frequency vectors for the IUs.

6. The method according to claim 1, wherein the inverted index and forward index are compressed.

7. The method according to claim 1, wherein the locations are cached in a cache that becomes a file.

8. The method according to claim 7, wherein the cache, inverted index and forward index are stored on solid state disk drives.

9. The method according to claim 1, wherein the forward index is any one from the group consisting of: variable bit-block compression signature, superimposed coding signature, vocabulary index, trie, B-tree, heap B-tree, red-black tree, suffix arrays, suffix tree, PATRICIA trie, string B-tree, and DAWGs.

10. The method according to claim 4, further comprising the initial steps of:
- storing the frequencies of the query terms in the forward index; and
- storing the pointers to word locations in the forward index.
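The query expansion steps recited in claim 1 (a relevance term weight, a non-relevance term weight, then term selection) can be illustrated with a minimal sketch. The weighting below is a Rocchio-style average chosen as an assumption; the claims do not state a formula. The function name `expansion_terms`, the parameters `beta` and `gamma`, the rule that every IU ranked below the top N counts as non-relevant, and the toy forward index are all hypothetical.

```python
from collections import Counter

def expansion_terms(forward, ranked, query_terms, n=2, k=3, beta=1.0, gamma=0.5):
    """Select up to k expansion terms from the forward index.

    Assumptions (not from the patent): the top-n ranked IUs are treated
    as relevant, the rest as non-relevant, and weights are mean term
    frequencies combined as beta * relevant - gamma * non_relevant.
    """
    top, non_rel = ranked[:n], ranked[n:]

    # Relevance term weights: mean term frequency over the top-n IUs.
    rel_w = Counter()
    for iid in top:
        for term, tf in forward[iid].items():
            rel_w[term] += tf / len(top)

    # Non-relevance term weights: mean term frequency over the other IUs.
    non_w = Counter()
    for iid in non_rel:
        for term, tf in forward[iid].items():
            non_w[term] += tf / len(non_rel)

    # Combine both weights, skipping terms already in the query.
    combined = {
        t: beta * rel_w[t] - gamma * non_w[t]
        for t in rel_w
        if t not in query_terms
    }
    best = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return [t for t, w in best if w > 0]

# Toy forward index: IID -> term frequency vector (illustrative data only).
forward = {
    "d1": Counter({"index": 2, "search": 1, "engine": 2}),
    "d2": Counter({"index": 1, "search": 1, "engine": 1, "ranking": 1}),
    "d3": Counter({"search": 1, "party": 2, "ranking": 1}),
}
selected = expansion_terms(forward, ["d1", "d2", "d3"], ["index", "search"], n=2)
```

In this toy run, "ranking" occurs in a top-ranked IU but also in the non-relevant one, so its combined weight drops to zero and only "engine" is selected. Every lookup goes through `forward[iid]`, matching the claim 4 and claim 5 picture of a forward index searched by IID that yields frequency vectors.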

11. A system for supporting query expansion, relevance models, non-relevance models and an intelligent search interface using dual indices, the system comprising:
- a computing device comprising a retrieval module and a post-processing module;
- wherein the retrieval module is configured to access an inverted index to obtain an initial retrieval of results in response to a query and to generate a rank list of the results, the results referring to information units (IUs) where the query occurs;
- wherein the post-processing module is configured to:
  - determine a number of “N” IUs in the results that are regarded by the computing device as relevant by accessing a forward index;
  - determine at least one non-relevant IU in the results that are regarded by the computing device by accessing the forward index; and
  - use the forward index to perform any one from the group consisting of: computing query expansion weights for a top “N” retrieved IUs within the number of “N” IUs, building the relevance models by the contexts of query terms in the top “N” retrieved IUs, building the non-relevance models using the at least one non-relevant IU, and finding the longest contiguous sequences of query terms in the query found in an IU;
- wherein the forward index and inverted index have pointers to locations in the IUs where terms of the query occur more than once, and a forward index and inverted index pointer storage stores the locations in the IUs where the query term occurs only once in the IUs, and the forward index retrieves a term frequency vector of the IU or a set of contexts of the IU; and
- wherein the post-processing module is configured to compute query expansion weights for the top “N” retrieved IUs utilizes the forward index to compute query expansion terms by:
  - computing at least one relevance query expansion term weight using the top “N” retrieved IUs in the results and the forward index;
  - computing at least one non-relevance query expansion term weight using the at least one non-relevant IU in the results and the forward index; and
  - selecting query expansion terms using the results, the at least one relevance query expansion term weight, and the at least one non-relevance query expansion term weight.

12. The system according to claim 11, further comprising solid state disk drives to store the inverted index and forward index.
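One of the post-processing options recited above is finding the longest contiguous sequences of query terms in the query found in an IU. A minimal sketch of that step follows, using a standard longest-common-substring dynamic program over tokens. It assumes the IU's token sequence is available (for example, reconstructed from stored locations); the function name `longest_query_run` and the sample query are hypothetical, and the claims do not specify this algorithm.

```python
def longest_query_run(query_terms, iu_tokens):
    """Return the longest contiguous run of query terms that also
    appears contiguously, in the same order, in the IU.

    Token-level longest-common-substring DP; on ties the run that
    ends earliest in the query is kept.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(iu_tokens) + 1)
    for i in range(1, len(query_terms) + 1):
        curr = [0] * (len(iu_tokens) + 1)
        for j in range(1, len(iu_tokens) + 1):
            if query_terms[i - 1] == iu_tokens[j - 1]:
                # Extend the run that ended at the previous token pair.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return query_terms[best_end - best_len:best_end]

# Illustrative data only.
query = "hong kong polytechnic university".split()
iu = "visit the hong kong polytechnic campus".split()
run = longest_query_run(query, iu)
```

A search interface could use such a run to highlight or snippet the passage of the IU that matches the largest part of the query phrase.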

13. A search engine providing support for query expansion, relevance models, non-relevance models and an intelligent search interface, the search engine comprising:
- a computing device comprising a retrieval module and a post-processing module;
- dual indices consisting of an inverted index and a forward index;
- wherein the retrieval module is configured to access an inverted index to obtain an initial retrieval of results in response to a query, and to generate a rank list of the results, the results referring to information units (IUs) where the query occurs;
- wherein the post-processing module is configured to determine a number of “N” IUs in the results that are regarded by the computing device as relevant by accessing a forward index, determine at least one non-relevant IU in the results that are regarded by the computing device as not relevant by accessing the forward index, and the post-processing module uses the forward index to perform any one from the group consisting of: computing query expansion weights for top “N” retrieved IUs, building the relevance models by the contexts of query terms in the top “N” retrieved IUs, building the non-relevance models using the at least one non-relevant IU, and finding the longest contiguous sequences of query terms in the query found in an IU;
- wherein the forward index and inverted index have pointers to locations in the IUs where terms of the query occur more than once, and a forward index and inverted index pointer storage stores the locations in the IUs where the query term occurs only once in the IUs, and the forward index retrieves a term frequency vector of the IU or a set of contexts of the IU; and
- wherein computing query expansion weights for a portion of the number of “N” IUs in the results utilizing the forward index to compute query expansion terms by:
  - computing at least one relevance query expansion term weight using the portion of the number of “N” IUs in the results and the forward index;
  - computing at least one non-relevance query expansion term weight using the at least one non-relevant IU in the results and the forward index; and
  - selecting query expansion terms using the results, the at least one relevance query expansion term weight, and the at least one non-relevance query expansion term weight.
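Building a relevance model "by the contexts of query terms in the top N retrieved IUs" can be pictured as collecting the words around each query-term location and normalising their counts. The sketch below is one possible reading, not the patent's stated procedure: the function name `context_relevance_model`, the fixed `window` size, the unigram normalisation, and the sample data are all assumptions.

```python
from collections import Counter

def context_relevance_model(ius, positions, top_n, window=2):
    """Estimate a unigram distribution over context words.

    ius:       iid -> list of tokens
    positions: iid -> locations of query terms in that IU
               (the kind of pointer the indices are said to hold)
    top_n:     IIDs regarded as relevant
    window:    context words taken on each side (hypothetical parameter)
    """
    counts = Counter()
    for iid in top_n:
        tokens = ius[iid]
        for pos in positions.get(iid, []):
            lo = max(0, pos - window)
            hi = min(len(tokens), pos + window + 1)
            # Count the neighbours, excluding the query term itself.
            counts.update(tokens[lo:pos] + tokens[pos + 1:hi])
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

# Illustrative data only: "index" occurs at location 2 of d1.
ius = {"d1": "fast dual index search engine".split()}
positions = {"d1": [2]}
model = context_relevance_model(ius, positions, ["d1"], window=1)
```

Running the same function over the non-relevant IUs instead of the top N would give a non-relevance model in the same form.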

Current Assignee: Hong Kong Polytechnic University
Original Assignee: Hong Kong Polytechnic University
Inventors: Luk, Robert Wing Pong
Primary Examiner(s): VU, BAI DUC
Application Number: US12/368,282
Publication Number: US 20100205172A1
Time in Patent Office: 1,786 Days
Field of Search: None
US Class Current: 707/715
CPC Class Codes:
- G06F 16/24578   using ranking
- G06F 16/313   Selection or weighting of t...
- G06F 16/334   Query execution G06F16/335 ...
