×

Expanded inverted index

  • US 7,634,468 B2
  • Filed: 11/29/2006
  • Issued: 12/15/2009
  • Est. Priority Date: 05/06/2003
  • Status: Active Grant
First Claim
Patent Images

1. A computer implemented method comprising:

  • parsing, by at least one of one or more processors, each document in a collection of documents to create a vocabulary comprising a plurality of index terms that occur in the collection of documents, the plurality of index terms comprising commonly occurring terms and infrequently occurring terms, the commonly occurring terms differing from the infrequently occurring terms;

    identifying, by at least one of the one or more processors, the commonly occurring terms and assigning a unique common term identifier value to each of the commonly occurring terms;

    generating, by at least one of the one or more processors, a plurality of expanded inverted lists, each expanded inverted list of the plurality of expanded inverted lists corresponding to one of the plurality of index terms and comprising a posting for each document in the collection of documents that includes the corresponding one of the plurality of index terms, each posting comprising;

    an identification of the document of the collection of documents in which the corresponding one of the plurality of index terms appears,an indication of whether the corresponding one of the plurality of index terms in the one of the collection of documents is one of the infrequently occurring terms and whether the corresponding one of the plurality of index terms occurs immediately adjacent to one of the commonly occurring terms,a number of times the corresponding one of the plurality of index terms occurs in the one of the collection of documents,a location offset value specifying where in the document the corresponding one of the plurality of index terms occurs relative to a reference location in the document, andthe unique common term identifier assigned to the commonly occurring one of the plurality of index terms if the corresponding one of the plurality of index terms occurs immediately adjacent to one of the commonly occurring terms;

    creating, by at least one of the one or more processors, an expanded inverted index that comprises the vocabulary and the expanded inverted list corresponding to each of the plurality of index terms;

    parsing, by at least one of the one or more processors, a search query that comprises a phrase query of two or more search terms that must appear in a specified order, the parsing comprising identifying a sequence of the search terms that includes a first one of the commonly occurring terms that is immediately followed or immediately preceded by one of the infrequently occurring terms and also identifying a second one of the search terms that is a second of the infrequently occurring terms and that is not immediately adjacent to one of the commonly occurring terms;

    retrieving, by at least one of the one or more processors, two or more of the plurality of expanded inverted lists from the expanded inverted index, the two or more of the plurality of expanded inverted lists comprising expanded inverted lists corresponding to the first and the second infrequently occurring index terms identified in the parsing; and

    returning, by at least one of the one or more processors, one or more documents of the collections of documents that appear in all of the retrieved two or more of the plurality of expanded inverted lists after evaluating the search query using only the retrieved two or more of the plurality of expanded inverted lists.

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×