Multiple index based information retrieval system

US 7,567,959 B2
Filed: 01/25/2005
Issued: 07/28/2009
Est. Priority Date: 07/26/2004
Status: Expired due to Fees

2 Assignments

0 Petitions

Abstract

An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to the phrases they contain. The document index is partitioned into multiple indexes, including a primary index and a secondary index. The primary index stores phrase posting lists whose documents are ordered by relevance rank. The secondary index stores the excess documents from those posting lists in document order.
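The split between the two indexes described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation. The `Posting` type, the `split_posting_list` helper and the fixed `primary_size` cutoff are hypothetical, because the abstract does not say how many documents remain in the primary index.

```python
from typing import NamedTuple


class Posting(NamedTuple):
    doc_id: int   # document identifier
    score: float  # relevance score (hypothetical scale)


def split_posting_list(postings: list[Posting], primary_size: int) -> tuple[list[Posting], list[int]]:
    """Partition one phrase's posting list into primary and secondary parts.

    primary_size is an assumed fixed cutoff; the source does not specify one.
    """
    # Rank every document containing the phrase by relevance, best first.
    ranked = sorted(postings, key=lambda p: p.score, reverse=True)
    # Primary index: the top documents, kept in relevance-rank order.
    primary = ranked[:primary_size]
    # Secondary index: the excess documents, kept in document-ID order.
    secondary = sorted(p.doc_id for p in ranked[primary_size:])
    return primary, secondary


postings = [Posting(7, 0.2), Posting(3, 0.9), Posting(12, 0.5), Posting(1, 0.1), Posting(9, 0.7)]
primary, secondary = split_posting_list(postings, primary_size=2)
print([p.doc_id for p in primary])  # [3, 9]
print(secondary)                    # [1, 7, 12]
```

Here the secondary list keeps only document identifiers, which is consistent with dependent claim 5. The relevance scores of the overflow documents are dropped.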

15 Claims

1. A computer implemented method for indexing documents with respect to a first phrase, wherein each document has a document identifier, the method comprising:
- storing a primary index of phrases;
- storing a secondary index of phrases;
- establishing a list of documents that contain the first phrase;
- partitioning the documents, by operation of a processor adapted to manipulate data within a computer system, in the list into at least a first portion comprising higher ranked documents in the list, and a second portion comprising lesser ranked documents in the list, based on ranking the documents in the list by a relevance score;
- storing the first portion in the primary index, the higher ranked documents of the first portion stored relative to one another in the primary index in rank order of the respective relevance scores of the ranked documents; and
- based on the partitioning, storing the second portion in the secondary index, the lesser ranked documents of the second portion stored relative to one another in the secondary index in numerical order of the respective document identifiers of the ranked documents, wherein an identifier indicating a reference to the secondary index is stored in the primary index, and is associated with the stored first portion.
2. The method of claim 1, wherein the relevance score comprises a page rank based type score.

3. The method of claim 1, further comprising storing for each document in the primary index relevance attributes of the document.

4. The method of claim 3, wherein the relevance attributes include at least one of the following: a total number of occurrences of the phrase in the document, a rank ordered list of anchor documents that also contain the phrase and that point to the document, a position of each phrase occurrence in the document, a set of one or more flags indicating a format of the occurrence or a portion of the document containing the occurrence.

5. The method of claim 3, wherein storing the second portion of the list in the secondary index comprises storing substantially only document identification information.

6. The method of claim 1, wherein storing the first portion of the list in the primary index comprises storing the first portion of the list on a physical storage device in rank order of the relevance scores.

7. The method of claim 1, wherein storing the second portion of the list in the secondary index comprises storing the second portion of the list on a physical storage device in numerical order of the document identifiers.
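Claims 1 and 3 to 5 add two details to the basic split. First, each primary entry carries relevance attributes. Second, the primary record for a phrase holds an identifier that points to the phrase's overflow list in the secondary index. The Python sketch below is hedged accordingly. The field names, the integer reference scheme, and the `index_phrase` and `all_doc_ids` helpers are illustrative assumptions, not anything the claims specify.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PrimaryEntry:
    """A document in a primary posting list, with relevance attributes (claims 3-4)."""
    doc_id: int
    score: float
    occurrences: int = 0                              # total occurrences of the phrase
    anchor_docs: list = field(default_factory=list)   # rank-ordered anchor documents
    positions: list = field(default_factory=list)     # position of each occurrence
    flags: int = 0                                    # hypothetical bitmask of format flags


@dataclass
class PrimaryRecord:
    """Primary-index record for one phrase."""
    entries: list                        # PrimaryEntry objects, best score first
    secondary_ref: Optional[int] = None  # identifier referencing the secondary index


def index_phrase(phrase, entries, primary_size, primary_index, secondary_index):
    """Store one phrase's documents across both indexes (hypothetical helper)."""
    ranked = sorted(entries, key=lambda e: e.score, reverse=True)
    record = PrimaryRecord(entries=ranked[:primary_size])
    overflow = ranked[primary_size:]
    if overflow:
        ref = len(secondary_index)  # hypothetical scheme: next unused slot number
        # The secondary list keeps substantially only document IDs (claim 5).
        secondary_index[ref] = sorted(e.doc_id for e in overflow)
        record.secondary_ref = ref  # reference stored with the first portion
    primary_index[phrase] = record


def all_doc_ids(phrase, primary_index, secondary_index):
    """Return primary documents in rank order, then the secondary list in ID order."""
    record = primary_index[phrase]
    ids = [e.doc_id for e in record.entries]
    if record.secondary_ref is not None:
        ids.extend(secondary_index[record.secondary_ref])
    return ids


primary_index, secondary_index = {}, {}
docs = [PrimaryEntry(4, 0.3, occurrences=2), PrimaryEntry(8, 0.8, occurrences=5),
        PrimaryEntry(2, 0.6), PrimaryEntry(6, 0.1)]
index_phrase("information retrieval", docs, 2, primary_index, secondary_index)
print(all_doc_ids("information retrieval", primary_index, secondary_index))  # [8, 2, 4, 6]
```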

8. A computer implemented method for indexing documents with respect to a first phrase, wherein each document has a document identifier, the method comprising:
- storing a primary index of phrases;
- storing a secondary index of phrases;
- establishing a list of documents that contain the first phrase;
- ranking, by operation of a processor adapted to manipulate data within a computer system, the documents in the list by a relevance score;
- storing a first portion of the list comprising higher ranked documents in the primary index, the higher ranked documents of the first portion stored relative to one another in the primary index in rank order of the respective relevance scores of the ranked documents, wherein the first portion includes a first section wherein each document listed in the first section includes a first plurality of relevance attributes, and a second section wherein each document listed in the second section comprises a second plurality of relevance attributes that are a subset of the first set of relevance attributes, and wherein the documents listed in the first section are ranked higher than the documents listed in the second section; and
- storing a second portion of the list comprising lesser ranked documents in the secondary index, the lesser ranked documents of the second portion stored relative to one another in the secondary index in numerical order of the respective document identifiers of the ranked documents.
9. The method of claim 8, wherein the first portion of each list of documents includes a third section wherein each document listed in the third section includes a third plurality of relevance attributes that are a subset of the second plurality of relevance attributes, and wherein the documents listed in the second section are ranked higher than the documents listed in the third section.

10. The method of claim 8, wherein the first portion of each list contains n entries, wherein the second portion of the list contains m*n entries, wherein m>2, and the third portion of the list contains l*n entries, wherein l>4.
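Claims 8 and 9 describe tiers inside the primary portion. Each lower-ranked section keeps only a subset of the attributes stored for the section above it. The Python sketch below assumes documents are represented as dictionaries and uses hypothetical attribute names. Claim 10 recites entry counts of n, m*n and l*n, with m>2 and l>4. As one possible reading, the sketch treats these as the sizes of the three successive sections.

```python
# Hypothetical attribute names; each section keeps a subset of the previous one.
SECTION_ATTRIBUTES = [
    ("occurrences", "anchor_docs", "positions", "flags"),  # first section: full set
    ("occurrences", "positions"),                          # second section: subset of first
    ("occurrences",),                                      # third section: subset of second
]


def build_sections(ranked_docs, section_sizes):
    """Slice rank-ordered documents into sections with shrinking attribute sets.

    ranked_docs: dicts ordered best-first, each with "doc_id" plus every attribute.
    Returns the sections and any leftover documents (secondary-index candidates).
    """
    sections, start = [], 0
    for size, attrs in zip(section_sizes, SECTION_ATTRIBUTES):
        chunk = ranked_docs[start:start + size]
        # Keep the document ID plus only this section's attributes.
        sections.append([{"doc_id": d["doc_id"], **{a: d[a] for a in attrs}} for d in chunk])
        start += size
    return sections, ranked_docs[start:]


n, m, ell = 2, 3, 5  # ell stands for the claim's l; m > 2 and l > 4 as recited in claim 10
docs = [
    {"doc_id": i, "occurrences": 20 - i, "anchor_docs": [], "positions": [i], "flags": 0}
    for i in range(20)
]
sections, leftover = build_sections(docs, [n, m * n, ell * n])
print([len(s) for s in sections], len(leftover))  # [2, 6, 10] 2
print(sorted(sections[1][0]))                     # ['doc_id', 'occurrences', 'positions']
```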

11. A computer readable storage medium storing a computer program executable by a processor for indexing documents with respect to a first phrase, the actions of the computer program comprising:
- storing a primary index of phrases;
- storing a secondary index of phrases;
- establishing a list of documents that contain the first phrase;
- partitioning the documents in the list into at least a first portion comprising higher ranked documents in the list, and a second portion comprising lesser ranked documents in the list, based on ranking the documents in the list by a relevance score;
- storing the first portion in the primary index, the higher ranked documents of the first portion stored relative to one another in the primary index in rank order of the respective relevance scores of the ranked documents; and
- based on the partitioning, storing the second portion in the secondary index, the lesser ranked documents of the second portion stored relative to one another in the secondary index in numerical order of the respective document identifiers of the ranked documents, wherein an identifier indicating a reference to the secondary index is stored in the primary index, and is associated with the stored first portion.
12. The computer readable storage medium of claim 11, further comprising storing for each document in the primary index relevance attributes of the document.

13. The computer readable storage medium of claim 11, wherein storing the second portion of the list in the secondary index comprises storing substantially only document identification information.

14. The computer readable storage medium of claim 11, wherein the first portion of each list of documents includes a first section wherein each document listed in the first section includes a first plurality of relevance attributes, and a second section wherein each document listed in the second section comprises a second plurality of relevance attributes that are a subset of the first set of relevance attributes, and wherein the documents listed in the first section are ranked higher than the documents listed in the second section.

15. The computer readable storage medium of claim 11, wherein storing the first portion of the list in a primary index comprises storing the first portion of the list on a physical storage device in rank order of the relevance scores.
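Claims 6, 7 and 15 concern the physical order of the two portions on a storage device. The following Python sketch uses byte strings in place of a device. The fixed-width record layouts (`PRIMARY_REC`, `DOC_ID`) are hypothetical. The binary search in `secondary_contains` illustrates one thing that numerical document-ID order makes possible; the claims themselves do not recite it.

```python
import struct

PRIMARY_REC = struct.Struct("<Qd")  # hypothetical layout: uint64 doc ID, float64 score
DOC_ID = struct.Struct("<Q")        # hypothetical layout: bare uint64 doc ID


def write_primary(entries):
    """Serialize (doc_id, score) pairs in descending relevance-score order."""
    ordered = sorted(entries, key=lambda e: e[1], reverse=True)
    return b"".join(PRIMARY_REC.pack(doc_id, score) for doc_id, score in ordered)


def write_secondary(doc_ids):
    """Serialize document IDs in ascending numerical order."""
    return b"".join(DOC_ID.pack(d) for d in sorted(doc_ids))


def read_secondary(blob):
    return [d for (d,) in DOC_ID.iter_unpack(blob)]


def secondary_contains(blob, doc_id):
    """Binary-search the serialized IDs without decoding the whole list."""
    count = len(blob) // DOC_ID.size
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        (value,) = DOC_ID.unpack_from(blob, mid * DOC_ID.size)
        if value < doc_id:
            lo = mid + 1
        else:
            hi = mid
    return lo < count and DOC_ID.unpack_from(blob, lo * DOC_ID.size)[0] == doc_id


primary_blob = write_primary([(5, 0.4), (11, 0.9)])
secondary_blob = write_secondary([42, 7, 19])
print([d for d, _ in PRIMARY_REC.iter_unpack(primary_blob)])  # [11, 5]
print(read_secondary(secondary_blob))                         # [7, 19, 42]
print(secondary_contains(secondary_blob, 19))                 # True
```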

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Patterson, Anna L.
Primary Examiner(s): Vy, Hung T
Application Number: US11/043,695
Publication Number: US 20060106792A1
Time in Patent Office: 1,645 Days
Field of Search: 707/3-6, 707/7
US Class Current: 1/1

CPC Class Codes:
- G06F 16/2228: Indexing structures
- G06F 16/24578: using ranking
- G06F 16/313: Selection or weighting of t...
- G06F 16/93: Document management systems
- G06F 16/951: Indexing; Web crawling tech...
- G06F 16/9535: Search customisation based ...
- G06F 16/9538: Presentation of query results
- Y10S 707/99933: Query processing, i.e. sear...
- Y10S 707/99935: Query augmenting and refini...
