System and method of data caching for compliance storage systems with keyword query based access

  • US 8,140,538 B2
  • Filed: 04/17/2008
  • Issued: 03/20/2012
  • Est. Priority Date: 04/17/2008
  • Status: Expired due to Fees
First Claim

1. A method of data caching for compliance and storage systems that provides keyword search query based access to documents, the method comprising:

  • searching documents from a storage device by a keyword based interface;

    staging from a cache documents that are read and that are expected to be needed again from the storage device;

    computing a document weight for each of the documents read and expected to be needed again, wherein the document weight is based on a document information retrieval (IR) relevancy metric for user keyword queries and a recency and a frequency of each query and the document weight models a probability of a particular document being accessed again through a query, and wherein the document weight is based on a relevance of each document for queries in a query history;

    placing a processor and a disk in data communication with a First In First Out queue and a cache; and

    if the document being accessed again was not already in the cache, evicting another document from the cache to make room for the document being accessed again to be placed in the cache by packing elements in the order of a document weight-to-size ratio, highest to smallest, and evicting documents with a smallest document weight-to-size ratio first;

    maintaining a query history of recent queries from a user in a query history first-in first-out queue;

    assigning each query from a user a query weight based on a position of the query from a user in the First In First Out queue, wherein the query weight models a probability of a query or a related query being invoked again;

    wherein each one of the document weight is recomputed by the processor when a document to be retrieved was not previously cached;

    updating the query history First-in First-Out queue and each of the document weights when a new query has been entered;

    adapting each of the document weights to changing query frequencies and popularities; and

    selecting and evicting documents from the cache according to a knapsack solution.
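The steps of the claim can be illustrated with a short sketch. It is not the patented implementation; it only shows one plausible reading under stated assumptions. The geometric `decay` used for position-based query weights, the term-overlap `relevance` function, the sample `TEXTS`, and all names below are hypothetical and do not come from the claim. The eviction step uses the greedy weight-to-size ordering as an approximate knapsack solution.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    size: int            # size in bytes, assumed to be > 0
    weight: float = 0.0  # modelled probability-like score of re-access


def query_weights(history, decay=0.5):
    """Weight each query by its position in the FIFO history.

    The newest query (last in the queue) gets 1.0 and each older
    position is multiplied by `decay`. The geometric form is an
    assumption; the claim only says the weight depends on position.
    """
    n = len(history)
    return [decay ** (n - 1 - i) for i in range(n)]


def document_weight(doc_id, history, relevance, decay=0.5):
    """Sum over the query history of query weight times IR relevance."""
    weights = query_weights(history, decay)
    return sum(w * relevance(q, doc_id) for w, q in zip(weights, history))


def evict_for(cache, incoming, capacity):
    """Evict lowest weight-to-size documents until `incoming` fits.

    Returns (kept, evicted). `kept` is ordered by ratio, highest first.
    """
    used = sum(d.size for d in cache)
    kept = sorted(cache, key=lambda d: d.weight / d.size, reverse=True)
    evicted = []
    while kept and used + incoming.size > capacity:
        victim = kept.pop()  # smallest ratio sits at the end
        used -= victim.size
        evicted.append(victim)
    return kept, evicted


# Toy corpus and relevance metric (both hypothetical).
TEXTS = {
    "a": "retention policy email",
    "b": "audit log retention",
    "c": "holiday schedule",
}


def relevance(query, doc_id):
    """Fraction of query terms that occur in the document text."""
    terms = query.split()
    words = set(TEXTS[doc_id].split())
    return sum(t in words for t in terms) / len(terms)


# Query history as a bounded FIFO queue: the oldest query drops out.
history = deque(maxlen=2)
for q in ["email policy", "audit log", "retention audit"]:
    history.append(q)

cache = [Doc("a", 200), Doc("b", 300)]
for d in cache:  # recompute weights after the new query
    d.weight = document_weight(d.doc_id, history, relevance)

incoming = Doc("c", 250)  # cache miss: room must be made
kept, evicted = evict_for(cache, incoming, capacity=600)
```

In this toy run, document `b` matches both remembered queries and document `a` matches only half of the newest one, so `a` has the lower weight-to-size ratio and is the one evicted to make room for `c`.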
