System and method of data caching for compliance storage systems with keyword query based access
First Claim
1. A method of data caching for compliance and storage systems that provides keyword search query based access to documents, the method comprising:
searching documents from a storage device by a keyword based interface;
staging in a cache documents that are read and that are expected to be needed again from the storage device;
computing a document weight for each of the documents read and expected to be needed again, wherein the document weight is based on a document information retrieval (IR) relevancy metric for user keyword queries and a recency and a frequency of each query and the document weight models a probability of a particular document being accessed again through a query, and wherein the document weight is based on a relevance of each document for queries in a query history;
placing a processor and a disk in data communication with a First In First Out queue and a cache; and
if the document being accessed again was not already in the cache, evicting another document from the cache to make room for the document being accessed again to be placed in the cache by packing elements in the order of a document weight-to-size ratio, highest to smallest, and evicting documents with a smallest document weight-to-size ratio first;
maintaining a query history of recent queries from a user in a query history first-in first-out queue;
assigning each query from a user a query weight based on a position of the query from a user in the First In First Out queue, wherein the query weight models a probability of a query or a related query being invoked again;
wherein each of the document weights is recomputed by the processor when a document to be retrieved was not previously cached;
updating the query history First-in First-Out queue and each of the document weights when a new query has been entered;
adapting each of the document weights to changing query frequencies and popularities; and
selecting and evicting documents from the cache according to a knapsack solution.
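The weight computation recited in the claim can be sketched as follows. This is a minimal illustration under assumed names (`DECAY`, `HISTORY_SIZE`, and `ir_score` are hypothetical), not the patented implementation: query weights decay with FIFO position, and a document's weight is the weighted sum of its IR relevancy over the query history.

```python
from collections import deque

DECAY = 0.8        # hypothetical per-position decay for query weights
HISTORY_SIZE = 8   # hypothetical length of the query-history FIFO

query_history = deque(maxlen=HISTORY_SIZE)  # newest query at the right

def query_weight(position_from_newest):
    # Recent queries get higher weight, modeling the probability that
    # the query (or a related one) is invoked again.
    return DECAY ** position_from_newest

def document_weight(doc, ir_score):
    # Weighted sum of the IR relevancy metric over all queries in the
    # history: recent and frequent queries dominate the document's weight.
    return sum(query_weight(i) * ir_score(doc, q)
               for i, q in enumerate(reversed(query_history)))
```

Any IR relevancy function can be plugged in for `ir_score` (e.g. a TF-IDF cosine score); query frequency is captured implicitly, since a repeated query occupies several FIFO positions.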
1 Assignment
0 Petitions
Abstract
A method of data caching for compliance and storage systems that provide keyword search query based access to documents computes a value for each data document based on a document information-retrieval relevancy metric for user keyword queries and on the recency and frequency of each query. The values are adapted to changing query frequencies and popularities. Documents are then selected and evicted from a cache, based on the values, according to a knapsack solution. A weight is computed for each query such that recent, more frequent queries get a higher weight. An information-retrieval metric measures the relevancy of a document for a query, and a document's value is the weighted sum of that metric times the query weight over all queries.
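Numerically, the weighted sum described in the abstract works like this (all numbers below are made up for illustration):

```python
# Two queries in the history, newest first; recent queries weigh more.
query_weights = [1.0, 0.5]
# Hypothetical IR relevancy of one document for each of those queries.
ir_scores = [0.9, 0.2]

# Document value = sum over all queries of (query weight * IR relevancy).
value = sum(w * s for w, s in zip(query_weights, ir_scores))
print(value)  # 1.0*0.9 + 0.5*0.2 = 1.0
```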
14 Citations
7 Claims
1. A method of data caching for compliance and storage systems that provides keyword search query based access to documents, the method comprising:
searching documents from a storage device by a keyword based interface;
staging in a cache documents that are read and that are expected to be needed again from the storage device;
computing a document weight for each of the documents read and expected to be needed again, wherein the document weight is based on a document information retrieval (IR) relevancy metric for user keyword queries and a recency and a frequency of each query and the document weight models a probability of a particular document being accessed again through a query, and wherein the document weight is based on a relevance of each document for queries in a query history;
placing a processor and a disk in data communication with a First In First Out queue and a cache; and
if the document being accessed again was not already in the cache, evicting another document from the cache to make room for the document being accessed again to be placed in the cache by packing elements in the order of a document weight-to-size ratio, highest to smallest, and evicting documents with a smallest document weight-to-size ratio first;
maintaining a query history of recent queries from a user in a query history first-in first-out queue;
assigning each query from a user a query weight based on a position of the query from a user in the First In First Out queue, wherein the query weight models a probability of a query or a related query being invoked again;
wherein each of the document weights is recomputed by the processor when a document to be retrieved was not previously cached;
updating the query history First-in First-Out queue and each of the document weights when a new query has been entered;
adapting each of the document weights to changing query frequencies and popularities; and
selecting and evicting documents from the cache according to a knapsack solution.
- View Dependent Claims (2, 3, 4)
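The eviction step in the claim is essentially a greedy knapsack approximation: keep documents in order of weight-to-size ratio and evict the smallest-ratio documents first until the new document fits. A sketch, assuming a simple `dict`-based cache and externally supplied weight and size tables (all names hypothetical):

```python
def evict_for(cache, capacity, new_doc_id, new_size, weights, sizes):
    """cache: dict of doc_id -> size. Returns the list of evicted doc ids.

    Greedy knapsack step: documents with the smallest document
    weight-to-size ratio are evicted first, until new_doc_id fits.
    """
    used = sum(cache.values())
    # Eviction candidates, smallest weight-to-size ratio first.
    victims = sorted(cache, key=lambda d: weights[d] / sizes[d])
    evicted = []
    while used + new_size > capacity and victims:
        victim = victims.pop(0)
        used -= cache.pop(victim)
        evicted.append(victim)
    if used + new_size <= capacity:
        cache[new_doc_id] = new_size
    return evicted
```

A greedy pass over the ratio ordering is the standard polynomial-time approximation for the (NP-hard) knapsack selection the claim refers to.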
5. A document search system, comprising:
a keyword based interface that searches documents from a storage device;
a cache that stages documents that are read and that are expected to be needed again from said storage device, wherein said cache further includes a document weight that is maintained for each document, said document weight models a probability of a particular document being accessed again through a query, said document weight is based on a relevance of each document for queries in a query history, and if said document being accessed again was not already in said cache, another document is evicted from said cache to make room for said document being accessed again to be placed in said cache by packing elements in the order of a document weight-to-size ratio, highest to smallest, and documents with a smallest document weight-to-size ratio are evicted first;
a query history first-in first-out (FIFO) queue that maintains a query history of recent queries from a user, wherein each query is assigned a query weight based on its position in said FIFO queue, wherein the query weight models a probability of a query or a related query being invoked again;
a processor connected to said query history FIFO queue, wherein said processor computes a value for each data document based on a document information retrieval (IR) relevancy metric for user keyword queries and a recency and a frequency of each query, and said processor recomputes each document weight (Dw) for each data document when a document to be retrieved was not previously cached;
an updating system that updates said query history FIFO queue, each said query weight, and each said document weight when a new query has been entered;
a mechanism that adapts each said document weight for each data document to changing query frequencies and popularities; and
a mechanism selecting and evicting documents from said cache based on said document weight for each data document according to a knapsack solution.
- View Dependent Claims (6, 7)
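Putting the elements of claim 5 together, a toy version of the system might look like this. The class, its decay constant, and the substring-matching relevancy function used in the test are illustrative assumptions, not the patent's implementation:

```python
from collections import deque

class WeightedCache:
    """Toy sketch: FIFO query history, document weights recomputed on a
    cache miss, and eviction in order of smallest weight-to-size ratio."""

    def __init__(self, capacity, ir_score, decay=0.8, history_len=8):
        self.capacity = capacity
        self.ir_score = ir_score        # IR relevancy: (doc, query) -> float
        self.decay = decay              # hypothetical per-position decay
        self.history = deque(maxlen=history_len)  # query-history FIFO
        self.cache = {}                 # doc_id -> size

    def doc_weight(self, doc):
        # Weighted sum of IR relevancy over the query history;
        # recent queries (low FIFO position) get higher query weights.
        return sum(self.decay ** i * self.ir_score(doc, q)
                   for i, q in enumerate(reversed(self.history)))

    def access(self, query, doc, size):
        self.history.append(query)      # update the FIFO on every new query
        if doc in self.cache:
            return                      # hit: document already staged
        # Miss: recompute weights, then evict smallest ratios first.
        victims = sorted(self.cache,
                         key=lambda d: self.doc_weight(d) / self.cache[d])
        used = sum(self.cache.values())
        while used + size > self.capacity and victims:
            used -= self.cache.pop(victims.pop(0))
        if used + size <= self.capacity:
            self.cache[doc] = size
```

On a hit only the history changes; on a miss the weights are refreshed against the current history, so the eviction order adapts to changing query frequencies and popularities.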
Specification