Methods and apparatus for clustering and prefetching data objects
Abstract
Techniques for managing data objects in conjunction with a computer system are provided. In a technique for clustering data objects on a disk storage device, the invention comprises maintaining a log of at least a portion of accesses (e.g., read and/or write operations) to the data objects; determining from the maintained log a cluster comprised of data objects accessed at substantially similar times; and storing the data objects comprising the cluster in close proximity to one another on the disk storage device. In a technique for prefetching data objects on a disk storage device, the invention comprises receiving a request for a data object in a cluster; determining from the log a probability that at least one other data object in the cluster may be subsequently requested; and, in response to the probability being not less than a predetermined value, retrieving both the requested data object and the at least one other data object. Such clustering and prefetching techniques substantially reduce the number of storage device seeks.
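The clustering technique summarized above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the log is a list of `(timestamp, object_id)` pairs, that "substantially similar times" means within a fixed `window` of seconds, and that objects are grouped once they co-occur at least `min_count` times. The names `co_access_counts`, `build_clusters`, `window`, and `min_count` are hypothetical and do not appear in the source.

```python
from collections import defaultdict


def co_access_counts(log, window):
    """Count, per pair of objects, accesses that fall within `window` seconds of each other."""
    counts = defaultdict(int)
    events = sorted(log)  # each entry is (timestamp, object_id); assumed log format
    for i, (t_i, a) in enumerate(events):
        for t_j, b in events[i + 1:]:
            if t_j - t_i > window:
                break  # events are sorted, so later ones are even further away
            if a != b:
                counts[frozenset((a, b))] += 1
    return counts


def build_clusters(log, window=1.0, min_count=2):
    """Group objects whose co-access count reaches `min_count` (connected components)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, obj in log:
        find(obj)
    for pair, n in co_access_counts(log, window).items():
        if n >= min_count:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for obj in list(parent):
        groups[find(obj)].add(obj)
    # Only groups with more than one member are worth storing contiguously.
    return [g for g in groups.values() if len(g) > 1]


log = [(0.0, "a"), (0.2, "b"), (5.0, "c"), (10.0, "a"),
       (10.3, "b"), (20.0, "c"), (30.0, "d")]
clusters = build_clusters(log, window=1.0, min_count=2)
```

In this sketch, a storage layer would then write the members of each returned cluster to adjacent locations on the disk storage device; that placement step depends on the device and is not shown.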
21 Claims
1. A method of managing data objects in a computer system, the method comprising the steps of:
maintaining a log of at least a portion of accesses to the data objects;
determining from the maintained log at least one cluster comprised of data objects accessed at substantially similar times;
storing the data objects comprising the at least one cluster in close proximity to one another in a memory;
receiving a request for a data object in a cluster;
determining from the log a probability that at least one other data object in the cluster may be subsequently requested; and
in response to the probability being not less than a predetermined value, retrieving both the requested data object and the at least one other data object.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. Apparatus for managing data objects in a computer system, the apparatus comprising:
at least one processor operative to:
(i) maintain a log of at least a portion of accesses to the data objects;
(ii) determine from the maintained log at least one cluster comprised of data objects accessed at substantially similar times;
(iii) store the data objects comprising the at least one cluster in close proximity to one another in a data storage device;
(iv) receive a request for a data object in a cluster;
(v) determine from the log a probability that at least one other data object in the cluster may be subsequently requested; and
(vi) in response to the probability being not less than a predetermined value, retrieve both the requested data object and the at least one other data object; and
memory, operatively coupled to the at least one processor, for storing at least one of the log and a cluster membership identifying the at least one cluster.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. In a system comprising at least one server and at least one disk storage device operatively coupled to the at least one server, apparatus for managing data objects in accordance with the at least one server and the at least one disk storage device, the apparatus comprising:
memory for storing at least one log, the log comprising information relating to at least a portion of accesses to the data objects; and
a module, operatively coupled to the log memory, and operative to:
(i) cause the storing of the data objects in at least one cluster on the at least one disk storage device via the at least one server based on the at least one log;
(ii) learn of a request for a data object in a cluster;
(iii) determine from the log a probability that at least one other data object in the cluster may be subsequently requested; and
(iv) in response to the probability being not less than a predetermined value, cause the retrieval of both the requested data object and the at least one other data object from the at least one disk storage device.
Dependent claims: 20
21. An article of manufacture for managing data objects in a computer system, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
maintaining a log of at least a portion of accesses to the data objects;
determining from the maintained log at least one cluster comprised of data objects accessed at substantially similar times;
storing the data objects comprising the at least one cluster in close proximity to one another in a memory;
receiving a request for a data object in a cluster;
determining from the log a probability that at least one other data object in the cluster may be subsequently requested; and
in response to the probability being not less than a predetermined value, retrieving both the requested data object and the at least one other data object.
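The prefetching steps recited in the claims above (receiving a request, determining a probability from the log, and comparing it against a predetermined value) can be sketched as follows. This is an illustrative assumption, not the method of the specification. The probability is estimated here as the fraction of logged accesses to the requested object that were followed by an access to the other object within `window` seconds. The names `follow_probability`, `objects_to_retrieve`, `window`, and `threshold` are hypothetical.

```python
def follow_probability(log, requested, other, window=1.0):
    """Estimate P(`other` is accessed within `window` seconds after `requested`)."""
    events = sorted(log)  # each entry is (timestamp, object_id); assumed log format
    hits = total = 0
    for i, (t, obj) in enumerate(events):
        if obj != requested:
            continue
        total += 1
        for t2, obj2 in events[i + 1:]:
            if t2 - t > window:
                break  # outside the follow-up window
            if obj2 == other:
                hits += 1
                break
    return hits / total if total else 0.0


def objects_to_retrieve(log, cluster, requested, threshold=0.5, window=1.0):
    """Return the requested object plus cluster members worth prefetching.

    A member is included when its estimated probability is not less than
    `threshold`, mirroring the "not less than a predetermined value" test.
    """
    result = [requested]
    for other in sorted(cluster - {requested}):
        if follow_probability(log, requested, other, window) >= threshold:
            result.append(other)
    return result


log = [(0.0, "a"), (0.2, "b"), (10.0, "a"), (10.3, "b"),
       (20.0, "a"), (30.0, "c"), (30.1, "a")]
cluster = {"a", "b", "c"}
to_fetch = objects_to_retrieve(log, cluster, "a", threshold=0.5)
```

The comparison uses `>=` rather than `>` so that a probability exactly equal to the predetermined value still triggers retrieval of the other object, matching the claim wording.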
Specification