Apparatus, system, and method for selecting optimal replica sources in a grid computing environment
Abstract
An apparatus, system, and method are disclosed for selecting optimal replica sources in a grid computing environment. As disclosed, the present invention overcomes shortcomings in the art involving location and selection of replica sources. In particular, the present invention selects an optimal replica source based on current and historical network statistics, as well as user-defined policies. The user-defined policies allow for customization of the replica source search, and the option of obtaining multiple ranked sources for parallel data transfer.
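The abstract's central idea is to rank candidate sources on both current and historical network statistics and return several of them in order. A minimal Python sketch of that ranking follows. The `SourceStats` fields, the blending weight `weight_current`, and the bandwidth-per-millisecond score are illustrative assumptions, not the patented method, and user-defined policies are left out.

```python
from dataclasses import dataclass


@dataclass
class SourceStats:
    """Network statistics for one candidate replica source (hypothetical structure)."""
    name: str
    hist_response_ms: float     # mean of previously recorded response times
    cur_response_ms: float      # latest measured response time
    hist_bandwidth_mbps: float  # mean of previously recorded bandwidth
    cur_bandwidth_mbps: float   # latest measured bandwidth


def score(s: SourceStats, weight_current: float = 0.7) -> float:
    """Higher is better: blended bandwidth divided by blended response time.

    weight_current is an assumed tuning knob that favors fresh measurements.
    """
    w = weight_current
    bandwidth = w * s.cur_bandwidth_mbps + (1 - w) * s.hist_bandwidth_mbps
    response = w * s.cur_response_ms + (1 - w) * s.hist_response_ms
    return bandwidth / max(response, 1e-6)  # guard against division by zero


def rank_sources(candidates: list[SourceStats], n: int) -> list[str]:
    """Return up to n source names, ordered from most to least optimal."""
    ranked = sorted(candidates, key=score, reverse=True)
    return [s.name for s in ranked[:n]]


sources = [
    SourceStats("siteA", 20.0, 10.0, 100.0, 100.0),
    SourceStats("siteB", 10.0, 40.0, 50.0, 50.0),
    SourceStats("siteC", 30.0, 30.0, 200.0, 200.0),
]
print(rank_sources(sources, 2))  # ['siteA', 'siteC']
```

Here siteB has the best history but a poor current response time, so the blend pushes it to the bottom of the list.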
18 Claims
1. An apparatus comprising:
a storage device storing code executable by a processor and comprising:
a search module configured to locate existing replica source information including a mapping of logical replica source names to physical locations;
a collection module configured to collect current network statistics and maintain a history of previous network statistics for candidate replica sources, the network statistics comprising a network response time calculated by sending an Internet Control Message Protocol (ICMP) packet and a bandwidth calculated by sending multiple packets of a predetermined size over a specified time period;
a determination module configured to determine identifiers for replica sources by starting with historical data to select a first trial which acts as a predefined profile for determining optimal replica sources and then checking statistics for other more optimal options in response to historical network statistics and current network statistics and rules of a user policy module;
the user policy module comprising one or more user-defined policies, the policies establishing rules for preferred attributes of the replica sources, wherein the user-defined policies comprise a desired number of replica sources to identify, the number of replica sources ordered from a most optimal replica source to a least optimal replica source, file size, membership of the replica sources in a preferred set of replica sources, proximity of the replica sources, network response time, workload of a host for the replica sources, and bandwidth of the connection to the replica sources; and
a sending module configured to send the identifiers of the replica sources to a file transfer service that copies from the replica sources in a parallel data transfer to create a replica.
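The collection module's bookkeeping can be sketched in Python, assuming bandwidth is derived from `packet_count` packets of `packet_size_bytes` delivered over `period_s` seconds and that a bounded per-source history holds previous samples. `bandwidth_bps`, `timed_probe`, `StatsCollector`, and the `send_probe` callable are hypothetical names. Sending real ICMP echo requests normally requires raw-socket privileges, so the probe is passed in rather than implemented.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


def bandwidth_bps(packet_count: int, packet_size_bytes: int, period_s: float) -> float:
    """Bandwidth in bits per second from packet_count packets of a fixed size
    delivered over period_s seconds."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    return packet_count * packet_size_bytes * 8 / period_s


def timed_probe(send_probe: Callable[[], None]) -> float:
    """Round-trip time in milliseconds for one probe.

    send_probe is a hypothetical callable that sends an ICMP echo request and
    blocks until the reply arrives.
    """
    start = time.perf_counter()
    send_probe()
    return (time.perf_counter() - start) * 1000.0


@dataclass
class StatsCollector:
    """Per-source history of (response_ms, bandwidth_bps) samples."""
    history_len: int = 10
    history: dict[str, deque] = field(default_factory=dict)

    def record(self, source: str, response_ms: float, bw_bps: float) -> None:
        # deque(maxlen=...) silently drops the oldest sample once it is full.
        samples = self.history.setdefault(source, deque(maxlen=self.history_len))
        samples.append((response_ms, bw_bps))

    def historical_mean(self, source: str) -> tuple[float, float]:
        """Mean response time and bandwidth over the retained samples."""
        samples = self.history[source]
        n = len(samples)
        return (sum(r for r, _ in samples) / n, sum(b for _, b in samples) / n)


# 100 packets of 1500 bytes in one second is 1.2 Mbit/s.
print(bandwidth_bps(100, 1500, 1.0))  # 1200000.0
```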

5. An apparatus to select an optimal replica source in a grid computing environment, the apparatus comprising:
a storage device storing code executable by a processor and comprising:
a receiving module configured to receive a request for an optimal replica source location from a requesting file system;
a search module configured to locate existing replica source information including a mapping of logical replica source names to physical locations;
a collection module configured to collect current network statistics and maintain a history of previous network statistics for candidate replica sources, the network statistics comprising a network response time calculated by sending an ICMP packet and a bandwidth calculated by sending multiple packets of a predetermined size over a specified time period;
a determination module configured to determine identifiers for replica sources by starting with historical data to select a first trial which acts as a predefined profile for determining optimal replica sources and then checking statistics for other more optimal options in response to historical network statistics and current network statistics and rules of a user policy module;
the user policy module comprising one or more user-defined policies, the policies establishing rules for preferred attributes of the replica sources, wherein the user-defined policies comprise a desired number of replica sources to identify, the number of replica sources ordered from a most optimal replica source to a least optimal replica source, file size, membership of the replica sources in a preferred set of replica sources, proximity of the replica sources, network response time, workload of a host for the replica sources, and bandwidth of the connection to the replica sources; and
a sending module configured to send the identifiers of the replica sources to a file transfer service of the requesting file system that copies from the replica sources in a parallel data transfer to create a replica.
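Claim 5 adds a receiving module that accepts a request from a requesting file system, and every claim relies on a mapping of logical replica names to physical locations. The sketch below assumes a simple in-memory catalog; `ReplicaRequest`, `ReplicaCatalog`, `handle_request`, and the `lfn://` and `gsiftp://` example names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaRequest:
    """A request from a file system for replica sources (hypothetical shape)."""
    logical_name: str
    requester: str


class ReplicaCatalog:
    """In-memory mapping of logical replica names to physical locations."""

    def __init__(self) -> None:
        self._locations: dict[str, list[str]] = {}

    def register(self, logical_name: str, physical_location: str) -> None:
        self._locations.setdefault(logical_name, []).append(physical_location)

    def lookup(self, logical_name: str) -> list[str]:
        # Return a copy so callers cannot mutate the catalog's internal list.
        return list(self._locations.get(logical_name, []))


def handle_request(catalog: ReplicaCatalog, request: ReplicaRequest) -> list[str]:
    """Resolve the requested logical name to candidate physical locations."""
    candidates = catalog.lookup(request.logical_name)
    if not candidates:
        raise LookupError(f"no replica registered for {request.logical_name!r}")
    return candidates


catalog = ReplicaCatalog()
catalog.register("lfn://climate/run42.nc", "gsiftp://site-a.example.org/data/run42.nc")
catalog.register("lfn://climate/run42.nc", "gsiftp://site-b.example.org/store/run42.nc")
print(handle_request(catalog, ReplicaRequest("lfn://climate/run42.nc", "fs-01")))
```

The returned locations are the candidate replica sources that the collection and determination steps would then measure and rank.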

9. A system to select an optimal replica source in a grid computing environment, the system comprising:
at least one replica destination;
at least one replica source;
an optimal source selector comprising a storage device storing code executable by a processor and comprising:
a search module configured to locate existing replica source information including a mapping of logical replica source names to physical locations;
a collection module configured to collect current network statistics and maintain a history of previous network statistics for candidate replica sources, the network statistics comprising a network response time calculated by sending an ICMP packet and a bandwidth calculated by sending multiple packets of a predetermined size over a specified time period;
a determination module configured to determine identifiers for replica sources by starting with historical data to select a first trial which acts as a predefined profile for determining optimal replica sources and then checking statistics for other more optimal options in response to historical network statistics and current network statistics and rules of a user policy module;
the user policy module comprising one or more user-defined policies, the policies establishing rules for preferred attributes of the replica sources, wherein the user-defined policies comprise a desired number of replica sources to identify, the number of replica sources ordered from a most optimal replica source to a least optimal replica source, file size, membership of the replica sources in a preferred set of replica sources, proximity of the replica sources, network response time, workload of a host for the replica sources, and bandwidth of the connection to the replica sources;
an interface to the replica source selection device; and
a sending module configured to send the identifiers of the replica sources to a file transfer service that copies from the replica sources in a parallel data transfer to create a replica.
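The determination step starts from historical data to pick a first trial, then checks current statistics for a better option. A minimal Python sketch, assuming response time in milliseconds (lower is better) is the only metric; `select_source` is a hypothetical name, and a fuller selector would also apply the user-defined policies.

```python
def select_source(historical: dict[str, float], current: dict[str, float]) -> str:
    """Choose one replica source by response time in ms (lower is better).

    First trial: the source with the best historical average. Then check the
    current measurements of the other candidates and switch if one is faster.
    """
    if not historical:
        raise ValueError("historical statistics are required for the first trial")
    best = min(historical, key=historical.__getitem__)
    best_now = current.get(best, float("inf"))  # not measured now: treat as worst
    for source, response_ms in current.items():
        if source != best and response_ms < best_now:
            best, best_now = source, response_ms
    return best


history = {"siteA": 10.0, "siteB": 20.0, "siteC": 30.0}
print(select_source(history, {"siteA": 50.0, "siteB": 15.0, "siteC": 40.0}))  # siteB
print(select_source(history, {"siteA": 12.0, "siteB": 15.0}))  # siteA
```

Using the historical winner as the starting point means the selector only switches when a current measurement actually beats it.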

13. A program of machine-readable instructions stored on a physical storage device and executable by a digital processing apparatus to perform operations to select an optimal replica source in a grid computing environment, the operations comprising:
searching existing replica source information including a mapping of logical replica source names to physical locations;
collecting current network statistics and maintaining a history of previous network statistics for candidate replica sources;
determining identifiers for replica sources by starting with historical data to select a first trial which acts as a predefined profile for determining optimal replica sources and then checking statistics for other more optimal options in response to historical network statistics and current network statistics, the network statistics comprising a network response time calculated by sending an ICMP packet and a bandwidth calculated by sending multiple packets of a predetermined size over a specified time period;
determining the identifiers of the replica sources in response to one or more user-defined policies, the policies establishing rules for preferred attributes of the replica sources, wherein the user-defined policies comprise a desired number of replica sources to identify, the number of replica sources ordered from a most optimal replica source to a least optimal replica source, file size, membership of the replica sources in a preferred set of replica sources, proximity of the replica sources, network response time, workload of a host for the replica sources, and bandwidth of the connection to the replica sources; and
sending the identifiers of the replica sources to a file transfer service that copies from the replica sources in a parallel data transfer to create a replica.
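Every independent claim lists the same policy attributes: a desired number of sources, ordering from most to least optimal, file size, preferred-set membership, proximity, response time, host workload, and bandwidth. A minimal Python sketch of how such a policy could filter and order candidates follows. It assumes proximity is measured in network hops, workload as a load fraction, and that file size only switches the sort between response time (small files) and bandwidth (large files); `Candidate`, `Policy`, `apply_policy`, and all thresholds are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    response_ms: float
    bandwidth_mbps: float
    host_load: float  # fraction of host capacity in use, 0.0 to 1.0
    hops: int         # network hops, used here as a proxy for proximity


@dataclass
class Policy:
    """Hypothetical user-defined policy; field names are illustrative."""
    desired_count: int = 3
    preferred: set[str] = field(default_factory=set)
    max_response_ms: float = float("inf")
    max_host_load: float = 1.0
    max_hops: int = 1_000
    large_file_bytes: int = 100 * 1024 * 1024  # at or above this, favor bandwidth


def apply_policy(cands: list[Candidate], policy: Policy, file_size: int) -> list[str]:
    """Filter candidates by the policy, then order them most to least optimal."""
    eligible = [
        c for c in cands
        if c.response_ms <= policy.max_response_ms
        and c.host_load <= policy.max_host_load
        and c.hops <= policy.max_hops
    ]

    def key(c: Candidate) -> tuple:
        # False sorts before True, so members of the preferred set come first.
        not_preferred = c.name not in policy.preferred
        if file_size >= policy.large_file_bytes:
            return (not_preferred, -c.bandwidth_mbps, c.response_ms)
        return (not_preferred, c.response_ms, -c.bandwidth_mbps)

    eligible.sort(key=key)
    return [c.name for c in eligible[: policy.desired_count]]


cands = [
    Candidate("siteA", 10.0, 50.0, 0.2, 3),
    Candidate("siteB", 30.0, 200.0, 0.5, 5),
    Candidate("siteC", 5.0, 20.0, 0.95, 2),
    Candidate("siteD", 20.0, 100.0, 0.1, 8),
]
policy = Policy(desired_count=2, preferred={"siteD"}, max_host_load=0.9)
print(apply_policy(cands, policy, 200 * 1024 * 1024))  # ['siteD', 'siteB']
print(apply_policy(cands, policy, 1024))  # ['siteD', 'siteA']
```

siteC is excluded by the workload limit; siteD always leads because it is in the preferred set, and file size decides what comes next.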

17. A method for selecting replica sources in a grid computing environment, the method comprising:
searching, by use of a processor, existing replica source information including a mapping of logical replica source names to physical locations;
collecting current network statistics and maintaining a history of previous network statistics for candidate replica sources, the network statistics comprising a network response time calculated by sending an ICMP packet and a bandwidth calculated by sending multiple packets of a predetermined size over a specified time period;
determining identifiers for replica sources by starting with historical data to select a first trial which acts as a predefined profile for determining optimal replica sources and then checking statistics for other more optimal options in response to historical network statistics and current network statistics and one or more user-defined policies, the user-defined policies establishing rules for preferred attributes of the replica sources, wherein the user-defined policies comprise a desired number of replica sources to identify, the number of replica sources ordered from a most optimal replica source to a least optimal replica source, file size, membership of the replica sources in a preferred set of replica sources, proximity of the replica sources, network response time, workload of a host for the replica sources, and bandwidth of the connection to the replica sources; and
sending the identifiers of the replica sources to a file transfer service that copies from the replica sources in a parallel data transfer to create a replica.
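The final step hands the ordered identifiers to a file transfer service that copies from several sources in parallel. One simple approach, assumed here for illustration, is to split the file into contiguous byte ranges, one per source, fetch them concurrently, and reassemble them in order. `split_ranges`, `parallel_copy`, and the `fetch(source, start, end)` callable are hypothetical; the split is equal, although weighting ranges by measured bandwidth would be a natural refinement.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def split_ranges(file_size: int, sources: list[str]) -> list[tuple[str, int, int]]:
    """Give each ranked source a contiguous byte range [start, end).

    Any remainder bytes go to the earliest (most optimal) sources.
    """
    if not sources:
        raise ValueError("at least one source is required")
    base, extra = divmod(file_size, len(sources))
    ranges, start = [], 0
    for i, src in enumerate(sources):
        length = base + (1 if i < extra else 0)
        ranges.append((src, start, start + length))
        start += length
    return ranges


def parallel_copy(file_size: int, sources: list[str],
                  fetch: Callable[[str, int, int], bytes]) -> bytes:
    """Fetch all ranges concurrently and reassemble them in order.

    fetch(source, start, end) is a hypothetical transfer call returning bytes.
    """
    ranges = split_ranges(file_size, sources)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # Executor.map yields results in input order, so parts stay in sequence.
        parts = pool.map(lambda r: fetch(*r), ranges)
        return b"".join(parts)


data = bytes(range(10))
print(split_ranges(len(data), ["siteA", "siteB", "siteC"]))
# [('siteA', 0, 4), ('siteB', 4, 7), ('siteC', 7, 10)]
print(parallel_copy(len(data), ["siteA", "siteB", "siteC"], lambda s, a, b: data[a:b]) == data)  # True
```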

18. An apparatus to select an optimal replica source in a grid computing environment, the apparatus comprising:
means for searching existing replica source information comprising executable code stored on a storage device, executed by a processor, and including a mapping of logical replica source names to physical locations;
means for collecting current network statistics and maintaining a history of previous network statistics for candidate replica sources comprising executable code stored on the storage device, executed by the processor, the network statistics comprising a network response time calculated by sending an ICMP packet and a bandwidth calculated by sending multiple packets of a predetermined size over a specified time period;
means for determining identifiers for replica sources by starting with historical data to select a first trial which acts as a predefined profile for determining optimal replica sources and then checking statistics for other more optimal options in response to historical network statistics, current network statistics, and one or more user-defined policies, the policies establishing rules for preferred attributes of the replica sources, wherein the user-defined policies comprise a desired number of replica sources to identify, the number of replica sources ordered from a most optimal replica source to a least optimal replica source, file size, membership of the replica sources in a preferred set of replica sources, proximity of the replica sources, network response time, workload of a host for the replica sources, and bandwidth of the connection to the replica sources; and
means for sending the identifiers of the replica sources to a file transfer service that copies from the replica sources in a parallel data transfer to create a replica.
Specification