Intelligent data sourcing in a networked storage system

  • US 9,218,376 B2
  • Filed: 06/12/2013
  • Issued: 12/22/2015
  • Est. Priority Date: 06/13/2012
  • Status: Active Grant
First Claim

1. A method of sourcing data from storage associated with a pool of computing devices during a data storage operation associated with one of the computing devices in the pool, the method comprising:

  • obtaining signatures corresponding to data units that form a data set associated with a data storage operation, the data set corresponding to a version of one or more files of primary data of a first computing device in a pool of a plurality of computing devices, each respective computing device in the pool storing primary data generated by one or more software applications executing on the respective computing device, the primary data stored in at least one storage device associated with the respective computing device,

      wherein the storage devices of the computing devices in the pool store a plurality of data units of primary data including at least the data set stored in the at least one storage device of the first computing device,

      wherein each file of a plurality of files of primary data stored in the storage devices comprises at least one data unit of the plurality of data units,

      wherein at least a first data unit of the data set forms at least a portion of a first file of primary data stored in the at least one storage device of the first computing device and a second data unit of the plurality of data units matches the first data unit and forms at least a portion of a second file of primary data stored in the at least one storage device of a second computing device of the plurality of computing devices, and

      wherein the first file and the second file are generated by the one or more software applications executing on the first computing device and the second computing device, respectively;

    populating, by one or more processors, a shared signature repository that includes:

    signatures corresponding to at least each data unit of the plurality of data units, wherein a first signature corresponds to the first data unit and the second data unit; and

    for each signature included in the signature repository, an indication as to one or more of the computing devices whose at least one storage device includes an independently generated data unit that corresponds to the signature, wherein each independently generated data unit forms at least a portion of a distinct file residing on the respective storage device, and wherein the shared signature repository includes at least a first indication that indicates a first location of the first data unit in the at least one storage device of the first computing device and a second location of the second data unit in the at least one storage device of the second computing device;

    comparing the obtained signatures, including a signature of the first data unit, with the signature repository to identify one or more matching data units, including the second data unit, stored in the respective at least one storage device of the computing devices in the pool, wherein each of the one or more matching data units forms at least a portion of a read/write file residing in the respective storage device and is stored in a native format of the respective software application that generated the respective matching data unit;

    consulting, by one or more processors, a priority policy; and

    based on the priority policy, and for at least the first data unit in the data set, determining to access the second data unit rather than the first data unit for the data storage operation.
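
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not taken from the patent: the SHA-256 signature, the `SignatureRepository` class, the `choose_source` helper, the device names, the file locations, and the rank-based priority policy are all hypothetical choices made only to show how a shared signature repository and a priority policy might select the second data unit in place of the first.

```python
import hashlib
from collections import defaultdict


def signature(data_unit: bytes) -> str:
    # Assumption: a SHA-256 digest serves as the signature.
    # The claim does not name a specific signature function.
    return hashlib.sha256(data_unit).hexdigest()


class SignatureRepository:
    """Hypothetical shared repository: signature -> (device, location) pairs."""

    def __init__(self):
        self._locations = defaultdict(list)

    def add(self, device: str, location: str, data_unit: bytes) -> str:
        # Record that `device` holds a data unit with this signature.
        sig = signature(data_unit)
        self._locations[sig].append((device, location))
        return sig

    def lookup(self, sig: str):
        # Return every known (device, location) holding a matching data unit.
        return list(self._locations.get(sig, []))


def choose_source(sig: str, repo: SignatureRepository, priority: dict):
    # Assumption: the priority policy is a device -> rank map,
    # where a lower rank means a more preferred source.
    candidates = repo.lookup(sig)
    if not candidates:
        return None
    return min(candidates, key=lambda c: priority.get(c[0], float("inf")))


# Two devices independently hold the same data unit in different files.
unit = b"quarterly report, block 0"
repo = SignatureRepository()
sig_first = repo.add("laptop-a", "/home/a/report.docx#0", unit)
sig_second = repo.add("server-b", "/srv/share/report.docx#0", unit)

# Hypothetical policy: prefer the server over the laptop.
policy = {"server-b": 0, "laptop-a": 1}
chosen = choose_source(signature(unit), repo, policy)
print(chosen)
```

In this sketch the storage operation is associated with `laptop-a`, but the policy ranks `server-b` higher, so the matching copy on `server-b` is selected as the source.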
