METHODS AND SYSTEMS FOR QUICK AND EFFICIENT DATA MANAGEMENT AND/OR PROCESSING
Abstract
Systems and methods are provided for data management and data processing. For example, various embodiments may include systems and methods in which relatively larger groups of data are selected while achieving comparable or better selection results (e.g., high data redundancy elimination and/or large average chunk size). In various embodiments, the systems and methods may include, for example, a data group, block, or chunk combining technique and/or a data group, block, or chunk splitting technique. Various embodiments may combine a first standard or typical data grouping, blocking, or chunking technique with such a combining and/or splitting technique. Exemplary systems and methods may relate to data hashing and/or data elimination. Embodiments may include a look-ahead buffer and determine whether to emit small chunks or large chunks based on characteristics of the underlying data and/or the particular application (e.g., backup).
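The content-defined chunking the abstract alludes to can be sketched as follows. The rolling hash, window size, and mask below are illustrative assumptions, not the patent's actual procedure: a boundary is declared wherever a running hash of recent input matches a bit mask, so the mask width sets the expected average chunk size (a wider mask makes matches rarer and chunks larger).

```python
# Hypothetical sketch of content-defined chunking. A toy polynomial
# hash is accumulated over each chunk; a chunk boundary is emitted
# when the low bits selected by `mask` are all zero and a minimum
# window of bytes has been consumed.

WINDOW = 16      # minimum bytes before a boundary may be declared
PRIME = 1000003  # multiplier for the toy polynomial hash

def rolling_chunks(data: bytes, mask: int):
    """Yield (start, end) offsets of content-defined chunks."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = (h * PRIME + byte) & 0xFFFFFFFF
        if i - start + 1 >= WINDOW and (h & mask) == 0:
            yield (start, i + 1)
            start, h = i + 1, 0   # next chunk starts fresh
    if start < len(data):
        yield (start, len(data))  # trailing partial chunk
```

Because boundaries depend only on the bytes themselves, an insertion early in the stream shifts boundaries only locally; later chunks realign, which is what makes this style of chunking attractive for duplicate elimination.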
35 Claims
1. A method of data management, comprising:
breaking a data stream into a plurality of data groups using a combination of a first data segmentation procedure and a second data segmentation procedure, wherein the expected average data group sizes of the first data segmentation procedure and the second data segmentation procedure are different. (Dependent claims 2–23)
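One hypothetical way to realize claim 1's combination of two segmentation procedures with different expected average group sizes: run the same running hash against a narrow mask and a wide mask. Since the wide mask's bits contain the narrow mask's, every coarse boundary is also a fine boundary, yielding two nested segmentations of one stream. The hash, masks, and function names are illustrative assumptions.

```python
PRIME = 1000003  # multiplier for the toy running hash

def boundaries(data: bytes, mask: int):
    """Offsets where the running hash of the prefix matches `mask`,
    always ending with len(data)."""
    cuts, h = [], 0
    for i, byte in enumerate(data):
        h = (h * PRIME + byte) & 0xFFFFFFFF
        if (h & mask) == 0:
            cuts.append(i + 1)
    if not cuts or cuts[-1] != len(data):
        cuts.append(len(data))
    return cuts

def combined_segmentation(data: bytes, small_mask=0x3F, big_mask=0x3FF):
    """First procedure: expected ~64-byte groups (6 mask bits).
    Second procedure: expected ~1024-byte groups (10 mask bits)."""
    return boundaries(data, small_mask), boundaries(data, big_mask)
```

A consumer could then segment most of the stream at the coarse boundaries and fall back to the fine boundaries where smaller groups pay off, e.g., near previously seen data.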
24. A method of data management, comprising:
applying a first content-defined data chunking procedure to obtain one or more initial chunking points; and
applying a second content-defined data chunking procedure, based on a predetermined set of criteria, so as to modify the initial chunking points to different chunking points, thereby increasing the average size of data chunks and the average amount of duplicate data identified. (Dependent claims 25, 26)
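One hypothetical reading of claim 24's second procedure: given the initial chunking points, drop interior points until every chunk meets a minimum size, which raises the average chunk size. A real system would also consult duplicate-detection results; the criterion here is size only, as a simplifying assumption.

```python
# Second-pass sketch: `cuts` holds ascending end-offsets of chunks,
# the last entry being the stream length. Interior cut points are
# kept only when the chunk they close is at least `min_size` bytes;
# dropped points merge neighboring chunks into larger ones.

def enlarge_chunks(cuts, min_size):
    out, start = [], 0
    for cut in cuts[:-1]:
        if cut - start >= min_size:
            out.append(cut)
            start = cut
    out.append(cuts[-1])   # the final point is always retained
    return out
```

For example, `enlarge_chunks([5, 8, 20, 22, 30], 10)` keeps only the points 20 and 30, merging five small chunks into two larger ones.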
27. A method of content-defined chunking, comprising the steps of:
amalgamating small chunks into large chunks within long stretches of data that have been determined to be non-duplicate;
bordering the edges of those non-duplicate stretches that are adjacent to regions determined to be duplicate data with small chunks, by not amalgamating the small chunks found near the edges; and
re-emitting large chunks that are found to be duplicate data.
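Claim 27's policy can be sketched as below. Small chunks are flagged duplicate or non-duplicate in advance (in practice by a fingerprint-index lookup; here the flags are supplied directly as an assumption). Runs of non-duplicate chunks are merged into one large chunk, except that `edge` small chunks bordering a duplicate region stay small, so a later pass can resynchronize at the border; duplicate chunks are re-emitted unchanged.

```python
def amalgamate(chunks, is_dup, edge=1):
    """chunks: list of bytes; is_dup: parallel list of booleans."""
    out, i, n = [], 0, len(chunks)
    while i < n:
        if is_dup[i]:
            out.append(chunks[i])          # duplicate re-emitted as-is
            i += 1
            continue
        j = i                              # find the non-duplicate run
        while j < n and not is_dup[j]:
            j += 1
        run = chunks[i:j]
        lo = edge if i > 0 else 0          # duplicate region on the left?
        hi = len(run) - (edge if j < n else 0)  # ... on the right?
        if lo < hi:
            out.extend(run[:lo])           # small chunks kept at left edge
            out.append(b"".join(run[lo:hi]))  # amalgamated large chunk
            out.extend(run[hi:])           # small chunks kept at right edge
        else:
            out.extend(run)                # run too short to merge
        i = j
    return out
```

With no duplicate neighbors, a whole run merges into one large chunk; when the run borders duplicate data on both sides, one small chunk survives at each edge.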
28. A system of data management, comprising:
a data identification system; and
a data manipulation system, wherein the data manipulation system, based on a predetermined set of criteria, selectively modifies one or more initial data break points so as to increase the average size of data groups. (Dependent claims 29, 30, 31)
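A minimal composition in the spirit of claim 28's two components, under assumed names: a data identification system (here just a set of previously seen chunks) feeds a data manipulation system that merges runs of new chunks into larger groups, capped at `max_group` chunks per group.

```python
class DataIdentification:
    """Identification component: records chunks and reports duplicates."""
    def __init__(self):
        self.seen = set()
    def is_duplicate(self, chunk: bytes) -> bool:
        dup = chunk in self.seen
        self.seen.add(chunk)
        return dup

class DataManipulation:
    """Manipulation component: merges runs of new chunks into groups."""
    def __init__(self, ident, max_group=4):
        self.ident, self.max_group = ident, max_group
    def regroup(self, chunks):
        out, run = [], []
        for c in chunks:
            if self.ident.is_duplicate(c):
                if run:
                    out.append(b"".join(run)); run = []
                out.append(c)              # duplicate emitted unchanged
            else:
                run.append(c)              # accumulate new data
                if len(run) == self.max_group:
                    out.append(b"".join(run)); run = []
        if run:
            out.append(b"".join(run))
        return out
```

Dropping a break point between two new chunks is exactly the "selective modification" the claim describes; here the predetermined criterion is simply whether the data between the points is new.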
32. A system of data management, comprising:
means for performing data identification; and
means for manipulating data, wherein the means for manipulating data, based on a predetermined set of criteria, selectively modifies one or more initial data break points so as to increase the average size of data groups. (Dependent claims 33, 34, 35)