Methods and systems for quick and efficient data management and/or processing

US 8,214,517 B2
Filed: 12/01/2006
Issued: 07/03/2012
Est. Priority Date: 12/01/2006
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Systems and methods are provided for data management and data processing. For example, various embodiments may include systems and methods in which relatively larger groups of data are selected with comparable or better selection results (e.g., high data redundancy elimination and/or average chunk size). In various embodiments, the systems and methods may include, for example, a data group, block, or chunk combining technique and/or a data group, block, or chunk splitting technique. Various embodiments may include a first standard or typical data grouping, blocking, or chunking technique and/or a data group, block, or chunk combining technique and/or a data group, block, or chunk splitting technique. Exemplary systems and methods may relate to data hashing and/or data elimination. Embodiments may include a look-ahead buffer and determine whether to emit small chunks or large chunks based on characteristics of the underlying data and/or the particular application of the invention (e.g., for backup).
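As a rough illustration of the "first standard or typical" chunking technique combined with hash-based duplicate detection, the following Python sketch cuts a byte stream at content-defined points and tags each chunk as new or duplicate. It is only a sketch under stated assumptions: the names `small_chunks` and `tag_duplicates`, the window hash, and every constant are hypothetical choices made for this example and are not taken from the patent.

```python
import hashlib
from typing import Iterable, Iterator, List, Set, Tuple

# All constants are illustrative assumptions, not values from the patent.
WINDOW = 16     # trailing bytes hashed when testing for a cut point
MASK = 0x3F     # cut when the low 6 bits of the window hash are zero
MIN_SIZE = 16   # never cut before this many bytes (must be >= WINDOW)
MAX_SIZE = 512  # always cut once a chunk reaches this size


def small_chunks(data: bytes) -> Iterator[bytes]:
    """Split data at content-defined cut points (a generic baseline chunker)."""
    start = 0
    for i in range(len(data)):
        size = i - start + 1
        if size < MIN_SIZE:
            continue
        # The cut decision depends only on the last WINDOW bytes, so the same
        # content tends to produce the same cut points wherever it appears.
        window = data[i + 1 - WINDOW:i + 1]
        h = int.from_bytes(hashlib.blake2b(window, digest_size=4).digest(), "big")
        if (h & MASK) == 0 or size >= MAX_SIZE:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]  # trailing remainder


def tag_duplicates(chunks: Iterable[bytes], seen: Set[bytes]) -> List[Tuple[bytes, bool]]:
    """Pair each chunk with True if its digest was already in `seen`."""
    tagged = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()
        tagged.append((chunk, digest in seen))
        seen.add(digest)
    return tagged
```

Hashing the whole window at every byte keeps the sketch short; a practical chunker would normally use a rolling hash so that each step costs constant time.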

11 Claims

1. A method of data management implemented in and performed by a computing device utilizing a storage device, the method comprising:
- segmenting a data stream using a combination of a first data segmentation procedure and a second data segmentation procedure, wherein an expected average data group size of the first data segmentation procedure and the second data segmentation procedure are different,
  - wherein said first segmentation procedure breaks said data stream into a plurality of data groups,
  - wherein the second data segmentation procedure combines at least two small data groups, from said plurality of data groups, together to make a larger data group having an average size that is larger than average sizes of the small data groups, and
  - wherein the second data segmentation procedure refrains from combining at least two other small data groups together in response to determining that the at least two other small data groups are new and non-duplicate of data previously occurring in the data stream and in response to determining that a data group that is sequentially adjacent to the at least two other small data groups is a duplicate of data previously occurring in the data stream, wherein the at least two other small data groups are consecutive data groups in the plurality of data groups and wherein the second data segmentation procedure combines the at least two small data groups into the larger data group in response to determining that the at least two small data groups are new and non-duplicate of data that has previously occurred in the data stream.
- Dependent claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The method of claim 1, wherein one or more different small data groups of new data are combined into the larger data group when a maximum predetermined length of the data stream has been reached or processed.
  - 3. The method of claim 1, wherein the data group that is sequentially adjacent to the at least two other small data groups is sequentially before the at least two other small data groups in the data stream.
  - 4. The method of claim 1, wherein the data group that is sequentially adjacent to the at least two other small data groups is sequentially after the at least two other small data groups in the data stream.
  - 5. The method of claim 1, wherein the data group that is sequentially adjacent to the at least two other small data groups is sequentially before or after the at least two other small data groups in the data stream.
  - 6. The method of claim 1, further comprising the step of:
    - emitting one or more small data group(s) or one or more larger data group(s).
  - 7. The method of claim 6, wherein a larger data group is only emitted if a resynchronization point is not crossed.
  - 8. The method of claim 1, wherein the second data segmentation procedure is not applied to at least one small data group when no larger data group ending with the at least one small data group is considered to be a duplicate, when an immediately following number of data groups are of a predetermined type, or an immediately following amount of data is considered to be duplicate data.
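
One possible reading of the combining behaviour in claim 1 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the function name `amalgamate`, the parameters `k` (maximum number of small chunks merged into one larger chunk) and `edge` (number of new small chunks left unmerged next to a duplicate), and the precomputed `is_dup` flags are all assumptions introduced for illustration.

```python
from typing import List


def amalgamate(chunks: List[bytes], is_dup: List[bool],
               k: int = 4, edge: int = 2) -> List[bytes]:
    """Merge runs of new small chunks, except those bordering a duplicate.

    chunks : small chunks from the first segmentation procedure
    is_dup : is_dup[i] is True if chunks[i] was seen before (assumed given)
    k      : hypothetical cap on small chunks per larger chunk
    edge   : hypothetical count of new chunks kept small beside a duplicate
    """
    out: List[bytes] = []
    n = len(chunks)
    i = 0
    while i < n:
        if is_dup[i]:
            out.append(chunks[i])  # duplicates pass through unchanged
            i += 1
            continue
        # Find the maximal run [i, j) of consecutive new chunks.
        j = i
        while j < n and not is_dup[j]:
            j += 1
        dup_before = i > 0   # runs are maximal, so chunks[i - 1] is a duplicate
        dup_after = j < n    # chunks[j] is a duplicate
        lo = min(i + edge, j) if dup_before else i
        hi = max(j - edge, lo) if dup_after else j
        out.extend(chunks[i:lo])            # kept small: follows a duplicate
        for s in range(lo, hi, k):          # interior: merged into larger chunks
            out.append(b"".join(chunks[s:min(s + k, hi)]))
        out.extend(chunks[hi:j])            # kept small: precedes a duplicate
        i = j
    return out
```

For ten one-byte chunks `A` through `J` in which only `I` is a duplicate, the defaults give `ABCD`, `EF`, `G`, `H`, `I`, `J`: the interior of the new run is merged, while `G`, `H`, and `J` stay small because they border the duplicate. Keeping small chunks at such transitions is what lets later streams resynchronize with previously stored data.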

9. A method of data management, implemented in and performed by a computing device utilizing a storage device, the method comprising:
- segmenting a data stream using a combination of a first data segmentation procedure and a second data segmentation procedure, wherein an expected average data group size of the first data segmentation procedure and the second data segmentation procedure are different,
  - wherein said first segmentation procedure breaks said data stream into a plurality of data groups,
  - wherein the second data segmentation procedure refrains from breaking apart a given data group of said plurality of data groups in response to determining that said given data group is non-duplicate of data previously occurring in the data stream, and
  - wherein the second data segmentation procedure further breaks apart at least one of the plurality of data groups that is sequentially adjacent to said given data group into smaller data groups in response to determining that said at least one of the plurality of data groups is non-duplicate of data previously occurring in the data stream and in response to determining that a data group that is sequentially adjacent to said at least one of the plurality of data groups is a duplicate of data previously occurring in the data stream.
- Dependent claims (10, 11)
  - 10. The method of claim 9, wherein the first segmentation procedure has a larger average data group size than the second segmentation procedure.
  - 11. The method of claim 9, further comprising the step of:
    - emitting one or more small data group(s) or one or more larger data group(s).
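
The splitting variant in claim 9 can be sketched the same way. Here the first procedure is assumed to have produced large chunks, and a new large chunk is broken apart only when one of its neighbours is a duplicate. The names `split_near_duplicates` and `fixed_split`, the fixed sub-chunk size, and the precomputed `is_dup` flags are hypothetical, chosen only to keep the example self-contained.

```python
from typing import Callable, List


def fixed_split(chunk: bytes, size: int = 4) -> List[bytes]:
    """Stand-in for a finer-grained segmentation procedure (an assumption)."""
    return [chunk[i:i + size] for i in range(0, len(chunk), size)]


def split_near_duplicates(
    big_chunks: List[bytes],
    is_dup: List[bool],
    split: Callable[[bytes], List[bytes]] = fixed_split,
) -> List[bytes]:
    """Break apart new large chunks that sit next to a duplicate large chunk.

    New chunks with no duplicate neighbour, and duplicate chunks themselves,
    are emitted intact.
    """
    out: List[bytes] = []
    n = len(big_chunks)
    for i, chunk in enumerate(big_chunks):
        neighbour_dup = (i > 0 and is_dup[i - 1]) or (i + 1 < n and is_dup[i + 1])
        if not is_dup[i] and neighbour_dup:
            out.extend(split(chunk))  # transition region: use smaller chunks
        else:
            out.append(chunk)         # left as one large chunk
    return out
```

With four 8-byte chunks of which only the third is a duplicate, the first chunk stays whole, while the second and fourth are each split into two 4-byte pieces because they border the duplicate.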

Current Assignee: NEC Corporation
Original Assignee: NEC Laboratories America Inc (NEC Corporation)
Inventors: Dubnicki, Cezary; Kruus, Erik; Ungureanu, Cristian
Primary Examiner(s): Swearingen, Jeffrey R
Assistant Examiner(s): Elfervig, Taylor A

Application Number: US11/566,139
Publication Number: US 20080133561A1
Time in Patent Office: 2,041 Days
Field of Search: 709/231, 709/247, 707/101, 707/693
US Class Current: 709/231

CPC Class Codes
- G06F 11/1453   using de-duplication of the...
- G06F 16/113   Details of archiving lifecy...
- G06F 16/1744   using compression, e.g. spa...
- G06F 3/0608   Saving storage space on sto...
- G06F 3/0641   De-duplication techniques
- G06F 3/0671   In-line storage system
- H04L 65/612   for unicast
- H04L 65/70   Media network packetisation
- H04L 67/62   Establishing a time schedul...
- H04L 69/04   Protocols for data compress...
