Content based file chunking
First Claim
1. A method performed by data processing apparatus, the method comprising:
identifying a data item to be chunked;
determining a data type of the data item;
determining that the data type of the data item is one of a specified one or more data types; and
in response to determining that the data type of the data item is one of the specified one or more data types:
identifying particular content portions that are included within the data item; and
performing a chunking of the data item that is based on the particular content portions that are included within the data item.
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for transferring electronic data. In general, one aspect of the subject matter described in this specification can be embodied in methods that include the actions of identifying a data item to be chunked; determining the type of the data item; determining whether the type of the data item is one of a specified one or more types; if it is determined that the type of the data item is not one of the specified one or more types, performing a first chunking of the data item; and if it is determined that the type of the data item is one of the specified one or more types, performing a second chunking of the data item that is based on the particular content portions of the data item.
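The two chunking paths described in the abstract can be sketched as a simple dispatch on data type. This is only an illustrative sketch, not the patented implementation: the type name `text/sections`, the blank-line splitter, the `CONTENT_SPLITTERS` registry, and the 8-byte chunk size are all assumptions introduced for the example.

```python
from typing import Callable, Dict, List

FIXED_CHUNK_SIZE = 8  # assumed size, kept tiny so the demo is easy to read


def fixed_size_chunks(data: bytes, size: int = FIXED_CHUNK_SIZE) -> List[bytes]:
    """First chunking: split purely by size, ignoring content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def split_on_blank_lines(data: bytes) -> List[bytes]:
    """Hypothetical content splitter: each blank-line-separated section is one portion."""
    parts = data.split(b"\n\n")
    # Re-attach the separator to every part except the last,
    # so that joining the chunks restores the original bytes.
    chunks = [p + b"\n\n" for p in parts[:-1]]
    if parts[-1]:
        chunks.append(parts[-1])
    return chunks


# Hypothetical registry of the "specified one or more data types".
CONTENT_SPLITTERS: Dict[str, Callable[[bytes], List[bytes]]] = {
    "text/sections": split_on_blank_lines,
}


def chunk_data_item(data: bytes, data_type: str) -> List[bytes]:
    """Choose the chunking based on whether the data type is a specified one."""
    splitter = CONTENT_SPLITTERS.get(data_type)
    if splitter is None:
        return fixed_size_chunks(data)  # type not specified: first chunking
    return splitter(data)               # type specified: content-based chunking


doc = b"header\n\nbody one\n\nbody two"
print(chunk_data_item(doc, "text/sections"))
print(chunk_data_item(doc, "application/octet-stream"))
```

In both paths the chunks concatenate back to the original data item; only the placement of the chunk boundaries differs.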
26 Claims
1. A method performed by data processing apparatus, the method comprising:
identifying a data item to be chunked;
determining a data type of the data item;
determining that the data type of the data item is one of a specified one or more data types; and
in response to determining that the data type of the data item is one of the specified one or more data types:
identifying particular content portions that are included within the data item; and
performing a chunking of the data item that is based on the particular content portions that are included within the data item.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9

10. A method performed by data processing apparatus, the method comprising:
receiving a data item to be chunked;
identifying a data type associated with the data item;
using the identified data type to introspect the data of the data item and build a content based map of the data item;
using the content based map to identify particular content portions included in the data item;
using the content based map to identify a separate chunking to be performed for the particular content portions in the data item; and
chunking the data item based on the particular content portions in the data item.
Dependent claims: 11, 12

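One way to picture the content based map of claim 10 is as a list of entries, each recording where a content portion sits and which chunking applies to it. The sketch below is hedged throughout: the `demo/container` type, its layout (a 4-byte big-endian metadata length, the metadata, then the payload), the `MapEntry` fields, and the strategy names `whole` and `fixed` are all hypothetical and do not come from the patent.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class MapEntry:
    label: str     # name of the content portion
    offset: int    # start of the portion within the data item
    length: int    # number of bytes in the portion
    strategy: str  # "whole" (one chunk) or "fixed" (fixed-size chunks)


def build_content_map(data: bytes, data_type: str) -> List[MapEntry]:
    """Introspect the data according to its type and describe its portions."""
    if data_type == "demo/container" and len(data) >= 4:
        # Assumed layout: 4-byte big-endian metadata length, metadata, payload.
        (meta_len,) = struct.unpack(">I", data[:4])
        head_len = min(4 + meta_len, len(data))
        entries = [MapEntry("metadata", 0, head_len, "whole")]
        if head_len < len(data):
            entries.append(
                MapEntry("payload", head_len, len(data) - head_len, "fixed")
            )
        return entries
    # Unknown type: a single portion covering the whole item.
    return [MapEntry("all", 0, len(data), "fixed")]


def chunk_with_map(data: bytes, content_map: List[MapEntry], size: int = 4) -> List[bytes]:
    """Apply each entry's chunking strategy to its own portion only."""
    chunks: List[bytes] = []
    for entry in content_map:
        portion = data[entry.offset:entry.offset + entry.length]
        if entry.strategy == "whole":
            chunks.append(portion)
        else:
            chunks.extend(portion[i:i + size] for i in range(0, len(portion), size))
    return chunks


item = struct.pack(">I", 3) + b"abc" + b"0123456789"
content_map = build_content_map(item, "demo/container")
print(chunk_with_map(item, content_map))
```

Because each entry is chunked on its own, no chunk ever straddles the boundary between the metadata and the payload.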
13. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising:
identifying a data item to be chunked;
determining that a data type of the data item is one of a specified one or more data types;
in response to determining that the data type of the data item is one of the specified one or more data types:
identifying particular content portions included within the data item;
identifying a first content portion included within the data item that is likely to change relative to an earlier version of the data item and a second content portion included within the data item that is likely to be unchanged relative to an earlier version of the data item;
chunking the first content portion; and
chunking the second content portion separately from the first content portion.

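The benefit of chunking a likely-to-change portion separately from a likely-unchanged portion, as recited in claim 13, can be illustrated with chunk digests. This is a minimal sketch under stated assumptions: the caller has already identified the two portions, and the function names, the 8-byte chunk size, and the use of SHA-256 to compare chunks are choices made for the example rather than details from the patent.

```python
import hashlib
from typing import List, Tuple


def chunk_portion(portion: bytes, size: int) -> List[bytes]:
    """Fixed-size chunking applied to a single content portion."""
    return [portion[i:i + size] for i in range(0, len(portion), size)]


def chunk_separately(
    volatile: bytes, stable: bytes, size: int = 8
) -> Tuple[List[bytes], List[bytes]]:
    """Chunk the likely-to-change and likely-unchanged portions independently."""
    return chunk_portion(volatile, size), chunk_portion(stable, size)


def digests(chunks: List[bytes]) -> List[str]:
    """Identify each chunk by its SHA-256 digest."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]


stable_payload = bytes(range(20))  # stands in for content that rarely changes

# Two versions of the same item: only the volatile portion differs, and it grows.
old_volatile, old_stable = chunk_separately(b"title=Old", stable_payload)
new_volatile, new_stable = chunk_separately(b"title=Newer name", stable_payload)

# The stable portion yields identical chunks in both versions.
print(digests(old_stable) == digests(new_stable))
```

If the whole item were instead chunked at fixed offsets from its start, the longer volatile portion in the newer version would shift every later boundary, so the chunks covering the stable bytes would no longer line up with those of the earlier version.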
14. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising:
identifying a first data item to be chunked;
determining that the first data item is of a first data type;
in response to determining that the first data item is of the first data type, chunking the first data item based on the size of the data item;
identifying a second data item to be chunked;
determining that the second data item is of a second data type; and
in response to determining that the second data item is of a second data type:
identifying particular content portions included in the second data item; and
chunking the second data item based on the likelihood that at least some portions of the particular content portions will be different than an earlier version of the second data item.

15. A system comprising:
one or more computing devices operable to perform operations comprising:
identifying a data item to be chunked;
determining a data type of the data item;
determining that the data type of the data item is one of a specified one or more data types; and
in response to determining that the data type of the data item is one of the specified one or more data types:
identifying particular content portions that are included within the data item; and
performing a chunking of the data item that is based on the particular content portions that are included within the data item.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23

24. A system comprising:
one or more computing devices operable to perform operations comprising:
receiving a data item to be chunked;
identifying a data type associated with the data item;
using the identified data type to introspect the data of the data item and build a content based map of the data item;
using the content based map to identify particular content portions included in the data item;
using the content based map to identify a separate chunking to be performed for the particular content portions in the data item; and
chunking the data item based on the particular content portions in the data item.
Dependent claims: 25, 26

Specification