Sorting large data sets
First Claim
1. A method for sorting a data set, the method performed on a computing device, the method comprising:
- first processing all of a plurality of portions that make up the data set, the first processing comprising performing, for each portion of the plurality of portions, the following;
determining usable physical memory of the computing device, where the usable physical memory is system memory of the computing device that is not virtual memory and that is also not processor cache memory, and where the usable physical memory is determined based on a total amount of the system memory that is currently in-use and a total amount of the system memory;
reading the each portion into the determined usable physical memory, where a size of the each portion comprises a number of records of the data set that fit into the usable physical memory without requiring use of the virtual memory;
second processing all of a plurality of sub-portions of the each portion, the second processing comprising performing, for each sub-portion of the plurality of sub-portions, the following;
determining usable processor cache memory of a processor of the computing device, where the usable processor cache memory is determined based on a total amount of the processor cache memory that is currently in-use and a total amount of the processor cache memory;
sorting, by the processor, the each sub-portion within the determined usable processor cache memory, where the each sub-portion fits entirely in the determined usable processor cache memory, and where the sorted sub-portion is reflected in the determined usable physical memory based on normal operations of the processor cache memory;
merging the first-processed portions and second-processed sub-portions resulting in the entire data set being sorted where all the sorting was performed within processor cache memory and without making use of virtual memory.
2 Assignments
0 Petitions
Accused Products
Abstract
A computer-implemented sorting method for efficiently sorting large data sets in computing environments that include virtual memory and processor caching, the method including determining available physical memory, identifying portions of the data set that each fit in the available physical memory, determining available cache, identifying sub-portions within the portions that each fit in the available cache, sorting each sub-portion, repeating the identifying portions, identifying sub-portions, and sorting for each portion of the data set, and merging the sorted sub-portions and portions such that the data set is sorted. The sorting method avoids the use of virtual memory and seeks to identify sub-portions that fit in available cache.
-
Citations
20 Claims
-
1. A method for sorting a data set, the method performed on a computing device, the method comprising:
-
first processing all of a plurality of portions that make up the data set, the first processing comprising performing, for each portion of the plurality of portions, the following; determining usable physical memory of the computing device, where the usable physical memory is system memory of the computing device that is not virtual memory and that is also not processor cache memory, and where the usable physical memory is determined based on a total amount of the system memory that is currently in-use and a total amount of the system memory; reading the each portion into the determined usable physical memory, where a size of the each portion comprises a number of records of the data set that fit into the usable physical memory without requiring use of the virtual memory; second processing all of a plurality of sub-portions of the each portion, the second processing comprising performing, for each sub-portion of the plurality of sub-portions, the following; determining usable processor cache memory of a processor of the computing device, where the usable processor cache memory is determined based on a total amount of the processor cache memory that is currently in-use and a total amount of the processor cache memory; sorting, by the processor, the each sub-portion within the determined usable processor cache memory, where the each sub-portion fits entirely in the determined usable processor cache memory, and where the sorted sub-portion is reflected in the determined usable physical memory based on normal operations of the processor cache memory; merging the first-processed portions and second-processed sub-portions resulting in the entire data set being sorted where all the sorting was performed within processor cache memory and without making use of virtual memory. - View Dependent Claims (4, 5, 6, 7, 8, 9)
-
-
2. A computing device and at least one software module together configured for performing steps for sorting a data set, the steps comprising:
-
first processing all of a plurality of portions that make up the data set, the first processing comprising performing, for each portion of the plurality of portions, the following; determining usable physical memory of the computing device, where the usable physical memory is system memory of the computing device that is not virtual memory and that is also not processor cache memory, and where the usable physical memory is determined based on a total amount of the system memory that is currently in-use and a total amount of the system memory; reading the each portion into the determined usable physical memory, where a size of the each portion comprises a number of records of the data set that fit into the usable physical memory without requiring use of the virtual memory; second processing all of a plurality of sub-portions of the each portion, the second processing comprising performing, for each sub-portion of the plurality of sub-portions, the following; determining usable processor cache memory of a processor of the computing device, where the usable processor cache memory is determined based on a total amount of the processor cache memory that is currently in-use and a total amount of the processor cache memory; sorting, by the processor, the each sub-portion within the determined usable processor cache memory, where the each sub-portion fits entirely in the determined usable processor cache memory, and where the sorted sub-portion is reflected in the determined usable physical memory based on normal operations of the processor cache memory; merging the first-processed portions and second-processed sub-portions resulting in the entire data set being sorted where all the sorting was performed within processor cache memory and without making use of virtual memory. - View Dependent Claims (10, 11, 12, 13, 14, 15)
-
-
3. At least one computer-readable media including computer-executable instructions that, when executed by a computing device, cause the computing device to perform steps for sorting a data set, the steps comprising:
-
first processing all of a plurality of portions that make up the data set, the first processing comprising performing, for each portion of the plurality of portions, the following; determining usable physical memory of the computing device, where the usable physical memory is system memory of the computing device that is not virtual memory and that is also not processor cache memory, and where the usable physical memory is determined based on a total amount of the system memory that is currently in-use and a total amount of the system memory; reading the each portion into the determined usable physical memory, where a size of the each portion comprises a number of records of the data set that fit into the usable physical memory without requiring use of the virtual memory; second processing all of a plurality of sub-portions of the each portion, the second processing comprising performing, for each sub-portion of the plurality of sub-portions, the following; determining usable processor cache memory of a processor of the computing device, where the usable processor cache memory is determined based on a total amount of the processor cache memory that is currently in-use and a total amount of the processor cache memory; sorting, by the processor, the each sub-portion within the determined usable processor cache memory, where the each sub-portion fits entirely in the determined usable processor cache memory, and where the sorted sub-portion is reflected in the determined usable physical memory based on normal operations of the processor cache memory; merging the first-processed portions and second-processed sub-portions resulting in the entire data set being sorted where all the sorting was performed within processor cache memory and without making use of virtual memory. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification