Cache burst architecture for parallel processing, such as for image processing
Abstract
A parallel processing system for processing data matrices, such as images, is disclosed. The system includes a plurality of processing units, organized in four blocks of eight processing units per processing chip, and external cache burst memory, wherein each processing unit is associated with at least one column of the external memory. A barrel shifter connected between the memory and the processing units allows data to be shifted to adjacent processing chips, thus providing the means for connecting several of the chips into a ring structure. Further, digital delay lines are connected between the barrel shifter and the processing units, thus providing the capability of delaying, via a predetermined number of clock cycles, incoming column data. Each processing unit is provided with a nine bit cache memory. The system further includes a controller for each chip that sequences a burst of consecutive rows of a data matrix from the external cache burst memory, to be stored in either the cache memory associated with each of the processing units or routed directly to the processors included in each processing unit.
The barrel shifters and the delay lines cooperate to bring horizontally and vertically displaced data points in the external memory to a single processing unit in a single clock cycle period. The controller decodes instructions stored in the external memory, wherein each processing unit receives the same instruction at any given cycle; this decoded instruction is valid for subsequent data bursts from external memory, thus providing the means for allowing instructions and data to be stored in the same external memory without a significant performance penalty. Where the width of an image is greater than the number of processing units, the image must be segmented to be stored in memory. An efficient method of relating column data across segment boundaries is thus provided, using the cache memory of selected processing units.
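The cooperation between the barrel shifter and the delay lines can be illustrated with a small software model. This is only a sketch under stated assumptions, not the disclosed hardware: the function names `barrel_shift` and `neighbor_stream` are hypothetical, the shift is modeled as a ring rotation within one row, and the delay line is assumed to emit zeros until the first delayed row arrives.

```python
from collections import deque

def barrel_shift(row, k):
    """Rotate a row of M bits by k column positions (ring wrap-around assumed)."""
    m = len(row)
    return [row[(j - k) % m] for j in range(m)]

def neighbor_stream(rows, col_shift, row_delay):
    """Yield, once per clock cycle, the row of bits the M processing units see.

    At cycle t, unit j receives rows[t - row_delay][(j - col_shift) % M],
    i.e. a data point displaced both horizontally and vertically.
    """
    m = len(rows[0])
    # Assumption: the delay line starts out filled with zero rows.
    delay_line = deque([[0] * m for _ in range(row_delay)])
    for row in rows:
        delay_line.append(barrel_shift(row, col_shift))
        yield delay_line.popleft()

rows = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0]]
seen = list(neighbor_stream(rows, col_shift=1, row_delay=1))
```

With `col_shift=1` and `row_delay=1`, the unit in column 1 sees at cycle 1 the bit stored at row 0, column 0, so a diagonal neighbor reaches it in a single step of the model.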
62 Claims
1. In a processing system for performing processing operations in parallel upon a data matrix stored in a memory means having L rows and M columns, where L and M are integers greater than one, the system including M processing units wherein each of the M processing units is associated with a respective column of the memory means, a method of transferring data between the memory means and the processing units, comprising the step of:
(A) shifting, in a first clock cycle, each bit of a first row of M data bits from the memory means one or more column positions to the processing unit associated with respectively adjacent columns of the memory means.
4. In a processing system for performing processing operations in parallel upon data from an array of data stored in a first memory means having L rows and M columns, where L and M are integers greater than one, the system including an array of M processing units wherein each processing unit is associated with a respective plurality of second memory means, a method of transferring data between the first memory means and the processing units, comprising the steps of:
(A) bursting data in sequential row order from a first plurality of consecutive rows of the first memory means to a first one of the second memory means associated with each of the M processing units wherein the bursted data is stored in the first one of the second memory means; and
(B) transferring array data stored in step (A) to the array of processing units, including bursting the array data stored in step (A) from each first one of the second memory means to the respective processing unit.
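The two bursts of claim 4 can be sketched in software as follows. This is a minimal illustration, assuming one cache (a Python list) per column and a caller-supplied `process` function standing in for a processing unit; both function names are hypothetical and nothing here models timing.

```python
def burst_to_caches(memory, first_row, count):
    """Step (A): copy `count` consecutive rows, in row order, into one cache per column."""
    m = len(memory[0])
    caches = [[] for _ in range(m)]
    for r in range(first_row, first_row + count):
        for j in range(m):
            caches[j].append(memory[r][j])
    return caches

def burst_to_units(caches, process):
    """Step (B): stream each cache's stored bits to its own processing unit."""
    return [process(bits) for bits in caches]

memory = [[1, 0],
          [1, 1],
          [0, 1]]
caches = burst_to_caches(memory, first_row=0, count=3)
results = burst_to_units(caches, process=sum)  # `sum` is a stand-in operation
```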
10. In a processing system for performing processing operations in parallel upon a data matrix having P columns stored in a first memory means having L rows and M columns, where L, M and P are integers greater than one and P is greater than M, and wherein the data matrix is stored in the first memory means in a plurality of segments, the system including an array of M processing units wherein each of the M processing units is associated with a column of the first memory means, each processing unit has associated therewith a plurality of second memory means, a method of relating column data across segment boundaries, comprising the steps of:
(A) shifting the columns of a first segment at least one column position such that the column of data adjacent a first segment boundary is shifted across the first segment boundary;
(B) storing the column shifted across the first segment boundary in step (A) in a first one of the second memory means of the processing unit to which the shifted column of data is associated;
(C) shifting the columns of a second segment the same number of column positions as the first segment was shifted in step (A) such that the column of data adjacent the first segment boundary is shifted away from the first segment boundary;
(D) transferring the shifted second segment of column data to the M processing units; and
(E) transferring the data stored in the first one of the second memory means in step (B) to the associated processing unit such that column data on both sides of the first segment boundary are accessible to the M processing units.
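Steps (A) through (E) of claim 10 can be traced on a single row of a two-segment matrix. The sketch below is an interpretation, not the claimed hardware: it assumes the shift is a ring rotation, that the columns crossing the boundary land on the lowest-numbered units, and that "accessible" means the cached values replace the wrapped-around values in what the units see. The names `rotate` and `view_across_boundary` are hypothetical.

```python
def rotate(row, k):
    """Ring-rotate one row of M values by k column positions (assumed shift model)."""
    m = len(row)
    return [row[(j - k) % m] for j in range(m)]

def view_across_boundary(seg0_row, seg1_row, k=1):
    """Return what the M units see for segment 1 once boundary data is merged in."""
    shifted0 = rotate(seg0_row, k)   # step (A): boundary columns wrap to units 0..k-1
    cached = shifted0[:k]            # step (B): those units store the crossed columns
    shifted1 = rotate(seg1_row, k)   # step (C): same shift applied to the second segment
    view = list(shifted1)            # step (D): shifted second segment goes to the units
    view[:k] = cached                # step (E): cached boundary data is supplied as well
    return view

# One row of a matrix with P = 8 columns, held as two segments of M = 4 columns.
seg0 = [10, 11, 12, 13]
seg1 = [20, 21, 22, 23]
view = view_across_boundary(seg0, seg1, k=1)
```

In this model unit 0 ends up holding the last column of the first segment (13) while unit 1 holds the first column of the second segment (20), so values from both sides of the boundary sit in adjacent units.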
15. In a processing system for performing processing operations in parallel upon data stored in a first memory means having L rows and M blocks of N columns, where L, M, and N are integers greater than one, a method of transferring data between the first memory means and an array of M blocks of N processing units, wherein each of the M×N processing units are associated with a respective one of the M×N columns of the first memory means, comprising the steps of:
(A) selecting, for at least certain of the M blocks, one of the N processing units for receiving data;
(B) transferring, for blocks having selected processing units, from the first memory means a row of N data bits directly to the selected one processing unit; and
(C) storing, for the blocks having selected processing units, each respective row of N data bits transferred in step (B) in the respective processing unit selected in step (A).
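The transposed transfer of claim 15 routes all N bits of a block's row slice into one selected unit rather than one bit per unit. A minimal sketch follows, assuming each unit's storage is a Python list of N-bit words and that an unselected block is marked with `None`; the function name `transposed_load` and this data layout are hypothetical.

```python
def transposed_load(row, n, selected, storage):
    """Store each block's N-bit slice of `row` in that block's selected unit.

    row      -- list of M*N bits read from one row of the first memory
    selected -- per-block index of the chosen unit, or None to skip the block (step A)
    storage  -- storage[b][u] is the list of words held by unit u of block b
    """
    for b, u in enumerate(selected):
        if u is None:                      # block has no selected unit
            continue
        word = row[b * n:(b + 1) * n]      # step (B): the block's row of N bits
        storage[b][u].append(word)         # step (C): stored in the selected unit
    return storage

n = 4
row = [1, 0, 1, 1, 0, 0, 1, 0]             # M = 2 blocks of N = 4 columns
storage = [[[] for _ in range(n)] for _ in range(2)]
transposed_load(row, n, selected=[2, None], storage=storage)
```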
18. In a processing system for performing processing operations in parallel upon data stored in a first memory means having L rows and M blocks of N columns, where L, M, and N are integers greater than one, a method of transferring data between the first memory means and an array of M blocks of N processing units, wherein each of the M×N processing units are associated with a respective one of the M×N columns of the first memory means, and wherein each of the M×N processing units is associated with at least N second memory means, comprising the steps of:
(A) selecting, for at least certain of the M blocks, one of the N processing units for outputting data; and
(B) transferring, for the blocks having selected processing units, from each selected processing unit, a group of N data bits stored in the N second memory means associated with each selected processing unit to the first memory means and therein storing each transferred group of N data bits.
19. In a parallel processing system having M blocks of N processing units, where M and N are integers greater than 1, each processing unit being associated with a plurality of memory means contained therein, a method of retrieving data using indirect addressing comprising the steps of:
(A) selecting, for each of the M blocks, one of the N processing units;
(B) reading, for each of the M blocks, a respective indirect address from a first group of the plurality of memory means associated with each processing unit selected in step (A);
(C) broadcasting, for each of the M blocks, the respective indirect address read in step (B) to a second group to serve as the address of the second group, of the memory means in each of the N processing units of the respective block;
(D) outputting, for each of the M blocks, the data stored at the respective indirect address of the second group of the memory means in each of the N processing units, the collective output defining a respective data word.
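The indirect-addressing method of claim 19 can be modeled with each processing unit holding two groups of memory. In the sketch below, the dictionary keys `"addr"` (first group, holding an address) and `"data"` (second group, an addressable bit array) are hypothetical names chosen for illustration, as is the function `indirect_read`.

```python
def indirect_read(blocks, selected):
    """Return one data word per block, fetched through an indirect address.

    blocks   -- list of M blocks, each a list of N units shaped like
                {"addr": int, "data": [bits]}
    selected -- per-block index of the unit that supplies the address (step A)
    """
    words = []
    for block, u in zip(blocks, selected):
        addr = block[u]["addr"]                        # step (B): read the indirect address
        word = [unit["data"][addr] for unit in block]  # steps (C)-(D): broadcast, then output
        words.append(word)
    return words

block = [
    {"addr": 1, "data": [0, 1]},
    {"addr": 0, "data": [1, 0]},
]
word_via_unit0 = indirect_read([block], selected=[0])
word_via_unit1 = indirect_read([block], selected=[1])
```

Selecting a different unit changes only the broadcast address, so the same block yields a different N-bit word.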
24. In a processing system for performing operations in parallel, the system including an array of M×N processing units where M and N are integers greater than 1, a memory means, and a controlling means, a method of transferring instruction data and matrix data from the memory means to the array of M×N processing units, comprising the steps of:
(A) transferring instruction data stored in the memory means to a controlling means;
(B) decoding the instruction data transferred in step (A) with the controlling means to condition the M×N processing units to receive and process matrix data; and
(C) transferring matrix data from the memory means while the M×N processing units remain conditioned from step (B).
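The decode-once, burst-many behavior of claim 24 can be sketched as a loop over a shared memory that interleaves instructions and data rows. The tagged-tuple memory layout, the two-entry instruction table `OPS`, and the function name `run` are all assumptions made for illustration; the sketch also assumes the memory begins with an instruction.

```python
# Hypothetical instruction set: each entry maps one input bit to one output bit.
OPS = {
    "NOT": lambda bit: 1 - bit,
    "PASS": lambda bit: bit,
}

def run(memory):
    """Process a memory holding ("instr", name) and ("data", row) entries in order."""
    op = None
    results = []
    for kind, payload in memory:
        if kind == "instr":
            op = OPS[payload]                  # steps (A)-(B): fetch and decode once
        else:
            # step (C): every unit applies the same, still-valid decoded instruction
            results.append([op(bit) for bit in payload])
    return results

memory = [
    ("instr", "NOT"),
    ("data", [1, 0]),
    ("data", [0, 0]),
    ("instr", "PASS"),
    ("data", [1, 1]),
]
results = run(memory)
```

Because the decoded operation persists across consecutive data rows, only one memory access per burst is spent on the instruction in this model.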
28. An apparatus for transferring data comprising:
memory means having L rows and M columns, where L and M are greater than 1, for storing a data matrix;
an array of M processing units, each of said processing units being respectively associated with one of the M columns of said memory means for performing operations in parallel upon said data matrix;
shifting means coupled to said memory means and with said array of M processing units for shifting, in a first clock cycle, each bit of a first row of M bits of matrix data from the M columns of memory means one or more column positions to one of M processing units associated with respectively adjacent columns of said memory means.
33. In a parallel processing system, an apparatus for transferring data, comprising:
a first memory means having L rows and M columns, where L and M are integers greater than 1, for storing an array of data;
an array of M processing units, each of said processing units being respectively associated with one of said M columns of said first memory means for performing operations in parallel upon said array of data, each of said processing units being associated with a plurality of second memory means;
first bursting means coupled to said M columns of first memory means and to said second memory means for bursting array data in sequential row order from a first plurality of consecutive rows of said first memory means to a first one of said second memory means for each of said M processing units and for storing said burst array data in said respective first one of said second memory means; and
means coupled with said second memory means and said M processing units for transferring stored array data from the respective first one of said second memory means to each one of said M processing units, said means for transferring including second bursting means for bursting said stored array data from each first one of said second memory means to the respectively associated one of said array of M processing units.
45. In a parallel processing system, an apparatus for relating column data across a segment boundary, comprising:
first memory means having L rows and M columns, where L and M are integers greater than 1, for storing a data matrix having P columns, where P is an integer greater than M, and wherein said data matrix is stored in said first memory means in a plurality of segments;
an array of M processing units for performing operations in parallel upon said data matrix, each of said processing units being respectively associated with one of said M columns of said first memory means, each of said processing units being associated with a plurality of second memory means;
shifting means coupled with said first memory means, said second memory means and said array of M processing units for shifting the columns of a first segment at least one column position such that a column of data adjacent a first segment boundary is shifted across said first segment boundary, wherein said shifted first segment of column data is stored in a first one of said second memory means of a processing unit to which said shifted column of data is associated, said shifting means being operative to shift the columns of a second segment the same number of column positions as said first segment was shifted such that a column of second segment data adjacent said first segment boundary is shifted away from said first segment boundary;
first transferring means coupled with said shifting means and said array of M processing units for transferring said shifted second segment of column data to said M processing units; and
second transferring means coupled with said first one of said second memory means and said processing unit associated therewith for transferring said shifted matrix data stored in said first one of said second memory means to said processing unit associated therewith wherein column data on both sides of said first segment boundary are accessible to said M processing units.
49. In a parallel processing system, an apparatus for transferring data, comprising:
a first memory means having L rows and M×N columns, where L, M, and N are integers greater than 1, for storing a data array;
an array of M blocks of N processing units for performing operations in parallel upon said array of data, each of said M×N processing units being associated with a respective one of said M×N columns of said memory means;
selecting means coupled with said M×N processing units for selecting, for at least certain of said M blocks, one of said N processing units for receiving data;
transposing means coupled with said first memory and each one of said M×N processing units for transferring, for blocks having selected processing units, from said first memory means a row of N data bits to a respective block of N processing units wherein each transferred row is stored in said respective one processing unit selected by said selecting means.
53. In a parallel processing system, an apparatus for transferring, comprising:
a first memory means having L rows and M×N columns, where L, M, and N are integers greater than 1, for storing a data array;
an array of M blocks of N processing units for performing operations in parallel upon said data array, each of said M×N processing units being associated with a respective one of said M×N columns of said first memory means, each of said M×N processing units being associated with at least N second memory means;
selecting means coupled with, for each of the M×N processing units, said N second memory means for selecting, for at least certain of said M blocks, one of said N processing units for outputting data; and
transposing means coupled with said selecting means and said first memory means for transposing and transferring a group of N data bits stored in said N second memory means associated with each selected processing unit to said first memory means and for storing each transferred group of N data bits.
54. In a parallel processing system, an apparatus for retrieving data using indirect addressing, comprising:
an array of M blocks of N processing units for performing operations in parallel, each of said M×N processing units being associated with a plurality of memory means contained therein;
selecting means coupled with said plurality of memory means for selecting, for each of said M blocks, one of N processing units and for reading a respective indirect address from a first group of said plurality of memory means associated with each selected processing unit; and
broadcasting means responsive to said respective indirect address for broadcasting, for each of the M blocks, said respective indirect address to a respective second group of said plurality of memory means in each of N processing units of the respective block, wherein each block outputs data stored at said respective indirect address of said second group of said plurality of memory means in each of the N processing units, the collective output of N second groups for each block defining a respective data word, whereby said respective data word is retrieved as the collective output from said second group in each processing unit using an indirect address stored in said first group.
59. In a parallel processing system, an apparatus for transferring instructions and matrix data from a memory means, comprising:
a memory means;
an array of M blocks of N processing units coupled with said memory means for performing operations in parallel on matrix data according to instructions, wherein said matrix data and said instructions are stored in said memory means; and
a controller means coupled with said memory means and said array of processing units for fetching instructions from said memory means and for decoding said fetched instructions to condition said array of M×N processing units to receive and process matrix data, wherein at least one instruction is fetched and decoded by said controller means, said controller means including data burst means coupled with said memory means for transferring matrix data from said memory means while said M×N processing units remain conditioned in accordance with said instruction fetched and decoded by said controller means.
Specification