Data processing system for processing vector data and method therefor
Abstract
A data processing system includes a data processor (10) coupled to a memory system having a first memory, such as an L1 data cache (16), arranged with a second memory (such as an L2 cache) at a lower hierarchical level. The data processor (10) prefetches data elements of a vector into the first memory prior to processing such data elements. If a requested data element is not present in the first memory, a load request is issued to the second memory and to lower levels of the memory hierarchy until the requested data element is finally retrieved and stored in the first memory. The data processor (10) continues to prefetch subsequent data elements of the vector by considering the length of the data element and the stride of the vector. In one embodiment, the data processor (10) prefetches the vector into the first memory in response to a single data stream touch load (DST) instruction (100).
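As a rough illustration of the behavior the abstract describes, the following Python sketch models the two memories as dictionaries keyed by address and walks a strided vector. The names `prefetch_vector`, `first_memory`, and `second_memory` are hypothetical, and the address rule ea = ba + (s*i) is taken from the claims. The patent describes registers, an arithmetic unit, and a state machine in hardware, so this is only a software analogy.

```python
def prefetch_vector(ba, s, n, first_memory, second_memory):
    """Copy n strided units from second_memory into first_memory.

    Hypothetical software model: both memories are dicts mapping an
    address to a data element. Returns the addresses actually fetched.
    """
    fetched = []
    for i in range(n):                 # i runs from 0 to n - 1
        ea = ba + s * i                # effective address of the ith unit
        if ea in first_memory:         # already present: no load request
            continue
        first_memory[ea] = second_memory[ea]
        fetched.append(ea)
    return fetched


# Example data (made up for illustration): four units at a stride of 64.
second = {0x1000 + 64 * i: f"unit{i}" for i in range(4)}
first = {0x1040: "unit1"}              # one unit is already cached
print([hex(a) for a in prefetch_vector(0x1000, 64, 4, first, second)])
```

In this model a unit that is already in the first memory triggers no request to the second memory, which mirrors the check the abstract describes.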
29 Claims
-
1. In a data processing system having a first memory and a second memory at a lower hierarchical level than the first memory, a data processor for prefetching into the first memory a vector stored in the second memory, the vector comprising n units distributed in the second memory at a stride s relative to a base address ba with an ith unit of the vector stored in the second memory at an effective address ea, where ea=(ba+(s*i)), and i is an index having a value from 0 to (n−1), the data processor comprising:
a first register for storing n;
a second register for storing s;
a third register for storing ea;
an arithmetic unit having a first input coupled to the second register, a second input coupled to the third register, and an output terminal, for calculating the effective address ea of each unit i of the n units of the vector and for providing a fetch address to the output terminal thereof corresponding to the effective address ea when enabled;
a load unit coupled to the first memory and to the second memory and having an input terminal for receiving the fetch address, for prefetching a data element located at the fetch address from the second memory into the first memory; and
a state machine having an input coupled to the first register, for enabling the arithmetic unit in response to the data processor receiving a predetermined instruction, and for enabling the arithmetic unit repetitively until an nth unit of the vector has been prefetched.
Dependent claims: 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 22
each of the n units has a length l;
the data processor further comprises a fourth register for storing l; and
the arithmetic unit further has a third input for receiving the length l, and enables the arithmetic unit a plurality of times corresponding to the length l to provide a corresponding plurality of fetch addresses for each unit i of the n units.
-
7. The data processor of claim 6 further comprising a register for storing the fetch address.
-
9. The method of claim 1 wherein the step of prefetching comprises the steps of:
-
determining if the cth unit is already stored in the first memory; and
if not, prefetching the cth unit of the vector from the second memory into the first memory.
-
10. The method of claim 1 further comprising the step of selectively terminating the method before all of the n units are prefetched into the first memory.
-
11. The method of claim 1 further comprising the step of:
receiving a predetermined instruction and performing all previous steps in response to an execution of the predetermined instruction.
-
12. The method of claim 1 wherein the step of stepping the count is further characterized as:
stepping the count c and selectively modifying the stride s.
-
22. The method of claim 9 further comprising the step of:
receiving a predetermined instruction and performing all previous steps in response to an execution of the predetermined instruction.
-
8. In a data processing system having a plurality of architectural registers adapted for storing a like plurality of vectors, a first, non-architected memory and a second memory at a lower hierarchical level than the first memory, a method for explicitly prefetching into the first memory a vector stored in the second memory, the vector comprising n units distributed in the second memory at a stride s relative to a base address ba with an ith unit of the vector stored in the second memory at an effective address ea, where ea=(ba+(s*i)), and i is an index having a value from 0 to (n−1), the method comprising the steps of:
receiving a predetermined instruction, wherein said predetermined instruction does not cause the data processing system to affect the plurality of architectural registers;
executing said predetermined instruction by performing the steps of:
initializing a count c;
calculating the effective address ea of a cth unit of the vector;
prefetching the cth unit of the vector from the second memory into the first memory;
stepping the count c; and
if the count c is a predetermined value with respect to n, returning to the step of calculating.
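The counted loop recited in claim 8 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `prefetch_by_count` and the `prefetch` callback are hypothetical, the "predetermined value with respect to n" is assumed to mean c < n, and the guard for n < 1 is an addition that the claim does not recite.

```python
def prefetch_by_count(ba, s, n, prefetch):
    """Walk a strided vector using a count c, in the order claim 8 lists.

    prefetch is a hypothetical callback that receives each effective
    address. Returns the final value of the count.
    """
    if n < 1:                  # guard added for the sketch, not in the claim
        return 0
    c = 0                      # initialize the count c
    while True:
        ea = ba + s * c        # calculate the effective address of the cth unit
        prefetch(ea)           # prefetch the cth unit
        c += 1                 # step the count c
        if not c < n:          # assumed loop-back test: return while c < n
            break
    return c


addresses = []
prefetch_by_count(0x2000, 32, 3, addresses.append)
print([hex(a) for a in addresses])
```

The test comes after the stepping of the count, as in the claim, so the body always runs at least once for n of 1 or more.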
-
13. In a data processing system having a first memory and a second memory at a lower hierarchical level than the first memory, a method for prefetching into the first memory a vector stored in the second memory, the vector comprising n units distributed in the second memory at a stride s relative to a base address ba with an ith unit of the vector stored in the second memory at an effective address ea, where ea=(ba+(s*i)), and i is an index having a value from 0 to (n−1), the method comprising the steps of:
initializing a count c;
calculating the effective address ea of a cth unit of the vector;
prefetching the cth unit of the vector from the second memory into the first memory;
stepping the count c; and
if the count c is a predetermined value with respect to n, returning to the step of calculating;
wherein the data processing system operates in a selected one of a first mode and a second mode, and wherein, if, when operating in the first mode, the second mode is selected, the method is suspended until the first mode is next selected.
Dependent claims: 14, 15
stepping the count c and selectively modifying the stride s.
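The suspend-and-resume behavior recited in claim 13 might be modeled as below. The class `VectorPrefetcher`, its `run` method, and the `in_first_mode` callback are all hypothetical names introduced for this sketch. The claim does not say what the two modes are, so the sketch treats the mode as an opaque boolean query.

```python
class VectorPrefetcher:
    """Hypothetical model of a prefetch that pauses outside the first mode."""

    def __init__(self, ba, s, n):
        self.ba, self.s, self.n = ba, s, n
        self.c = 0                       # count survives a suspension

    def run(self, in_first_mode, prefetch):
        """Prefetch units while in_first_mode() is true.

        Returns True once all n units are done, or False if the method
        was suspended because the second mode was selected.
        """
        while self.c < self.n:
            if not in_first_mode():
                return False             # suspended; self.c is kept for resume
            prefetch(self.ba + self.s * self.c)
            self.c += 1
        return True


modes = iter([True, True, False])        # second mode selected on third check
out = []
p = VectorPrefetcher(0, 16, 4)
p.run(lambda: next(modes), out.append)   # suspends after two units
p.run(lambda: True, out.append)          # first mode again: finishes the rest
print(out)
```

Keeping the count in the object is the design choice that lets a later call pick up at the next unit instead of starting over.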
-
16. In a data processing system having a first memory and a second memory at a lower hierarchical level than the first memory, a method for prefetching into the first memory a vector stored in the second memory, the vector comprising n units distributed in the second memory at a stride s relative to a base address ba with an ith unit of the vector stored in the second memory at an effective address ea, where ea=(ba+(s*i)), and i is an index having a value from 0 to (n−1), the method comprising the steps of:
initializing a count c;
calculating the effective address ea of a cth unit of the vector;
prefetching the cth unit of the vector from the second memory into the first memory;
stepping the count c; and
if the count c is a predetermined value with respect to n, returning to the step of calculating;
wherein the step of prefetching comprises the step of changing a cache state of the cth unit of the vector in the first memory.
Dependent claims: 17, 18
receiving a predetermined instruction and performing all previous steps in response to an execution of the predetermined instruction.
-
18. The method of claim 16 wherein the step of stepping the count is further characterized as:
stepping the count c and selectively modifying the stride s.
-
19. In a data processing system having a plurality of architectural registers adapted for storing a like plurality of vectors, a first, non-architected memory and a second memory at a lower hierarchical level than the first memory, a method for explicitly prefetching into the first memory a vector stored in the second memory, the vector comprising n units distributed in the second memory at a stride s relative to a base address ba with an ith unit of the vector stored in the second memory at an effective address ea, where ea=(ba+(s*i)), and i is an index having a value from 0 to (n−1), the method comprising the steps of:
receiving a predetermined instruction, wherein said predetermined instruction does not cause the data processing system to affect the plurality of architectural registers;
executing said predetermined instruction by performing the steps of:
for each unit i of the n units:
calculating the effective address ea of the ith unit of the vector; and
prefetching the ith unit of the vector from the second memory into the first memory.
Dependent claims: 20, 21, 23
determining if the ith unit is already stored in the first memory; and
if not, prefetching the ith unit of the vector from the second memory into the first memory.
-
21. The method of claim 19 further comprising the step of selectively terminating the method before all of the n units are prefetched into the first memory.
-
23. The method of claim 19 further comprising, after the step of prefetching, the step of:
selectively modifying the stride s.
-
24. In a data processing system having a first memory and a second memory at a lower hierarchical level than the first memory, a method for prefetching into the first memory a vector stored in the second memory, the vector comprising n units distributed in the second memory at a stride s relative to a base address ba with an ith unit of the vector stored in the second memory at an effective address ea, where ea=(ba+(s*i)), and i is an index having a value from 0 to (n−1), the method comprising the steps of:
for each unit i of the n units:
calculating the effective address ea of the ith unit of the vector; and
prefetching the ith unit of the vector from the second memory into the first memory;
wherein the data processing system operates in a selected one of a first mode and a second mode, and wherein, if, when operating in the first mode, the second mode is selected, the method is suspended until the first mode is next selected.
Dependent claims: 25, 26
selectively modifying the stride s.
-
27. In a data processing system having a first memory and a second memory at a lower hierarchical level than the first memory, a method for prefetching into the first memory a vector stored in the second memory, the vector comprising n units distributed in the second memory at a stride s relative to a base address ba with an ith unit of the vector stored in the second memory at an effective address ea, where ea=(ba+(s*i)), and i is an index having a value from 0 to (n−1), the method comprising the steps of:
for each unit i of the n units:
calculating the effective address ea of the ith unit of the vector; and
prefetching the ith unit of the vector from the second memory into the first memory;
wherein the step of prefetching comprises the step of changing a cache state of the ith unit of the vector in the first memory.
Dependent claims: 28, 29
receiving a predetermined instruction and performing all previous steps in response to an execution of the predetermined instruction.
-
29. The method of claim 27 further comprising, after the step of prefetching, the step of:
selectively modifying the stride s.
Specification