Split embedded DRAM processor
Abstract
A processing architecture includes a first CPU core portion coupled to a second embedded dynamic random access memory (DRAM) portion. These architectural components jointly implement a single processor and instruction set. Advantageously, the embedded logic on the DRAM chip implements the memory-intensive processing tasks, thus reducing the amount of traffic that must be bussed back and forth between the CPU core and the embedded DRAM chips. The embedded DRAM logic monitors and manipulates the instruction stream into the CPU core. The architecture of the instruction set, data paths, addressing, control, caching, and interfaces is developed to allow the system to operate using a standard programming model. Specialized video and graphics processing systems are also developed. In addition, an extended very long instruction word (VLIW) architecture, implemented as a primary VLIW processor coupled to an embedded DRAM VLIW extension processor, efficiently handles memory-intensive tasks. In different embodiments, standard software can be accelerated either with or without the express knowledge of the processor.
44 Claims
-
1. A computer system comprising:
-
a central processing unit;
an external memory coupled to said central processor, said external memory comprising;
one or more dynamic random access memory (DRAM) arrays;
a set of local functional units;
a local program prefetch unit; and
a monitor/modify unit, said monitor/modify unit operative to evaluate each instruction opcode as it is fetched from said DRAM array, and, in response to said opcode, to perform at least one of the following actions;
(i) sending the opcode to said central processing unit;
(ii) sending the opcode to said set of local functional units; and
(iii) sending the opcode to said local program prefetch unit to fork a separate execution thread for execution by said set of local functional units.
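The routing decision made by the monitor/modify unit of claim 1 can be sketched as a behavioral model. It is not the claimed circuit: the 8-bit opcode width, the two-bit routing field, and the names `Route`, `ROUTE_TABLE`, `monitor_modify`, and `dispatch` are all hypothetical.

```python
from enum import Flag, auto
from typing import List, Tuple


class Route(Flag):
    """Actions the monitor/modify unit may take for one fetched opcode."""
    CPU = auto()       # (i) pass the opcode through to the central processing unit
    LOCAL_FU = auto()  # (ii) execute it on the in-memory functional units
    FORK = auto()      # (iii) hand it to the local prefetch unit to fork a thread


# Hypothetical encoding: the top two bits of an 8-bit opcode select the route.
ROUTE_TABLE = {
    0b00: Route.CPU,
    0b01: Route.LOCAL_FU,
    0b10: Route.FORK,
    0b11: Route.CPU | Route.LOCAL_FU,  # "at least one" permits several actions
}


def monitor_modify(opcode: int) -> Route:
    """Evaluate one opcode as it leaves the DRAM array."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    return ROUTE_TABLE[opcode >> 6]


def dispatch(stream: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Sort a fetched opcode stream into CPU, local-unit and fork queues."""
    to_cpu: List[int] = []
    to_local: List[int] = []
    to_fork: List[int] = []
    for op in stream:
        route = monitor_modify(op)
        if Route.CPU in route:
            to_cpu.append(op)
        if Route.LOCAL_FU in route:
            to_local.append(op)
        if Route.FORK in route:
            to_fork.append(op)
    return to_cpu, to_local, to_fork
```

For example, `dispatch([0x05, 0x45, 0x85, 0xC5])` sends 0x05 and 0xC5 to the CPU, 0x45 and 0xC5 to the local functional units, and 0x85 to the prefetch unit as a fork request.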
-
-
8. An embedded dynamic random access memory (DRAM) coprocessor designed to be coupled to a central processing unit, said embedded DRAM coprocessor comprising:
-
one or more DRAM arrays;
an external memory interface responsive to address and control signals generated from an external source, said external memory interface responding to said address and control signals to transfer data between said DRAM arrays and said external source;
a set of local functional units which execute program instructions;
a local program prefetch unit which fetches program instructions; and
a monitor/modify unit which evaluates each instruction opcode as it is fetched under control of said external source from said DRAM array and which, in response to said opcode, performs at least one of the following actions;
(i) sending the opcode to said external source;
(ii) sending the opcode to said set of local functional units; and
(iii) sending the opcode to said local program prefetch unit to fork a separate execution thread for execution by said set of local functional units.
-
-
10. A computer system comprising:
-
a central processing unit coupled to an external memory, wherein;
said central processing unit comprises;
a first set of functional units responsive to program instructions;
a first program cache memory having at least one level of caching, said first program cache memory providing high speed access to said program instructions; and
a first prefetch unit which controls the fetching of a sequence of instructions to be executed by said first set of functional units, said instructions being fetched from said external memory unless said program instructions are found in said first program cache memory, in which case, said program instructions are fetched from said first program cache memory; and
said external memory comprises;
one or more dynamic random access memory (DRAM) arrays;
a second set of local functional units;
a second program prefetch unit; and
a second program cache memory;
and wherein;
said first program cache memory only caches instructions executed by said functional units on said central processing unit, and said second program cache memory only caches instructions executed by said second set of functional units on said external memory device.
said central processing unit sends one or more attribute signals to identify certain memory read signals to be instruction fetch cycles; and
said attribute signals are decoded by logic embedded in said external memory so that said second program cache memory can identify opcode fetch cycles.
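The split fetch path of claim 10, together with the attribute-signal decoding recited in the dependent claim text just above, can be modeled as below. This is a sketch under assumptions: each cache is a plain dictionary with no capacity limit, the attribute signal is a single `ifetch` flag, and `local_addrs` stands in for however the memory device knows which instructions its own units execute. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class BusCycle:
    address: int
    ifetch: bool  # hypothetical attribute signal: True marks an instruction fetch


@dataclass
class ExternalMemory:
    dram: Dict[int, str]
    local_addrs: Set[int]  # instructions executed by the in-memory units (assumed)
    local_cache: Dict[int, str] = field(default_factory=dict)
    opcode_fetches_seen: int = 0

    def read(self, cycle: BusCycle) -> str:
        # Embedded logic decodes the attribute signal to spot opcode fetch cycles.
        if cycle.ifetch:
            self.opcode_fetches_seen += 1
        return self.dram[cycle.address]

    def local_fetch(self, addr: int) -> str:
        # Second prefetch unit: local program cache first, DRAM array otherwise.
        if addr in self.local_cache:
            return self.local_cache[addr]
        word = self.dram[addr]
        if addr in self.local_addrs:  # cache only locally executed instructions
            self.local_cache[addr] = word
        return word


@dataclass
class CPU:
    memory: ExternalMemory
    icache: Dict[int, str] = field(default_factory=dict)

    def fetch(self, addr: int) -> str:
        # First prefetch unit: a program cache hit avoids an external bus cycle.
        if addr in self.icache:
            return self.icache[addr]
        word = self.memory.read(BusCycle(addr, ifetch=True))
        self.icache[addr] = word  # holds only instructions the CPU itself executes
        return word
```

Keeping the two caches disjoint in this way means neither side spends capacity on instructions it will never execute.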
-
-
13. The computer system as defined in claim 10, wherein said register file further includes a set of multimedia extension (MMX) registers, and said at least one functional unit includes at least one MMX functional unit.
-
14. The computer system as defined in claim 10, whereby said external memory is packaged with multiple external memory modules on a printed circuit board, said printed circuit board having a standardized memory interface compatible with DRAM modules having no embedded processing logic.
-
15. The computer system as defined in claim 14, whereby said printed circuit board is a SIMM and said standardized memory interface is a SIMM interface.
-
16. The computer system as defined in claim 10, wherein:
-
said external memory further includes a monitor/modify unit which intercepts opcodes fetched by said first prefetch unit and passes said opcodes to said second prefetch unit to cause said second prefetch unit to fetch a sequence of program instructions for execution; and
opcodes of said sequence of program instructions are fetched from said one or more DRAM arrays unless they are found to reside in said second program cache.
-
-
17. An embedded dynamic random access memory (DRAM) coprocessor comprising:
-
an external memory interface for transferring instructions and data in response to address and control signals received from an external bus master;
one or more DRAM arrays;
a set of local functional units;
a program prefetch unit; and
a program cache memory, said program cache memory only caching instructions executed by said functional units on said coprocessor.
said external memory interface receives one or more attribute signals to identify certain memory read signals to be instruction fetch cycles; and
said attribute signals are decoded by logic embedded in said external memory so that said program cache can identify externally generated opcode fetch cycles.
-
-
19. The embedded DRAM coprocessor as defined in claim 18, further including a monitor/modify unit which intercepts opcodes in instructions transferred over said external memory interface and passes said opcodes to said program prefetch unit to cause said program prefetch unit to fetch a sequence of program instructions for execution, wherein opcodes of said sequence of program instructions are fetched from said one or more DRAM arrays unless said opcodes of said sequence of program instructions are found to reside in said program cache.
-
20. The embedded DRAM coprocessor as defined in claim 17, wherein said register file further includes a set of multimedia extension (MMX) registers, and said at least one functional unit includes at least one MMX functional unit.
-
21. The embedded DRAM coprocessor as defined in claim 17, whereby said external memory is packaged with multiple external memory modules on a printed circuit board, said printed circuit board having a standardized memory interface compatible with DRAM modules having no embedded processing logic.
-
22. The embedded DRAM coprocessor as defined in claim 21, whereby said printed circuit board is a SIMM and said standardized memory interface is a SIMM interface.
-
23. A computer system comprising:
-
a central processing unit coupled to an external memory, said central processor unit comprising;
a first set of functional units responsive to program instructions; and
a first prefetch unit which controls the fetching of a sequence of instructions from said external memory to be executed by said first set of functional units;
said external memory comprising;
one or more dynamic random access memory (DRAM) arrays;
a second set of local functional units;
one or more external interface busses; and
a second program prefetch unit;
wherein;
said central processing unit and said external program memory jointly execute a single program, said single program segmented into first and second program spaces, said first program space comprising type I, type II and optionally type III instructions, and said second program space comprising type II and type III instructions;
said type I instructions always execute on said first set of functional units;
said type II instructions generate interface control exchanges between said central processing unit and said external memory, wherein said type II instructions selectively are split into portions executed on said central processing unit and portions executed on said external memory; and
said type III instructions always execute on said second set of functional units.
said type II instructions comprise first and second opcodes, said first opcode executed on said central processing unit, and said second opcode executed on said external memory;
said first opcode comprises;
instruction type identifier information;
opcode information to direct execution of a one of said first set of functional units; and
an address field to be transferred over one of said external interface busses to reference instructions in said second program space; and
said second opcode comprises;
instruction type identifier information; and
opcode information to direct execution of a one of said second set of functional units.
-
-
27. The computer system as defined in claim 26, wherein said second opcode further comprises:
-
signaling information to be passed across one of said external interface busses to said central processing unit; and
a stop field indicating to said second prefetch unit to stop fetching instructions from said second program space.
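One possible bit layout for the split type II instruction is sketched here, covering the first and second opcode fields recited in the dependent claim text following claim 23 and the signaling and stop fields of claim 27. The claims do not fix field widths or positions; the 32-bit words, the 2/8/22-bit split, and every name below are illustrative assumptions.

```python
from dataclasses import dataclass

TYPE_I, TYPE_II, TYPE_III = 1, 2, 3  # hypothetical 2-bit type identifiers


@dataclass(frozen=True)
class FirstOpcode:
    """CPU-side half of a type II instruction."""
    itype: int    # instruction type identifier
    op: int       # selects one of the CPU functional units (8 bits)
    address: int  # 22-bit reference into the second program space


@dataclass(frozen=True)
class SecondOpcode:
    """Memory-side half of a type II instruction."""
    itype: int    # instruction type identifier
    op: int       # selects one of the in-memory functional units (8 bits)
    signal: int   # 8 bits of signaling information returned to the CPU
    stop: bool    # tells the second prefetch unit to stop fetching


def _check(value: int, bits: int, name: str) -> None:
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def encode_first(f: FirstOpcode) -> int:
    for value, bits, name in ((f.itype, 2, "itype"), (f.op, 8, "op"),
                              (f.address, 22, "address")):
        _check(value, bits, name)
    return (f.itype << 30) | (f.op << 22) | f.address


def decode_first(word: int) -> FirstOpcode:
    return FirstOpcode((word >> 30) & 0x3, (word >> 22) & 0xFF, word & 0x3FFFFF)


def encode_second(s: SecondOpcode) -> int:
    for value, bits, name in ((s.itype, 2, "itype"), (s.op, 8, "op"),
                              (s.signal, 8, "signal")):
        _check(value, bits, name)
    return (s.itype << 30) | (s.op << 22) | (s.signal << 14) | (int(s.stop) << 13)


def decode_second(word: int) -> SecondOpcode:
    return SecondOpcode((word >> 30) & 0x3, (word >> 22) & 0xFF,
                        (word >> 14) & 0xFF, bool((word >> 13) & 1))


def placement(itype: int) -> str:
    """Where each instruction type executes, per claim 23."""
    return {TYPE_I: "cpu", TYPE_II: "split", TYPE_III: "memory"}[itype]
```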
-
-
28. The computer system as defined in claim 23, wherein:
-
said type II instruction is a split branch to subroutine instruction; and
upon execution of said split branch to subroutine instruction, a subroutine branch address is passed across one of said external interface busses to activate a subroutine stored in said second program space.
-
-
29. The computer system as defined in claim 23, wherein:
-
said type II instruction involves a first operand stored in memory and a second operand stored in a register located on said central processing unit; and
said type II instruction is split into a first portion and a second portion, said first portion executing on said external memory to access said first operand and to place it on one of said external interface busses, and said second portion executing on said central processing unit which reads said first operand from one of said external interface busses and computes a result of said type II instruction.
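The operand split of claim 29 amounts to a producer/consumer handoff across the memory bus. The sketch below models a hypothetical "add register, memory" instruction; the `deque` standing in for an external interface bus, the register names, and the class `SplitAdd` are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict


class SplitAdd:
    """Hypothetical 'add register, [memory]' split into two portions."""

    def __init__(self, dram: Dict[int, int], regs: Dict[str, int]) -> None:
        self.dram = dram                # operand storage inside the external memory
        self.regs = regs                # register file on the CPU
        self.bus: Deque[int] = deque()  # stands in for an external interface bus

    def memory_portion(self, addr: int) -> None:
        # Runs on the external memory: fetch the first operand, drive it on the bus.
        self.bus.append(self.dram[addr])

    def cpu_portion(self, dest: str) -> int:
        # Runs on the CPU: read the operand from the bus and compute the result.
        operand = self.bus.popleft()
        self.regs[dest] += operand
        return self.regs[dest]
```

Usage: after `s = SplitAdd({0x100: 7}, {"r1": 5})` and `s.memory_portion(0x100)`, the call `s.cpu_portion("r1")` leaves 12 in `r1`.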
-
-
30. The computer system as defined in claim 23, whereby said external memory is packaged with multiple external memory modules on a printed circuit board, said printed circuit board having a standardized memory interface compatible with DRAM modules having no embedded processing logic.
-
31. The computer system as defined in claim 30, whereby said printed circuit board is a SIMM and said standardized memory interface is a SIMM interface.
-
32. The computer system of claim 23, whereby said central processing unit further comprises a local cache and when a second portion of a particular type II instruction is fetched for execution by said central processing unit from said local cache, a control signal is sent to said external memory device to cause said external memory device to execute a first portion of said particular type II instruction.
-
33. The computer system as defined in claim 23, wherein said register file further includes a set of multimedia extension (MMX) registers, and said at least one functional unit includes at least one MMX functional unit.
-
34. A central processing unit cooperative to jointly execute programs fetched from an embedded dynamic random access memory (DRAM) coprocessor, said central processing unit comprising:
-
a prefetch unit operative to fetch instructions to be executed by said central processing unit;
a set of internal registers;
a set of one or more functional units operative to execute instructions;
a program cache;
a first external memory interface operative to transfer addresses, control signals, and data to and from external memory and input/output (I/O) devices; and
a second external memory interface operative to transfer synchronization signals and optionally address information between said central processing unit and said embedded DRAM coprocessor;
wherein;
said central processing unit and said embedded DRAM coprocessor jointly execute a single program that is partitioned into first and second memory spaces, wherein the instructions in said first memory space are executed by the central processing unit, and the instructions in said second memory space are executed by said embedded DRAM coprocessor;
said instructions in said first memory space include a first type of instruction which is executed wholly on said central processing unit and a second type of instruction which, upon execution, sends address information which references instructions in said second program space to said embedded DRAM coprocessor; and
said central processor unit and said embedded DRAM coprocessor have overlapping architectures including mirror image subsets of registers and mirror image subsets of functionality of said functional units, said central processing unit and said embedded DRAM coprocessor executing an overlapping instruction set.
-
-
35. A central processing unit cooperative to jointly execute programs fetched from an embedded dynamic random access memory (DRAM) coprocessor, said central processing unit comprising:
-
a prefetch unit which fetches instructions to be executed by the central processing unit;
a set of internal registers;
a set of one or more functional units which executes instructions;
a first external memory interface which transfers addresses, control signals and data to and from external memory and input/output (I/O) devices; and
a second external memory interface which transfers synchronization signals and address information between said central processing unit and said embedded DRAM coprocessor, wherein;
said central processing unit and said embedded DRAM coprocessor jointly execute a single program that is partitioned into first and second memory spaces;
the instructions in said first memory space are executed by the central processing unit;
the instructions in said second memory space are executed by said embedded DRAM coprocessor;
said instructions in said first memory space include;
a first type of instruction which is executed wholly on said central processing unit; and
a second type of instruction which, upon execution, sends address information which references instructions in said second program space to said embedded DRAM coprocessor; and
upon execution of said second type of instruction, said central processing unit directs said embedded DRAM coprocessor to perform at least one of the following operations;
(i) fork a separate execution thread to execute a sequence of instructions stored in said second program space;
(ii) execute a fixed number of instructions and then stop; and
(iii) execute a fixed number of instructions and supply one or more results over one of said first external memory interface and said second external memory interface in alignment with a clock edge, a fixed number of clock cycles later.
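The three coprocessor operations that claim 35 lets the CPU request can be modeled as follows. This is a behavioral sketch only: instructions are represented as Python functions on an accumulator, a fork merely records the start address rather than scheduling a real thread, and `reply_latency` is a hypothetical fixed delay in clock cycles.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple

Instr = Callable[[int], int]  # hypothetical: each instruction updates an accumulator


class Mode(Enum):
    FORK = 1         # (i) fork a thread at an address in the second program space
    RUN_N_STOP = 2   # (ii) execute a fixed number of instructions, then stop
    RUN_N_REPLY = 3  # (iii) execute N instructions, reply a fixed latency later


class Coprocessor:
    def __init__(self, program: List[Instr], reply_latency: int) -> None:
        self.program = program              # instructions in the second program space
        self.reply_latency = reply_latency  # fixed clock cycles before results appear
        self.threads: List[int] = []        # start addresses of forked threads
        self.acc = 0

    def _run(self, addr: int, count: int) -> int:
        for pc in range(addr, addr + count):
            self.acc = self.program[pc](self.acc)
        return self.acc

    def command(self, mode: Mode, addr: int, count: int = 0,
                now: int = 0) -> Optional[Tuple[int, int]]:
        """Handle a type II request the CPU issued at clock cycle `now`."""
        if mode is Mode.FORK:
            self.threads.append(addr)  # runs independently; the CPU does not wait
            return None
        result = self._run(addr, count)
        if mode is Mode.RUN_N_STOP:
            return None
        # The result is driven on the clock edge reply_latency cycles later.
        return now + self.reply_latency, result
```

Mode (iii) is what makes tight coupling possible: because the latency is fixed, the CPU can schedule the consuming instruction for a known cycle without polling.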
-
-
37. A method to jointly execute programs on a central processing unit coupled to an embedded dynamic random access memory (DRAM) coprocessor, comprising the steps of:
-
replicating a portion of a register set of the central processing unit on the embedded DRAM coprocessor;
replicating a portion of the functionality of functional units of the central processing unit to support the replicating of a portion of the instruction set of said central processing unit on said embedded DRAM coprocessor;
jointly executing a program on said central processing unit and said embedded DRAM coprocessor by partitioning computationally intensive portions of the code to run on said central processing unit and by partitioning memory intensive code segments to run on said embedded DRAM coprocessor; and
transferring the contents of selected ones of said replicated register subsets between said central processing unit and said embedded DRAM coprocessor in order to maintain program level synchronization between said central processing unit and said embedded DRAM coprocessor.
adding an architectural extension on said embedded DRAM coprocessor, said architectural extension comprising;
an additional set of registers beyond those contained on said central processing unit; and
additional instructions beyond those processed by said central processing unit; and
partitioning code segments which reference said additional registers and code segments which use said additional instructions to be executed on said embedded DRAM coprocessor.
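The partitioning and synchronization steps of claim 37 and the dependent text that follows it can be illustrated with the sketch below. Sending segments that reference extension registers to the coprocessor follows the claim text; the register names, the per-segment `memory_intensive` flag, and the helper names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

REPLICATED = ("r0", "r1", "r2")  # hypothetical subset mirrored on both devices
EXTENSION = ("x0", "x1")         # hypothetical coprocessor-only registers


@dataclass
class Segment:
    name: str
    memory_intensive: bool
    uses: Tuple[str, ...] = ()   # registers the segment references


def partition(segments: Iterable[Segment]) -> Dict[str, List[str]]:
    """Assign each code segment to the CPU or the embedded DRAM coprocessor."""
    plan: Dict[str, List[str]] = {"cpu": [], "coproc": []}
    for seg in segments:
        needs_extension = any(r in EXTENSION for r in seg.uses)
        target = "coproc" if seg.memory_intensive or needs_extension else "cpu"
        plan[target].append(seg.name)
    return plan


def sync(src: Dict[str, int], dst: Dict[str, int], selected: Iterable[str]) -> None:
    """Copy selected replicated registers so both devices agree at a handoff."""
    for r in selected:
        if r not in REPLICATED:
            raise ValueError(f"{r} is not in the replicated subset")
        dst[r] = src[r]
```

Copying only the selected registers at each handoff, rather than the whole file, keeps synchronization traffic proportional to what the next segment actually reads.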
-
-
39. An embedded dynamic random access memory (DRAM) coprocessor comprising:
-
an external memory interface for transferring instructions and data in response to address and control signals received from an external bus master, including at least one attribute signal to identify certain memory read signals to be instruction fetch cycles, said at least one attribute signal being decoded by logic embedded in said coprocessor so that said program cache can identify externally generated opcode fetch cycles;
one or more DRAM arrays;
a set of local functional units;
a program prefetch unit; and
a program cache memory, said program cache memory only caching instructions executed by said functional units on said coprocessor.
-
-
41. An embedded dynamic random access memory (DRAM) coprocessor comprising:
-
an external memory interface for transferring instructions and data in response to address and control signals received from an external bus master, including at least one attribute signal to identify certain memory read signals to be instruction fetch cycles, said at least one attribute signal being decoded by logic embedded in said coprocessor so that said program cache can identify externally generated opcode fetch cycles;
one or more DRAM arrays;
a set of local functional units;
a program prefetch unit;
a program cache memory, said program cache memory only caching instructions executed by said functional units on said coprocessor; and
a monitor/modify unit which intercepts opcodes in instructions transferred via said external memory interface and passes said opcodes to said program prefetch unit to cause said program prefetch unit to fetch a sequence of program instructions for execution, wherein opcodes of said sequence of program instructions are fetched from said one or more DRAM arrays unless said opcodes of said sequence of program instructions are found to reside in said program cache.
-
-
43. An embedded dynamic random access memory (DRAM) coprocessor comprising:
-
an external memory interface for transferring instructions and data in response to address and control signals received from an external bus master;
one or more DRAM arrays;
a set of local functional units;
a program prefetch unit; and
a selective program cache memory, said selective program cache memory selectively caching instructions executed by said functional units on said coprocessor, and not caching instructions to be used exclusively by said external bus master.
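The admission rule of the selective program cache in claim 43 can be sketched as a small cache that stores only locally executed instructions. The claim does not specify a replacement policy; least-recently-used eviction and the `is_local` predicate are assumptions chosen for illustration.

```python
from collections import OrderedDict
from typing import Callable, Dict


class SelectiveProgramCache:
    """LRU program cache that admits only locally executed instructions."""

    def __init__(self, capacity: int, is_local: Callable[[int], bool]) -> None:
        self.capacity = capacity
        self.is_local = is_local  # hypothetical predicate: run by local units?
        self.lines: "OrderedDict[int, int]" = OrderedDict()

    def fetch(self, addr: int, dram: Dict[int, int]) -> int:
        if addr in self.lines:
            self.lines.move_to_end(addr)  # refresh LRU position on a hit
            return self.lines[addr]
        word = dram[addr]
        if self.is_local(addr):  # bus-master-only code is never cached
            self.lines[addr] = word
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
        return word
```

Instructions used only by the external bus master still pass through `fetch`, but they never displace lines the local functional units will reuse.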
-
Specification