Integrated processing and L2 DRAM cache
First Claim
1. A single chip comprising:
multiple, independent processors;
each processor having a private L1 cache and associated translation/memory management logic to implement a set-associative, late-select cache, each L1 cache having a multi-ported cache directory for fast coherency maintenance via a fully shared Snoopy protocol;
the L1 cache directories being interconnected by a plurality of buses to allow simultaneous interrogation and updating and having a late-write capability for "same cycle" update;
outputs from all L1 caches being interconnected by a selector/cross-point-switch for transferring data between caches, each L1 cache having a pseudo-two-port structure with associated full-line width reload and store-back buffers, each cache input/output data width being equal to a full line and connected to an input/output bus of equal width for line transfer;
each processor having a private L2 cache with an interface to the private L1 cache of the processor;
each L2 cache having translation/management logic to implement a set-associative, late-select organization with DRAM directories having a late-write capability, each L2 cache comprising a DRAM main array, and an SRAM buffer to interface to the L1 cache;
coherency between the L2s being maintained by a global directory, selectors and logic directing cross-interrogates to an appropriate L2 cache, each L2 cache having a pseudo-two-port structure with data buffers for reload and store-back and an interface to main memory, a wide data input/output bus with a width equal to a cache line for reload/store back operations; and
logic and data circuits for interfacing to an external main memory management unit, said chip being capable of working alone as a single node system or coupled via an external controller to other identical or similar nodes.
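The "set-associative, late-select" organization recited above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `LateSelectCache`, its fields, and the address split are hypothetical and do not come from the patent text. It shows the general late-select idea, in which the data for every way of the indexed set is read while the directory tags are compared, and the compare result then selects one way's line at the end.

```python
from dataclasses import dataclass, field


@dataclass
class LateSelectCache:
    """Toy model of a set-associative, late-select cache (hypothetical names)."""
    num_sets: int
    ways: int
    line_bytes: int
    tags: list = field(init=False)
    data: list = field(init=False)

    def __post_init__(self):
        # Directory (tags) and data array are kept as separate structures.
        self.tags = [[None] * self.ways for _ in range(self.num_sets)]
        self.data = [[None] * self.ways for _ in range(self.num_sets)]

    def split(self, addr):
        # Assumed address split: line number -> (tag, set index).
        line = addr // self.line_bytes
        return line // self.num_sets, line % self.num_sets

    def fill(self, addr, way, line_data):
        tag, idx = self.split(addr)
        self.tags[idx][way] = tag
        self.data[idx][way] = line_data

    def lookup(self, addr):
        tag, idx = self.split(addr)
        # Read every way of the set (in hardware this overlaps the compare).
        candidates = list(self.data[idx])
        # Directory compare, one result per way.
        hits = [stored == tag for stored in self.tags[idx]]
        # Late select: the compare result picks one of the lines already read.
        for way, hit in enumerate(hits):
            if hit:
                return candidates[way]
        return None  # miss


cache = LateSelectCache(num_sets=4, ways=2, line_bytes=64)
cache.fill(0x100, way=1, line_data="line A")
print(cache.lookup(0x100))   # hit in way 1
print(cache.lookup(0x200))   # same set, different tag: miss
```

The model is sequential, so it captures only the ordering of the steps, not the timing benefit of overlapping the array read with the directory compare.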
Abstract
An integrated processor and level two (L2) dynamic random access memory (DRAM) are fabricated on a single chip. As an extension of this basic structure, the invention also contemplates multiprocessor "node" chips in which multiple processors are integrated on a single chip with L2 cache. By integrating the processor and L2 DRAM cache on a single chip, high on-chip bandwidth, reduced latency and higher performance are achieved. A multiprocessor system can be realized in which a plurality of processors with integrated L2 DRAM cache are connected in a loosely coupled multiprocessor system. Alternatively, the single chip technology can be used to implement a plurality of processors integrated on a single chip with an L2 DRAM cache which may be either private or shared. This approach overcomes a number of issues which limit the performance and cost of a memory hierarchy. When the L2 DRAM cache is placed on the same chip as the processor, the time needed for two chip-to-chip crossings is eliminated. Since these crossings require off-chip drivers and receivers and must be synchronized with the system clock, the time involved is substantial. This means that with the integrated L2 DRAM cache, latency is reduced.
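The latency argument in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `l2_access_time` and every number in it are assumptions, not figures from the patent. It models an off-chip L2 access as the array access time plus two chip-to-chip crossings, each rounded up to a whole clock cycle because the crossing must be synchronized with the system clock, while the on-chip case pays only the array access time.

```python
import math


def l2_access_time(array_ns, crossing_ns, cycle_ns, on_chip):
    """Return an L2 access time in ns under a simple, assumed model."""
    if on_chip:
        # Integrated L2: no chip-to-chip crossings at all.
        return array_ns
    # Each crossing is synchronized to the clock, so round up to whole cycles.
    synced_crossing = math.ceil(crossing_ns / cycle_ns) * cycle_ns
    # One crossing to reach the L2 chip and one to return the data.
    return array_ns + 2 * synced_crossing


# Hypothetical values: 20 ns array access, 3 ns raw crossing, 2 ns clock cycle.
off_chip = l2_access_time(20, 3, 2, on_chip=False)
on_chip = l2_access_time(20, 3, 2, on_chip=True)
print(off_chip, on_chip)  # 28 20
```

With these assumed values the two synchronized crossings add 8 ns, which is the portion of the latency that integration removes.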
11 Claims
-
2. A single chip comprising:
-
N independent processors;
each processor having a private L1 cache and associated translation/memory management logic to implement a set-associative, late-select cache, each L1 cache having a multi-ported cache directory for fast coherency maintenance via a fully shared Snoopy protocol;
the L1 cache directories being interconnected by a plurality of buses to allow simultaneous interrogation and updating and having a late-write capability for "same cycle" update;
outputs from all L1 caches being interconnected by a selector/cross-point-switch for transferring data between caches, each L1 cache having a pseudo-two-port structure with associated full-line width reload and store-back buffers, each cache input/output data width being equal to a full line and connected to an input/output bus of equal width for line transfer;
the N independent processors all sharing one on-chip L2 cache having a directory, with the L2 cache and directory both physically structured into N banks;
each bank of the L2 cache having translation/management logic to implement a set-associative, late-select organization, the L2 cache and directory having a late-write capability;
the L2 cache comprising a DRAM main array and a SRAM buffer to interface to the L1 cache;
coherency between the N L2 cache banks being maintained by a global directory with selectors and logic for directing cross-interrogates to the appropriate L2 bank;
a wide data input/output bus having a width equal to a cache line for reload/store back operations;
each bank of the L2 cache having an integrated address bus decoder-selector-priority function directly coupled to a data input/output selector-bus section; and
logic and data circuits for interfacing to an external main memory management unit, such chip being capable of working alone as a single node system, or coupled via an external controller to other identical or similar nodes.
Dependent claims: 3, 4
-
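The role of the global directory in directing cross-interrogates to the appropriate L2 bank can be sketched in software. The claim does not say how the directory is organized, so everything here is an assumption made for illustration: the class name `GlobalDirectory`, its methods, and the choice of a simple map from line address to bank index are all hypothetical.

```python
class GlobalDirectory:
    """Toy global directory: tracks which L2 bank holds each line (assumed design)."""

    def __init__(self, num_banks):
        self.num_banks = num_banks
        self.holder = {}  # line address -> index of the bank holding the line

    def record_fill(self, line_addr, bank):
        # Called when a bank reloads a line.
        if not 0 <= bank < self.num_banks:
            raise ValueError("bank index out of range")
        self.holder[line_addr] = bank

    def record_evict(self, line_addr):
        # Called when a line is stored back and dropped from its bank.
        self.holder.pop(line_addr, None)

    def route_cross_interrogate(self, line_addr, requesting_bank):
        # Return the one bank that must be interrogated, or None when no
        # other bank holds the line and no cross-interrogate is needed.
        bank = self.holder.get(line_addr)
        if bank is None or bank == requesting_bank:
            return None
        return bank


directory = GlobalDirectory(num_banks=4)
directory.record_fill(0x40, bank=2)
print(directory.route_cross_interrogate(0x40, requesting_bank=0))  # 2
print(directory.route_cross_interrogate(0x80, requesting_bank=0))  # None
```

The point of the sketch is that a lookup in one central structure names a single target bank, so the other banks are not disturbed by an interrogate they cannot satisfy.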
-
5. A single chip comprising:
-
multiple, independent processors;
each processor sharing one large L1 cache and associated translation/memory management unit with logic to implement a set-associative, late-select cache, the L1 cache having a directory, and L1 cache array and directory being multiported with as many ports as there are processors on-chip, each port having an independent address and data buses, one for each processor to allow simultaneous access and updating of the L1 cache array and directory, with both having a late-write capability for "same cycle" update;
L1 cache input/output data width equal to a full line size and connected to an equal width bus for line transfer;
the multiple, independent processors all sharing one on-chip L2 cache having a directory, with the L2 cache and directory both physically structured into as many banks as there are processors on-chip;
each bank of the L2 cache having translation/management logic to implement a set-associative, late-select organization, the L2 cache and directory having a late-write capability;
each L2 cache comprising a DRAM main array and a SRAM buffer to interface to the L1 cache;
coherency between the L2 cache banks being maintained by a global directory with selectors and logic for directing cross-interrogates to the appropriate L2 bank;
a wide data input/output bus having a width equal to a cache line for reload/store back operations;
each bank of the L2 cache having an integrated address bus decoder-selector-priority function directly coupled to a data input/output selector-bus section; and
logic and data circuits for interfacing to an external main memory management unit, such chip being capable of working alone as a single node system, or coupled via an external controller to other identical or similar nodes.
-
-
6. A multiprocessor system comprising:
-
a plurality of node chips, each containing multiple processors and L1 and L2 caches on a single chip, wherein said L1 cache is a SRAM and said L2 cache is a DRAM; and
an external controller chip connecting said plurality of node chips to form a larger multiprocessor system, an amount of multiprocessing being scalable by adding more or less node chips to said controller chip, wherein at least one of said plurality of node chips further includes:
in each of said multiple processors a private L1 cache and associated translation/memory management logic to implement a set-associative, late-select cache, each private L1 cache having a multi-ported cache directory for fast coherency maintenance via a fully shared Snoopy protocol;
the L1 cache directories being interconnected by a plurality of buses to allow simultaneous interrogation and updating and having a late-write capability for "same cycle" update;
outputs from all private L1 caches of said at least one of said plurality of node chips being interconnected by a selector/cross-point-switch for transferring data between caches, each private L1 cache having a pseudo-two-port structure with associated full-line width reload and store-back buffers, each cache input/output data width being equal to a full line and connected to an input/output bus of equal width for line transfer;
each processor having a private L2 cache with an interface to the private L1 cache of the processor, each private L2 cache having translation/management logic to implement a set-associative, late-select organization with DRAM directories having a late-write capability, each private L2 cache comprising a DRAM main array, and an SRAM buffer to interface to the private L1 cache;
coherency between the private L2s being maintained by a global directory, selectors and logic directing cross-interrogates to an appropriate private L2 cache, each private L2 cache having a pseudo-two-port structure with data buffers for reload and store-back and an interface to main memory, a wide data input/output bus with a width equal to a cache line for reload/store back operations; and
logic and data circuits for interfacing to an external main memory management unit, said at least one of said plurality of chips being capable of working alone as a single node system or coupled via said external controller to other ones of said plurality of node chips.
Dependent claims: 7, 8, 9, 10, 11
-
Specification