Instruction and data cache with a shared TLB for split accesses and snooping in the same clock cycle
Abstract
A caching arrangement which can work efficiently in a superscalar, multiprocessing environment includes separate caches for instructions and data and a single translation lookaside buffer (TLB) shared by them. During each clock cycle, retrievals may be performed from both the instruction cache and the data cache, one on the rising edge of the clock and one on the falling edge; the TLB can translate two addresses per clock cycle. Because a TLB lookup is faster than a tag-array access, which in turn is faster than a data-array access, a virtual address may be supplied to all three components concurrently and a retrieval completed in one phase of a clock cycle. While an instruction retrieval is in progress, the data cache may monitor for snoop broadcasts, and vice versa. Thus, in every clock cycle, an instruction retrieval, a data retrieval, and snooping may all be performed.
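The phase interleaving described above can be sketched as a toy Python model. This is an illustration only; the names `Phase` and `tag_array_sources` and the bus labels are assumptions, not terms from the patent:

```python
from enum import Enum

class Phase(Enum):
    FIRST = 0   # e.g. the rising edge of the clock
    SECOND = 1  # e.g. the falling edge

def tag_array_sources(phase):
    """Which bus each cache's tag array listens to in a given phase."""
    if phase is Phase.FIRST:
        return {"instruction_tags": "address_bus",  # CPU instruction fetch
                "data_tags": "snoop_bus"}           # free to watch for snoops
    return {"instruction_tags": "snoop_bus",        # roles swap each phase
            "data_tags": "address_bus"}             # CPU data access

# Over one full clock cycle, each cache is accessed once by the CPU and
# is available for snooping once:
cycle = [tag_array_sources(Phase.FIRST), tag_array_sources(Phase.SECOND)]
```

Because the two tag arrays alternate roles every half cycle, neither the CPU nor the snoop logic ever has to stall for the other.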
17 Claims
1. In a computer system having at least one central processing unit (CPU), a clock having cycles with at least first and second phases, and a cache arrangement comprising a cache data array and a cache tag array, said computer system having a first bus for conveying virtual addresses for accessing said cache arrangement and a second bus for conveying other information to said cache arrangement, the improvement comprising:
a first cache tag array coupled to said first bus and to said second bus, wherein during said first phase of said clock cycle said first cache tag array is accessed from said first bus, and during said second phase of said clock cycle said first cache tag array is available to receive a snoop address from said second bus; and

a second cache tag array coupled to said first bus and to said second bus, available to receive snoop addresses from said second bus during said first phase of a clock cycle and for access by said first bus during said second phase of a clock cycle;

whereby addresses from which information is to be retrieved can be processed by said first cache tag array while detected snoop addresses are processed by said second cache tag array, and vice versa.

Dependent claims: 2, 3.
4. A cache memory apparatus for use in a computer system, said computer system having at least one central processing unit (CPU), a clock having cycles with at least first and second phases and a main memory having a plurality of memory locations each represented by a physical address, said processing unit generating virtual addresses corresponding to said physical addresses when running a process, said computer system having a first bus for conveying virtual addresses for accessing said cache memory apparatus and a second bus for conveying other information comprising snoop addresses, instructions, and data to said cache memory apparatus, said cache memory apparatus comprising:
a translation lookaside buffer coupled to said first bus for receiving first and second virtual addresses on said first and second phases of a clock cycle, respectively, and for translating said first and second virtual addresses to corresponding physical addresses;

a first cache tag array coupled to said first bus and said translation lookaside buffer for receiving lower order bits of said first virtual address during said first phase of a clock cycle, concurrently with said translation lookaside buffer receiving said first virtual address, said first cache tag array using the lower order bits of said first virtual address to determine a stored physical address, said first cache tag array further being coupled to said translation lookaside buffer for receiving a translated physical address, said first cache tag array comparing said translated physical address from said translation lookaside buffer to said stored physical address located at an indexed location in said first cache tag array based on the lower order bits, for determining a cache hit, said first cache tag array also being coupled to said second bus for monitoring for snoop addresses when not being accessed from said first bus;

a first cache data array coupled to said first bus for receiving the lower order bits of said first virtual address during said first phase of a clock cycle, concurrently with said translation lookaside buffer and said first cache tag array receiving said first virtual address, said first cache data array outputting to said CPU data from said indexed location when said first cache tag array indicates a cache hit for said first virtual address, said first cache data array also being coupled to said second bus for receiving information when not being accessed from said first bus;

a second cache tag array coupled to said first bus and said translation lookaside buffer for receiving lower order bits of said second virtual address during said second phase of a clock cycle, concurrently with said translation lookaside buffer receiving said second virtual address, said second cache tag array using the lower order bits of said second virtual address to determine a stored physical address, said second cache tag array further being coupled to said translation lookaside buffer for receiving a translated physical address, said second cache tag array comparing said translated physical address from said translation lookaside buffer to said stored physical address located at an indexed location in said second cache tag array based on the lower order bits, for determining a cache hit, said second cache tag array also being coupled to said second bus for monitoring for snoop addresses when not being accessed from said first bus; and

a second cache data array coupled to said first bus for receiving the lower order bits of said second virtual address during said second phase of a clock cycle, concurrently with said translation lookaside buffer and said second cache tag array receiving said second virtual address, said second cache data array outputting to said CPU data from said indexed location when said second cache tag array indicates a cache hit for said second virtual address, said second cache data array also being coupled to said second bus for receiving information when not being accessed from said first bus;

wherein said first virtual address is supplied during the first phase of a clock cycle to said TLB, said first cache tag array, and said first cache data array for translation and hit determination, and during the second phase of said clock cycle said second virtual address is supplied to said TLB, said second cache tag array, and said second cache data array, thus allowing caching from said first and said second cache arrays during each clock cycle.

Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
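The lookup each tag array performs in claim 4 — indexing with the untranslated lower order bits while the TLB translates the higher order bits, then comparing the stored physical tag against the translated address — can be sketched as a toy Python model. The 4 KiB page and 16-byte line sizes are assumptions for illustration, not figures from the patent:

```python
PAGE_BITS = 12          # assumed 4 KiB pages: low 12 bits survive translation
LINE_BITS = 4           # assumed 16-byte cache lines
INDEX_BITS = PAGE_BITS - LINE_BITS   # index taken only from untranslated bits

def tag_lookup(vaddr, tlb, tag_array):
    """Return True on a cache hit for the given virtual address."""
    index = (vaddr >> LINE_BITS) & ((1 << INDEX_BITS) - 1)
    # In hardware these two reads proceed concurrently, because the index
    # uses only bits that translation leaves unchanged:
    stored_tag = tag_array[index]          # physical page number on file
    translated = tlb[vaddr >> PAGE_BITS]   # TLB output for the high bits
    return stored_tag == translated        # hit iff the tags match

# Example: virtual page 0x5 maps to physical page 0x9; line index 3 is cached
hit = tag_lookup((0x5 << PAGE_BITS) | (3 << LINE_BITS),
                 tlb={0x5: 0x9}, tag_array={3: 0x9})
```

Drawing the index only from the page-offset bits is what lets the tag read start before the translation finishes.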
15. A method for caching in a superscalar microprocessor system, wherein said microprocessor is responsive to clock cycles each having at least a first phase and a second phase and uses virtual addresses which have corresponding physical addresses, said system having a translation lookaside buffer, first and second cache tag arrays, and first and second cache data arrays each having at least one block having at least one line, said virtual addresses having low and high order bits, said method comprising the steps of:
conveying a virtual address during a first phase of a clock cycle to said translation lookaside buffer, said first cache tag array, and said first cache data array;

translating in said translation lookaside buffer the higher order bits of said virtual address into a corresponding physical page address and providing said physical page address to said first cache tag array;

concurrent with said translating step, indexing to the line of said first cache tag array indicated by the lower order bits of said virtual address, said line of said first cache tag array holding a physical address of the data held in a corresponding location in said first cache data array;

concurrent with said translating step, indexing to the block of said first cache data array indicated by the lower order bits of said virtual address;

comparing said translated physical address to the physical address stored at the indexed line of said first cache tag array to determine if there is a cache hit;

providing to said microprocessor access to said indexed block of data from said first cache data array if there is a cache hit;

conveying a virtual address during a second phase of a clock cycle to said translation lookaside buffer, said second cache tag array, and said second cache data array;

translating in said translation lookaside buffer the higher order bits of said virtual address into a corresponding physical page address and providing said physical page address to said second cache tag array;

concurrent with said translating step, indexing to the line of said second cache tag array indicated by the lower order bits of said virtual address, said line of said second cache tag array holding a physical address of the data held in a corresponding location in said second cache data array;

concurrent with said translating step, indexing to the block of said second cache data array indicated by the lower order bits of said virtual address;

comparing said translated physical address to the physical address stored at the indexed line of said second cache tag array to determine if there is a cache hit;

providing to said microprocessor access to said indexed block of data from said second cache data array if there is a cache hit;

during said first phase of said clock cycle, monitoring for a snoop address in said second cache tag array; and

during said second phase of a clock cycle, monitoring for a snoop address in said first cache tag array.

Dependent claims: 16, 17.
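The per-phase steps recited in claim 15 — translate, index concurrently, compare, deliver on a hit, and monitor the idle tag array for snoops — can be sketched as one Python function. This is a toy model; the function name, field widths, and array sizes are assumptions for illustration:

```python
PAGE_BITS, LINE_BITS = 12, 4   # assumed 4 KiB pages and 16-byte lines

def cache_phase(vaddr, tlb, tag_array, data_array,
                idle_tag_array, snoop_address=None):
    """One phase of the claimed method: access one cache while the other
    cache's tag array monitors the second bus for snoop addresses."""
    index = (vaddr >> LINE_BITS) % len(tag_array)
    # Translate the higher order bits (concurrent with indexing in hardware)
    physical_page = tlb[vaddr >> PAGE_BITS]
    hit = tag_array[index] == physical_page          # tag compare
    data = data_array[index] if hit else None        # deliver on a hit
    # Meanwhile the idle tag array watches for a snoop hit
    snoop_hit = False
    if snoop_address is not None:
        s_index = (snoop_address >> LINE_BITS) % len(idle_tag_array)
        snoop_hit = idle_tag_array[s_index] == snoop_address >> PAGE_BITS
    return data, hit, snoop_hit

# Example: one phase accessing the first cache while snooping the second
tlb = {0x2: 0x7}                       # virtual page 0x2 -> physical page 0x7
tags, data = [None] * 8, [None] * 8
tags[5], data[5] = 0x7, "line5"        # cached line at index 5
idle_tags = [None] * 8
idle_tags[2] = 0x7                     # the other cache also holds that page
result = cache_phase((0x2 << PAGE_BITS) | (5 << LINE_BITS), tlb, tags, data,
                     idle_tags,
                     snoop_address=(0x7 << PAGE_BITS) | (2 << LINE_BITS))
```

Running the same function a second time with the two caches' roles swapped models the second phase, so each full clock cycle yields two retrievals and two snoop checks.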
Specification