Imprecise snooping based invalidation mechanism
Abstract
A method, system, and processor cache configuration that enables efficient retrieval of valid data in response to an invalidate cache miss at a local processor cache. A cache directory is provided a set of directional bits in addition to the coherency state bits and the address tag. The directional bits provide information that includes a processor cache identification (ID) and routing method. The processor cache ID indicates which processor's operation resulted in the cache line of the local processor changing to the invalidate (I) coherency state. The routing method indicates what transmission method to utilize to forward the cache line, from among a local system bus or a switch or broadcast mechanism. Processor/Cache directory logic provide responses to requests depending on the values of the directional bits.
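One way to picture the directory entry described in the abstract is as a single word holding the address tag, the coherency state bits, and the directional bits. The following is only an illustrative sketch: the field widths, the `State` encoding, and the names `pack_entry` and `unpack_entry` are assumptions made for this example and are not given in the abstract.

```python
from enum import IntEnum
from typing import Tuple

# Hypothetical field widths; the abstract does not specify any.
STATE_BITS = 2   # coherency state bits
SOURCE_BITS = 4  # processor cache ID (room for 16 processors)
ROUTE_BITS = 2   # routing method


class State(IntEnum):
    """Assumed MESI-style encoding, used only for illustration."""
    INVALID = 0
    SHARED = 1
    EXCLUSIVE = 2
    MODIFIED = 3


def pack_entry(tag: int, state: State, source_id: int, route: int) -> int:
    """Pack tag | state | source ID | route into one integer word."""
    if tag < 0:
        raise ValueError("tag must be non-negative")
    if not 0 <= source_id < (1 << SOURCE_BITS):
        raise ValueError("source_id does not fit in SOURCE_BITS")
    if not 0 <= route < (1 << ROUTE_BITS):
        raise ValueError("route does not fit in ROUTE_BITS")
    word = tag
    word = (word << STATE_BITS) | int(state)
    word = (word << SOURCE_BITS) | source_id
    word = (word << ROUTE_BITS) | route
    return word


def unpack_entry(word: int) -> Tuple[int, State, int, int]:
    """Recover (tag, state, source_id, route) from a packed word."""
    route = word & ((1 << ROUTE_BITS) - 1)
    word >>= ROUTE_BITS
    source_id = word & ((1 << SOURCE_BITS) - 1)
    word >>= SOURCE_BITS
    state = State(word & ((1 << STATE_BITS) - 1))
    word >>= STATE_BITS
    return word, state, source_id, route
```

In this sketch the directional bits occupy the low-order end of the word, so directory logic could read them with a single mask; that placement is a design choice of the example, not something the abstract states.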
26 Claims
-
1. A method for providing directed system response to an invalidation miss at a local processor cache of a data processing system having a plurality of processors, said method comprising:
-
providing directional bits for a cache line within a cache directory of said local processor cache, wherein said directional bits includes at least one source bit that is utilized to store an identifier (ID) of one of said plurality of processors and at least one route bit that is utilized to indicate a transfer method from among multiple transfer methods for forwarding a request for said cache line;
in response to a snoop of an operation that causes a coherency state of said cache line in said local processor cache to go invalid, setting a value of said directional bits to indicate a processor ID associated with an origination processor that issued said operation; and
responsive to a request for said cache line by an associated local processor, immediately forwarding said request to a processor indicated by said processor ID via a transfer method indicated by said at least one route bit, whereby said request is forwarded to said origination processor.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 24
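The three steps of claim 1 (provide directional bits, set them on a snooped invalidation, forward a later local request accordingly) can be sketched in software as follows. This is a minimal model under stated assumptions, not the claimed hardware: the class names, the single-letter state strings, and the route labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

INVALID = "I"  # assumed label for the invalid coherency state


@dataclass
class DirectoryEntry:
    tag: int
    state: str
    source_id: Optional[int] = None  # stands in for the source bit(s)
    route: Optional[str] = None      # stands in for the route bit(s)


class LocalCacheDirectory:
    def __init__(self) -> None:
        self.entries: Dict[int, DirectoryEntry] = {}

    def fill(self, tag: int, state: str) -> None:
        """Install a valid line (hypothetical helper for the example)."""
        self.entries[tag] = DirectoryEntry(tag, state)

    def snoop_invalidate(self, tag: int, origin_id: int, route: str) -> None:
        """A snooped operation invalidates the line: record who issued it."""
        entry = self.entries.get(tag)
        if entry is None:
            return  # line not held locally, nothing to update
        entry.state = INVALID
        entry.source_id = origin_id
        entry.route = route

    def local_request(self, tag: int) -> Optional[Tuple[int, str]]:
        """Return (target processor, transfer method) for a directed
        request, or None when no directional information is available."""
        entry = self.entries.get(tag)
        if entry and entry.state == INVALID and entry.source_id is not None:
            return entry.source_id, entry.route
        return None
```

For example, after `fill(0x40, "S")` and `snoop_invalidate(0x40, 3, "local_bus")`, a call to `local_request(0x40)` yields `(3, "local_bus")`, meaning the request would go straight to processor 3 over the recorded path.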
first determining that said origination processor belongs to a local processor group that includes said local processor; and
in response to said determining step, setting said value of said at least one route bit to indicate forwarding via a local transmission method.
-
-
4. The method of claim 3, wherein when said local processor is connected via a switch to other processors within said local processor group, said request is forwarded directly to said origination processor, and when said local processor is connected via a local system bus, said request is broadcasted on said local system bus.
-
5. The method of claim 3, further comprising setting said value of said route bit to indicate a global, system-wide bus broadcast when said origination processor does not belong to said local group.
-
6. The method of claim 3, further comprising setting said value of said routing bit(s) to indicate a directed, system-wide bus broadcast when said origination processor does not belong to said local group and said processor groups are connected via a switch, wherein a specific processor from another processor group is sent the request directly.
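Claims 3 through 6 together describe how the route value depends on where the origination processor sits and how the processors are interconnected. A rough sketch of that decision is shown below; the `Route` names, the boolean topology flags, and the function `choose_route` are assumptions introduced for illustration only.

```python
from enum import Enum


class Route(Enum):
    LOCAL_DIRECT = "direct to processor over local switch"
    LOCAL_BROADCAST = "broadcast on local system bus"
    REMOTE_DIRECTED = "directed to a processor in another group"
    GLOBAL_BROADCAST = "global, system-wide bus broadcast"


def choose_route(origin_group: int,
                 local_group: int,
                 local_is_switch: bool,
                 groups_via_switch: bool) -> Route:
    """Pick a transfer method from group membership and topology."""
    if origin_group == local_group:
        # Claims 3 and 4: same group, so use a local transmission method.
        return Route.LOCAL_DIRECT if local_is_switch else Route.LOCAL_BROADCAST
    # Claims 5 and 6: origination processor is outside the local group.
    return Route.REMOTE_DIRECTED if groups_via_switch else Route.GLOBAL_BROADCAST
```

The sketch treats the topology as two fixed flags. A real system would presumably derive them from its configuration, which the claims do not detail.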
-
7. The method of claim 3, further comprising, responsive to a cache miss when said request is transmitted directly to said origination processor, issuing said request to said global system bus.
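Claim 7 adds a fallback: if the directed request misses at the origination processor, the request is issued on the global system bus. The sketch below models that with two caller-supplied callbacks. `send_direct` and `send_global` are hypothetical stand-ins for the interconnect, and returning `None` is an assumed convention for a miss.

```python
from typing import Callable, Optional, Tuple


def fetch_line(tag: int,
               source_id: int,
               send_direct: Callable[[int, int], Optional[bytes]],
               send_global: Callable[[int], Optional[bytes]],
               ) -> Tuple[Optional[bytes], str]:
    """Try the directed request first, then fall back to the global bus.

    Returns the data (or None if nobody supplied it) together with a
    label naming the path that was used last.
    """
    data = send_direct(source_id, tag)
    if data is not None:
        return data, "direct"
    # The origination processor no longer holds the line: go global.
    return send_global(tag), "global"
```

The directional bits are therefore only a hint in this model. A stale processor ID costs one extra request but does not prevent the line from being found.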
-
8. The method of claim 1, wherein said forwarding further includes:
-
when more than one processor has a valid copy of the cache line, identifying which processor among the more than one processor is a closest processor;
storing an ID of the closest processor having a valid copy of said cache line within said source bit; and
forwarding said request to said closest processor, wherein said cache line is sourced from the processor that is closest to the requesting processor, reducing a response latency.
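Claim 8 does not say how "closest" is measured. As a sketch only, the selection can be written as a minimum over a distance function. The grouping of four processors per node and the three-level distance used here are assumptions made purely for the example.

```python
from typing import Callable, Iterable

GROUP_SIZE = 4  # hypothetical: processors 0-3 form group 0, 4-7 group 1, ...


def group_distance(a: int, b: int) -> int:
    """Assumed metric: 0 = same processor, 1 = same group, 2 = other group."""
    if a == b:
        return 0
    return 1 if a // GROUP_SIZE == b // GROUP_SIZE else 2


def closest_holder(requester: int,
                   holders: Iterable[int],
                   distance: Callable[[int, int], int] = group_distance) -> int:
    """Return the ID of the valid-copy holder nearest to the requester."""
    holders = list(holders)
    if not holders:
        raise ValueError("no processor holds a valid copy")
    return min(holders, key=lambda p: distance(requester, p))
```

The returned ID is what would be stored in the source bit so that a later request is directed to that processor rather than to a more distant holder.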
-
-
24. The method of claim 1, wherein:
-
the route bit indicates a transmission method and path assigned to the request; and
said method and path ranges from a direct point-to-point request to the processor identified by the processor ID within the source bit, a localized broadcast request to a local bus, direct point-to-point (or targeted broadcast) of the request to a specific non-local processor, and a global broadcast request to the global data processing system bus, wherein each method and path is assigned a different value.
-
-
9. A multiprocessor data processing system that provides directed addressing of cache intervention in response to an invalidate, comprising:
-
a plurality of processors, each processor having an associated cache that supports intervention;
logic associated with a cache directory of at least one local processor cache that, responsive to a snoop of an operation from a first processor that invalidates a cache line of said local processor cache:
(1) updates a directory entry of said cache line to include a processor identifier (ID) of said first processor, which issued said operation; and
(2) provides at least one route bit with source routing information for said directory entry that is utilized to indicate a transfer method from among a plurality of different transfer methods for forwarding a subsequent request from a second processor for said cache line; and
wherein said logic responsive to a subsequently snooped request from a second processor to access said cache line, immediately directs said request to a processor indicated by said processor ID via a transfer method indicated by said source routing information, whereby said request is forwarded to said first processor.
Dependent claims: 10, 11, 12, 13, 14, 15, 25
means for first determining that said first processor belongs to a local processor group that includes said local processor; and
means, responsive to said determining step, for setting said value of said at least one route bit to indicate forwarding via a local transmission method.
-
-
11. The multiprocessor data processing system of claim 10, wherein when said local processor is connected via a switch to other processors within said local processor group, said request is forwarded directly to said first processor, and when said local processor is connected via a local system bus, said request is broadcasted on said local system bus.
-
12. The multiprocessor data processing system of claim 10, wherein said logic further comprises means for setting said value of said route bit to indicate a global, system-wide bus broadcast when said first processor does not belong to said local group.
-
13. The multiprocessor data processing system of claim 10, wherein said logic further comprises means for setting said value of said routing bit(s) to indicate a directed, system-wide bus broadcast when said first processor does not belong to said local group and said processor groups are connected via a switch, wherein a specific processor from another processor group is sent the request directly.
-
14. The multiprocessor data processing system of claim 11, wherein said logic further comprises means, responsive to a cache miss when said request is transmitted directly to said first processor, for issuing said request to said global system bus.
-
15. The multiprocessor data processing system of claim 9, wherein said logic for directing said request further includes:
-
logic, when more than one processor has a valid copy of the cache line, for identifying which processor among the more than one processor is a closest processor;
logic for storing an ID of the closest processor having a valid copy of said cache line within said source bit; and
forwarding said request to said closest processor, wherein said cache line is sourced from the processor that is closest to the requesting processor, reducing a response latency.
-
-
25. The data processing system of claim 9, wherein:
-
the route bit indicates a transmission method and path assigned to the request; and
said method and path ranges from a direct point-to-point request to the processor identified by the processor ID within the source bit, a localized broadcast request to a local bus, direct point-to-point (or targeted broadcast) of the request to a specific non-local processor, and a global broadcast request to the global data processing system bus, wherein each method and path is assigned a different value.
-
-
16. A memory subsystem of a multiprocessor data processing system comprising:
-
a memory;
a plurality of caches associated with processors of said multiprocessor data processing system that comprise cache lines in which data is stored;
a plurality of cache directories each affiliated with a particular one of said plurality of caches, wherein each entry of said cache directory includes a coherency state for each cache line within said particular cache, an address tag, and directional bits, which include:
(1) processor ID of an origination processor whose cache contains a valid copy of data when said coherency state of said cache line is the invalidate state, wherein an operation that caused said cache line to be invalidated was issued by the origination processor; and
(2) routing bit(s) with source routing information for a directory entry that is utilized to indicate a transfer method for forwarding a request for said cache line; and
logic, responsive to a receipt of a request for said cache line, for forwarding a request for said cache line from an associated local processor to an origination processor indicated by said directional bits utilizing the transfer method indicated by said source routing information.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 26
means, responsive to a snoop of an operation that invalidates a cache line of said local processor cache, for updating a directory entry of said cache line to include a processor identifier (ID) of the origination processor, which issued said operation; and
means, responsive to a later request from a local processor to access said cache line, for immediately forwarding said request to a processor indicated by said processor ID, whereby said request is forwarded to said origination processor.
-
-
18. The memory subsystem of claim 16, wherein said multiprocessor data processing system comprises at least two nodes of processor groups, and said logic further includes:
-
means for first determining that said origination processor belongs to a local processor group that includes said local processor; and
means, responsive to said determining step, for setting said value of said route bit(s) to indicate forwarding via a local transmission mechanism.
-
-
19. The memory subsystem of claim 18, wherein when said local processor is connected via a switch to other processors within said local processor group, said request is forwarded directly to said origination processor, and when said local processor is connected via a local system bus, said request is broadcasted on said local system bus.
-
20. The memory subsystem of claim 18, wherein said logic further comprises means for setting said value of said route bit(s) to indicate a global, system-wide bus broadcast when said origination processor does not belong to said local group.
-
21. The memory subsystem of claim 18, wherein said logic further comprises means for setting said value of said routing bit(s) to indicate a directed, system-wide bus broadcast when said origination processor does not belong to said local group and said processor groups are connected via a switch, wherein a specific processor from another processor group is sent the request directly.
-
22. The memory subsystem of claim 19, wherein said logic further comprises means, responsive to a cache miss when said request is transmitted directly to said origination processor, for issuing said request to said global system bus.
-
23. The memory subsystem of claim 22, wherein said forwarding means further includes:
-
means, when more than one processor has a valid copy of the cache line, for identifying which processor among the more than one processor is a closest processor;
means for storing an ID of the closest processor having a valid copy of said cache line within said source bit; and
means for forwarding said request to said closest processor, wherein said cache line is sourced from the processor that is closest to the requesting processor, reducing a response latency.
-
-
26. The memory subsystem of claim 16, wherein:
-
the route bit indicates a transmission method and path assigned to the request; and
said method and path ranges from a direct point-to-point request to the processor identified by the processor ID within the source bit, a localized broadcast request to a local bus, direct point-to-point (or targeted broadcast) of the request to a specific non-local processor, and a global broadcast request to the global data processing system bus, wherein each method and path is assigned a different value.
-
Specification