Apparatus and method for a cache coherent shared memory multiprocessing system
Abstract
A system and method for operating a cache-coherent shared-memory multiprocessing system are disclosed. The system includes a number of devices, including processors, a main memory, and I/O devices. Each device is connected to a flow control unit (FCU) by a dedicated point-to-point connection, or channel. The FCU controls the exchange of data among the devices by providing a communication path between two devices connected to the FCU. The FCU includes a snoop signal path for processing transactions affecting cacheable memory and a network of signal paths used to transfer data between devices. The signal paths can operate concurrently, allowing the system to process multiple data transactions simultaneously.
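The arrangement described in the abstract can be pictured with a small software model. The following is only an illustrative sketch, not the disclosed hardware: the class name `FlowControlUnit`, its methods, and the device names are hypothetical, and the rule that a snooped transaction is forwarded to every other caching device is an assumption made for simplicity.

```python
class FlowControlUnit:
    """Toy model of an FCU: one snoop decision point plus concurrent data paths."""

    def __init__(self, devices, caching_devices):
        self.devices = set(devices)
        self.caching = set(caching_devices)   # devices that hold caches
        self.active_paths = set()             # frozenset({a, b}) per open data path

    def snoop_set(self, initiator, cacheable):
        """Devices that must see a transaction (assumed: all other caching devices)."""
        if not cacheable:
            return set()
        return self.caching - {initiator}

    def connect(self, a, b):
        """Open a data path between a and b if neither channel is already in use."""
        if a == b or a not in self.devices or b not in self.devices:
            raise ValueError("connect() needs two distinct known devices")
        busy = {d for pair in self.active_paths for d in pair}
        if a in busy or b in busy:
            return False
        self.active_paths.add(frozenset((a, b)))
        return True

    def release(self, a, b):
        """Close the data path between a and b, if one is open."""
        self.active_paths.discard(frozenset((a, b)))


fcu = FlowControlUnit(["cpu0", "cpu1", "mem0", "mem1", "io0"], ["cpu0", "cpu1"])
first = fcu.connect("cpu0", "mem0")     # first data transaction
second = fcu.connect("cpu1", "mem1")    # proceeds concurrently on a separate path
blocked = fcu.connect("io0", "mem0")    # mem0's channel is busy, so this must wait
```

In this model two transfers that share no device proceed at the same time, while a request for a busy channel is refused until the earlier path is released.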
55 Claims
1. An apparatus for executing a plurality of transactions in a processing system, the system having a first command initiator device, a first memory device, and a plurality of point-to-point links, the apparatus comprising:
a plurality of channel interface units, each channel interface unit having at least one first-in-first-out buffer and first, second, and third ports for communication in two directions, each port including at least one communication interface enabling communication in at least one direction, at least the first command initiator device and at least the first memory device being coupled to said apparatus via a respective one of the first ports and a respective one of the links, each of the links enabling communication in two directions, each link including at least one communication path enabling communication in at least one direction, wherein for each of at least some of the plurality of transactions an associated transaction header and any associated transaction data are communicated between each command initiator device coupled to said apparatus and the respective channel interface unit via a plurality of associated bit-groups transferred over the respective link, each bit-group having a common predetermined number of information bits, each said information bit of each bit-group having a bit-group-sequence-dependent one of a plurality of associated functions, the same communication path of the said at least one communication path of the respective link being used for all bit-groups transferred in each direction, each of said bit-groups being transferred over said respective link one at a time via at least a first transfer, at least some of said bit-groups being unsuccessfully communicated during the first transfer and having at least one additional transfer over said respective link, each transfer over said respective link being performed in a predetermined fixed-length time-interval, at least some of said bit-groups being queued in the at least one said first-in-first-out buffer, and at least some of the transaction headers including a transaction command and a transaction address;
configurable multipurpose interconnect, the second port of each said channel interface unit being coupled to the configurable multipurpose interconnect, said configurable multipurpose interconnect enabling inter-device communication of at least some of the transaction data for a transaction-dependent first group of the devices coupled to said apparatus, the configurable multipurpose interconnect enabled said inter-device communication being via the coupling to said second ports of the respective channel interface units;
a command serialization resource, the third port of each said channel interface unit being coupled to said command serialization resource, said command serialization resource enabling said inter-device communication of at least some of said transaction headers for a transaction-dependent second group of the devices coupled to the apparatus, said command serialization resource enabled said inter-device communication being via the coupling to said third ports of the respective channel interface units; and
command control logic, said command control logic coupled to monitor at least part of each said transaction header communicated to said command serialization resource, said command control logic ascertaining the transaction-dependent first and second groups, said first group including the command initiator device initiating the transaction and said target device associated with said transaction, the target device being selected from the devices coupled to the apparatus, the second group including each device coupled to a respective channel interface unit for which the transaction is relevant to consistent operation of the system.
Dependent claims: 3, 4, 5, 6, 7, 8, 9, 10, 11
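The bit-group transfer recited in claim 1 (fixed-width groups sent one at a time, each transfer occupying one fixed-length interval, failed transfers repeated, received groups queued in a first-in-first-out buffer) can be illustrated in software. This is a rough sketch under stated assumptions, not the claimed apparatus: the function names, the zero-padding of the final group, the `transfer_ok` callback, and the `max_attempts` limit are all hypothetical.

```python
from collections import deque


def split_into_bit_groups(payload_bits, width):
    """Split a bit string into groups of `width` bits (assumed: zero-pad the last)."""
    padded_len = -(-len(payload_bits) // width) * width   # round up to a multiple
    padded = payload_bits.ljust(padded_len, "0")
    return [padded[i:i + width] for i in range(0, padded_len, width)]


def send_over_link(groups, transfer_ok, max_attempts=4):
    """Send groups one at a time; each attempt consumes one fixed-length time slot.

    transfer_ok(index, attempt) is a hypothetical stand-in for the link's
    success/failure indication. Returns the receiver-side FIFO and slots used.
    """
    fifo = deque()
    slots_used = 0
    for index, group in enumerate(groups):
        for attempt in range(1, max_attempts + 1):
            slots_used += 1
            if transfer_ok(index, attempt):
                fifo.append(group)        # queued in arrival order
                break
        else:
            raise RuntimeError(f"bit-group {index} failed {max_attempts} times")
    return fifo, slots_used


groups = split_into_bit_groups("1011001110", 4)
# Assume the second group fails on its first transfer and succeeds on the retry.
fifo, slots_used = send_over_link(groups, lambda i, attempt: not (i == 1 and attempt == 1))
```

Because every attempt takes the same fixed interval, a retry costs exactly one extra slot and the groups still arrive in their original sequence.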
2. A processing system for executing a plurality of transactions, the system comprising:
a plurality of point-to-point links;
a central shared memory having a single address space having a plurality of addressable locations, the central shared memory including at least one memory device;
a plurality of command initiator devices, at least one of the command initiator devices being a processor device, each processor device having at least one cache memory for selectively caching at least some of the locations of the central shared memory, none of the at least one processor device having a resident portion of the central shared memory;
a plurality of channel interface units, each channel interface unit having at least one first-in-first-out buffer and first, second, and third ports for communication in two directions, each port including at least one communication interface enabling communication in at least one direction, at least the plurality of command initiator devices and the at least one memory device being coupled to the apparatus via a respective one of the first ports and a respective one of the links, each of the links enabling communication in two directions, each link including at least one communication path enabling communication in at least one direction, wherein for each of at least some of the plurality of transactions an associated transaction header and any associated transaction data are communicated between each command initiator device coupled to the apparatus and the respective channel interface unit via a plurality of associated bit-groups transferred over the respective link, each bit-group having a common predetermined number of information bits, each information bit of each bit-group having a bit-group-sequence-dependent one of a plurality of associated functions, the same communication path of the at least one communication path of the respective link being used for all bit-groups transferred in each direction, each of the bit-groups being transferred over the respective link one at a time via at least a first transfer, at least some of the bit-groups being unsuccessfully communicated during the first transfer and having at least one additional transfer over the respective link, each transfer over the respective link being performed in a predetermined fixed-length time-interval, at least some of the bit-groups being queued in the at least one first-in-first-out buffer, at least some of the transaction headers including a transaction command and a transaction address, each processor device transmitting one or more of the transaction headers to the respective channel interface unit to access at least one of the locations of the central shared memory;
configurable multipurpose interconnect, the second port of each channel interface unit being coupled to the configurable multipurpose interconnect, the configurable multipurpose interconnect enabling inter-device communication of at least some of the transaction data for a transaction-dependent first group of the devices coupled to the apparatus, the configurable multipurpose interconnect enabled inter-device communication being via the coupling to the second ports of the respective channel interface units;
a command serialization resource, the command serialization resource being a cache-coherence point, the third port of each channel interface unit being coupled to the command serialization resource, the command serialization resource enabling inter-device communication of at least some of the transaction headers for a transaction-dependent second group of the devices coupled to the apparatus, the command serialization resource enabled inter-device communication being via the coupling to the third ports of the respective channel interface units; and
command control logic, the command control logic coupled to monitor at least part of each transaction header communicated to the command serialization resource, the command control logic ascertaining the transaction-dependent first and second groups, the first group including the command initiator device initiating the transaction and a target device associated with the transaction, the target device being selected from the devices coupled to the apparatus, the second group including the first group and any additional transaction-dependent devices necessary to maintain the central shared memory in accordance with a cache-coherency protocol that appears to the command initiators to be a snoopy-based cache-coherency protocol.
12. In a processing system having at least first and second external command initiator subsystems and at least first and second memory subsystems, each of the command initiator subsystems including at least one external command initiator device, at least one of the command initiator subsystems including a cache for cacheable memory locations, each of the memory subsystems including at least one external memory device, a method of processing a plurality of transactions, the method comprising:
representing each of the plurality of transactions by a corresponding command, at least some of the commands having corresponding data, at least some of the commands including an address, and at least some of the addresses being cacheable memory addresses;
communicating the transaction command and any corresponding transaction data of each of the plurality of transactions via transfers of a plurality of associated bit-groups, each bit-group having a common predetermined number of information bits, each information bit of each bit-group having a bit-group-sequence-dependent one of a plurality of associated functions, each of the bit-groups being transferred via at least a first transfer, at least some of the bit-groups being unsuccessfully communicated during the first transfer and having at least one additional transfer over the respective link, each transfer over the respective link being performed in a predetermined fixed-length time-interval, and each bit-group being selectively buffered in one or more stages of clocked storage;
providing a plurality of point-to-point communication links, each link communicating at least some of the bit-groups;
providing a central multiple channel interface having a plurality of interface paths, each interface path communicating at least some of the bit-groups between a link-signaling-side and a transaction-processing-side of the central multiple channel interface;
coupling each of the subsystems via a respective one of the links to the link-signaling side of a respective one of the interface paths of the central multiple channel interface, the coupling establishing concurrent static communication channels corresponding to each subsystem for communicating at least some of the bit-groups between each subsystem and the transaction-processing-side of the central multiple channel interface;
providing a cache-coherence point on the transaction-processing-side of the central multiple channel interface;
providing a dynamically configurable interconnect on the transaction-processing-side of the central multiple channel interface, the dynamically configurable interconnect having communication paths that are configurable to enable transaction-related bit-group transfers between selected pairs of the subsystems in accordance with at least some of the transaction commands;
communicating the transaction commands from the command initiator subsystems via the corresponding channels of the subsystems to the transaction-processing-side of the central multiple channel interface;
communicating to the cache-coherence point the addresses of at least some of the transaction commands received on the transaction-processing-side of the central multiple channel interface;
for each of at least some of the cacheable memory addresses communicated to the cache-coherence point and as required to maintain the system in accordance with a cache-coherency protocol that appears to the command initiators to be a snoopy-based cache-coherency protocol, communicating the corresponding transaction command to a select plurality of the subsystems, the select plurality not including the initiator of the transaction command;
communicating the transaction data corresponding to a first transaction command between the first external command initiator subsystem and the first memory subsystem via the channel of the first external command initiator subsystem, a first of the paths of the dynamically configurable interconnect, and the channel of the first memory subsystem; and
communicating the transaction data corresponding to a second transaction command between the second command initiator subsystem and the second memory subsystem via the channel of the second command initiator subsystem, a second of the paths of the dynamically configurable interconnect, and the channel of the second memory subsystem, at least a portion of the transaction data for the first and second transactions being communicated simultaneously by the dynamically configurable interconnect.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46
47. In a processing system, a method of processing a plurality of transactions in parallel, the method comprising:
defining a bounded region;
providing on the inside periphery of the bounded region at least one processor channel interface unit;
coupling each processor channel interface unit to a respective processor subsystem external to the bounded region, each processor subsystem including at least one processor device having at least one associated cache, each processor device and each associated cache being external to the bounded region;
providing on the inside periphery of the bounded region at least two memory channel interface units;
coupling each memory channel interface unit to a respective memory subsystem, each memory subsystem including at least one memory device;
providing on the inside periphery of the bounded region at least one I/O channel interface unit;
coupling each I/O channel interface unit to a respective I/O subsystem external to the bounded region, each I/O subsystem including at least one I/O device external to the bounded region;
communicating as required a transaction command and any corresponding transaction data for each of the plurality of transactions, each of the plurality of transactions having an associated initiator and target, at least some of the communicating as required using a bit-group-based transfer technique wherein the transaction commands and the transaction data of each of the plurality of transactions are communicated via transfers of a plurality of associated bit-groups, each bit-group having a common predetermined number of information bits, each information bit of each bit-group having a bit-group-sequence-dependent one of a plurality of associated functions, each of the bit-groups being transferred via at least a first transfer, at least some of the bit-groups being unsuccessfully communicated during the first transfer and having at least one additional transfer over the respective link, each transfer over the respective link being performed in a predetermined fixed-length time-interval, and each bit-group being selectively buffered in one or more stages of clocked storage, at least some of the transaction commands including addresses;
wherein each transaction command and each transaction data that is communicated between each external device and the inside of the bounded region is communicated via the channel interface units using the bit-group-based techniques, the coupling to each channel interface having an associated channel interface-protocol and channel bit-width, and each external device having an associated native interface-protocol and bit-width;
external to the bounded region, converting as required between the channel interface-protocols and channel bit-widths and the respective native interface-protocols and native bit-widths of each external device;
providing on the inside of the bounded region an initiator communication path for and respectively coupled to each of the processor channel interface units and the I/O channel interface units;
providing on the inside of the bounded region a memory communication path for and respectively coupled to each of the memory channel interface units;
providing on the inside of the bounded region a cache-coherence point coupled to each of the channel interface units;
issuing each transaction command by the associated initiator;
receiving each transaction command by the channel interface unit respectively coupled to the associated initiator;
for each transaction command having an address corresponding to a cacheable memory location, subsequent to the receiving and prior to any memory subsystem access, communicating the transaction command from the channel interface unit of the initiator to the cache-coherence point;
for at least some of the transaction commands, propagating the transaction command to the target of the transaction from the cache-coherence point and via the channel interface unit of the target, additionally selectively propagating the transaction command to a dynamically determined set of zero or more of the external processor devices having at least one associated cache, the selectively propagating being from the cache-coherence point and via the channel interface units corresponding to the dynamically determined set of external processor devices, the set being dynamically determined solely within the bounded region in accordance with a cache-coherency protocol that appears to the command initiators to be a snoopy-based cache-coherency protocol;
internal to the bounded region and in accordance with a first of the transaction commands, coupling at least a first of the initiator communication paths to at least a first of the memory communication paths, and communicating first transaction data between the initiator and the target of the first transaction command via the coupled first communication paths, the channel interfaces, and the couplings associated with the initiator and the target of the first transaction command; and
internal to the bounded region and in accordance with a second of the transaction commands, coupling at least a second of the initiator communication paths to at least a second of the memory communication paths, and communicating second transaction data between the initiator and the target of the second transaction command via the coupled second communication paths, the channel interfaces, and the couplings associated with the initiator and the target of the second transaction command;
wherein at least some of the first transaction data and second transaction data are communicated in parallel.
Dependent claims: 48, 49, 50, 51, 52, 53, 54, 55
Specification