Processes, circuits, devices, and systems for branch prediction and other processor improvements
First Claim
1. A processor for processing instructions comprising a pipeline including a fetch stage and an execute stage;
- a first storing circuit associated with said fetch stage and operable to store a history of actual branches; and
a second storing circuit associated with said fetch stage and operable to store a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch, said second storing circuit coupled to said first storing circuit, said execute stage coupled back to said first storing circuit; and
a third storing circuit readable for branch Taken and Not-Taken information indexed on read by a pattern from said second storing circuit and operable to supply a predicted taken bit, and update logic circuitry operable to supply an updated pattern to said second storing circuit including at least some bits of a pattern from said second storage circuit and the predicted taken bit from said third storing circuit.
1 Assignment
0 Petitions
Accused Products
Abstract
A processor (1700) for processing instructions has a pipeline (1710, 1736, 1740) including a fetch stage (1710) and an execute stage (1870), a first storing circuit (aGHR 2130) associated with said fetch stage (1710) and operable to store a history of actual branches, and a second storing circuit (wGHR 2140) associated with said fetch stage (1710) and operable to store a pattern of predicted branches, said second storing circuit (wGHR 2140) coupled to said first storing circuit (aGHR 2130), said execute stage (1870) coupled back to said first storing circuit (aGHR 2130). Other processors, wireless communications devices, systems, circuits, devices, branch prediction processes and methods of operation, processes of manufacture, and articles of manufacture, as disclosed and claimed.
-
Citations
97 Claims
-
1. A processor for processing instructions comprising a pipeline including a fetch stage and an execute stage;
-
a first storing circuit associated with said fetch stage and operable to store a history of actual branches; and a second storing circuit associated with said fetch stage and operable to store a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch, said second storing circuit coupled to said first storing circuit, said execute stage coupled back to said first storing circuit; and a third storing circuit readable for branch Taken and Not-Taken information indexed on read by a pattern from said second storing circuit and operable to supply a predicted taken bit, and update logic circuitry operable to supply an updated pattern to said second storing circuit including at least some bits of a pattern from said second storage circuit and the predicted taken bit from said third storing circuit. - View Dependent Claims (2, 3, 4, 5, 6, 7, 11, 12, 13, 17, 18, 19, 20, 21, 22, 23)
-
-
8. A processor for processing instructions comprising
a pipeline including a fetch stage and an execute stage; -
a first storing circuit associated with said fetch stage and operable to store a history of actual branches; and a second storing circuit associated with said fetch stage and operable to store a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch, said second storing circuit coupled to said first storing circuit, said execute stage coupled back to said first storing circuit; an instruction cache having cache lines stored therein, and a cache line branch decoder circuit responsive to a cache line to update said second storing circuit; wherein said cache line branch decoder circuit is operable to detect branches on the cache line and includes update circuitry coupled to said second storing circuit and operable to modify the pattern with different numbers of bits dependent on the number of detected branches; and wherein said update circuitry is responsive to a predicted address to ignore branches on the cache line having an address prior to the predicted address, whereby the different numbers of bits are also dependent on the predicted address. - View Dependent Claims (9, 10)
-
-
14. A processor for processing instructions comprising
a pipeline including a fetch stage and an execute stage; -
a first storing circuit associated with said fetch stage and operable to store a history of actual branches; a second storing circuit associated with said fetch stage and operable to store a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch, said second storing circuit coupled to said first storing circuit, said execute stage coupled back to said first storing circuit; and a branch target buffer holding branch target addresses and the third storing circuit being further indexed on read by particular index bits dependent on a subset of bits from a branch target address accessed from said branch target buffer in response to the instruction address. - View Dependent Claims (15)
-
-
16. A processor for processing instructions comprising
a pipeline including a fetch stage and an execute stage; -
a first storing circuit associated with said fetch stage and operable to store a history of actual branches; and a second storing circuit associated with said fetch stage and operable to store a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch, said second storing circuit coupled to said first storing circuit, said execute stage coupled back to said first storing circuit; and a writable third storing circuit situated in said pipeline earlier than said fetch stage associated with said first storing circuit, said third storing circuit writable to load branch Taken and Not-Taken information indexed on write by a pattern from said first storing circuit.
-
-
24. A method of branch prediction in a processor having a pipeline with a fetch stage and an execute stage, the method comprising
storing a history of actual branches and separately storing a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch, so that the storing and separately storing are both time-wise parallel to the fetch stage, and coupling back branch data from the execute stage for the storing of the history of actual branches, and branch decoding to detect branches on a cache line, and modifying the pattern with different numbers of bits dependent on the number of detected branches, wherein the branch decoding ignores branches on the cache line having an address prior to a predicted address, whereby the different numbers of bits are also dependent on the predicted address.
-
37. A method of branch prediction in a processor having a pipeline with a fetch stage and an execute stage, the method comprising
storing a history of actual branches and separately storing a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch, so that the storing and separately storing are both time-wise parallel to the fetch stage, and updating the pattern by concatenating at least some bits of the pattern with at least one bit of predicted branch history obtained from a global history buffer indexed on the pattern; - and
detecting branches on a cache line, and including a number of insertion bits between the at least some bits of the pattern and the at least one bit of predicted branch history, the number of insertion bits being a function of the number of detected branches on the cache line. - View Dependent Claims (38)
- and
-
42. A wireless communications unit comprising
a wireless antenna; -
a wireless transmitter and receiver coupled to said wireless antenna; a microprocessor coupled to at least one of the transmitter and receiver, the microprocessor including a pipeline having a fetch stage and an execute stage, a first storing circuit associated with said fetch stage and operable to store a history of actual branches, and a second storing circuit associated with said fetch stage and operable to store a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch, said second storing circuit coupled to said first storing circuit, said execute stage coupled back to said first storing circuit and a writable third storing circuit situated in said pipeline earlier than said fetch stage associated with said first storing circuit, said third storing circuit writable to load branch Taken and Not-Taken information indexed on write by a pattern from said first storing circuit; and a user interface coupled to said microprocessor;
whereby the wireless communication unit has increased instruction efficiency. - View Dependent Claims (43, 44, 45, 46, 47, 48, 49)
-
-
50. A processor for processing instructions comprising
an instruction cache having a cache line; -
a pipeline having at least one fetch stage and at least one decode stage; an additional decode circuit having respective circuit portions situated for fetch purposes time-wise in parallel with said at least one fetch stage and said at least one decode stage, said additional decode circuit responsive to said cache line to generate at least one set of bits representing presence of plural branches in said cache line when plural branches occur and at least one different bit representing presence of a single branch in said cache line; and a pattern storing circuit responsive to the additional decode circuit to hold and update a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch; and a branch history storing circuit responsive to the pattern to supply a branch Taken/Not-Taken prediction for a branch instruction in the cache line, and wherein said pattern storing circuit is operable to update the pattern by concatenating plural bits already in the pattern with differing numbers of bits depending on the presence of branches on the cache line, together with at least one bit representing the branch Taken/Not-Taken prediction from the branch history storing circuit. - View Dependent Claims (51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 65, 67, 68)
-
-
62. A processor for processing instructions comprising
an instruction cache having a cache line; -
a pipeline having at least one fetch stage and at least one decode stage; an additional decode circuit having respective circuit portions situated for fetch purposes time-wise in parallel with said at least one fetch stage and said at least one decode stage, said additional decode circuit responsive to said cache line to generate at least one set of bits representing presence of plural branches in said cache line when plural branches occur and at least one different bit representing presence of a single branch in said cache line; and a pattern storing circuit responsive to the additional decode circuit to hold and update a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch; and wherein said additional decode circuit is operable to detect branch instructions in said cache line wherein a branch instruction wraps around from said cache line to a succeeding cache line.
-
-
63. A processor for processing instructions comprising
an instruction cache having a cache line; -
a pipeline having at least one fetch stage and at least one decode stage; an additional decode circuit having respective circuit portions situated for fetch purposes time-wise in parallel with said at least one fetch stage and said at least one decode stage, said additional decode circuit responsive to said cache line to generate at least one set of bits representing presence of plural branches in said cache line when plural branches occur and at least one different bit representing presence of a single branch in said cache line; and a pattern storing circuit responsive to the additional decode circuit to hold and update a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch; and wherein said additional decode circuit is operable to detect a branch instruction that wraps around from another cache line to said cache line.
-
-
64. A processor for processing instructions comprising
an instruction cache having a cache line; -
a pipeline having at least one fetch stage and at least one decode stage; an additional decode circuit having respective circuit portions situated for fetch purposes time-wise in parallel with said at least one fetch stage and said at least one decode stage, said additional decode circuit responsive to said cache line to generate at least one set of bits representing presence of plural branches in said cache line when plural branches occur and at least one different bit representing presence of a single branch in said cache line; and a pattern storing circuit responsive to the additional decode circuit to hold and update a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch wherein said additional decode circuit is operable to count branch instructions in said cache line wherein the branch instructions are of different lengths.
-
-
66. A processor for processing instructions comprising
an instruction cache having a cache line; -
a pipeline having at least one fetch stage and at least one decode stage; an additional decode circuit having respective circuit portions situated for fetch purposes time-wise in parallel with said at least one fetch stage and said at least one decode stage, said additional decode circuit responsive to said cache line to generate at least one set of bits representing presence of plural branches in said cache line when plural branches occur and at least one different bit representing presence of a single branch in said cache line; and a pattern storing circuit responsive to the additional decode circuit to hold and update a pattern of predicted branches, said pattern comprising a plurality of bits wherein each bit in said plurality of bits corresponds to a predicted taken or predicted not taken behavior of a respective branch; and wherein said additional decode circuit is operable to count branch instructions in said cache line wherein the branch instructions are of different lengths, not counting any branch instruction on the cache line that has an address less than a predicted address.
-
-
69. A method of branch prediction in a processor having a pipeline with a fetch stage and a decode stage and an instruction cache having a cache line, the method comprising decoding branch instructions in the cache line time-wise in parallel with the fetch stage and the decode stage;
further comprising responding to the pattern to supply a branch Taken/Not-Taken prediction for a branch instruction in the cache line, and the updating the pattern includes concatenating plural bits already in the pattern with differing numbers of bits depending the presence of branches on the cache line together with at least one bit representing the branch Taken/Not-Taken prediction from the branch history storing circuit. - View Dependent Claims (70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 85)
-
84. A method of branch prediction in a processor having a pipeline with a fetch stage and a decode stage and an instruction cache having a cache line, the method comprising decoding branch instructions in the cache line time-wise in parallel with the fetch stage and the decode stage, wherein the decoding branch instructions includes counting branch instructions in the cache line wherein the branch instructions are of different lengths, not counting any branch instruction on the cache line that has an address less than a predicted address.
-
86. A method of updating a first pattern for accessing a global history buffer in a branch predictor in a processor, the method comprising
detecting a number of branches on a cache line; -
accessing the global history buffer with the first pattern to obtain a branch prediction datum; and supplying an updated pattern including at least some bits of the first pattern, and a number of bits that depend on the number of branches detected on the cache line, and the updated pattern further including the branch prediction datum. - View Dependent Claims (87, 88, 89)
-
-
90. Circuitry for a processor having an instruction cache with a cache line, the circuit comprising
a register for a first pattern; -
a detector of a number of branches on the cache line; a global history buffer responsive to the first pattern to obtain a branch prediction datum; and an update circuit operable to supply an updated pattern to said register including at least some bits of the first pattern, and a number of bits that depend on the number of branches detected on the cache line, and the updated pattern further including the branch prediction datum. - View Dependent Claims (91, 92, 93)
-
-
94. A processor for processing instructions comprising
a pipeline including a fetch stage and an execute stage; -
a first storing circuit associated with said fetch stage and operable to store a history of actual branches; and a second storing circuit associated with said fetch stage and operable to store a pattern of predicted branches, said second storing circuit coupled to said first storing circuit, said execute stage coupled back to said first storing circuit; wherein said cache line branch decoder circuit is operable to detect branches on the cache line and includes update circuitry coupled to said second storing circuit and operable to modify the pattern with different numbers of bits dependent on the number of detected branches; and wherein said update circuitry is responsive to a predicted address to ignore branches on the cache line having an address prior to the predicted address, whereby the different numbers of bits are also dependent on the predicted address.
-
-
95. A method of branch prediction in a processor having a pipeline with a fetch stage and a decode stage and an instruction cache having a cache line, the method comprising decoding branch instructions in the cache line time-wise in parallel with the fetch stage and the decode stage, wherein the decoding branch instructions includes detecting branch instructions in the cache line wherein a branch instruction wraps around from the cache line to a succeeding cache line.
-
96. A method of branch prediction in a processor having a pipeline with a fetch stage and a decode stage and an instruction cache having a cache line, the method comprising decoding branch instructions in the cache line time-wise in parallel with the fetch stage and the decode stage, wherein the decoding branch instructions includes detecting a branch instruction that wraps around from another cache line to the cache line.
-
97. A method of branch prediction in a processor having a pipeline with a fetch stage and a decode stage and an instruction cache having a cache line, the method comprising decoding branch instructions in the cache line time-wise in parallel with the fetch stage and the decode stage, wherein the decoding branch instructions includes counting branch instructions in the cache line wherein the branch instructions are of different lengths, not counting a branch instruction that wraps around to a succeeding cache line.
Specification