Understand CPU Branch Instructions Better

Branch instructions are the primary means by which a program running on a CPU makes a decision.

This post is part of a series of posts on CPU performance, as part of the Pointer Wars: Linked List Edition challenge. This challenge is great for undergraduates, graduate students, and new engineers who want feedback about writing high performance C or C++ code. Much more info here.

The Sequential Execution Model and Branch Instructions

Programs written to execute on a CPU follow something called the sequential execution model. Sequential execution means that each CPU instruction is executed one at a time, and unless specified otherwise, the next instruction to execute is at a memory location directly after the current instruction. Do note that the sequential execution model refers to the semantics that the CPU must follow, not their actual implementations, as implementations violate this rule in ways the programmer isn’t aware of all the time. Instructions that might change the control flow of a program (control flow being the path the program takes through the code) are called branches or jumps, depending on the instruction set of the CPU. There are also dedicated call and return instructions to support function calls and those that are used to implement privileged or debug execution, such as system calls. For the purposes of this blog post, I’m generally going to refer to all of those instructions as “branches”.

Branches ultimately have a target, which is an instruction to which they may branch to if some condition is met. When they redirect the CPU to that target instruction, the branch is said to be “taken”. That’s true even in the case when the target instruction is the next sequential address (note that this is an odd corner case that likely never exists in “real code”). When a branch instruction is not taken, it’s called “not taken”. Relatively straightforward terminology.

Conditional versus Unconditional Branches

One classification of branches is whether they are conditional or unconditional. An unconditional branch is one that is always taken, by definition. Unconditional branches tend to show up when the compiler absolutely can’t avoid them, often because the target of the branch cannot be placed sequentially in memory following the instruction that proceeds it, for whatever reason (a real reason, or a compiler heuristic telling it not to). That tends to occur with function calls and returns, goto statements, switch statements, tail call recursive function implementations, and unoptimized C loop implementations (for instance C semantics are such that the condition of a while loop is evaluated first, and then the loop body is executed, repeat). There are likely others, compilers emit some beautifully optimized code, but those in my mind are the common cases.

Conditional branches on the other hand are only taken if a particular condition is met, otherwise they “fall through” to the next sequential instruction. The condition to be met is specified by the assembly language instruction. There tend to be two classes of specifying conditions, where an instruction set tends to gravitate towards one or the other:

Whether certain CPU flags are set. Certain CPU implementations have a flags register, a special purpose register which is populated by the ALU or other instructions indicating certain conditions have occurred. The four common conditions are Zero (Z), Carry (C), Negative (N), and signed carry (V), and a combination of these flags can be used to determine equality, overflow, and inequality (less than/greater than). These instruction sets often rely on an explicit compare instruction to update the flags (although the result of many other ALU instructions can update the flags as well) and may have specific instructions whose sole purpose is to set or clear individual flags.

Certain CPU implementations have a flags register, a special purpose register which is populated by the ALU or other instructions indicating certain conditions have occurred. The four common conditions are Zero (Z), Carry (C), Negative (N), and signed carry (V), and a combination of these flags can be used to determine equality, overflow, and inequality (less than/greater than). These instruction sets often rely on an explicit compare instruction to update the flags (although the result of many other ALU instructions can update the flags as well) and may have specific instructions whose sole purpose is to set or clear individual flags. Comparison between two registers or other data. Other CPU implementations compare two register values, or a register and an immediate value, or a register and a value in a memory location, or even a memory value and an immediate value. An immediate value is a constant that is directly encoded in the instruction, or inferred by the instruction of the branch (eg, compare register against zero). Based on the comparison that the branch instruction performs, and the criteria specified by the instruction (eg, equal, greater than), the branch is either taken or not taken. Some instruction sets have helper instructions, such as the MIPS SLT (Set Less Than), which are used in conjunction with equality testing to implement certain branch conditions across multiple instructions.

... continue reading