if (value1 & value2 & value3) { // Do something } else { // Do something else. }
I initially thought this was a harmless typo. If you were trying determine if 3 boolean values happened to all be true this would evaluate the same as if you had used the logical && operator ((0x01 & 0x01 & 0x01) == (true && true && true)).
Fair enough, I thought, but out of interest I wondered how different this would appear at the assembly level, so I opened up Visual Studio's disassembler. The results were very interesting. I created a simple program that does comparisons on 3 different boolean values from 3 different arrays in a loop and increments one of two counters depending on whether or not all three elements are true:
bool g_Array1[NUM_ELEMENTS]; bool g_Array2[NUM_ELEMENTS]; bool g_Array3[NUM_ELEMENTS]; // Initialize to random values. int numPassed = 0; int numFailed = 0; for (int i = 0; i < NUM_ELEMENTS; i++) { if (g_Array1[i] AND g_Array2[i] AND g_Array3[i]) numPassed++; else numFailed++; }
Let's focus on the inner loop's generated code:
Bitwise AND
for (int i = 0; i < NUM_ELEMENTS; i++) 008D10AE xor ecx,ecx { if (g_Array1[i] & g_Array2[i] & g_Array3[i]) 008D10B0 mov al,byte ptr [ecx+9C75B0h] 008D10B6 and al,byte ptr [ecx+0ABB7F0h] 008D10BC test byte ptr [ecx+8D3370h],al 008D10C2 je main+57h (08D10C7h) numPassed++; 008D10C4 inc esi else 008D10C5 jmp main+58h (08D10C8h) numFailed++; 008D10C7 inc edi for (int i = 0; i < NUM_ELEMENTS; i++) 008D10C8 inc ecx 008D10C9 cmp ecx,0F4240h 008D10CF jl main+40h (08D10B0h) }
This is as one would expect. Load the first boolean value into a register, perform a bitwise AND with the second, and finally test (set the contents of the status register to the result of the AND of) the third boolean value. This is followed by a jump to increment the correct counter.
Logical AND
for (int i = 0; i < NUM_ELEMENTS; i++) 00F210AE xor eax,eax { if (g_Array1[i] && g_Array2[i] && g_Array3[i]) 00F210B0 cmp byte ptr [eax+0F23370h],0 00F210B7 je main+5Eh (0F210CEh) 00F210B9 cmp byte ptr [eax+110B7F0h],0 00F210C0 je main+5Eh (0F210CEh) 00F210C2 cmp byte ptr [eax+10175B0h],0 00F210C9 je main+5Eh (0F210CEh) numPassed++; 00F210CB inc esi else 00F210CC jmp main+5Fh (0F210CFh) numFailed++; 00F210CE inc edi for (int i = 0; i < NUM_ELEMENTS; i++) 00F210CF inc eax 00F210D0 cmp eax,0F4240h 00F210D5 jl main+40h (0F210B0h) }
In this context, we see something strange. The logical AND forces a comparison/potential branch for every argument in the comparison. Now conventional wisdom says to make sure your code is as branch free as possible (Modern high frequency, deeply pipelined CPUs like being able to prefetch and execute instructions well in advance and branching introduces an element of uncertainty into the process), but there is a reason for the code to be this way.
C like languages use an idiom called "short-circuiting" in the evaluation of their logical operator statements. What it entails is that evaluation of the arguments of a logical operation will only happen to the point where the result of the comparison can be determined. For example if you have an AND comparison to test if 100 arguments are all true, and the first argument evaluates to false, there will be no point in wasting time evaluating the other arguments. Likewise for the logical OR operator, which will jump to its "true" branch as soon as one of its arguments evaluates to true. This is a good thing if determining your arguments is a potentially expensive operation. After all, if you use your imagination a little, your code could potentially need to do this:
if (FetchFromDatabase() && LinkedListTraverse() && CreateNewFile())
However, for operations involving simple arguments, short-circuiting isn't optimal.
The only question left is just how much performance could you possibly gain by this kind of micro optimization? I set the size of the arrays in my test program to a million elements, populated with random boolean values. The loop we discussed is performed on them, and timings taken.
The program is so simple, I won't bother including code, just the final results.
For the logical AND evaluation, the time taken in the loop is ~7.10 milliseconds.
For the bitwise AND evaluation, the time taken in the loop drops to ~2.4 milliseconds.
That's almost 3x as fast in this case (Proportional to the number of branches in the loop?).
This code was executed on a desktop grade CPU, so it's not unreasonable to imagine that results on simpler architectures would be higher.