Sometimes you come across a weird bug where the output seems completely impossible. And it’s extremely hard to debug, or to search for on Google, if you don’t know about this:
If your code has Undefined Behaviour, the compiler is allowed to assume it won’t happen, and can ‘optimize out’ chunks of your code.
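To see the principle in isolation, here’s a classic standalone sketch (not from the bug below; whether the code is actually removed depends on your compiler and optimization level). Signed integer overflow is undefined behaviour, so an optimizing compiler may fold this overflow test to a constant:

#include <limits.h>

/* Sketch: x + 1 overflows when x == INT_MAX, which is undefined
 * behaviour for a signed int. An optimizing compiler may assume the
 * overflow never happens and compile this to "return 0;",
 * silently removing the check. */
int will_overflow(int x)
{
    return x + 1 < x;
}

/* A well-defined version compares against the limit instead. */
int will_overflow_safely(int x)
{
    return x == INT_MAX;
}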
Here’s a code snippet from a real bug I found yesterday:
#define FOO_SIZE 10
int foo2[FOO_SIZE];
...
int n=0;
do {
    p->foo[n] = foo2[n];
    n++;
} while( (foo2[n] != 0) && (n < FOO_SIZE) );
printk("n is %d\n", n);
This printed out:
n is 165
and the system crashed.
But how can n become larger than FOO_SIZE = 10? Because the compiler ‘optimized’ away the check. Here’s the asm:
} while( (foo2[n] != 0) && (n < FOO_SIZE) );
   21a14:  f813 2f01   ldrb.w  r2, [r3, #1]!
   21a18:  2a00        cmp     r2, #0
   21a1a:  d1f8        bne.n   21a0e <NV_Get+0x82>
   21a1c:  e7ca        b.n     219b4 <NV_Get+0x28>
What we are seeing here is that the “n < FOO_SIZE” check has been completely removed by the compiler. Why?
Because in the check, the && evaluates its left operand first, so foo2[n] is read before n < FOO_SIZE is tested. If n == FOO_SIZE, that read is foo2[FOO_SIZE], which is out of bounds for foo2. The compiler knows this would be undefined behaviour, and it is allowed to assume that undefined behaviour doesn’t happen, so it assumes n can NEVER be >= FOO_SIZE by the time the second test runs. That makes the n < FOO_SIZE check always true, so the compiler removes it.
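Here’s a minimal standalone sketch of the same pattern (p->foo is replaced by a local dest array purely so it compiles on its own; whether the check actually gets removed depends on the compiler and flags):

#define FOO_SIZE 10

int foo2[FOO_SIZE];
int dest[FOO_SIZE];

void copy_until_zero(void)
{
    int n = 0;
    do {
        dest[n] = foo2[n];
        n++;
        /* foo2[n] is read BEFORE n < FOO_SIZE is tested, so if n could
         * reach FOO_SIZE here, the read would be out of bounds (UB).
         * The compiler assumes that never happens, and may delete the
         * n < FOO_SIZE test entirely. */
    } while ((foo2[n] != 0) && (n < FOO_SIZE));
}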
This can be fixed by swapping the operands of the &&, so the bounds check short-circuits before foo2[n] can be read out of range:
} while( (n < FOO_SIZE) && (foo2[n] != 0) );
Or, by checking n-1 instead (which is slightly different behaviour, but good enough for me; I was changing a lot of code with this bug):
} while( (foo2[n-1] != 0) && (n < FOO_SIZE) );
   21a26:  7809        ldrb    r1, [r1, #0]
   21a28:  2900        cmp     r1, #0
   21a2a:  d0c3        beq.n   219b4 <NV_Get+0x28>
   21a2c:  429a        cmp     r2, r3
   21a2e:  d1f5        bne.n   21a1c <NV_Get+0x90>
   21a30:  e7c0        b.n     219b4 <NV_Get+0x28>
Now we see the second comparison (cmp r2, r3) in the asm.
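One last note: bugs like this can often be caught automatically. Where your toolchain supports it, building with -fsanitize=undefined (UBSan, available in both GCC and Clang) inserts a runtime check that reports the out-of-bounds foo2[FOO_SIZE] read, and GCC’s -Warray-bounds warning can sometimes flag the access at compile time, though neither is guaranteed to catch every instance.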