Next: Optimization and debugging, Previous: Optimization levels, Up: Compiling with optimization [Contents][Index]
The following program will be used to demonstrate the effects of different optimization levels:
    #include <stdio.h>

    double
    powern (double d, unsigned n)
    {
      double x = 1.0;
      unsigned j;

      for (j = 1; j <= n; j++)
        x *= d;

      return x;
    }

    int
    main (void)
    {
      double sum = 0.0;
      unsigned i;

      for (i = 1; i <= 100000000; i++)
        {
          sum += powern (i, i % 5);
        }

      printf ("sum = %g\n", sum);
      return 0;
    }
The main program contains a loop calling the powern function. This function computes the n-th power of a floating point number by repeated multiplication; it has been chosen because it is suitable for both inlining and loop-unrolling. The run-time of the program can be measured using the time command in the GNU Bash shell.
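To make the term "loop-unrolling" concrete, the following is a minimal hand-written sketch of what unrolling the loop in powern by a factor of two could look like. The name powern_unrolled2 and the unroll factor are assumptions chosen for illustration; the code GCC actually generates is not shown in this section and may differ.

```c
#include <stdio.h>

/* The loop from the example program: one multiplication
   and one loop test per iteration. */
double
powern (double d, unsigned n)
{
  double x = 1.0;
  unsigned j;

  for (j = 1; j <= n; j++)
    x *= d;

  return x;
}

/* Hypothetical hand-unrolled variant: two multiplications per
   iteration, so the loop test runs about half as often.  A final
   step handles the leftover multiplication when n is odd. */
double
powern_unrolled2 (double d, unsigned n)
{
  double x = 1.0;
  unsigned j;

  for (j = 0; j + 1 < n; j += 2)
    {
      x *= d;
      x *= d;
    }

  if (j < n)
    x *= d;

  return x;
}
```

Both functions perform the same multiplications in the same order, so they return identical results; only the number of loop tests and branches changes.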
Here are some results for the program above, compiled on a 566MHz Intel Celeron with 16KB L1-cache and 128KB L2-cache, using GCC 3.3.1 on a GNU/Linux system:
    $ gcc -Wall -O0 test.c -lm
    $ time ./a.out
    real    0m13.388s
    user    0m13.370s
    sys     0m0.010s

    $ gcc -Wall -O1 test.c -lm
    $ time ./a.out
    real    0m10.030s
    user    0m10.030s
    sys     0m0.000s

    $ gcc -Wall -O2 test.c -lm
    $ time ./a.out
    real    0m8.388s
    user    0m8.380s
    sys     0m0.000s

    $ gcc -Wall -O3 test.c -lm
    $ time ./a.out
    real    0m6.742s
    user    0m6.730s
    sys     0m0.000s

    $ gcc -Wall -O3 -funroll-loops test.c -lm
    $ time ./a.out
    real    0m5.412s
    user    0m5.390s
    sys     0m0.000s
The relevant entry in the output for comparing the speed of the resulting executables is the ‘user’ time, which gives the actual CPU time spent running the process. The other rows, ‘real’ and ‘sys’, record the total real time for the process to run (including times where other processes were using the CPU) and the CPU time spent in the operating system kernel executing system calls on behalf of the process. Although only one run is shown for each case above, the benchmarks were executed several times to confirm the results.
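The ‘user’ and ‘sys’ figures can also be read from inside a program. The sketch below assumes a POSIX system that provides getrusage; the helper names timeval_seconds and cpu_times are hypothetical and are not part of the example program above.

```c
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Convert a struct timeval (seconds + microseconds) to seconds. */
static double
timeval_seconds (struct timeval tv)
{
  return (double) tv.tv_sec + (double) tv.tv_usec / 1e6;
}

/* Store the CPU time used so far by the calling process:
   *user is time spent running the program's own code,
   *sys is time spent in the kernel on its behalf.
   Returns 0 on success, -1 if getrusage fails. */
int
cpu_times (double *user, double *sys)
{
  struct rusage ru;

  if (getrusage (RUSAGE_SELF, &ru) != 0)
    return -1;

  *user = timeval_seconds (ru.ru_utime);
  *sys = timeval_seconds (ru.ru_stime);
  return 0;
}
```

Calling cpu_times before and after a section of code and subtracting the two ‘user’ values gives the CPU time of that section alone, which is less affected by other processes than the elapsed ‘real’ time.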
The results show that, in this case, increasing the optimization level with -O1, -O2 and -O3 produces an increasing speedup relative to the unoptimized code compiled with -O0. The additional option -funroll-loops produces a further speedup. Overall, the program runs more than twice as fast when going from unoptimized code to the highest level of optimization (13.370 seconds of ‘user’ time compared with 5.390 seconds).
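The speedups can be worked out directly from the ‘user’ times listed above. The following sketch divides the -O0 time by the time for each build; the function names speedup and print_speedups are hypothetical helpers introduced only for this calculation.

```c
#include <stdio.h>

/* Speedup of an optimized build relative to a baseline build:
   the ratio of their CPU times (values above 1 mean faster). */
double
speedup (double baseline_seconds, double optimized_seconds)
{
  return baseline_seconds / optimized_seconds;
}

/* Print the speedup of each build over -O0, using the 'user'
   times from the benchmark runs shown above. */
void
print_speedups (void)
{
  static const char *const options[] =
    { "-O0", "-O1", "-O2", "-O3", "-O3 -funroll-loops" };
  static const double user_seconds[] =
    { 13.370, 10.030, 8.380, 6.730, 5.390 };
  size_t i;

  for (i = 0; i < sizeof user_seconds / sizeof user_seconds[0]; i++)
    printf ("%-20s %6.3fs  %.2fx\n", options[i], user_seconds[i],
            speedup (user_seconds[0], user_seconds[i]));
}
```

Calling print_speedups from a main function prints one row per build. The last ratio, 13.370 / 5.390, is roughly 2.48, which is the "more than doubled" figure quoted in the text.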
Note that for a small program such as this there can be considerable variation between systems and compiler versions. For example, on a Mobile 2.0GHz Intel Pentium 4M system the trend of the results using the same version of GCC is similar except that the performance with -O2 is slightly worse than with -O1. This illustrates an important point: optimizations may not necessarily make a program faster in every case.