Quantifying the benefits of kernel recompiling
I saw on Slashdot that InfoWorld has an article about kernel
compiling, so I exploded.
Eliminating the need for kernel recompiling in Linux has always been a
personal obsession of mine. In 1995 my brother-in-law needed a
computer to write a thesis. What dispelled my dream of him doing
it on Linux was kernel compiling (this was in the 1.2 days): a literature
professor wouldn't survive that ordeal. Besides, he had real
work to do. One thing became clear in my mind that day: Linux will
remain a minor system as long as literature professors can't use it,
and that will not happen as long as kernel compiling remains
So I have studied the real benefits of kernel compiling.
I) Memory savings: In Matt Welsh's time kernel compiling provided a
real benefit. The non-modular 1.2 kernel could be shrunk by as much as
1.5 Meg by recompiling it. The typical box had 8 Megs. So kernel
compiling left 7 Megs for apps instead of 5.5: a 27%
increase. That could translate into a system noticeably less slow (the word
faster would be inappropriate) in thrashing situations.
Today the typical $700 computer comes with 32 Megs. In addition,
state-of-the-art distributions use modules, so if the distribution maker
did a half-decent job you will only gain about 500K. So instead of having
30 Megs for the apps you will get 30.5 Megs: the improvement is less than
2%. A trifle. In fact, if you sometimes _need_ to recompile, it is because
some distributions pay little attention to the kernel they ship, assuming
the user will recompile
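The two percentages above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the text; the function name gain_percent is mine, and the inputs are the figures quoted above (memory left for apps before recompiling, and memory saved by recompiling).

```python
def gain_percent(apps_before_mb, saved_mb):
    """Percent increase in memory available to applications."""
    return 100.0 * saved_mb / apps_before_mb

# 1.2 era: 5.5 Megs for apps on an 8 Meg box, 1.5 Meg saved by recompiling
print("1.2 era:       %.1f%%" % gain_percent(5.5, 1.5))

# Modular kernel on a 32 Meg box: 30 Megs for apps, about 0.5 Meg saved
print("modular, 32MB: %.1f%%" % gain_percent(30.0, 0.5))
```

The first case comes out at about 27% and the second stays under 2%, which matches the figures given above.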
II) Adjusting to processor used:
I used the 2.0.34 source. First I hunted for #ifdefs that were
processor dependent. There were _only_ two differences between 386s
and other processors. The first one is about byte swapping, which is
needed for networking. Doing it the Pentium way is faster if
you have one, but it has a negligible impact on overall
performance. The second one is about selective invalidation of TLB
entries: the 386 only supports total invalidation of the TLB. That
means that when invalidating it (due to context switching to a
_different_ user process than the previous one) you also invalidate the
kernel entries, and you will have to spend CPU cycles translating
virtual kernel addresses into physical ones by looking at the page tables.
The faster the processor the more CPU cycles you lose, the slower the
memory (DRAM instead of SDRAM, but don't forget you can be lucky and
find your page entries in L2 cache) the more you lose, the more kernel
entries in the TLB the more you lose, the bigger the TLB the more you
lose. But don't forget a tick is 1/100 second, and at worst, with a
64-entry TLB and 70 ns RAM, you lose 14,000 ns, i.e. 14 microseconds,
assuming the entire tick is used by the process.
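To put the 14 microseconds in proportion, here is a small sketch that expresses the loss as a share of one tick. The tick length and the 14,000 ns figure come from the text. The accesses_per_entry parameter is hypothetical: the text does not say how many memory accesses a TLB refill costs, so the second function only shows how the total scales under that assumption.

```python
TICK_NS = 1_000_000_000 // 100  # one tick = 1/100 second, in nanoseconds

def share_of_tick_percent(lost_ns, tick_ns=TICK_NS):
    """Percentage of one tick spent refilling the TLB."""
    return 100.0 * lost_ns / tick_ns

def refill_cost_ns(entries, ram_ns, accesses_per_entry):
    """Worst-case refill cost; accesses_per_entry is an assumed parameter."""
    return entries * ram_ns * accesses_per_entry

# Figure quoted in the text: 14,000 ns lost per tick
print("share of tick: %.2f%%" % share_of_tick_percent(14_000))

# How the cost scales for a 64-entry TLB and 70 ns RAM (assumed access counts)
for accesses in (1, 2, 3):
    print(accesses, "access(es) per entry:", refill_cost_ns(64, 70, accesses), "ns")
```

Even taking the worst-case figure at face value, the loss is on the order of a tenth of a percent of the tick.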
There was a third difference related to how blocks of memory were copied:
using a special instruction on 386s and PPros and load-stores on 486s and
Pentiums, but that has been deactivated, so now all processors do it the 386
About compiler flags: I compiled the Byte 95 benchmarks using the same
flags as for a 386 kernel build and then those for a Pentium kernel
build. Then I ran the tests on a P75. The improvement was slightly
under two percent. So much for the "adjusting to processor"
fallacy. And that is all the benefit you will gain if you follow
the advice from articles and books.
Now, I gained much higher benefits in the Byte tests by trying more
aggressive optimizations than the standard
"-O2 -fno-strength-reduce -fomit-frame-pointer"
such as "-O6 -funroll-loops", but before you risk a crash and perhaps damage
to your partition (don't forget there are many asynchronous events in kernel
mode and much hand-coded assembler), let's remember:
1) Programs like the Van Gogh effect in GIMP will spend 99.9% of their CPU
time in user mode.
2) Some programs will spend most of their CPU time in kernel mode, but in
fact they are accessing peripherals, and when you look at the details you
will see most of the time is spent in busy loops: the more optimized your
kernel, the more loops it does. :-) A poorly tuned IDE drive can suck 95% of your
3) You get "pure kernel" time when accessing a block found in cache or
when a program transmits data to another. However, libc tries to
send data to the kernel in 1K blocks, not byte by byte, and that reduces
the time spent in kernel mode. As for X servers, there is the SHM extension,
which greatly reduces the volume of data passing through the kernel.
4) The frequently used parts of the Linux kernel have plenty of inline
assembler, and those parts will be unaffected by compiler flags.
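The four points above boil down to a simple proportion: speeding up the kernel only helps in proportion to the time actually spent in it. Below is a minimal sketch of that reasoning using the usual Amdahl's law formula. The 0.1% kernel fraction follows from the 99.9% user-mode figure in point 1; the speedup values are hypothetical, chosen only for illustration.

```python
def overall_speedup(kernel_fraction, kernel_speedup):
    """Amdahl's law: whole-program speedup when only kernel time gets faster."""
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)

# A program spending 99.9% of its CPU time in user mode (point 1)
kernel_fraction = 0.001

# Hypothetical kernel speedups: 2% faster, twice as fast, absurdly fast
for s in (1.02, 2.0, 1e9):
    gain = (overall_speedup(kernel_fraction, s) - 1.0) * 100.0
    print("kernel x%-12g -> overall gain %.4f%%" % (s, gain))
```

However fast the kernel becomes, the overall gain for such a program cannot exceed about 0.1%.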
The end result is: even if you use custom flags and pgcc, recompiling
the kernel is more of a placebo than anything else with respect
to CPU time spent in kernel mode. And even less so with respect to overall
Jean Francois Martinez
Project Independence: Linux for the Masses