
Re: gettimeofday() and clock



Steve Baker wrote:
> Mads Bondo Dydensborg wrote:
> 
>> On Mon, 2 Sep 2002, Steve Baker wrote:
> 
> 
>>> but in practice it can't wake
>>> your program up any faster than the kernel timeslice - which is 
>>> 1/50th second.
>>
>> is wrong. Not the first part, only the latter, and this may only be
>> because I misread it (I answer these mails too late in the evening,
>> English is not my native language, I tend to misunderstand the core of a
>> thread sometimes). I believe the kernel _can_ wake you up within 1/HZ
>> of a second, which is why the original poster saw sleeps of 10 seconds
>> + 10 ms.
>>
>> You are investigating something slightly different though: what is the
>> minimal delay to be _put to sleep_ and _woken up_?  This may very well
>> be 20 ms on Intel - at least if you do not want to eat up the cycles.
> 
> 
> I'm getting very confused.  I presumed that calling 'usleep' (or 'sleep)
> causes the following train of events:
> 
>   1) The current process gives up the remainder of its timeslice,
>      which the kernel immediately gives to the next most deserving
>      process...or halts the CPU to save power if there is nothing
>      to run.
> 
>   2) If that process doesn't give up the CPU (or if we halted), then at
>      the next (100Hz maybe) timer interrupt, the kernel forcibly takes
>      control and examines the list of processes that want to run.  Since
>      my process's 'usleep' timer expired *many* milliseconds ago, it's
>      again eligible to be run...so...
> 
>   3) My process should wake up and continue to run.
> 
> Since my little test program consumes almost zero CPU time, and
> immediately goes back for a 1ms sleep, that *ought* to mean that
> on an idle system, it gets awoken every 10ms.
> 
> However, (at least on my system), it only wakes up every 20ms...well,
> 19.9ms or so.

Please type

man nanosleep

and read carefully. At least my version of the manpage describes the 
behaviour you experience (it's the first paragraph in the BUGS section). 
Presumably sleep() and usleep() are no more than wrappers around
nanosleep(), as there is no other similar syscall available.
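
Something like the following (my own minimal reconstruction, not
Steve's actual program) makes the rounding visible - on a stock 100Hz
kernel the measured sleeps typically come out near 20ms, not the 1ms
that was asked for:

#include <stdio.h>
#include <sys/time.h>
#include <time.h>

/* Repeatedly ask for a 1 ms sleep and print how long each one
 * really took. */
int main(void)
{
    struct timespec req = { 0, 1000000 };   /* 1 ms */
    struct timeval before, after;
    int i;

    for (i = 0; i < 10; i++) {
        gettimeofday(&before, NULL);
        nanosleep(&req, NULL);
        gettimeofday(&after, NULL);
        printf("slept %ld us\n",
               (after.tv_sec - before.tv_sec) * 1000000L
               + (after.tv_usec - before.tv_usec));
    }
    return 0;
}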

Btw: sys_nanosleep() is implemented in linux/kernel/timer.c and is
itself a pretty neat and straightforward piece of code. It ends up
calling schedule() fairly soon (from inside schedule_timeout() in
kernel/sched.c), first to switch to another task, and again after the
timer has expired. The priority of the calling process is not changed
explicitly by that function. However, the second call to schedule()
seems to give the next time slice away to another process, and the next
schedule() activates the task that called nanosleep(). Thus it adds up
to a little bit less than 20ms, which is the time your program measures.
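
For reference, schedule_timeout() boils down to roughly this (a
simplified sketch from memory of the 2.4 sources - the interruptible
and error paths are left out, see kernel/sched.c for the real thing):

/* Arm a one-shot kernel timer that will wake this task up again,
 * then give the CPU away. */
signed long schedule_timeout(signed long timeout)
{
    struct timer_list timer;
    unsigned long expire = timeout + jiffies;

    init_timer(&timer);
    timer.expires  = expire;
    timer.data     = (unsigned long) current;
    timer.function = process_timeout;   /* wakes 'current' up */

    add_timer(&timer);
    schedule();             /* switch away; the timer interrupt
                             * makes us runnable again */
    del_timer_sync(&timer);

    timeout = expire - jiffies;
    return timeout < 0 ? 0 : timeout;
}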

> 
> That would be consistent with a 50Hz kernel rate - but if the kernel
> really wakes up at 100Hz, then we have to ask *WHY* it didn't restart
> my little process as close as possible to its requested sleep period,
> but instead waited until the FOLLOWING timeslice in order to do that.
> 

See above.

> With more care (such as in the embedded systems I work on), you can
> turn off *all* annoying background processes.  We had a system running
> for an entire weekend without missing a single 20ms tick.  However, we
> *never* see 10ms sleeps unless the process had been running for >10ms
> before it slept.
> 
>>> Maybe it's time for us games/graphics types to hang out on the kernel
>>> mailing list and lobby for a higher rate.
>>>
>>> Whatever the rate is, it's been the same since 33MHz 386s - and now
>>> we are close to having 3.3GHz CPUs (100 times faster); asking for a
>>> mere 10x speedup in the kernel's update rate seems a rather modest
>>> request.
>>
>> Not only that, with increased caches and memories, the cost of 
>> switching contexts should have gone down (although I am unsure about 
>> the virtual page tables).
> 
> 
> Yes.
> 

Am I the only one to doubt this? The ratio between memory clock and CPU
clock is rather extreme: it's more than 1:10 and rising. This means
that new data can arrive from memory about every fifth CPU cycle in a
pretty optimal case. Of course this is only an issue when we miss all
the caches.

For each process an x86 CPU needs a lot of data. For a process using
less than 4MB of memory this is at least 8kb of page tables which must
be read by the processor. Even for processes with a small memory
footprint it is probably more, as executable code and data need at
least 2 separate page tables, and shared libraries and environment
variables have their own address regions as well. So it is safe to
assume that with each context switch the processor has to re-read at
least 12 to 16kb of page-table data to conduct the mapping between
linear and physical address space. (Strictly speaking, loading a new
page directory only flushes the TLB; the entries are then re-fetched on
demand as the TLB refills, but the data still has to come in from the
memory hierarchy.)

This happens with every context switch. Given that there are, say, 5
processes which tend to get scheduled, the amount of context data for
these processes is about 60 to 80kb of memory, which tries to find its
way into the data caches in addition to the code and data of the
processes themselves. I estimate that the code and data the processor
needs to fetch during a process's time slice are somewhere between 1kb
and 100kb, depending on what the process does.
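
The arithmetic behind those figures, assuming classic two-level x86
paging with 4kb pages and no PAE:

#include <stdio.h>

/* Back-of-the-envelope page-table sizes; illustrative numbers only. */
int main(void)
{
    long pgd = 1024 * 4;   /* page directory: 1024 entries x 4 bytes = 4kb */
    long pt  = 1024 * 4;   /* one page table: 4kb, and it maps 4MB */

    printf("minimal process (<4MB, 1 table): %ld kb\n",
           (pgd + 1 * pt) / 1024);                     /*  8 kb */
    printf("code+data+libs (3 tables):       %ld kb\n",
           (pgd + 3 * pt) / 1024);                     /* 16 kb */
    printf("5 such processes:                %ld kb\n",
           5 * (pgd + 3 * pt) / 1024);                 /* 80 kb */
    return 0;
}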

So the point is that task switching is a bigger overhead than it might
seem. And although there are rather big caches, a higher scheduling
frequency causes more cache misses and therefore slows the system down,
because the processor has to wait for data to arrive from memory. With
increasing frequency, the real overhead thus grows faster than the
naively expected frequency * scheduler overhead.
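
One way to actually put a number on the cost is the classic pipe
ping-pong trick (a sketch, not a rigorous benchmark - each round trip
pays for at least two context switches plus the pipe overhead):

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

#define ROUNDS 10000

/* Two processes bounce one byte back and forth through a pair of
 * pipes; every round trip forces at least two context switches. */
int main(void)
{
    int ab[2], ba[2];
    char c = 'x';
    struct timeval t0, t1;
    int i;

    if (pipe(ab) < 0 || pipe(ba) < 0) {
        perror("pipe");
        return 1;
    }

    if (fork() == 0) {                  /* child: echo each byte back */
        for (i = 0; i < ROUNDS; i++) {
            read(ab[0], &c, 1);
            write(ba[1], &c, 1);
        }
        _exit(0);
    }

    gettimeofday(&t0, NULL);
    for (i = 0; i < ROUNDS; i++) {      /* parent: send, then wait */
        write(ab[1], &c, 1);
        read(ba[0], &c, 1);
    }
    gettimeofday(&t1, NULL);

    printf("%.2f us per round trip\n",
           ((t1.tv_sec - t0.tv_sec) * 1e6
            + (t1.tv_usec - t0.tv_usec)) / ROUNDS);
    return 0;
}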

Or am I missing something important here?

Gregor