Re: clock jump in, too

     On Mon, 31 Mar 2008 22:49:05 -0700 Lucky Green <shamrock@xxxxxxxxxxxxxx> wrote:
>Scott Bennett wrote:
>>      It appears that the "clock jump" problem does persist into
>> Here are the notice-level log messages since I started up the new version
>> this afternoon:
>> [remainder elided]
>If you are using a dual-core or multiprocessor system, it is likely
>that you too are experiencing issues with the TSC being out of sync
>between the cores. The TSC counters on physical CPUs can get out of sync
>just as CPUs on virtual CPUs can. Physical CPUs just experience this
>issue so rarely that most users will never encounter it. The
>likelihood that the TSCs between two cores on the same system get out of
>sync appears to be system/motherboard/CPU dependent.

     It's actually a hyperthreading 3.4 GHz P4.  FreeBSD supports such chips
as SMP systems, although it's really a single core with two logical CPUs vying
for pipeline slots.  Running systat's vmstat display with two-second updates,
the timer interrupt counts for cpu0 and cpu1 hover around 2000 and are usually
equal.  Only rarely do they differ by more than a single interrupt in any
two-second interval, so it's difficult to imagine the kernel tolerating an
accumulated difference of over 100 K interrupts between the two counts.
It's also hard to imagine how it could happen when there's really almost no
load on the system.
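     As a rough sanity check on those numbers, here is a small Python sketch
of the arithmetic.  The 2000-per-two-seconds figure is what systat shows here;
the function names, and the assumption that both logical CPUs take interrupts
at a steady rate and drift apart by at most one interrupt per interval, are
mine and purely illustrative.

```python
def seconds_of_drift(interrupt_gap, interrupts_per_interval, interval_seconds):
    """Convert a gap in interrupt counts into seconds of clock time."""
    rate_hz = interrupts_per_interval / interval_seconds  # 2000 / 2 = 1000 per second
    return interrupt_gap / rate_hz


def seconds_to_accumulate(interrupt_gap, max_gap_per_interval, interval_seconds):
    """Lower bound on how long a gap takes to build up, assuming the counts
    diverge by at most max_gap_per_interval in each sampling interval."""
    intervals_needed = interrupt_gap / max_gap_per_interval
    return intervals_needed * interval_seconds


drift = seconds_of_drift(100_000, 2000, 2)
buildup = seconds_to_accumulate(100_000, 1, 2)
print(f"{drift:.0f} s of clock time")        # 100 s of clock time
print(f"{buildup / 3600:.1f} h to accumulate")  # 55.6 h to accumulate
```

Under those assumptions a 100 K gap corresponds to about 100 seconds of clock
time, and would need more than two days of worst-case divergence to build up.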
>See http://kerneltrap.org/node/14003 on the rationale for adding the
>"notsc" option to the Linux kernel. Also see

     I notice it refers to a problem in the Linux x86_64 support for dual-core
systems.  Mine is only hyperthreading-capable, not dual-core, and is only
a 32-bit chip, so I run FreeBSD's i386 versions, not the amd64 versions.

>http://lkml.org/lkml/2005/11/4/173 for a detailed explanation by an AMD
>engineer of why and under which conditions the TSCs might go out of sync.

     Fascinating, indeed, but probably irrelevant to a single-cored, HTT-
enabled P4 chip.  sysctl reveals quite a few variables that are available on
such a system for the chip as a whole, but not for individual (logical) CPUs.

>At least in the past FreeBSD was also impacted by TSCs getting out of
>sync, causing the time to jump as the process migrates between CPUs.

     Interesting thread.  Thanks.
>There is a very simple way for you to test if the time jumps you have
>been seeing are caused by the TSCs between two CPU cores being out of
>sync: simply temporarily disable SMP in your kernel. If the problem goes
>away, the TSCs likely are out of sync. If so, you may want to find out
>if there is a way to disable FreeBSD's use of the TSC similar to the
>"notsc" option in Linux and see if that addresses the problem.
     Here's a bit more information, the clue for which came from the second
URL's article above.

[hellas] 333 % sysctl kern.timecounter
kern.timecounter.stepwarnings: 0
kern.timecounter.nbinuptime: 751724079
kern.timecounter.nnanouptime: 5273
kern.timecounter.nmicrouptime: 353617
kern.timecounter.nbintime: 295482161
kern.timecounter.nnanotime: 157914288
kern.timecounter.nmicrotime: 137569526
kern.timecounter.ngetbinuptime: 2285019
kern.timecounter.ngetnanouptime: 18897431
kern.timecounter.ngetmicrouptime: 133591454
kern.timecounter.ngetbintime: 0
kern.timecounter.ngetnanotime: 70820
kern.timecounter.ngetmicrotime: 602672586
kern.timecounter.nsetclock: 4
kern.timecounter.hardware: ACPI-fast
kern.timecounter.choice: TSC(-100) ACPI-fast(1000) i8254(0) dummy(-1000000)
kern.timecounter.tick: 1
kern.timecounter.smp_tsc: 0
[hellas] 334 % sysctl hw.acpi
hw.acpi.supported_sleep_state: S1 S3 S4 S5
hw.acpi.power_button_state: S5
hw.acpi.sleep_button_state: S1
hw.acpi.lid_switch_state: NONE
hw.acpi.standby_state: S1
hw.acpi.suspend_state: S3
hw.acpi.sleep_delay: 1
hw.acpi.s4bios: 1
hw.acpi.verbose: 0
hw.acpi.disable_on_reboot: 0
hw.acpi.handle_reboot: 0
hw.acpi.reset_video: 0
hw.acpi.cpu.cx_lowest: C1
hw.acpi.acline: 1
hw.acpi.battery.life: 100
hw.acpi.battery.time: -1
hw.acpi.battery.state: 0
hw.acpi.battery.units: 1
hw.acpi.battery.info_expire: 5
hw.acpi.thermal.min_runtime: 0
hw.acpi.thermal.polling_rate: 10
hw.acpi.thermal.user_override: 0
hw.acpi.thermal.tz0.temperature: 55.5C
hw.acpi.thermal.tz0.active: -1
hw.acpi.thermal.tz0.passive_cooling: 0
hw.acpi.thermal.tz0.thermal_flags: 0
hw.acpi.thermal.tz0._PSV: -1
hw.acpi.thermal.tz0._HOT: -1
hw.acpi.thermal.tz0._CRT: 89.0C
hw.acpi.thermal.tz0._ACx: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
hw.acpi.thermal.tz0._TC1: -1
hw.acpi.thermal.tz0._TC2: -1
hw.acpi.thermal.tz0._TSP: -1
[hellas] 335 %

Now, I make no claim to knowing the significance of all of the above.  My
guess is that the value of kern.timecounter.hardware means that the kernel is
not using the TSC anyway, so the problem you describe above would not be what
is happening here.  Another guess is that kern.timecounter.hardware may be set
that way because kern.timecounter.smp_tsc is set to 0.  (The hw.acpi stuff
shows some examples of variables not duplicated for the two logical CPUs, e.g.,
cpu.cx_lowest and thermal.tz0.*.  There are some others hidden elsewhere, too.)
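     For anyone who wants to make the same check mechanically, here is a
minimal Python sketch that reads sysctl-style "name: value" text and reports
which timecounter is active.  The sample text is copied from the output above;
the helper names parse_sysctl and timecounter_choices are hypothetical, and
the reading of a negative quality as "never chosen automatically" is my
understanding of the kernel's convention rather than something shown in this
output.

```python
import re

# Three lines copied from the `sysctl kern.timecounter` output above.
SAMPLE = """\
kern.timecounter.hardware: ACPI-fast
kern.timecounter.choice: TSC(-100) ACPI-fast(1000) i8254(0) dummy(-1000000)
kern.timecounter.smp_tsc: 0
"""


def parse_sysctl(text):
    """Turn 'name: value' lines into a dict (hypothetical helper)."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            result[name.strip()] = value.strip()
    return result


def timecounter_choices(choice_value):
    """Parse 'NAME(quality) NAME(quality) ...' into {name: quality}."""
    pairs = re.findall(r"(\S+?)\((-?\d+)\)", choice_value)
    return {name: int(quality) for name, quality in pairs}


values = parse_sysctl(SAMPLE)
choices = timecounter_choices(values["kern.timecounter.choice"])
active = values["kern.timecounter.hardware"]
best = max(choices, key=choices.get)  # highest-quality candidate

print(active, active == "TSC", best)  # ACPI-fast False ACPI-fast
```

On this sample the active timecounter is ACPI-fast, which is also the
highest-rated choice, and the TSC carries a negative rating.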
     How does it look to you?  BTW, I'm running FreeBSD 6.3-STABLE.  The
message your second URL pointed to is from 2004, and an awful lot of things
have changed since then.  (FreeBSD 7 includes many, many more changes,
including the change of default scheduler to the ULE scheduler, made reliable
at last for SMP operations.  If anything, 7.x is probably even less vulnerable
to the vagaries of the various timer support mechanisms than 6.x, but I'm not
running that system yet.)

                                  Scott Bennett, Comm. ASMELG, CFIAG
* Internet:       bennett at cs.niu.edu                              *
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *