
[tor-relays] FreeBSD 13.1: clock_gettime(CLOCK_MONOTONIC_FAST) ~ 50 % performance gain



Hello everyone,

I was doing some profiling on my two relays running on FreeBSD 13.1
and noticed that they were spending a lot of time in clock_gettime()
which prompted me to have a look at the implementation.

Time implementation
===================

The time implementation is abstracted in src/lib/time/compat_time.c
where different mechanisms are used for different operating systems.
On Linux, CLOCK_MONOTONIC_COARSE is a clock that gives worse precision
than CLOCK_MONOTONIC but is faster. The abstraction layer checks for
its presence and uses the faster, less precise clock where
applicable.

On FreeBSD, there is also a fast monotonic time source available
called CLOCK_MONOTONIC_FAST. In the header file
src/lib/time/compat_time.h, a comment references this clock, but it is
not used. I thought it might be worth seeing what difference enabling
CLOCK_MONOTONIC_FAST on FreeBSD would make. On the VM where I run my
two FreeBSD relays, the difference was stunning.

I made a quick patch that simply replaces CLOCK_MONOTONIC_COARSE with
CLOCK_MONOTONIC_FAST (see the attached patches), then compiled and
tested it. I traced the system calls to make sure the correct call was
being used, which it was.

Results
=======

This reduced the CPU usage of the patched relay by about 50 % compared
to the unpatched relay. I was a bit shocked, so I wrote a small
benchmark program and ran it on my VM, which gave the following
results:

CLOCK_MONOTONIC: 4.776675 s
CLOCK_MONOTONIC_FAST: 0.260002 s

This shows that, on my VM, CLOCK_MONOTONIC_FAST is roughly 18 times
faster than CLOCK_MONOTONIC.

I have tested on a few different systems and I think that the
performance increase of CLOCK_MONOTONIC_FAST is thanks to commit
60b0ad10dd0fc7ff6892ecc7ba3458482fcc064c - "vdso: lower precision of
vdso implementation of CLOCK_MONOTONIC_FAST and CLOCK_UPTIME_FAST"
that was cherry-picked to 13.1.

Try it yourself and report your results
=======================================

If you want to benchmark your server to see whether switching clock
could benefit you, you can compile and run my attached test program by
doing

	user>clang bench.c -o bench
	user>./bench

In case the program terminates too quickly or slowly for your liking, adjust

	const unsigned long iterations = 1000000;

up or down to change the execution time.

My supplied patches appear to work fine on my system, but aren't
really upstream appropriate since a solution that works for both
FreeBSD and Linux is needed. If you want to test them and you're
building Tor from the ports tree, drop them in
/usr/ports/security/tor/files and build and install.

I'm very interested in seeing some performance data from other people
to help me decide whether it is worth either pestering some Tor devs
to have a look at this or putting in some effort myself to write an
upstreamable patch.

Thank you for reading!
Cordially,
Andreas Kempe
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

const unsigned long iterations = 1000000;

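/* Time `iterations` calls to clock_gettime(id, ...) and print the
 * elapsed time, measured with CLOCK_MONOTONIC_FAST.
 * Returns 0 on success and 1 on error. */
int run_bench(clockid_t id)
{
	struct timespec tp_start;
	struct timespec tp;
	struct timespec tp_end;

	if (clock_gettime(CLOCK_MONOTONIC_FAST, &tp_start) == -1)
	{
		perror("clock_gettime");
		return 1;
	}

	for (unsigned long i = 0; i < iterations; i++)
	{
		if (clock_gettime(id, &tp) == -1)
		{
			perror("clock_gettime");
			return 1;
		}
	}

	if (clock_gettime(CLOCK_MONOTONIC_FAST, &tp_end) == -1)
	{
		perror("clock_gettime");
		return 1;
	}

	/* Elapsed time between the two CLOCK_MONOTONIC_FAST samples. */
	printf("%lf s\n", (double)(tp_end.tv_sec - tp_start.tv_sec) +
		((double)tp_end.tv_nsec - (double)tp_start.tv_nsec)/1000000000);

	return 0;
}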

int main(void)
{
	printf("CLOCK_MONOTONIC: ");
	if (run_bench(CLOCK_MONOTONIC))
		return 1;

	printf("CLOCK_MONOTONIC_FAST: ");
	if (run_bench(CLOCK_MONOTONIC_FAST))
		return 1;

	return 0;
}
--- src/lib/time/compat_time.c.orig	2022-06-20 22:28:59 UTC
+++ src/lib/time/compat_time.c
@@ -368,27 +368,27 @@ monotime_add_msec(monotime_t *out, const monotime_t *v
 /* end of "__APPLE__" */
 #elif defined(HAVE_CLOCK_GETTIME)
 
-#ifdef CLOCK_MONOTONIC_COARSE
+#ifdef CLOCK_MONOTONIC_FAST
 /**
  * Which clock should we use for coarse-grained monotonic time? By default
- * this is CLOCK_MONOTONIC_COARSE, but it might not work -- for example,
+ * this is CLOCK_MONOTONIC_FAST, but it might not work -- for example,
  * if we're compiled with newer Linux headers and then we try to run on
  * an old Linux kernel. In that case, we will fall back to CLOCK_MONOTONIC.
  */
-static int clock_monotonic_coarse = CLOCK_MONOTONIC_COARSE;
-#endif /* defined(CLOCK_MONOTONIC_COARSE) */
+static int clock_monotonic_coarse = CLOCK_MONOTONIC_FAST;
+#endif /* defined(CLOCK_MONOTONIC_FAST) */
 
 static void
 monotime_init_internal(void)
 {
-#ifdef CLOCK_MONOTONIC_COARSE
+#ifdef CLOCK_MONOTONIC_FAST
   struct timespec ts;
-  if (clock_gettime(CLOCK_MONOTONIC_COARSE, &ts) < 0) {
-    log_info(LD_GENERAL, "CLOCK_MONOTONIC_COARSE isn't working (%s); "
+  if (clock_gettime(CLOCK_MONOTONIC_FAST, &ts) < 0) {
+    log_info(LD_GENERAL, "CLOCK_MONOTONIC_FAST isn't working (%s); "
              "falling back to CLOCK_MONOTONIC.", strerror(errno));
     clock_monotonic_coarse = CLOCK_MONOTONIC;
   }
-#endif /* defined(CLOCK_MONOTONIC_COARSE) */
+#endif /* defined(CLOCK_MONOTONIC_FAST) */
 }
 
 void
@@ -405,7 +405,7 @@ monotime_get(monotime_t *out)
   tor_assert(r == 0);
 }
 
-#ifdef CLOCK_MONOTONIC_COARSE
+#ifdef CLOCK_MONOTONIC_FAST
 void
 monotime_coarse_get(monotime_coarse_t *out)
 {
@@ -419,7 +419,7 @@ monotime_coarse_get(monotime_coarse_t *out)
   int r = clock_gettime(clock_monotonic_coarse, &out->ts_);
   if (PREDICT_UNLIKELY(r < 0) &&
       errno == EINVAL &&
-      clock_monotonic_coarse == CLOCK_MONOTONIC_COARSE) {
+      clock_monotonic_coarse == CLOCK_MONOTONIC_FAST) {
     /* We should have caught this at startup in monotime_init_internal!
      */
     log_warn(LD_BUG, "Falling back to non-coarse monotonic time %s initial "
@@ -430,7 +430,7 @@ monotime_coarse_get(monotime_coarse_t *out)
 
   tor_assert(r == 0);
 }
-#endif /* defined(CLOCK_MONOTONIC_COARSE) */
+#endif /* defined(CLOCK_MONOTONIC_FAST) */
 
 int64_t
 monotime_diff_nsec(const monotime_t *start,
--- src/lib/time/compat_time.h.orig	2022-06-20 22:43:26 UTC
+++ src/lib/time/compat_time.h
@@ -172,7 +172,7 @@ typedef struct monotime_t {
 #endif /* defined(__APPLE__) || ... */
 } monotime_t;
 
-#if defined(CLOCK_MONOTONIC_COARSE) && \
+#if defined(CLOCK_MONOTONIC_FAST) && \
   defined(HAVE_CLOCK_GETTIME)
 #define MONOTIME_COARSE_FN_IS_DIFFERENT
 #define monotime_coarse_t monotime_t
@@ -188,7 +188,7 @@ typedef struct monotime_coarse_t {
 #define monotime_coarse_t monotime_t
 #else
 #define monotime_coarse_t monotime_t
-#endif /* defined(CLOCK_MONOTONIC_COARSE) && ... || ... */
+#endif /* defined(CLOCK_MONOTONIC_FAST) && ... || ... */
 
 /**
  * Initialize the timing subsystem. This function is idempotent.
_______________________________________________
tor-relays mailing list
tor-relays@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays