[tor-relays] FreeBSD 13.1: clock_gettime(CLOCK_MONOTONIC_FAST) ~ 50 % performance gain
Hello everyone,
I was doing some profiling on my two relays running on FreeBSD 13.1
and noticed that they were spending a lot of time in clock_gettime()
which prompted me to have a look at the implementation.
Time implementation
===================
The time implementation is abstracted in src/lib/time/compat_time.c
where different mechanisms are used for different operating systems.
On Linux, CLOCK_MONOTONIC_COARSE is a clock that gives worse precision
than CLOCK_MONOTONIC, but is faster. The abstraction layer checks for
its presence and uses this faster, less precise clock where
applicable.
On FreeBSD, there is also a fast monotonic time source available
called CLOCK_MONOTONIC_FAST. In the header file
src/lib/time/compat_time.h, a comment references this clock, but it is
not used. I thought it might be worth a shot to see what difference it
would make if I enabled the use of CLOCK_MONOTONIC_FAST on FreeBSD. On
the VM where I run my two FreeBSD relays, the difference was stunning.
I made a quick patch that simply replaces CLOCK_MONOTONIC_COARSE with
CLOCK_MONOTONIC_FAST (see the attached patches), then compiled and
tested it. I traced the system calls to make sure the correct clock
was being used, which it was.
Results
=======
This reduced the CPU usage of the patched relay by about 50 %
compared to the unpatched relay. I was a bit shocked so I wrote a
small benchmark program and ran it on my VM giving the following
results:
CLOCK_MONOTONIC: 4.776675 s
CLOCK_MONOTONIC_FAST: 0.260002 s
This shows that on my VM, CLOCK_MONOTONIC_FAST is about 18 times
faster than CLOCK_MONOTONIC.
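The ratio and the average cost per call follow directly from the two timings: 4.776675 / 0.260002 is roughly 18.4. The sketch below spells out that arithmetic. It assumes the run used the default of 1000000 iterations from the attached program, which the results do not state, and the function names are illustrative only.

```c
#include <stdio.h>

/* Average cost of one call in nanoseconds, given total seconds and call count. */
double ns_per_call(double total_seconds, unsigned long calls)
{
    return total_seconds * 1e9 / (double)calls;
}

/* How many times faster the second measurement is than the first. */
double speedup(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}

/* Summarize the two timings reported above. */
void print_summary(void)
{
    const unsigned long calls = 1000000; /* assumption: default iteration count */
    const double monotonic = 4.776675;   /* CLOCK_MONOTONIC, seconds */
    const double fast = 0.260002;        /* CLOCK_MONOTONIC_FAST, seconds */

    printf("CLOCK_MONOTONIC:      %.1f ns per call\n", ns_per_call(monotonic, calls));
    printf("CLOCK_MONOTONIC_FAST: %.1f ns per call\n", ns_per_call(fast, calls));
    printf("speedup:              %.1fx\n", speedup(monotonic, fast));
}
```

Under that assumption, a single CLOCK_MONOTONIC call costs a little under 5 microseconds on this VM, against roughly a quarter of a microsecond for CLOCK_MONOTONIC_FAST.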
I have tested on a few different systems and I think that the
performance increase of CLOCK_MONOTONIC_FAST is thanks to commit
60b0ad10dd0fc7ff6892ecc7ba3458482fcc064c - "vdso: lower precision of
vdso implementation of CLOCK_MONOTONIC_FAST and CLOCK_UPTIME_FAST"
that was cherry-picked to 13.1.
Try it yourself and report your results
=======================================
If you want to benchmark your server to see whether switching clock
could benefit you, you can compile and run my attached test program by
doing
user>clang bench.c -o bench
user>./bench
In case the program terminates too quickly or slowly for your liking, adjust
const unsigned long iterations = 1000000;
up or down to change the execution time.
My supplied patches appear to work fine on my system, but aren't
really upstream appropriate since a solution that works for both
FreeBSD and Linux is needed. If you want to test them and you're
building Tor from the ports tree, drop them in
/usr/ports/security/tor/files and build and install.
I'm very interested in seeing some performance data from other people
to judge whether it is worth either pestering some Tor devs to have a
look at this or putting in some effort myself to write an upstreamable
patch.
Thank you for reading!
Cordially,
Andreas Kempe
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Number of clock_gettime() calls per measurement. */
const unsigned long iterations = 1000000;

/* Time `iterations` calls to clock_gettime(id) and print the elapsed
 * seconds. Returns 0 on success and 1 on error. */
int run_bench(clockid_t id)
{
    struct timespec tp_start;
    struct timespec tp;
    struct timespec tp_end;

    if (clock_gettime(CLOCK_MONOTONIC_FAST, &tp_start) == -1)
    {
        perror("Error");
        return 1;
    }

    for (unsigned long i = 0; i < iterations; i++)
    {
        if (clock_gettime(id, &tp) == -1)
        {
            perror("Error");
            return 1;
        }
    }

    if (clock_gettime(CLOCK_MONOTONIC_FAST, &tp_end) == -1)
    {
        perror("Error");
        return 1;
    }

    /* Elapsed time is measured from tp_start to tp_end. */
    printf("%f s\n", (double)(tp_end.tv_sec - tp_start.tv_sec) +
           ((double)tp_end.tv_nsec - (double)tp_start.tv_nsec) / 1000000000);
    return 0;
}

int main(void)
{
    printf("CLOCK_MONOTONIC: ");
    if (run_bench(CLOCK_MONOTONIC))
        return 1;

    printf("CLOCK_MONOTONIC_FAST: ");
    if (run_bench(CLOCK_MONOTONIC_FAST))
        return 1;

    return 0;
}
--- src/lib/time/compat_time.c.orig 2022-06-20 22:28:59 UTC
+++ src/lib/time/compat_time.c
@@ -368,27 +368,27 @@ monotime_add_msec(monotime_t *out, const monotime_t *v
/* end of "__APPLE__" */
#elif defined(HAVE_CLOCK_GETTIME)
-#ifdef CLOCK_MONOTONIC_COARSE
+#ifdef CLOCK_MONOTONIC_FAST
/**
* Which clock should we use for coarse-grained monotonic time? By default
- * this is CLOCK_MONOTONIC_COARSE, but it might not work -- for example,
+ * this is CLOCK_MONOTONIC_FAST, but it might not work -- for example,
* if we're compiled with newer Linux headers and then we try to run on
* an old Linux kernel. In that case, we will fall back to CLOCK_MONOTONIC.
*/
-static int clock_monotonic_coarse = CLOCK_MONOTONIC_COARSE;
-#endif /* defined(CLOCK_MONOTONIC_COARSE) */
+static int clock_monotonic_coarse = CLOCK_MONOTONIC_FAST;
+#endif /* defined(CLOCK_MONOTONIC_FAST) */
static void
monotime_init_internal(void)
{
-#ifdef CLOCK_MONOTONIC_COARSE
+#ifdef CLOCK_MONOTONIC_FAST
struct timespec ts;
- if (clock_gettime(CLOCK_MONOTONIC_COARSE, &ts) < 0) {
- log_info(LD_GENERAL, "CLOCK_MONOTONIC_COARSE isn't working (%s); "
+ if (clock_gettime(CLOCK_MONOTONIC_FAST, &ts) < 0) {
+ log_info(LD_GENERAL, "CLOCK_MONOTONIC_FAST isn't working (%s); "
"falling back to CLOCK_MONOTONIC.", strerror(errno));
clock_monotonic_coarse = CLOCK_MONOTONIC;
}
-#endif /* defined(CLOCK_MONOTONIC_COARSE) */
+#endif /* defined(CLOCK_MONOTONIC_FAST) */
}
void
@@ -405,7 +405,7 @@ monotime_get(monotime_t *out)
tor_assert(r == 0);
}
-#ifdef CLOCK_MONOTONIC_COARSE
+#ifdef CLOCK_MONOTONIC_FAST
void
monotime_coarse_get(monotime_coarse_t *out)
{
@@ -419,7 +419,7 @@ monotime_coarse_get(monotime_coarse_t *out)
int r = clock_gettime(clock_monotonic_coarse, &out->ts_);
if (PREDICT_UNLIKELY(r < 0) &&
errno == EINVAL &&
- clock_monotonic_coarse == CLOCK_MONOTONIC_COARSE) {
+ clock_monotonic_coarse == CLOCK_MONOTONIC_FAST) {
/* We should have caught this at startup in monotime_init_internal!
*/
log_warn(LD_BUG, "Falling back to non-coarse monotonic time %s initial "
@@ -430,7 +430,7 @@ monotime_coarse_get(monotime_coarse_t *out)
tor_assert(r == 0);
}
-#endif /* defined(CLOCK_MONOTONIC_COARSE) */
+#endif /* defined(CLOCK_MONOTONIC_FAST) */
int64_t
monotime_diff_nsec(const monotime_t *start,
--- src/lib/time/compat_time.h.orig 2022-06-20 22:43:26 UTC
+++ src/lib/time/compat_time.h
@@ -172,7 +172,7 @@ typedef struct monotime_t {
#endif /* defined(__APPLE__) || ... */
} monotime_t;
-#if defined(CLOCK_MONOTONIC_COARSE) && \
+#if defined(CLOCK_MONOTONIC_FAST) && \
defined(HAVE_CLOCK_GETTIME)
#define MONOTIME_COARSE_FN_IS_DIFFERENT
#define monotime_coarse_t monotime_t
@@ -188,7 +188,7 @@ typedef struct monotime_coarse_t {
#define monotime_coarse_t monotime_t
#else
#define monotime_coarse_t monotime_t
-#endif /* defined(CLOCK_MONOTONIC_COARSE) && ... || ... */
+#endif /* defined(CLOCK_MONOTONIC_FAST) && ... || ... */
/**
* Initialize the timing subsystem. This function is idempotent.