
Writing geoip stats to disk on directories



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Nick, Roger,

I'm thinking about changing the timing of how directories write geoip
stats to disk. Right now, directories measure requests over at most 3
periods of 8 hours each (see REQUEST_HIST_LEN and REQUEST_HIST_PERIOD in
geoip.c). That means that once they have measured 24 hours of requests,
they forget about the oldest 8 hours. The directories write
these geoip stats to disk once an hour (see DUMP_GEOIP_STATS_INTERVAL in
main.c), regardless of when request periods start or end. More
precisely, directories overwrite the local geoip-stats file every hour.
I also found a config option DirRecordUsageSaveInterval which should say
how often geoip data is flushed to disk and which defaults to 6 hours,
but which is not used in the code. I think we should improve the timing
of writing geoip stats to disk.
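
To illustrate the current behavior described above, here is a minimal
sketch of a history of 3 periods of 8 hours each, where starting a new
period forgets the oldest one. The two constants use the values given in
this mail; the struct and the hist_* functions are hypothetical and
simplified, and are not the actual code in geoip.c.

```c
#include <assert.h>
#include <stddef.h>

/* Values as described in the mail; the real definitions live in geoip.c. */
#define REQUEST_HIST_LEN 3
#define REQUEST_HIST_PERIOD (8*60*60)

/* Hypothetical, simplified history: one request counter per period,
 * oldest period first, current period last. */
typedef struct request_hist_t {
  unsigned counts[REQUEST_HIST_LEN];
} request_hist_t;

/* Count one request in the current (newest) period. */
void hist_note_request(request_hist_t *h)
{
  assert(h != NULL);
  h->counts[REQUEST_HIST_LEN - 1]++;
}

/* Start a new period: forget the oldest period, shift the remaining
 * ones down by one slot, and open an empty current period. */
void hist_rotate(request_hist_t *h)
{
  size_t i;
  assert(h != NULL);
  for (i = 0; i + 1 < REQUEST_HIST_LEN; i++)
    h->counts[i] = h->counts[i + 1];
  h->counts[REQUEST_HIST_LEN - 1] = 0;
}

/* Sum of all requests still remembered, i.e. over at most
 * REQUEST_HIST_LEN * REQUEST_HIST_PERIOD seconds (24 hours). */
unsigned hist_total(const request_hist_t *h)
{
  unsigned total = 0;
  size_t i;
  assert(h != NULL);
  for (i = 0; i < REQUEST_HIST_LEN; i++)
    total += h->counts[i];
  return total;
}
```

In this sketch, a request counted in one period no longer contributes to
hist_total() after three calls to hist_rotate().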

My first idea is to synchronize request history periods with writing
down stats. This basically means writing down stats only when periods
end. The main reason is that we should ensure that the requests written
to disk have always been measured over exactly 24 hours. Writing down
stats earlier might be problematic from an anonymity point of view.
And after a restart we don't pick up these values anyway. Longer intervals
(or, in general, intervals other than 24 hours) would complicate the
analysis to a certain extent. In terms of code, that means dropping
DUMP_GEOIP_STATS_INTERVAL in main.c and dumping stats whenever we change
the request period. Also, stats should be appended to the geoip-stats
file rather than replacing that file.
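
The proposed change could be sketched roughly as follows: check whether
the current request period has ended, and if so, append one line to the
stats file instead of overwriting it. Both function names and the line
format (period end time and a single request count) are placeholders made
up for this sketch; they are not the existing geoip-stats format or any
function in main.c or geoip.c.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Period length as described in the mail (8 hours, in seconds). */
#define REQUEST_HIST_PERIOD (8*60*60)

/* Hypothetical helper: return 1 if the period that started at
 * period_start is over at time now, else 0. */
int period_has_ended(time_t period_start, time_t now)
{
  return now >= period_start + REQUEST_HIST_PERIOD;
}

/* Hypothetical helper: append one stats line to the file at path,
 * keeping everything written earlier. The "<period end> <requests>"
 * format is a placeholder. Returns 0 on success, -1 on failure. */
int append_stats_line(const char *path, time_t period_end,
                      unsigned n_requests)
{
  FILE *f;
  int ok;
  assert(path != NULL);
  f = fopen(path, "a");   /* "a" appends; "w" would replace the file */
  if (!f)
    return -1;
  ok = fprintf(f, "%ld %u\n", (long)period_end, n_requests) > 0;
  if (fclose(f) != 0)
    ok = 0;
  return ok ? 0 : -1;
}
```

A caller would invoke append_stats_line() only when period_has_ended()
returns 1, so no separate dump interval is needed.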

My next thought is whether or not we want to make the period length
configurable. From earlier measurements I found that the period length
of 8 hours (as defined in REQUEST_HIST_PERIOD) works fine. Also,
configurable period lengths might complicate analysis, too. If we want
to make the period length configurable, we should define a lower limit
of, say, 2 hours. Otherwise, people could compare subsequent
observations to learn more details about requests. Possible values for
period lengths would then be 2, 3, 4, 6, 8, 12, or 24 hours. But again,
do we need to make this configurable? And if so, should we use the
config option DirRecordUsageSaveInterval instead of REQUEST_HIST_PERIOD?
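
The values listed above are exactly the divisors of 24 that are at least
2. A check for a configured period length could therefore look roughly
like the sketch below. The function names and the MIN_PERIOD_HOURS
constant are hypothetical, and requiring the length to divide 24 evenly
is an assumption read off the list of values, not something the existing
code enforces.

```c
#include <assert.h>

/* Hypothetical lower limit, as suggested in the mail. */
#define MIN_PERIOD_HOURS 2
#define HOURS_PER_DAY 24

/* Return 1 if hours is an acceptable period length: at least the lower
 * limit, at most one day, and dividing 24 hours evenly. Else return 0. */
int period_hours_is_valid(int hours)
{
  return hours >= MIN_PERIOD_HOURS &&
         hours <= HOURS_PER_DAY &&
         HOURS_PER_DAY % hours == 0;
}

/* Number of periods needed to cover exactly 24 hours. */
int periods_per_day(int hours)
{
  assert(period_hours_is_valid(hours));
  return HOURS_PER_DAY / hours;
}
```

With the current period length of 8 hours, periods_per_day(8) gives the 3
periods that REQUEST_HIST_LEN describes.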

Does this all make sense to you? If so, I'd prepare a patch to make the
described changes. Or did I just misunderstand your intentions when you
added this functionality?

The next step would be to add the geoip-stats lines (or a subset of
them) to extra-info documents. I think a proposal for that would be in
order, but first I think it's fine to start with measuring on a few
nodes and working with files.

Thanks!
- --Karsten
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkobEYUACgkQ0M+WPffBEmXzhQCgxWIVrMQqBpSEkAYGXr4TkYSJ
4csAn0O8tlg5TKgZua7HOS8Sy0ptcsCE
=0DXG
-----END PGP SIGNATURE-----