Time to take the defibrillator paddles to this proposal once again. As
per Nick's request this is a bit more focused on the motivation for
getting connection related information. The proposed use cases are
just some naive examples I've come up with. If anyone with a stronger
security background (which wouldn't take much...) has the time I'd
love comments like "WTF?!? This idiot's looking for the completely
wrong things! This is obviously worthless if he doesn't look for X."
Also, could we move forward on the other (less controversial) items?
For instance, bandwidth totals tend to be a very highly requested
piece of information and pipe's already provided a nice patch to get
it (http://www.mail-archive.com/or-talk@xxxxxxxxxxxxx/msg13085.html).
For reference, here are the not-so-controversial GETINFO options I proposed:
"info/relay/bw-limit" -- Effective relayed bandwidth limit (currently
RelayBandwidthRate if set, otherwise BandwidthRate).
"info/relay/burst-limit" -- Effective relayed burst limit.
"info/relay/read-total" -- Total bytes relayed (download).
"info/relay/write-total" -- Total bytes relayed (upload).
"info/uptime-process" -- Total uptime of the tor process (in seconds).
"info/uptime-reset" -- Time since last reset (startup or sighup
signal, in seconds).
"info/descriptor-used" -- Count of file descriptors used.
"info/descriptor-limit" -- File descriptor limit (getrlimit results).
"ns/authority" -- Router status info (v2 directory style) for all
recognized directory authorities, joined by newlines.
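To make the "effective" limit above concrete, here's a rough sketch of the
fallback a controller could apply today from the two config values. The
function name is made up, and it assumes a RelayBandwidthRate of 0 means
the option is unset.

```python
def effective_relay_bw_limit(relay_bandwidth_rate: int,
                             bandwidth_rate: int) -> int:
    """Return the limit that applies to relayed traffic, in bytes/second."""
    # Assumption: a RelayBandwidthRate of 0 means the option is not set,
    # in which case relayed traffic falls under BandwidthRate.
    if relay_bandwidth_rate > 0:
        return relay_bandwidth_rate
    return bandwidth_rate
```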
I'm not planning on converting the following to the customary
80-character width until it's at least past being a first draft for a
couple reasons:
1. I find editing fixed-width documents to be a time consuming pain
in the ass.
2. I've yet to hear why we do this. Is it just to cater to mail
clients too dumb to know how to line wrap?
That said, keeping my fingers crossed that this starts going
somewhere! -Damian
PS. For previous discussions of this proposal see:
http://marc.info/?t=126101683100002&r=1&w=1
----------------------------------------
Filename: xxx-connection-getinfo-option.txt
Title: GETINFO controller option for connection information
Author: Damian Johnson
Created: 14-Apr-2010
Status: Draft
Overview:
This details an additional GETINFO option for tor controllers that
would provide information concerning a relay's current connections.
Motivation:
All Internet facing applications (tor included) are possible
vectors for attack on the operator's system. With hundreds of
connections to relatively unknown destinations tor is already the bane
of any network based IDS, and unless tor can be proved infallible and
bug free (which would be quite a feat!) it cannot be blindly trusted.
While it is impossible to guard against every potential future
vulnerability, controllers can attempt to mitigate this threat by both
auditing tor's behavior and providing indicators of its activity to
savvy users. Connection related information is a useful tool for both
of these purposes.
In terms of auditing, the following are some conditions
controllers can check for with connection information:
- Persistent unestablished circuits. For instance a circuit has
an outbound connection without a corresponding inbound counterpart. If
such a connection was active (had substantial traffic) this would be
troubling enough to alert the user.
- Relatively asymmetric traffic on circuits, i.e. if the
controller sees 10 KB/s inbound on a circuit and 5 MB/s outbound this
could be a good indicator that someone's using tor to issue a DoS,
fetch data from the local system, etc.
- Any connections to the local network when
ExitPolicyRejectPrivate is set, indicating that tor's being used to
proxy connections to the local LAN.
- Peculiar patterns of connections, for instance numerous
outbound connections to a single IP, or 99% of all bandwidth
belonging to a single circuit.
- Scrubbed connection data limits our ability to check for
obedience to the exit policy, but for strictly non-exit relays we can
still alert the user if any non-relay outbound connections occur.
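As a rough illustration of the asymmetric traffic and single-circuit
checks above, a controller could do something like the following. This is
only a sketch: the thresholds, the function names, and the shape of the
input (per-circuit inbound/outbound rates in bytes per second) are all
assumptions of mine, not part of the proposal.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical thresholds; the proposal does not specify any.
ASYMMETRY_RATIO = 100.0
DOMINANT_SHARE = 0.99

# Maps a circuit id to (inbound, outbound) rates in bytes per second.
Rates = Dict[str, Tuple[float, float]]


def asymmetric_circuits(rates: Rates,
                        ratio: float = ASYMMETRY_RATIO) -> List[str]:
    """Return circuit ids whose outbound rate dwarfs their inbound rate."""
    flagged = []
    for circ_id, (inbound, outbound) in rates.items():
        # Treat an idle inbound side as 1 B/s to avoid dividing by zero.
        if outbound / max(inbound, 1.0) >= ratio:
            flagged.append(circ_id)
    return flagged


def dominant_circuit(rates: Rates,
                     share: float = DOMINANT_SHARE) -> Optional[str]:
    """Return the circuit carrying at least `share` of all traffic, if any."""
    total = sum(inbound + outbound for inbound, outbound in rates.values())
    if total == 0:
        return None
    for circ_id, (inbound, outbound) in rates.items():
        if (inbound + outbound) / total >= share:
            return circ_id
    return None
```

With the 10 KB/s in, 5 MB/s out example from the list, the outbound to
inbound ratio is 512, so the circuit would be flagged under the assumed
ratio of 100.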
Of course if we're working from the assumption that tor has been
compromised, then the information provided from the control port
cannot be blindly trusted. Hence connection data should be
verifiable against the system's connection querying utilities
(netstat, ss, lsof, etc, which are more likely to be under a host
based IDS, if present). Controllers can then only be tricked once the
whole system has been compromised (elevated permissions), rather than
just tor.
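The cross-check described here boils down to a set comparison. Below is a
minimal sketch, assuming the controller has already collected (ip, port)
pairs from both the control port and a system utility. How those pairs
are gathered is left out, and the names are hypothetical. Entries tor has
scrubbed are assumed to carry an IP of "0" and are skipped since they
can't be compared.

```python
from typing import Iterable, Set, Tuple

Endpoint = Tuple[str, int]  # (remote ip, remote port)


def cross_check(reported: Iterable[Endpoint],
                observed: Iterable[Endpoint]
                ) -> Tuple[Set[Endpoint], Set[Endpoint]]:
    """Compare tor's reported connections with what the system sees.

    Returns (only_tor, only_system): endpoints tor reported that the
    system utilities did not list, and endpoints the system listed that
    tor did not report.
    """
    # Assumption: scrubbed entries have an IP of "0" and carry no address.
    reported_set = {ep for ep in reported if ep[0] != "0"}
    observed_set = set(observed)
    return reported_set - observed_set, observed_set - reported_set
```

Anything in only_tor is an outright inconsistency. Entries in only_system
are murkier, since scrubbed client and exit connections would
legitimately show up there.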
While automated detection is handy for detecting known behavior
that might indicate issues, visualization gives us the possibility of
finding much more thanks to our tinfoil hat wearing user base. A clear
display of tor's current behavior gives assurance that tor's
functioning as it should, plus a level of transparency desirable from
anyone with even the slightest bit of paranoia. Tor is a guest process
in the system of relay operators and we should not hide what it does
without legitimate reason.
Another (albeit unintended) benefit of visualizing tor's behavior
is that it becomes a helpful tool in puzzling out how tor works. For
instance, tor spawns numerous client connections at startup (even if
unused as a client). As a newcomer to tor these asymmetric (outbound
only) connections mystified me for quite a while until Roger
explained their use to me. The proposed TYPE_FLAGS would let
controllers clearly label them as being client related, making their
purpose a bit clearer.
At the moment connection data can only be retrieved via commands
like netstat, ss, and lsof. However, fetching it via the control port
provides several advantages:
- scrubbing for private data
Raw connection data has no notion of what's sensitive and
what is not. The relay's flags and cached consensus can be used to
take educated guesses concerning which connections could possibly
belong to client or exit traffic, but this is both difficult and
inaccurate.
- additional information
All connection querying commands strictly provide the IP
address and port of connections, and nothing else. However, for
auditing and visualization the far more interesting attributes are the
connection's bandwidth usage, uptime, and the circuit to which it belongs.
- improved performance
Querying connection data is an expensive activity,
especially for busy relays or low end processors (such as mobile
devices). Tor already internally knows its circuits and connections,
allowing for vastly quicker lookups.
- cross platform capability
The connection querying utilities mentioned above not only
aren't available under Windows, but differ widely among different *nix
platforms. FreeBSD in particular takes its own approach,
dropping important options from netstat and assigning ss to a
spreadsheet application instead. A controller interface, however,
would provide a uniform means of retrieving this information.
Security Implications:
The original version of this proposal left the responsibility of
scrubbing connection data with client applications (vidalia, arm,
etc). However, this was deemed unacceptable by Sebastian and Nick in
previous discussions. The proposal now includes dropping the IP
address/port of client and exit connections from the controller's
response. That said, I think it's a mistake to drop those connections
entirely since some of their attributes *are* of legitimate usefulness:
- Existence
At the very least it'd be nice if Tor indicated their existence
(ie, I'd say "yea, an exit connection exists on this circuit but we
won't tell you where it goes."). This would be useful, for instance,
if the relay operator has misconfigured their firewall to block some
of the outbound ports permitted by their exit policy (arm would show
this as RELAY -> YOU -> UNESTABLISHED, and provide a warning to
indicate the issue).
- Bandwidth
For auditing the most interesting attribute of connections,
imho, is the bandwidth. If, say, 10 KB/s is coming in and 1 MB/s is
going out on a circuit that's a good indicator that something is
*very* wrong (I'd start suspecting a security issue, personally). If
we rounded all bandwidth measurements (say, to the nearest KB) would
this be sufficient to prevent entry/exits from correlating this data
to attack anonymity?
- Uptime
If connections are being cycled abnormally quickly (say, all
connection longevity is under thirty seconds) this could indicate the
ISP (or another middleman like the Great Firewall) is sending reset
packets to kill the relay's attempts to make exit connections.
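For concreteness, the rounding floated under Bandwidth and the longevity
check under Uptime could look like the sketch below. The function names
and the thirty second default are illustrative only, and the code says
nothing about whether rounding is actually enough to prevent
correlation. That's still the open question.

```python
from typing import Sequence


def round_to_kb(byte_count: int) -> int:
    """Round a byte count to the nearest multiple of 1024 bytes."""
    return ((byte_count + 512) // 1024) * 1024


def all_short_lived(uptimes: Sequence[int], threshold: int = 30) -> bool:
    """True if there are connections and every one is under `threshold`
    seconds old."""
    return bool(uptimes) and all(uptime < threshold for uptime in uptimes)
```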
Specification:
The following addition would be made to the control-spec's GETINFO
section:
"conn/<Circuit identity>/<Connection identity>" -- Provides entry
for the associated connection, formatted as:
CONN_ID CIRC_ID OR_ID IP PORT L_PORT TYPE_FLAGS READ WRITE UPTIME
None of the parameters contain whitespace, and additional results must
be ignored to allow for future expansion. Parameters are defined as
follows:
CONN_ID - Unique identifier associated with this connection.
CIRC_ID - Unique identifier for the circuit this belongs to (0 if this
doesn't belong to any circuit). At most there may be two connections
(one inbound, one outbound) with any given CIRC_ID except in the case
of exit connections.
OR_ID - Relay fingerprint, 0 if connection doesn't belong to a relay.
IP/PORT - IP address and port used by the associated connection, 0 if
connection is used for relaying client or exit traffic.
L_PORT - Local port used by the connection, 0 if connection is used for
relaying client or exit traffic.
TYPE_FLAGS - Single character flags indicating directionality and type
of the connection (consists of one from each category, may become
longer for future expansion).
Connection Directionality:
I: inbound, i: listening (unestablished inbound),
O: outbound, o: unestablished outbound
Usage Type:
C: client traffic, R: relaying traffic,
X: control, H: hidden service, D: directory
Destination:
T: inter-tor connection, t: outside the tor network
For instance, "IRt" would indicate that this was an established
1st-hop (or bridged) relay connection.
READ/WRITE - Total bytes read/written over the life of this connection.
UPTIME - Time the connection's been established in seconds.
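To show how a controller might consume an entry in the format above,
here's a minimal parsing sketch. The class and function names are
hypothetical, the sample line in the usage note is made up, and it
assumes PORT, L_PORT, READ, WRITE and UPTIME are plain decimal integers
(the proposal doesn't pin that down). Trailing fields are ignored, as
the spec requires.

```python
from dataclasses import dataclass
from typing import Tuple

DIRECTION = {"I": "inbound", "i": "listening",
             "O": "outbound", "o": "unestablished outbound"}
USAGE = {"C": "client", "R": "relay", "X": "control",
         "H": "hidden service", "D": "directory"}
DESTINATION = {"T": "inter-tor", "t": "outside tor"}


@dataclass
class ConnEntry:
    conn_id: str
    circ_id: str
    or_id: str
    ip: str
    port: int
    l_port: int
    type_flags: str
    read: int
    write: int
    uptime: int


def parse_conn_entry(line: str) -> ConnEntry:
    """Parse one entry; fields past the tenth are ignored."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("expected at least 10 fields, got %d" % len(fields))
    (conn_id, circ_id, or_id, ip, port,
     l_port, flags, read, write, uptime) = fields[:10]
    # Assumption: the numeric fields are decimal integers.
    return ConnEntry(conn_id, circ_id, or_id, ip, int(port), int(l_port),
                     flags, int(read), int(write), int(uptime))


def describe_flags(flags: str) -> Tuple[str, str, str]:
    """Decode the first three TYPE_FLAGS characters; later ones are
    ignored to allow for expansion."""
    return DIRECTION[flags[0]], USAGE[flags[1]], DESTINATION[flags[2]]
```

For a made-up scrubbed entry such as "12 7 0 0 0 0 IRt 2048 4096 35",
parse_conn_entry gives circuit "7" with zeroed address fields, and
describe_flags("IRt") decodes to inbound, relay, outside tor.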
"conn/all" -- Newline separated listing of all current connections.