
Re: Proposal: GETINFO controller option for connection information



Yesterday Jake met with me to discuss this proposal and made two very
good points:
  1. It's completely ineffectual for the auditing purposes I've
mentioned since either (a) these results can be fetched from netstat
already or (b) the information would only be provided via tor and
can't be validated.
  2. The things I'm really interested in can be fetched with much less
(and safer) information.

In particular we discussed making the proposal circuit based rather
than connection based, being something like the following:

  "circ/<Circuit identity>" -- Provides entry for the associated circuit,
    formatted as:
      CIRC_ID IN_TYPE OUT_TYPE READ WRITE UPTIME

    none of the parameters contain whitespace, and additional results must be
    ignored to allow for future expansion. Parameters are defined as follows:
      CIRC_ID - Unique identifier for the circuit this belongs to.
      IN_TYPE/OUT_TYPE - Single character flags indicating the purpose of the
        inbound or outbound connection. If no connection is established then
        this provides an empty string. Otherwise, it consists of one from each
        of the following categories (this may become longer in future
        expansion):
          Usage Type:
            C: client traffic, R: relaying traffic,
            X: control, H: hidden service, D: directory
          Destination:
            I: inter-tor connection, O: outside the tor network, L: localhost
        For instance, "RO" would indicate that this was an established
        1st-hop (or bridged) relay connection.
      READ/WRITE - Total bytes read/written over the life of this connection.
      UPTIME - Time the connection's been established in seconds.

  "circ/all" -- Newline separated listing of all current circuits.

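As a rough illustration of how a controller might consume such an entry, here
is a minimal Python sketch. Nothing here is an existing tor or controller API:
the CircEntry type, the parse_circ_entry name and the sample lines are made up
for illustration. It also assumes fields are separated by exactly one space,
since otherwise an empty IN_TYPE/OUT_TYPE could not be told apart from a
missing field.

```python
from typing import NamedTuple


class CircEntry(NamedTuple):
    """One hypothetical "circ/<Circuit identity>" result."""
    circ_id: str
    in_type: str   # empty string if no inbound connection is established
    out_type: str  # empty string if no outbound connection is established
    read: int      # total bytes read
    written: int   # total bytes written
    uptime: int    # seconds


def parse_circ_entry(line: str) -> CircEntry:
    # Assumption: single-space separators, so an empty type shows up as "".
    fields = line.rstrip("\r\n").split(" ")
    if len(fields) < 6:
        raise ValueError("expected at least 6 fields, got %d" % len(fields))
    # Anything past the sixth field is ignored to allow for future expansion.
    circ_id, in_type, out_type, read, written, uptime = fields[:6]
    return CircEntry(circ_id, in_type, out_type,
                     int(read), int(written), int(uptime))
```

For example, parse_circ_entry("12 RI  1024 2048 300") would give an entry with
an inbound type of "RI" and an empty outbound type.
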
This would be almost as useful for the purposes I'm interested in
while also stripping the most sensitive data entirely (ip addresses,
ports, and connection based bandwidth breakdowns). In particular,
this information could still address the following:

- Basic Relay Usage Questions
How is the bandwidth I'm contributing broken down? Is it being evenly
distributed or is someone hogging most of it? Do these circuits belong
to the hidden service I'm running or something else? Now that I'm
using exit policy X am I desirable as an exit, or are most people just
using me as a relay?

- Debugging
Say a relay operator has a restrictive firewall policy for outbound
connections, with the ORPort whitelisted, but doesn't realize that tor
needs random high ports. Tor would report success ("your orport is
reachable - excellent") yet the relay would be nonfunctional. This
proposed information would reveal numerous RELAY -> YOU ->
UNESTABLISHED circuits, giving a good indicator of what's wrong.
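
A controller could flag this case with something like the following Python
sketch. It is only a guess at how the proposed "circ/all" output might be used:
the function name and the sample data are hypothetical, and it assumes
single-space separated fields with the usage flag ("R" for relaying) coming
first in IN_TYPE, as in the "RO" example.

```python
from typing import List


def unestablished_relay_circuits(circ_all: str) -> List[str]:
    """Return the ids of circuits with relayed inbound traffic but no
    established outbound connection (RELAY -> YOU -> UNESTABLISHED)."""
    circ_ids = []
    for line in circ_all.splitlines():
        # Assumption: one space between fields, so an empty type is "".
        fields = line.split(" ")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        circ_id, in_type, out_type = fields[:3]
        if in_type.startswith("R") and out_type == "":
            circ_ids.append(circ_id)
    return circ_ids
```

If a large share of the relay's circuits end up in the returned list, that
would be the hint that outbound connections are being blocked.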

- Visualization
This would still yield the benefits mentioned in the last proposal of
helping to demystify behavior the operator isn't expecting (see the
client example from before).

----------------------------------------

Second, Jake made a great point that at present if a malicious party
gets ahold of the control port then the relay's quite effectively
screwed. The current capabilities of the control port are overkill for
many controllers (like arm) which are just interested in retrieving
information from tor (GETINFO options, event listening, etc). To make
the control port safer we could include a torrc option that makes the
control port read-only...

  SafeControlPort 0|1
    Restricts access of the control port to only include read-only operations.
    (Default: 0)

Making this the default would be a no-go due to vidalia (though still
a nice option to have...). If this is implemented, its setting should
be part of the PROTOCOLINFO response.

----------------------------------------

Finally, the other proposed GETINFO options still seem useful (with
the possible exception of "info/uptime-reset"), and could be improved
with the addition of:

  "info/user" -- User under which the tor process is running, providing an
    empty string if none exists.

  "info/pid" -- Process id belonging to the tor process, -1 if none exists for
    the platform.

* this one is both useful and surprisingly difficult for me to
retrieve at present (arm attempts to get it from pidof, ps, and
netstat yet still fails on some systems...)

In addition Jake mentioned the possibility of making info/* options
for all limits and capabilities (though I'd hold off until we have use
cases needing them...) and the following entries for getting activity
snapshots:

  "info/relay/[read, write]/avg/[1, 5, 15]" -- Provides the average traffic
    (bytes read or written per second) over the last 1, 5, or 15 minutes.

  "info/relay/circ/avg/[1, 5, 15]" -- Provides the average number of circuits
    established in the last 1, 5, or 15 minutes.
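
To make the averaging concrete, here is a small Python sketch of one way such
snapshots could be computed from per-second byte counts. This is not how tor
does (or would do) it; the TrafficAverager class and its methods are invented
for illustration, and averaging over only the samples seen so far when the
window isn't full yet is just one possible choice.

```python
from collections import deque


class TrafficAverager:
    """Keeps the last 15 minutes of per-second byte counts."""

    WINDOWS = (1, 5, 15)  # minutes, matching the proposed options

    def __init__(self) -> None:
        # Old samples fall off automatically once 15 minutes are stored.
        self._samples = deque(maxlen=15 * 60)

    def record_second(self, byte_count: int) -> None:
        """Record the bytes read (or written) during one second."""
        self._samples.append(byte_count)

    def average(self, minutes: int) -> float:
        """Average bytes per second over the last 1, 5, or 15 minutes."""
        if minutes not in self.WINDOWS:
            raise ValueError("window must be 1, 5, or 15 minutes")
        recent = list(self._samples)[-minutes * 60:]
        # With a partially filled window, average over what we have.
        return sum(recent) / len(recent) if recent else 0.0
```

One instance would be kept for reads and another for writes, with
record_second() called once per second by whatever is tracking the traffic.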

Cheers! -Damian

On Wed, Apr 14, 2010 at 9:16 AM, Damian Johnson <atagar1@xxxxxxxxx> wrote:
> Time to take the defibrillator paddles to this proposal once again. As per
> Nick's request this is a bit more focused on the motivation for getting
> connection related information. The proposed use cases are just some naive
> examples I've come up with. If anyone with a stronger security background
> (which wouldn't take much...) has the time I'd love comments like "WTF?!?
> This idiot's looking for the completely wrong things! This is obviously
> worthless if he doesn't look for X."
>
> Also, could we move forward on the other (less controversial) items? For
> instance, bandwidth totals tend to be a very highly requested piece of
> information and pipe's already provided a nice patch to get it
> (http://www.mail-archive.com/or-talk@xxxxxxxxxxxxx/msg13085.html). For
> reference, here's the not-so-controversial GETINFO options I proposed:
>
>   "info/relay/bw-limit" -- Effective relayed bandwidth limit (currently
>     RelayBandwidthRate if set, otherwise BandwidthRate).
>
>   "info/relay/burst-limit" -- Effective relayed burst limit.
>
>   "info/relay/read-total" -- Total bytes relayed (download).
>
>   "info/relay/write-total" -- Total bytes relayed (upload).
>
>   "info/uptime-process" -- Total uptime of the tor process (in seconds).
>
>   "info/uptime-reset" -- Time since last reset (startup or sighup signal, in
>     seconds).
>
>   "info/descriptor-used" -- Count of file descriptors used.
>
>   "info/descriptor-limit" -- File descriptor limit (getrlimit results).
>
>   "ns/authority" -- Router status info (v2 directory style) for all
>     recognized directory authorities, joined by newlines.
>
> I'm not planning on converting the following to the customary 80-character
> width until it's at least past being a first draft for a couple reasons:
>   1. I find editing fixed-width documents to be a time consuming pain in the
> ass.
>   2. I've yet to hear why we do this. Is it just to cater to mail clients
> too dumb to know how to line wrap?
>
> That said, keeping my fingers crossed that this starts going somewhere!
> -Damian
>
> PS. For previous discussions of this proposal see:
> http://marc.info/?t=126101683100002&r=1&w=1
>
> ----------------------------------------
>
> Filename: xxx-connection-getinfo-option.txt
> Title: GETINFO controller option for connection information
> Author: Damian Johnson
> Created: 14-Apr-2010
> Status: Draft
>
> Overview:
>
>     This details an additional GETINFO option for tor controllers that would
> provide information concerning a relay's current connections.
>
> Motivation:
>
>     All Internet facing applications (tor included) are possible vectors for
> attack on the operator's system. With hundreds of connections to relatively
> unknown destinations tor is already the bane of any network based IDS, and
> unless tor can be proved infallible and bug free (which would be quite a
> feat!) it cannot be blindly trusted.
>
>     While it is impossible to guard against every potential future
> vulnerability, controllers can attempt to mitigate this threat by both
> auditing tor's behavior and providing indicators of its activity to savvy
> users. Connection related information is a useful tool for both of these
> purposes.
>
>     In terms of auditing, the following are some conditions controllers can
> check for with connection information:
>       - Persistent unestablished circuits. For instance a circuit has an
> outbound connection without a corresponding inbound counterpart. If such a
> connection was active (had substantial traffic) this would be troubling
> enough to alert the user.
>       - Relatively asymmetric traffic on circuits. Ie, if the controller
> sees 10 kb/s inbound on a circuit and 5 mb/s outbound this could be a good
> indicator that someone's using tor to issue a dos, fetch data from the local
> system, etc.
>       - Any connections to the local network when ExitPolicyRejectPrivate is
> set, indicating that tor's being used to proxy connections to the local lan.
>       - Peculiar patterns of connections, for instance numerous outbound
> connections to a single IP, or 99% of all bandwidth belonging to a single
> circuit.
>       - Scrubbed connection data limits our ability to check for obedience
> to the exit policy, but for strictly non-exit relays we can still alert the
> user if any non-relay outbound connections occur.
>
>     Of course if we're working from the assumption that tor has been
> compromised, then the information provided from the control port cannot be
> blindly trusted. Hence connection data should be verifiable against the
> system's connection querying utilities (netstat, ss, lsof, etc - which are
> more likely to be under a host based IDS, if present). This requires that
> the system's been completely compromised (elevated permissions) before
> controllers can be tricked, rather than just tor.
>
>     While automated detection is handy for detecting known behavior that
> might indicate issues, visualization gives us the possibility of finding
> much more thanks to our tinfoil hat wearing user base. A clear display of
> tor's current behavior gives assurance that tor's functioning as it should,
> plus a level of transparency desirable from anyone with even the slightest
> bit of paranoia. Tor is a guest process in the system of relay operators and
> we should not hide what it does without legitimate reason.
>
>     Another (albeit unintended) benefit of visualizing tor's behavior is
> that it becomes a helpful tool in puzzling out how tor works. For instance,
> tor spawns numerous client connections at startup (even if unused as a
> client). As a newcomer to tor these asymmetric (outbound only) connections
> mystified me for quite a while until Roger explained their use to me.
> The proposed TYPE_FLAGS would let controllers clearly label them as being
> client related, making their purpose a bit clearer.
>
>     At the moment connection data can only be retrieved via commands like
> netstat, ss, and lsof. However, fetching it via the control port provides
> several advantages:
>
>       - scrubbing for private data
>           Raw connection data has no notion of what's sensitive and what is
> not. The relay's flags and cached consensus can be used to take educated
> guesses concerning which connections could possibly belong to client or exit
> traffic, but this is both difficult and inaccurate.
>
>       - additional information
>           All connection querying commands strictly provide the ip address
> and port of connections, and nothing else. However, for auditing and
> visualization the far more interesting attributes are the connection's
> bandwidth usage, uptime, and the circuit to which it belongs.
>
>       - improved performance
>           Querying connection data is an expensive activity, especially for
> busy relays or low end processors (such as mobile devices). Tor already
> internally knows its circuits and connections, allowing for vastly quicker
> lookups.
>
>       - cross platform capability
>           The connection querying utilities mentioned above not only aren't
> available under Windows, but differ widely among different *nix platforms.
> FreeBSD in particular takes a very unique approach, dropping important
> options from netstat and assigning ss to a spreadsheet application instead.
> A controller interface, however, would provide a uniform means of retrieving
> this information.
>
> Security Implications:
>
>     The original version of this proposal left the responsibility of
> scrubbing connection data with client applications (vidalia, arm, etc).
> However, this was deemed unacceptable by Sebastian and Nick in previous
> discussions. The proposal now includes dropping the ip address/port of
> client and exit connections from the controller's response. That said, I
> think it's a mistake to drop those connections entirely since some of their
> attributes *are* of legitimate usefulness:
>
>     - Existence
>       At the very least it'd be nice if Tor indicated their existence (ie,
> I'd say "yea, an exit connection exists on this circuit but we won't tell
> you where it goes."). This would be useful, for instance, if the relay
> operator has misconfigured their firewall to block some of the outbound
> ports permitted by their exit policy (arm would show this as RELAY -> YOU ->
> UNESTABLISHED, and provide a warning to indicate the issue).
>
>     - Bandwidth
>       For auditing the most interesting attribute of connections, imho, is
> the bandwidth. If, say, 10 KB/s is coming in and 1 MB/s is going out on a
> circuit that's a good indicator that something is *very* wrong (I'd start
> suspecting a security issue, personally). If we rounded all bandwidth
> measurements (say, to the nearest KB) would this be sufficient to prevent
> entry/exits from correlating this data to attack anonymity?
>
>     - Uptime
>       If connections are being cycled abnormally quickly (say, all
> connection longevity is under thirty seconds) this could indicate the ISP
> (or other middlemen like the great firewall) are sending reset packets to
> kill the relay's attempts to make exit connections.
>
> Specification:
>
>    The following addition would be made to the control-spec's GETINFO
> section:
>
>   "conn/<Circuit identity>/<Connection identity>" -- Provides entry for the
>     associated connection, formatted as:
>       CONN_ID CIRC_ID OR_ID IP PORT L_PORT TYPE_FLAGS READ WRITE UPTIME
>
>     none of the parameters contain whitespace, and additional results must be
>     ignored to allow for future expansion. Parameters are defined as follows:
>       CONN_ID - Unique identifier associated with this connection.
>       CIRC_ID - Unique identifier for the circuit this belongs to (0 if this
>         doesn't belong to any circuit). At most there may be two connections
>         (one inbound, one outbound) with any given CIRC_ID except in the case
>         of exit connections.
>       OR_ID - Relay fingerprint, 0 if connection doesn't belong to a relay.
>       IP/PORT - IP address and port used by the associated connection, 0 if
>         connection is used for relaying client or exit traffic.
>       L_PORT - Local port used by the connection, 0 if connection is used for
>         relaying client or exit traffic.
>       TYPE_FLAGS - Single character flags indicating directionality and type
>         of the connection (consists of one from each category, may become
>         longer for future expansion).
>           Connection Directionality:
>             I: inbound, i: listening (unestablished inbound),
>             O: outbound, o: unestablished outbound
>           Usage Type:
>             C: client traffic, R: relaying traffic,
>             X: control, H: hidden service, D: directory
>           Destination:
>             T: inter-tor connection, t: outside the tor network
>         For instance, "IRt" would indicate that this was an established
>         1st-hop (or bridged) relay connection.
>       READ/WRITE - Total bytes read/written over the life of this connection.
>       UPTIME - Time the connection's been established in seconds.
>
>   "conn/all" -- Newline separated listing of all current connections.
>
>