Re: Proposal: GETINFO controller option for connection information

opened a ticket

On Tue, Jun 29, 2010 at 8:01 AM, Nick Mathewson <nickm@xxxxxxxxxxxxx> wrote:

On Mon, Jun 28, 2010 at 6:08 PM, Paul Syverson
<syverson@xxxxxxxxxxxxxxxx> wrote:
> On Mon, Jun 28, 2010 at 05:59:07PM -0400, Nick Mathewson wrote:
>> On Thu, Jun 24, 2010 at 1:34 AM, Damian Johnson <atagar1@xxxxxxxxx> wrote:
>> > Hi Nick. Thanks for the comments!
>> >
>> >> * IN_TYPE/OUT_TYPE talk about the type of an inbound/outbound
>> >> "connection." Do you mean circuits, or connections on the circuits?
>> >> Either way I'm confused. For example, a control connection is never
>> >> attached to a circuit at all.
>> >
>> > Yea, that isn't really appropriate and was making the spec messier than it
>> > needed to be. Replaced with a single TYPE parameter to indicate the
>> > placement in the circuit (guard/bridge, relay, exit, or one-hop in case
>> > they're allowing them).
>>
>> Hm. But we don't necessarily know this. Our "are we client-facing"
>> tests are approximate, not certain, and the only way to tell whether
>> we're intermediate or exiting is to wait and see if we're told to
>> exit. In fact, the leaky-pipe topology means that we're potentially
>> intermediate _and_ exiting on a single circuit.
>
> Wah. I know I'm well out of the development loop, but is leaky-pipe
> topology ever currently used and if so for what?

Well, I said "potentially". ;) The servers support it, but I don't
believe we use it. If we did, it would probably be for fetching
directory info from a guard that happens also to be a cache, or
something like that.

--
Nick

Filename: xxx-circ-getinfo-option.txt
Title: GETINFO controller option for circuit information
Author: Damian Johnson
Created: 03-June-2010
Status: Draft

Overview:

    This details an additional GETINFO option that would provide information
    concerning a relay's current circuits.

Motivation:

    The original proposal was for connection related information, but Jake made
    the excellent point that any information retrieved from the control port
    is...

      1. completely ineffectual for auditing purposes since either (a) these
         results can be fetched from netstat already or (b) the information
         would only be provided via tor and can't be validated.

      2. The more useful uses for connection information can be achieved with
         much less (and safer) information.

    Hence the proposal is now for circuit based rather than connection based
    information. This would strip the most controversial and sensitive data
    entirely (ip addresses, ports, and connection based bandwidth breakdowns)
    while still being useful for the following purposes:

    - Basic Relay Usage Questions
      How is the bandwidth I'm contributing broken down? Is it being evenly
      distributed or is someone hogging most of it? Do these circuits belong to
      the hidden service I'm running or something else? Now that I'm using exit
      policy X am I desirable as an exit, or are most people just using me as a
      relay?

    - Debugging
      Say a relay has a restrictive firewall policy for outbound connections,
      with the ORPort whitelisted, but the operator doesn't realize that tor
      needs random high ports. Tor would report success ("your orport is
      reachable - excellent") yet the relay would be nonfunctional. This
      proposed information would reveal numerous RELAY -> YOU -> UNESTABLISHED
      circuits, giving a good indicator of what's wrong.

    - Visualization
      A nice benefit of visualizing tor's behavior is that it becomes a helpful
      tool in puzzling out how tor works. For instance, tor spawns numerous
      client connections at startup (even if unused as a client).
      As a newcomer to tor these asymmetric (outbound only) connections
      mystified me for quite a while until Roger explained their use to me.
      The proposed TYPE_FLAGS would let controllers clearly label them as being
      client related, making their purpose a bit clearer.

    At the moment connection data can only be retrieved via commands like
    netstat, ss, and lsof. However, providing an alternative via the control
    port provides several advantages:

    - scrubbing for private data
      Raw connection data has no notion of what's sensitive and what is not.
      The relay's flags and cached consensus can be used to take educated
      guesses concerning which connections could possibly belong to client or
      exit traffic, but this is both difficult and inaccurate. Anything
      provided via the control port can be scrubbed to make sure we aren't
      providing anything we think relay operators should not see.

    - additional information
      All connection querying commands strictly provide the ip address and port
      of connections, and nothing else. However, for the uses listed above the
      far more interesting attributes are the circuit's type, bandwidth usage
      and uptime.

    - improved performance
      Querying connection data is an expensive activity, especially for busy
      relays or low end processors (such as mobile devices). Tor already
      internally knows its circuits, allowing for vastly quicker lookups.

    - cross platform capability
      The connection querying utilities mentioned above not only aren't
      available under Windows, but differ widely among different *nix
      platforms. FreeBSD in particular takes a unique approach, dropping
      important options from netstat and assigning ss to a spreadsheet
      application instead. A controller interface, however, would provide a
      uniform means of retrieving this information.

Security Implications:

    This is an open question. This proposal lacks the most controversial pieces
    of information (ip addresses and ports), and insight into potential threats
    this would pose would be very welcome!
Specification:

    The following addition would be made to the control-spec's GETINFO section:

    "rcirc/id/<Circuit identity>" -- Provides entry for the associated relay
      circuit, formatted as:
        CIRC_ID=<circuit ID> CREATED=<timestamp> UPDATED=<timestamp>
          TYPE=<flag> READ=<bytes> WRITE=<bytes>

      None of the parameters contain whitespace, and additional results must be
      ignored to allow for future expansion. Parameters are defined as follows:
        CIRC_ID - Unique numeric identifier for the circuit this belongs to.
        CREATED - Unix timestamp (as seconds since the Epoch) for when the
                  circuit was created.
        UPDATED - Unix timestamp for when this information was last updated.
        TYPE    - Single character flag indicating the positioning in the
                  circuit:
                    C: client facing (first hop / bridge)
                    M: intermediate
                    E: exiting
                    B: both client facing and exiting
        READ    - Total bytes transmitted toward the exit over the circuit.
        WRITE   - Total bytes transmitted toward the client over the circuit.

    "rcirc/all" -- The 'rcirc/id/*' output for all current circuits, joined by
      newlines.

    The following would be included for circ info update events.

    4.1.X. Relay circuit status changed

      The syntax is:
        "650" SP "RCIRC" SP CircID SP Notice [SP Created SP Updated SP Type
          SP Read SP Write] CRLF

        Notice =
          "NEW"    / ; first information being provided for this circuit
          "UPDATE" / ; update for a previously reported circuit
          "CLOSED"   ; notice that the circuit no longer exists

      Notice indicating that queryable information on a relay related circuit
      has changed. If the Notice parameter is either "NEW" or "UPDATE" then
      this provides the same fields that would be given by calling
      "GETINFO rcirc/id/" with the CircID.
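
To make the entry format above concrete, here is a rough sketch of how a controller might parse an "rcirc/id/*" entry and the "rcirc/all" listing. This is only an illustration of the draft as written, not code from tor or arm: the RelayCircuit class and the parse_rcirc_entry / parse_rcirc_all helpers are made-up names, and the sample values are invented.

```python
from dataclasses import dataclass

# TYPE flag meanings as listed in the draft specification.
TYPE_FLAGS = {
    "C": "client facing",
    "M": "intermediate",
    "E": "exiting",
    "B": "both client facing and exiting",
}


@dataclass
class RelayCircuit:
    """Hypothetical container for one proposed rcirc entry."""
    circ_id: int
    created: int      # Unix timestamp
    updated: int      # Unix timestamp
    type_flag: str    # one of TYPE_FLAGS
    read_bytes: int   # bytes sent toward the exit
    write_bytes: int  # bytes sent toward the client


def parse_rcirc_entry(line: str) -> RelayCircuit:
    """Parse one whitespace separated KEY=VALUE entry."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip anything that is not KEY=VALUE
            fields[key] = value

    if fields["TYPE"] not in TYPE_FLAGS:
        raise ValueError("unrecognized TYPE flag: %s" % fields["TYPE"])

    # Unknown keys are left unused, since the draft says additional
    # results must be ignored to allow for future expansion.
    return RelayCircuit(
        circ_id=int(fields["CIRC_ID"]),
        created=int(fields["CREATED"]),
        updated=int(fields["UPDATED"]),
        type_flag=fields["TYPE"],
        read_bytes=int(fields["READ"]),
        write_bytes=int(fields["WRITE"]),
    )


def parse_rcirc_all(text: str) -> list:
    """Parse 'rcirc/all' output: one entry per line."""
    return [parse_rcirc_entry(l) for l in text.splitlines() if l.strip()]
```

With entries parsed this way, the "is someone hogging most of it?" question from the motivation becomes a matter of comparing read_bytes + write_bytes across circuits.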

Filename: xxx-getinfo-option-expansion.txt
Title: GETINFO Option Expansion
Author: Damian Johnson
Created: 02-June-2010
Status: Draft

Overview:

    Over the course of developing arm there have been numerous hacks and
    workarounds to glean pieces of basic, desirable information about the tor
    process. As per Roger's request I've compiled a list of these pain points
    to try and improve the control protocol interface.

Motivation:

    The purpose of this proposal is to expose additional process and relay
    related information that is currently unavailable in a convenient,
    dependable, and/or platform independent way. Examples of this are...

    - The relay's total contributed bandwidth. This is a highly requested piece
      of information and, based on the following patch from pipe, looks trivial
      to include.
      http://www.mail-archive.com/or-talk@xxxxxxxxxxxxx/msg13085.html

    - The process ID of the tor process. There is a high degree of guesswork
      in obtaining this. Arm for instance uses pidof, netstat, and ps yet still
      fails on some platforms, and Orbot recently got a ticket about its own
      attempt to fetch it with ps:
      https://trac.torproject.org/projects/tor/ticket/1388

    This just includes the pieces of missing information I've noticed
    (suggestions or questions of their usefulness are welcome!).

Security Implications:

    None that I'm aware of. From a security standpoint this seems decently
    innocuous.

Specification:

    The following additions would be made to the control-spec's GETINFO
    section:

    "relay/bw-limit" -- Effective relayed bandwidth limit.

    "relay/burst-limit" -- Effective relayed burst limit.

    "relay/read-total" -- Total bytes relayed (download).

    "relay/write-total" -- Total bytes relayed (upload).

    "relay/flags" -- Space separated listing of flags currently held by the
      relay as reported by the currently cached consensus.

    "process/user" -- Username under which the tor process is running,
      providing an empty string if none exists.

    "process/pid" -- Process id belonging to the main tor process, -1 if none
      exists for the platform.

    "process/uptime" -- Total uptime of the tor process (in seconds).

    "process/uptime-reset" -- Time since last reset (startup, sighup, or RELOAD
      signal, in seconds).

    "process/descriptors-used" -- Count of file descriptors used.

    "process/descriptor-limit" -- File descriptor limit (getrlimit results).

    "ns/authority" -- Router status info (v2 directory style) for all
      recognized directory authorities, joined by newlines.

    "state/names" -- A space-separated list of all the keys supported by this
      version of Tor's state.

    "state/val/<key>" -- Provides the current state value belonging to the
      given key. If undefined, this provides the key's default value.
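
As a rough illustration of how a controller might consume a few of the options listed above, the sketch below parses a GETINFO reply and applies the conventions from this draft (-1 for a missing pid, space separated flags, uptime in seconds). It assumes the usual single-line control-spec reply form ("250-key=value" lines followed by "250 OK") and does not handle multi-line "250+" data replies. The function names and the sample reply are hypothetical.

```python
def parse_getinfo_reply(reply: str) -> dict:
    """Collect key/value pairs from single-line '250-key=value' replies."""
    values = {}
    for line in reply.splitlines():
        if line.startswith("250-"):
            key, sep, value = line[4:].partition("=")
            if sep:
                values[key] = value
    return values


def interpret(values: dict) -> dict:
    """Apply the draft's conventions to the raw strings (assumed semantics)."""
    result = {}
    if "process/pid" in values:
        pid = int(values["process/pid"])
        # The draft uses -1 when the platform has no pid to report.
        result["pid"] = None if pid == -1 else pid
    if "relay/flags" in values:
        # Flags are a space separated listing.
        result["flags"] = values["relay/flags"].split()
    if "process/uptime" in values:
        # Uptime is reported in seconds.
        result["uptime"] = int(values["process/uptime"])
    return result
```

For example, feeding it a made-up reply for "GETINFO process/pid relay/flags process/uptime" would give a dict with an integer pid (or None), a list of flag names, and the uptime in seconds.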