I've inserted a couple of corrections and clarifications below. I'm leaving original text fully quoted for the archive. Mike Perry: > Hello all, > > What follows is a summary of the primitives that Marc Juarez aims to > implement for his Google Summer of Code project on prototyping defenses > for Website Traffic Fingerprinting and follow-on research. > > After a review of Tamaraw[1], Adaptive Padding[2], CS-BuFLO[3], and the > Supersequence[4] work, as well as some discussion at the Tor dev > meeting, we came up with this list of padding message primitives. > > These functions are meant to be sent as command cells from one endpoint > to the other. Some are bidirectional commands, others can only be sent > in one direction (client to relay, or relay to client). > > For now, they will be implemented as ad-hoc messages between two > modified obfsproxy (called wfpadtools) instances. The source code for > this fork lives here for the moment: > https://bitbucket.org/mjuarezm/obfsproxy-wfpadtools/ > > Long term, I am thinking that all of these should be specified as if > they were their own RELAY_* cell type from tor-spec.txt: > https://gitweb.torproject.org/torspec.git/blob/HEAD:/tor-spec.txt#l1215 > > This means that padding would be a circuit-level property, and it would > be possible to send it to and from any hop in the circuit, due to > leaky-pipe topology (using the "recognized" field). In a world of very > cheap and excessive middle and Guard node bandwidth, we would run this > padding to the middle node. For the wfpadtools prototype, it will > obviously only cover the first hop. > > Here's the list of primitives, broken down by research paper: > > Generic Messages (common to all defenses): > RELAY_IGNORE() > Description: > Simple fixed-length (CELL_SIZE) padding cell. > Direction: > Bidirectional (client to relay, relay to client) > Parameters: > None > > RELAY_SEND_PADDING(N,t) > Description: > Send the requested number of padding cells in response. > Direction: > Client to relay > Parameters: > N: > Number of padding cells to send in response to this cell > t: > microseconds delay before sending > > RELAY_APP_HINT(session_id, status) > Description: > A hint from the application layer for session start/stop > Direction: > Client to relay > Parameters: > session_id: > A string identifying the session (ie keyed hash of > url bar domain) > status: > "Start" or "Stop", indicating session start and end. > > > Adaptive Padding Messages: > RELAY_BURST_HISTOGRAM(histogram[], labels_ms[], remove_toks, fuzz, when) > Description: > Specifies a histogram that encodes a delay distribution > representing the probability of sending a single padding packet > after a given delay in response to either an upstream cell, or a > client-originating cell. > Direction: > Client to relay > Parameters: > histogram[]: > Contains delay distribution of sending an IGNORE packet > after sending a real packet > labels_ms[]: > millisecond labels for the bins (with "Infinity"/"Ignore" bin to > allow encoding the probability of not sending any padding packet in > response to this packet). In both of the Adaptive Padding primitives, a considerably more compact form is possible for bin labeling. The original Adaptive Padding literature used a 2^K exponential spacing between these bins, but didn't rigorously analyze the choice. If experimentation finds this is indeed reasonable/optimal, we can send a single parameter W instead of a labels_ms. W would be bin width in milliseconds, and L is the length of the histogram[] array. The labels would then be computed as W*2^K with K ranging from 0 to L-2, with the addition of a zero-delay bin label. For the prototype though, I think we should allow the labeling to remain a free-form array, for maximum flexibility in research. We may decide we want some form of polynomial spacing instead of exponential here, for example. > remove_toks: > If true, follow Adaptive Padding token removal rules. > If false, histograms are immutable. > fuzz: > If true, randomize the delay uniformly between bin labels. > If false, use bin labels as exact delay values. "fuzz" here might be better called "interpolate" in both this and the below primitive, since linear interpolation is what I mean here. I suppose we could imagine smoother types of interpolation, like quadratic, cubic, etc, but I don't expect the actual interpolation mechanism to matter a whole lot. I expect it to be most useful for preventing the adversary from subtracting the padding due to its arrival at a fixed delay quantity after another packet. > when: > If set to "receive", this histogram governs the probably of sending > a padding packet after some delay in response to a packet > originating from. If set to "send", this histogram governs padding ^ | insert "the client" after that "from" > packets that are transmitted after a packet arrives from upstream > (the middle node). In both cases, the padding packet is sent in the > direction of the client. > > RELAY_GAP_HISTOGRAM(histogram[], labels_ms[], remove_toks, fuzz, when) > Description: > Specifies a histogram that encodes a delay distribution > representing the probability of sending a single additional padding > packet after a given delay following a padding packet that > originated at this hop. > Direction: > Client to relay > Parameters: > histogram[]: > Contains delay distribution of sending an IGNORE packet after sending > an IGNORE packet > labels_ms[]: > millisecond labels for the bins (with "Infinity"/"Ignore" bin to > allow encoding the probability of not sending any padding packet in > response to this packet). > remove_toks: > If true, follow Adaptive Padding token removal rules. > If false, histograms are immutable. > fuzz: > If true, randomize the delay uniformly between bin labels. > If false, use bin labels as exact delay values. > > when: > If "receive", this histogram applies to locally-inserted padding > packets that were initially sent in response to client-originated > data. If "send", this histogram applies to packets sent in response > to locally-inserted padding packets sent in response to upstream > data. Note that this means that implementations must maintain this > metadata as internal state as the system transitions from > BURST_HISTOGRAM initiated padding into GAP_HISTOGRAM initiated > padding. In both cases, the padding packet is sent in the direction > of the client. > > CS-BuFLO Messages: > RELAY_TOTAL_PAD(session_id, t): > Description: > Requests that endpoint pad all batches to nearest 2^K cells total > or until APP_HINT(session_id, stop) > Direction: > Client to relay > Parameters: > session_id: > The session ID from APP_HINT() > t: > The number of microseconds to wait between cells to consider them > part of the same batch. > > RELAY_PAYLOAD_PAD(session_id, t, R): > XXX: The paper was not clear enough for me to understand this primitive > > > Tamaraw: > RELAY_BATCH_PAD(session_id, L, t) > Description: > Requests that endpoint pad all batches of cells to L cells total > or until APP_HINT(session_id, stop) > Direction: > Client to relay > Parameters: > session_id: > The session ID from APP_HINT() > L: > The multiple of cells to pad to. > t: > The number of microseconds to wait between cells to consider them > part of the same batch. > > > > Comments, questions, and suggestions are welcome! In particular, I'm not > sure I got either the CS-Buflo or the tamaraw primitives completely > correct. > > > > > 1. http://cacr.uwaterloo.ca/techreports/2013/cacr2013-30.pdf > 2. http://freehaven.net/anonbib/cache/ShWa-Timing06.pdf > 3. http://arxiv.org/abs/1401.6022 > 4. http://cacr.uwaterloo.ca/techreports/2014/cacr2014-05.pdf > > > -- > Mike Perry > _______________________________________________ > tor-dev mailing list > tor-dev@xxxxxxxxxxxxxxxxxxxx > https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev -- Mike Perry
Attachment:
signature.asc
Description: Digital signature
_______________________________________________ tor-dev mailing list tor-dev@xxxxxxxxxxxxxxxxxxxx https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev