Re: [tor-bugs] #5023 [Pluggable transport]: Morpher pluggable transport: Select algorithm for packet size morphing
#5023: Morpher pluggable transport: Select algorithm for packet size morphing
---------------------------------+------------------------------------------
Reporter: asn | Owner: asn
Type: task | Status: new
Priority: normal | Milestone:
Component: Pluggable transport | Version:
Keywords: | Parent: #4680
Points: | Actualpoints:
---------------------------------+------------------------------------------
Comment(by asn):
(Let's rename 'random sampling' to 'direct sampling' from now on:)
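To keep the two schemes straight, here is a minimal Python sketch of what each one samples from. The function names, the dict-based morphing matrix, and all the sizes are illustrative assumptions, not Morpher's actual code:

```python
import random

def direct_sample(target_sizes, target_probs, rng=random):
    # 'Direct sampling': draw each output size i.i.d. from the target
    # protocol's marginal packet-size distribution, ignoring the size
    # of the source packet.
    return rng.choices(target_sizes, weights=target_probs, k=1)[0]

def matrix_sample(source_size, matrix, target_sizes, rng=random):
    # 'Traffic morphing': draw from the morphing matrix column for the
    # current source packet size, i.e. from P(target | source).
    return rng.choices(target_sizes, weights=matrix[source_size], k=1)[0]

# Toy usage with made-up sizes and probabilities:
sizes = [79, 586, 1460]
target = direct_sample(sizes, [0.5, 0.3, 0.2])
matrix = {586: [0.7, 0.1, 0.2]}  # column for source size 586
target = matrix_sample(586, matrix, sizes)
```

The only structural difference is that morphing conditions on the source size, which is exactly what makes it win or lose depending on how the two distributions line up.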
The attachments in #4680 raise at least two questions:
a) It seems that in the Server->Client case
(attachment:500000_sc.png:ticket:4680), 'traffic morphing' actually
delivers: it results in about a quarter of the overhead of the 'direct
sampling' method.
The (not-so-)funny thing is that in the Client->Server case
(attachment:500000_cs.png:ticket:4680), 'traffic morphing' actually causes
more overhead than 'direct sampling' does!
My guess is that this happens because in the C->S case, HTTPS assigns
high probability to small packet sizes
(attachment:https_cs.png:ticket:4680), so the morphing matrix tends to
split off a small packet, and the remainder then gets padded up to 1460.
Example from the logs (attachment:cs_log.txt:ticket:4680):
{{{
sampling: Got packet size 586. We must morph it to 5. Splitting to 581 and
sending the first part...
sampling: Got packet size 581. We must morph it to 1459. Padding with 878
and sending.
morphing: Got packet size 586. We must morph it to 127. Splitting to 459
and sending the first part...
morphing: Got packet size 459. We must morph it to 123. Splitting to 336
and sending the first part...
morphing: Got packet size 336. We must morph it to 1459. Padding with 1123
and sending.
264: OVERHEAD ROUND SUMMARY: Sampling: 928 : Morphing: 1223
264: OVERHEAD ROUND SUMMARY: Morpher lost (295)
}}}
Morpher gets a packet of 586 bytes to morph and splits it into 127+459.
It sends the 127 bytes, splits the 459-byte part into 123+336, sends the
123 bytes, and pads the 336-byte part to 1459. This results in an
overhead of 1223 bytes. Direct sampling did better simply because its
random draw happened to be luckier.
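The split-or-pad behaviour in these logs can be sketched as follows. This is my reconstruction from the log lines above, not Morpher's actual code; in particular, the 50-byte cost per extra packet is an assumption inferred from the round summaries (e.g. 878 bytes of padding + 1 extra packet * 50 = 928, and 1123 + 2 * 50 = 1223):

```python
# Reconstruction (an assumption, not Morpher's code) of the
# split-or-pad loop suggested by the log lines above.
EXTRA_PACKET_COST = 50  # per additional packet sent; inferred from the
                        # round summaries, presumably record/header bytes

def morph_one(size, next_target):
    """Morph one source packet; return (sent_sizes, overhead_bytes)."""
    sent = []
    padding = 0
    remaining = size
    while remaining > 0:
        target = next_target(remaining)
        if target < remaining:
            # Split: send `target` bytes now, keep morphing the rest.
            sent.append(target)
            remaining -= target
        else:
            # Pad the remainder up to the target size and send it.
            sent.append(target)
            padding += target - remaining
            remaining = 0
    return sent, padding + EXTRA_PACKET_COST * (len(sent) - 1)
```

Replaying the 'morphing' targets from the log (127, 123, 1459) on a 586-byte packet reproduces the 1223-byte overhead, and the 'sampling' targets (5, 1459) reproduce 928.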
On the other hand, in the S->C case, Tor seems to output small packets
often enough (attachment:tor_sc.png:ticket:4680) [0] that they line up
nicely with the small-packet probabilities of HTTPS
(attachment:https_sc.png:ticket:4680), and the results are:
{{{
sampling: Got packet size 65. We must morph it to 1413. Padding with 1348
and sending.
morphing: Got packet size 65. We must morph it to 79. Padding with 14 and
sending.
5: OVERHEAD ROUND SUMMARY: Sampling: 1348 : Morphing: 14
5: OVERHEAD ROUND SUMMARY: Morpher won (1334)
}}}
{{{
sampling: Got packet size 11. We must morph it to 1379. Padding with 1368
and sending.
morphing: Got packet size 11. We must morph it to 53. Padding with 42 and
sending.
6: OVERHEAD ROUND SUMMARY: Sampling: 1368 : Morphing: 42
6: OVERHEAD ROUND SUMMARY: Morpher won (1326)
}}}
so in such cases, 'traffic morphing' beats stupid 'direct sampling'
easily.
And even when the original packet size is large, something like this
happens:
{{{
sampling: Got packet size 1460. We must morph it to 1379. Splitting to 81
and sending the first part...
sampling: Got packet size 81. We must morph it to 894. Padding with 813
and sending.
morphing: Got packet size 1460. We must morph it to 1414. Splitting to 46
and sending the first part...
morphing: Got packet size 46. We must morph it to 81. Padding with 35 and
sending.
7: OVERHEAD ROUND SUMMARY: Sampling: 863 : Morphing: 85
7: OVERHEAD ROUND SUMMARY: Morpher won (778)
}}}
So the question is: what do we do? Should we use morphing matrices in the
S->C direction and direct sampling in C->S? Or direct sampling in both
directions?
----
b) The other question is: even if we select the minimum-overhead scheme
in each direction, are we happy with this kind of overhead? Looking at
the plot of 500 packets (attachment:500_cs.png:ticket:4680) in the C->S
case, we get approx. 70k bytes of overhead per 100 packets [1]. In the
S->C case (attachment:500_sc.png:ticket:4680) we see approx. 100k bytes
of overhead using direct sampling, and 70k bytes using morphing matrices,
per 100 packets.
Are we happy with this?
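As a back-of-the-envelope check on those numbers (my arithmetic, using the 1460-byte maximum packet from the logs as the reference point):

```python
# ~70k bytes of overhead per 100 source packets works out to about
# 700 bytes of padding/splitting overhead per packet, i.e. almost
# half of a full 1460-byte packet wasted, on average.
overhead_per_packet = 70_000 / 100            # 700.0 bytes
fraction_of_full_packet = overhead_per_packet / 1460  # ~0.48
```

So even the better scheme is paying nearly half an MTU-sized packet per source packet.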
Some things to note:
a) The morphing matrices might be wrong. I'm the only one who has reviewed
morpher or the matrices.
b) The probability distributions might be biased. We know they were
captured in Chicago, but since the capture filtered by TCP port, there
may be lots of non-Tor traffic mixed in. Also see [0].
c) The tests/graphs might be wrong. I wrote the tests, and I sometimes
introduce bugs into my code.
So apart from the two questions above, we should try to resolve these
last three points as well.
----
[0]: Hmm, why is this? Why do Tor relays send small packets to clients,
when the opposite doesn't happen nearly as much? Do Tor relays use
variable-sized cells more often than clients do (note that the v3 link
handshake didn't exist when these packets were captured)? Or is it just a
coincidence that happens when big TLS records get split, and it simply
happened more to relays during the packet capture?
[1]: 100 actual packets; packets split during morphing don't count.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5023#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs