[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: [tor-bugs] #27112 [Core Tor/Stem]: Decouple payload processing from pop/unpack + tune abstraction layers
#27112: Decouple payload processing from pop/unpack + tune abstraction layers
---------------------------+--------------------------------
Reporter: dmr | Owner: dmr
Type: enhancement | Status: needs_revision
Priority: Medium | Milestone:
Component: Core Tor/Stem | Version:
Severity: Normal | Resolution:
Keywords: client | Actual Points:
Parent ID: | Points:
Reviewer: atagar | Sponsor:
---------------------------+--------------------------------
Comment (by dmr):
Sorry again for the delay in my response, and for the verbosity...
I think it's probably best that I step back a bit to explain a bit more of
the broader picture as I see it.
The changes in that branch were indeed messy, but I think they did push
towards the vision I have. While I don't have the "end design" quite
figured out (hence most of the messiness), I think it would help if I
clarify at least an intermediate part of my vision. Thereafter, I can
respond a bit more specifically to your comments.
I see the following current limitations in stem.client's design:
1. dropping multiplexed cells or (worse) throwing exceptions when they
occur
2. having `unpack()`/`pop()` methods that effectively won't work on the
stream of data received from a guard, if there's //any// RELAY/RELAY_EARLY
cells in there
3. allowing only a single layer of decryption, instead of an arbitrary
number of layers
4. being limited to only a single "actor" that may send/process Cells
(related to `1.`)
There are more, but the above are the main ones that I was trying to
address with this ticket.
I felt my changes took a pretty big step forward towards these (not
completed), but I feel the current master has not - mostly taking only
some abstraction-layer tuning and small fixes.
I'm not dead-set on any given design, but I hope the following helps
explain why I was trending the way I was.
==== Limitation 1: dropping multiplexed cells or (worse) throwing
exceptions when they occur
By this I mean: Tor is multiplexed by nature. Most cell sequences on the
wire can be interleaved with others (the spec calls out just a few
exceptions).
This should be fairly self-evident by the ability to have multiple
circuits and streams open concurrently, but I wanted to call it out
explicitly.
This is related to `4.` but goes beyond it - any hop on any of our
circuits may send a Cell to us - it does not have to be a response in part
of a sequence.
The client needs some way to handle this - i.e. to demultiplex.
**My proposed solution** to this for stem is to centralize ORPort reads -
separating that from more specific handlers. A central point (probably a
dedicated thread) should be the only thing reading from the ORport socket.
Sorting and handling of incoming cells probably will be facilitated by
queues (not there yet).
(This ticket doesn't aim to centralize that, but rather take a step
towards making that possible.)
==== Limitation 2: having `unpack()`/`pop()` methods that effectively
won't work on the stream of data received from a guard, if there's //any//
RELAY/RELAY_EARLY cells in there
Because our `Cell.unpack()`/`Cell.pop()` methods will directly create an
interpreted RelayCell, from the payload sent over the wire (which, per the
spec, is always encrypted), these methods effectively can't be used on
incoming bytes that may have a RELAY/RELAY_EARLY cell.
(Said a different way, interpreting an encrypted payload as if it were
unencrypted doesn't get us the data we want, and has plenty of potential
to throw an exception.)
Since most of a tor client's useful operation involves relayed Cells,
these methods in their current form don't quite make sense to me.
The `Cell.by_value()` lookup method seemed like a natural place to adjust
the returned class, but I'm not tied to changing that if you'd suggest
another method.
To make these methods more universally useful (e.g. to use `Cell.unpack()`
in whatever solution fixes `1.`), **my proposed solution** for this is to
unpack RELAY cells into some intermediate form that allows parsing the
payload but not interpreting it. This was the basis for the
`RawRelayCell`.
I'm not tied to `RawRelayCell`, but I did see it as an interim way to
solve this. (I'm still trying to figure out the best hierarchy for
RELAY/RELAY_EARLY Cell classes.)
==== Limitation 3: allowing only a single layer of decryption, instead of
an arbitrary number of layers
Right now, the `Cell.decrypt()` method creates a `RelayCell`, meaning that
it suffers from the same problems as `unpack()`/`pop()` does, for use in
multi-hop circuits.
We need a way to arbitrarily decrypt the payload without creating a
"Exception hazard" (here: RelayCell) automatically. When decrypting, we at
least need to parse and look at `'recognized'`, and per the spec
[[https://gitweb.torproject.org/torspec.git/tree/tor-
spec.txt?id=43c2a65a28ea671cd627df9ea5e89f371825a6cd#n1548|only consider a
cell decrypted if we properly also confirm the digest]]:
> If the digest is correct, the cell is considered "recognized" for the
purposes of decryption (see section 5.5 above).
Maybe something unclear here is this: //any// relay in our circuit may
send us a Cell, so we can't just assume an encrypted cell is from the
final hop - we need to iteratively decrypt until we can tell it is
decrypted. Only at that point do we know //which// relay sent it to us.
`RawRelayCell` also served that purpose - it was just a `Cell` container
for the payload.
Using this intermediary allowed a fairly nice API for multi-hop
decryption, without the consumer needing to know specifics. I just want to
point out the
[[https://gitweb.torproject.org/stem.git/commit/?id=5baa2e37728ab50a5132d81225070e097e6bd058|example
code that I had in a commit message]]. Here's the main snippet:
{{{
# relay_1 := nearest relay in circuit
decryption_states = [
(relay_1, digest_1, decryptor_1),
(relay_2, digest_2, decryptor_2),
(relay_3, digest_3, decryptor_3),
]
new_decryption_states = copy.deepcopy(decryption_states)
from_relay = None
for i in range(len(new_decryption_states)):
relay, digest, decryptor = new_decryption_states[i]
cell, fully_decrypted, digest, decryptor = cell.decrypt(digest,
decryptor)
new_decryption_states[i] = (relay, digest, decryptor)
if fully_decrypted:
from_relay = relay
break
if from_relay:
decryption_states = new_decryption_states
else:
# handle this, since the cell couldn't be decrypted
# probably raise Something()
pass
}}}
I don't think we want the `Cell.decrypt()` method to handle anything more
than a //layer// of decryption. I believe the Circuit should maintain the
crypto state, be told whether the cell is fully decrypted, and know what
to do if fully decrypted or not.
So if `decrypt()` doesn't return `cell, fully_decrypted, digest,
decryptor` (where `cell` is typically a raw cell), I think it should at
least return `payload, fully_decrypted, digest, decryptor`. These are
practically the same except the latter is lower-level (arguably leaky
abstraction) with an added need to create a `Cell` from that payload - a
step that could be reduced - and likely a static method instead of an
instance method. (An instance method, to me, seems more natural.)
In other words, **my proposed solution** for this is to have `decrypt()`:
* callable for an abitrary number of encryption layers
* //not// cause Exceptions
* //not// return an interpreted Cell by default
==== Limitation 4: being limited to only a single "actor" that may
send/process Cells (related to `1.`)
All of the stem.client code currently locks the ORPort, sends cells, and
receives cells, and releases the lock.
This is related to `1.`, but different (hopefully I can explain this ok).
Given limitation `1.`, it's not //that// bad if only a single actor makes
use of stem.client.
But this limitation means that stem.client only allows a single
synchronous message stream, and doesn't have good ways to support longer-
term streams.
Again, **my proposed solution** for this is for stem is to centralize
ORPort reads, as well as writes.
(This ticket doesn't aim to centralize that, but rather take a step
towards making that possible.)
==== Summary
I'm not tied to any specific solutions, and definitely open to suggestion.
I hope the above at least explains what I'm trying to solve in this
ticket.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/27112#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs