
Preprocessing cipher input for plausible deniability in human communications



I just posted the following to sci.crypt, and since I mention Mixminion
I thought it might also be of interest here...

Cryptography often finds use in textual messaging systems, for example,
email encryption and instant messenger communications.  In this
context, given most practical encryption systems, an attacker will not
know exactly what a given message contains, but he can likely identify
the vast majority of possible decryptions as incorrect, because the
vast majority of candidate messages will not be remotely plausible (not
valid text in a language shared by the communicators).

Depending on the cipher used, its mode, its keysize, and the amount of
input encrypted with a key, it is quite possible that there is only a
single key which produces human-readable text. The only ciphers I am
aware of which provide an assurance of a large number of plausible
human-text decryptions are ones where every key defines a unique
bijective mapping from the input space to the output space, and where
the keysize is equal to the size of the input message. We refer to such
systems as one-time pads, and they are usually implemented by simply
xoring the input with the key. Although other schemes are possible,
they all share the requirement of far too much key material for most
uses.
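
For illustration, here is a tiny Python sketch of that one-time-pad
property (the messages and key below are of course made up): for any
ciphertext, every plaintext of the same length is reachable under some
key, so the ciphertext alone rules nothing out.

    import os

    # xor one-time pad: any plaintext of the right length is a "valid"
    # decryption of a given ciphertext under some key.
    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    real_plain  = b"attack at dawn"
    real_key    = os.urandom(len(real_plain))
    ciphertext  = xor_bytes(real_plain, real_key)

    # Pick any other plaintext of the same length...
    cover_plain = b"buy more milk!"
    # ...and there is a key that "decrypts" the ciphertext to it.
    cover_key   = xor_bytes(ciphertext, cover_plain)
    assert xor_bytes(ciphertext, cover_key) == cover_plain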

This presents an annoying problem: it can often be difficult for a
sender to deny that he sent a specific blob of encrypted output, unless
he has a secure channel (in which case, why does he need encryption?).
In some applications the encrypted transmission is very public: there
is no doubt about who created the message.

An attacker may somehow be able to find a key which decodes the message
by using 'rubber-hose' cryptanalysis against the recipient, by
brute-forcing the probable space of user-provided passwords, by
exploiting bugs in the software that cause poor key choice, or via
other means.

With the practical encryption schemes in use today, once an attacker
has obtained a key which decodes the message, he can easily use that
key to turn whatever confidence there is that the sender sent the
encrypted message into confidence that the sender sent that particular
plaintext.

Additionally, if a sender is forced to give up his key or face death,
he will be forced to disclose the real key because no other key will
produce a remotely plausible message. Thus, there is a substantial
incentive for an attacker to use this approach.

Below I will propose a system which addresses these issues within a
limited domain.

Let us consider a communication system which exchanges complete
messages of human-readable text as encrypted, fixed-size messages of N
bits.  Fixed-size messages are used in some private communication
systems because they are the only proven method of avoiding statistical
analysis of message size. I am not sure whether my idea can easily be
extended to non-fixed-size systems, so I will exclude them here.
An example of a system using fixed-size messages is Mixminion
(http://www.mixminion.net/).

If we consider the set of all possible messages of N bits or less
(hereafter called set M), it follows that we can index each message
with a number roughly N bits in size.

We define function T to be a bijective mapping from the set M onto set
M such that messages are ordered by their probability in the context of
the communication system (for example, if the system is exchanging
English text, "I went to the park." will be strictly before
"wr9ir4Pi;49 9irw9i3i$jRwe").  Because of the size of M it is not
possible to produce an optimal function T, but we can approximate it.
More on this below.
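
As a toy picture of what T does, here is a Python sketch over a message
space small enough to enumerate (the letter-frequency scoring is a
made-up probability proxy, purely for illustration):

    from itertools import product

    # Toy T over all 3-character strings from a tiny alphabet: rank
    # every message by a crude probability score so that plausible
    # strings get small indices.  The real M is far too large for this.
    alphabet = " etaoinshr"                 # a few common characters
    freq     = {c: len(alphabet) - i for i, c in enumerate(alphabet)}

    def score(msg):                         # made-up probability proxy
        return sum(freq[c] for c in msg)

    space  = [''.join(p) for p in product(alphabet, repeat=3)]
    ranked = sorted(space, key=score, reverse=True)

    T     = {m: i for i, m in enumerate(ranked)}   # message -> index
    T_inv = {i: m for m, i in T.items()}           # index  -> message

    assert T_inv[T["the"]] == "the"         # bijective on this space
    print(T["the"], T["rrr"])               # "the" gets a far smaller index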

We define function S to be a surjective mapping on the size in bits of
the message, such that sizes of perhaps around 40 bits get much
shorter, and thus more probable, outputs, and longer messages get
outputs of substantially lower probability.  (Think of a Huffman code
in which sizes around 40 bits are very likely.)
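
For concreteness, here is one rough way such an S could be built in
Python (the weighting below is made up purely for illustration, and the
range of sizes is arbitrary):

    import heapq

    # Huffman code over plaintext bit-lengths, with an assumed weighting
    # peaked around 40 bits, so likely lengths get the shortest codewords.
    def huffman(weights):
        """weights: {symbol: weight} -> {symbol: codeword bitstring}."""
        heap = [[w, i, [sym, ""]] for i, (sym, w) in enumerate(weights.items())]
        heapq.heapify(heap)
        i = len(heap)
        while len(heap) > 1:
            lo = heapq.heappop(heap)
            hi = heapq.heappop(heap)
            for pair in lo[2:]:
                pair[1] = "0" + pair[1]
            for pair in hi[2:]:
                pair[1] = "1" + pair[1]
            heapq.heappush(heap, [lo[0] + hi[0], i] + lo[2:] + hi[2:])
            i += 1
        return {sym: code for sym, code in heap[0][2:]}

    weights = {n: 1.0 / (1 + abs(n - 40)) for n in range(1, 513)}
    S = huffman(weights)
    print(len(S[40]), len(S[400]))   # sizes near 40 bits get far shorter codes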

Function E(m,k) encrypts message m with key k. The only requirement is
that the encryption function does not make it easy to build a
dictionary mapping (S output, key) to the first few bytes of
ciphertext. Since S's output should be fairly small, most block
ciphers, even in ECB mode, will suffice.

Function R(x) produces N - sizeof(x) random bits where N is the fixed
message size.

We now compute the encryption of message m under key k as
E_m = E(S(m) + T(m) + R(S(m) + T(m)), k), where + denotes concatenation.

We see that this encryption process is somewhat expansive (depending on
the size of the S output). To account for this, we ensure that no input
message is longer than N minus the maximum size of S's output, and
shrink T's output accordingly.
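
Putting the pieces together, a minimal end-to-end sketch in Python (AES
in ECB mode stands in for E, a plain 2-byte length field stands in for
S, and the identity map stands in for T; N is an arbitrary fixed size,
and none of these stand-ins are meant as the real constructions):

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    N = 1024                       # fixed message size in bytes (assumed)

    def S(m):                      # stand-in for the real length code
        return len(m).to_bytes(2, "big")

    def T(m):                      # stand-in for the real ranking/compressor
        return m

    def R(x):                      # random padding out to the fixed size
        return os.urandom(N - len(x))

    def E(data, key):              # block cipher; ECB is enough per the above
        enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
        return enc.update(data) + enc.finalize()

    def encrypt_message(m, key):
        body = S(m) + T(m)
        return E(body + R(body), key)

    key  = os.urandom(16)
    blob = encrypt_message(b"I went to the park.", key)
    assert len(blob) == N          # every transmission is the same size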

If we decrypt E_m with an incorrect key, the resulting S will most
likely indicate a fairly short message. Because of the nature of the T
mapping, the short messages will all be 'high probability' and thus
will tend to be plausible English text.
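
A standalone toy showing why a wrong key tends to indicate a short
message (the four codewords and the lengths attached to them are made
up): parsing random bits with a prefix-free code mostly yields the
short, high-probability codewords.

    import random
    from collections import Counter

    # A hand-made prefix-free code: likely ("short message") lengths get
    # the short codewords, as S is meant to arrange.
    code = {"0": "~40 bits", "10": "~200 bits",
            "110": "~400 bits", "111": "~800 bits"}

    def first_symbol(bits):
        for i in range(1, len(bits) + 1):
            if bits[:i] in code:
                return code[bits[:i]]

    tally = Counter()
    for _ in range(10000):
        bits = "".join(random.choice("01") for _ in range(8))
        tally[first_symbol(bits)] += 1
    print(tally)    # the "~40 bits" symbol dominates, as expected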

Because of this property, it will be much more difficult to construct
an automated brute-forcing system: the best available goal would be to
look for the key giving the least probable decoding, and the attacker
is out of luck if the actual transmission happened to be a
high-probability one.

The sender has an increased ability to deny a specific key because
there are so many other keys which also produce English text. It could
be argued that a key is correct because it produces a fairly
low-probability message, but the issue is certainly more clouded.

Furthermore, after encrypting a message, a sender could test various
keys for decryption and find some decryption which provides a basic
level of plausibility. This key could be provided to an attacker under
duress.
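
In code, that cover-key search might look something like the sketch
below (the plausibility test is deliberately crude, and decrypt() is
whatever inverts the pipeline above; with a real T, short random
decodings would tend to pass such a test):

    import os
    import string

    def looks_plausible(text):
        # Crude check: mostly letters, spaces, and light punctuation.
        if not text:
            return False
        ok = sum(c in string.ascii_letters + " .,'" for c in text)
        return ok / len(text) > 0.9

    def find_cover_key(blob, decrypt, tries=1000000):
        # decrypt(blob, key) -> decoded text, or None if parsing fails.
        for _ in range(tries):
            k = os.urandom(16)
            candidate = decrypt(blob, k)
            if candidate is not None and looks_plausible(candidate):
                return k, candidate    # this key can be handed over under duress
        return None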

Obviously any authentication must be on the encrypted output, not on
the plaintext.

The suggestion of 40 bits as the tuning size for S was only a guess; S
should be tuned so that its output is long enough to produce messages
of a believable length, but not so long that random decodings yield
implausible content. Tending towards shorter bad decodings also helps
the sender find reasonable cover messages.

Now let us consider the construction of the T function. The T function
is actually an idealized version of what compression software tries to
achieve. So finding T would also find us the perfect compression
algorithm, which is obviously not a trivial task.

However, T does not need to be optimal in order to be usable. Our goals
for T are that every possible input gives a valid decoding, that
smaller outputs tend to map to reasonable English text, and that it is
not too unlikely for short human-readable inputs to map to short
outputs. A dictionary-based PPM compressor with the right backend (one
which ensures all inputs decode, and that the most significant bits
come first) will be sufficient.
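
As a very crude sketch of the 'every input decodes' property (nothing
like a real PPM coder, and with no sensible probability ordering; it is
just a fixed table of 256 words keyed by byte value):

    import os

    # Every possible byte string decodes to some word sequence, which is
    # the property the real backend must preserve.
    COMMON = ("the of and to a in is it you that he was for on are with as I "
              "his they be at one have this from or had by hot word but what "
              "some we can out other were all there when up use your how said "
              "an each she").split()
    WORDS = (COMMON * (256 // len(COMMON) + 1))[:256]   # pad table to 256 entries

    def decode(data):                 # bytes -> "English-ish" text
        return " ".join(WORDS[b] for b in data)

    def encode(text):                 # inverse on texts built from table words
        index = {w: i for i, w in enumerate(WORDS)}
        return bytes(index[w] for w in text.split())

    print(decode(os.urandom(6)))      # random bytes still decode to words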

So, is this idea useful or am I missing something obvious?