Unicity Distance (OTP)
Introduction
This post will discuss the concept of unicity distance in cryptography and briefly introduce the one-time pad cipher and it’s relation to unicity distance. An implementation of the one-time pad cipher in Python is also included.
Unicity Distance
Originally defined by Claude Shannon in his 1949 paper "Communication Theory of Secrecy Systems”, unicity distance is the length of ciphertext needed to break the cipher by reducing the number of spurious keys to zero in a brute force attack.
Spurious keys
A spurious key is a key which decrypts the ciphertext to meaningful, yet incorrect, plaintext.
For example take the ciphertext “FHTSNA”.
The key may decrypt this to incoherent plaintext such as “VBSAQU”. Thus is not a possible key as it does not decrypt to meaningful plaintext.
However, may decrypt this ciphertext to “ATTACK” and may decrypt this ciphertext to “RESCUE”.
Thus, without more ciphertext we cannot determine the correct key.
If more ciphertext is intercepted; “FHTSNA TXC”, we can test both and to see if we can determine the correct key:
Ciphertext decrypted with : “ATTACK NOW”
Ciphertext decrypted with : “RESCUE TGX”
With this additional ciphertext we can now discard as a possible key.
The unicity distance gives us the minimum about of ciphertext required before it has only one possible decryption.
Key length
In the example above, there are a possible possible keys if we assume a key must be made up of 5 upper-case English letters. The majority of these keys will not decrypt to any meaningful plaintext and can be discarded. However, of these keys will decrypt to meaningful plaintext but only one will be the correct key.
It is likely that is much smaller than and since gets smaller as the length of the ciphertext increases, there will be some length of ciphertext, , which will reduce the number of spurious keys to zero (i.e reduce to 1). is the unicity distance.
Redundancy
The unicity distance can be formally defined as:
Where is the unicity distance, is the number of possible keys, and is the redundancy of the plaintext.
For example, the redundancy of the English language has been found to be 3.2 bits. If we consider a simple Caesar cipher (25 possible keys), we can compute the unicity distance for this cipher as:
Meaning that with a ciphertext length of just 2 characters, an attacker have enough information to reduce the number of spurious keys to zero in a brute force attack. This is intuituve given how a Caesar cipher functions.
Security
Let be the length of the ciphertext.
If we have an unconditionally secure system as there will exist spurious keys and there is not enough information for an attacker to determine the correct key.
If we have an insecure system as there are no false keys and an attacker has enough information to determine the correct key in a brute force attack.
Generally speaking, in a secure cipher should be as large as possible.
One-time pad
The One-time pad (OTP) is a perfect cipher that cannot be cracked. In the OTP the plaintext and key are combined (XOR in binary) to produce the ciphertext. In the OTP cipher, the key must be:
- Completely random (uniformly distributed in the set of all possible keys and independent of the plaintext)
- At least as long as the plaintext message
- Secret
- Never reused
As noted by Claude Shannon, given the key is completely random and always at least the length of the plaintext, the unicity distance of this cipher will always be longer than the length of the ciphertext. Meaning there will always exist spurious keys and the cipher can never be cracked.
An example of the OTP in Python can be found here.
In practice, this cipher is not feasible to use due to the difficultly in generating a truly random key which is at least the length of the plaintext. Imagine the scenario of encrypting a 10GB file - we would need a 10GB truly random key.
References
Note: these references exclude hyperlinks included throughout the document.
Disclaimer: Do not use this to secure any information. This code is purely to be used for educational purposes only.