Back

RSA

Introduction

This post will not focus on the history of RSA or why it is used. There are plenty of resources available that explain this already - see here. Instead we will focus on some of the mathematics around how RSA is implemented, why it is secure, and how it can be implemented in Python.

How it works

RSA is a public-key cryptosystem based on exponentiation modulo a composite prime nn such that n=p.qn = p.q, where pp and qq are large primes.

The protocol is as follows:

  1. Bob selects two large primes pp and qq - these are kept private.
  2. He computes the modulus n=p.qn = p.q - this is made public.
  3. Bob selects a public key ee such that ee is coprime with Φ(n)\Phi(n)*. Where Φ(n)\Phi(n) is computed with either the Totient function or the Carmichael function.
  4. Bob computes a private key dd such that e.d1modΦ(n)e.d \equiv 1 \mod \Phi(n).

We will now look at each step in more detail.

  • Note: two numbers, aa and bb are coprime if gcd(a,b)=1gcd(a,b)=1.

Step 1 - Generating large primes

This first step in RSA is to generate sufficiently large primes, pp and qq. In order for nn to be 2,048-bits, pp and qq must both be \approx 1,024-bits each.

A common way to generate a large prime pp is as follows:

  1. Create σ\sigma, a random nn-bit number (e.g 1,024-bits). It should be the case that σ\sigma is odd as no prime number, with the exception of 2, is even. Thus we select a decimal number in the range 2n1+12^{n-1} + 1 up to 2n12^n -1. We call σ\sigma the prime candidate.
  2. We can then perform a low-level primality test by checking if σ\sigma is divisible by any of the first kk prime numbers (this list of primes is precomputed). The value of kk should be as large as possible. If σ\sigma is divisible by any of these pre-computed primes, the prime candidate is composite and we return to step 1.
  3. Once we find a σ\sigma that passes this low-level primality test, we can perform a high-level primality test known as the Miller–Rabin primality test. In the case of testing for primality of extremely large numbers, such as those used in RSA, a deterministic method is not feasible in terms of compute power. Instead, the Miller–Rabin primality test uses a probabilistic approach. It tests a prime candidate (σ\sigma) for primality many times. Each time the test passes the candidate has a 75% chance of being prime. By performing many iterations, we can increase the probability that such a candidate is in fact prime. Enough tests should be performed such that the probability of the prime candidate being composite is less than 12128\frac{1}{2^{128}}. If any of the iterations fail, we return to step 1.

With the above algorithm we can now generate pp and qq which are both 1,024-bit primes.

Step 2 - Compute the modulus nn

Once we have values for pp and qq, it is trivial to compute n=p.qn = p.q.

Step 3 - Select a public key ee

We must now select a public key ee such that ee is co-prime with Φ(n)\Phi(n).

Φ(n)\Phi(n) can be calculated using either Euler’s Totient function or the Carmichael function.

  • Using Euler’s Totient function Φ(n)=(p1)(q1)\Phi(n) = (p-1)(q-1)
  • Using the Carmichael function Φ(n)=lcm(p1,q1)\Phi(n) = lcm(p-1,q-1)

Once Φ(n)\Phi(n) is computed, a public key ee should be chosen such that ee and Φ(n)\Phi(n) are co-prime. Although the value of ee can be any value provided gcd(e,Φ(n))=1gcd(e,\Phi(n)) = 1, typically ee is set to 65,537. 65,537 is a Fermat Prime. This is useful as given it is prime, it is guaranteed to be co-prime with Φ(n)\Phi(n) and it is also very easy to calculate modular exponents that are Fermat Numbers (as in the case of encryption in RSA).

Step 4 - Compute a private key dd

The private key dd is calculated such that e.d1modΦ(n)e.d \equiv 1 \mod \Phi(n). This is done by using the Extended Euclidean algorithm.

The Extended Euclidean algorithm calculates xx and yy such that ax+by=gcd(a,b)ax + by = gcd(a,b). A simple online calculator can be found here.

We can now look at why this works. If we let a=ea = e and b=Φ(n)b = \Phi(n) we can rewrite the above as:

ex+Φ(n)y=gcd(e,Φ(n))ex + \Phi(n)y = gcd(e,\Phi(n))

We know that ee and Φ(n)\Phi(n) are co-prime and thus gcd(e,Φ(n))=1gcd(e,\Phi(n)) = 1. Therefore we can rewrite this as:

ex+Φ(n)y=1ex + \Phi(n)y = 1

We then take this modulo Φ(n)\Phi(n) to get:

ex+Φ(n)y1modΦ(n)ex + \Phi(n)y \equiv 1 \mod \Phi(n)

The value of yy does not matter as it will be eliminated modulo Φ(n)\Phi(n). Thus we are left with:

ex1modΦ(n)ex \equiv 1 \mod \Phi(n)

Here xx is equal to our private key dd.

We have now computed all the values required to use RSA for encryption and decryption.

Encryption and Decryption

To encrypt data, the message is broken down into blocks such that each block is smaller than the modulus nn. Note if pp and qq are both 1024-bit primes, nn will be 2048-bits. Meaning that each block must be shorter than 2048-bits to avoid data loss.

To encrypt data:

  • Bob makes his public key ee and the modulus nn public.
  • Alice encrypts plaintext ll such that c=lemodnc = l^e \mod n where cc is the resulting ciphertext.

To decrypt data:

  • Bob, the holder of the private key dd, can now decrypt cc by l=cdmodnl = c^d \mod n where ll is the recovered plaintext.

Note: RSA is not typically used to encrypt large amounts of data. Instead it is used as part of a key exchange protocol so that Alice and Bob can agree upon a shared secret for use with a symmetric cipher such as AES. This is because RSA is not as performant as some symmetric ciphers which leverage bitwise operations rather than more complex mathematical operations seen here.

Security in RSA

Brute force approach:

Given ee and nn are public, if Eve was to know some ciphertext cc, she could try all plaintext options ll until she finds an ll that satisfies c=lemodnc = l^e \mod n. However, this not feasible when nn is large.

Breaking the cipher:

If Eve was to learn any one of pp, qq, Φ(n)\Phi(n), or dd she can fully break the cipher.

If she learns pp or qq, it is trivial to calculate the other values given n=p.qn = p.q and nn is public. From here she can workout all remaining values.

If she learns Φ(n)\Phi(n), she can solve for pp and qq given she knows the function Φ(n)\Phi(n) and that n=p.qn = p.q. This is a case of solving for two unknowns with two equations.

If she learns dd, the private key, she can decrypt any chipertext easily.

If we assume that a brute force approach is not feasible and that all private values remain private, then the security of RSA relies on Eve being able to factorise nn into pp and qq. This is known as prime factorisation.

Prime factorisation

The best attack for Eve is to factor nn into pp and qq. However, there is no efficient, non-quantum integer factorisation algorithm known when pp and qq are sufficiently large. Even the fastest prime factorisation algorithms on the fastest computers will take so much time to make the search impractical. However, knowing pp and qq makes computing n=p.qn = p.q trivial, this is the one way function used in RSA.

Some algorithms used for prime factorisation include general number field sieve, the quadratic sieve, Fermat's factorization method, and Pollard's rho algorithm. As it stands, for large RSA key sizes (in excess of 1024-bits) no efficient method for computing the factors is known.

Interestingly, a recent paper by Schnorr [6], controversially claimed that a new factoring method “destroys the RSA cryptosystem”. However, it has subsequently been shown not to be the case [7].

RSA in Python

A complete implementation of RSA in Python can be found here. The code should be self explanatory given the information above.

One interesting point that I have not discussed is how to compute b=acb = a^c efficiently if cc is large. One way to handle this is by using exponentiation by squaring. In the code you will notice on line 41 of main.py I compute the ciphertext by raising the plaintext to the power of ee directly. This is feasible given ee is relatively small. However, on line 45 of main.py, I compute the plaintext by using exponentiation by squaring instead as dd is too large to compute this directly. Python comes with a built in function to handle this - see here.

Besides encryption and decryption of data, RSA is also used for digital signatures. I have not gone into much detail about digital signatures as that is for a different post but I have included a full example in the code attached. You will notice that the only difference is that we use the keys in reverse order. I.e you sign a message with your private key and anyone who has a copy of your public key can verify it was you who signed it.

References

  1. RSA - Wikipedia

  2. RSA - calculator

  3. RSA_ Problem

  4. https://coderoasis.com/implementing-rsa-from-scratch-in-python/

  5. Generating large primes

  6. Fast Factoring Integers by SVP Algorithms - Claus Peter Schnorr

  7. No RSA is not broken

Note: these references exclude hyperlinks included throughout the document.

Disclaimer: Do not use this to secure any information. This code is purely to be used for educational purposes only.

Back