Diffie-Hellman Tutorial with Python

diffie hellman exampleDiffie-Hellman key exchange (sometimes Diffie-Hellman-Merkel, often abbreviated as D-H) allows two parties to derive a shared secret key over an insecure channel. This means that if Alice and Bob are strangers on the internet and they want to create a secure, encrypted channel for communication, they can do so even if an attacker, Eve, is listening to all of their communications. The complete source code for this post is available on github.

Warning: The code in this tutorial is for instructional purposes only. It’s unwise to use homebrew encryption code in sensitive or production environments. If you need real security, use an established, audited and thoroughly tested encryption package like OpenSSL or GNUTLS.

When I first learned about Diffie-Hellman, I didn’t understand how such a scheme could be possible. If all communications between two parties are being intercepted, how can they establish a shared encryption key that the attacker won’t also know? The answer lies in the discrete logarithm problem.

The Discrete Logarithm

The discrete log problem involves solving $$a^x=b \pmod p$$ where \(a, x\) and \(b\) are real numbers and \(p\) is prime.

For instance, \(3\) is a solution to \(2^x=8 \pmod {11}\) because \(2^3 \pmod {11} = 8\). However, \(3\) isn’t the only solution. \(13, 23\) and \(43\) are also solutions because they each solve the equation \(\log_2 8 = x\pmod {11}\) or \(2^x \pmod {11} = 8\). Because of the modulus operation, there are infinitely many solutions to each discrete log equation.

The discrete log problem is this: Given the values of \(a, b\) and \(p\) in the equation \(a^x=b \pmod p\), it’s difficult to determine which value of \(x\) was the original solution to the equation. The discrete log problem is computationally difficult, and no known algorithm can efficiently solve the problem using currently available computers. Schor’s algorithm can be used to solve the discrete log problem, but it requires a quantum computer that hasn’t been realized yet.

How Diffie-Hellman Works

Diffie-Hellman Key Exchange involves two parties computing exponents modulo a large prime, exchanging results, and using the combination of publicly exchanged results with private values to derive a shared secret. An eavesdropper can see both publicly exchanged values but since he or she can’t access the parties’ private values, the eavesdropper can’t compute the same shared secret. Wikipedia’s paint analogy does a good job of showing how it works:

Diffie-Hellman Key Exchange

Diffie-Hellman analogue, performed with paint.

Since Eve the eavesdropper doesn’t know Alice or Bob’s secret paint colors, she won’t be able to derive the correct common secret color.

Both parties agree on a prime \(p\) and a generator \(g\) ahead of time, or the values can be exchanged publicly over an insecure channel. In practice, \(g\) is usually \(2, 3\) or \(5\) and \(p\) is usually taken from RFC 3526 or generated dynamically.

  • Alice computes \(A\) and sends it to Bob. $$g^a \pmod p = A$$
  • Bob computes \(B\) and sends it to Alice. \(A\) and \(B\) are Alice and Bob’s public keys, respectively. $$g^b \pmod p = B$$
  • Alice computes the shared secred, \(s\):$$B^a \pmod p = (g^b)^a \pmod p = s$$
  • Bob performs an equivalent computation to arrive at the same value for \(s\): $$A^b \pmod p = (g^a)^b \pmod p = s$$

Since \(g^{ab} \pmod p = g^{ba} \pmod p\), Alice and Bob will both compute the same secret. Even if Eve captures \(A\) and \(B\), since she doesn’t know \(a\) or \(b\) (the secret paint colors in the illustration) she won’t be able to compute the same shared secret as Alice and Bob.

A variant of Diffie-Hellman can be used between more than two parties. Although the protocol protects against eavesdropping, it doesn’t protect against Man in the Middle attacks. If Eve gets between Alice and Bob and convinces each that she is the other, she can negotiate separate keys with each party and transparently intercept messages. Unless authentication is used in addition to Diffie-Hellman, neither Alice nor Bob will know that Eve is an impostor.

Simple Diffie-Hellman Example (in Python)

Python provides a handy and computationally efficient method for modular exponentiation (exponentiation by squares, in case you’re interested). Instead of using a**b % c to compute modular exponents, use pow(a, b, c). The two representations are functionally equivalent, but pow() is faster and uses less memory.

The core of Diffie-Hellman is simple to implement. Generate a secure random number for each private key (\(a\) and \(b\)), perform the modular exponentiation and pass the result (\(A\) or \(B\)) to the other party:

RFC 3526 specifies safe primes that can be used in Diffie-Hellman implementations and provides some key-strength estimates for each prime. For instance, if you want to generate a 128-bit AES key, you should use the 3072-bit MODP prime with an exponent that has at least 256 bits of randomness. To generate a 256-bit AES key, use the 8192-bit MODP prime with an exponent that has at least 512 bits of randomness. The resulting shared secret can then be hashed to obtain a key of usable size.

Diffie-Hellman in Python: A more practical example

The DiffieHellman python class is a fully-functional implementation of the Diffie-Hellman protocol. It includes functions for generating private and public keys, shared secrets and a usable encryption key. It also includes a function to make sure a received public key is valid by verifying that its Legendre Symbol is equal to one.

Verifying the Legendre symbol of a public key is recommended by NIST SP800-56, but the publication doesn’t assume the use of safe primes. Checking the Legendre symbol isn’t really necessary when safe primes are used, and OpenSSL doesn’t bother to check, but I do (more background here).

The following code will create two instances of the DiffieHellman class and perform an exchange:

The complete source code for this post is available in the pyDHE repository on github.

Further Reading

RFC 2631 – Diffie-Hellman Key Agreement Method

OpenSSL: Diffie-Hellman

Incoming search terms:

4 thoughts on “Diffie-Hellman Tutorial with Python

  1. That’s informative. Great.

    I have a little question.

    In public-key crypto theory, Alice encrypt her data using Bob’s public key and send bob. Bob decrypt her data using his public key.  And Bob encrypt vice versa. I am confused about D-H key exchange. Is it the same concept as the pk crypto?

    Thank you for your post.

    • RSA is more like the kind of asymmetric encryption you’re describing. Whereas Diffie-Hellman uses public and private keys to establish a shared secret, RSA is used to encrypt data with a public key and decrypt the cyphertext with a private key.

  2. Hey evening,

     

    the code in “Diffie-Hellman in Python: A more practical example” is not able to run in Python 3, do you have any other ways?