Hashing and Python

Wednesday, 25 September 2019

Let’s now take a look at using cryptographic hashing functions in Python programs. We’ll mainly focus on the hashlib module, which provides all the common hashing functions we’re likely to need. Your interpreter probably supports more algorithms than you will ever use; a handful of them cover almost every use case.

To see the hashing functions guaranteed to be supported on all platforms:
>>> import hashlib
>>> hashlib.algorithms_guaranteed
{'sha512', 'md5', 'blake2s', 'sha256', 'sha224', 'sha3_256', 'shake_256', 'sha1', 'shake_128', 'sha3_512', 'sha3_224', 'sha3_384', 'sha384', 'blake2b'}
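As a small sketch of how the guaranteed set relates to what one particular interpreter offers, the snippet below compares it with hashlib.algorithms_available. The variable names are only illustrative, and the extra names listed will depend on the local OpenSSL build.

```python
import hashlib

# Names every Python build must support, sorted for stable display.
guaranteed = sorted(hashlib.algorithms_guaranteed)

# Names this particular interpreter exposes in addition, typically via OpenSSL.
extras = sorted(hashlib.algorithms_available - hashlib.algorithms_guaranteed)

print("guaranteed:", ", ".join(guaranteed))
print("platform extras:", ", ".join(extras) or "(none)")
```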

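Using the basic hashing functions involves three steps: import the module, create a hash object, and feed it data with update(). One more call retrieves the result. We’ll demonstrate with SHA1, but the same pattern applies to the other algorithms: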
>>> import hashlib
>>> h = hashlib.sha1()
>>> h.update(b'hello world')
>>> h.hexdigest()

Note that update() can be called as many times as needed. The constructor can also take the data as a parameter. The objects these dedicated constructors produce have digest() and hexdigest() methods, which return the resulting message digest as a bytes object or as a hexadecimal string, respectively:

>>> hashlib.sha1(b'hello world').hexdigest()
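Because update() can be called repeatedly, large inputs do not have to fit in memory at once. The following is a minimal sketch of that idea; the helper name hash_file, the 64 KiB chunk size, and the choice of SHA256 are assumptions made for illustration.

```python
import hashlib

def hash_file(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Feeding data in pieces gives the same digest as hashing it all at once.
h = hashlib.sha256()
h.update(b"hello ")
h.update(b"world")
print(h.hexdigest() == hashlib.sha256(b"hello world").hexdigest())  # True
```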

Note also that this module is built on the underlying OpenSSL libraries. You can reach message digest functions provided by your OpenSSL library, even when hashlib has no dedicated constructor for them, by using the generic new() constructor and passing the algorithm name as a string:

>>> h = hashlib.new('rmd160')
>>> h.update(b'hello')
>>> h.hexdigest()
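Whether a given name works with new() depends on the OpenSSL build behind your interpreter, so it can be worth failing with a clear message. Here is a rough sketch; make_hash is a hypothetical helper, not part of hashlib, and it relies only on new() raising ValueError for names it cannot provide.

```python
import hashlib

def make_hash(name, data=b""):
    """Create a hash object by name, with a clearer error if unsupported."""
    try:
        return hashlib.new(name, data)
    except ValueError as exc:
        raise ValueError(
            f"{name!r} is not supported by this Python/OpenSSL build"
        ) from exc

# sha256 is in the guaranteed set, so this works on any platform.
print(make_hash("sha256", b"hello").hexdigest())
```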

Sometimes projects require use of a specific algorithm like SHA1, but if you don’t have that constraint, SHA3 is supported on all platforms as shown earlier. Also available is Blake2, which is usually faster than SHA3 in software and is generally considered comparably secure. Blake2 comes in two variants: blake2s(), optimized for 8/16/32-bit platforms, and blake2b(), optimized for 64-bit systems.

Blake2 supports arbitrary length digests, so your hashes for blake2s() can be from 1 to 32 bytes and blake2b() can produce hashes of 1 to 64 bytes in length. The length is passed to the constructor as the digest_size parameter, alongside the data to be hashed:

>>> h = hashlib.blake2b(b'hello', digest_size=20)
>>> h.hexdigest()
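To make the size limits concrete, the sketch below prints the maximum digest sizes the module reports and compares a 20-byte digest with the default one. The variable names short and full are illustrative. One detail worth knowing: a shorter Blake2 digest is a different hash, not a prefix of the longer one, because the requested length is part of the hash parameters.

```python
import hashlib

# Upper bounds exposed by the constructors themselves.
print(hashlib.blake2s.MAX_DIGEST_SIZE)  # 32
print(hashlib.blake2b.MAX_DIGEST_SIZE)  # 64

short = hashlib.blake2b(b"hello", digest_size=20)
full = hashlib.blake2b(b"hello")  # defaults to the maximum size

# Each byte becomes two hexadecimal characters.
print(short.digest_size, len(short.hexdigest()))  # 20 40
print(full.digest_size, len(full.hexdigest()))    # 64 128

# The 20-byte digest is not simply the first 20 bytes of the 64-byte one.
print(full.digest()[:20] == short.digest())  # False
```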

To see actual code implementing the major hashing functions with the hashlib module, including blake2b(), see the shasum.py hashing program, patterned after the terrific Perl program of the same name. Incidentally, I used 20 bytes in the example above to illustrate that Blake2 was designed with arbitrary length digests partly so it could replace SHA1 where a use case depends on 160-bit hashes.

Creating password hashes

How do you safely create and store credentials? The short answer is to create salted hashes and store those instead of the credentials themselves. To authenticate a user later, hash the submitted credential with the same salt and compare the result to the stored value. Python 3’s hashlib has two main functions available for this: pbkdf2_hmac() and scrypt().

hashlib.pbkdf2_hmac(hash_name, password, salt, iterations, dklen=None)
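A minimal sketch of hashing and verifying a password with pbkdf2_hmac() might look like the following. The function names, the 16-byte salt, the choice of SHA256, and the iteration count are all assumptions for illustration rather than recommendations from this post; tune the work factor to your own hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; adjust for your hardware

def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, iterations, key) for storage alongside the user record."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key

def verify_password(password, salt, iterations, expected_key):
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(key, expected_key)
```

Storing the salt and iteration count next to the derived key lets you raise the work factor later without invalidating existing records.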

hashlib.scrypt(password, *, salt, n, r, p, maxmem=0, dklen=64)
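The same pattern can be sketched with scrypt(). The helper names and the cost parameters below (n=2**14, r=8, p=1, a 16-byte salt) are assumptions chosen so the call fits within the default memory limit; they are not taken from this post, and hashlib.scrypt is only present when Python is built against a suitable OpenSSL.

```python
import hashlib
import hmac
import os

def scrypt_hash(password, salt=None, n=2**14, r=8, p=1):
    """Return (salt, key) derived from the password with scrypt."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=n, r=r, p=p, dklen=64)
    return salt, key

def scrypt_verify(password, salt, expected_key, n=2**14, r=8, p=1):
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=n, r=r, p=p, dklen=len(expected_key)
    )
    return hmac.compare_digest(key, expected_key)
```

Raising n increases both the CPU and memory cost; larger values may also require passing a higher maxmem to hashlib.scrypt().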

Here is a good example showing hashing and verifying passwords using this function. Always be sure to use os.urandom() as your source of randomness for the salt. The pbkdf2_hmac() function is generally considered less secure than scrypt() because it is not memory-hard, which makes it cheaper to attack with GPUs or custom hardware. scrypt() is available on Python 3.6+ (built against OpenSSL 1.1+) and should be the preferred method on systems that meet those software requirements and have sufficient CPU and memory resources. A deliberately slow hashing algorithm like scrypt or bcrypt is actually desirable here, because efficient cracking via GPUs depends on rapidly trying many candidate passwords.

As we mentioned in the first hashing post on hashing basics, there are many practical uses for hashing in modern programming. The previous post explained the various hashing algorithms so that we could get to this point: actually using them in Python programs. In general, the hashlib module provides options that handle the task securely and reliably.