linux 🐧

Hashing, Salting, and Password Storage on Linux.

Whenever you check your email, open an SSH session, or make a purchase on one of your favorite websites, you are prompted for a password that you once had to create (and, hopefully, remember!). It is the first line of defense against unauthorized access to your data.

Aleksandra Sloan, Gabriella Bennett, Kenzo Kylo

29 Dec 2021 • 8 min read


Have you ever asked yourself ...
What happens when I create a password?
How does this procedure aid in securing your data?
Where are users' passwords stored in the file system?

When you create a password on a website, it isn't saved verbatim on the server. That's because if there were a security breach, your password would be publicly exposed. Instead, your password goes through a "hashing" procedure, which greatly enhances security (as long as the provided password is strong enough).

What is hashing and how does it work?

A cryptographic hash function is an algorithm that accepts an arbitrary amount of data (a credential or other source data, called the "key" or "message") and generates a fixed-size output known as a hash value, "hash", "hash code", "hash sum" or "message digest".
That hash can then be saved instead of the password itself and used to validate the person whenever needed.

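PS: strictly speaking, a hash is not encrypted text. To encrypt (or encipher) is to convert plain text into cipher text that can be turned back into the original with the right key, whereas a hash is not meant to be reversed at all.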

A good hashing algorithm exhibits a property called the avalanche effect: the resulting hash output changes significantly or entirely even when a single bit or byte of the input is changed. A hash function that does not do this has poor randomization and is easier for attackers to break.
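The avalanche effect can be observed directly. The following is a minimal sketch using Python's standard hashlib module; the helper names sha512_as_int and differing_bits are made up for this illustration. It flips a single input bit and counts how many of the 512 output bits change. For a well-behaved hash, roughly half of them should.

```python
import hashlib


def sha512_as_int(data: bytes) -> int:
    """Return the SHA-512 digest of `data` as a 512-bit integer."""
    return int.from_bytes(hashlib.sha512(data).digest(), "big")


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 512 output bits differ between two inputs."""
    return bin(sha512_as_int(a) ^ sha512_as_int(b)).count("1")


if __name__ == "__main__":
    # 'y' is 0x79 and 'x' is 0x78, so these two inputs differ in exactly one bit.
    changed = differing_bits(b"snubmonkey", b"snubmonkex")
    print(f"{changed} of 512 output bits changed")
```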

A hash is usually displayed as a hexadecimal string. Hashing is also a unidirectional process, so there is no practical way to work backward from the hash to the original data.

Because the output has a fixed size while the input does not, two different inputs can in principle produce the same hash value; this is known as a hash collision. A hash algorithm can only be considered good and acceptable if finding such a collision is practically infeasible.
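To get a feel for what a collision is, here is a small sketch that deliberately weakens SHA-512 by keeping only the first two bytes of its output. That truncation is an assumption made purely for illustration (the names short_hash and find_collision are hypothetical); with only 65,536 possible outputs a collision turns up quickly, whereas nothing comparable is feasible against the full 512-bit output.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple


def short_hash(text: str, n_bytes: int = 2) -> bytes:
    """Keep only the first `n_bytes` of the SHA-512 digest (illustration only)."""
    return hashlib.sha512(text.encode("utf-8")).digest()[:n_bytes]


def find_collision(n_bytes: int = 2) -> Tuple[str, str]:
    """Hash "0", "1", "2", ... until two inputs share a truncated hash."""
    seen: Dict[bytes, str] = {}
    for i in count():
        candidate = str(i)
        digest = short_hash(candidate, n_bytes)
        if digest in seen:
            # Two different inputs, same (truncated) hash value: a collision.
            return seen[digest], candidate
        seen[digest] = candidate
    raise AssertionError("unreachable: count() never ends")


if __name__ == "__main__":
    first, second = find_collision()
    print(first, second, short_hash(first).hex())
```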

What are the benefits of Hashing?

One main use of hashing is to compare two files for equality. Without opening two document files to compare them word-for-word, the calculated hash values of these files will allow the owner to know immediately if they are different.

Hashing a password is good because the hash is quick to compute and easy to store. Instead of storing the user's password as plain text, which is open for anyone to read, it is stored as a hash, which does not reveal the password to anyone who reads it.

Hashing is also used to verify the integrity of a file after it has been transferred from one place to another. To ensure the transferred file is not corrupted, a user can compare the hash value of both files. If they are the same, then the transferred file is an identical copy.
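A file comparison like the one described above can be sketched in a few lines of Python. The function names file_sha256 and same_content are chosen for this example, and SHA-256 is just one reasonable choice of algorithm. The file is read in chunks so that large files do not have to fit in memory.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hash of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True if both files hash to the same value."""
    return file_sha256(path_a) == file_sha256(path_b)
```

After a transfer, calling same_content("original.iso", "copy.iso") (both paths are placeholders) would tell you whether the copy matches the original.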

Types of Hashing

Hashing algorithms are just as abundant as encryption algorithms, but there are a few that are used more often than others. Some common hashing algorithms include MD5, SHA-1, SHA-2, NTLM, and Whirlpool.

MD5: This is the fifth version of the Message Digest algorithm. MD5 creates 128-bit outputs. MD5 was a very commonly used hashing algorithm. That was until weaknesses in the algorithm started to surface. Most of these weaknesses manifested themselves as collisions. Because of this, MD5 began to be phased out.

SHA-2: The successor to SHA-1, first published in 2001. It is a family of two similar hash functions, with different block sizes, known as SHA-256 and SHA-512. The primary difference between both is the word size; SHA-256 uses 32-bit words whereas SHA-512 uses 64-bit words. There are also modified versions of each standard, known as SHA-224, SHA-384, SHA-512/224, and SHA-512/256. The most commonly used SHA function today is SHA-256, which allows for plenty of protection at current computer processing levels. The examples in this post use SHA-512.

Whirlpool: This produces a hash code of 512 bits for an input message of a maximum length less than 2²⁵⁶ bits. The underlying block cipher, based on the Advanced Encryption Standard (AES), takes a 512-bit key and operates on 512-bit blocks of plaintext. Whirlpool has been endorsed by NESSIE (New European Schemes for Signatures, Integrity, and Encryption), which is a European Union-sponsored effort to put forward a portfolio of strong cryptographic primitives of various types.
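The output and block sizes of the SHA-2 variants can be checked with Python's hashlib, which exposes them as digest_size and block_size (both in bytes). This is only a quick sketch; the helper name sizes_in_bits is made up for the example.

```python
import hashlib
from typing import Tuple


def sizes_in_bits(name: str) -> Tuple[int, int]:
    """Return (output size, block size) in bits for a hashlib algorithm name."""
    h = hashlib.new(name)
    return h.digest_size * 8, h.block_size * 8


if __name__ == "__main__":
    for name in ("sha224", "sha256", "sha384", "sha512"):
        output_bits, block_bits = sizes_in_bits(name)
        print(f"{name}: {output_bits}-bit output, {block_bits}-bit blocks")
```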

Weak Cryptographic Hash

Weak cryptographic hashes cannot guarantee data integrity and should not be used in security-critical contexts.

MD2, MD4, MD5, RIPEMD-160, and SHA-1 are popular cryptographic hash algorithms often used to verify the integrity of messages and other data. However, as recent cryptanalysis research has revealed fundamental weaknesses in these algorithms, they should no longer be used within security-critical contexts.

In the case of SHA-1, security researchers have achieved the first real-world collision attack against the SHA-1 hash function more than 20 years after it was first introduced, producing two different PDF files with the same SHA-1 signature. This shows that the algorithm's use for security-sensitive functions should be discontinued as soon as possible.

SHA-2 (Secure Hash Algorithm 2)

As we saw above, a hash function takes a cleartext password and turns it into a hash for storage. This ensures that if the password storage system is compromised, the attacker will not be able to read the users' passwords directly, as they are stored as hashes.

A common, yet secure hash function is SHA-512 with 64-bit words, which returns a 512-bit value, written as a 128-character hexadecimal string, no matter how long or short your input is.

Below are a few examples of what a hash looks like.

SHA-512(snubmonkey) = 1F99B23951F451867EF5A85CACE49D38225A0D2918708BAA565DCA7FBF4DE593CAE9834510844F046EFBA4B57D78239C246D557D880601ED10FA5589FD96943E
SHA-512(0123456789qwerty) = F0DA08796A73EF9AB4303AECF9DB9B241CF7F482DD3305BB310494DF9EF0B1AA6C0A8295EA2A1F64AE7FDB6EB68542041C93E1209C4BAD8C9F59875BE92F4E3D
SHA-512(SnubmOnkey) = E168999B03F49A3AA7157A66A2A219861958F4B47D56129DB0B5DC1F7F59A5022ED4CB3AB2F94517478534060EE14550DFD8EF34DA461B4A5E319C13699C3686
SHA-512(snubmonkey) = 1F99B23951F451867EF5A85CACE49D38225A0D2918708BAA565DCA7FBF4DE593CAE9834510844F046EFBA4B57D78239C246D557D880601ED10FA5589FD96943E

From the above examples, we can learn a lot about hashes:

Small changes have a big impact – Take a look at examples 1 and 3. We capitalized the letters "s" and "o". Despite these two small adjustments, the output of example 3 looks nothing like that of example 1.

The output length never changes – Although example 2's input is longer than the others, it produces an output of the same length (128 hexadecimal characters). We could input an entire encyclopedia into the SHA-512 hash function and we would still get a 128-character string as the output.

Repeatable – An input will always give the same output when hashed using the same function as in examples 1 & 4.

Hard to reverse – Even if a hacker knows the method used to generate a hash, reversing that procedure and generating the password is nearly impossible. In fact, it's so hard that attempting millions of different combinations to get the same result (brute force attack) is usually faster than the calculations needed to reverse the hashing process.
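The first three observations above are easy to reproduce. Here is a minimal sketch with Python's hashlib; the helper name sha512_hex is hypothetical, and the inputs are assumed to be UTF-8 encoded with no trailing newline.

```python
import hashlib


def sha512_hex(text: str) -> str:
    """Return the SHA-512 hash of `text` as 128 uppercase hex characters."""
    return hashlib.sha512(text.encode("utf-8")).hexdigest().upper()


if __name__ == "__main__":
    inputs = ["snubmonkey", "0123456789qwerty", "SnubmOnkey", "snubmonkey"]
    hashes = [sha512_hex(text) for text in inputs]

    for text, digest in zip(inputs, hashes):
        print(f"SHA-512({text}) = {digest}")

    print("Same length for every input:", {len(h) for h in hashes} == {128})
    print("Examples 1 and 4 identical: ", hashes[0] == hashes[3])
    print("Examples 1 and 3 different: ", hashes[0] != hashes[2])
```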

Password hashing and access granting

Let's have a look at how hashing actually works:

Step 1 – A user creates their username and password.
Step 2 – That password is put through a hash function and the hash is stored in the database.
Step 3 – A user logs in, enters their password.
Step 4 – That password is run through the same hashing function as was used before.
Step 5 – The server/system checks this hash against the one stored for the user in the database.
Step 6 – If and only if the two hashes match, the user is granted access.

Salting

"Hashed and salted" is a term used to describe passwords. Before each password is hashed, a unique, random string of characters is added to it. Typically, this "salt" is placed in front or at the end of each password.
The use of unique salts means that common passwords shared by multiple users – such as “1234567890” or “I love you” – aren’t immediately revealed when one such hashed password is identified – because despite the passwords being the same the salted and hashed values are not.

Large salts also help to prevent some types of attacks on hashes, such as rainbow tables (precomputed tables that map hashes back to the passwords that produced them) or lists of previously cracked password hashes.

What happens when two users, Superman and Batman, choose the same password for their accounts? Without salting, the hash values of their passwords are going to be the same, since the same hashing algorithm was used for both, so anyone who cracks one hash has also cracked the other. That's where salting comes in handy.

Salting of Superman’s password where the unique salt value is Rfu7gh3G:

SHA-512 hash (kryptonite + Rfu7gh3G) =
2416AD4B7FF4EC1E7C2EABE83CAC27F7122CC774096AA69655C511E9C3C0AEBAF50D8C534B64D6C5309CAA9904BA090B6DBB7B41D1A94C40C288A5FC76C88D28

Salting of Batman’s password where the unique salt value is Yz5yuuj:

SHA-512 hash (kryptonite + Yz5yuuj) =
B31B3C7C582E0C318BA8CA4835FCA5B641D2283F5AE597B11AFE3AB2E9B83E583E77B1E3889C90014698BA3A16DF6B43E6EE08E6C2ACEF31587E6EBE5F08FCFA
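The Superman and Batman example can be sketched in Python. Assumptions in this sketch: the salt is appended to the password as plain text before hashing (as in the two examples above), salts are generated with the standard secrets module, and the function names are invented for the illustration. Note that the salt has to be stored next to the hash so the same computation can be repeated at login.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def new_salt(n_bytes: int = 8) -> str:
    """Generate a random salt as a hex string."""
    return secrets.token_hex(n_bytes)


def salted_hash(password: str, salt: str) -> str:
    """SHA-512 of password + salt, as in the examples above."""
    return hashlib.sha512((password + salt).encode("utf-8")).hexdigest()


def create_record(password: str) -> Tuple[str, str]:
    """Return (salt, hash); both are stored, the password itself is not."""
    salt = new_salt()
    return salt, salted_hash(password, salt)


def verify(password: str, salt: str, stored_hash: str) -> bool:
    """Recompute the salted hash for a login attempt and compare."""
    return hmac.compare_digest(stored_hash, salted_hash(password, salt))


if __name__ == "__main__":
    superman = create_record("kryptonite")
    batman = create_record("kryptonite")
    print("Same password, same hash?", superman[1] == batman[1])
    print("Superman can still log in:", verify("kryptonite", *superman))
```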

Location and Content of the Password File

The password content is saved in the /etc/shadow file on Ubuntu. This file can only be written to by the root user and cannot be read by ordinary users, which is why the command below needs sudo. Along with the hashed password, this file also holds content like the username, password change date, expiry date, etc. in a colon-separated (:) format.

$ sudo grep kryptonite /etc/shadow

OUTPUT

kryptonite:$6$n4wLdmr59ptB8zWG$4.YWKc2kv10JSB2jzaKcEUIL43u98NgijLhWljd9W7NaoipM.oRrmRdOSpqyN/y4Nf6rilzlja1GlDpud2zXl1:18912:0:99999:7:::

Despite its length, that line is quite simple to read. The first two fields in the lines of this colon-separated file store:

the username (kryptonite)
the password hash (including the hashing method used) in a $id$salt$hashed format

That $6$ portion of this string represents the hashing algorithm used.

$1$ stands for MD5
$2a$ stands for bcrypt
$2y$ stands for bcrypt
$5$ stands for SHA-256
$6$ stands for SHA-512

The value between the second and third $ sign is the salt used for hashing; here: n4wLdmr59ptB8zWG

The value after the third $ sign is the actual hashed password; here: 4.YWKc2kv10JSB2jzaKcEUIL43u98NgijLhWljd9W7NaoipM.oRrmRdOSpqyN/y4Nf6rilzlja1GlDpud2zXl1
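Splitting the second field into its three parts is straightforward. The sketch below assumes the plain $id$salt$hashed layout shown above, with no optional parameters between the id and the salt; the function name parse_hash_field is made up, and the lookup table simply repeats the prefixes listed earlier.

```python
from typing import Dict

# Prefixes as listed above.
ALGORITHMS = {
    "1": "MD5",
    "2a": "bcrypt",
    "2y": "bcrypt",
    "5": "SHA-256",
    "6": "SHA-512",
}


def parse_hash_field(field: str) -> Dict[str, str]:
    """Split a $id$salt$hashed string into its three components."""
    parts = field.split("$")
    # A leading "$" yields an empty first element, then id, salt and hash.
    if len(parts) != 4 or parts[0] != "":
        raise ValueError("expected a field of the form $id$salt$hashed")
    _, algo_id, salt, hashed = parts
    return {
        "id": algo_id,
        "algorithm": ALGORITHMS.get(algo_id, "unknown"),
        "salt": salt,
        "hash": hashed,
    }


if __name__ == "__main__":
    field = ("$6$n4wLdmr59ptB8zWG$4.YWKc2kv10JSB2jzaKcEUIL43u98NgijLhWljd9W7"
             "NaoipM.oRrmRdOSpqyN/y4Nf6rilzlja1GlDpud2zXl1")
    for key, value in parse_hash_field(field).items():
        print(f"{key}: {value}")
```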

The following numeric fields (18912:0:99999:7:::) represent:

the date of the last password change in a "days since the epoch" format (18912)
the minimum required days between password changes (0)
the maximum allowed days between password changes (99999); a value this large means that, in practice, the password never expires
the number of days in advance to display a password expiration message (7)
the number of days after password expiration to disable the account (not set above)
the account expiration date (not set above); an empty field means that the account will never expire
a reserved field (not set above), kept for future use
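Putting the fields together, a whole /etc/shadow line can be decoded with a short Python sketch. The function name parse_shadow_line and the dictionary keys are chosen for this example only; the "days since the epoch" value is converted by adding that many days to 1 January 1970.

```python
from datetime import date, timedelta
from typing import Any, Dict, Optional

EPOCH = date(1970, 1, 1)


def _to_int(value: str) -> Optional[int]:
    """Empty fields mean 'not set', so map them to None."""
    return int(value) if value else None


def parse_shadow_line(line: str) -> Dict[str, Any]:
    """Split one colon-separated shadow entry into named fields."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 9:
        raise ValueError("expected 9 colon-separated fields")
    name, pw_hash, last, minimum, maximum, warn, inactive, expire, reserved = fields
    last_days = _to_int(last)
    return {
        "username": name,
        "password_hash": pw_hash,
        "last_change": EPOCH + timedelta(days=last_days) if last_days is not None else None,
        "min_days": _to_int(minimum),
        "max_days": _to_int(maximum),
        "warn_days": _to_int(warn),
        "inactive_days": _to_int(inactive),
        "expire_days": _to_int(expire),
        "reserved": reserved,
    }


if __name__ == "__main__":
    entry = ("kryptonite:$6$n4wLdmr59ptB8zWG$4.YWKc2kv10JSB2jzaKcEUIL43u98NgijLhWljd9W7"
             "NaoipM.oRrmRdOSpqyN/y4Nf6rilzlja1GlDpud2zXl1:18912:0:99999:7:::")
    info = parse_shadow_line(entry)
    # 18912 days after 1970-01-01 is 2021-10-12.
    print(info["username"], "last changed the password on", info["last_change"])
```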

Is hashing sufficient to keep passwords safe?

Now that we know that hashes are the same length regardless of the password we choose, we might be tempted to pick a short, easy-to-remember password. We should, in fact, do the exact opposite. Choosing a password for your online accounts is no different than choosing a password for your system. The password you choose is crucial to keep your information safe and secure.

Once a hacker gets your hashed password from one of your favorite websites, he/she will put combinations of characters into a hashing function until a hash that matches yours is created.
Since the functions themselves are well known, password hackers can easily calculate hashes for known words and other commonly chosen combinations. They then match the stolen hashes against these dictionaries.
These dictionaries go far beyond simple words. They contain dictionary words, common passwords, variations of common passwords, exposed passwords, and variants produced by swapping letters for numbers (e.g. 1 instead of l). They can also contain passwords that used to be hashed but have been subsequently cracked because they were stored with a weak password hashing algorithm.
This means weak passwords can be broken very quickly.
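To see why weak passwords fall so quickly, here is a toy sketch of the dictionary approach described above, run against an unsalted SHA-512 hash. Everything in it is illustrative: the two-word list, the handful of variations in variants, and the function names are assumptions, and real cracking tools use far larger lists and rule sets.

```python
import hashlib
from typing import Iterable, Iterator, Optional


def sha512_hex(text: str) -> str:
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


def variants(word: str) -> Iterator[str]:
    """Yield a few common variations of a dictionary word."""
    yield word
    yield word.capitalize()
    yield word.replace("l", "1")  # swapping a letter for a number
    for suffix in ("1", "123", "2021"):
        yield word + suffix


def dictionary_attack(target_hash: str, wordlist: Iterable[str]) -> Optional[str]:
    """Return the candidate whose hash matches the target, if any."""
    target = target_hash.lower()
    for word in wordlist:
        for candidate in variants(word):
            if sha512_hex(candidate) == target:
                return candidate
    return None


if __name__ == "__main__":
    wordlist = ["password", "milkway"]
    print(dictionary_attack(sha512_hex("milkway123"), wordlist))  # found
    print(dictionary_attack(sha512_hex("MiLkwAy123"), wordlist))  # None
```

The second lookup fails only because this tiny rule set does not try mixed-case variations; a larger rule set would eventually cover it, which is why length and randomness matter more than a few swapped characters.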

Remember these strong password best practices:

Use a combination of at least eight letters, numbers, and symbols
Do not use sequential numbers or letters
Never reuse that password on other websites
Combine different unrelated words in your password or passphrase
Make it long, nothing shorter than 15 characters, more if possible.
Use two-factor authentication (2FA) … but try to avoid text message codes

Now, let's go to this hash-cracking website, where you can put in a hashed version of a password and see if it will crack it and tell you the password.

Example password: milkway123
Hash in SHA-512: 823AB07D68B45EE959669A08D2997ED281CC5CFECEABEBF191643F2C5269BFA5877597B3B6AFC331FE11C111499AD509FB5F59FDFC074B194EFF8DA58F27D232
vs
Example password: MiLkwAy123
Hash in SHA-512: 43AA88B18E46BDC0E2F48AEC8CC2918E9234F41F42B3F7292CF522A0A23A2B06D0B09417590CF5A690861565E583BD0A2D311C00CB33F552C5973399056C4813

As you can see, the composition of the password also makes a difference.
A truly random eight-character password will be more secure than an eight-letter dictionary word because password-cracking attacks try dictionaries, names, and other lists of words first.
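A rough calculation makes the difference concrete. The sketch below compares the number of guesses needed in the worst case, expressed in bits (log base 2). It assumes 94 printable ASCII characters (excluding the space) for the random password and a hypothetical word list of one million entries; both figures are illustrative assumptions.

```python
import math


def search_space_bits(alphabet_size: int, length: int) -> float:
    """Bits needed to cover every string of `length` symbols from the alphabet."""
    return length * math.log2(alphabet_size)


def wordlist_bits(n_words: int) -> float:
    """Bits needed to cover every entry of a word list."""
    return math.log2(n_words)


if __name__ == "__main__":
    random_bits = search_space_bits(94, 8)  # about 52.4 bits
    word_bits = wordlist_bits(1_000_000)    # about 19.9 bits
    print(f"Random 8 characters: {random_bits:.1f} bits")
    print(f"Dictionary word:     {word_bits:.1f} bits")
    print(f"The random password takes about 2^{random_bits - word_bits:.0f} times more guesses")
```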