Introduction to hashing and checksums on Linux Systems.
Let’s say you have to download a file from the Internet. This file is highly sensitive, and it’s important that you receive exactly the file that the sender is trying to send you. But imagine that one of the following occurs:
- A hacker breaks into the site and replaces the original download with their own malicious download.
- There is an error in the file transfer and it’s accidentally modified.
How do you possibly know that the file you downloaded is exactly the same as the one promised?
The answer lies in a process known as "hashing".
In this article, we’ll take a look at what hashing is, how to hash a file in Linux, and a live example of how it works.
What are the goals of cryptography?
Cryptography actually has three main goals:
- Confidentiality - to keep the file content from being read by unauthorized users
- Authenticity - to prove where a file originated
- Integrity - to prove that a file has not changed unexpectedly
Integrity is what we are interested in here.
In this context, integrity means to prove that data has not changed unexpectedly.
Proving integrity is useful in many scenarios:
- Internet downloads such as Linux distributions, software, or data files
- Network file transfers via NFS, SSH, or other protocols
- Verifying software installations
- Comparing a stored value, such as a password, with a value entered by a user
- Backups that compare two files to see whether they've changed
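To make the last scenario concrete, here is a minimal sketch of comparing two files by their hashes. It assumes the GNU coreutils sha256sum command (covered later in this article) is installed; the file names and the same_content helper are made up for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# same_content is a hypothetical helper: it succeeds when both files
# produce the same SHA256 digest.
same_content() {
    [ "$(sha256sum < "$1")" = "$(sha256sum < "$2")" ]
}

printf 'quarterly report\n' > original.txt
cp original.txt backup.txt

if same_content original.txt backup.txt; then first="match"; else first="differ"; fi

# Simulate corruption by appending a single byte to the backup.
printf 'x' >> backup.txt

if same_content original.txt backup.txt; then second="match"; else second="differ"; fi

echo "fresh copy:   $first"
echo "after change: $second"

cd / && rm -rf "$workdir"
```

A real backup tool would likely store the digests rather than recompute both sides every time, but the comparison itself works the same way.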
So, what the heck is a cryptographic hash?
A cryptographic hash is a checksum or digital fingerprint derived by performing a one-way hash function (a mathematical operation) on the data comprising a computer program (or other digital files).
Changing even one byte of the data will, with overwhelming probability, change the hash value. The hash value therefore serves as a practically unique fingerprint for a program or other digital file.
Whereas encryption is a two-way function, hashing is a one-way function. In theory you could recover an input by trying candidates until one produces the same hash, but for a strong algorithm the computing power required makes this infeasible. And no, you can't decrypt a hash, because the data isn't encrypted, it's hashed.
Now, whereas encryption is meant to keep data confidential, hashing is meant to verify that a file or piece of data hasn’t been altered. In other words, it serves as a checksum.
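To see the "one small change, completely different hash" behaviour for yourself, here is a small sketch. It assumes sha256sum from GNU coreutils is available; the two input strings are arbitrary examples.

```shell
#!/bin/sh
# Hash two inputs that differ by a single character.
# printf is used instead of echo so no trailing newline sneaks into the input.
hash_a=$(printf 'hello' | sha256sum | cut -d ' ' -f 1)
hash_b=$(printf 'hellp' | sha256sum | cut -d ' ' -f 1)

echo "hello -> $hash_a"
echo "hellp -> $hash_b"

# Both digests are 64 hex characters long, yet they do not match.
if [ "$hash_a" != "$hash_b" ]; then
    echo "The digests differ"
fi
```

Note that nothing in the second digest hints at how close the two inputs were, which is why a hash can tell you that something changed but not what.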
What kind of hash cryptography might you use with Linux?
In Linux, you're likely to interact with one of two hashing methods:
- MD5
- SHA256
These cryptography tools are built into most Linux distributions, as well as macOS. The difference is in the mathematics involved, but the two accomplish similar goals. They are not, however, interchangeable.
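SHA256 generates a longer hash (256 bits versus 128 bits for MD5) and may take more time and computing power to complete. It is considered to be the more secure approach. MD5 is still good enough for catching accidental corruption, such as a damaged download, but practical collision attacks against it exist, so it should not be relied on to detect deliberate tampering.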
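As a rough illustration of why the two are not interchangeable, the sketch below writes an MD5 checksum file and then tries to verify it with both tools. It assumes the GNU coreutils versions of md5sum and sha256sum; the checksum.md5 file name is just an example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Original secret information\n' > data.txt
md5sum data.txt > checksum.md5

# md5sum accepts the checksum file it produced.
# --status silences the normal output; only the exit code is used.
if md5sum --check --status checksum.md5; then
    md5_result="accepted"
else
    md5_result="rejected"
fi

# sha256sum expects 64-character digests, so it finds no usable lines
# in a file of 32-character MD5 digests and exits with an error.
if sha256sum --check --status checksum.md5 2>/dev/null; then
    sha_result="accepted"
else
    sha_result="rejected"
fi

echo "md5sum:    $md5_result"
echo "sha256sum: $sha_result"

cd / && rm -rf "$workdir"
```

In practice this means you need to know which algorithm produced a published checksum before you can verify against it.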
1. Let us manually generate checksums
I will walk you through a very simple scenario whose purpose is to determine whether a file has changed.
First, open your favorite text editor and create a file named data.txt with a line of text that reads: Original secret information
$ nano data.txt
$ cat data.txt
Original secret information
Next, let us generate the hash for a file in a directory using sha256sum:
$ sha256sum data.txt > checksum
$ ls
checksum data.txt
$ cat checksum
413ffe8ae14bb7e14c0e29b78049b1f869524d1357cd3dd493315ac18e37ddff  data.txt
2. Verify File Integrity
Let’s use the hash stored in the checksum file to verify the integrity of the data.txt file that we’ve just hashed:
$ sha256sum --check checksum
data.txt: OK
Next, let’s modify the information contained in data.txt to simulate a failed test. We’ll use the sed command to replace "secret" with "secrets":
$ sed -i 's/secret/secrets/' data.txt
Now we check the file’s integrity again:
$ sha256sum --check checksum
data.txt: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
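If you want to act on the result from a script rather than read the output, the exit status is usually more convenient. The following is a minimal sketch that replays the steps above in a temporary directory, assuming GNU sha256sum and GNU sed; the before and after variable names are only for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Original secret information\n' > data.txt
sha256sum data.txt > checksum

# --status prints nothing; the exit code alone reports the result
# (0 when every file matches, non-zero otherwise).
if sha256sum --check --status checksum; then
    before="intact"
else
    before="modified"
fi

sed -i 's/secret/secrets/' data.txt

if sha256sum --check --status checksum; then
    after="intact"
else
    after="modified"
fi

echo "before edit: $before"
echo "after edit:  $after"

cd / && rm -rf "$workdir"
```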
2.1. Dealing With Multiple Files
Let’s add another entry. We’ll do this by writing a line of text to a new file, generating the digest for that new file, and appending it to the checksum file:
$ echo "Here is my password: 123" > data2.txt
$ sha256sum data2.txt >> checksum
We now verify the integrity of all the entries in the checksum file. The command processes each entry and tells us which files pass the test and which fail:
$ sha256sum --check checksum
data.txt: FAILED
data2.txt: OK
sha256sum: WARNING: 1 computed checksum did NOT match
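With many entries, the OK lines can bury the failures. As a sketch (again assuming GNU sha256sum and GNU sed), the --quiet option reports only the files that did not verify. The example recreates the two files from this section in a temporary directory.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Original secret information\n' > data.txt
printf 'Here is my password: 123\n' > data2.txt
sha256sum data.txt data2.txt > checksum

# Change one of the two files after its digest was recorded.
sed -i 's/secret/secrets/' data.txt

# --quiet suppresses the "OK" lines, so only problems are printed.
# The WARNING summary goes to stderr, which is discarded here.
failed=$(sha256sum --check --quiet checksum 2>/dev/null)

echo "$failed"

cd / && rm -rf "$workdir"
```

Here only data.txt is reported, since data2.txt still matches its recorded digest.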
Note: Hashing confirms that data has not unexpectedly changed during a file transfer, download, or another event. This concept is known as file integrity. Hashing does not tell you what has changed, just that something has changed. Once hashing tells you two files are different, you can use commands such as diff to discover what differences exist.
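Here is a short sketch of that follow-up step. It assumes you still have a known-good copy to compare against (called data.orig here, a made-up name) and that GNU sed is available for the in-place edit.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Original secret information\n' > data.txt
cp data.txt data.orig    # hypothetical known-good copy

sed -i 's/secret/secrets/' data.txt

# diff exits with status 1 when the files differ, which is expected here.
changes=$(diff data.orig data.txt)

# Lines starting with "<" come from data.orig, lines with ">" from data.txt.
echo "$changes"

cd / && rm -rf "$workdir"
```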
Common Hashing Algorithms
MD5 – MD5 is a hashing algorithm designed by Ron Rivest that is known to suffer from vulnerabilities. It was published in 1992 as the successor to MD4. A further successor, MD6, was submitted to NIST's SHA-3 competition, but as of 2009 it was no longer under consideration.
SHA – SHA stands for Secure Hash Algorithm, and it’s probably best known as the hashing algorithm used in most SSL/TLS cipher suites. A cipher suite is a collection of ciphers and algorithms that are used for SSL/TLS connections; SHA handles the hashing aspects. SHA-1 is now deprecated, and SHA-2 is the expected replacement. SHA-2 is often referred to as SHA-256 after its most common variant, though the family also includes variants with other digest lengths, such as 384 and 512 bits. It was designed by the United States National Security Agency, which makes us question its integrity.
RIPEMD – A family of cryptographic hashing algorithms with digest lengths of 128, 160, 256, and 320 bits. It was developed under the framework of the EU’s RIPE project by Hans Dobbertin and a group of academics in 1996. The 256- and 320-bit variants don’t actually add any security over the 128- and 160-bit versions; they just reduce the chance of an accidental collision. In 2004 a collision was reported for the original 128-bit RIPEMD, meaning RIPEMD-160 is the only algorithm from this family worth its salt.
TIGER – An algorithm designed in 1995 that has gained some traction with file-sharing networks and torrent sites. There are currently no known practical attacks against its full 24-round variant.
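To get a feel for how digest sizes differ between algorithms, here is a small sketch. It only covers md5sum, sha1sum, sha256sum, and sha512sum, which ship with GNU coreutils; RIPEMD and Tiger are not part of coreutils and are left out. The input string is arbitrary.

```shell
#!/bin/sh
# Hash the same input with several tools and report each digest size.
for tool in md5sum sha1sum sha256sum sha512sum; do
    digest=$(printf 'hello' | "$tool" | cut -d ' ' -f 1)
    # Each hex character encodes 4 bits.
    bits=$(( ${#digest} * 4 ))
    echo "$tool: $bits bits"
done
```

On a system with coreutils this should report 128, 160, 256, and 512 bits respectively.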
Well, that's it 🐒.
Thanks for reading...