Welcome back to another installment in our series on steganography. After introducing Digital Steganography Basics, we just finished looking at a modern form of hiding text within text using the snow technique in the post titled Digital Text Based Steganography. Let’s move on to talk about binary files now, as they represent a more popular choice due largely to their substantially larger carrying capacity.
Binary files can obviously be grouped by type: images, audio files, video files, executables, and other binary files. For the purposes of steganography, however, it’s useful to distinguish between lossy and lossless files. Binary files have a tendency to be large in this age of smartphone videos, and many popular file formats utilize compression to reduce the disk space and bandwidth required to store and transmit them. Lossy file formats compress files by discarding some portion of the information in the process. Lossless formats preserve every bit of the original data, whether or not they compress it.
The technique for today, LSB encoding, is commonly used with lossless file formats. The classic example used to describe this technique is a bitmapped image, typically stored on disk with a .BMP extension. Other examples of lossless image formats include PNG, TIFF and RAW. WAV audio files are usually lossless, and the FLAC format is lossless even though it uses compression. Lossless video formats do exist, but they are rarely used for everyday video, since the files can easily get too large to be practical.
The reason this technique is explained using bitmapped image files is that this file format is simple. .BMP files represent images as a grid of pixels, with a color value stored for each pixel. That’s an easy model to work with.
Image formats usually represent the images in 24-bit color. What this means is that 24 bits are used to store the red, green and blue values – one byte for each. So each pixel gets red, green and blue color values ranging from 0-255 decimal or 00 to FF in hex. A color value of 0x00 is actually 00000000, i.e. 8 bits all of which are zero. 0xFF on the other extreme is 11111111 – 8 bits that are all ones.
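To make the byte layout concrete, here is a minimal Python sketch that prints one 24-bit pixel in decimal, hex and binary. The pixel value and the `describe` helper are made up for illustration; they are not part of any image library.

```python
# A single 24-bit pixel: one byte each for red, green and blue.
pixel = (255, 128, 0)  # an arbitrary orange, chosen for illustration


def describe(pixel):
    """Return each channel as a (decimal, hex, binary) tuple."""
    return [(value, f"{value:02X}", f"{value:08b}") for value in pixel]


for value, hex_text, bits in describe(pixel):
    print(value, hex_text, bits)
# 255 FF 11111111
# 128 80 10000000
# 0 00 00000000
```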
We also need the concept of most significant bit (MSB) and least significant bit (LSB). In the next paragraph we try to describe why changing the LSB has such a minimal impact visually while changing the MSB has a dramatic impact on the image from a human observer’s perspective. If you already understand this or want to avoid the one and only section with math in it, feel free to skip this next bit.
These three bytes describing the color value can each be any of the 256 possible values since there are 8 bits and 2 possibilities per bit – 2 to the 8th power being 256. A binary number like 0101 (base 2) can be converted to base 10 (decimal) by expanding it as follows:
0101 (base 2) = 0 x 2^3 + 1 x 2^2 + 0 x 2^1 + 1 x 2^0 = 0 + 4 + 0 + 1 = 5 (base 10)
Similarly, in our 8-bit numbers the rightmost digit is zero or one 1s (2^0), and the leftmost digit is zero or one 128s (2^7). So it follows that changing the rightmost bit alters the value by one, and changing the leftmost bit changes it by 128. When these represent colors in a range from 0-255, changing the color value by 1 is not enough to notice by visual inspection. Changing it by 128 dramatically changes it and is easily noticed with the naked eye.
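If you would rather see the arithmetic run than read it, the short Python sketch below expands a binary string into its positional sum and then flips the rightmost and leftmost bits of a sample color value. The helper names `to_decimal` and `flip_bit` and the sample value 180 are our own choices for illustration.

```python
def to_decimal(bits: str) -> int:
    """Expand a binary string into its positional sum, e.g. '0101' -> 5."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total


def flip_bit(value: int, position: int) -> int:
    """Flip one bit of an 8-bit value; position 0 is the rightmost bit."""
    return value ^ (1 << position)


red = 0b10110100  # 180, an arbitrary sample channel value

print(to_decimal("0101"))  # 5
print(flip_bit(red, 0))    # 181: rightmost bit flipped, value moves by 1
print(flip_bit(red, 7))    # 52: leftmost bit flipped, value moves by 128
```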
Great, we’ve covered all the math we need today! We refer to the leftmost bit, which substantially changes the color value, as the Most Significant Bit (MSB), and the rightmost bit, which causes only a tiny change, as the Least Significant Bit (LSB). As you might have guessed, this LSB technique involves changing the rightmost bits in these color values. Now let’s talk about methods of doing that.
LSB encoding has the notable characteristic of being easy to detect by computer programs. As the number of bits used to encode the hidden file increases, the amount of data that can be concealed increases, but so does the visible degradation of the cover file. Given this tradeoff, the usual practice is to use the fewest bits that the relative file sizes allow. Embedding a larger file requires using more bits. The tradeoff is significant, too: the highest bit modified is worth 1 with LSB1, 2 with LSB2, 4 with LSB3 and 8 with LSB4, so a single color value can shift by as much as 1, 3, 7 and 15 respectively.
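Here is a minimal Python sketch of the idea, under some simplifying assumptions: the cover is treated as a flat run of color bytes rather than a parsed BMP file, the payload length is passed to the extractor separately instead of being stored in the image, and the function names `embed` and `extract` are ours.

```python
def embed(cover: bytes, payload: bytes, nbits: int = 1) -> bytearray:
    """Write payload bits into the nbits lowest bits of consecutive cover bytes."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    while len(bits) % nbits:  # pad so the bits split evenly into groups
        bits.append(0)
    groups = len(bits) // nbits
    if groups > len(cover):
        raise ValueError("payload too large for this cover at this bit depth")
    stego = bytearray(cover)
    mask = (1 << nbits) - 1
    for i in range(groups):
        chunk = 0
        for bit in bits[i * nbits:(i + 1) * nbits]:
            chunk = (chunk << 1) | bit
        # Clear the low nbits of the cover byte, then drop the payload bits in.
        stego[i] = (stego[i] & ~mask & 0xFF) | chunk
    return stego


def extract(stego: bytes, length: int, nbits: int = 1) -> bytes:
    """Read `length` payload bytes back out of the low nbits of each byte."""
    mask = (1 << nbits) - 1
    groups = -(-length * 8 // nbits)  # ceiling division
    bits = []
    for value in stego[:groups]:
        chunk = value & mask
        bits.extend((chunk >> shift) & 1 for shift in range(nbits - 1, -1, -1))
    out = bytearray()
    for i in range(length):
        byte = 0
        for bit in bits[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


cover = bytes([200, 201, 202, 203, 204, 205, 206, 207] * 4)  # stand-in pixel data
stego = embed(cover, b"Hi", nbits=2)
print(extract(stego, 2, nbits=2))  # b'Hi'
```

Because only the low `nbits` of each byte are rewritten, no color value moves by more than 2^nbits - 1, which is the degradation side of the tradeoff.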
LSB encoding is popular because the carrier file can conceal relatively large amounts of data with little visible distortion. Note the impressive carrying capacity shown above for the given file size. Using the four rightmost bits of every color byte, the payload can be as large as half of the cover image’s uncompressed pixel data. LSB is relatively poor in terms of robustness, however. For example, images will almost certainly have the hidden data damaged by cropping, resizing and similar operations.
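The capacity arithmetic is easy to check yourself. The sketch below assumes a 24-bit image with three color bytes per pixel and ignores any file header; the 640 x 480 dimensions and the `capacity_bytes` name are just examples.

```python
def capacity_bytes(width: int, height: int, nbits: int, channels: int = 3) -> int:
    """Payload capacity in bytes when nbits of every color byte are used."""
    return width * height * channels * nbits // 8


for nbits in range(1, 5):
    print(nbits, capacity_bytes(640, 480, nbits))
# 1 115200
# 2 230400
# 3 345600
# 4 460800
```

The raw pixel data of that example image is 921,600 bytes, so four bits per color byte gives exactly half of it.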
Detection of content hidden with LSB is easy, as mentioned previously. A visual inspection will flag the image as suspicious if too much data is stuffed into it. Similarly, techniques like creating a grayscale image from just the least significant bits are quite effective at suggesting that content may have been written into those bits, particularly when the size of the hidden data is large relative to the size of the cover image as shown below.
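The grayscale trick can be sketched in a few lines of Python. This version assumes you already have one channel of the image as raw bytes, and it wraps the result in a binary PGM header, a very simple grayscale format, so that an image viewer can open it. The names `lsb_plane` and `to_pgm` are hypothetical helpers, not functions from an imaging library.

```python
def lsb_plane(channel_bytes: bytes) -> bytes:
    """Map each byte to white (255) if its least significant bit is set, else black (0)."""
    return bytes(255 if value & 1 else 0 for value in channel_bytes)


def to_pgm(gray_bytes: bytes, width: int, height: int) -> bytes:
    """Wrap raw grayscale bytes in a binary PGM (P5) header."""
    if len(gray_bytes) != width * height:
        raise ValueError("byte count does not match width * height")
    return f"P5\n{width} {height}\n255\n".encode("ascii") + gray_bytes


# A tiny 4 x 2 example channel; a real image would supply these bytes.
channel = bytes([10, 11, 12, 13, 200, 201, 202, 203])
plane = lsb_plane(channel)
print(list(plane))  # [0, 255, 0, 255, 0, 255, 0, 255]
image_file = to_pgm(plane, width=4, height=2)
```

In an untouched photo the resulting picture often still hints at the original shapes, while a region that has been overwritten with hidden data tends to look like uniform static.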
Looking at histograms is effective too. They tend to show bunched up lines where images have data embedded inside. There are probably many other methods that will detect LSB encoding, but there are also techniques beyond what we have described that do a better job of hiding the data.
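One histogram effect that is commonly described for LSB embedding is that the counts of neighboring values 2k and 2k+1 get pulled toward each other, because flipping the last bit only ever moves a value within such a pair. The Python sketch below measures that on a deliberately exaggerated synthetic cover. It illustrates the general idea rather than the specific histogram check pictured in this post, and `pair_imbalance` is a name we made up.

```python
import random
from collections import Counter


def pair_imbalance(channel_bytes: bytes) -> float:
    """Imbalance between the counts of values 2k and 2k+1, from 0.0 to 1.0."""
    histogram = Counter(channel_bytes)
    difference = sum(
        abs(histogram[2 * k] - histogram[2 * k + 1]) for k in range(128)
    )
    return difference / len(channel_bytes)


rng = random.Random(0)  # fixed seed so the demo is repeatable

# Synthetic cover using only two even values, so every pair starts lopsided.
cover = bytes([100] * 500 + [180] * 500)

# Overwrite every least significant bit with a random "payload" bit.
stego = bytes((value & 0xFE) | rng.getrandbits(1) for value in cover)

print(pair_imbalance(cover))  # 1.0
print(pair_imbalance(stego))  # much closer to 0.0
```

Real photos are nowhere near as lopsided as this cover, so the drop is less dramatic in practice, but the direction of the effect is the same.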
A simple LSB encoding scheme as described earlier will simply cycle through consecutive pixels, changing values until the secret has been embedded, and then stop. A smarter way is to distribute the hidden data across the whole image, spreading the changes out more evenly.
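One way to spread the data out, sketched below in Python, is to let a seeded random number generator pick which bytes to touch. Both sides need the same seed (called `key` here) and the same Python version, the payload length is again passed separately, and all of the function names are our own. Note that Python's `random` module is not a cryptographic generator, so treat this as an illustration of scattering, not of secrecy.

```python
import random


def scattered_positions(cover_len: int, count: int, key: int) -> list:
    """Pick `count` distinct byte offsets in a repeatable, key-dependent order."""
    return random.Random(key).sample(range(cover_len), count)


def embed_scattered(cover: bytes, payload: bytes, key: int) -> bytearray:
    """Hide one payload bit in the LSB of each selected cover byte."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("payload too large for this cover")
    stego = bytearray(cover)
    for pos, bit in zip(scattered_positions(len(cover), len(bits), key), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit
    return stego


def extract_scattered(stego: bytes, length: int, key: int) -> bytes:
    """Regenerate the same offsets from the key and read the bits back."""
    positions = scattered_positions(len(stego), length * 8, key)
    out = bytearray()
    for i in range(length):
        byte = 0
        for pos in positions[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (stego[pos] & 1)
        out.append(byte)
    return bytes(out)


cover = bytes(range(256)) * 4  # stand-in for pixel data
stego = embed_scattered(cover, b"secret", key=1234)
print(extract_scattered(stego, 6, key=1234))  # b'secret'
```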
A more stealthy approach still, provided there is excess capacity, is to find the best areas of the image in which to hide the data. An area where there is little to no variance in color from pixel to pixel will not hide data as effectively as an area where there is already considerable variation in color from one pixel to the next.
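As a rough sketch of how the "busy" areas might be found, the Python below splits a run of color bytes into fixed-size blocks and ranks them by variance. It makes several assumptions of its own: blocks are one-dimensional runs instead of two-dimensional tiles, the block size of 8 is arbitrary, and the variance is computed with the last bit masked off so that the ranking comes out the same before and after embedding, which the extractor needs in order to find the same blocks.

```python
from statistics import pvariance


def rank_blocks(channel_bytes: bytes, block_size: int = 8) -> list:
    """Return block start offsets, highest variance (busiest) first."""
    scored = []
    for start in range(0, len(channel_bytes) - block_size + 1, block_size):
        # Ignore the LSB so later embedding cannot change the ranking.
        block = [value & 0xFE for value in channel_bytes[start:start + block_size]]
        scored.append((pvariance(block), start))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [start for _, start in scored]


flat = [100] * 8                              # a patch of uniform color
busy = [0, 255, 10, 240, 30, 200, 60, 180]    # a patch with lots of variation
data = bytes(flat + busy)

print(rank_blocks(data))  # [8, 0]: the busy block ranks first
```

An embedder could then fill the top-ranked blocks first and leave the flat ones alone when the payload is small enough.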
There are probably better ways of obfuscating what is going on here too. Hiding the fact that information is being hidden in images via LSB is tricky. This is what motivates people to use other techniques that we’ll cover in future, so stay tuned!