Title : A Steganography Improvement Proposal
Author : cjm1
.oO Phrack 49 Oo.
Volume Seven, Issue Forty-Nine
10 of 16
A Steganography Implementation Improvement Proposal
by: cjm1@concentric.net
[ For those of you who do not know, steganography is a cryptographic
technique that simply hides messages inside of messages. The sender composes
an innocuous message and then, using one of many tactics, injects the secret
message into it. Some techniques involve: invisible inks, character
distortion, handwriting differences, word/letter frequency doping, bit
flipping, etc... The method the author discusses hinges upon a well known
steganographic implementation, low-order bit flipping in graphic images. -d9 ]
Steganography is a technique for hiding data in other data. The
general method is to set the low-order bit of each carrier byte so that
reading the low-order bits of 8 consecutive bytes yields one character. This
allows one to hide data in a picture or a sound file, at the cost of a small
amount of hopefully unnoticeable noise in the carrier, and to extract the
hidden data again later. This paper details a method for making
steganographically hidden data safer by using pseudo-random dispersion.
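The basic, undispersed method can be sketched in a few lines of Python. This is only an illustration under some assumptions of mine: the cover is a raw buffer of sample bytes (decoded pixel or audio values, not a compressed file as stored on disk), bits are written most significant first, and the names embed_lsb and extract_lsb are made up for this example.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytearray:
    """Hide message in the low-order bits of cover, one bit per cover byte."""
    if len(message) * 8 > len(cover):
        raise ValueError("cover too small: 8 cover bytes needed per character")
    stego = bytearray(cover)
    for i, byte in enumerate(message):
        for bit_index in range(8):
            bit = (byte >> (7 - bit_index)) & 1     # most significant bit first
            pos = i * 8 + bit_index
            stego[pos] = (stego[pos] & 0xFE) | bit  # touch only the low-order bit
    return stego


def extract_lsb(stego: bytes, length: int) -> bytes:
    """Rebuild length characters from the low-order bits of stego."""
    out = bytearray()
    for i in range(length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[i * 8 + bit_index] & 1)
        out.append(value)
    return bytes(out)
```

Each cover byte changes by at most 1, which is the "small bit of noise" mentioned above. Anyone who runs the same extractor gets the message back, which is the weakness the rest of this paper addresses.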
Ordinarily, if someone suspects that you have data hidden in, say, a
GIF file, they can simply run the appropriate extractor and find the data. If
the data is not encrypted, it will be plain for anyone to see. This can be
ameliorated by using a simple password protection scheme, hiding the password
in the GIF as a header, encrypting it first with itself. If someone does not
know the password, they cannot extract the data. This is of course reasonably
safe, depending on the encryption scheme used, and I recommend it. But, the
hidden data can be made even safer.
Pseudo-random dispersion works by hiding a password and a seed for a
random-number generator in the encrypted header. Then a random number of bytes
is passed over before each low-order bit is set.
To do this, one must first calculate how many bytes a bit can take up
for itself. For instance, to hide an 800 character message in a GIF would
mean each character needs 8 bytes (8 bits per character, 1 byte per low-order
bit), so you need 6,400 bytes of data to hide the message in, 8 bytes per
character. Let's say we have a GIF that is 10 times this size: 64,000 bytes.
Thus we have 80 bytes per character to hide data in. Since each bit takes a
byte, we have 10 bytes per bit to hide data in! Therefore, if we take a
pseudo-random number between 1 and 10, and use that byte to hide our low-order
bit in, we have achieved a message dispersed through the GIF in a pseudo-random
fashion, much harder to extract. A message in which each byte has a bit which
is significant to the steganographically hidden message can be extracted with
ease relative to a message in which there are 10 possible bytes for each bit
of each character. The latter is exponentially harder to extract, given no
esoteric knowledge.
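The arithmetic and the skipping scheme above can be sketched as follows. This is one reading of the proposal, not a reference implementation: the names bytes_per_bit, embed_dispersed and extract_dispersed are hypothetical, the cover is assumed to be a raw buffer of sample bytes, the extractor is assumed to already know the message length, and Python's random.Random merely stands in for whatever seeded generator a real tool would use.

```python
import random


def bytes_per_bit(cover_len: int, message_len: int) -> int:
    """How many cover bytes each hidden bit can claim for itself."""
    return cover_len // (message_len * 8)


def embed_dispersed(cover: bytes, message: bytes, seed: int) -> bytearray:
    span = bytes_per_bit(len(cover), len(message))
    if span < 1:
        raise ValueError("cover too small for this message")
    rng = random.Random(seed)            # same seed must be used to extract
    stego = bytearray(cover)
    pos = -1
    for byte in message:
        for shift in range(7, -1, -1):
            pos += rng.randint(1, span)  # pass over a random number of bytes
            bit = (byte >> shift) & 1
            stego[pos] = (stego[pos] & 0xFE) | bit
    return stego


def extract_dispersed(stego: bytes, message_len: int, seed: int) -> bytes:
    span = bytes_per_bit(len(stego), message_len)
    rng = random.Random(seed)
    pos = -1
    out = bytearray()
    for _ in range(message_len):
        value = 0
        for _ in range(8):
            pos += rng.randint(1, span)  # replay the same sequence of skips
            value = (value << 1) | (stego[pos] & 1)
        out.append(value)
    return bytes(out)
```

For the example in the text, bytes_per_bit(64000, 800) gives 10, so every bit lands somewhere within the 10 bytes following the previous one. Without the seed, an extractor does not know which of those bytes carries the bit.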
A slight improvement can be made to this algorithm. By re-calculating
the number of available bytes left for each bit after each bit is hidden, the
data is dispersed more evenly throughout the file, instead of being bunched up
at the start, which would be a normal occurrence. If you use a pseudo-random
number generator picking numbers from 1 to 10, the values will average out to
about 5 over time, so the message uses up only about half of the bytes set
aside for it. This will cause the hidden message to be clustered at the
beginning of the GIF. By re-calculating the number of available bytes left
each time, we spread the data out through the whole file, with the added
bonus that later bits will tend to be spread farther apart than earlier ones,
resulting in search spaces of 20, 30, 100, or even 1,000 possible bytes per
bit. This too
serves to make the data much harder to extract.
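A minimal sketch of the re-calculation step, under the same assumptions as before (raw cover buffer, message length known to the extractor, random.Random as a stand-in generator). The helper name positions and the two wrappers are hypothetical. The only change from a fixed range is that the range is recomputed from what is left of the file before every bit.

```python
import random


def positions(cover_len: int, n_bits: int, seed: int):
    """Yield the cover offset for each hidden bit, re-calculating the range."""
    if n_bits > cover_len:
        raise ValueError("cover too small for this message")
    rng = random.Random(seed)
    pos = -1
    for bits_left in range(n_bits, 0, -1):
        bytes_left = cover_len - (pos + 1)
        span = bytes_left // bits_left   # bytes this bit can claim right now
        pos += rng.randint(1, span)
        yield pos


def embed_adaptive(cover: bytes, message: bytes, seed: int) -> bytearray:
    stego = bytearray(cover)
    for i, pos in enumerate(positions(len(cover), len(message) * 8, seed)):
        bit = (message[i // 8] >> (7 - i % 8)) & 1
        stego[pos] = (stego[pos] & 0xFE) | bit
    return stego


def extract_adaptive(stego: bytes, message_len: int, seed: int) -> bytes:
    out = bytearray(message_len)
    for i, pos in enumerate(positions(len(stego), message_len * 8, seed)):
        out[i // 8] = (out[i // 8] << 1) | (stego[pos] & 1)
    return bytes(out)
```

Because each skip is at most the current range, the ratio of bytes left to bits left never shrinks, so the range can only stay the same or grow as the file is consumed. That is where the larger search spaces for later bits come from.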
I recommend a header large enough for an 8-character ASCII password,
an integer random-number seed, an integer version number, and a placeholder
left for future use. The version number allows us to tweak the algorithm
and still remain compatible with past versions of the program. The header
should be encrypted and undispersed (i.e., 1 byte per bit of data), since we
haven't seeded the random-number generator yet for dispersion purposes.
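One possible layout for such a header, sketched with Python's struct module. The field widths are my own assumptions, since the text only says "integral": a 4-byte seed, a 2-byte version, and a 2-byte reserved field, all big-endian, for 16 bytes in total. The encryption step is left out because no particular cipher is specified; the function names are hypothetical.

```python
import struct

# 8-byte password, 4-byte seed, 2-byte version, 2-byte reserved (assumed sizes)
HEADER_FORMAT = ">8sIHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_header(password: str, seed: int, version: int = 1) -> bytes:
    raw = password.encode("ascii")
    if len(raw) > 8:
        raise ValueError("password is limited to 8 ASCII characters")
    # Short passwords are padded with NUL bytes; the reserved field is zero.
    return struct.pack(HEADER_FORMAT, raw.ljust(8, b"\0"), seed, version, 0)


def unpack_header(blob: bytes):
    raw, seed, version, _reserved = struct.unpack(
        HEADER_FORMAT, blob[:HEADER_SIZE]
    )
    # The password is returned as bytes so that garbage input cannot fail
    # to decode.
    return raw.rstrip(b"\0"), seed, version
```

With these sizes the undispersed header takes up the first 16 * 8 = 128 cover bytes, and dispersion begins after them using the recovered seed.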
It is useful to make the extractor in such a way that it always
extracts something, regardless of the password being correct or not. Doing
this means that it is impossible to tell if you have guessed a correct password
and gotten encrypted data out, or merely gotten out garbage that looks like
encrypted data. Use of a password can also be made optional, so that none is
necessary for extraction. A simple default password can be used in these
cases. When hiding encrypted data, there is no difference to the naked
eye between what is extracted and what is garbage, so no password is
strictly necessary. This means no password has to be remembered, or
transmitted to other parties. A third party cannot tell if a real password
has been used or not. It is important for safety purposes to not hide the
default password in the header if no password is used. Otherwise, a simple
match can be made by anyone who knows the default password.
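The "always extracts something" behaviour can be sketched like this. Everything here is an assumption made for illustration: the 16-byte header layout, the DEFAULT_PASSWORD value, the function names, and above all the toy cipher, which is a plain XOR against a SHA-256 digest of the password and is a placeholder rather than a recommendation. The point is only the shape of the logic: the extractor never checks whether the password was right, and the default password is never written into the header.

```python
import hashlib
import os
import struct

HEADER_FORMAT = ">8sIHH"       # assumed layout: password, seed, version, spare
DEFAULT_PASSWORD = b"default"  # hypothetical built-in password


def keystream(password: bytes, n: int) -> bytes:
    # Toy keystream: the first n (at most 32) bytes of SHA-256(password).
    return hashlib.sha256(password).digest()[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal_header(seed: int, version: int, password: bytes = b"") -> bytes:
    key = password or DEFAULT_PASSWORD
    # With no password, fill the field with random bytes, never the default.
    field = password[:8].ljust(8, b"\0") if password else os.urandom(8)
    plain = struct.pack(HEADER_FORMAT, field, seed, version, 0)
    return xor(plain, keystream(key, len(plain)))


def open_header(sealed: bytes, password: bytes = b""):
    key = password or DEFAULT_PASSWORD
    plain = xor(sealed, keystream(key, len(sealed)))
    _field, seed, version, _spare = struct.unpack(HEADER_FORMAT, plain)
    # No comparison against the password field: a wrong password simply
    # yields a wrong seed, and extraction carries on with it.
    return seed, version
```

A wrong password still returns a seed and a version number, just not the right ones, so whatever is extracted afterwards is garbage that looks no different from encrypted data.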