Friday, January 30, 2015

Compression Using DeflateStream and GZipStream

The .NET System.IO.Compression namespace provides two general-purpose compression streams: DeflateStream and GZipStream.
Both use a popular compression algorithm similar to that of the ZIP format. The difference is that GZipStream writes an additional protocol at the start and at the end, which includes a CRC to detect errors. GZipStream also conforms to a standard recognized by other software.
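To make the difference concrete, here is a minimal sketch that compresses the same bytes with both streams in memory and compares the results. The class GZipVersusDeflate and its Deflate and GZip helpers are hypothetical names introduced for illustration, and the sample sentence is arbitrary. The check on the first two bytes relies on the gzip format (RFC 1952) starting with the magic bytes 0x1F 0x8B.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GZipVersusDeflate
{
    // Hypothetical helper: compresses data in memory with DeflateStream.
    public static byte[] Deflate(byte[] data)
    {
        var buffer = new MemoryStream();
        using (var ds = new DeflateStream(buffer, CompressionMode.Compress))
            ds.Write(data, 0, data.Length);
        // Disposing ds flushes it and closes buffer; ToArray still works afterwards.
        return buffer.ToArray();
    }

    // Hypothetical helper: same as above, but with the gzip framing around the data.
    public static byte[] GZip(byte[] data)
    {
        var buffer = new MemoryStream();
        using (var gz = new GZipStream(buffer, CompressionMode.Compress))
            gz.Write(data, 0, data.Length);
        return buffer.ToArray();
    }

    public static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("The quick brown fox jumps over the lazy dog");
        byte[] deflated = Deflate(data);
        byte[] gzipped = GZip(data);

        // Prints "gzip magic: 1F 8B"
        Console.WriteLine("gzip magic: {0:X2} {1:X2}", gzipped[0], gzipped[1]);
        // The exact sizes depend on the runtime, so they are only printed here.
        Console.WriteLine("deflate: {0} bytes, gzip: {1} bytes", deflated.Length, gzipped.Length);
    }
}
```

The gzip output should come out somewhat larger than the Deflate output for the same input, because of the extra header and trailer.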
Both DeflateStream and GZipStream allow reading and writing, with the following provisos:
  • Always write to the stream when compressing.
  • Always read from the stream when decompressing.
DeflateStream and GZipStream are both decorators: they compress or decompress data from another stream that you supply at construction. In the following sample, the code compresses and decompresses a series of bytes, using a FileStream as the backing store:
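// Namespaces needed by the samples in this post
using System;
using System.IO;
using System.IO.Compression;

// Compress: always write to the DeflateStream
using (Stream s = File.Create ("compressed.bin"))
using (Stream ds = new DeflateStream (s, CompressionMode.Compress))
  for (byte i = 0; i < 100; i++)
    ds.WriteByte (i);

// Decompress: always read from the DeflateStream
using (Stream s = File.OpenRead ("compressed.bin"))
using (Stream ds = new DeflateStream (s, CompressionMode.Decompress))
  for (byte i = 0; i < 100; i++)
    Console.WriteLine (ds.ReadByte()); // Writes 0 to 99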
Note that even with the smaller of the two algorithms, the compressed file is 241 bytes, which is more than twice the size of the 100 bytes of original data. Compression does not work well with dense, nonrepetitive binary data. It is a much better fit for files such as text files, as the following example shows:
string[] words = "The quick brown fox jumps over the lazy dog".Split();
Random rand = new Random();

using (Stream s = File.Create ("compressed.bin"))
using (Stream ds = new DeflateStream (s, CompressionMode.Compress))
using (TextWriter w = new StreamWriter (ds))
  for (int i = 0; i < 1000; i++)
    w.Write (words [rand.Next (words.Length)] + " ");

Console.WriteLine (new FileInfo ("compressed.bin").Length); // 1073

using (Stream s = File.OpenRead ("compressed.bin"))
using (Stream ds = new DeflateStream (s, CompressionMode.Decompress))
using (TextReader r = new StreamReader (ds))
  Console.Write (r.ReadToEnd()); // Output below:
lazy lazy the fox the quick The brown fox jumps over fox over fox The
brown brown brown over brown quick fox brown dog dog lazy fox dog brown
over fox jumps lazy lazy quick The jumps fox jumps The over jumps dog..
In this sample, DeflateStream compresses the 1,000 random words efficiently to 1,073 bytes, which is just slightly over one byte per word.
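Because the compression streams are decorators, the backing store does not have to be a file. The following sketch round-trips repetitive text through a MemoryStream instead. The class DeflateRoundTrip and its Compress and Decompress helpers are hypothetical names introduced for illustration, and the repeated sentence stands in for the random words above so that the run is deterministic.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class DeflateRoundTrip
{
    // Hypothetical helper: compress by writing to a DeflateStream over a MemoryStream.
    public static byte[] Compress(byte[] data)
    {
        var buffer = new MemoryStream();
        using (var ds = new DeflateStream(buffer, CompressionMode.Compress))
            ds.Write(data, 0, data.Length);
        // Disposing ds flushes the remaining compressed bytes into buffer.
        return buffer.ToArray();
    }

    // Hypothetical helper: decompress by reading from a DeflateStream.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var ds = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            ds.CopyTo(output);
            return output.ToArray();
        }
    }

    public static void Main()
    {
        // Build repetitive text: the same sentence 100 times.
        var sb = new StringBuilder();
        for (int i = 0; i < 100; i++)
            sb.Append("The quick brown fox jumps over the lazy dog ");
        string text = sb.ToString();

        byte[] original = Encoding.UTF8.GetBytes(text);
        byte[] compressed = Compress(original);
        string restored = Encoding.UTF8.GetString(Decompress(compressed));

        Console.WriteLine("original: {0} bytes, compressed: {1} bytes",
                          original.Length, compressed.Length);
        Console.WriteLine("round trip ok: {0}", restored == text);
    }
}
```

Text this repetitive should compress even better than the random words, since the same 44-character run recurs throughout.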
