Using a preset deflate dictionary to reduce compressed archive file size

Question

I have a requirement where text files are sent from one location to another. Both locations are under our control. The nature of the content and the words that can appear in it are mostly the same, which means that if I keep the deflate dictionary at both locations once, there is no need to send it with each file. I have been

Accepted Answer

Below are the specific answers I found, along with example code.

1. Can we generate and use a custom deflate dictionary from a preset of words?

Yes, this can be done. A quick example in Python is below:

```python
import zlib

# Data for compression
hello = b'hello'

# Compress with dictionary
co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
compress_data = co.compress(hello) + co.flush()
```

2. Can we send a file without the deflate dictionary and use a local one?

Yes, you can send just the data without the dictionary. The compressed data is in compress_data in the example above. However, to decompress it you need the same zdict value that was passed during compression. Example of how it is decompressed:

```python
hello = b'hello'  # for passing to zdict
do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
data = do.decompress(compress_data)
```

A full example with and without dictionary data:

```python
import zlib

# Data for compression
hello = b'hello'

# Compression with dictionary
co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
compress_data = co.compress(hello) + co.flush()

# Compression without dictionary
co_nodict = zlib.compressobj(wbits=-zlib.MAX_WBITS)
compress_data_nodict = co_nodict.compress(hello) + co_nodict.flush()

# Decompression with dictionary
do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
data = do.decompress(compress_data)

# Print compressed output when dict used
print(compress_data)

# Print compressed output when dict not used
print(compress_data_nodict)

# Print decompressed output when dict used
print(data)
```

The code above does not work with Unicode strings (str) directly, because zlib operates on bytes.
For Unicode data, encode the string to bytes first, as below. The receiving side then has to decode the decompressed bytes with the same encoding.

```python
import zlib

# Data for compression
unicode_data = 'റെക്കോർഡ്'
hello = unicode_data.encode('utf-16be')

# Compression with dictionary
co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
compress_data = co.compress(hello) + co.flush()
...
```

JS-based approach references:

- How to find a good/optimal dictionary for zlib 'setDictionary' when processing a given set of data?
- Compression of data with dictionary using zlib in node.js
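
The examples in this answer use the payload itself as the dictionary, which only shows the API. For the scenario in the question, a rough sketch could look like the one below. The dictionary contents, the sample message, and the helper names (`SHARED_DICT`, `compress_with_dict`, `decompress_with_dict`, `compress_plain`) are all made up for illustration; a real dictionary would be built from strings that actually recur in your files.

```python
import zlib

# Hypothetical shared dictionary, stored once at both locations.
# The zlib docs recommend putting the most common substrings at the end.
SHARED_DICT = b"status=error;status=ok;user_id=;timestamp=;level=INFO;"


def compress_with_dict(payload: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    """Sender side: raw deflate stream primed with the shared dictionary."""
    co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=zdict)
    return co.compress(payload) + co.flush()


def decompress_with_dict(blob: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    """Receiver side: needs exactly the same dictionary bytes."""
    do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=zdict)
    return do.decompress(blob) + do.flush()


def compress_plain(payload: bytes) -> bytes:
    """Baseline: raw deflate without any preset dictionary."""
    co = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    return co.compress(payload) + co.flush()


# Made-up sample message that shares vocabulary with the dictionary.
message = b"timestamp=1700000000;level=INFO;user_id=42;status=ok;"

with_dict = compress_with_dict(message)
without_dict = compress_plain(message)

# Original size, size without dictionary, size with dictionary
print(len(message), len(without_dict), len(with_dict))

# Round trip on the receiving side
print(decompress_with_dict(with_dict) == message)
```

The gain is largest for short files, where plain deflate has little history to match against. Since the dictionary never travels with the data, both sides must hold byte-identical copies; if the dictionary is ever changed, one option is to send a version marker alongside the compressed data.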

Answer