I have a requirement where text files are sent from one location to another. Both locations are under our control. The nature of the content and the words that can appear in it are mostly the same, which means that if I keep the deflate dictionary
at both locations once, there is no need to send it with each file.
I have been reading about this for the past week and experimenting with some of the available code, such as this & this.
However, I am still in the dark.
A few questions I still have:
- Can we generate and use a custom deflate dictionary from a preset list of words?
- Can we send a file without the deflate dictionary and use a local one?
- If not gzip, is there any other compression library that can be used for this purpose?
Some references I stumbled upon so far:
- https://medium.com/iecse-hashtag/huffman-coding-compression-basics-in-python-6653cdb4c476
- https://blog.cloudflare.com/improving-compression-with-preset-deflate-dictionary/
- https://www.euccas.me/zlib/#zlib_optimize_cloudflare_dict
Answer
Below are the specific answers I found, along with example code.
1. Can we generate and use a custom deflate dictionary from a preset list of words?
Yes, this can be done. A quick example in Python is shown below:

    import zlib

    # Data for compression
    hello = b'hello'

    # Compress with dictionary
    co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
    compress_data = co.compress(hello) + co.flush()
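The example above uses the payload itself as the dictionary, which only shows the mechanics. A minimal sketch of building a dictionary from a preset word list might look like the following. The word list, the `build_zdict` and `deflate` helpers, and the sample message are all made up for illustration. The Python `zlib` documentation recommends placing the most common subsequences at the end of the dictionary, so the helper expects words ordered from least to most frequent.

```python
import zlib

def build_zdict(words_rare_to_common):
    """Join preset words into one bytes dictionary (hypothetical helper).

    zlib expects the most frequent substrings near the end of the
    dictionary, so pass the words ordered from least to most common.
    """
    return b" ".join(w.encode("utf-8") for w in words_rare_to_common)

def deflate(data, zdict=None):
    """Raw-deflate `data`, optionally primed with a preset dictionary."""
    if zdict is None:
        co = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    else:
        co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=zdict)
    return co.compress(data) + co.flush()

# Illustrative vocabulary, least common first (an assumption, not real data).
preset_words = ["pressure", "temperature", "report", "status", "normal"]
zdict = build_zdict(preset_words)

message = b"status report: temperature normal, pressure normal"
with_dict = deflate(message, zdict)
without_dict = deflate(message)

# Original size, size without dictionary, size with dictionary
print(len(message), len(without_dict), len(with_dict))
```

For short messages made mostly of dictionary words, the dictionary-primed output should come out noticeably smaller, because the compressor can refer back to the dictionary instead of emitting every byte literally.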
2. Can we send a file without the deflate dictionary and use a local one?
Yes, you can send just the data without the dictionary. The compressed data is in compress_data
in the example code above. However, to decompress it you need the same zdict
value that was passed during compression. Here is how it is decompressed:

    import zlib

    # Same bytes that were passed as zdict during compression
    hello = b'hello'

    do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
    data = do.decompress(compress_data)
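To make the split between the two locations concrete, here is a small sketch in which only the compressed bytes travel and each side uses its own copy of the dictionary. The names `SHARED_DICT`, `pack`, and `unpack` are hypothetical and the dictionary contents are invented; in practice both sides would presumably load identical bytes from a local file. A fresh compressor is created per message because a compressor object cannot be reused after `flush()`.

```python
import zlib

# Identical bytes must exist at both locations (contents here are made up).
SHARED_DICT = b"error warning info user login logout session"

def pack(payload: bytes) -> bytes:
    """Sender side: compress one message against the local dictionary."""
    co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=SHARED_DICT)
    return co.compress(payload) + co.flush()

def unpack(blob: bytes) -> bytes:
    """Receiver side: decompress using its own copy of the dictionary."""
    do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=SHARED_DICT)
    return do.decompress(blob) + do.flush()

messages = [b"info user login", b"warning session error", b"info user logout"]
wire = [pack(m) for m in messages]      # only these bytes are transmitted
received = [unpack(b) for b in wire]

print(received == messages)
print(sum(len(m) for m in messages), sum(len(b) for b in wire))
```

The dictionary never appears in `wire`, so if the two copies ever differ, decompression will fail or produce wrong output. Keeping the dictionary files in sync is the price of not sending them.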
A full example with and without the dictionary:

    import zlib

    # Data for compression
    hello = b'hello'

    # Compression with dictionary
    co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
    compress_data = co.compress(hello) + co.flush()

    # Compression without dictionary
    co_nodict = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    compress_data_nodict = co_nodict.compress(hello) + co_nodict.flush()

    # Decompression with dictionary
    do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
    data = do.decompress(compress_data)

    # Print compressed output when the dictionary is used
    print(compress_data)

    # Print compressed output when the dictionary is not used
    print(compress_data_nodict)

    # Print decompressed output when the dictionary is used
    print(data)
The code above doesn't work with Unicode strings directly, because zlib operates on bytes. For Unicode data, encode the string first, as below:

    import zlib

    # Data for compression
    unicode_data = 'റെക്കോർഡ്'
    hello = unicode_data.encode('utf-16be')

    # Compression with dictionary
    co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
    compress_data = co.compress(hello) + co.flush()
    ...
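For completeness, a round trip for Unicode text might look like the sketch below. It assumes both sides agree on the encoding (UTF-16BE here, mirroring the snippet above, though any encoding should work as long as the dictionary and the payload use the same one). The `ENCODING` constant and the single-entry dictionary are illustrative choices.

```python
import zlib

ENCODING = 'utf-16be'            # assumption: agreed on by both sides

text = 'റെക്കോർഡ്'

# Dictionary of expected words, encoded the same way as the payload
# (a single made-up entry here, purely for illustration).
zdict = 'റെക്കോർഡ്'.encode(ENCODING)

# Sender: str -> bytes -> compressed bytes
payload = text.encode(ENCODING)
co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=zdict)
compressed = co.compress(payload) + co.flush()

# Receiver: compressed bytes -> bytes -> str
do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=zdict)
restored = do.decompress(compressed).decode(ENCODING)

print(restored == text)
```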
JS-based approach references: