
Why Does Python Zipfile Not Give The Same Output .zip File Size As Command-line Zip?

Here is the size of the file generated by zip:

    $ seq 10000 > 1.txt
    $ zip 1 1.txt
      adding: 1.txt (deflated 54%)
    $ ls -og 1.zip
    -rw-r--r-- 1 22762 Aug 29 10:04 1.zip

Here is a

Solution 1:

It turns out (checked in Python 3) that when a ZipInfo instance is passed, writestr() does not use the compression and compresslevel given to zipfile.ZipFile.__init__(). This is an example of bad API design. It should have been designed so that compression and compresslevel from the constructor are always used, whether or not a ZipInfo is passed. The documentation for writestr() says:

When passing a ZipInfo instance as the zinfo_or_arcname parameter, the compression method used will be that specified in the compress_type member of the given ZipInfo instance. By default, the ZipInfo constructor sets this member to ZIP_STORED.

Because of this, the Python code shown in the original post stores the data essentially uncompressed (ZIP_STORED), which is why the file it generates is so much larger than the one produced by command-line zip.
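The effect can be reproduced with a short sketch. It is not the code from the original post (which is truncated above); the helper name zip_size and the in-memory buffer are assumptions made for illustration. It writes the same payload as `seq 10000` three ways: by archive name, with a fresh ZipInfo, and with a ZipInfo whose compress_type is set explicitly.

```python
import io
import zipfile

# Same payload as `seq 10000`: the numbers 1..10000, one per line.
data = "".join(f"{i}\n" for i in range(1, 10001)).encode()


def zip_size(zinfo_or_arcname):
    """Hypothetical helper: write `data` to an in-memory archive, return its size."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(zinfo_or_arcname, data)
    return len(buf.getvalue())


# Plain archive name: the constructor's ZIP_DEFLATED is applied.
size_name = zip_size("1.txt")

# Fresh ZipInfo: compress_type defaults to ZIP_STORED, so the data is not compressed.
size_stored = zip_size(zipfile.ZipInfo("1.txt"))

# ZipInfo with compress_type set explicitly: the data is deflated again.
zinfo = zipfile.ZipInfo("1.txt")
zinfo.compress_type = zipfile.ZIP_DEFLATED
size_fixed = zip_size(zinfo)

print("by name:        ", size_name)
print("ZipInfo default:", size_stored)
print("ZipInfo deflate:", size_fixed)
```

The archive written with a default ZipInfo should come out larger than the raw input, since the data is stored as-is plus the zip headers, while the other two should be well below it.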

Another problem with this API design is naming: the compression parameter of the constructor and the compress_type parameter of .writestr() mean exactly the same thing, but they are not named the same. There is no reason to give different names to literally the same thing.
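As a minimal sketch of the two spellings side by side, the snippet below passes compression to the constructor and compress_type to writestr(). The file names and payload are made up for illustration. An explicit compress_type on writestr() overrides whatever the ZipInfo carries, so it is another way to get compression back when a ZipInfo is required.

```python
import io
import zipfile

# Hypothetical, highly compressible payload.
payload = b"0123456789\n" * 1000

buf = io.BytesIO()
# The constructor calls the setting `compression` ...
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    # ... but a ZipInfo silently falls back to its own default, ZIP_STORED.
    zf.writestr(zipfile.ZipInfo("stored.txt"), payload)
    # writestr() calls the same setting `compress_type`; passing it overrides the ZipInfo.
    zf.writestr(zipfile.ZipInfo("deflated.txt"), payload,
                compress_type=zipfile.ZIP_DEFLATED)

# Read the archive back and inspect what was actually used per member.
with zipfile.ZipFile(buf) as zf:
    methods = {info.filename: info.compress_type for info in zf.infolist()}
    sizes = {info.filename: info.compress_size for info in zf.infolist()}

print(methods)
print(sizes)
```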

