# Why would my zipped file be the same size as the uncompressed files?

A 398MB directory was only compressed to 393MB using 7Z and Normal ZIP compression. Is this normal? If so, why do people continue to use ZIP on Windows?


If you're compressing things that are already compressed (AVI, JPEG, MP3), you won't gain much other than packing everything in a single file.
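A quick way to see this for yourself is to compress two buffers of the same size with Python's standard `zlib` module (the same DEFLATE method that ZIP normally uses). This is only a sketch: the random bytes are an assumed stand-in for already-compressed data such as JPEG or MP3, and `ratio` is a helper name made up for this example.

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)

# Highly repetitive input: the same line over and over.
repetitive = b"the quick brown fox jumps over the lazy dog\n" * 2000

# Random bytes of the same length, standing in for already-compressed data.
random_like = os.urandom(len(repetitive))

print(f"repetitive text: {ratio(repetitive):.3f}")
print(f"random bytes:    {ratio(random_like):.3f}")
```

The repetitive buffer should shrink to a tiny fraction of its size, while the random one stays at roughly 100% (often a few bytes larger because of container overhead).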

Compression works by looking for repetitive patterns inside the items to compress. And because you do not want to lose any data while compressing your files, the compression must be lossless(*).
Now, with that in the back of your head, think about the way files (items) are stored on a computer. At the lowest level, they are all just a bunch of 0's and 1's.

The question can thus be transformed to: "How can I represent a bunch of 1's and 0's in a more compact way than the original representation?"

So let's start from the beginning: how can you compact the normal representation of a single bit (a single 1 or a single 0)?
The answer is really easy: you can't!... a single bit is represented in the most compact manner possible.

Fair enough, let us take a bigger example: how would you compress a binary string like 0111 0111 0100 0111?
Well, because we already know that looking at the individual bits won't help us at all, we know that we have to look at a bigger scale. For example, let's take 4 bits at a time. We now see that the binary string "0111" occurs 3 times in the example, so why don't we represent it with a single bit: "0"? But this still leaves 0100 in the dark, so let us represent that with "1".
We have now compressed the original to: "0010"

That's really good! However, this is just the basics of the "Huffman encoding algorithm", and in the real world it will be a little more complicated than that (and you would also need to store a table with the encoding information in it, but that goes a bit too far for answering this question).
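The toy substitution from the example can be sketched in a few lines of Python. This is not real Huffman coding, just the hand-picked table from the example (`0111` becomes `0`, `0100` becomes `1`); `encode`, `decode` and `table` are names chosen for this illustration.

```python
def encode(bits: str, table: dict, block: int = 4) -> str:
    """Replace each 4-bit block by its (shorter) code from the table."""
    bits = bits.replace(" ", "")
    chunks = [bits[i:i + block] for i in range(0, len(bits), block)]
    return "".join(table[chunk] for chunk in chunks)

def decode(code: str, table: dict) -> str:
    """Reverse the substitution; works here because every code is exactly 1 bit."""
    reverse = {short: block for block, short in table.items()}
    return " ".join(reverse[bit] for bit in code)

# Hand-picked table from the example; a real compressor must also store it.
table = {"0111": "0", "0100": "1"}

original = "0111 0111 0100 0111"
compressed = encode(original, table)

print(compressed)                             # 0010
print(decode(compressed, table) == original)  # True
```

Note that the decoder needs the same table, which is why the table has to travel with the compressed data.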

Now to really answer your question: why can't all data be compressed that well? Well, let's take another example: "0001 0110 1000 1111". If we used the same technique as above, we would not be able to compress the data (no 4-bit block repeats), and thus would not benefit from compression...
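A rough way to compare the two strings is to count the bits the toy scheme needs, including a table of the distinct blocks. The cost model below is an assumption made for illustration (fixed-length codes, table stored as the raw 4-bit blocks in code order); `toy_cost` is a hypothetical helper, not part of any real compressor.

```python
from math import ceil, log2

def toy_cost(bits: str, block: int = 4) -> tuple:
    """Return (original_bits, compressed_bits) under the toy block-substitution scheme."""
    bits = bits.replace(" ", "")
    chunks = [bits[i:i + block] for i in range(0, len(bits), block)]
    distinct = set(chunks)

    # Every distinct block gets a fixed-length code (at least 1 bit).
    code_len = max(1, ceil(log2(len(distinct))))
    payload = code_len * len(chunks)

    # Assumed table cost: each distinct block is stored once, uncompressed.
    table = block * len(distinct)
    return len(bits), payload + table

print(toy_cost("0111 0111 0100 0111"))  # (16, 12): repetition pays off
print(toy_cost("0001 0110 1000 1111"))  # (16, 24): no repetition, output grows
```

In the second string all four blocks are different, so each needs a 2-bit code and its own table entry. Once the table is counted, the "compressed" form is larger than the original.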

(*) There are of course exceptions to this. The best-known example is the compression used for MP3 files: some information about the sound gets lost while converting the raw, original file to the MP3 format, so this compression is lossy. Another example is the .JPG format for images.

The process of compressing takes repeatable patterns and tokenizes them to shorter patterns. The output is then mostly non-repeatable and therefore cannot be compressed by much, if at all.
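The effect is easy to reproduce by compressing the same data twice. The sketch below uses Python's standard `zlib` module; the word list and the seeded generator are assumptions used only to build some repeatable text-like input.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so every run uses the same input
vocab = ["zip", "file", "data", "pattern", "repeat", "archive", "size",
         "compress", "image", "text", "folder", "windows", "byte", "table",
         "code", "short", "long", "block", "input", "output"]

# Text-like input: plenty of repeated words for the compressor to find.
text = " ".join(rng.choice(vocab) for _ in range(20000)).encode()

once = zlib.compress(text, 9)   # first pass removes the repetition
twice = zlib.compress(once, 9)  # second pass has almost nothing left to find

print("original:", len(text))
print("once:    ", len(once))
print("twice:   ", len(twice))

# Both passes are lossless: undoing them in order restores the input.
assert zlib.decompress(zlib.decompress(twice)) == text
```

The first pass should shrink the input considerably, while the second pass leaves the size about the same (typically a few bytes larger).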

From the Limitations section of the Wikipedia article on Lossless Compression:

Lossless data compression algorithms cannot guarantee compression for all input data sets. In other words, for any (lossless) data compression algorithm, there will be an input data set that does not get smaller when processed by the algorithm. This is easily proven with elementary mathematics using a counting argument. ...

Basically, it's theoretically impossible to compress all possible input data losslessly.
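The counting argument mentioned in the quote can be checked with a few lines of Python. The function names are made up for this sketch; the arithmetic itself is just counting bit strings.

```python
def strings_of_length(n: int) -> int:
    """Number of distinct bit strings that are exactly n bits long."""
    return 2 ** n

def strings_shorter_than(n: int) -> int:
    """Number of distinct bit strings with length 0, 1, ..., n-1."""
    return sum(2 ** k for k in range(n))

for n in (1, 8, 16):
    inputs = strings_of_length(n)
    shorter_outputs = strings_shorter_than(n)
    # There is always one output too few, so at least two inputs would
    # have to share an output, and decompression could not tell them apart.
    print(n, inputs, shorter_outputs, inputs - shorter_outputs)
```

For every length there are 2^n possible inputs but only 2^n - 1 strictly shorter outputs, so no lossless scheme can shrink all of them.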

Is this normal?

No. Not with "normal" files. What kind of files were you compressing? If they were already compressed, e.g. they are JPGs, GIFs, PNGs, videos or even other zip files, then they won't be compressed much by any algorithm. If you try compressing text, XML, uncompressed BMP, source code and similar files, zip will provide good compression, but probably not the absolute best.

Why do people continue to use ZIP on Windows?

One reason is that there is nice zip handling built into the system - you can right click anywhere and create a new zip file, then drop stuff into it. You can just double click a zip file and it opens like a folder. You can copy stuff out of it and sometimes even use it in place. You don't need to install WinZip or 7z or any other program. I usually recommend people don't.

In a zip archive containing many files, each file is compressed independently. If there is a great deal of similarity between the files, then a different tool might give much better compression.

For example, tar.gz joins the files together, then compresses the results. Likewise a "solid" rar file makes use of similarities between files.

The downside of tar.gz or a solid rar is that you can no longer extract a single file from a large archive without decompressing the archive up to where the file you want is.
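The difference between per-file and "solid" compression can be sketched with Python's standard `zipfile` and `tarfile` modules, building both archives in memory. The test data is an assumption chosen to exaggerate the effect: 50 files that share the same 2000 random bytes and differ only in a short suffix. `zip_size` and `tar_gz_size` are helper names invented for this example.

```python
import io
import random
import tarfile
import zipfile

rng = random.Random(0)  # fixed seed so every run uses the same data
base = bytes(rng.getrandbits(8) for _ in range(2000))

# 50 files: each is incompressible on its own, but nearly identical to the others.
files = {f"file{i:02d}.bin": base + str(i).encode() for i in range(50)}

def zip_size(files: dict) -> int:
    """Size of a ZIP archive: every member is deflated independently."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())

def tar_gz_size(files: dict) -> int:
    """Size of a tar.gz archive: files are joined first, then compressed as one stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())

print("zip:   ", zip_size(files))
print("tar.gz:", tar_gz_size(files))
```

With this data the tar.gz should come out many times smaller, because gzip can refer back to the previous file's bytes while ZIP starts from scratch for each member. Real-world directories are rarely this extreme.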

In the past, WinRAR has done a better compression job with its own .rar format than with .zip.

After much research and problems, I may have found a solution to sending large files through an email. I needed to attach several pictures, some PDFs, and some documents for a closeout package. I wanted it to be simple and clean for my boss to open. The files were large. Before sending, the pictures were about 32 MB and rest of the pdfs and text were around 5 MB... well over gmail's 25 MB limit.

Here is what I did... I downloaded and opened Picasa. I googled "how to resize pictures picasa windows 7". I selected the pictures I needed in Picasa, clicked File: Export to a folder, and left everything the way it was except that I set the resize option to 1600 instead of 800.

Once I had the pictures exported to a folder on my desktop, I selected all the pictures at one time (marqueeing them, I believe, is the correct term), right-clicked on them, clicked on Send to: and clicked on Compressed file... This created an extra file with all the pictures I selected in it, and the file is compressed (though I don't think it resized them again). Then I put my PDFs and text documents into the compressed file. They did compress some. The total size went from over 30MB to 12MB.

So I opened up my Gmail, clicked compose, attached the compressed file that had all my resized pictures and documents in it, and sent it... voilà! I could never have figured it out without all the tips others put on this site and others, so I thought I would write some help for once!

Lastly, the reason I created the compressed file to put everything in, EVEN though my pictures had already been resized to a good size by Picasa, was not to make them smaller (because I don't think it did) but because that was the only way I could send everything so that my boss would have it in one file. I tried to select all the files and keep the pictures in their own separate file (uncompressed), and it didn't attach any pictures.

Category: windows Time: 2008-08-30 Views: 1