Compression Comparison Guide

Discussion in 'Reviews & Articles' started by peaz, Nov 30, 2002.

  1. peaz

    peaz ARP Webmaster Staff Member

    "For an eternity, the ZIP format has reigned supreme. However, all good things must make way for better things. Today, we will compare the ZIP format against 10 of its most promising competitors! Who will take the crown in data compression? Enter and find out!"

     Please feel free to comment on or discuss this article.
     
  2. Felix187

    Felix187 Newbie

    great job!

    great job... i found the article to be quite informative. makes you wonder why we're all still using ZIP when it seems like SBC is the way to fly these days. or RAR, even.
    also, i've got a request for the next revision: namely, a few different compressors to include.
    personally, i'm interested in gzip, i wonder how it does. also, the Mac compression .hqx is of interest to me.
    i'd appreciate it if you'd look at these compression schemes for an update.
    thanx again for a great read!
    -Felix
     
  3. Worf

    Worf Newbie

    UNIX compression programs?

     Funny - the 'big three' compression programs on UNIX were missing - compress (ick - anyone still use it anymore?), GZip (the most ubiquitous, often in the form of tarballs (.tar.gz or .tgz)), and bzip2 (again, with tarballs, .tar.bz2).

    Now, all tend to require the use of tar with them, which is why GNU tar includes options to automatically compress with GZip (-z), compress (-Z), or bzip2 (-j or -I, depending on the version - go figure).
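
     For anyone curious, the tar-plus-compressor pairings described above can also be reproduced on Windows without the command-line tools, using Python's standard library. This is only a minimal sketch; the sample file and names are made up for illustration:

```python
import os
import tarfile
import tempfile

# Create a small sample tree to archive (stand-in for real files).
tmp = tempfile.mkdtemp()
sample = os.path.join(tmp, "sample.txt")
with open(sample, "w") as f:
    f.write("hello tarball\n" * 100)

# Equivalent of `tar -czf`: a gzip-compressed tarball (.tar.gz / .tgz)
with tarfile.open(os.path.join(tmp, "a.tar.gz"), "w:gz") as tar:
    tar.add(sample, arcname="sample.txt")

# Equivalent of `tar -cjf`: a bzip2-compressed tarball (.tar.bz2)
with tarfile.open(os.path.join(tmp, "a.tar.bz2"), "w:bz2") as tar:
    tar.add(sample, arcname="sample.txt")

# Both archives can be listed and extracted the same way.
with tarfile.open(os.path.join(tmp, "a.tar.bz2"), "r:bz2") as tar:
    print(tar.getnames())
```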

    Would be nice to see if these "foreign" formats hold up - tar.gz in particular since WinZip can handle them natively (although, naively - it isn't the best at viewing them, but it does handle them).

    (I use Linux/UN*X systems daily, so my Windows machines often have Win32 versions of these utilities installed so I can view them without winzip).
     
  4. peaz

    peaz ARP Webmaster Staff Member

     Hmm, I do agree that we should also include formats from other platforms. However, all of us rojakpotters are Windows-based users, which I think is why Adrian did not include those formats in this guide.

    However, if those formats can be used in the windows platform please do tell us!

     I'm a non-UNIX user so I'm not very sure about this. :D

    Cheerz
     
  5. Adrian Wong

    Adrian Wong Da Boss Staff Member

    Hello everyone! ;)

    I guess I should have mentioned it... but the guide was only meant to cover data compressors that work on Microsoft Windows, specifically Windows XP. That's why UNIX and Mac data compressors were not included in the guide. If I were to open up the field to include data compressors from other operating systems, it would be way too complex for me.

    The reason I wrote the guide was to examine the performance of the various data compressors available for the Windows platform. This will allow many of our readers to directly compare the performance of the data compressors over a variety of datasets.

     I never had the intention of testing formats that are only supported by other operating systems as, for the most part, they would not mean much to the average Windows user.

    But if any of those formats can be handled natively in Windows, I would love to test them out for the next revision. Like peaz, I'm not familiar with UNIX. :)
     
  6. eitje

    eitje Newbie

    gzip & bzip2

    gzip executable (command line):
    http://www.gzip.org/#exe

    bzip2 executable (command line):
    http://sources.redhat.com/bzip2/#bzip2-latest

    in my own research (for some products my company produces), i've found gzip libs work faster than bzip2, but bzip2 libs are half again the size of what gzip and zip create.
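
     The gzip-versus-bzip2 trade-off mentioned above can be checked with a quick sketch using Python's standard-library gzip and bz2 modules. The sample data here is made up, and real-world results depend heavily on the input, so treat this as a rough probe rather than a benchmark:

```python
import bz2
import gzip
import time

# Moderately compressible sample text (stand-in for real files).
data = ("the quick brown fox jumps over the lazy dog " * 5000).encode()

for name, compress in [("gzip", gzip.compress), ("bzip2", bz2.compress)]:
    t0 = time.perf_counter()
    out = compress(data)
    dt = time.perf_counter() - t0
    # Report compressed size and wall-clock time for each codec.
    print(f"{name}: {len(out)} bytes in {dt * 1000:.1f} ms")
```

     On typical inputs gzip tends to be faster while bzip2 tends to compress tighter, but the exact numbers vary with the data.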

    also: interesting review. i'm not sure that the "compressor efficiency rating" really does you any good. as you said in the review, you can't always take what it's telling you @ face value... and that's usually a sign of a poor method of measurement....

    finally: i was having connectivity issues with your site throughout the review. not so bad, all i had to do was hit refresh, right? well, sometimes, it'd kick me back to the front page and kill my back button history. so i'd have to jump directly to the place in the article where i'd just been, but THAT kicked me back to the frontpage as well. so i'd have to re-enter the article, then travel through all the pages i'd just read, because the dropdown to jump to where i'd just been kicked me back to the main page too. now we're talking pain in the ass. fyi. :)

    congratulations on being back! hope you're able to stick around for a while! :)

     [EDIT: on a windows platform, i'll tend to use UltimateZip (http://www.ultimatezip.com/) for all my archive DEcompression needs. it supports DEcompression in a ton of formats... and the DEcompression speed of various utilities is something i'd like to see examined in the next revision as well! :)]
     
  7. Adrian Wong

    Adrian Wong Da Boss Staff Member

    Hello eitje,

    Thanks for the links. Will check them out.

    Well, the Efficiency Rating was done out of curiosity. It's NOT a measurement. It's merely an arbitrary value created from the speed and the compression rating. What I meant by not taking it at face value is that a high efficiency rating does not mean the compressor is better.

     Often, the compression performance of the data compressor is more important than the speed at which it compresses (within a reasonable limit). For some people, however, speed is more important. But no user would ever choose a compressor based on its efficiency rating alone.

     The efficiency rating actually serves to prove a point - compression speed is also important. Most designers of such software place too much emphasis on compression performance.

    Of course, compression performance IMHO is very important but so is compression speed. I think some emphasis should be given towards compression speed. And I hope the efficiency rating will serve to remind designers of the newer data compressors that while they may get the oohs and the ahhs with their powerful compression algorithms, they cannot humble the old boys (ZIP and ARJ) when it comes to speed (and efficiency).
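
     To illustrate the trade-off being described here: the guide's actual Efficiency Rating formula isn't reproduced in this thread, so the function below is only an assumed stand-in, a toy metric that rewards both space saved and throughput:

```python
# Hypothetical efficiency metric (NOT the article's formula): combines
# the fraction of space saved with compression throughput.
def efficiency(original_size: int, compressed_size: int, seconds: float) -> float:
    ratio = 1 - compressed_size / original_size        # fraction saved
    mb_per_s = (original_size / 1_000_000) / seconds   # throughput
    return ratio * mb_per_s  # higher = saves more space, faster

# A fast, decent compressor can out-score a slow, stronger one:
strong_but_slow = efficiency(100_000_000, 40_000_000, 10)
weaker_but_fast = efficiency(100_000_000, 50_000_000, 2)
print(strong_but_slow, weaker_but_fast)
```

     Under any metric of this shape, ZIP and ARJ score well precisely because of their speed, which is the point being made above.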
     
  8. peaz

    peaz ARP Webmaster Staff Member

    Re: gzip & bzip2

    Yeah, It's great to be back.

     Anyway, I'm the one in charge of the back-end of the site. You mentioned that it kicks you to the main page when you use the drop-down jump menu? This doe snot seem to happen on my PC here. Anyway, I would like to know what version of IE or Netscape you are using. Also, do you use any sort of Internet privacy-blocking software? It is known that Norton Internet Security, which can block browsers from revealing the previously accessed site, does not work very well with our site. The reason is this: to prevent leeching, we have added code to our pages that checks the previously accessed page. If it detects that a page was not called from our own main pages, it loads up the main page. This seems to be what is happening in your case.

    Please tell me more as I would like to know if anything else seems to be the problem.

    Thanks for pointing this problem out. We know it's a pain in the @$$ but to prevent bandwidth leeching, we had no choice.
     
  9. eitje

    eitje Newbie

    Re: gzip & bzip2

     yep. following the links works fine, and now that i'm not getting 403.9 errors anymore, it's not as much of a hassle on shorter articles. but when i use the dropdowns, two panes turn into "Action Cancelled" pages, then jump to the homepage.

    ewwwww. doe snot. ;)

    IE 5.00.3315.1000
    SP2; Q323759

    we have a firewall running on our router, but no privacy software.
     
  10. peaz

    peaz ARP Webmaster Staff Member

    Re: gzip & bzip2

     Bear with me, I'm known to make lots of typos... :D

     Hmm, your problem seems a little weird. I couldn't in any way simulate it here. Well, I'll continue to work on this.

    Cheerz
     
  11. snn47

    snn47 Newbie

     Nice work, an interesting read, and you even found/tested a few formats I didn't know about. :)

    However I do have to disagree with your conclusion.

     I summed it up in "Which compression-program/-format for archiving which file(s)/-format":
     http://ehome.compuserve.de/snn47/_GNU-DIY-Benchmark/GNU-Bench_v0-1.pdf
     with input from others and a lively discussion at http://forums.overclockers.com.au/showthread.php?s=&threadid=89560

     It covers how long compression takes, how efficient it is, and whether packed archives can be repaired.

     You tested only various file formats with medium-sized files, not folders with many different but relatively small files.
     
  13. elh

    elh Newbie

     Adrian: which film did you use to test the DivX compression? At first the numbers that SBC gave seemed ridiculously high, but I gave it a try. As I expected, I could not reproduce the extraordinary compression ratios on DivX files... I tried the same settings you've used as well as a multitude of others, but none of them helped me get past a 97% compression ratio :( If anyone has ideas regarding this, tell me...
     
  14. Adrian Wong

    Adrian Wong Da Boss Staff Member

    Hello snn47,

    Evaluation of data compressors is very difficult. Especially since different compressors have different levels of compression available. That's why I chose to use only the maximum compression level. At least we will know what's the best compression we can see from each of them.

     As for the test files, I prefer to use real-world files instead of arbitrarily created "benchmark files". I did not specifically look for medium-sized files. I just chose as wide a range of file sizes as I could, but we have to be realistic. Certain file formats like WAV will produce very large files. Of course, I could always use beeps-and-burps WAV files because they would be really small. But in real working environments, I doubt anyone would be working with lots of tiny WAV files. Again, the emphasis is on working files, not specially created benchmark files.


    Hello elh,

    I used two mini-clips created by Chai. I can send you the files if you wish to reproduce the test results. I assure you the results are accurate as far as the test files go. Just tell me how to send the files to you. They are two DIVX files weighing about 50MB in total.

    But fret not because no matter what DIVX files we use, they are used for all the compressors so if a particular DIVX file is very compressible, the other data compressors should benefit too.
     
  15. Worf

    Worf Newbie

    Only Windows XP Compressors? Stuffit is a *MAC* compressor!

     Stuffit is actually a *MacOS*-based compression utility - it has several advantages over the other Mac-based compression utilities out there. Namely, it was excellent at putting all the data in the data fork, so no special BinHex or MacBinary conversion was needed (helpful for preserving other metadata, but not necessary). I know there was a Windows version of it, but I always discounted it, as its 'native' environment was the Mac (for the same reason most Windows users don't try tar-gzip or tar-bzip2).
     
  16. snn47

    snn47 Newbie

    Hello Adrian,

    thank you for your reply.

     I agree with the use of real-world files; however, the ones we choose might favour one algorithm more than others, unless the set is very complex.

     For my graphics tests I scanned an LH boarding slip into TIFF format, and converted this into all the other formats.

     All files used should be available if someone wants to redo/verify the results. That would require at least one CD and might pose a problem for WAV files due to copyright, but it would make the tests comparable.

    I have to disagree with just using a single large file.

    The main use of single files compression is when I want to compress a large file onto a disk/CD and avoid having to split it. Here it's important to know what is the best compression format/setting for the task.

     However, if you want to archive a folder/volume containing many different file formats, e.g. internet downloads (html, gif, jpeg...), then it's the efficiency for small files of the various formats, and the overall efficiency, that counts.

     I normally opt for efficiency rather than short compression time, since on my laptop the extra compression time is often offset by slow drive access.


     As for reference files, maybe there is a way to multiply fragments of a picture or video into larger resolutions/sizes to create the reference data.

     But this would require the willingness of more than one site to generate a unique, applicable test source.
     
  17. Adrian Wong

    Adrian Wong Da Boss Staff Member

    Hello snn47,

    Actually, the files I used were not specifically created for the tests. All of them were either obtained online from public domain sources or were taken from our own PCs (our work files). As such, they should accurately reflect real-world files.

    Also, I don't mind anyone making use of our test files as some kind of standard. I can write a CD of the entire test compilation for whoever's interested. All the files in the compilation are either public domain or belong to us so there's no copyright issue.

    As for favouring one algorithm over another, well, I frankly have no idea what to say to that. To do that, I will have to do quite a bit of research to see how I can tweak the files to favour one compressor over another. Unfortunately, I really do not have the time for such hanky-panky. Hehe...

    Frankly, irrespective of the results, I'm still keeping to my favourite data archiver. I only did this comparison out of interest. Just to see how well they all perform. There's really more to data compressors than just the performance. But for those who are interested to see how well they perform, well, that's what the comparison is for.

    Again, if you wish to verify the results, I would be happy to let you download our files. We have nothing to hide. :)

    Hmm... it's not really useful to create multiple small files. That would just be skewing the results in favour of certain compressors. Some files are inherently small while others are, by nature, very large. For example, GIF and HTML files tend to be small (less than 500kb) while DIVX movies are large (more than 20MB).

     IMHO, it's best to use working files because they will accurately reflect the situations you will encounter in the real world. Even if some files are large, that's okay. What we want to know is this - if compressor X is better than compressor Y in the comparison, it must also be better than compressor Y when I actually compress the same files. There's really no point (it's even inaccurate) in specifically breaking down large files into smaller sizes.
     
  18. peaz

    peaz ARP Webmaster Staff Member

    Just thought that I should share this experience i had with compressors...

     I used to do lots of Flash work, and once I had to submit a few versions of a particular Flash file for approval. The only difference between the files was a few text changes; all the graphic elements remained the same. So this set of files should be very compressible, in my opinion. A single file, maybe not, but the whole set... YES!

    Well, I tried compressing them with WinZip 8 and WinRAR 2.5 (it was quite a while ago)

     The result was sort of expected. WinZip treats each file as a distinct file and compresses them separately too! As the files were not very compressible (already compressed in nature), the whole ZIP file was quite huge.

     However, with WinRAR, the result was interesting. I used the highest possible settings, with the multimedia option checked. WinRAR manages to see that all the files are almost the same, so the first file was compressed as much as it could be, and then it 'seems' to store only the differences for the other files!

     As such, the total compressed RAR file was only a little bit larger than the size of a single file from that set!!! :D

    Interesting!
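
     The per-file versus "solid" behaviour described above can be sketched with Python's standard library. LZMA here is only a stand-in for RAR's solid mode (the real formats differ), but the long-range redundancy idea is the same: one compression stream over all the files lets everything after the first copy of the shared content compress away.

```python
import lzma
import os

# Ten near-identical "files": the same incompressible base plus a tiny
# difference, like a set of Flash files with only text changes.
base = os.urandom(50_000)  # incompressible shared content
files = [base + f"version {i}".encode() for i in range(10)]

# ZIP-style: each file compressed independently -> roughly 10x the base.
separate = sum(len(lzma.compress(f)) for f in files)

# Solid-style: one stream over the concatenation, so the repeated base
# after the first file is stored as long-range matches.
solid = len(lzma.compress(b"".join(files)))

print(separate, solid)  # solid is far smaller
```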
     
  19. elh

    elh Newbie

     First of all, thanks for the response, and sorry, as I was kind of offline for a week or so..
     Anyway, have you actually tried compressing a "commercial" DivX? Just a casual 700MB film? I tried every setting on several films, but as I said before - it's no breakthrough, am I right? I see no other application for SBC; WinRAR has a far more developed interface etc. So the only use of SBC could be packing ~1GB DivXs onto a single CD, at least for me.
    Anyway, I asked you the question, because I might be wrong, maybe there is some basic error. I typed the following:
    sbc c -m3 [-b1 / b63] -hn archive film.avi
     This should give the best compression ratio, right? I'd love to download the files from you, but unfortunately I'm still on dial-up, so I'd rather not :(

    Regards,

    Elh
     
  20. Chai

    Chai Administrator Staff Member

    We can't use 'commercial' DIVX for obvious copyright issues, so we have to use our 'home videos' to do the testing, just in case we might need to distribute the testing files.

     But the problem is, I was not able to use maximum compression when creating the videos in Adobe Premiere. So the videos can still be compressed a little more.

     The 'commercial' DivXs use maximum compression with VBR, and to optimize file size and quality, they have to go through a few more passes, instead of just one in Adobe Premiere.

    So that's the reason why the 'commercial' DIVXs are not as compressible as the test DIVX files we used during testing. You can't compare your video's compression ratio with our test file.
     
