filelist compression

taltamir
Posts: 9
Joined: 2003-05-13 17:41
Location: Hell (Coppell, Texas, U.S.A.)

filelist compression

Post by taltamir » 2005-11-07 03:02

Ok, is it legally feasible to use rar to compress the file list? bz2 is nice for on the fly compression while uploading, but it's just not as well compressed as a rar file (if something can beat rar at compressing it, it would be preferable of course). My file list is relatively small, and it weighs in at 500 KB. I often see people with file lists over 5 MB... and that takes forever to download when the average upload speed on DC is 5 kB/s.

Just by recompressing my file list from bz2 to rar, I reduced its size by 5.36%.
I will also be suggesting this to any other DC team (I hope it spreads to ALL DC incarnations).
Just imagine if every user on DC were to reduce his file list by 5%... that's at least a few seconds saved per instance of someone downloading your file list... which happens VERY often (match file list on alternate sources... so, every time you find an alternate source, or any source for that matter). I am getting giddy just thinking about the total speedup of P2P networks worldwide....
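To put rough numbers on "a few seconds", here is a back-of-the-envelope sketch. It assumes that "5kbs" means 5 kilobytes per second, that 1 MB is 1000 KB, and that protocol overhead can be ignored; the function names are made up for this illustration.

```python
# Rough transfer-time estimate. Assumes "5kbs" means 5 kilobytes per second
# and ignores protocol overhead; both are assumptions, not measurements.

def download_seconds(size_kb: float, speed_kb_per_s: float = 5.0) -> float:
    """Seconds needed to transfer size_kb at a constant speed."""
    return size_kb / speed_kb_per_s


def seconds_saved(size_kb: float, reduction: float,
                  speed_kb_per_s: float = 5.0) -> float:
    """Seconds saved if the list shrinks by the given fraction (0.05 = 5%)."""
    return download_seconds(size_kb * reduction, speed_kb_per_s)


small_list = seconds_saved(500, 0.0536)    # the 500 KB list from the post
large_list = seconds_saved(5000, 0.0536)   # a 5 MB list, taking 1 MB = 1000 KB

print(f"500 KB list: {download_seconds(500):.0f} s total, {small_list:.1f} s saved")
print(f"5 MB list: {download_seconds(5000):.0f} s total, {large_list:.1f} s saved")
```

Under those assumptions the saving is a little over 5 seconds on a 100 second download for the small list, and a little under a minute for a 5 MB one.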
I do not have a superman complex; for I am God, not superman.

Todi
Forum Moderator
Posts: 699
Joined: 2003-03-04 12:16

Post by Todi » 2005-11-07 05:47

Test using WinRAR, Best compression:

filelist.xml - 30,7 MB (32196785 bytes)

filelist.xml.bz2 - 9,86 MB (10345989 bytes)
filelist.rar - 8,71 MB (9142474 bytes)
filelist.zip - 11,3 MB (11913300 bytes)

--

filelist_2.xml - 3,86 MB (4050702 bytes)

filelist_2.xml.bz2 - 1,18 MB (1244806 bytes)
filelist_2.rar - 1,13 MB (1187715 bytes)
filelist_2.zip - 1,39 MB (1468179 bytes)

--

Conclusion: Rar is a little better, although I think the decision to use bz2 was based more on its open source qualities than its compression ones. Rar is most likely hard to use in an open source project without issues.
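For reference, the relative savings implied by the byte counts above can be worked out as follows. This is only arithmetic on the posted numbers; the helper name `percent_smaller` is made up for this sketch.

```python
# Byte counts copied from the test results above.
results = {
    "filelist.xml": {"raw": 32196785, "bz2": 10345989, "rar": 9142474, "zip": 11913300},
    "filelist_2.xml": {"raw": 4050702, "bz2": 1244806, "rar": 1187715, "zip": 1468179},
}


def percent_smaller(candidate: int, baseline: int) -> float:
    """How much smaller candidate is than baseline, in percent of baseline."""
    return 100.0 * (baseline - candidate) / baseline


rar_vs_bz2 = {}
for name, sizes in results.items():
    rar_vs_bz2[name] = percent_smaller(sizes["rar"], sizes["bz2"])
    print(f"{name}: rar is {rar_vs_bz2[name]:.1f}% smaller than bz2, "
          f"bz2 is {percent_smaller(sizes['bz2'], sizes['raw']):.1f}% smaller than raw XML")
```

On these two lists rar comes out roughly 11.6% smaller than bz2 for the large list and roughly 4.6% smaller for the small one, so the gain depends on the list.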

Itanium
Posts: 17
Joined: 2005-11-02 07:38
Location: Spain

Post by Itanium » 2005-11-07 06:08

A good alternative to reduce clients' bandwidth load could be to have DC++ set, by default after installation, the option "Automatically match queue for auto search hits" to Disabled and "Automatically search for alternative download locations" to Enabled.

That could be enough to keep the sources in our queues up to date while improving overall speed.
Trance... my way of life

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Re: filelist compression

Post by GargoyleMT » 2005-11-07 10:03

taltamir wrote:Ok, is it legally feasible to use rar to compress the file list? bz2 is nice for on the fly compression while uploading

It's not on the fly; the list is created on demand, at most once every 15 minutes. It would be too CPU intensive to compress it on the fly. Rar archives are a bit smaller, but if you wanted a more reasonable suggestion, 7Zip is smaller and more free. (I don't think it's a likely change, though -- it would mean support for three compression libraries in DC++, and 7Zip is very CPU intensive, both for encoding and decoding [I believe].)

However, there is a better solution than either yours or the user below: partial file lists. These are small XML lists of just a single directory (and perhaps its subdirectories), which would enable directory-targeted matching, and progressive browsing. DC++ has rudimentary support for it now, through the ADC protocol.
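The "on demand, at most once every 15 minutes" behaviour can be sketched as a small cache. This is an illustration only, not DC++'s actual code: the class name, the `build_xml` callback and the injectable clock are all assumptions made for the example.

```python
import bz2
import time
from typing import Callable, Optional


class FileListCache:
    """Rebuilds a compressed file list on demand, at most once per interval."""

    def __init__(self, build_xml: Callable[[], bytes],
                 min_interval_s: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._build_xml = build_xml          # hypothetical callback producing the XML
        self._min_interval_s = min_interval_s
        self._clock = clock                  # injectable so the sketch can be tested
        self._compressed: Optional[bytes] = None
        self._built_at = 0.0
        self.rebuilds = 0

    def get(self) -> bytes:
        """Return the cached list, recompressing only if it is missing or stale."""
        now = self._clock()
        stale = (self._compressed is None
                 or now - self._built_at >= self._min_interval_s)
        if stale:
            self._compressed = bz2.compress(self._build_xml())
            self._built_at = now
            self.rebuilds += 1
        return self._compressed


fake_now = [0.0]
cache = FileListCache(lambda: b"<FileListing/>", clock=lambda: fake_now[0])
cache.get()
cache.get()               # served from the cache, no recompression
fake_now[0] = 16 * 60     # pretend 16 minutes have passed
cache.get()               # interval elapsed, so the list is rebuilt
print(cache.rebuilds)
```

The point of the design is that the expensive compression step runs once per interval, however many users request the list in between.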
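As a rough illustration of the partial file list idea, the sketch below cuts one directory subtree out of a full XML list. The element and attribute names (`FileListing`, `Directory`, `File`, `Name`, `Size`) and the sample data are assumptions for the example, and the function `partial_list` is hypothetical; this is not the ADC implementation.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Made-up sample list; the element and attribute names are assumptions.
FULL_LIST = """<FileListing>
  <Directory Name="Music">
    <Directory Name="Trance">
      <File Name="a.mp3" Size="100"/>
    </Directory>
    <File Name="b.mp3" Size="200"/>
  </Directory>
  <Directory Name="Video">
    <File Name="c.avi" Size="300"/>
  </Directory>
</FileListing>"""


def partial_list(full_xml: str, path: str) -> Optional[str]:
    """Return an XML list holding only the directory at path, e.g. "Music/Trance"."""
    node = ET.fromstring(full_xml)
    for part in path.strip("/").split("/"):
        # Descend one level: find the child directory with the requested name.
        node = next((d for d in node.findall("Directory")
                     if d.get("Name") == part), None)
        if node is None:
            return None  # no such directory in the list
    root = ET.Element("FileListing")
    root.append(node)    # keeps the directory together with its subdirectories
    return ET.tostring(root, encoding="unicode")


trance_only = partial_list(FULL_LIST, "Music/Trance")
print(trance_only)
```

A client browsing progressively would request one such subtree at a time instead of downloading the whole list.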

Itanium wrote:A good alternative to reduce clients bandwidth load could be set by ...

The existing settings help preserve hub bandwidth, which is a much more limited commodity than client bandwidth. More hub bandwidth means the ability to support more users, which means more files and sources.

taltamir
Posts: 9
Joined: 2003-05-13 17:41
Location: Hell (Coppell, Texas, U.S.A.)

Post by taltamir » 2005-11-07 14:05

Todi, your tests show even BETTER compression with rar: about 10% smaller than bz2. 10% is by no means negligible. Which is why I suggest adding more compression formats and using whichever is ideal in each case. In fact, I would suggest adding several open compression formats, finding through testing which one compresses which file types best, and using the best one for each file uploaded. (E.g. zip compresses png files better than rar, which compresses jpg files better than zip and ace. ace sometimes compresses exe files better than zip and rar... something similar, but with open compression formats rather than proprietary ones.)

GargoyleMT, I know the list isn't compressed on the fly. What I meant was that bz2 is there to be used on the fly for file transfers, but it's not ideal for file lists, on which the highest rate of compression would be desirable.

As for 7z, my tests indicate that it does NOT compress as well as rar. 7z is using an older zip format with gigantic dictionaries. While winrar and winzip and winace all take at MOST 2MB of ram, 7z takes hundreds of megs of ram, and if you run out of ram the compression fails! (It starts using hdd scratch, which slows it down to nothing; it will burn out your hdd before finishing a compression, not to mention it could take YEARS to compress on the HDD rather than minutes.)

I don't really see anything wrong with having DC support more compression formats. I can see why you wouldn't want to implement it because it would require work. I don't see a reason NOT to accept it if someone chooses to implement it. Putting it on the list of "possible features" for someone to code wouldn't go amiss.

Todi, I was thinking more of ace and 7z and obscure formats... I did test zip and of course it did not compress well (did you know that winzip 8 compresses better than winzip 9, and that winrar makes smaller zip files than both of the above mentioned?). But thanks for running the tests, they just showed that my estimate of 5% savings was too conservative; 5-10% it is.

Itanium, not only would that waste hub bandwidth, but it would decrease the number of sources found, and having more sources would result in faster downloads than compressing the file lists further would.
In fact, I wouldn't want to download a partial list either. The chance of missing sources is too high...


The only problems with implementing rar compression are:
1. Possible legality of it... so we should ask rarlabs for permission.
2. Non opensourceness, but I think we can live with it.
3. It will increase the program size slightly - which would be negligible
4. It will require coding - which is why I put it up here for volunteers to grab...

Pothead
Posts: 223
Joined: 2005-01-15 06:55

Post by Pothead » 2005-11-07 17:44

taltamir wrote:The only problems with implementing rar compression are:
1. Possible legality of it... so we should ask rarlabs for permission.
2. Non opensourceness, but I think we can live with it.

1. Not gonna happen.
2. No it cannot be lived with, see 1.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2005-11-07 17:45

taltamir wrote:7z is using an older zip format with gigantic dictionaries.

This is a side note, but I thought that 7Zip achieved its best compression with LZMA (which incidentally is used in the DC++ installer [since it is an NSIS package]). Wikipedia's entry on LZMA doesn't indicate that it is derived from any of the algorithms used in ZIP.

The memory and CPU requirements would make it a poor fit for DC++, I think. But its license at least makes it possible.
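Anyone who wants to check the size versus CPU trade-off on their own data can do so with Python's standard library, which ships deflate (`zlib`), bzip2 (`bz2`) and LZMA (`lzma`). The sketch below is only a rough harness: the sample list is synthetic and made up for the example, RAR is not covered because there is no standard-library encoder for it, and the numbers on a real file list will differ.

```python
import bz2
import lzma
import time
import zlib


def make_sample_list(entries: int = 5000) -> bytes:
    """Build a synthetic XML list; a stand-in for a real file list."""
    lines = ["<FileListing>"]
    for i in range(entries):
        lines.append(f'<File Name="track_{i:05d}.mp3" Size="{4000000 + i * 37}"/>')
    lines.append("</FileListing>")
    return "\n".join(lines).encode("utf-8")


# Each entry maps a label to a (compress, decompress) pair.
CODECS = {
    "zlib (deflate, as in ZIP)": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma (default preset)": (lzma.compress, lzma.decompress),
}


def benchmark(data: bytes) -> dict:
    """Return {codec label: (compressed size in bytes, seconds to compress)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        assert decompress(packed) == data  # sanity check: lossless round trip
        results[name] = (len(packed), elapsed)
    return results


sample = make_sample_list()
report = benchmark(sample)
for name, (size, seconds) in report.items():
    print(f"{name}: {size} bytes ({100.0 * size / len(sample):.1f}% of original), "
          f"{seconds * 1000:.1f} ms")
```

To test a real list, replace `make_sample_list()` with the bytes of an uncompressed `filelist.xml`.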

taltamir wrote:In fact, I wouldn't want to download a partial list either. The chance of missing sources is too high...

In its worst case, PFL means one search per directory. Thus it's a bit wasteful compared to matching, but it is more of a compromise. Having an option to request PFL or a full list is a worthy option, in my opinion.

taltamir wrote:2. Non opensourceness, but I think we can live with it.

If the encoder and decoder are not available under a GPL-compatible license, only the DC++ copyright holder, arnetheduck, can make an exception to the license so that everyone can link the code in.

It looks like the decoder is available, but I'm not sure it is license compatible. From what I've seen, the encoder may not have source code available at all.
