Upcoming: ZLib block compression
From extensions.txt:
Feature: GetZBlock
Usage: Instead of $Get and $Send, use "$GetZBlock <start> <numbytes> <filename>|", where <start> is the 0-based starting index into the file (yes, 0-based, unlike $Get, which is 1-based), <numbytes> is the number of bytes to send and <filename> obviously is the filename. The other client then responds "$Sending|<compressed data>" if the request is OK, or "$Failed <errordescription>|" if it isn't. If everything's OK, data is sent until the whole uncompressed length has been sent.
$Filelength or a similar command is never sent, because it's not possible to know from the beginning how many bytes will be sent; instead, check how many decompressed bytes you've received and act accordingly. $Sending is needed to be able to distinguish the failure command from file data. Only one roundtrip is done for each block though, minimizing the need for maintaining state.
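For illustration, a request for the first 64 KiB of a file might look like this on the wire (the filename is made up, and the number of compressed bytes depends on the data):

    $GetZBlock 0 65536 somefile.bin|
    $Sending|<zlib stream decompressing to 65536 bytes>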
Compression: Compression is done using ZLib (v1.1.4 in DC++ 0.21's case) at the default compression level. The compression level can of course be changed by the implementer to reduce CPU usage, or even set to stored (no compression) in the case of incompressible files, but it is up to the compressing side to decide.
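For the curious, here is a minimal sketch of the uploader's side of one block using zlib's deflate API; sendData() is a hypothetical socket write, and error handling is trimmed:

    #include <cstddef>
    #include <cstring>
    #include <zlib.h>

    void sendData(const Bytef* data, std::size_t len); // hypothetical: writes to the socket

    // Compress one block at the default level, pushing compressed bytes
    // out as they are produced.
    bool sendZBlock(const Bytef* in, uInt inLen) {
        z_stream zs;
        std::memset(&zs, 0, sizeof(zs));
        if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
            return false;
        zs.next_in = const_cast<Bytef*>(in);
        zs.avail_in = inLen;
        Bytef buf[4096];
        int ret;
        do {
            zs.next_out = buf;
            zs.avail_out = sizeof(buf);
            ret = deflate(&zs, Z_FINISH);               // flush the whole block
            sendData(buf, sizeof(buf) - zs.avail_out);  // send what was produced
        } while (ret == Z_OK);                          // Z_STREAM_END means done
        deflateEnd(&zs);
        return ret == Z_STREAM_END;
    }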
Comments?
Here are some things to decide: When downloading, how should we choose the block size? Which files do we compress, and which do we just pass through zlib to get the adler32 integrity checking?
I'm also considering a similar function, GetCRC32Block, and perhaps a clean GetBlock that would work more or less the same way... what about these?
arnetheduck:
ivulfusbar wrote: i wouldn't mind a $Support GetZBlock or something similar in the handshake also.
Obviously; it's written in extensions.txt that all extensions have a supports part as well...
ender wrote: Why ZLib and not BZip2?
zlib is many times faster, requires a lot less memory per (de)compression stream, and produces output sooner (you have to feed a lot of data into bzip2 before you get any compressed data out of it, and the same goes for decompression). (I tried bzip2 first =)
Re: Upcoming: ZLib block compression
this is opening up the way for multi-user downloads, if I'm not mistaken,
which is good :)
I'd rather have dc sucking bandwidth than cpu.
/sed
arnetheduck wrote: Here are some things to decide: When downloading, how should we choose the block size?
I don't know anything about compression and block sizes, but I'd like to point out that if the user may specify this, there should be lower and upper bounds on it, i.e. min 1 kB, max 1 MB or something (values taken from /dev/random ;) ), otherwise bandwidth or CPU would suffer, no?
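A sketch of such a clamp (the helper name is hypothetical, and the bounds are as illustrative as the ones above):

    #include <algorithm>
    #include <cstddef>

    // Keep a user-supplied block size within sane limits (illustrative bounds).
    std::size_t clampBlockSize(std::size_t requested) {
        const std::size_t kMin = 1024;          // 1 kB
        const std::size_t kMax = 1024 * 1024;   // 1 MB
        return std::min(std::max(requested, kMin), kMax);
    }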
arnetheduck wrote: Which files do we compress, and which do we just pass through zlib to get the adler32 integrity checking?
I think I'd have to go with the uploading client having a list of extensions which it may compress, instead of a list of extensions not to compress.
http://dc.selwerd.nl/hublist.xml.bz2
http://www.b.ali.btinternet.co.uk/DCPlusPlus/index.html (TheParanoidOne's DC++ Guide)
http://www.dslreports.com/faq/dc (BSOD2600's Direct Connect FAQ)
arnetheduck:
The main value comes from the adler32 that's included in the zlib compression (like crc32, but faster)... and I'll probably add some sort of dynamic control over whether to compress or not: record which files compress well and which don't, or maybe even change the compression level on the fly if it's not compressing well...
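For reference, the Adler-32 in question is exposed directly by zlib, so keeping a running check over chunked data is just a couple of calls ('buf' and 'len' stand for each chunk received):

    #include <zlib.h>

    // Running Adler-32 over data that arrives in chunks.
    uLong adler = adler32(0L, Z_NULL, 0);   // seed value
    adler = adler32(adler, buf, len);       // repeat once per received chunk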
But adding zlib introduces another source dependency, and compression won't buy you much (on mp3, divx, xvid, ogg, mpc etc., which I imagine are the main file types exchanged). To me it sounds like feature bloat.
Simply add adler32 or crc32 code; there are many optimized implementations available for free.
OTOH, it's kind of cool with dynamically adaptive compression, so I can understand you...
I agree with coma in that most content is already compressed with lossy compression, which is far more effective than zlib's deflate method. Compressed file xfers would be an excellent feature for uncompressed content (eBooks and wavs are the examples that come to mind), and the few uncompressed graphics files out there, but wouldn't work at all for most graphics, video and music, which I'd IMAGINE most people are downloading with DC++. However, one of the great things about zlib's compression algorithm is that it pretty much never expands the size of the data, so all you'd lose is CPU cycles. Therefore, it wouldn't ever HURT to add some kind of auto-compression functionality.
One solution would be to allow users to configure compression based on either search file type (no for video, audio, compressed, yes for documents, executables, etc) or by filename extension (no for avi, zip, rar - yes for txt, nfo, sfv, doc, wav, au, smi, sub, no for everything else). The ability to specify a compression ratio in the settings would also be nice.
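A sketch of the extension check (the list and the helper name are made up for illustration):

    #include <set>
    #include <string>

    // Extensions considered worth compressing; anything else would be
    // passed through uncompressed (illustrative list).
    static const std::set<std::string> kCompressible =
        { "txt", "nfo", "sfv", "doc", "wav", "au", "smi", "sub" };

    bool shouldCompress(const std::string& ext) {
        return kCompressible.count(ext) != 0;
    }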
I think an easier and surefire way to implement the decision of whether to use a compressed file transfer or not would be to add another context menu item: in addition to Download, you'd also have "Compressed Download". That way the user is in full control of whether or not their downloads are compressed; if they choose compression on an already compressed file, it's their fault that they're wasting CPU and saving no bandwidth. As with my proposal above, the compression level could be in the settings. However, unlike the other two proposals, you'd have to record whether or not a transfer was a compressed download in the download queue, or resumed file transfers wouldn't use compression when they started up again.
I think a much more useful area for compression would be in hub <-> hub communication. I don't know if there is functionality in the hub software for sharing search and/or chat messages, but if there is, compression could help quite a bit (if it's not already there).
However, if the inter-hub communications protocol is anything like the client <-> client protocol, compression may be tricky... compressed communication works best if the protocol is block-based with the message length in the header, so that individual messages can be easily removed from the compressed datastream.
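Something along these lines, with each message carrying its length up front so the receiver knows where it ends (entirely hypothetical framing, not part of any DC protocol):

    #include <cstdint>
    #include <vector>

    // Length-prefixed frame: a 4-byte little-endian header tells the
    // receiver how many payload bytes follow, so individual messages can
    // be cut cleanly out of one long compressed stream.
    std::vector<uint8_t> frame(const std::vector<uint8_t>& payload) {
        std::vector<uint8_t> out;
        const uint32_t n = static_cast<uint32_t>(payload.size());
        for (int i = 0; i < 4; ++i)
            out.push_back(static_cast<uint8_t>(n >> (8 * i)));
        out.insert(out.end(), payload.begin(), payload.end());
        return out;
    }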
But if this can be pulled off, this would be great for servers with plenty of additional CPU but a lack of available bandwidth. With CPU speeds growing much faster than available (affordable) bandwidth, this situation is probably quite common.
FWIW, I've experimented with bzip2 compression in network servers, but came to the same conclusion as arne did: bzip2's memory and CPU usage are too great compared to zlib, which offers an adjustable memory footprint (although at the expense of compression ratio). I don't know how much of a negative this would be for client <-> client communication, but it would be a big problem for hub <-> hub communication.
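The footprint is fixed when the stream is created; a sketch of trading ratio for memory at init time (zlib's defaults are windowBits 15 and memLevel 8):

    #include <zlib.h>

    z_stream zs = {};
    // Lower windowBits (9..15) and memLevel (1..9) shrink the per-stream
    // memory footprint at the cost of compression ratio.
    deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                 /*windowBits=*/9, /*memLevel=*/1, Z_DEFAULT_STRATEGY);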
Sorry I got off-topic, but I think compression is a very interesting topic with respect to p2p protocols.
Brian
Many encryption libraries include compression support; I see no reason to include a compression-but-not-encryption mode, especially for client-client connections. Of course, if one writes one's own encryption code, one has to include one's own compression calls as well; but seeing as people more competent with cryptography than, I suspect, just about anyone here still managed to produce SSL v2.0 with vulnerabilities that SSL v3.0 and TLS were needed to fix, I'd be wary of inventing yet another key exchange and encryption protocol.
Coologic:
You're absolutely right about not inventing your own key exchange protocols... it's far too easy to make mistakes that experienced cryptographers can find weaknesses in.
Adding SSL / TLS encryption to existing TCP/IP connections is a no-brainer with a library such as OpenSSL, where you just initialize a few variables, create the SSL contexts, and then wrap all your socket calls with SSL_* equivalents. Retrofitting code to use these functions is quite easy, although supporting both SSL and cleartext (which would obviously be necessary) is a little harder.
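Roughly like this with OpenSSL (a client-side sketch; 'sock' is assumed to be an already-connected descriptor, and error handling is omitted):

    #include <openssl/ssl.h>

    // Wrap an existing TCP connection in TLS and exchange one buffer.
    bool tlsWrap(int sock, char* buf, int len) {
        SSL_library_init();                  // once per process (pre-1.1 OpenSSL)
        SSL_CTX* ctx = SSL_CTX_new(SSLv23_client_method());
        SSL* ssl = SSL_new(ctx);
        SSL_set_fd(ssl, sock);               // attach the connected socket
        if (SSL_connect(ssl) != 1)           // TLS handshake
            return false;
        SSL_write(ssl, buf, len);            // instead of send()
        return SSL_read(ssl, buf, len) > 0;  // instead of recv()
    }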
That being said, is there really a good reason to do this? I can see the purpose for CERTAIN things, like client <-> client private messages, but implementing encryption in search messages & file lists isn't really feasible without redesigning the DC protocol. Making a fully cryptographically secure file sharing / chat system is a huge amount of work, and really requires the entire architecture to be designed from the ground up with cryptography in mind. Silc (silcnet.org, I think) is an example of an encrypted chat system, and the key management code makes up a good part of its codebase. Add public file sharing / searching to the mix, and you have the makings of a very complex project.
Since compression is much easier to implement (one small library, 5 or 6 function calls) and is already part of the DC specification, I have a feeling that except in sensitive aspects of the DC system, you won't see encryption in DC++ (at least until the protocol gets extended).
Brian
arnetheduck:
1) compression is not in the protocol (yet =)
2) I just had a look at zlib and saw that you can adjust the compression level dynamically... so what I'll probably do is compress a few blocks; if the data compresses well, go on compressing, and if not, just pass it through with 0 compression for a few megs, then perhaps try compressing again for a bit (an iso with different files in it...?). (See the sketch after this list.)
3) Block sizes... this is an area for experimentation, and probably a good advanced user setting... it'll probably end up around a meg as the default (since the data stream has to be stopped at each block for some conversation between the clients, we don't want that to happen too often).
4) Code bloat: FYI, the feature/zlib is already compiled in (all 0.21+ releases), it's just not enabled yet...
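Regarding 2), the knob zlib offers is deflateParams(), which can change the level of an open stream between blocks (a sketch; 'zs' is an initialized deflate stream):

    #include <zlib.h>

    // Drop to stored (level 0) output when the data isn't compressing,
    // then raise the level again later without resetting the stream.
    deflateParams(&zs, 0, Z_DEFAULT_STRATEGY);                      // pass-through
    /* ...a few megs later, sample the ratio and maybe switch back... */
    deflateParams(&zs, Z_DEFAULT_COMPRESSION, Z_DEFAULT_STRATEGY);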
arnetheduck wrote: 1) compression is not in the protocol (yet =)
Oops, I thought that your initial post was quoting the DC protocol spec. My bad.
arnetheduck wrote: 3) Block sizes... this is an area for experimentation, and probably a good advanced user setting... it'll probably end up around a meg as the default (since the data stream has to be stopped at each block for some conversation between the clients, we don't want that to happen too often).
Conversation? Do you mean the dynamic block resize code? (i.e. sender to receiver: turn up the compression.) There should still be some way for the client to specify whether or not to use compression on a per-download basis: either by media type (video, documents, etc.), with a per-type setting for whether to try compression (or a default compression level), OR by adding an item to the context menu, giving the user the choice between a plain download and a compressed download.
Users should also be able to disable compressed uploads (from their client) so that users with shitty CPUs won't have their client taking up 90% of their CPU. This could be done by allowing a user to specify a maximum compression level (with max of 0 being uncompressed), or by allowing them to turn off the feature altogether.
As for code bloat: until DC++ starts including gigantic ActiveX controls like the web control, DC++ will not even enter the realm of being bloated (IMO). The author took the time to code this in a lower-level language precisely so that adding features like on-the-fly zlib compression won't hurt performance. And since bzip2 and zlib are both freely distributable, there isn't any issue of having to download additional dependencies.
I look forward to checking out the new functionality!
Brian
arnetheduck wrote: 4) Code bloat: FYI, the feature/zlib is already compiled in (all 0.21+ releases), it's just not enabled yet...
Maybe it should never be enabled, and both zlib and bzip2 could better be dynamically linked and distributed as binaries... bzip2 is distributed in the form of a DLL; I don't know about zlib. I'm getting tired of watching MS VC++ check those subprojects on every build.
Btw, I also can't see any rationale for the GetZBlock command. Very little of what I download can be compressed, and I bet mine is not an unusual case.
In the age of super-boredom/hype and mediocrity/celebrate relentlessness/menace to society --KMFDM
yilard wrote: Btw, I also can't see any rationale for the GetZBlock command. Very little of what I download can be compressed, and I bet mine is not an unusual case.
The compression itself isn't the major benefit, actually. I could well do without it and have only a getBlock command, which gives nice possibilities for interleaving commands and data, and a well-defined way to transfer chunks (and end transfers).