Carl-Adam Brengesjö wrote:
why lock it to a single hash algorithm?
Because searches are expensive. Because hashing is expensive. Because
some algorithms (uuhash, crc32, compound md4) have weaknesses that
fundamentally undermine their purpose (uniquely identifying files).
Because some algorithms don't lend themselves to incremental
you can't know the hash of the file you're looking for unless you have
downloaded a *.sfv (what algorithm? is it called sfv, or?) or MD5 (or
similar) file telling it.
MD5 and CRC32 are crude. If we support a single hash, as three clients
already do, sites such as ShareReactor, ShareLive, or FileNexus will pop
up with direct download links for the DC-supported hash, allowing our
users to start a download from a hash.
Here's a rough overview of how BCDC does hashing (some of these steps
are only visible when it's advertising itself as BCDC): It crawls your
entire share in a low-priority thread (it finishes while the client is
running; it doesn't block startup), hashing files and adding the full
hash tree to a database. When it returns a search result, it replaces
the hub name field with TTH:<hash>. When connecting to another client,
it includes TTH in its $Supports list; when a fellow TTH-supporting
source is found (while downloading files in the user's queue), it gets
the full hash tree (once) using a new client-to-client command:
$GetMeta. This tree can be used to verify both segments and the whole file.
If I seem brief... it's because I am. We've had plenty of excellent
discussion of this on the DC Dev public and private hubs, as well as on
the DC++ forum, over the course of the last year. Tiger Tree Hashes
were chosen for some unique
properties of the root file hash and of the tree of segment hashes.
This is really a Good Way(tm) to do hashing... otherwise I wouldn't
bother bringing it up.