Re: [dcdev] File hashing
Fredrik Tolf
2003-11-30 5:08
Direct Connect developers

Todd Pederzani writes:
> > On Saturday, November 29, 2003, at 07:57 PM, [email protected] wrote:
> > > I've implemented it in my DC++ mod (BCDC++).
> >
> > It stores hashes of files it's hashed, so it needs only recalculate
> > them when they change.
> >
> > Just for information use, for those who can't go to Sedulus' website
> > and download the source (which admittedly I have not yet done either),
> > BCDC uses Merkle hash trees and the Tiger algorithm to hash files.
> > This is the primary scheme that Shareaza uses.  A little more
> > information can be found in the THEX 0.3 draft:
> > http://www.open-content.net/specs/draft-jchapweske-thex-02.html

Thanks for the link. I'll read it soon (must ... go ... to ... bed).

> On the DC++ forum and in the public/private hubs we've talked more
> about the subject, including vague musings over the possibility of
> hashing files according to other schemes, to perhaps benefit from
> Sharereactor, etc.  There's a bit of history in the area of
> thinking/talking about hashes.
> > Fredrik, what exactly did you want to know? ;))

Well, I just wanted to know the thoughts on making it faster.

If you want to know my thoughts, the only solution I've managed to
think of is hashing, say, the first 64 kB or so of each
file. Admittedly, that is far from an optimal solution, but combined
with a file size comparison, I think it just might be better
than having to wait a week or so for my share to be hashed... ;-) (In
fact, with my HDD transfer speed of 30 MB/s, hashing it all would
"only" take a little more than an hour.)
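
To make the prefix idea concrete, here is a minimal sketch in Python. It is
only an illustration under assumptions: the function names are made up,
the 64 KiB prefix size is just the rough figure from above, and SHA-256
stands in for Tiger because Python's hashlib does not ship Tiger.

```python
import hashlib
import os

PREFIX_BYTES = 64 * 1024  # "the first 64 kB or so"; the exact size is a guess


def quick_fingerprint(path, prefix_bytes=PREFIX_BYTES):
    """Return (file_size, hex digest of the first prefix_bytes bytes)."""
    size = os.path.getsize(path)
    digest = hashlib.sha256()  # stand-in hash; a DC client would likely use Tiger
    with open(path, "rb") as f:
        digest.update(f.read(prefix_bytes))
    return size, digest.hexdigest()


def probably_same_file(path_a, path_b):
    """Cheap check: equal size and equal prefix hash. Not proof of equality."""
    return quick_fingerprint(path_a) == quick_fingerprint(path_b)
```

The weakness is visible right away: two files of the same size that differ
only after the first 64 KiB get the same fingerprint, so this can only
ever be a hint, not an identity.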

I've also played a bit with the idea that the whole file could be
hashed "on-demand" in the client<->client protocol. I've almost
decided to reject that idea, though, since it would still take a
minute or so to hash a fairly large file, especially on low-end
computers. One good thing about it is that the result of that
operation could be cached, though.

I'll check out the DC++ forums, though.

> (and what do you work on, if you don't mind me asking)

I was working on dcpro (http://sf.net/projects/dcprod), but I've given
it up for a better implementation (I didn't really like the name
either...) that isn't public yet.

Fredrik Tolf
