hashing / multi-source downloading ...

Archived discussion about features (predating the use of Bugzilla as a bug and feature tracker)

Moderator: Moderators

Locked
phylacides
Posts: 1
Joined: 2004-01-16 23:35

hashing / multi-source downloading ...

Post by phylacides » 2004-01-16 23:44

Although I'm quite new to the forums, I'm almost certain that this has been brought up hundreds of times before. The one thing DC++ doesn't have is multi-source downloading, and, as Kazaa has shown, it isn't worth anything if you don't guarantee file integrity.

I've also been a rather loyal follower of Shareaza, and by far its best feature is its hashing. I'm not familiar with it on a technical level (I believe it's called tiger-tree hashing), but it allows any part of any file to be verified, therefore allowing for superb multi-source downloading.

So, my question: obviously, the addition of a feature like hashing would be a huge step, though some may argue one with substantial benefit. What does everyone think about this?

-chris =D

BSOD2600
Forum Moderator
Posts: 503
Joined: 2003-01-27 18:47
Location: USA
Contact:

Post by BSOD2600 » 2004-01-17 00:04

BCDC and a few other clients support file hashing (although BCDC has the best implementation of it).

Multi-source downloading has been brought up many times; use SEARCH! Take a look at this thread for an idea of how the developers feel about it.

Twink
Posts: 436
Joined: 2003-03-31 23:31
Location: New Zealand

Post by Twink » 2004-01-17 00:37

There are some mods with multi-source downloading, though I think most get it from reverse connect.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Re: hashing / multi-source downloading ...

Post by GargoyleMT » 2004-01-19 19:45

phylacides wrote:I'm not familiar with it on a technical level (I believe it's called tiger-tree hashing), but it allows any part of any file to be verified, therefore allowing for superb multi-source downloading.
Good thought.

Tiger tree hashes are used in BCDC.


You can use other hash schemes to do incremental verification; tiger trees are just a nice way to do it, since the segment hashes (at any segment size) combine into a tree whose root serves as the hash of the whole file. eDonkey2000, for instance, does incremental verification too, but with 9 MB chunks, and it uses (compound) MD4, which isn't recommended for new software, as it has known vulnerabilities.
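To make the tree idea concrete, here's a rough sketch of Merkle-tree (hash tree) verification in Python. SHA-256 stands in for Tiger, which isn't in Python's standard library, and the 64 KiB segment size and 0x00/0x01 leaf/internal prefixes follow the THEX convention; treat it as an illustration, not DC++ or BCDC code.

Code:

import hashlib

LEAF, INTERNAL = b"\x00", b"\x01"  # THEX-style prefixes prevent leaf/node collisions
SEGMENT = 64 * 1024                # hash the file in 64 KiB segments

def merkle_root(leaves):
    # Combine pairs level by level; an odd node on a level is promoted unchanged.
    level = leaves
    while len(level) > 1:
        nxt = [hashlib.sha256(INTERNAL + level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

def file_root(data: bytes) -> bytes:
    # Hash each segment, then fold the segment hashes into one root. A peer can
    # verify any single segment against its leaf, or the whole file via the root.
    chunks = [data[i:i + SEGMENT] for i in range(0, len(data), SEGMENT)] or [b""]
    return merkle_root([hashlib.sha256(LEAF + c).digest() for c in chunks])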

ClubBarf
Posts: 2
Joined: 2004-01-26 08:45

Multi-source will soon be pointless anyway...

Post by ClubBarf » 2004-01-26 09:11

Personally I would like to see hashing implemented in DC++, but not because of multi-source downloads. At the moment it's possible for an ISP to cache P2P traffic from all major networks except DC, because all the other major P2P networks use hashing (there's no point in caching something if you can't be sure it hasn't got any errors...).

The reason I'm taking so much of an interest is that I don't particularly want DC left behind as the big P2P networks all get cached by ISPs. With 70% of internet traffic being P2P, ISPs are beginning to realise that by caching they can relieve traffic and pay for the cache hardware in a month or two with the reduced traffic costs. And once your ISP's cache lets you max your download bandwidth on most downloads, multi-source or not, are you going to stick with a network that doesn't give you that massive advantage?

Pwwwwweeeeease can we have a standardised form of hashing implemented in DC++??? The other DC projects are likely to follow if DC++ or NMDC (or both) implements something like BCDC's hashing.

The P2P world is about to change, bigtime. Let's not have DC getting left behind... :wink:
When tempted to fight fire with fire, take note that the Fire Brigade usually uses water...

ClubBarf
Posts: 2
Joined: 2004-01-26 08:45

Post by ClubBarf » 2004-01-26 12:36

Dunno about the US, but I know that over here in Europe, ISPs are going a bit mad for the caches (the ones who've actually seen them in action, anyhoo). The legal position is a little different over in the US, but many ISPs may still go for it, because these systems pay for themselves very quickly: even if you have to take it all down again in six months' time, the kit will have saved a fair amount of cash.

That said, since the caches are transparent, unless someone inside the company starts to tell tales, the RIAA etc. won't have any actual proof of what the ISP is doing, especially if they're all doing it.

Oh, and just to clarify the original post: I wasn't saying we need hashing to be implemented in DC++ (I know that's being looked into already). What I'm saying is that I would like to see a standard set that both NMDC and DC++ can agree on and implement, or at least one follow the other, so the whole DC community can follow along.

The RIAA is about as clued up on P2P as M$ is on cracking, anyhow: they know of it, they know they want to stop it, but frankly, it's all a bit too much like hard work. They're only dangerous because they have more money than we do. Just slap a few students with a bill for a few billion dollars; that'll make the world realise they're not stupid... :?
When tempted to fight fire with fire, take note that the Fire Brigade usually uses water...

megla
Posts: 22
Joined: 2003-01-03 11:26

Post by megla » 2004-01-27 18:56

ClubBarf wrote:Dunno about the US, but I know that over here in Europe, ISPs are going a bit mad for the caches... The legal position is a little different over in the US, but many ISPs may still go for it, because these systems pay for themselves very quickly.
The payload should be very similar to any web traffic. Or are web caches illegal in the US too?

McDC
Posts: 16
Joined: 2003-01-04 05:08

Post by McDC » 2004-01-30 15:17

First of all, if an ISP uses a large cache, does this affect my upload bandwidth in any way?

The problem with multi-source downloading and RAR releases is one of bandwidth ratio. In Sweden a large number of people will now be able to get either 8/0.8 or 2/0.8 Mbit connections. In the past this was no real problem, because with large 700 MB files, maximizing download bandwidth took a lot of effort, time and hard-drive space.

If multi-source downloading is implemented, you have to balance the bandwidth ratio.

synOs
Posts: 7
Joined: 2004-02-14 18:04

Post by synOs » 2004-02-14 18:12

Todi wrote:
Changelog wrote:-- 0.307 --
* Fixed full name being displayed in user commands with submenus
* Bloom filters dramatically improve search efficiency (if you have a large share)
* Merkle trees and tiger hashing added
* Auto match results shown in status bar
Are you saying that hashing is definitely implemented in 0.307 (and the release date is around when?)? Doesn't hashing gigs and gigs of files take a really long time? I'm all for file hashing, by the way; file integrity, or the lack thereof, is one of the biggest problems with DC++ right now.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-02-14 18:59

synOs wrote:Are you saying that hashing is definitely implemented in 0.307 (and the release date is around when?)?
It'll be released when it's finished... I hope you expected no other answer; anything else would just be made up (by guessing) anyhow.
Doesn't hashing gigs and gigs of files take a really long time? I'm all for file hashing, by the way; file integrity, or the lack thereof, is one of the biggest problems with DC++ right now.
This will be roughly the same as BCDC++. I don't have logs with arnetheduck's new hash manager, but here:

Code:

2004-01-19 08:30: Hashing file: (( First File ))
2004-01-19 19:27: Hashing file: (( Last File ))
That's for my share, which is roughly 37,000 files and 375 gigabytes; the run took just under eleven hours, or about 9-10 MB/s on average. About 30 GB of that is a share mounted over the network.

And that may have been with arne's hash manager hashing files at the same time. The hashes are all cached in an XML file and a .dat file:

Code:

 Directory of C:\Program Files\DC++

02/13/2004  08:43p          60,817,408 HashData.dat
02/13/2004  08:44p           7,774,839 HashIndex.xml
               2 File(s)     68,592,247 bytes
Any more questions?

synOs
Posts: 7
Joined: 2004-02-14 18:04

Post by synOs » 2004-02-14 19:52

Will hashing be forced on DC++ load and download? Will servers have the option of blocking users without hash data? If the downloaded file turns out to be corrupt when compared against its hash, will DC++ allow one to re-download only the corrupt parts of the file, or will the entire file have to be downloaded again?

Thank you.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-02-14 20:55

synOs wrote:Will hashing be forced on DC++ load and download?
It's an option under Advanced. Hashing is done in a background thread, so it will not block the way the initial file list creation does when you start DC++. There's no hashing on demand, so someone requesting a file will simply not get a hash if it hasn't been calculated yet. (That's the eventual behaviour, anyway; there's no support for hashes in client-to-client communication right now.)
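For illustration, here's a minimal sketch of that background-thread arrangement in Python, with SHA-256 standing in for Tiger; the names (hash_index, hasher) are made up for the example, and this is not DC++'s actual C++ code.

Code:

import hashlib
import queue
import threading

hash_index = {}        # path -> hex digest; stands in for the on-disk hash cache
work = queue.Queue()   # files queued for hashing

def hasher():
    # Runs in the background, so the UI and network threads never block on it.
    while True:
        path = work.get()
        h = hashlib.sha256()  # SHA-256 as a stand-in; Tiger isn't in hashlib
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1024 * 1024), b""):
                h.update(block)
        hash_index[path] = h.hexdigest()
        work.task_done()

threading.Thread(target=hasher, daemon=True).start()
# A request that arrives before a file is hashed simply finds no entry:
# hash_index.get("some_shared_file.bin") -> None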
Will servers have the option of blocking users without hash data?
Maybe. The client has to advertise it in client-to-client communications. Arne could make a $Supports item that's sent to ExtendedProtocol hubs, which would allow such a hub to know whether the user currently has the option on.
If the downloaded file is corrupt when compared against the hash code will DC++ allow one to download only certain parts of the corrupt file to fix it or will the entire file have to be downloaded again?
Very likely the former. It looks like, with the values arne has chosen, you'll be able to get corruption resolution down to 64 kilobytes or 1/512th of the file, whichever is bigger. Verifying downloads is one of the unimplemented parts of the current hash system. You will only be able to download a specific segment of a file if the remote end has "Safe and Compressed Transfers" enabled.
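As a quick worked example of that rule (the function is illustrative; real tree implementations typically round the leaf size up to a power of two):

Code:

import math

def resolution(file_size):
    # "64 kilobytes, or 1/512th of the file, whichever is bigger"
    return max(64 * 1024, math.ceil(file_size / 512))

print(resolution(700 * 1024 * 1024))  # 1,433,600 bytes (~1.4 MB) for a 700 MB file
print(resolution(10 * 1024 * 1024))   # 65,536 bytes: small files hit the 64 KiB floor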
Thank you.
You're welcome.

Kras
Posts: 2
Joined: 2004-02-14 23:24

Post by Kras » 2004-02-14 23:36

Thanks! :D This is fantastic news for hubs where there are lots of duplicate files with different sizes, qualities and names! Would it be possible to have an option to filter out of Search, ADLSearch and file lists those files that one already has? Would it also be possible to have a private list of "banned" files, for example ones of inferior quality, that are also filtered out?

Thanks! :D

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-02-15 09:08

Those requests go far beyond hashing... Please let the hashing get finished before you pile more on the (potential) to-do list.

Sure, it would be possible (if hashes end up in file lists) to eliminate files the user has in his share.
A private list of banned files seems like it would be pointless if searches returned meta-information. Presumably you could sort by codec, resolution, or bitrate and achieve the same thing.

synOs
Posts: 7
Joined: 2004-02-14 18:04

Post by synOs » 2004-02-15 19:21

I think it would be good, once hashing is finished, to keep a list of file sizes/hashes on one's computer and compare that to files of the same size in one's share list. It seems that a lot of people change the names of files so that it's difficult for others to search for them, and thus save their bandwidth, e.g. they will change FileIWantToHide.avi to fiwth.avi or something else that a normal search will not find. I wonder if there's a way to catch people who do this. Maybe it's impossible. It's really annoying, though.
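A rough sketch of that idea in Python (again with SHA-256 standing in for Tiger; find_renamed_copies is a made-up name, not a DC++ feature): group files by content hash, so a rename can't hide a duplicate.

Code:

import hashlib
from collections import defaultdict
from pathlib import Path

def digest(path):
    h = hashlib.sha256()  # stand-in for Tiger
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            h.update(block)
    return h.hexdigest()

def find_renamed_copies(share):
    # Any hash that maps to two or more paths is the same content
    # under different names, e.g. FileIWantToHide.avi vs fiwth.avi.
    by_hash = defaultdict(list)
    for path in Path(share).rglob("*"):
        if path.is_file():
            by_hash[digest(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}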

synOs
Posts: 7
Joined: 2004-02-14 18:04

Post by synOs » 2004-02-15 19:26

Actually, how about an option simply to find a file with the exact same byte size and/or hash as the one you want, thereby removing the problem of file names altogether? I'm not exactly sure how hashes work.

Twink
Posts: 436
Joined: 2003-03-31 23:31
Location: New Zealand

Post by Twink » 2004-02-15 22:25

synOs wrote:Actually, how about an option simply to find a file with the exact same byte size and/or hash as the one you want, thereby removing the problem of file names altogether? I'm not exactly sure how hashes work.
I'm fairly sure that this is a planned feature; it is one of the major advantages of implementing hashes, so I think we'll see it eventually.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-02-18 11:02

BCDC 0.306 allows people to request a file by hash instead of by filename. I imagine that arnetheduck will implement something similar in 0.307+ as well.
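To illustrate what requesting by hash buys you (a sketch under assumed names; the real BCDC protocol details differ): the uploader resolves content, not filenames, so renames don't matter.

Code:

# hash_to_path would be built while hashing the share; names are illustrative.
hash_to_path = {
    "5c0b3e...": "share/FileIWantToHide.avi",
}

def resolve_request(requested_hash):
    # The same bytes match no matter what the file has been renamed to.
    return hash_to_path.get(requested_hash)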

I initially thought you were talking about searching by hash, which is... logical. :wink:

Locked