Hashing suxx...

Archived discussion about features (predating the use of Bugzilla as a bug and feature tracker)

°^Quicksilver^°
Posts: 18
Joined: 2004-10-05 08:26
Location: Want my ip? just ask!

Hashing suxx...

Post by °^Quicksilver^° » 2004-12-04 16:51

.. for small files. Otherwise it is great!
But how about not including the TTH in the filelists for files that are really small?

Nobody ever searches for them via TTH.

So how about just not hashing anything in the small-file category, or hashing it on the fly when the small file is requested, to ensure integrity? Or how about keeping one filelist with the TTHs of the small files for answering searches, while the filelist uploaded to other users simply omits the TTHs for small files?
Maybe even files smaller than 500 KB could be left out of hashing, so the filelists of shares with a lot of JPEGs would get smaller.
I think that could shrink a lot of huge filelists without too strong side effects.
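
To make the proposal concrete, here is a minimal sketch of a filelist writer that keeps the hash for large files and leaves it out for small ones. Everything in it is an assumption for illustration: the `SharedFile` type, the `filelist_lines` helper, the 64 KiB threshold and the simplified `<File .../>` entry layout are hypothetical, and this is not how DC++ actually builds its lists.

```python
from dataclasses import dataclass
from xml.sax.saxutils import quoteattr

# Hypothetical threshold in bytes; the thread discusses values from 64 KiB to 500 KB.
SMALL_FILE_LIMIT = 64 * 1024


@dataclass
class SharedFile:
    name: str
    size: int  # size in bytes
    tth: str   # hash root as text, assumed to be computed elsewhere


def filelist_lines(files, small_limit=SMALL_FILE_LIMIT, omit_small_tth=True):
    """Return one simplified entry per file, leaving out TTH for small files."""
    lines = []
    for f in files:
        attrs = f'Name={quoteattr(f.name)} Size="{f.size}"'
        # Only files at or above the limit carry their hash in the uploaded list.
        if not (omit_small_tth and f.size < small_limit):
            attrs += f' TTH="{f.tth}"'
        lines.append(f"<File {attrs}/>")
    return lines


example = [
    SharedFile("holiday.jpg", 120_000, "AAAA"),
    SharedFile("thumb.jpg", 9_000, "BBBB"),
]
for line in filelist_lines(example):
    print(line)
```

The local client would still know every hash (the `tth` field is always filled in); only the list handed to other users gets shorter.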

Please think about it and comment.
Imagination sets the spirit free,
Into distant lands of fantasy,
Close your eyes and you will see,
Within your mind there lies the key.

Xan1977
Forum Moderator
Posts: 627
Joined: 2003-06-05 20:15

Post by Xan1977 » 2004-12-04 17:00

It can still be used for identifying duplicates and for searching for alternates, regardless of file size. So no.

°^Quicksilver^°
Posts: 18
Joined: 2004-10-05 08:26
Location: Want my ip? just ask!

Post by °^Quicksilver^° » 2004-12-04 18:17

Have you ever searched for an alternate source of a small file?
I mean those little files you get an extra slot for; you don't need alternates for them!

Also, identifying duplicates is not the main function of DC++. But even for this there would be a workaround: keep one filelist with the hashed small files and one without. That way DC++ can identify duplicates but does not have to upload the list with the TTHs for small files.

TheParanoidOne
Forum Moderator
Posts: 1420
Joined: 2003-04-22 14:37

Post by TheParanoidOne » 2004-12-04 18:56

°^Quicksilver^° wrote: Have you ever searched for an alternate source of a small file?
Yes.
The world is coming to an end. Please log off.

DC++ Guide | Words

°^Quicksilver^°
Posts: 18
Joined: 2004-10-05 08:26
Location: Want my ip? just ask!

Post by °^Quicksilver^° » 2004-12-05 15:39

OK, then let's assume you are not the only one who does this. I still think we would benefit. Searching for small files is rare, but downloading filelists is done quite often. And huge loads of JPEGs are as much a problem for filelist size as RAR sharing.
But getting rid of RAR sharing will need a lot of features (multi-source download, upload from unfinished files).

Getting rid of the space that the TTHs of small files take up in filelists is much easier.
Hash them, but don't include their TTHs in the filelist when it is uploaded!

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-12-05 16:16

Solution: share .cbr/.cbz files instead of folders like us smart manga sharers do. I suppose if it's just a ton of random pics then you shouldn't do this, but I was under the impression they were sets.

°^Quicksilver^°
Posts: 18
Joined: 2004-10-05 08:26
Location: Want my ip? just ask!

Post by °^Quicksilver^° » 2004-12-05 17:07

No, not sets. Manga here are normally also RARed into easy-to-handle 300 MB packages of 5 books. But if I look at my share, let me see:
Videos in RAR files: 4000 files
Normal videos: 1200 files
MP3: 2000 files
Progs: 400 files
Games (partially RAR): 2400 files
Images: 2000 files
Docs: 1300 files
Misc: 200 files

So not sharing the TTHs would save a lot for the docs and the images, if we assume the TTHs make up one third of the filelist after compression. It doesn't look like a big impact on my share, but imagine it for someone with fewer videos.
It still amazes me that my images and docs make up about 0.5% of my share but consume 25% of my filelist.
This feature request would make my list about 10% smaller; other people's filelists maybe 15% or even more.
10% doesn't sound like much? I think it is a lot!
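
The estimate above can be checked with a few lines of arithmetic. This is only a rough sketch under loud assumptions: every filelist entry is taken to occupy about the same space, all images and docs are taken to fall below the small-file limit, and the one-third TTH fraction is simply the figure guessed in the post. The `estimated_saving` helper is hypothetical.

```python
# File counts per category, as listed in the post above.
share = {
    "Videos in RAR files": 4000,
    "Normal videos": 1200,
    "MP3": 2000,
    "Progs": 400,
    "Games (partially RAR)": 2400,
    "Images": 2000,
    "Docs": 1300,
    "Misc": 200,
}

SMALL_CATEGORIES = ("Images", "Docs")  # assumed to be entirely small files
TTH_FRACTION = 1 / 3                   # assumed share of the filelist taken by TTHs


def estimated_saving(counts, small_categories, tth_fraction):
    """Return (fraction of entries that are small, fraction of filelist saved)."""
    total = sum(counts.values())
    small = sum(counts[c] for c in small_categories)
    entry_fraction = small / total
    return entry_fraction, entry_fraction * tth_fraction


entries, saving = estimated_saving(share, SMALL_CATEGORIES, TTH_FRACTION)
print(f"small-file entries: {entries:.1%}, estimated filelist saving: {saving:.1%}")
```

Under these assumptions, images and docs are 3300 of 13500 entries (roughly a quarter of the list), and dropping their TTHs saves roughly 8% of the filelist, which is in the same ballpark as the 10% guessed above.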

DeathStalker77
Posts: 38
Joined: 2003-11-15 22:10
Location: Hell

Think About It ....

Post by DeathStalker77 » 2004-12-07 11:41

Quicksilver, if all your argument is about is smaller filelist sizes, then it fails.

Filelist size is nothing. Hashing is the way things are, and the way they will be. More and more hubs will be banning the older, non-TTH clients (thankfully!)

You talk about being able to match small files (like JPGs) - I will *guarantee* you that the error rate - meaning *wrong* file is downloaded - is FAR greater without hashing - it's only going by filesize and name. Hashing makes "exact" files "exact".
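
A small sketch of why name and size alone cannot tell two files apart while a content hash can. SHA-256 from the standard library is used here purely as a stand-in, since Tiger tree hashing is not available in `hashlib`; the helper names and the sample data are made up for illustration.

```python
import hashlib


def name_size_key(name: str, data: bytes):
    """Identity as older clients saw it: file name plus size only."""
    return (name, len(data))


def content_key(data: bytes) -> str:
    """Identity by content hash (SHA-256 as a stand-in for a TTH root)."""
    return hashlib.sha256(data).hexdigest()


original = b"frame data " * 1000

# Same length, one byte damaged somewhere in the middle.
damaged = bytearray(original)
damaged[500] ^= 0xFF
damaged = bytes(damaged)

print(name_size_key("clip.avi", original) == name_size_key("clip.avi", damaged))
print(content_key(original) == content_key(damaged))
```

The first comparison treats the two files as the same; the second one does not.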

People complain about the speed of hashing - DC will hash ~200 GB an hour on a P4 2.8 GHz system (your mileage may vary). That is *not* an unreasonable speed.

I, for one, use Match TTH on *every* search initially; maybe YOU don't search for files (even small ones) with TTH, but the vast majority of users do.

In addition, since the newer clients can no longer connect to the older (306 and back) clients, they will be gotten rid of even faster.

Bottom line, live with it or find another P2P program.

°^Quicksilver^°
Posts: 18
Joined: 2004-10-05 08:26
Location: Want my ip? just ask!

Post by °^Quicksilver^° » 2004-12-07 12:08

Hmm, maybe you misunderstood the "Hashing suxx" title; that was just a provocation.

1. In my hub I set a date, 1.1.2005, after which all clients without TTH are banned,
so yes, I want TTH.

2. For a small file you normally don't need to find alternates, because you automatically get a slot for it.

3. I think you have better chances of matching a properly named small file than a big movie, because from the old DC days there are movies around with the same name, the same size and 10 different TTHs.

4. Also, not uploading the TTHs for the smaller files doesn't make it impossible to search for them via TTH.

5. I never complained about hashing speed. Was that ever a topic of this thread?
I think you just read the headline and didn't really read the thread!
So please read before you post! I am not a newb who complains about hashing.
I just want somewhat smaller filelists without too negative side effects.

DeathStalker77
Posts: 38
Joined: 2003-11-15 22:10
Location: Hell

Understood =)

Post by DeathStalker77 » 2004-12-07 14:01

Ok, maybe I did go a bit overboard in my response (been "battling" the hashing issue in too many hubs) - I know you didn't mention the hashing time, but I thought I'd throw that in for good measure :lol:

I congratulate you on the non-TTH ban - I wish more hubs would implement them that quickly.

I understand more clearly your issue with the small files now - and with that understanding, I agree: if the filesize falls below the threshold for mini-slots, then indeed, it should not be hashed.

As I understand it, the mini-slot size for DC++ is 64 KiB (fulDC allows you to adjust that to whatever size you wish). If this were adjusted to maybe 100 KiB, I think that would allow the automatic download of a fairly good percentage of image/doc/html files.

Personally, I've not done a breakdown of filelists to see how much of the size is taken by the hash signatures.

In retrospect, I think a better, more appropriate, Subject could have been chosen :wink:

I think this is an issue that should definitely be posted to Bugzilla.

Respectfully,

--- DeathStalker

°^Quicksilver^°
Posts: 18
Joined: 2004-10-05 08:26
Location: Want my ip? just ask!

Post by °^Quicksilver^° » 2004-12-07 17:01

1. I would have introduced this non-TTH client ban earlier, but I wanted a stable alternative for my users. Now that 0.667 is really stable, also for passive users, I think it is time for every hub to do this.

2. Basically, I was hoping for some experts here who know the protocol very well.
I wanted their opinion on whether it is possible to use two filelists, one with TTHs and one without TTHs for smaller files.
For small files it would already be great not to upload the TTHs each time.
But it would be even better to apply this to somewhat bigger files (how about everything below 250 KiB?) and upload the TTHs of these files on request, to keep them searchable via TTH.
I don't know if this is possible without protocol changes.
Or can a TTH root be requested from a user for a file?

Leaving TTHs out only for files <64 KiB would help a bit, but most JPEGs are between 50 KiB and 300 KiB, so their TTHs also enlarge filelists a lot. That's what I want to discuss here before sending a request to Bugzilla.

ivulfusbar
Posts: 506
Joined: 2003-01-03 07:33

Post by ivulfusbar » 2004-12-08 04:03

I must say this thread is stupid.

1) DC++ does support a filelist without TTHs; if you want it, download it.

2) DC++ does support a filelist with TTHs; if you want it, download it.

What is your problem? Almost everything you request is available.

Should DC++ also support something in between? Hell no.
Everyone is supposed to download from the hubs, - I don't know why, but I never do anymore.

°^Quicksilver^°
Posts: 18
Joined: 2004-10-05 08:26
Location: Want my ip? just ask!

Post by °^Quicksilver^° » 2004-12-08 13:22

OK, I never thought about sending the old raw command to get a filelist. Is that what you are talking about?
OK, nice idea, but no benefit for the average user. I also don't know whether DC++ will automatically get the TTH from the other person if I downloaded the list without TTHs, so that it can search for alternates.

Now imagine: the average DC user doesn't even know what TTH is,
and even more users (me included) don't know how to download a filelist without TTHs from someone!
[Please don't tell me anything about educating them; there is a limit to education, and a file-sharing program should be easy to use, IMHO.]

So this thread was meant for everyone's benefit, not just for the experts. And yes, I think something in between would be nice, if you want to call it that: an easy middle way.
You might think this thread is stupid; I would think it more stupid to post to Bugzilla before discussing it. Also, there are stupider threads around.

ivulfusbar
Posts: 506
Joined: 2003-01-03 07:33

Post by ivulfusbar » 2004-12-09 01:43

Maybe not everyone knows how to, but... if you want it enough, you will.

It's better if file integrity is preserved across file transfers than the opposite. This is of more importance to the community as a whole. TTH helps with this.

°^Quicksilver^°
Posts: 18
Joined: 2004-10-05 08:26
Location: Want my ip? just ask!

Post by °^Quicksilver^° » 2004-12-09 12:47

Right, file integrity is important. But file integrity can also be assured via TTH even if the TTH is not in the filelist, by sending it for the smaller files on request.
But I am beginning to see another negative side effect: sending requests for TTHs that are not in the filelist would hurt the hub's performance (upload), which is even more important to keep low.

Have you any reasons against no TTH just for files <64 KiB? For this size, TCP error protection seems to be good enough to ensure integrity without TTH, or not?

ivulfusbar
Posts: 506
Joined: 2003-01-03 07:33

Post by ivulfusbar » 2004-12-09 14:31

Of course I have: the 64 KiB mini-slot limit is not part of the protocol and should not become part of the protocol.

there-is-no-protocol-ly'ers ;))

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-12-12 16:45

°^Quicksilver^° wrote: Have you any reasons against no TTH just for files <64 KiB? For this size, TCP error protection seems to be good enough to ensure integrity without TTH, or not?
Doesn't the checksum only cover the TCP header?

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-12-12 17:05

RFC 793 wrote: The checksum field is the 16 bit one's complement of the one's
complement sum of all 16 bit words in the header and text. If a
segment contains an odd number of header and text octets to be
checksummed, the last octet is padded on the right with zeros to
form a 16 bit word for checksum purposes. The pad is not
transmitted as part of the segment. While computing the checksum,
the checksum field itself is replaced with zeros.

The checksum also covers a 96 bit pseudo header conceptually
prefixed to the TCP header. This pseudo header contains the Source
Address, the Destination Address, the Protocol, and TCP length.
This gives the TCP protection against misrouted segments. This
information is carried in the Internet Protocol and is transferred
across the TCP/Network interface in the arguments or results of
calls by the TCP on the IP.
Cliff's notes: no, Garg
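
For anyone who wants to see the quoted rule in action, here is a minimal sketch of the checksum as RFC 793 describes it: a 16-bit one's complement sum over the 96-bit pseudo header, the TCP header and the data ("text"). The function names, addresses and header field values are made up for illustration; this is not taken from any real TCP stack.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum; an odd trailing octet is padded with zero."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    return ~ones_complement_sum(data) & 0xFFFF


def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum over pseudo header + segment (checksum field already zeroed)."""
    # Pseudo header: source, destination, zero byte, protocol 6 (TCP), TCP length.
    pseudo_header = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    return internet_checksum(pseudo_header + segment)


SRC = bytes([192, 0, 2, 1])   # documentation addresses, illustration only
DST = bytes([192, 0, 2, 2])

# 20-byte TCP header with arbitrary example values and a zeroed checksum field.
header = struct.pack("!HHIIBBHHH", 1412, 40000, 1, 1, 5 << 4, 0x18, 65535, 0, 0)
payload = b"small file contents!"

# One damaged payload byte.
flipped = bytearray(payload)
flipped[7] ^= 0x01
flipped = bytes(flipped)

# The first two 16-bit words of the payload exchanged.
swapped = payload[2:4] + payload[0:2] + payload[4:]

print(tcp_checksum(SRC, DST, header + payload) != tcp_checksum(SRC, DST, header + flipped))
print(tcp_checksum(SRC, DST, header + payload) == tcp_checksum(SRC, DST, header + swapped))
```

The first comparison shows that the payload is covered: damaging one data byte changes the checksum. The second shows a limit of a plain 16-bit sum: exchanging two aligned 16-bit words leaves the checksum unchanged, so it is a much weaker integrity check than a cryptographic hash.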

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-12-13 12:09

PseudonympH wrote:Cliff's notes: no, Garg
Thanks. The RFC seems to have left "text" undefined, though it seems that if it's not the header, it would naturally mean the data payload.

ZoneAlarm must have been corrupting packets and touching up the checksum, from all of the corruption reports we've experienced.

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-12-13 15:18

It's also possible it intercepts the packets right after the kernel has stripped the IP and TCP headers but before it's passed on to the application.
