Autosearching by TTH only in .403

Gasp · Post by **Gasp** » 2004-07-09 15:14

Don't you think there should be an option to ignore hashes when autosearching? (I'm not sure how it's done in .401.) On most of the hubs I'm on, many of the files in people's shares do not have hashes at all and, worse, even when they do, the number of different hashes for the "same" file is dramatic.
Therefore, I'd rather risk a little corruption, especially as most common file formats have error correction built in, than limit my sources by 70% or more. In the year I've been using DC, it has happened only once or twice that I downloaded a corrupted file. It's going to take some time for hashes to spread widely enough that one could rely on them alone.
However, I understand that if users are not forced to rely on them now, the process of "unifying" the hashes across most copies is going to take longer.

I hope I didn't miss something here, but I couldn't find any such option.

TheParanoidOne · Post by **TheParanoidOne** » 2004-07-09 15:21

Gasp wrote:on most of the hubs I'm on many files do not have hashes at all

In this case, DC++ will use the older method of matching by file name and size. Any hashes encountered will be ignored (as far as I can remember).
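One reading of the behaviour described above, as a minimal sketch. This is not DC++ source code: `FileEntry` and `is_source_for` are hypothetical names, and the case-insensitive name comparison is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileEntry:
    """A queued file or a search result (hypothetical structure)."""
    name: str
    size: int
    tth: Optional[str] = None  # root hash as text, or None if the file is not hashed


def is_source_for(queued: FileEntry, result: FileEntry) -> bool:
    """Decide whether a search result may be added as a source for a queued file."""
    if queued.tth is not None:
        # Hashed queue item: only a result with the identical root hash qualifies.
        return result.tth == queued.tth
    # Unhashed queue item: fall back to the older name-and-size match.
    # Any hash on the result is ignored in this branch.
    return (result.name.lower() == queued.name.lower()
            and result.size == queued.size)
```

With this rule, a hashed queue item never picks up an unhashed or differently hashed copy, which is the behaviour the original poster is asking to make optional.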

Gasp · Post by **Gasp** » 2004-07-09 17:34

TheParanoidOne wrote:
Gasp wrote:on most of the hubs I'm on many files do not have hashes at all
In this case, DC++ will use the older method of matching by file name and size. Any hashes encountered will be ignored (as far as I can remember).

I supposed so but the issue is still there considering how much the hashes vary today.

Post by **GargoyleMT** » 2004-07-09 19:59

Gasp wrote:I supposed so but the issue is still there considering how much the hashes vary today.

They don't vary, there are a lot of corrupted versions, each with their own TTH.

This is one of the benefits of hashing, to see the corruption inherent in the system.

Gasp · Post by **Gasp** » 2004-07-10 05:02

GargoyleMT wrote:
Gasp wrote:I supposed so but the issue is still there considering how much the hashes vary today.
They don't vary, there are a lot of corrupted versions, each with their own TTH.

This is one of the benefits of hashing, to see the corruption inherent in the system.

Of course they are corrupted versions. But as I said

I'd rather risk a little corruption, especially as most common file formats have error correction built in, than limit my sources by 70% or more.

That's the point. A difference in TTH does not necessarily mean that the file is totally screwed up; usually the corruption is minor. In some cases matching only by TTH would sabotage autosearching altogether. I've just searched for a quite common avi and got about 15 sources (all the same size etc.). ALL of them with different hashes or no hashes at all.
I want an option for all these people who can live with those usually unnoticeable errors.

PseudonympH · Post by **PseudonympH** » 2004-07-10 22:56

Yes, but there are those of us (me) that want to prevent the 15 hashes for the "same" file situation. Hence, you get it from one guy and keep the hash consistent, so we have 14 singles and 2 of the same. Then the next person gets the version that 2 of you have because more sources is better, and we now have only one popular copy out there instead of many different ones.

cologic · Post by **cologic** » 2004-07-10 23:39

I want an option for all these people who can live with those usually unnoticeable errors.

Not only don't I want that option for myself, but I don't want anyone else to have it either. I value file integrity, as it increases the health of the DC network. (Well, more precisely, I don't want others to reshare such files, but since by the time it's in a potential share preventing such is unfeasible, I'll settle for the TTH checking during queuing.)

average joe · Post by **average joe** » 2004-07-11 08:39

TTH is awesome, only a fool would wanna turn that off. I hope it cleans up the shares after a while. Now if people left releases intact and did an SFV check on their downloads I'd be really happy.

Only in a perfect world, right?

Post by **GargoyleMT** » 2004-07-11 19:52

Gasp wrote:ALL of them with different hashes or no hashes at all.
I want an option for all these people who can live with those usually unnoticeable errors.

Eventually, DC++ may have the option to add users whose copy has the exact size but a different hash as sources for a file, if it already has the TTH leaves to verify them. In this case, as soon as such a source sent data that didn't match the TTH leaves, they'd be removed (they can't really be a source unless they have the same file).

If this happens at all, it is a long way off.

What you want is to go back to pre-0.307, where there are no hashes. This is not the direction that the DC network (or at least DC++ and ADC) is headed.
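The leaf-verification idea in the post above can be sketched as follows. This is only an illustration, not DC++ code: SHA-256 stands in for Tiger (Python's `hashlib` has no Tiger), the 64 KiB block size is an assumption, and `leaves_of`, `root_of`, and `block_matches` are hypothetical names. The `0x00`/`0x01` prefixes follow the THEX tree-hash convention.

```python
import hashlib
from typing import List

BLOCK_SIZE = 64 * 1024  # assumed leaf granularity for this sketch


def leaf_hash(block: bytes) -> bytes:
    # Leaf nodes hash a 0x00 prefix followed by the data (SHA-256 as a Tiger stand-in).
    return hashlib.sha256(b"\x00" + block).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Internal nodes hash a 0x01 prefix followed by both child hashes.
    return hashlib.sha256(b"\x01" + left + right).digest()


def leaves_of(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Hash the data block by block; an empty file gets a single empty leaf."""
    if not data:
        return [leaf_hash(b"")]
    return [leaf_hash(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def root_of(leaves: List[bytes]) -> bytes:
    """Combine leaves pairwise until one root hash remains."""
    level = list(leaves)
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired node is promoted unchanged
        level = nxt
    return level[0]


def block_matches(leaves: List[bytes], index: int, block: bytes) -> bool:
    """Check one received block against the stored leaf for that position."""
    return leaf_hash(block) == leaves[index]
```

A client holding the leaves for the file it wants could call `block_matches` on every block a name-and-size source sends and drop that source on the first mismatch. A single flipped byte changes both the affected leaf and the root, which is why every corrupted copy shows up with its own TTH.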

bode_jr · Post by **bode_jr** » 2004-08-01 18:10

I am inexperienced in the DC use but I am facing a problem that seems to be of definition of the program:
if an archive that you have partially copied will have the using rejection of hub you does not obtain to continue the copy of this archive, exactly that it has located it in another one hub and another user.
In my opinion this is not good.

Todi · Post by **Todi** » 2004-08-02 01:14

Could you rephrase that?

Post by **GargoyleMT** » 2004-08-02 10:58

bode_jr wrote:if an archive that you have partially copied will have the using rejection of hub you does not obtain to continue the copy of this archive, exactly that it has located it in another one hub and another user.
In my opinion this is not good.

Ok, here's my translation of what you said:

If I have a partially downloaded archive, and no user on the hub has the exact same file so that I might continue it, my only option is to find another user with the exact file in another hub. In my opinion this is not good.

How did I do?

cyberal · Post by **cyberal** » 2004-08-03 02:47

average joe wrote:Now if people left releases intact and did an SFV check on their downloads I'd be really happy.

Only in a perfect world, right?

me likes you

Guitarm · Post by **Guitarm** » 2004-08-03 03:38

GargoyleMT wrote:
bode_jr wrote:if an archive that you have partially copied will have the using rejection of hub you does not obtain to continue the copy of this archive, exactly that it has located it in another one hub and another user.
In my opinion this is not good.
Ok, here's my translation of what you said:

If I have a partially downloaded archive, and no user on the hub has the exact same file so that I might continue it, my only option is to find another user with the exact file in another hub. In my opinion this is not good.
How did I do?

Hehe, I hope you did good because I actually understood what you translated it to, let's see what the answer might be....

cyberal · Post by **cyberal** » 2004-08-03 06:02

The answer is... yes. This is the way it has to be at this time. The situation will improve later on, when DC++ can match separate TTH leaves and piece together parts from different incomplete files.

Post by **GargoyleMT** » 2004-08-03 07:55

cyberal wrote:later on, when DC++ can match separate TTH leaves and piece together parts from different incomplete files.

DC++ automatically matching leaves? An average ~147 MB file has nearly 600 leaves, but many more intermediate hashes. Searching for non-root hashes and matching them is not feasible.

I've talked about adding non-matching files in the past, but I've always intended it to be a solely manual undertaking.
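For a sense of scale, here is the arithmetic behind the leaf count as a small sketch. The thread does not state a block size: the 256 KiB figure is an assumption chosen because it reproduces "nearly 600", and the 1,024-byte figure is the base segment size used by THEX-style tree hashes. `leaf_count` is a hypothetical helper.

```python
MIB = 1024 * 1024


def leaf_count(file_size: int, block_size: int) -> int:
    """Number of leaves needed to cover file_size bytes (at least one)."""
    return max(1, -(-file_size // block_size))  # ceiling division


file_size = 147 * MIB

# Leaves at the assumed 256 KiB granularity.
stored_leaves = leaf_count(file_size, 256 * 1024)

# Leaves if the tree were taken all the way down to 1,024-byte segments.
base_segments = leaf_count(file_size, 1024)

print(stored_leaves, base_segments)
```

Under these assumptions the file has 588 leaves at 256 KiB and 150,528 segments at 1,024 bytes, so a search that had to match anything other than the single root hash would multiply the number of candidate hashes per file by hundreds or more.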

cyberal · Post by **cyberal** » 2004-08-03 09:18

What I meant was, add sources based upon the old system with filename and size.. and then download the parts where the TTH leaves match with the partially downloaded file..

Post by **GargoyleMT** » 2004-08-03 20:02

cyberal wrote:What I meant was, add sources based upon the old system with filename and size.. and then download the parts where the TTH leaves match with the partially downloaded file..

If it's manual, that's fine. Otherwise, you've just doubled the amount of autosearches. Or halved the effective number of searches (if they're spread out identically).

311Sam · Post by **311Sam** » 2004-08-17 20:02

The problem I have is that if you change a song title in an MP3, it changes the hash...

Xan1977 · Post by **Xan1977** » 2004-08-17 20:52

Then don't change the song title...

Or, if someone downloads an .MP3 that has a completely incorrect ID3 tag, then change it, hash it with the proper tag and spread it around with its new hash. That way, the mislabeled file will eventually drop out of circulation.

I don't download music, so I don't know if they're still using ID3 or if some other header information scheme has replaced it, but the concept is still valid.

Post by **GargoyleMT** » 2004-08-20 11:53

Yes, if you change the tag, the hash changes. The Gnutella network has experimented a bit with a possible solution to this: they have an audio-only SHA1 (one of their hash types). They strip out all tag data, and only hash the valid MPEG frames in the stream.

This *might* get into DC++ eventually, with ADC. It is a pretty ugly hack, and though it can be done for any audio file, only mp3s are in any way likely to get the treatment.
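A rough sketch of the audio-only idea, under loud assumptions: it only strips a leading ID3v2 tag and a trailing ID3v1 tag and hashes what is left, whereas the approach described above validates the MPEG frames themselves. `strip_id3` and `audio_only_sha1` are hypothetical names, not part of DC++ or any Gnutella client.

```python
import hashlib


def strip_id3(data: bytes) -> bytes:
    """Return the bytes between a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)

    # ID3v2 header: "ID3", two version bytes, one flags byte,
    # then a 4-byte synchsafe size (7 significant bits per byte).
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # footer flag: ten more bytes follow the tag
            start += 10

    # ID3v1 tag: the last 128 bytes of the file, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_only_sha1(data: bytes) -> str:
    """SHA-1 over the file contents with ID3 tags removed."""
    return hashlib.sha1(strip_id3(data)).hexdigest()
```

Two copies of the same audio with different titles would then share an `audio_only_sha1` value even though their whole-file hashes differ. Other tag formats, padding, and junk between frames are not handled here, which is part of why this kind of scheme ends up being an ugly hack.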
