TTH Hash to ignore metadata (particularly id3)

thattommyhall
Posts: 2
Joined: 2006-03-19 16:31

Post by thattommyhall » 2006-03-19 16:35

Can't we make it so that the ID3 bits are ignored (reading them as 0 should work)?
Then MP3 files that differ only in ID3 tags will match (as they should; they really are the same file).
We could make sure that the ID3 data is the first to be downloaded, and just grab it from one person.

what are the arguments against ?

ivulfusbar
Posts: 506
Joined: 2003-01-03 07:33

Post by ivulfusbar » 2006-03-19 16:56

Consistency with CRC32/SFV. Files with different ID3 tags will generally have different CRC32 checksums. And since MP3s are often associated with SFV files, issues will occur. In fact, DC++ is moving in a direction where files that don't pass an SFV check (if there is an associated SFV file) will not be shared.
Everyone is supposed to download from the hubs, - I don´t know why, but I never do anymore.
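
The CRC32/SFV point above can be illustrated with a small sketch. It assumes Python and its standard `zlib.crc32`; the "audio" bytes and the `id3v1` helper are hypothetical stand-ins rather than real MP3 data, and only the title field of the 128-byte ID3v1 trailer is filled in.

```python
import zlib

# Hypothetical stand-in for the audio frames; real MP3 data is much longer.
audio = b"\xff\xfb\x90\x00" + bytes(64)

def id3v1(title: bytes) -> bytes:
    """Build a minimal 128-byte ID3v1 trailer with only the title field set."""
    # Layout: "TAG" (3) + title (30) + remaining fields (95) = 128 bytes.
    return b"TAG" + title.ljust(30, b"\x00") + bytes(95)

# Same audio, different tag text.
file_a = audio + id3v1(b"Track One")
file_b = audio + id3v1(b"Track 1")

crc_a = zlib.crc32(file_a)
crc_b = zlib.crc32(file_b)

# The audio is byte-identical, yet the whole-file checksums differ,
# so an SFV entry written for one copy would reject the other.
print(f"{crc_a:08X} {crc_b:08X} differ={crc_a != crc_b}")
```

Any hash that skipped the tag would therefore disagree with an existing .sfv file for at least one of the two copies.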

kulmegil_

Post by kulmegil_ » 2006-03-20 06:19

How about adding support for md5 checksums also?

thattommyhall
Posts: 2
Joined: 2006-03-19 16:31

re: why not use md5

Post by thattommyhall » 2006-03-20 13:24

Probably more CPU cycles, if the TTH root is equivalent or similar to SFV as regards complexity/security.
Still, if the small .sfv files are downloaded from the same person that the tiny ID3 tag part of each MP3 comes from, compatibility would be ensured.
Is that a reasonable solution? I can see a lot of benefit, particularly when perhaps 50 files will have the same size and name but not the same TTH hash; some of them must be identical modulo metadata.

Mmmmm, seems TTH hashes are quite secure cryptographically (not a massive issue for me in this context).

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Re: TTH Hash to ignore metadata (particularly id3)

Post by GargoyleMT » 2006-03-26 17:11

thattommyhall wrote: Can't we make it so that the ID3 bits are ignored (reading them as 0 should work)?

That solution wouldn't work too well with ID3v2 tags, which have variable length. The Gnutella network has done something like this, I think they call it "audio-only-sha1" (if you wanted to look up the discussion on the_gdf [their mailing list]). I'm not sure it's a good idea, and would certainly be a pain to implement (do you know how many tag formats there are out there, or how confusing ID3v2 is?). It's a noble goal, but I'm not sure it's practical.
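
To make the variable-length problem concrete, here is a minimal sketch in Python of locating the audio region of an MP3. It assumes only the two most common cases: an ID3v2 tag at the start (10-byte header with a 4-byte "synchsafe" size) and a fixed 128-byte ID3v1 trailer. Other formats (APEv2, Lyrics3, appended ID3v2 tags) are deliberately not handled, which is part of why this is a pain to do properly. The names `audio_range` and `make_id3v2` are hypothetical, not anything from DC++.

```python
from typing import Tuple

def synchsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte ID3v2 'synchsafe' integer (7 useful bits per byte)."""
    return ((b[0] & 0x7F) << 21) | ((b[1] & 0x7F) << 14) | ((b[2] & 0x7F) << 7) | (b[3] & 0x7F)

def audio_range(data: bytes) -> Tuple[int, int]:
    """Return (start, end) of the bytes left after skipping ID3v2 and ID3v1."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4 synchsafe size bytes.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = synchsafe_to_int(data[6:10])
        footer = 10 if data[5] & 0x10 else 0  # ID3v2.4 "footer present" flag
        start = min(10 + size + footer, end)
    # ID3v1: fixed 128-byte trailer beginning with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return start, end

def make_id3v2(body_len: int) -> bytes:
    """Hypothetical helper: a minimal ID3v2.3 tag with a zeroed body."""
    size = bytes([(body_len >> 21) & 0x7F, (body_len >> 14) & 0x7F,
                  (body_len >> 7) & 0x7F, body_len & 0x7F])
    return b"ID3" + b"\x03\x00" + b"\x00" + size + bytes(body_len)

# Two copies of the same (stand-in) audio with tags of different lengths.
audio = b"\xff\xfb\x90\x00" + bytes(48)
short_tagged = make_id3v2(20) + audio
long_tagged = make_id3v2(200) + audio + b"TAG" + bytes(125)

s1, e1 = audio_range(short_tagged)
s2, e2 = audio_range(long_tagged)
print((s1, e1), (s2, e2), short_tagged[s1:e1] == long_tagged[s2:e2])
```

The audio starts at a different file offset in each copy, so simply reading a fixed region "as 0" cannot line the two files up; the tag has to be parsed first.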

FarCry
Programmer
Posts: 34
Joined: 2003-05-01 10:49

Post by FarCry » 2006-03-26 18:25

A plugin interface would not necessarily be much easier to add, but more practical for that kind of functionality. 3rd-party plugins could then add meta-information to the file list, like an "audio hash" in this case, keeping the basic raw file hash intact, but adding the ability to search supporting clients in a more accurate manner.

ivulfusbar
Posts: 506
Joined: 2003-01-03 07:33

Post by ivulfusbar » 2006-03-27 08:17

<offtopic> MPEG-7 is a better solution than ID3 </offtopic>

3josh
Account Disabled Due to Policy
Posts: 3
Joined: 2006-09-21 18:13

Another way to deal with MP3 meta data and integrate sfvs

Post by 3josh » 2006-09-21 18:34

There's another 'standard' for dealing with crc's on MP3's addressing just this problem. I'm not sure why it hasn't become popular as this is a major problem with sharing MP3's imho.

It's .sv files (sound verification). Some sfv programs have started supporting it as well (either as a different file, or in .sfv files - embedded in the comments I think). There's some opensource code that computes both in one scan in this program:
http://mp3bookhelper.sourceforge.net/he ... onSFV.html

As for DC++'s sfv integration - that seems pretty easy if this is adapted. Simply have any .sv checksums (or .sv embedded in .sfv) take priority over the sfv results.

Also - I don't know much about how TTH's work - but if something like this was done, maybe a new 'code' might be in order (TTM's? :) so that clients can easily know if the feature is supported on both sides - and increase the hash space.

Somewhat off topic - but it might also be interesting to do something similar for files inside archives (eg .tar .zip). In the MP3 audiobook hubs, there are religious wars over if the book should be shared as a tar or as individual MP3's. This could handily solve this issue at the cost of some major cpu resources (obviously would need to be an option).

-3j

Pasqualle
Posts: 21
Joined: 2006-05-06 14:43

Post by Pasqualle » 2006-09-22 12:52

I really like the idea of calculating the TTH without MP3 tags, but the variable tag length is really a big problem, as the same part of the song does not start at the same position in the file. It would require a different sharing approach, like:
now: give me file block starting at position 15
new sharing implementation: give me part of the mp3 file starting at 30 seconds..

But I think such sharing will not be implemented..

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2006-09-29 19:27

Pasqualle wrote: I really like the idea of calculating the TTH without MP3 tags, but the variable tag length is really a big problem, as the same part of the song does not start at the same position in the file. It would require a different sharing approach, like:
now: give me file block starting at position 15
new sharing implementation: give me part of the mp3 file starting at 30 seconds..

But I think such sharing will not be implemented..

That's what the "audio-only-sha1" was all about. The sharing application had to understand the tag, and calculated a SHA1 hash over only the audio data. If someone requested that file, it would figure out the correct offsets, taking the tag into account (and thus wouldn't ever send the tag, since it's not part of the "file").
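
A rough sketch of that idea, in Python with the standard `hashlib`. This is not the Gnutella implementation; the function names are hypothetical, and the audio boundaries are assumed to be already known for each local copy (finding them is the hard, format-specific part).

```python
import hashlib

def audio_only_sha1(data: bytes, audio_start: int, audio_end: int) -> str:
    """SHA-1 over the audio region only; tag bytes never enter the hash."""
    return hashlib.sha1(data[audio_start:audio_end]).hexdigest()

def read_audio_block(data: bytes, audio_start: int, audio_end: int,
                     offset: int, length: int) -> bytes:
    """Serve a block addressed relative to the audio, not to the file.

    The request offset is shifted by this copy's own tag length and the
    result is clamped so that trailing tag bytes are never sent.
    """
    begin = min(audio_start + offset, audio_end)
    stop = min(begin + length, audio_end)
    return data[begin:stop]

# Stand-in audio shared by two peers whose tags differ in length.
audio = bytes(range(256)) * 4
peer_a = bytes(300) + audio                # 300 bytes of leading tag data
peer_b = bytes(45) + audio + bytes(128)    # 45 leading + 128 trailing tag bytes

a_range = (300, 300 + len(audio))
b_range = (45, 45 + len(audio))

# Both peers advertise the same hash despite different file contents.
print(audio_only_sha1(peer_a, *a_range) == audio_only_sha1(peer_b, *b_range))

# "Block starting at position 15" now means the same bytes on both peers.
print(read_audio_block(peer_a, *a_range, 15, 64) == read_audio_block(peer_b, *b_range, 15, 64))
```

With offsets defined relative to the audio, a downloader can fetch different blocks from peers whose tags differ, at the cost of every client having to agree on exactly where the audio begins and ends.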

Not sure it's worth it. It may make a limited amount of sense to share the file both with and without metadata. However, that doubles the number of autosearches done for queued MP3 files... which I think is a pretty weighty consideration.
