TTH Hash to ignore metadata (particularly id3)
-
- Posts: 2
- Joined: 2006-03-19 16:31
TTH Hash to ignore metadata (particularly id3)
Can't we make it so that the ID3 bits are ignored (reading them as 0 should work)?
Then MP3 files that differ only in their ID3 tags would match (as they should; they really are the same file).
We could make sure the ID3 data is the first thing to be downloaded, and just grab it from one person.
What are the arguments against?
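To make the "read as 0" idea concrete, here is a minimal sketch covering only the fixed-size ID3v1 trailer (128 bytes at the end of the file, starting with "TAG"). The function name `hash_ignoring_id3v1` is made up for illustration, and SHA-1 stands in for the Tiger-based TTH, which is not in Python's standard library. This is not how DC++ hashes files.

```python
import hashlib

ID3V1_SIZE = 128  # an ID3v1 tag is a fixed 128-byte trailer starting with b"TAG"


def hash_ignoring_id3v1(data: bytes) -> str:
    """Hash `data`, reading a trailing ID3v1 tag (if present) as zero bytes.

    Hypothetical helper: SHA-1 is used as a stand-in for TTH.
    """
    has_tag = (
        len(data) >= ID3V1_SIZE
        and data[-ID3V1_SIZE:-ID3V1_SIZE + 3] == b"TAG"
    )
    if has_tag:
        # Keep the length unchanged, but blank out the tag contents.
        data = data[:-ID3V1_SIZE] + bytes(ID3V1_SIZE)
    return hashlib.sha1(data).hexdigest()


# Two copies of the same (fake) audio with different ID3v1 tags.
audio = b"\xff\xfb\x90\x00" * 64
tag_a = b"TAG" + b"Artist A".ljust(125, b"\x00")
tag_b = b"TAG" + b"Artist B".ljust(125, b"\x00")

print(hash_ignoring_id3v1(audio + tag_a) == hash_ignoring_id3v1(audio + tag_b))
```

Note that a copy with no tag at all still hashes differently here, because zeroing keeps the 128 bytes in place and the file lengths differ.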
-
- Posts: 506
- Joined: 2003-01-03 07:33
Consistency with CRC32/SFV. Files with different ID3 tags will generally have different CRC32 checksums. And since MP3s are often accompanied by an SFV file, issues will occur. In fact, DC++ is moving in a direction where files that don't pass an SFV check (if there is an associated SFV file) will not be shared.
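A small illustration of the CRC32 point, using Python's `zlib.crc32` (the checksum algorithm that SFV files record). The byte strings are fake stand-ins for audio and tags, and `sfv_crc32` is just a name chosen for this sketch.

```python
import zlib


def sfv_crc32(data: bytes) -> str:
    """Return the CRC32 of `data` as 8 uppercase hex digits, as SFV files list it."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08X}"


audio = b"\xff\xfb\x90\x00" * 64                    # fake audio payload
tag_a = b"TAG" + b"Artist A".ljust(125, b"\x00")    # ID3v1-style trailer
tag_b = b"TAG" + b"Artist B".ljust(125, b"\x00")    # same trailer, one byte changed

original = audio + tag_a
retagged = audio + tag_b

# Same audio, different tag: the whole-file CRC32 no longer matches,
# so the retagged copy would fail a check against the original .sfv.
print(sfv_crc32(original), sfv_crc32(retagged))
```

So a hash that ignores tags and an SFV check that covers the whole file would disagree about whether the two copies are "the same".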
Everyone is supposed to download from the hubs, - I don´t know why, but I never do anymore.
-
- Posts: 2
- Joined: 2006-03-19 16:31
re: why not use md5
Probably more CPU cycles, if the TTH root is equivalent or similar to SFV as regards complexity/security.
Still, if the small .sfv files are downloaded from the same person that the tiny ID3 tag part of each MP3 comes from, compatibility would be ensured.
Is that a reasonable solution? I can see a lot of benefit, particularly when perhaps 50 files have the same size and name but not the same TTH; some of them must be identical modulo metadata.
Mmmmm, it seems TTH hashes are quite secure cryptographically (not a massive issue for me in this context).
-
- DC++ Contributor
- Posts: 3212
- Joined: 2003-01-07 21:46
- Location: .pa.us
Re: TTH Hash to ignore metadata (particularly id3)
thattommyhall wrote:Can't we make it so that the ID3 bits are ignored (reading them as 0 should work)?
That solution wouldn't work too well with ID3v2 tags, which have variable length. The Gnutella network has done something like this; I think they call it "audio-only-sha1" (if you want to look up the discussion on the_gdf [their mailing list]). I'm not sure it's a good idea, and it would certainly be a pain to implement (do you know how many tag formats there are out there, or how confusing ID3v2 is?). It's a noble goal, but I'm not sure it's practical.
A plugin interface would not necessarily be much easier to add, but more practical for that kind of functionality. 3rd-party plugins could then add meta-information to the file list, like an "audio hash" in this case, keeping the basic raw file hash intact, but adding the ability to search supporting clients in a more accurate manner.
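To show why ID3v2 is harder than a fixed-size trailer, here is a rough sketch of reading the tag length from the 10-byte ID3v2 header. The size is stored as a 4-byte "syncsafe" integer (7 bits per byte) and does not include the header itself. The function name `id3v2_tag_length` is hypothetical, and the sketch ignores other tag formats (APE, Lyrics3, and so on) as well as tags that are not at the start of the file.

```python
def id3v2_tag_length(data: bytes) -> int:
    """Return the total byte length of a leading ID3v2 tag, or 0 if none is found.

    Hypothetical helper: only looks at a tag at the very start of `data`.
    """
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    flags = data[5]
    size_bytes = data[6:10]
    if any(b & 0x80 for b in size_bytes):
        return 0  # high bit set: not a valid syncsafe size
    size = 0
    for b in size_bytes:
        size = (size << 7) | b  # 7 payload bits per byte
    footer = 10 if flags & 0x10 else 0  # optional 10-byte footer (ID3v2.4)
    return 10 + size + footer


# Header declaring a 257-byte tag body: 10 + 257 = 267 bytes in total.
header = b"ID3\x03\x00\x00" + bytes([0, 0, 2, 1])
print(id3v2_tag_length(header))  # 267
```

Because this length differs from file to file, there is no fixed byte range that can simply be "read as 0".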
-
- Posts: 506
- Joined: 2003-01-03 07:33
Another way to deal with MP3 meta data and integrate sfvs
There's another 'standard' for dealing with CRCs on MP3s that addresses just this problem. I'm not sure why it hasn't become popular, as this is a major problem with sharing MP3s, IMHO.
It's .sv files (sound verification). Some SFV programs have started supporting it as well (either as a separate file, or inside .sfv files, embedded in the comments I think). There's some open-source code in this program that computes both in one scan:
http://mp3bookhelper.sourceforge.net/he ... onSFV.html
As for DC++'s SFV integration, that seems pretty easy if this is adopted: simply have any .sv checksums (or .sv entries embedded in an .sfv) take priority over the SFV results.
Also, I don't know much about how TTHs work, but if something like this were done, maybe a new 'code' might be in order (TTMs? :) so that clients can easily tell whether the feature is supported on both sides, and to increase the hash space.
Somewhat off topic, but it might also be interesting to do something similar for files inside archives (e.g. .tar, .zip). In the MP3 audiobook hubs, there are religious wars over whether a book should be shared as a tar or as individual MP3s. This could handily solve that issue at the cost of some major CPU resources (so it would obviously need to be an option).
-3j
I really like the idea of calculating the TTH without the MP3 tags. But the variable tag length is a real problem, as the same part of the song does not start at the same position in each file. It would require a different sharing approach, like:
now: give me the file block starting at position 15
new sharing implementation: give me the part of the MP3 file starting at 30 seconds..
But I think such sharing will not be implemented.
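The position problem can be seen in a toy tree hash. The sketch below follows the THEX-style layout that TTH uses (1024-byte leaves, a 0x00 prefix for leaf hashes and 0x01 for internal nodes, an unpaired node promoted unchanged), but with SHA-1 standing in for Tiger, so the digests are not real TTH values. `tree_root` is a name chosen for this example.

```python
import hashlib

LEAF_SIZE = 1024  # leaf block size used by THEX/TTH


def tree_root(data: bytes, hash_fn=hashlib.sha1) -> bytes:
    """Merkle root over 1024-byte blocks (SHA-1 stand-in, not real TTH)."""
    blocks = [data[i:i + LEAF_SIZE] for i in range(0, len(data), LEAF_SIZE)] or [b""]
    level = [hash_fn(b"\x00" + block).digest() for block in blocks]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            parents.append(hash_fn(b"\x01" + level[i] + level[i + 1]).digest())
        if len(level) % 2:
            parents.append(level[-1])  # unpaired node moves up unchanged
        level = parents
    return level[0]


# Identical fake audio behind leading tags of 10 and 11 bytes.
audio = b"".join(i.to_bytes(4, "big") for i in range(1024))
file_a = bytes(10) + audio
file_b = bytes(11) + audio

# Every block boundary falls one byte apart, so the roots do not match.
print(tree_root(file_a) == tree_root(file_b))
```

A one-byte difference in tag length shifts every block, which is why block requests by file position cannot be shared between the two copies.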
-
- DC++ Contributor
- Posts: 3212
- Joined: 2003-01-07 21:46
- Location: .pa.us
Pasqualle wrote:I really like the idea of calculating the TTH without the MP3 tags. But the variable tag length is a real problem, as the same part of the song does not start at the same position in each file. It would require a different sharing approach, like:
now: give me the file block starting at position 15
new sharing implementation: give me the part of the MP3 file starting at 30 seconds..
But I think such sharing will not be implemented.
That's what the "audio-only-sha1" was all about. The sharing application had to understand the tag, and calculated a SHA1 hash over only the non-tag data. If someone requested that file, it would figure out the correct offsets, taking the tag into account (and thus wouldn't ever send the tag, since it's not part of the "file").
Not sure it's worth it. It may make a limited amount of sense to share both the file with and without metadata. However, that doubles the amount of autosearches done for queued mp3 files... which I think is a pretty weighty consideration.
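The offset translation described in this reply could look roughly like the sketch below. It assumes the client already knows how many tag bytes sit before and after the audio; `audio_to_file_range` and its parameters are invented for illustration and are not taken from any Gnutella or DC++ client.

```python
def audio_to_file_range(audio_offset: int, length: int, file_size: int,
                        leading_tag_len: int, trailing_tag_len: int) -> tuple[int, int]:
    """Map a request against the tag-less "file" to (start, end) offsets on disk.

    Hypothetical helper: tag lengths are assumed to be known already.
    """
    audio_size = file_size - leading_tag_len - trailing_tag_len
    if audio_offset < 0 or length < 0 or audio_offset > audio_size:
        raise ValueError("request lies outside the audio data")
    length = min(length, audio_size - audio_offset)  # never run into a trailing tag
    start = leading_tag_len + audio_offset
    return start, start + length


# The same audio stored twice: once with tags, once without.
audio = bytes(range(256)) * 8
tagged = bytes(267) + audio + b"TAG" + bytes(125)   # 267-byte header, 128-byte trailer
bare = audio

s1, e1 = audio_to_file_range(500, 100, len(tagged), 267, 128)
s2, e2 = audio_to_file_range(500, 100, len(bare), 0, 0)

# Both peers serve identical bytes for the same audio-relative request.
print(tagged[s1:e1] == bare[s2:e2])
```

The request is expressed relative to the audio, so each peer only needs its own tag lengths to answer it; the tags themselves are never sent.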