Collection of files, SFV-replacement etc.

Technical discussion about the NMDC and ADC (http://dcpp.net/ADC.html) protocols. The NMDC protocol is documented in the Wiki (http://dcpp.net/wiki/), so feel free to refer to it.

Moderator: Moderators

Locked
mirza
Posts: 6
Joined: 2003-04-27 07:30

Collection of files, SFV-replacement etc.

Post by mirza » 2003-10-18 20:23

Here's my idea. A file. A file containing information about other files. SFV you think? Almost there.. I think XFV (eXtendable File Verificator) should be more suitable.

It wouldn't only be used for verification. I think the main purpose would be this:

Let's say I want to download an album. That would be a set of music files, maybe some image files for the covers.. and maybe a text file with some info about the album.

Now, I could manually look up each file and hope they're all part of the same collection.. But we all hate doing that.

What if I could download one single file. One single file containing information about the others. And then my client could read that information and automatically get the other files.

Something like "Mirza - Greatest Hits (2003).xfv" could look like:

<?xml version="1.0" encoding="UTF-8"?>
<collection title="Mirza - Greatest Hits (2003)">
   <file name="01 Eat me.ogg" size="1234567" crc32="something" sha1="something" />
   <file name="02 My daddy likes you.ogg" size="7654321" md5="something" />
   <file name="Cover - Front.png" size="123456" foo="bar" />
   <!-- etc -->
</collection>

The client could read the file and search the network using whichever forms of verification it supports. Direct Connect doesn't support hashing (yet :)), so I guess it would have to rely on the size and somehow the filename to get the correct files.
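
A rough sketch of that client-side step, in Python: parse the collection file and, for each entry, pick the strongest verification attribute the client understands, falling back to name and size when there is none. The element and attribute names come from the example above. The preference order, the `SearchCriteria` type and the `read_collection` function are assumptions made up for illustration, not part of any existing client.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Assumed preference order, strongest first; the proposal does not fix one.
HASH_PREFERENCE = ("sha1", "md5", "crc32")


@dataclass
class SearchCriteria:
    """What a client would search for (hypothetical type)."""
    name: str
    size: int
    hash_type: Optional[str]   # None means "fall back to name + size"
    hash_value: Optional[str]


def read_collection(xml_text: str,
                    supported: Sequence[str] = HASH_PREFERENCE) -> List[SearchCriteria]:
    """Turn every <file> entry of a collection into search criteria."""
    root = ET.fromstring(xml_text)
    criteria = []
    for entry in root.findall("file"):
        # First hash type that the client supports and the entry provides.
        hash_type = next(
            (h for h in HASH_PREFERENCE if h in supported and entry.get(h)),
            None,
        )
        criteria.append(SearchCriteria(
            name=entry.get("name"),
            size=int(entry.get("size")),
            hash_type=hash_type,
            hash_value=entry.get(hash_type) if hash_type else None,
        ))
    return criteria


EXAMPLE = """<collection title="Mirza - Greatest Hits (2003)">
   <file name="01 Eat me.ogg" size="1234567" crc32="something" sha1="something" />
   <file name="02 My daddy likes you.ogg" size="7654321" md5="something" />
   <file name="Cover - Front.png" size="123456" foo="bar" />
</collection>"""

if __name__ == "__main__":
    for item in read_collection(EXAMPLE):
        print(item)
```

Unknown attributes such as `foo="bar"` are simply ignored, which is what keeps the format extendable: a client that only knows sizes and filenames can pass `supported=()` and still use the same file.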

I'm not asking anyone to implement this into DC++ or anything. (That will probably have to wait until hashing is implemented) I'm posting here to see what you people think about it. Is it a good idea? Is XFV a suitable name? Has it already been done? (I can't seem to find anything.. and even if it has, I'd like to see it used.. I'm so sick of searching for files)

If I can get a "That's a good idea!" I'm planning to construct a decent website, make some software to easily read and write the files, get the layout of the xml into shape (make some sort of standard) and start advocating this to clients like Shareaza etc and release groups.

Suggestions are welcome!

Twink
Posts: 436
Joined: 2003-03-31 23:31
Location: New Zealand

Post by Twink » 2003-10-18 20:59

am i missing something here?

1) Music from the same album really should be in one directory, and if it's not, it's the fault of the person you're browsing for not sorting it that way (well, they may have a good reason I guess, but most likely they just don't sort music?). So if it's because they're lazy, do you think they're gonna take the time to make one of these files?

2) Although it would be good for verification, as you already pointed out, hashing is being implemented, which is far superior.

umm did this have any other purposes?

mirza
Posts: 6
Joined: 2003-04-27 07:30

Post by mirza » 2003-10-19 05:31

No no no, people wouldn't have to make all of these files. Hopefully they'd ship with releases. :)

Or I could simply get one from some website and then my client could get the files from the collection.. People I'm downloading from don't have to have a copy of the file.

Maybe I shouldn't have posted at 2 am..

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Yes

Post by GargoyleMT » 2003-10-19 15:34

This definitely has merit. For groups that put out mp3 releases, this is a better choice than an m3u for sure. And moving people away from CRC32 is a good move as well.

Have you taken a look at bitzi.com's database? If you're going to focus on audio releases, make sure to take note of their full-file SHA1 hash, as well as the mp3-only SHA1 hash. I religiously retag albums (few meet my standards), and having the tag affect the checksum is... rather pointless. At that point, to determine corruption, you pretty much have to use a tool such as EncSpot.
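
To illustrate the difference between the two kinds of hash, here is a minimal Python sketch of a tag-independent SHA1. It is not Bitzi's actual algorithm, which isn't described in this thread. It only skips a leading ID3v2 tag and a trailing ID3v1 tag, ignores other tag formats such as APEv2 or Lyrics3, and `audio_only_sha1` is a hypothetical name.

```python
import hashlib


def _syncsafe(b: bytes) -> int:
    """Decode an ID3v2 'syncsafe' size: 4 bytes, 7 significant bits each."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def audio_only_sha1(data: bytes) -> str:
    """SHA1 of the file contents with ID3v2 (front) and ID3v1 (back) removed."""
    start, end = 0, len(data)

    # Leading ID3v2 tag: 10-byte header = "ID3", version (2), flags (1), size (4).
    if len(data) >= 10 and data[:3] == b"ID3":
        start = 10 + _syncsafe(data[6:10])
        if data[5] & 0x10:       # ID3v2.4 "footer present" flag: 10 more bytes
            start += 10
        start = min(start, end)  # guard against a bogus size field

    # Trailing ID3v1 tag: the last 128 bytes, beginning with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return hashlib.sha1(data[start:end]).hexdigest()


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        contents = f.read()
    print("full file :", hashlib.sha1(contents).hexdigest())
    print("audio only:", audio_only_sha1(contents))
```

With a hash like this, retagging a file changes the full-file SHA1 but leaves the audio-only value alone, so a collection file could carry both.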

Also look into the LAME music CRC, perhaps that's suitable. (Who doesn't use LAME for mp3s nowadays?)

I suppose this has certain advantages for non-audio files too.

If you somehow include something that can replace the NFO, and get a program on par with QuickPAR (no pun intended, really), it might take off, at least in the limited world of hubs specializing in releases on DC.

mirza
Posts: 6
Joined: 2003-04-27 07:30

Post by mirza » 2003-10-22 13:13

Well actually, the audio was just an example. But now that you mention retagging.. Some sort of audioHash (like the one bitzi.com uses) would be a good idea.

Btw, does anyone know how segmenting works?
<Segment Start="123" Length="4567890" SHA1="hash" ... />
Would that be enough to describe one?
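
For what it's worth, here is a minimal Python sketch of how a client might check one segment described that way. It assumes `Start` is a byte offset into the file, `Length` is a byte count, and `SHA1` is the hex digest of exactly those bytes. None of that is settled in this thread, and `verify_segment` is a hypothetical helper.

```python
import hashlib


def verify_segment(path: str, start: int, length: int, sha1_hex: str) -> bool:
    """Return True if bytes [start, start + length) of the file match sha1_hex."""
    digest = hashlib.sha1()
    remaining = length
    with open(path, "rb") as f:
        f.seek(start)
        while remaining > 0:
            # Read in chunks so large segments don't have to fit in memory.
            chunk = f.read(min(65536, remaining))
            if not chunk:
                return False  # the file is shorter than the segment claims
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest() == sha1_hex.lower()
```

Under those assumptions, start, length and hash are enough to verify a segment that has already been downloaded. Whether they are enough to find it on the network is a separate question.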

BSOD2600
Forum Moderator
Posts: 503
Joined: 2003-01-27 18:47
Location: USA

Post by BSOD2600 » 2003-10-22 14:17

Since you mentioned audiohash, I know http://www.musicbrainz.org/ uses this (I use this program a LOT for my retagging needs)
MusicBrainz is a user maintained community music metadatabase. Music metadata is information such as the name of an artist, the name of an album and list of tracks that appear on an album. MusicBrainz collects this information about music and makes it available to the public so that music players can retrieve information about the music that is playing. For instance, an audio CD does not contain the name of the artist, album or a listing of the tracks. A music player can use the physical characteristics of an audio CD to lookup the correct metadata for the CD and show it to the user during playback.

MusicBrainz also takes this concept one step further in applying it to digital audio files like MP3 files and Ogg/Vorbis files.....The MusicBrainz solution for this is the MusicBrainz Tagger, a Windows application that uses acoustic fingerprints (TRMs) to semiautomatically identify tracks in your music collection and then write clean and accurate metadata to your music files.
The TRM technology they are using comes from http://relatable.com.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2003-10-24 19:21

BSOD2600 wrote: The TRM technology they are using comes from http://relatable.com.
There's a competitor to TRM, but I can't recall it at the moment... I should've saved info on my HD somewhere, maybe it'll turn up. ;)

sandos
Posts: 186
Joined: 2003-01-05 10:16

Post by sandos » 2003-11-01 20:12

It seems Shareaza uses collection files:

http://forums.shareaza.com/showthread.p ... eadid=4242

It also seems LimeWire does, though I haven't found any good info on that. The formats are said to be incompatible, a very, very bad thing IMO.

Remember that DC is much better at structuring data than Gnutella is.
