some ideas (not mine!) to discuss ...

Zc
Posts: 7
Joined: 2003-01-04 06:07

Post by Zc » 2003-02-21 14:10

Segmented downloads/compression
If I have understood everything correctly, it seems that once you implement the zlib compression stuff, you can request a particular part of a file and adler32 will act as the CRC ... so you will have solved segmented downloads correctly and added on-the-fly compression.

Metadata
I haven't seen a good way to do this so far, but this feature could be very interesting.

Rating
Everything has been said about that, but I think rating must be per hub (community) and not part of the client itself. So it's for DCH++ ;) (btw Sapporo's comment on that is very interesting ... for a beginning)

These are the most interesting ideas I have seen, but if you have others we can discuss them (that's what we are here for, no?!)

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden

Re: some ideas (not mine!) to discuss ...

Post by sarf » 2003-02-21 15:20

Zc wrote:Segmented downloads/compression
If I have understood everything correctly, it seems that once you implement the zlib compression stuff, you can request a particular part of a file and adler32 will act as the CRC ... so you will have solved segmented downloads correctly and added on-the-fly compression.
No, not directly (but indirectly you are correct). There is currently no feature for searching for adler32 values of files, which would be needed - otherwise you'd have to download parts of the file and verify them.
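For illustration, a minimal sketch of what "download parts of the file and verify them" could look like with adler32. The fixed 64 KiB segment size and the function names are assumptions made up for this example, not anything DC++ defines.

```python
import zlib

SEGMENT_SIZE = 64 * 1024  # assumed segment size, not a DC++ constant


def segment_checksums(data: bytes, segment_size: int = SEGMENT_SIZE) -> list:
    """Return the Adler-32 checksum of each fixed-size segment of data."""
    return [
        zlib.adler32(data[offset:offset + segment_size])
        for offset in range(0, len(data), segment_size)
    ]


def verify_segment(segment: bytes, index: int, expected: list) -> bool:
    """Check one downloaded segment against the source's checksum list."""
    return zlib.adler32(segment) == expected[index]
```

Adler-32 only catches accidental corruption; it is not a cryptographic hash, so it cannot protect against a source that deliberately sends bad data.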
Zc wrote:Metadata
I haven't seen a good way to do this so far, but this feature could be very interesting.
Yes, but it needs to be handled in a generic, thoroughly thought out manner. For all I know it could be done with a $GetMetaData command or somesuch, though, to keep all clients happy.
Zc wrote:Rating
Everything has been said about that, but I think rating must be per hub (community) and not part of the client itself. So it's for DCH++ ;) (btw Sapporo's comment on that is very interesting ... for a beginning)
The rating system does not have to be one rating server per hub. In fact, there are times when that is unwanted and rather stupid. What is necessary are ways in which to grant benefits to highly rated clients, and the rest will (speaking with an infinite timeframe in mind) come.
Zc wrote:These are the most interesting ideas I have seen, but if you have others we can discuss them (that's what we are here for, no?!)
Well, I hope so!

Sarf
---
Time is a spiral, space is a curve, I know you get dizzy but try not to lose your nerve

Zc
Posts: 7
Joined: 2003-01-04 06:07

Post by Zc » 2003-02-21 16:55

sarf wrote:The rating system does not have to be one rating server per hub
but rating must be per community; communities don't all have the same rules, and hubs aren't all interested in the same qualities in a client

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Re: some ideas (not mine!) to discuss ...

Post by GargoyleMT » 2003-02-21 19:43

Zc wrote:Segmented downloads/compression
If I have understood everything correctly, it seems that once you implement the zlib compression stuff, you can request a particular part of a file and adler32 will act as the CRC ... so you will have solved segmented downloads correctly and added on-the-fly compression.
pDC++ is working on this feature - I don't think he'll submit it back to the main DC++ on his own though. It seems pretty primitive so far.... The first thing I'd do is work on:
Metadata
I haven't seen a good way to do this so far, but this feature could be very interesting.
Hashes. Hashes. Hashes and MP3 info.

It seems that an XML of sorts might be really nice to give information on a file - hash, hash of just the audio part of mp3s (so people who retag [like me] can still be alternate sources), possibly ID3 tag information, AVI codecs, bitrates, image resolution, etc. It seems like there might be a huge range of info that people will want or that emerging file formats might like, so XML seems to make a bit of sense. I've never done protocol design, so if someone with actual experience or whatever can comment...

And I think that it should just be a command listed in $Supports and would normally be used prior to transferring a file - normally to get the file hash for verification. But you could, way way down the road, support a feature like "get ID3 tags from this person and retag my files" or "what tags do people with this mp3 have on it?" :-D

I think when hashes are supported, a "get byte range" feature should be supported too. Someone asked about it in a thread about encryption, but I think it got lost....

sandos
Posts: 186
Joined: 2003-01-05 10:16

Re: some ideas (not mine!) to discuss ...

Post by sandos » 2003-02-21 22:00

Hashes. Hashes. Hashes and MP3 info.

It seems that an XML of sorts might be really nice to give information on a file - hash, hash of just the audio part of mp3s (so people who retag [like me] can still be alternate sources), possibly ID3 tag information, AVI codecs, bitrates, image resolution, etc. It seems like there might be a huge range of info that people will want or that emerging file formats might like, so XML seems to make a bit of sense. I've never done protocol design, so if someone with actual experience or whatever can comment...
There is lots of interesting metadata to extract. The Relatable audio fingerprint identifies even differently encoded MP3s of the same song; I tried it and was very impressed that it worked. It's used by the MusicBrainz initiative, and Bitzi has support for it. http://www.musicbrainz.org/

There is one problem though: clients don't gain anything by giving out metadata, and CPU and memory consumption will rise when they do. So it might be an idea to only give out the kinds of metadata that the other client also provides; this way all clients will want to implement as much metadata as possible. It's evil and ugly, but...
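A rough sketch of that tit-for-tat rule, assuming each client advertises the set of metadata fields it can provide. The field names and the function are hypothetical, just to show the filtering step.

```python
def fields_to_share(local_metadata: dict, peer_provides: set) -> dict:
    """Return only the metadata fields the peer is able to give back."""
    return {
        name: value
        for name, value in local_metadata.items()
        if name in peer_provides
    }


# Hypothetical example: the peer only implements file hashes.
mine = {"filehash": "abc123", "audiohash": "def456", "artist": "someone"}
shared = fields_to_share(mine, {"filehash"})
```

A client that implements nothing gets nothing back, which is exactly the "evil and ugly" incentive described above.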

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden

Post by sarf » 2003-02-23 09:05

Zc wrote:but rating must be per community; communities don't all have the same rules, and hubs aren't all interested in the same qualities in a client
No. The rating server stores all information about a transfer. You, as a client, then select which parts you wish to mark as important.
There are thoughts of allowing OPs to artificially "bump" the ratings of users that are especially appreciated in their hub, but it would be up to the requesting client whether to include bumped ratings.

More discussion about the rating system should be done in that thread.

About hashes and metadata... Hashing files should be general. Yes, it would be nice to have "fingerprinted" MP3/OGG/PNK files, but since DC is built with small building blocks (hubs) rather than a global building block, people are likely to have the same file as other people. This is why "search for alternates" works at all in the current system.

Retagging MP3-files is evil, but not as evil as some of the tagging that people do <shudder>. :)

As to "get ID3 tags from this person" and such features, they can all be implemented client side (simply "request part X of file Y and do Z with it") - this does not have to be implemented in the protocol and thus should stay very far away from the protocol lest it get its grubby little fingers whacked with a Mallet of Infinite FeatureCreeping.

Sarf
---
What do you want me to do, learn to stutter?

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2003-02-23 13:44

sarf wrote:About hashes and metadata... Hashing files should be general. Yes, it would be nice to have "fingerprinted" MP3/OGG/PNK files, but since DC is built with small building blocks (hubs) rather than a global building block, people are likely to have the same file as other people. This is why "search for alternates" works at all in the current system.
Retagging MP3-files is evil, but not as evil as some of the tagging that people do <shudder>. :)
But it doesn't work very well in the VGM hub I'm a part of, at least not for MP3s. There are ~ 60 - 80 people with ~ 1.5 - 3.5 tb of music, and you're lucky if there are three sources for the same file. Everyone has their own styles of tagging, and, quite honestly, since some of the releases by the "official" mp3 groups contain simple typographical errors, re-tagging is necessary. I also use iTunes for my iPod, and it relies heavily on correct (or at least uniform) tags to organize music, and more generally, function properly.
As to "get ID3 tags from this person" and such features, they can all be implemented client side (simply "request part X of file Y and do Z with it") - this does not have to be implemented in the protocol and thus should stay very far away from the protocol lest it get its grubby little fingers whacked with a Mallet of Infinite FeatureCreeping.
Well, I'm certainly not going to count on anyone else implementing this. I think that DC can be a good network for trading MP3s, but to be great for trading them, some things may be needed. The tag exchange is one of the more out there ideas, but fingerprinting just the audio portion of the mp3s is not. WinMX does this to an extent, by ignoring the last 128 bytes of the MP3. Fetching the variable length, prepended, ID3v2 tags would be very difficult to do client side.
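To make the "hash just the audio portion" idea concrete, here is a minimal sketch that skips a prepended ID3v2 tag and an appended 128-byte ID3v1 tag before hashing. It is an illustration under assumptions: SHA-1 is only a stand-in hash, the function names are made up, and other tag formats (APE, Lyrics3, ID3v2 tags at the end of the file) are ignored.

```python
import hashlib


def _synchsafe(b: bytes) -> int:
    """Decode a 4-byte synchsafe integer (7 useful bits per byte)."""
    return ((b[0] & 0x7F) << 21 | (b[1] & 0x7F) << 14
            | (b[2] & 0x7F) << 7 | (b[3] & 0x7F))


def audio_range(data: bytes) -> tuple:
    """Return (start, end) offsets of data with ID3v2/ID3v1 tags excluded."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte synchsafe size.
    if len(data) >= 10 and data[:3] == b"ID3":
        start = 10 + _synchsafe(data[6:10])
        if data[5] & 0x10:  # footer-present flag adds another 10 bytes
            start += 10
        start = min(start, end)  # guard against truncated files
    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return start, end


def audio_hash(data: bytes) -> str:
    """Hash only the bytes between the tags (SHA-1 as a stand-in)."""
    start, end = audio_range(data)
    return hashlib.sha1(data[start:end]).hexdigest()
```

With this, two copies of the same MP3 that differ only in their ID3 tags produce the same audio hash, while their full-file hashes differ.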

BTW I envisioned something like this being exchanged between clients, when asking for meta-information:

<SharedFile Filename="whatever.mp3" artist="some damn person" album="steal this album" year="2010" filehash="some hash value" audiohash="another hash value" audiorange="2000-5000000" ... >

That's it. With XML, if someone doesn't want to parse tags (or there haven't been enough spare CPU cycles to do it), you could leave those fields out. This could also be largely in the same format as the meta-information cache on disk, absent the real file path and modification times (and perhaps search hit count, download count, and bytes transferred [for multi-source downloads]). Re-tagging wouldn't be elegant, because that'd be another dependency (libmad's tagger, perhaps, since libid3 doesn't support v2.4.0 tags yet). But it's not outrageous, at least to me.
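A small sketch of how a receiving client might read such an element while tolerating left-out fields. It assumes the element arrives as a complete, self-closing tag (the example above is elided) and that audiorange is "start-end" in bytes; the function name is made up.

```python
import xml.etree.ElementTree as ET


def parse_shared_file(xml_text: str) -> dict:
    """Parse one <SharedFile .../> element; every attribute is optional."""
    elem = ET.fromstring(xml_text)
    info = dict(elem.attrib)
    byte_range = info.get("audiorange")
    if byte_range is not None:
        # Assumed format: "start-end" as decimal byte offsets.
        start, end = byte_range.split("-")
        info["audiorange"] = (int(start), int(end))
    return info


full = parse_shared_file(
    '<SharedFile Filename="whatever.mp3" filehash="some hash value" '
    'audiohash="another hash value" audiorange="2000-5000000"/>'
)
minimal = parse_shared_file('<SharedFile Filename="whatever.mp3"/>')
```

Callers would use `info.get("artist")` and similar lookups, so a sender that skipped tag parsing simply yields `None` for those fields.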

Honestly, I can't think of another file type other than MP3 that has so much end-user modification involved (while keeping the actual data untouched). MP3s are the driving force behind many P2P applications, and I think they deserve a little special attention.

arnetheduck
The Creator Himself
Posts: 296
Joined: 2003-01-02 17:15

Post by arnetheduck » 2003-02-24 04:37

Well, I've considered switching to xml for some time, it's quite easy to do, and then I could add whatever I want to the file listing without breaking old clients, the only thing that might concern me is that loading an xml file probably takes 3-4 times as long as the ones used now...that's without any data added...

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden

Post by sarf » 2003-02-24 15:22

GargoyleMT wrote:[snipped quote]
But it doesn't work very well in the VGM hub I'm a part of, at least not for MP3s. There are ~ 60 - 80 people with ~ 1.5 - 3.5 tb of music, and you're lucky if there are three sources for the same file. Everyone has their own styles of tagging, and, quite honestly, since some of the releases by the "official" mp3 groups contain simple typographical errors, re-tagging is necessary. I also use iTunes for my iPod, and it relies heavily on correct (or at least uniform) tags to organize music, and more generally, function properly.
I am sorry to hear this, but I do not want feature/protocol bloat being introduced into DC++. It is better to solve this outside the protocol, because, once more, what if people a) disable this and b) use a non-DC++ client?

This would (in my opinion at least) be better solved by another application that took an audio "fingerprint" and retrieved the information from some kind of global server. This way, nothing would need to be added to DC++ and you could still get your audio files with the correct tags et cetera.

Yes, I know, creating a whole new program/project is a bit daunting but imagine how much you would gain by it - a P2P independent retagging system!
GargoyleMT wrote:[snipped quote]
Well, I'm certainly not going to count on anyone else implementing this. I think that DC can be a good network for trading MP3s, but to be great for trading them, some things may be needed. The tag exchange is one of the more out there ideas, but fingerprinting just the audio portion of the mp3s is not. WinMX does this to an extent, by ignoring the last 128 bytes of the MP3. Fetching the variable length, prepended, ID3v2 tags would be very difficult to do client side.
Hmmm... well, you could make a few special cases for MP3 files in DC++, I guess, to handle the ID3v1 tags (which are, I think, appended to the end... or am I wrong in this, too?). With some more code, the client could scan the file for ID3v2 tags too, I think.

However, never forget that while you might download and upload your fully legal MP3 files, others might be sending documents (PDF/DOC/TXT/LIT), movies and even (gasp!) pictures.
Thus, your problem is a problem, but it should be solved along with all the others. If we keep an audio "fingerprint", why not do the same for the different streams in AVI files? What about the chunks in JPEG files? And so on and so forth...
This would quickly bloat DC++ since it would become a content manager in addition to being a filesharing application... and I dislike combination programs - they seldom do anything as well as a program coded for just one purpose.
GargoyleMT wrote:[snipped example]
That's it. With XML, if someone doesn't want to parse tags (or there haven't been enough spare CPU cycles to do it), you could leave those fields out. This could also be largely in the same format as the meta-information cache on disk, absent the real file path and modification times (and perhaps search hit count, download count, and bytes transferred [for multi-source downloads]). Re-tagging wouldn't be elegant, because that'd be another dependency (libmad's tagger, perhaps, since libid3 doesn't support v2.4.0 tags yet). But it's not outrageous, at least to me.
Where/when would this information be transmitted?
GargoyleMT wrote:Honestly, I can't think of another file type other than MP3 that has so much end-user modification involved (while keeping the actual data untouched). MP3s are the driving force behind many P2P applications, and I think they deserve a little special attention.
I can think of a few, among them video files which are being ripped to different formats (I don't want to download a movie in MPEG format which I already have as an AVI/DIVX file, for instance).
The redeeming quality of MP3/audio files is that they are (relatively speaking) small, much smaller than video files, while at the same time being large enough to stop the GIF/JPEG/PNG "megadirectory" thingy where thousands of pictures are shoved into one directory - simply put, such a directory would be... very large.
By the by, are we forgetting about the OGG files? The WMA files?
I dislike adding features that "cry out" for even more features to be added.

If this was done as a plugin of some sort I would have no problem with it (even if the plugin was bundled with the default distribution of DC++) but it should be kept and developed separately from the main DC++ code and arnetheduck. I would prefer him to work on my nifty features, thank you very much. :lol:

Adding "only" MP3 file handling would lead to yet more feature requests "hey, why don't you support the Nifty FileFormat?! its really k00l! totally! fix it plz".
I dislike making the feature request spammers' task even easier than it is.

Give me one file format independent meta-data solution, and I'm all for it. It could even be supported by DC++ but powered by plugin files written by people who live and breathe their own private file format.

Sarf
---
Oh dear, I've gone and inflated my ego.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2003-02-25 12:05

Before this gets too much more off topic (too late), let me explain... no, sum up.

Meta information could be collected from shared files by a separate thread that uses something like libextractor (google it; it looks weak at the moment, but it can be extended). This could be used to augment the shared file listing (and added to a cache that gets written to disk, so scanning doesn't happen every startup). Any client that supported meta-information would add a tag in $Supports, as well as a client to client command. The remote client could request the meta-information on a specific file, then the remote side would send a little XML structure with the meta-information in it.
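As a sketch of the $Supports check, assuming the command is "$Supports" followed by space-separated feature names and an optional trailing "|". The "MetaData" tag is hypothetical; no such feature name has been agreed on.

```python
METADATA_TAG = "MetaData"  # hypothetical feature name, not part of any protocol


def parse_supports(command: str) -> set:
    """Return the feature names listed in a $Supports command."""
    parts = command.rstrip("|").split()
    if not parts or parts[0] != "$Supports":
        return set()
    return set(parts[1:])


def peer_has_metadata(command: str) -> bool:
    """True if the peer advertised the (hypothetical) meta-information tag."""
    return METADATA_TAG in parse_supports(command)
```

A client would only send the meta-information request when `peer_has_metadata` is true, so older clients never see a command they don't understand.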

This information would be useful, and would, at the minimum, I'd hope, have a Tiger Tree Hash (dunno how many leaves to include) of the file. Then the transfer could proceed as normal, except now the remote client could verify that the file data it got was non-corrupt, as the transfer goes on. It might be nice to have a "cancel" and "get byte range" command too, but... later.
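To show why a tree hash lets a client verify data as the transfer goes on, here is a minimal hash tree sketch. It is not a real Tiger Tree Hash: Tiger is not among hashlib's guaranteed algorithms, so SHA-1 stands in, and the 1024-byte blocks and 0x00/0x01 prefixes are assumptions of this example.

```python
import hashlib

BLOCK_SIZE = 1024  # assumed leaf block size


def _h(data: bytes) -> bytes:
    """Stand-in hash function (SHA-1 instead of Tiger)."""
    return hashlib.sha1(data).digest()


def leaf_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash each block separately; an empty file gets one empty leaf."""
    blocks = [data[i:i + block_size]
              for i in range(0, len(data), block_size)] or [b""]
    return [_h(b"\x00" + block) for block in blocks]


def tree_root(leaves: list) -> bytes:
    """Combine hashes pairwise until one root remains."""
    level = list(leaves)
    while len(level) > 1:
        nxt = [_h(b"\x01" + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired node is promoted unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

A downloader holding the leaf list can check each block as it arrives against its leaf hash, and check the leaf list itself against the single root value from the meta-information.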

Now: MP3s. Once encoded, people can and do modify the MP3 file without altering the audio data inside. Even just adding (or removing) a RIFF header (80 bytes or so) at the beginning of a file (perhaps so it will play on a portable player) makes someone a non-viable alternate source for people downloading the same MP3. I think this is a shortcoming. If DC++ recognized the audio data inside as an immutable piece of data, and that the tags on the file are transitory, it could generate a hash for just the audio data (in addition to the more typical full file hash). If someone was already downloading the MP3 and had the file (and audio data portion) hash, even if the local copy was re-tagged, DC++ could, if smart enough, act as an alternate source. This requires a little bit of leg work, like keeping track of the start and end of the audio data, as well as (on the remote end) keeping track of the tags that were on the original source mp3.

Unlike other files, RARs, ISOs, AVIs, the data in the MP3 file doesn't get messed with when it's tagged. The _closest_ I can think of is a commented GIF/JPEG (one supports comments) - the image isn't recompressed. So I can't see any reason why people will (validly) request this treatment for other file types.

Anyway, if clients will give out meta-data for any file they're sharing, and you can retrieve hits based on searching for a hash, you could then "gather" meta-data from other clients and use it, inside or outside the client, to modify your source. A simplistic example is that eMule will let you see the remote filenames of anything you're currently transferring, and let you rename your local copy inside the application. Allowing fetching of remote meta-information seems like a pretty "neat" thing, and if people get used to it, it might just become an indispensable feature of DC++.

I think that if someone has a big enough patch that goes into DC++, they should maintain it... If it was Arne's "baby", it wouldn't have needed to be coded by someone else. :wink:

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32

Post by cologic » 2003-04-04 10:35

I've played a bit with hashing; as of a week and a half ago or so, I'd gotten as far as this. I'm further at this point, but unfortunately I hadn't had much time to work on it, so it's only now significantly developing (i.e. being integrated into a client) again.

As far as XML filelists... I'd just say it should be transparently compressed with the bzip2 library or zlib; I'm not sure how readily the current DC++ XML parser can be extended to handle this, but if it can, great. Support for metadata would be nice too, though of lesser importance to me.
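A minimal sketch of a transparently compressed XML filelist, using Python's bz2 module purely for illustration (DC++ itself is C++). The element and attribute names are made up for the example.

```python
import bz2
import xml.etree.ElementTree as ET


def pack_filelist(root: ET.Element) -> bytes:
    """Serialize the XML tree and compress it with bzip2."""
    return bz2.compress(ET.tostring(root, encoding="utf-8"))


def unpack_filelist(blob: bytes) -> ET.Element:
    """Decompress a bzip2 blob and parse it back into an XML tree."""
    return ET.fromstring(bz2.decompress(blob))


# Hypothetical listing with a single shared file.
listing = ET.Element("FileListing")
ET.SubElement(listing, "File", Name="whatever.mp3", Size="5000128")
blob = pack_filelist(listing)
restored = unpack_filelist(blob)
```

The parser never sees compressed data, so the XML code stays unchanged and compression is just a wrapper around reading and writing the list.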

Hashes should still be stored in a separate file, as a client won't request a list of all hashes, but a hash of one file. So far I'm using an eMule-like format for it. Compressed XML would be nice if it were easy to use a compressed XML stream within DC++, but it doesn't seem to hold any compelling advantages (the format is extensible through metalists).
