Data Reparation

Archived discussion about features (predating the use of Bugzilla as a bug and feature tracker)

PsychoBrat
Posts: 1
Joined: 2004-03-02 00:11
Location: Australia

Data Reparation

Post by PsychoBrat » 2004-03-02 00:22

Hi,

I had a look through the forums but didn't find anything about this, so sorry if it's been suggested before.

Data integrity can sometimes be a bit of a problem, so I was thinking a "repair from {source}" feature would be a good idea. For example, if you downloaded a file correctly from someone whose copy was itself corrupt, you would be able to repair your download from someone else's copy.
It would work by checksumming reasonably sized chunks of the file then comparing them against checksums of chunks of the target user's file. Maybe if it determines a chunk to be incorrect, it could then in turn checksum smaller chunks of that segment to minimise the amount of data that needs to be retransferred when the corrupt part is identified.
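A minimal sketch of the two-pass idea above, assuming both copies have the same length and using SHA-256 as a stand-in checksum. The function names and chunk sizes are made up for illustration; in a real client only the digests would cross the network, whereas here both copies sit in memory to keep the example short.

```python
from __future__ import annotations

import hashlib


def chunk_digests(data: bytes, chunk_size: int) -> list[bytes]:
    """Return one SHA-256 digest per fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def differing_chunks(local: bytes, remote_digests: list[bytes], chunk_size: int) -> list[int]:
    """Return indices of local chunks whose digest differs from the remote one."""
    local_digests = chunk_digests(local, chunk_size)
    return [i for i, (a, b) in enumerate(zip(local_digests, remote_digests)) if a != b]


def locate_damage(local: bytes, remote: bytes, coarse: int, fine: int) -> list[tuple[int, int]]:
    """Return (start, end) byte ranges that would need to be re-fetched."""
    ranges = []
    # First pass: compare large chunks to find the damaged segments.
    for ci in differing_chunks(local, chunk_digests(remote, coarse), coarse):
        base = ci * coarse
        seg_local = local[base:base + coarse]
        seg_remote = remote[base:base + coarse]
        # Second pass: compare small chunks inside each damaged segment only.
        for fi in differing_chunks(seg_local, chunk_digests(seg_remote, fine), fine):
            start = base + fi * fine
            ranges.append((start, min(start + fine, len(local))))
    return ranges
```

With a 4096-byte coarse chunk and a 256-byte fine chunk, a single flipped byte costs a 256-byte retransfer instead of 4096 bytes.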

Potential issue: This would obviously only work with other users with DC++, as there would be some work at their end.

I know the concept isn't that complicated to understand, so it's likely that the team would want to create their own system if they were to implement it, however if not...

There is an open-source piece of software called ZIDRAV that already allows something extremely similar to this to be done, but through fairly manual methods (user creates a checksum, sends it to another user, they use that checksum along with a checksum of their own version to create a patch file that that user then sends back and the first user uses to patch their file.... *phew* lol). I have already asked the creator and he didn't object to having his code integrated into DC++ if it was wanted.
Website: http://sourceforge.net/projects/zidrav/

There is also another piece of software called fr1x that has been released under the GNU GPL and does a similar thing, but uses a server application that the host user can add a list of files to, then clients can connect and use it to repair any of the files in the list.
Website: http://www.r1ch.net/projects/fr1x/

Cheers,
-jEFF

Smirnof100
Posts: 19
Joined: 2003-05-06 22:00

Post by Smirnof100 » 2004-03-05 02:15

ya... what if the person you are checksumming against has a corrupt version of the file... it will go through chunk chunk chunk find a chunk that is different and then "fix" your good file with the corrupted one... tell me a good way around that. :twisted:

You could fix it by, say, getting 3 checksums from 3 ppl with the same sized files and choosing 1 of the users with matching checksums... and that could use one of the extra "filelist slots"
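A rough sketch of that voting idea, assuming each peer reports one whole-file checksum as a string. The function name, the `quorum` parameter and the nicknames are hypothetical, not anything DC++ provides.

```python
from __future__ import annotations

from collections import Counter


def pick_trusted_digest(peer_digests: dict[str, str], quorum: int = 2) -> str | None:
    """Return the checksum reported by at least `quorum` peers, else None."""
    if not peer_digests:
        return None
    # Count how many peers reported each checksum and take the most common one.
    digest, votes = Counter(peer_digests.values()).most_common(1)[0]
    return digest if votes >= quorum else None


def peers_with(peer_digests: dict[str, str], digest: str) -> list[str]:
    """Return the peers whose copy matches the trusted checksum."""
    return [nick for nick, d in peer_digests.items() if d == digest]
```

This only shows that a majority agree, not that the majority is right; if most sources carry the same bad copy, the vote picks the bad copy.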

And a better way than reasonably sized chunks would be to cut in half, find bad part, cut in half, find bad part, cut in half, find bad part... in my opinion anywayz... then your data would be get 2 checksums, compare, get 2 checksums, compare, get 2 checksums, compare... anywayz...
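The halving approach can be sketched like this, assuming equal-length copies and SHA-256 as a stand-in checksum. `bisect_damage` and `min_size` are illustrative names, and a real client would exchange digests per step rather than hold both copies.

```python
from __future__ import annotations

import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bisect_damage(local: bytes, remote: bytes, lo: int = 0, hi: int | None = None,
                  min_size: int = 1024) -> list[tuple[int, int]]:
    """Return (start, end) ranges that differ, found by repeated halving."""
    if hi is None:
        hi = len(local)
    # A matching checksum means this whole range is fine; stop here.
    if digest(local[lo:hi]) == digest(remote[lo:hi]):
        return []
    # Small enough: report the range instead of splitting further.
    if hi - lo <= min_size:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    # Check both halves, since damage may sit in more than one place.
    return (bisect_damage(local, remote, lo, mid, min_size)
            + bisect_damage(local, remote, mid, hi, min_size))
```

Each halving step is another round trip to the other user, so fewer checksums are sent overall but the repair takes more exchanges than a single fixed-size pass.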
Build a Man a Fire, and he will be Warm for a day.
Set a Man on Fire, and he will be Warm for the Rest of his Life.

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-03-06 03:06

Well, since Tiger Tree Hashing (TTH) will be coming along in the next version (though it might not be stable...) it should be possible to request the entire hash tree for a file (we could call it $GetTTHFull or something). Then we verify that the tree wasn't corrupted while it was being transferred, i.e. recreate the hash tree from the bottom rung and see if the root comes out the same (if the client feels like adding in this safeguard...). Then we use $GetZBlock or whatever to get the bad pieces. I remember reading that the resolution would be 64k or 1/512th of the file, whichever is bigger. Therefore, even for movies, you'll only have to redownload a little more than a meg for each corrupted segment.
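A sketch of that check, with some loud assumptions: Python's hashlib has no Tiger, so SHA-256 stands in for it, and the 0x00/0x01 prefixes for leaves and inner nodes follow the THEX convention as I understand it. The function names are invented for the example and are not DC++ code.

```python
from __future__ import annotations

import hashlib


def leaf_hash(block: bytes) -> bytes:
    # Stand-in for Tiger; the 0x00 prefix marks a leaf.
    return hashlib.sha256(b"\x00" + block).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # The 0x01 prefix marks an inner node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def root_from_leaves(leaves: list[bytes]) -> bytes:
    """Rebuild the tree from the bottom rung and return its root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired node moves up unchanged
        level = nxt
    return level[0]


def tree_is_intact(leaves: list[bytes], trusted_root: bytes) -> bool:
    """True if the received leaves hash up to the root we already trust."""
    return root_from_leaves(leaves) == trusted_root


def bad_blocks(local: bytes, remote_leaves: list[bytes], block_size: int) -> list[int]:
    """Return indices of local blocks that do not match the received leaves."""
    local_leaves = [leaf_hash(local[i:i + block_size])
                    for i in range(0, len(local), block_size)]
    return [i for i, (a, b) in enumerate(zip(local_leaves, remote_leaves)) if a != b]
```

Once `tree_is_intact` passes, only the blocks listed by `bad_blocks` need to be requested again.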

As for making sure that the person you're repairing from has an uncorrupted copy, I think we should leave it up to the user to choose who to verify against... though it could use other sources provided they have the exact same file, which one could check by searching by the root TTH.

And thanks for drawing me out of lurking mode and getting me to register. :)

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-03-06 03:09

I spent 15 minutes editing that to make it as readable as possible and to make sure the grammar is nice, then double post. Today is not looking good for me. :?

joakim_tosteberg
Forum Moderator
Posts: 587
Joined: 2003-05-07 02:38
Location: Sweden, Linkoping

Post by joakim_tosteberg » 2004-03-06 04:09

PseudonympH wrote:I spent 15 minutes editing that to make it as readable as possible and to make sure the grammar is nice, then double post. Today is not looking good for me. :?
*Fixed. I don't blame you for double posting by mistake given the current state of the forum; hopefully speed will be regained soon.

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-03-06 23:32

2nd pass: looking at one of the other threads, it seems BCDC++ already uses something like $GetMeta TTH filename. It would make sense to do it this way since it's more general. Of course, having a lot of complete hash trees being transferred around would kind of cut into bandwidth, seeing as for files >=32 megs the tree would be 24k in size (smaller than most bz2 file lists, but not insignificant)
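To put numbers on the resolution rule mentioned earlier in the thread (64k or 1/512th of the file, whichever is bigger), here is a small sketch. It takes that rule at face value; the real client may round block sizes differently, and the helper names are made up.

```python
KIB = 1024
MIB = 1024 * KIB


def block_size(file_size: int) -> int:
    """Bytes covered by one bottom-level hash: 64 KiB or 1/512 of the file."""
    return max(64 * KIB, -(-file_size // 512))  # -(-a // b) is ceiling division


def leaf_count(file_size: int) -> int:
    """Number of bottom-level hashes needed to cover the file."""
    return -(-file_size // block_size(file_size))
```

For a 32 MiB file this gives 64 KiB blocks and 512 leaves, which is the point where the tree stops growing; a 700 MiB file still has 512 leaves, each covering about 1.37 MiB.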

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-03-11 20:07

PseudonympH wrote:As for making sure that the person you're repairing from has an uncorrupted copy, I think we should leave it up to the user to choose who to verify against...


Bitzi.com lets you look up files based on their tiger tree hash:

DCPlusPlus-0.307.exe

Notice, in particular, the rating.

If we collaborate with the folks at Bitzi, we can have voting and meta-data submission inside DC++. This is where the awesomeness begins.


PseudonympH wrote:the tree would be 24k in size

I believe 24k is the maximum size, for a tree of depth 9 (1023 values, including the root, at 24 bytes per value). Arne's GetMeta will only transfer the lowest level of the tree, which will be 12k for a depth of 9 (2^9*24).
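The arithmetic can be checked quickly, assuming a complete binary tree and 24-byte (192-bit) Tiger digests. A tree of depth 9 has 2^9 = 512 leaves and 2^10 - 1 = 1023 nodes in total. The function names are only for this example.

```python
HASH_BYTES = 24  # a Tiger digest is 192 bits


def full_tree_bytes(depth: int) -> int:
    """Size of every node in a complete binary tree of the given depth."""
    return (2 ** (depth + 1) - 1) * HASH_BYTES


def lowest_level_bytes(depth: int) -> int:
    """Size of the bottom level only, which is enough to rebuild the rest."""
    return (2 ** depth) * HASH_BYTES


print(full_tree_bytes(9))     # 24552 bytes, just under 24k
print(lowest_level_bytes(9))  # 12288 bytes, exactly 12k
```

Sending only the lowest level halves the transfer, and the receiver can recompute the upper levels locally.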

Xan1977
Forum Moderator
Posts: 627
Joined: 2003-06-05 20:15

Post by Xan1977 » 2004-03-11 20:49

GargoyleMT wrote:Bitzi.com lets you look up files based on their tiger tree hash:

DCPlusPlus-0.307.exe

Notice, in particular, the rating.

If we collaborate with the folks at Bitzi, we can have voting and meta-data submission inside DC++. This is where the awesomeness begins.

This is the coolest thing I've seen in a long time. Wow, awesomeness factor level overload.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-03-11 21:15

Xan1977 wrote:This is the coolest thing I've seen in a long time. Wow, awesomeness factor level overload.

There are reasons to get excited about hashing. Like magnet links.

If anyone's afraid of Kazaa-like fakes, this is a potential solution, or at least a temporary edge in the war.

Wisp
Posts: 218
Joined: 2003-04-01 10:58

Post by Wisp » 2004-03-17 03:37

GargoyleMT wrote:
PseudonympH wrote:As for making sure that the person you're repairing from has an uncorrupted copy, I think we should leave it up to the user to choose who to verify against...


Bitzi.com lets you look up files based on their tiger tree hash:

DCPlusPlus-0.307.exe

Notice, in particular, the rating.

If we collaborate with the folks at Bitzi, we can have voting and meta-data submission inside DC++. This is where the awesomeness begins.


just a couple questions..

- what will those magnet links look like? My idea is a URL with a couple of parameters: the only required parameter is the hash code, but you could also add a "filename" parameter so that DC++ knows what filename it should put in the download queue (or does DC++ already know what filename to use just by reading the hash code?)
- Will the meta-data also mean mp3 bitrate? The way i would like it is one general "info" field for each file, and the filelist-parser could add all the info about a file there, so in case of mp3's you can add the bitrate and encoder there, or in case of movies you could add the resolution and codec, etc. Hope you have something similar in mind...

Well when these features are added, dc++ is almost perfect in my opinion 8-), the only things i would like more is a google-like search and some minor interface changes (like grouping search and hubs windows)

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-03-17 16:24

No, the filename is not included in the hash code, although it could just search by that hash (since it needs to find sources for it anyway...) and use the first filename that matches (simple) or the most popular (more complex). Metadata such as mp3 bitrate is possible, yes, but I have no idea how arne is planning on implementing GetMeta, so I can't tell you if it will be in there.
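Both naming strategies are easy to sketch, assuming the hash search hands back a plain list of filenames in the order results arrive. `pick_filename` and its `strategy` argument are hypothetical names.

```python
from __future__ import annotations

from collections import Counter


def pick_filename(results: list[str], strategy: str = "popular") -> str | None:
    """Choose a filename for a hash-only download from search results."""
    if not results:
        return None
    if strategy == "first":
        return results[0]  # simple: take whatever came back first
    # More complex: take the name that most sources agree on.
    return Counter(results).most_common(1)[0][0]
```

For example, with results `["a.avi", "movie.avi", "movie.avi"]` the "first" strategy picks `a.avi` and the "popular" strategy picks `movie.avi`.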

Basically, it is possible, but whether it will happen or not is a different story. The problem with these kinds of things is that the protocol wasn't built with this kind of extendability in mind, so everything like this is simply a hack on top of the existing protocol. Maybe if DCTNG and/or ADC ever get implemented it will be possible to do it without everything being so shaky.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-03-18 19:27

Wisp wrote:- what will those magnet links look like?

Magnet URIs are a defined standard: http://magnet-uri.sourceforge.net/


Wisp wrote:- Will the meta-data also mean mp3 bitrate? The way i would like it is one general "info" field for each file, and the filelist-parser could add all the info about a file there, so in case of mp3's you can add the bitrate and encoder there, or in case of movies you could add the resolution and codec, etc. Hope you have something similar in mind...

You've just re-described meta-data, something I've written about in a bunch of posts. =)

But yes, it would (likely) include mp3 data, avi lengths/codec/resolution, image resolution, and anything anyone wanted to contribute a patch for.

Wisp wrote:the only things i would like more is a google-like search and some minor interface changes (like grouping search and hubs windows)

Both of those are in the feature tracker and get duplicated occasionally, so they're unlikely to be forgotten. =)

Stacker
Posts: 23
Joined: 2004-04-30 00:15

Post by Stacker » 2004-06-08 23:28

Are Magnet links possible in DC++ now?

Todi
Forum Moderator
Posts: 699
Joined: 2003-03-04 12:16

Post by Todi » 2004-06-09 01:21

Afaik, not in vanilla. However, I believe several clones have this capability. Not sure if it's the exact same implementation or if they've done it independently, though.

Stacker
Posts: 23
Joined: 2004-04-30 00:15

Post by Stacker » 2004-06-09 14:29

Not trying to sound ignorant or anything, but do you know of a site where I can get some information on this? On their current progress on the magnet link feature they are trying to implement? Or even where I can download their releases?
I thought DC was just DC. lol

Todi
Forum Moderator
Posts: 699
Joined: 2003-03-04 12:16

Post by Todi » 2004-06-09 14:46

* Added magnet link handler, size and filename not needed, hash is enough ;) ... magnet link format: magnet:?xt=urn:tree:tiger:TTTTTTTTTT&xl=999&dn=Filename.Ext
* Added magnet link copy to search frame.
* Added handling for magnet link in main chat ... opens search frame with hash search ;o)
* Added magnet link copy to directorylisting frame.
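A small sketch of reading a link in the format quoted above, using only the standard library. `parse_magnet` is a made-up helper, not part of any DC++ client. It treats `xt` (the hash) as required and `xl` (size) and `dn` (name) as optional, matching the "hash is enough" note.

```python
from __future__ import annotations

from urllib.parse import parse_qs


def parse_magnet(uri: str) -> dict:
    """Extract the tiger tree hash, size and name from a magnet link."""
    prefix = "magnet:?"
    if not uri.startswith(prefix):
        raise ValueError("not a magnet link")
    params = parse_qs(uri[len(prefix):])
    xt = params.get("xt", [""])[0]
    tth_prefix = "urn:tree:tiger:"
    if not xt.startswith(tth_prefix):
        raise ValueError("no tiger tree hash in link")
    size = params.get("xl")
    name = params.get("dn")
    return {
        "tth": xt[len(tth_prefix):],
        "size": int(size[0]) if size else None,  # optional
        "name": name[0] if name else None,       # optional
    }


print(parse_magnet("magnet:?xt=urn:tree:tiger:TTTTTTTTTT&xl=999&dn=Filename.Ext"))
```

A client could then search by the `tth` value and fall back on the search results for a filename when `dn` is absent.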


Reverse Connect (unsafe multi-source downloading, not recommended)
CZDC++
StrongDC++

Those are the ones I know have the feature. However, they all have more or less bad features, so I can't really recommend using any of them...

Morainaki
Posts: 1
Joined: 2004-06-09 21:36

Post by Morainaki » 2004-06-09 21:38

