The Unofficial Partial File Sharing Thread

Archived discussion about features (predating the use of Bugzilla as a bug and feature tracker)

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

The Unofficial Partial File Sharing Thread

Post by PseudonympH » 2004-07-11 22:59

I've been giving some thought recently as to how to implement PFS. Obviously, I'm talking about doing it over ADC, since doing it over NMDC would be too much of a pain.

It seems a given that results from people who only have part of the file should only be returned for TTH searches. There's no real file path for them to return in the results, so it seems they should just return the TTH, the size, and an argument showing which parts they have. I'm thinking this should be a set of bits, one bit for each chunk, with 1 meaning the person has that chunk and 0 meaning he doesn't. This string of bits would then, as is standard convention, be base32 encoded. I suppose it's bad form to require support for TTHF to get partial files, but if the client can do PFS and TTH, it's not that much more to have support for that.
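Roughly, that encoding might look something like the following sketch (Python; the 64 KiB chunk size and the MSB-first bit order are just placeholders, not anything agreed on):

Code:

import base64
import math

CHUNK_SIZE = 64 * 1024  # placeholder granularity; 64 KiB happens to be DC++'s minimum leaf size

def encode_have_bitmap(have_chunks, file_size, chunk_size=CHUNK_SIZE):
    """Pack a set of completed chunk indices into a base32 string.

    One bit per chunk, MSB first within each byte: 1 = have it, 0 = don't.
    """
    num_chunks = math.ceil(file_size / chunk_size)
    bits = bytearray(math.ceil(num_chunks / 8))
    for i in have_chunks:
        bits[i // 8] |= 0x80 >> (i % 8)
    # base32 without the '=' padding, the same way TTH roots are usually written
    return base64.b32encode(bytes(bits)).decode().rstrip("=")

# Example: a 1 MiB file split into 16 chunks, with the first 5 complete
print(encode_have_bitmap({0, 1, 2, 3, 4}, 1024 * 1024))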

It seems like it might be a good idea to put some kind of flag (like a "PF1" argument) in the SCH to mean that the searcher can accept partial results. However, this would mean that full sources whose clients don't support PFS would ignore the search, because they're required to ignore a search containing a field they don't recognize. This would effectively double the number of searches that would have to be made (one search with the flag for partial sources, and one without it for everyone else). I want to avoid this, so is there another way of doing it, other than the naive approach of always returning partial results, which would confuse old clients that would then repeatedly try to download the file and have to be sent "file part not available" errors?
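Just to illustrate the idea (PF1 is only my proposed name, and the exact ADC framing below is an approximation, not anything taken from the draft):

Code:

def build_search(my_sid, tth_root, partial_ok):
    """Sketch of a broadcast TTH search; 'PF1' is the proposed flag, not an existing field."""
    params = ["TR" + tth_root]
    if partial_ok:
        # The worry above: clients that don't recognize this field would drop
        # the whole search, so you'd have to send a second search without it.
        params.append("PF1")
    return "BSCH " + my_sid + " " + " ".join(params)

print(build_search("AAAB", "LWPNACQDBZRYXW3VHJVCJ64QBZNGHOHHHZWCLNQ", True))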

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32

Post by cologic » 2004-07-12 04:28

Define a chunk: is it constant-sized across the network, as in eDonkey? Within a file, as in BitTorrent? Follow the granularities of the TTH tree for easy validation? If you use a TTH tree, will you assign a constant one bit per leaf node? Even if you do, that doesn't solve the problem if different uploading clients store the tree to differing depths. In that case, would you mandate that if a client's leaves are n*1024 bytes, the presence or lack of a "chunk" would be indicated by n bits rather than 1?

One could bypass a bitmap approach altogether by storing ranges. They have their own problems, of course...
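For what it's worth, here's a rough look at the trade-off (Python, all naming mine): a per-chunk bitmap versus runs of chunk indices. A badly fragmented download favours the bitmap; a few long runs favour ranges.

Code:

def chunks_to_ranges(have_chunks):
    """Collapse a set of completed chunk indices into (first, last) runs."""
    ranges = []
    for i in sorted(have_chunks):
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], i)   # extend the current run
        else:
            ranges.append((i, i))             # start a new run
    return ranges

have = set(range(0, 100, 2))              # every other chunk of a 100-chunk file
print(len(chunks_to_ranges(have)))        # 50 ranges, two numbers each
print(100 // 8 + 1)                       # versus a 13-byte bitmap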

Just some thoughts...

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-07-12 10:09

I think, at most, the percentage of the file that's done should be returned in the search result, as well as some flag that indicates that it's a partial result.

Your client will have to talk to the remote end periodically anyhow (to see if it got any new chunks), so the format and communication of which chunks are complete should go entirely in the client-to-client (direct) communication, not in the search result.

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-07-12 23:28

cologic wrote:Follow the granularities of the TTH tree for easy validation? If you use a TTH tree, will you assign a constant one bit per leaf node?
Yes, that's what I meant.
Even if you do, that doesn't solve the problem if different uploading clients store the tree to differing depths. In that case, would you mandate that if a client's leaves are n*1024 bytes, the presence or lack of a "chunk" would be indicated by n bits rather than 1?
You can tell what resolution is being used from the length of the file and the length of the "what parts I have" data. I'd be very surprised if people didn't standardize on what DC++ is currently using, though.

I thought about using ranges, but that would give you a *lot* more data to transfer.
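Something like this is what I mean by inferring the resolution (the rounding is an assumption on my part, and base32 padding means the bit count is only known to within a byte, so in practice you'd probably round to the nearest power of two anyway):

Code:

import math

def infer_chunk_size(file_size, bitmap_bits):
    """Given the file size and the number of bits in the "what parts I have"
    field, recover the chunk granularity the sender is using."""
    # One bit per chunk, so the chunk size is the file size divided by the
    # number of bits, rounded up so the last bit covers the tail of the file.
    return math.ceil(file_size / bitmap_bits)

# A 10 MiB file advertised with a 160-bit bitmap implies 64 KiB chunks.
print(infer_chunk_size(10 * 1024 * 1024, 160))  # 65536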

Lundis
Posts: 53
Joined: 2004-06-30 11:47

Post by Lundis » 2004-07-13 07:05

This seems incredibly complicated and would require new things in the DC protocol (I think).
It wouldn't be compatible with older DC++ versions or other types of DC clients.
You can't compare the current TTH with hashes of partial files; this would require hashing of the parts. BitTorrent has this, so use that instead, or just use RAR volumes and get over it.

In the hubs where I'm an OP, I always kick incomplete downloads with no hesitation.
Maybe you're just looking for a reason to share that crap.

TheParanoidOne
Forum Moderator
Posts: 1420
Joined: 2003-04-22 14:37

Post by TheParanoidOne » 2004-07-13 07:17

Lundis wrote:This seems incredibly complicated and would require new things in the DC protocol (I think).
Re-read PseudonympH's first line.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-07-13 11:08

Lundis wrote:In the hubs where I'm an OP, I always kick incomplete downloads with no hesitation.
Maybe you're just looking for a reason to share that crap.
Maybe you should research a bit before coming to snap judgements? The Gnutella network, the eDonkey network, and the WinMX network all have partial file sharing, and there's no possibility of thinking that a file is complete when it is not. WinMX returns the percentage done in the search results and shows them in a different color. eDonkey and Gnutella both do source exchange between downloading clients, and the downloaders share their partial files, allowing them to download from each other faster. In both networks, clients do not report a partial file as a search result (unless they've seen it complete recently, so you can't start a download that's orphaned with no complete sources).
Lundis wrote:You can't compare the current TTH with hashes of partial files; this would require hashing of the parts. BitTorrent has this, so use that instead, or just use RAR volumes and get over it.
Please don't confuse the root TTH hash with the hashes of the leaves (segments) of the file. The leaf hashes allow you to verify small segments of the file - something like (I'm not checking against the source right now) a minimum segment size of 64 kilobytes and a maximum of 1/512th of the file. Once the data in one of those segments is complete, you can hash it and know whether it's a good chunk or not. The system is functionally identical to BitTorrent.
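One way to read that rule is "at least 64 KiB per segment and at most 512 segments per file", which is roughly how I remember the DC++ hasher behaving; here's a sketch under that assumption (tiger_hash is a stand-in, since Python's hashlib has no Tiger implementation):

Code:

MIN_BLOCK = 64 * 1024   # minimum segment size mentioned above
MAX_LEAVES = 512        # assumed cap on the number of stored leaves

def leaf_block_size(file_size):
    """Smallest power-of-two multiple of 64 KiB that keeps the leaf count at or below 512."""
    block = MIN_BLOCK
    while block * MAX_LEAVES < file_size:
        block *= 2
    return block

def verify_segment(data, index, leaf_hashes, tiger_hash):
    """Check one completed segment against the stored TTH leaf.

    tiger_hash is assumed to be a callable returning the Tiger tree leaf
    digest for a block of data; it isn't provided here.
    """
    return tiger_hash(data) == leaf_hashes[index]

print(leaf_block_size(5 * 1024 ** 3))  # a 5 GiB file -> 16 MiB segments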

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-07-13 15:25

Thanks for doing it for me, guys. :wink:
