Improved sources search
I am not really sure whether this option isn't already included, but it seems to me that it isn't.
I would like to see that when DC++ itself searches for an alternative source for a download, it looks for the one (if there is any) which has the most free slots out of its total number of slots and the fastest connection type.
A short example: there are 4 sources available:
1. 4 free slots out of 10, DSL
2. 5 free slots out of 5, T3
3. 1 free slot out of 15, T1
4. 6 free slots out of 6, Cable
It would automatically select source 2, because the download from it would be the fastest.
If that's possible.
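The ranking described above can be sketched roughly as follows. This is only an illustration: the connection-speed table, the `Source` fields and `pick_best` are hypothetical names, not DC++'s actual data structures.

```python
# Hypothetical sketch of the proposed source ranking; the speed table and
# Source fields are illustrative, not DC++'s real internals.
from dataclasses import dataclass

# Assumed nominal speeds (Mbit/s) for the advertised connection types.
SPEED = {"T3": 44.7, "Cable": 4.0, "DSL": 1.5, "T1": 1.544}

@dataclass
class Source:
    name: str
    free_slots: int
    total_slots: int
    connection: str

def pick_best(sources):
    """Prefer sources with free slots, then the best free/total ratio,
    then the fastest advertised connection type."""
    usable = [s for s in sources if s.free_slots > 0]
    if not usable:
        return None
    return max(usable,
               key=lambda s: (s.free_slots / s.total_slots,
                              SPEED.get(s.connection, 0.0)))

# The four example sources from the post above.
sources = [
    Source("1", 4, 10, "DSL"),
    Source("2", 5, 5, "T3"),
    Source("3", 1, 15, "T1"),
    Source("4", 6, 6, "Cable"),
]
```

With these four sources, the slot-ratio tie between sources 2 and 4 is broken by the faster advertised connection type, so source 2 wins, matching the example.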
The "fastest connection type" is unreliable, to say the least, since this is something each user sets on their own. It has nothing to do with their actual speed or their connection.
A problem that comes to mind with the "most free slots first" idea is that the autosearch for alternatives often adds these sources quite a while before they are actually needed, so the number of free slots indicated is no longer accurate.
-
- Forum Moderator
- Posts: 587
- Joined: 2003-05-07 02:38
- Location: Sweden, Linkoping
Re: Improved sources search
For picking the fastest source, I think my suggestion is better.
I have read your suggestion, but I think the biggest problem is that the information would not be current, and the process would then have to be repeated over and over again.
For example, DC++ would download 10-20 file lists or even more, and that would take a remarkable amount of time; in the meantime the speeds would also change, because new uploads would be started.
If the selected source were dropped because of low speed, and the next sources were also slow, the whole process might never end at all.
Perhaps a possible solution is simply dropping very slow downloads and selecting alternatives on my principle (if there are any), without calculating.
The performance would be better and there would be no threat of an unending chain.
I think that Wisp's suggestion is a really good one, and that it would improve the program. It's true that the speeds at which DC++ got the file lists, and sorted the users accordingly, could have changed considerably in the time between the program getting the file lists and actually switching to another source.
But this kind of sorting would at least be a qualified guess as to which users are fast, which in my opinion is better than the current system. At least it sorts out (or places last in the available sources list) the really slow users.
I also think that the suggestion is good, but it has to be set up so that it doesn't lead to an unending chain or a process which lasts too long.
The program should select just the sources with the most free slots (for example 5), calculate the speed from all of them, then check the slot ratios again at the end and pick the appropriate connection. By doing so it would pick the best connection currently available.
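The probe-then-pick procedure described above can be sketched like this. `select_source` and `measure_speed` are hypothetical names; the speed measurement (e.g. a short test transfer) is an assumed callback, not an existing DC++ facility.

```python
# Sketch of the refinement discussed above: probe only the few sources
# with the best slot ratios, then keep the fastest measured one.
# measure_speed is a hypothetical callback, not a DC++ API.

def select_source(sources, measure_speed, probe_limit=5):
    """sources: list of (name, free_slots, total_slots) tuples.
    Probe at most probe_limit sources, best slot ratio first."""
    open_sources = [s for s in sources if s[1] > 0]
    open_sources.sort(key=lambda s: s[1] / s[2], reverse=True)
    candidates = open_sources[:probe_limit]
    if not candidates:
        return None
    # Keep the candidate with the highest measured speed.
    return max(candidates, key=lambda s: measure_speed(s[0]))

# Hypothetical measured speeds (kB/s) for three probed sources.
speeds = {"a": 50, "b": 120, "c": 80}
best = select_source([("a", 2, 4), ("b", 3, 3), ("c", 1, 10), ("d", 0, 5)],
                     speeds.get)
```

Source "d" has no free slots and is never probed; of the remaining three, the fastest measured one is kept, limiting the probing work to a handful of sources per file.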
-
- Forum Moderator
- Posts: 366
- Joined: 2004-03-06 02:46
Re: Improved sources search
Wisp wrote: For picking the fastest source, I think my suggestion is better.
For picking the fastest source, I think your second idea is retarded.
Re: Improved sources search
PseudonympH wrote: For picking the fastest source, I think your second idea is retarded.
I don't have a second idea yet.
PseudonympH wrote: My mistake; I skimmed it and thought second paragraph meant second idea. It's still a stupid kluge.
I'm totally new to this forum, but I have already seen you three times saying that someone's idea is stupid without explaining why or giving any kind of good argument.
Is that some kind of bad habit of yours?
-
- DC++ Contributor
- Posts: 3212
- Joined: 2003-01-07 21:46
- Location: .pa.us
DolphinSI wrote: For example, DC++ would download 10-20 file lists or even more, and that would take a remarkable amount of time; in the meantime the speeds would also change, because new uploads would be started.
If the file lists were downloaded anyway, for queue matching, that would be OK. But intentionally downloading a file list just to see if I'm the fastest user is wasteful. (Plus, imagine if everyone was doing that and searched at [nearly] the same time.)
Of course, I'm well aware of it. But I'm also aware that DC++ already downloads too many file lists which don't end up being used.
For example: when I search for an alternative source, the program starts to download a great number of file lists without being told to. In my case the file lists are being downloaded for 5-10 minutes, which is a remarkable amount of time turned to waste. Imagine now that you would like to search for the fastest downloads for 10 files; the file lists would be downloaded for 50-100 minutes. That's a waste of time, in my opinion.
The biggest problem is with users with large shares and slow upload speed, because almost all of their slots are in use.
I myself (just like anybody else, I think) pick just the sources that have the most free slots and turn out to be the fastest.
The program could do that as well. It should take fewer sources (3-5 max) with the best slot ratio, calculate their speed and pick the best choice. It would be faster, because the program would pick fewer sources than it would in my original suggestion. If the source were cut, it would pick a free source on the same principle and there would be still less downloading.
PseudonympH wrote: a) nobody has taken me up on it yet
b) it's already been discussed ten thousand times
a) Calling someone's idea stupid is still a bad habit. The idea is just not good.
b) Discussion leads to answers, or at least to knowledge and more understanding of the problem discussed. You find a solution to a problem by trying to find the right way, not by picking the right way at once.
I think this feature is a hack to try to do what multisource downloading accomplishes without any work (besides the complete overhaul required of many of DC++'s internals).
I'm more willing to work on a long-term solution (multisource) than a short-term one (speed finding on existing sources). If you feel otherwise, feel free to contribute, just as I have.
GargoyleMT wrote: I think this feature is a hack to try to do what multisource downloading accomplishes without any work (besides the complete overhaul required of many of DC++'s internals). I'm more willing to work on a long-term solution (multisource) than a short-term one (speed finding on existing sources). If you feel otherwise, feel free to contribute, just as I have.
I'll agree on the multisource idea. If I understand it correctly, there are some mods that have multisource implemented already. I don't have the knowledge to judge whether these are good implementations or not, but maybe some of you guys can shed some light on this subject: what algorithms to use, how to implement it, and whether there are arguments for why it hasn't been implemented before.
"Nothing really happens fast. Everything happens at such a rate that by the time it happens, it all seems normal."
Guitarm wrote: I'll agree on the multisource idea. If I understand it correctly, there are some mods that have multisource implemented already.
As far as I understand, all of the multisource clients use the reverse connect code. I'm not sure if that's 100% true. And I'm not sure if any of the users of that code have altered it significantly.
Guitarm wrote: What algorithms to use, how to implement it, and whether there are arguments for why it hasn't been implemented before
Well, DC++ would reimplement it, regardless of the existing code. The Queue and Download Managers must be changed so that they can cope with ranges of a file being complete, instead of assuming 0 to (current file size) is contiguous. The files being downloaded will have to be open, yet written to in several different places, at different rates. The actual download requests will need to be either chunked into fixed sizes (using ADCGET or $?Get*Block), or determined more dynamically. This will make people's log files fairly ugly, so some better method of logging might be desired (or a default format that specifically includes the start position and range).
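The non-contiguous bookkeeping described above can be sketched as fixed-size chunks plus a per-file record of which chunks are complete. The 64 KiB chunk size and the `PartialFile` class are assumptions for illustration, not DC++'s actual design.

```python
# Sketch of non-contiguous download bookkeeping: the file is split into
# fixed-size chunks, and a per-file set records which chunks are done.
CHUNK = 64 * 1024  # hypothetical fixed request size (e.g. one chunked GET)

class PartialFile:
    """Tracks which fixed-size chunks of a downloading file are complete."""
    def __init__(self, size, chunk=CHUNK):
        self.chunk = chunk
        self.nchunks = (size + chunk - 1) // chunk
        self.done = set()

    def mark_done(self, index):
        self.done.add(index)

    def next_request(self):
        """Lowest incomplete chunk, so the file fills from the front."""
        for i in range(self.nchunks):
            if i not in self.done:
                return i
        return None

    def complete(self):
        return len(self.done) == self.nchunks

# A 200 KiB file splits into 4 chunks; chunks can finish out of order.
f = PartialFile(200 * 1024)
f.mark_done(0)
f.mark_done(2)
```

With this shape, the Queue Manager never has to assume that bytes 0 to (current file size) are contiguous; it only consults the chunk set.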
Multi-source downloading has 2 major drawbacks:
1) much more space is needed for incomplete files (unless you store the segments in separate files),
2) a higher total number of slots will be needed for the same total bandwidth the users have, which is pointless and potentially increases the number of timeouts when more connections are used at the same time.
Of course I don't have to use it, but I like the idea of choosing the best source a lot more than multi-source downloading.
paka wrote: 1) much more space is needed for incomplete files (unless you store the segments in separate files)
It requires no more space than the completed file + some bitmap that tells which chunks have been completed (which would probably be roughly the size of the TTHL, 24 kB).
paka wrote: 2) a higher total number of slots will be needed for the same total bandwidth the users have, which is pointless and potentially increases the number of timeouts when more connections are used at the same time.
I don't see this. If the slots remain the same, they'll be occupied for less time, because otherwise unused slots will reduce the amount of time needed to transfer a given file.
paka wrote: Of course I don't have to use it, but I like the idea of choosing the best source a lot more than multi-source downloading.
That's a quicker fix, but it will probably result in modifications to the existing code that may be hard to navigate or understand. With a clean multisource implementation, the code should be more understandable.
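The bitmap overhead GargoyleMT describes can be estimated with a quick calculation; the 64 KiB chunk size is an assumption, chosen only to show the order of magnitude.

```python
# One bit per chunk is enough to record which chunks are complete.
def bitmap_bytes(file_size, chunk_size=64 * 1024):
    """Size in bytes of a completed-chunk bitmap for a file."""
    chunks = (file_size + chunk_size - 1) // chunk_size
    return (chunks + 7) // 8

# A 700 MB file splits into 11200 chunks of 64 KiB, so the bitmap is
# only 1400 bytes: tiny next to the file and well under 24 kB.
print(bitmap_bytes(700 * 1024 * 1024))  # prints 1400
```

So the extra per-file bookkeeping is negligible; the disk-space concern is about preallocating the file itself, not about tracking the chunks.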
GargoyleMT wrote: It requires no more space than the completed file + some bitmap that tells which chunks have been completed (which would probably be roughly the size of the TTHL, 24 kB).
That's exactly what I meant. Maybe I wasn't precise enough: I have a queue of about 150 GB, about 10-20 GB of free space on the download partition, and I'm able to download successfully at the moment. I suppose you get the point now. With multi-segment downloading it's going to be impossible.
GargoyleMT wrote: I don't see this. If the slots remain the same, they'll be occupied for less time, because otherwise unused slots will reduce the amount of time needed to transfer a given file.
Usually the better slots (with higher speed, of course) are occupied anyway. What I'm afraid of is the situation when hub admins change the slot rules to higher minimal numbers because the demand for slots may increase.
GargoyleMT wrote: That's a quicker fix, but it will probably result in modifications to the existing code that may be hard to navigate or understand. With a clean multisource implementation, the code should be more understandable.
I suppose so. Also, it seems to me that implementing a best-source selection algorithm should be easier after multi-source downloads are added. When these features are (hopefully) present, it would be great if multi-source and/or best-source selection were optional, because of the disk space consumption with multi-source enabled (as in the situation above). The solution could be to set the maximal number of segments to 1 and, in that case, not create a file of its total size at the beginning of a download/when queueing.
BTW, the diffs of JDC++ 0.401 (http://jdcpp.free.fr/) with the best-source selection mod have a total size of 19 kB. That is something, considering that it's 14 files modified, but still it's not that much. The work's been done, but the patch wasn't accepted due to formal problems (see the bottom of the site). I understand these, but it's a pity that no one has made a successful attempt to comply with vanilla's code standard since then (I don't know C++ well enough myself).
paka wrote: That's exactly what I meant. Maybe I wasn't precise enough: I have a queue of about 150 GB, about 10-20 GB of free space on the download partition, and I'm able to download successfully at the moment. I suppose you get the point now. With multi-segment downloading it's going to be impossible.
However, with multisource, individual files will complete faster, so they spend less time in the temp directory. Besides, the multisource algorithm could be implemented in leapfrog fashion from the beginning of the file instead of in random order, so we don't need to allocate the whole file up front.
PseudonympH wrote: However, with multisource, individual files will complete faster, so they spend less time in the temp directory.
You're probably right. The question is: how much faster? 2 times? I don't think so. The total upload bandwidth will remain the same (unless ISPs change it), so multi-source can only optimise its use. With some slight manual management I usually have no problems using up my 1 Mb/s.
Still, I would have to reserve 150 GB for a 150 GB queue (or even 50 GB, if it were smaller due to faster downloads), and now I don't have to.
PseudonympH wrote: Besides, the multisource algorithm could be implemented in leapfrog fashion from the beginning of the file instead of in random order, so we don't need to allocate the whole file up front.
Yup, so there is a solution. The question is: how difficult will it be to implement?
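The leapfrog order can be sketched as always claiming the lowest chunk that is neither finished nor already being downloaded, so completed data stays near the front of the file and the incomplete file never needs to be preallocated to its full size. `leapfrog_assign` is a hypothetical helper, not the actual patch's algorithm.

```python
def leapfrog_assign(nchunks, done, in_progress):
    """Lowest chunk index that is neither done nor being downloaded,
    so the file grows from the beginning rather than in random order."""
    for i in range(nchunks):
        if i not in done and i not in in_progress:
            return i
    return None

# Three sources each claim the next-lowest free chunk of a 6-chunk file
# whose first chunk is already complete.
claimed = set()
order = []
for _ in range(3):
    c = leapfrog_assign(6, done={0}, in_progress=claimed)
    claimed.add(c)
    order.append(c)
```

The three sources "leapfrog" each other down the file, claiming chunks 1, 2 and 3; whichever finishes first claims the next free chunk.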
paka wrote: I suppose you get the point now. With multi-segment downloading it's going to be impossible.
No, I don't understand. You've queued 150 GiB of information, but don't have space for all of it. If all of your sources were available now, you'd also run out of space...
Also, google for Sparse Files on NTFS disks...
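A small sketch of the trick GargoyleMT alludes to: seeking past the end of a file before writing gives it its full logical size while, on sparse-capable filesystems (NTFS 5, ext4 and others), the unwritten hole occupies almost no disk blocks. The file name and size here are arbitrary.

```python
import os
import tempfile

# Give a file a 10 MiB logical size by writing a single byte at the end;
# on sparse-capable filesystems the hole consumes (almost) no disk blocks.
size = 10 * 1024 * 1024
path = os.path.join(tempfile.mkdtemp(), "sparse.bin")
with open(path, "wb") as f:
    f.seek(size - 1)
    f.write(b"\0")

# The logical size is the full 10 MiB on any filesystem; only the
# physical allocation differs.
logical = os.path.getsize(path)
```

This is why preallocating queued files need not consume the whole queue's worth of disk space up front on such filesystems, which is the crux of the disagreement below.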
paka wrote: Usually the better slots (with higher speed, of course) are occupied anyway. What I'm afraid of is the situation when hub admins change the slot rules to higher minimal numbers because the demand for slots may increase.
That might happen, but I don't see why it would - hub owners are just as interested in users downloading quickly as the users themselves are.
paka wrote: I understand these, but it's a pity that no one has made a successful attempt to comply with vanilla's code standard since then.
Only the original coder can submit the patch to arne, since he's transferring copyright on it, so that DC++'s source is under only one person's copyright.
GargoyleMT wrote: No, I don't understand. You've queued 150 GiB of information, but don't have space for all of it. If all of your sources were available now, you'd also run out of space...
But they aren't - this is the reality, and this is why it is possible. Limited bandwidth also keeps disk space from being consumed too quickly.
GargoyleMT wrote: Also, google for Sparse Files on NTFS disks...
Yeah, my fault that I'm not using NTFS5 but NTFS3, which isn't capable of handling sparse files. What if I had to use FAT32 for some reason? But OK, that's true, we shouldn't support obsolete solutions.
GargoyleMT wrote: That might happen, but I don't see why it would - hub owners are just as interested in users downloading quickly as the users themselves are.
But they do get influenced by users who demand more slots; I've simply seen that many times. And DC++ has a limit on transfer connection attempts made per unit of time, AFAIK.
GargoyleMT wrote: Only the original coder can submit the patch to arne, since he's transferring copyright on it, so that DC++'s source is under only one person's copyright.
True, but the author of this patch encourages resubmission of the patch (see the bottom of the JDC++ webpage). This is an implicit agreement to transfer the copyright to arne when someone else submits the modified patch.
GargoyleMT wrote: If it were challenged in a court of law, would that hold up? I suspect it would not.
You're probably right that if the JDC++ mod's author really wanted to commit some legal abuse, he might win a lawsuit. I read the statement as good will, to put the code he created into the main version of DC++.