Autosearch Tweak

Archived discussion about features (predating the use of Bugzilla as a bug and feature tracker)

jbyrd
Posts: 255
Joined: 2003-05-10 09:26
Location: no-la-usa-earth

Autosearch Tweak

Post by jbyrd » 2003-12-30 14:05

A lot of people (myself included) have been complaining about how many filelists are being downloaded as a result of the 0.305 autosearch feature.

I never would have complained if I had known that the result of my whining would be this:

-- 0.306 -- 
* Changed autosearch so that it only searches if less than 5 sources are online, this should stop galloping filelist downloads as well 
5 sources? Why 5 sources? This is such an arbitrary number, and it is really not that much better than the way we had it before...with 0 sources (if all 5 sources' slots are full). The key to autosearch is to find an alternate that is online [obviously] and has an open slot.

It is my suggestion, my plea, that this does not continue in versions after 0.306.

Here is what it should look like in the future:
Instead of setting an arbitrary number, I suggest we make it an option in the "downloads" tab of the settings window, exactly like the "upload slots" and "simultaneous downloads" selection.

This way the user will be able to set his priorities, and not have the client/developer setting them for him.
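A minimal sketch of what such an option could look like, in Python for brevity. This is not DC++ code: the function name, the parameter name and the idea of passing the setting in as an argument are all assumptions made for illustration. The only value taken from the changelog is the default of 5.

```python
# Hypothetical illustration only; none of these names exist in DC++.
DEFAULT_MIN_ONLINE_SOURCES = 5  # the fixed value from the 0.306 changelog


def should_autosearch(online_sources: int,
                      min_online_sources: int = DEFAULT_MIN_ONLINE_SOURCES) -> bool:
    """Return True if a queued file has fewer online sources than the user wants."""
    return online_sources < min_online_sources


# With the built-in value, a file with 5 online sources is never searched for.
print(should_autosearch(5))        # False
# A user who raised the setting to 10 would still get a search.
print(should_autosearch(5, 10))    # True
```

The point of the sketch is only that the threshold becomes a value the user supplies instead of a constant.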

And would someone please explain to me exactly how downloading filelists lowers the strain on hubs? I understand that you obtain the filelist from the alternate source (which was found via the hub, anyway). And I am under the impression that subsequent searches also resort to looking in the filelists ( :?: )...which leaves me very confused, because it has to query the hub in order to find additional results...unless by the grace of God an alternate source from file#1 also has file#2.

Please comment on the feature...as well as my question above.

-jbyrd
Hehe.

ivulfusbar
Posts: 506
Joined: 2003-01-03 07:33

Re: Autosearch Tweak

Post by ivulfusbar » 2003-12-30 14:38

jbyrd wrote:
And would someone please explain to me exactly how downloading filelists lowers the strain on hubs? I understand that you obtain the filelist from the alternate source (which was found via the hub, anyway). And I am under the impression that subsequent searches also resort to looking in the filelists ( :?: )...which leaves me very confused, because it has to query the hub in order to find additional results...unless by the grace of God an alternate source from file#1 also has file#2.

Please comment on the feature...as well as my question above.

-jbyrd
It's very simple: consider a user downloading an MP3 album. If the user has no online sources, the old system will go into auto-search mode and individually broadcast a search for every MP3 in his queue. This will cost "number of songs" * "number of users online" (more or less, not counting passive $SR etc.).

In the match-queue example: you go into auto-search mode, search for the first song, and get matches from n users. You then initiate n client<->client connections and have a very high probability of getting matches for all the files in your queue. This costs more or less "number of users online", since client<->client communication is not broadcast.

The match-queue is not perfect. Since it can corrupt your queue very fast, it is a little more restrictive in finding alternative sources than auto-search for alternative sources is, i.e. two different string-comparison algorithms are used. This can sometimes lead to you downloading the same list over and over again, but it doesn't happen that often.

So basically, the following is what makes it work well:

A user who matches one file in an autosearch will probably match more of them, because people often download files that belong together, so a possible alternative source tends to have more than one match. The extra bandwidth this uses is unimportant because it is not broadcast, which makes it better than searching over and over again without exploiting that fact.

(Searching is expensive; downloading filelists is not.)
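To put rough numbers on the comparison above, here is a small Python sketch of the two cost estimates. The function names and the example figures (12 songs, 500 users) are invented for illustration. The model counts only hub-broadcast search messages and ignores passive $SR replies, exactly as the estimate above does.

```python
# Toy cost model; the names and example numbers are assumptions, not DC++ code.

def broadcast_cost(num_files: int, num_users: int) -> int:
    """Old behaviour: one hub-broadcast search per queued file."""
    return num_files * num_users


def match_queue_cost(num_files: int, num_users: int) -> int:
    """Match-queue behaviour: one broadcast search for the first file.

    The filelist downloads that follow are client<->client and are
    treated as free for the hub in this model.
    """
    return num_users if num_files > 0 else 0


songs, users = 12, 500
print(broadcast_cost(songs, users))    # 6000
print(match_queue_cost(songs, users))  # 500
```

The second figure does not grow with the size of the queue, which is the whole argument.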
Everyone is supposed to download from the hubs, - I don´t know why, but I never do anymore.

ivulfusbar
Posts: 506
Joined: 2003-01-03 07:33

Post by ivulfusbar » 2003-12-30 14:49

When it comes to the magic number five, I must say I prefer 3 or 7. They have higher symbolic power than 5 in most religions.

Having users fill in one more number will probably confuse them more than help them. (I hope no one will mention the DF I did in this context a couple of days earlier *blinks*).

jbyrd
Posts: 255
Joined: 2003-05-10 09:26
Location: no-la-usa-earth

Post by jbyrd » 2003-12-30 15:08

Well, it seems like the only place I was wrong was...
I wrote: unless by the grace of God an alternate source from file#1 also has file#2.
But, what good is a single source that is an alternate for, say 10 files? Well, I guess it depends how large those files are. But, if we are talking >100MB, then that single source is practically null in 9 of the other files because he can't upload *that* to you until he is finished uploading the *first* file you asked for.

I don't search MP3s very often at all, so I find that the excessive downloading of filelists isn't very beneficial for me. Oh well, no problem.

It seems to me (from the program behavior I've experienced) that it initiates a search anyway. I was under the impression that the *number* of results returned wasn't a concern to the hubs, but that it was simply the act of initiating a search. If this is the case, then nothing is really being saved.

So, the excessive filelist downloads are the result of not being able to perform an ADL search on previously downloaded filelists? Ahh. Well, I guess a major help would be the "ability to search old filelists" feature (yes, the one I was strongly against). Heck, at least that way, you could download the filelist every hour instead of 10X/hour.

norti
Posts: 34
Joined: 2003-10-22 14:42
Location: Hungary

Post by norti » 2003-12-30 16:03

jbyrd wrote: So, the excessive filelist downloads are the result of not being able to perform an ADL search on previously downloaded filelists? Ahh. Well, I guess a major help would be the "ability to search old filelists" feature (yes, the one I was strongly against). Heck, at least that way, you could download the filelist every hour instead of 10X/hour.
Or another solution would be to have 2 different folders for filelists: one for the usual (manually) downloaded ones and another for the auto-downloaded ones.
.: Norti :.

jbyrd
Posts: 255
Joined: 2003-05-10 09:26
Location: no-la-usa-earth

Post by jbyrd » 2003-12-30 16:12

Yeah, but it seems that the client doesn't have the ability to efficiently search multiple filelists if they are compressed (.bz2 format). The easier, the better, though. :)

ivulfusbar
Posts: 506
Joined: 2003-01-03 07:33

Post by ivulfusbar » 2003-12-30 19:21

jbyrd wrote: But, what good is a single source that is an alternate for, say 10 files? Well, I guess it depends how large those files are. But, if we are talking >100MB, then that single source is practically null in 9 of the other files because he can't upload *that* to you until he is finished uploading the *first* file you asked for.
I answered the part about bandwidth use. And if you don't think a reduction in bandwidth use from N*m to N is enough, then I don't know what to tell you.

ivulfusbar
Posts: 506
Joined: 2003-01-03 07:33

Post by ivulfusbar » 2003-12-30 19:24

jbyrd wrote: So, the excessive filelist downloads are the result of not being able to perform an ADL search on previously downloaded filelists?
You can do this today already.

jbyrd
Posts: 255
Joined: 2003-05-10 09:26
Location: no-la-usa-earth

Post by jbyrd » 2003-12-31 08:13

Well, it's obvious that I don't fully understand the details of the protocol. :D

But, anyway, I would still like to have an option to choose how many alternate sources I want...instead of having some arbitrary preset that has no symbolic meaning whatsoever.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2003-12-31 09:19

jbyrd wrote: It seems to me (from the program behavior I've experienced) that it initiates a search anyway. I was under the impression that the *number* of results returned wasn't a concern to the hubs, but that it was simply the act of initiating a search. If this is the case, then nothing is really being saved.
Here's a correction: if you search in passive mode, all of the search results will be routed through the hub, so the number of search results does matter.
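As a rough illustration of that correction, here is a toy Python model of how many messages the hub handles for one search. The function name and the example numbers are invented, and the model assumes that an active searcher receives results directly from the other clients while a passive searcher receives every result via the hub.

```python
# Toy model only; names and numbers are assumptions for illustration.

def hub_messages(num_users: int, num_results: int, passive: bool) -> int:
    """Messages the hub handles for a single search under this model."""
    broadcast = num_users                            # search relayed to every user
    relayed_results = num_results if passive else 0  # passive results pass through the hub
    return broadcast + relayed_results


print(hub_messages(500, 30, passive=False))  # 500
print(hub_messages(500, 30, passive=True))   # 530
```

Under these assumptions the broadcast dominates in both modes, and the passive case adds a term that grows with the number of results.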

If matching someone's list against your queue results in more sources, that will translate into fewer auto-searches, which is definitely a win for the hub.

(Aside: Does anyone have the stats for the command breakdown of a typical DC hub handy? I think that might help...)

ivulfusbar
Posts: 506
Joined: 2003-01-03 07:33

Re: Autosearch Tweak

Post by ivulfusbar » 2004-01-01 05:26

ivulfusbar wrote:
This will cost "number of songs" * "number of users online" (more or less, not counting passive $SR et.c..)
Well, the passive $SR is insignificant in auto-search replies, since they are 1<-->1 and not broadcasts.

*grins* ,))

jbyrd
Posts: 255
Joined: 2003-05-10 09:26
Location: no-la-usa-earth

Post by jbyrd » 2004-01-02 00:10

GargoyleMT wrote: If matching someone's list against your queue results in more sources, that will translate into fewer auto-searches, which is definitely a win for the hub.
Thanks for the clarification, Gargoyle. :)
ivulfusbar wrote:
jbyrd wrote: So, the excessive filelist downloads are the result of not being able to perform an ADL search on previously downloaded filelists?

You can do this today already.
Then why do we have to download a new filelist every 10 minutes? It seems unlikely that it would be refreshed that often.


What I am afraid of is penalizing active users for the limits of passive users. I am aware that there are a whole lot of them in the hubs, and that hubs have limited resources. I just don't like the idea of creating a feature that revolves around the limitations of passive mode, leaving active users out in the cold.

ivulfusbar
Posts: 506
Joined: 2003-01-03 07:33

Post by ivulfusbar » 2004-01-02 04:56

I don't understand your last comment at all. I have no idea what you are talking about.

jbyrd
Posts: 255
Joined: 2003-05-10 09:26
Location: no-la-usa-earth

Post by jbyrd » 2004-01-03 02:01

ivulfusbar wrote:I don't understand your last comment at all. I have no idea what you are talking about.
Hopefully I can clarify (feel free to patch my holes in logic/fact).

I thought the problem with the autosearch feature (and the way it is now) was the strain on hubs. In particular, passive users provide an extraordinary burden because all results must be routed through the hubs. On the other hand, active users only require a search initiation or something along those lines...in other words they do not require every single search result to be sent through the hub.

Where did I get this crazy idea? Besides the fact that I actually "know" it...
GargoyleMT wrote: if you search in passive mode, all of the search results will be routed through the hub, so the number of search results does matter.
...which leads me to believe that this is not the case for active users.

Now, if you're not disputing this fact, and just my reasoning (again, assuming that my facts are straight)...I would hate to see a feature be watered down because of passive users. I would personally rather reward active users for configuring correctly and lowering the burden on hubs.

How about this (if the other option is out of the question):
Increase the number of sources obtained when the priority level is increased, i.e.:
"Normal" = 10
"High" = 15
"Highest" = 20
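The priority idea above could be sketched like this in Python. The three numbers are the ones proposed above; the names, the fallback of 5 for any other priority and the comparison itself are assumptions for illustration, not DC++ code.

```python
# Hypothetical sketch; only the three numbers come from the proposal above.
SOURCES_WANTED = {"Normal": 10, "High": 15, "Highest": 20}
FALLBACK_SOURCES = 5  # assumption: keep the fixed 0.306 value for other priorities


def should_autosearch(priority: str, online_sources: int) -> bool:
    """Search only while a file has fewer online sources than its priority asks for."""
    wanted = SOURCES_WANTED.get(priority, FALLBACK_SOURCES)
    return online_sources < wanted


print(should_autosearch("Normal", 12))   # False
print(should_autosearch("Highest", 12))  # True
```

A file bumped to "Highest" keeps searching at 12 sources, while a "Normal" one stops at 10.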

:?:

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-01-03 09:18

Automated searches for alternatives are the primary cause of bandwidth usage on hubs, whether active or passive. Passive ones are merely more draining (resource-wise).

You have good points about the frequency of list downloading, and that's what the two originally quoted items from 0.306 are intended to do: stop so many lists from being downloaded.

 -- 0.306 --
* Changed autosearch so that it only searches if less than 5 sources are online, this should stop galloping
  filelist downloads as well
* Auto-match queue is only done on exact match
Creating a scheme where you can tell whether a remote filelist is updated is a lot more work than the measures above.

jbyrd
Posts: 255
Joined: 2003-05-10 09:26
Location: no-la-usa-earth
Contact:

Post by jbyrd » 2004-01-03 11:07

Gargoyle, I understand that the changes were implemented to decrease the number of filelist downloads. But, what I'm saying is that it shouldn't be at the expense of finding alternate sources.

My idea of a solution to the existing problem would be this:

If we are concerned with the number of filelist downloads, don't download one every 5 minutes.

If we are concerned with the strain on hubs, search for alternates less often. For my purposes, 5 minute auto searches can be excessive at times. Why not 10 minutes, or 15 minutes? Better yet, make it an option in the settings tab and let me choose how often I need to search for alternates.

Of course these settings should have lower limits to prevent excessive searches/filelist downloads. I just enjoy choosing when possible.
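Something like the following Python sketch is what I have in mind for the lower limit. The names are made up, and the floor of 5 minutes is an assumption based on the current 5-minute searches; the actual limit would be up to the developers.

```python
# Hypothetical sketch; the names and the 5-minute floor are assumptions.
MIN_INTERVAL_MINUTES = 5


def effective_interval(requested_minutes: int) -> int:
    """Honour the user's choice, but never search more often than the floor allows."""
    return max(requested_minutes, MIN_INTERVAL_MINUTES)


print(effective_interval(15))  # 15
print(effective_interval(1))   # 5
```

The user can always slow the searches down, but cannot make them more frequent than the built-in minimum.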