Distributed DC network

Technical discussion about the NMDC and <a href="http://dcpp.net/ADC.html">ADC</a> protocols. The NMDC protocol is documented in the <a href="http://dcpp.net/wiki/">Wiki</a>, so feel free to refer to it.

Moderator: Moderators

seeSharp
Posts: 24
Joined: 2003-04-19 10:03

Distributed DC network

Post by seeSharp » 2003-04-19 10:28

Hi all!

I don't know whether it was discussed earlier or not, but I didn't find anything on this forum about this topic.

Background: we are running a semi-closed HUB network, with 900+ users on the biggest hubs. We can't let in more users; we have tried hacked NMDC HUBs, PtokaX, and now a Linux HUB software. It seems that the network utilization gets too high with that many users. All of the hubs are DC++ only (registered users can use other clients too, for now DCGui).

Would it be possible to extend the DC protocol to support not only the current network topology (one HUB - many Clients) but a distributed one too?

My idea would be:
The client should connect the way it does now, and then it would receive 0-1-2-... other IP addresses where it should connect. Upon successful connection, it would then notify the hub software.

As we see, most of the outgoing network traffic is:
$Search commands, $Hello/$Quit/$MyNick, Main chat.

Network models could be:
One HUB server for authenticating, user login/logout management,
One for searching, and chat.

This could then cut the traffic by half.

The other proposal would be to allow clients with good network connections to become relay agents, creating a three-level network. Other clients could then connect to those relaying ones. So most commands sent to everyone would be sent to only about 10% of the users, which would then simply forward those commands to the clients connected to them.

One client's load of relaying to 5-20 others wouldn't be too big, while the HUB's outgoing traffic could be cut by a factor of 10. This would make hubs with more than 2000 users possible.

Combining the two proposals - more servers and relaying clients - would bring paradise.

As far as I have investigated this topic, it wouldn't require too much coding. Security issues could be cleared up too.

Awaiting answers, seeSharp.

seeSharp
Posts: 24
Joined: 2003-04-19 10:03

DDCP details

Post by seeSharp » 2003-04-19 11:15

Well, some details about how I imagined the whole thing. I will describe the situation where some clients are relaying HUB traffic.

So, upon connecting to the HUB, client A should notify the HUB that it is able to relay HUB traffic, sending connection details and indicating the max. number of connections accepted, e.g.:
$Supports ... DDCP|
$ddcReady2Relay 127.0.0.1:1413 30|
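A tiny parser for the proposed command might look like this (a Python sketch; the function name is invented, and the field layout is taken from the example above):

```python
def parse_ready_to_relay(raw: str):
    """Parse the proposed '$ddcReady2Relay <host>:<port> <max>|' command."""
    cmd, addr, limit = raw.rstrip("|").split(" ")
    if cmd != "$ddcReady2Relay":
        raise ValueError("unexpected command: " + cmd)
    host, port = addr.rsplit(":", 1)   # rsplit keeps hosts with ':' safe
    return host, int(port), int(limit)

# parse_ready_to_relay("$ddcReady2Relay 127.0.0.1:1413 30|")
# -> ("127.0.0.1", 1413, 30)
```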

When a client B connects to the HUB, it could notify the HUB that it knows about the ddc protocol extensions:
$Supports ... DDCP|

The HUB manages connections, trying to keep the number of outgoing mass messages as low as possible, so it instructs the new client B to connect to a relay server:
$ddcRelayServer 127.0.0.1:1413|
Along with this, a message should be sent to client A to notify him that the HUB decided to hook someone up to him (just for security):
$ddcRelayClient client B|

Client B then tries to connect to client A; when successful, it notifies the HUB:
$ddcRelayConnected nick|

The HUB then sets the user's relayServer property to that nick. From then on, mass messages won't be sent to him.

$ddcRelayDisconnected nick|
could be used to indicate to the HUB that a connection has been closed.

If client A receives a $Quit client B|, it should close its connection to him.

The HUB message parser should be split into two parts: one for private messages ($PrivMsg /or similar, I don't remember.../, kicks, etc...) and another one for mass messages like $Hello, $Quit, $Search, $MyInfo, etc...

The socket connection to the HUB should be used to send all messages to the HUB, and incoming data should be parsed for both private AND mass messages.

The connection(s) to other "servers" should be parsed only for mass-messages.

The relaying logic would be simple too: client A has a list of users connected to him; if he gets a mass message, he simply has to forward it on those connections. By the way, the number of accepted relay connections should be configurable on a per-hub basis.
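The forwarding loop described here could be sketched like so (Python; the RelayClient class and its method names are hypothetical, and MASS_COMMANDS lists the broadcast commands discussed in this thread):

```python
# Sketch of the relaying logic, assuming NMDC-style '|'-terminated commands.
MASS_COMMANDS = {"$Search", "$Hello", "$Quit", "$MyINFO"}

class RelayClient:
    def __init__(self, max_relayed=30):
        # per-hub configurable limit on accepted relay connections
        self.max_relayed = max_relayed
        self.downstream = []          # connections of clients relayed by us

    def accept(self, conn):
        if len(self.downstream) >= self.max_relayed:
            return False              # full; the hub should pick another relay
        self.downstream.append(conn)
        return True

    def on_hub_command(self, raw: str):
        # forward mass messages verbatim to every relayed client
        name = raw.split(" ", 1)[0].rstrip("|")
        if name in MASS_COMMANDS:
            for conn in self.downstream:
                conn.send(raw)
```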

Security: if we are paranoid enough, the protocol could be signed using a private/public key system. Upon connection, every client would receive the public key of the HUB; after that, every packet received through a relay server could be verified using this key. The relaying clients would be unable to modify the packets.
I'm not sure if it's really needed, because it makes everything much more complicated, and I think it would be enough if the HUB software chose relaying servers among users marked by OPs as "trusted".
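For illustration only: the "relays can't modify packets" property can be demonstrated with any digital signature. The sketch below uses a Lamport one-time signature because it needs nothing beyond a hash function from the Python standard library; it is not the PGP-style scheme proposed here, and a real deployment would use an established crypto library.

```python
import hashlib
import secrets

def keygen():
    # private key: 256 pairs of random 32-byte preimages (one-time use!)
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32))
          for _ in range(256)]
    # public key (what the hub would hand to clients): hashes of the preimages
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest())
          for a, b in sk]
    return sk, pk

def _bits(msg):
    digest = hashlib.sha256(msg).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(256)]

def sign(sk, msg):
    # reveal one preimage per bit of the message digest
    return [sk[i][b] for i, b in enumerate(_bits(msg))]

def verify(pk, msg, sig):
    # a relay that alters the message can't produce matching preimages
    return len(sig) == 256 and all(
        hashlib.sha256(s).digest() == pk[i][b]
        for i, (s, b) in enumerate(zip(sig, _bits(msg))))
```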

I think it's more than enough for now, feel free to tell your opinions and ask questions.

seeSharp

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-19 20:50

This is one of the more interesting DC++-related ideas I've come across, and I've thought about it before as well.
My idea would be:
The client should connect the way it does now, and then it would receive 0-1-2-... other IP addresses where it should connect. Upon successful connection, it would then notify the hub software.
Hrm, the hub would have to keep track of who's passive and such, 'cause they can't take incoming connections. The hub, at the moment, holds almost no state, and certainly doesn't (usually) know whether a user's passive or active, and I'd rather keep it this way - keep as little state as feasible about the users on the hub.
[description of message signing/encryption]
I'm not sure if it's really needed, because it makes everything much more complicated, and I think it would be enough if the HUB software chose relaying servers among users marked by OPs as "trusted".
I would consider using signed messages critical to the practicality and trustworthiness of such a system. However, one would have to devise a way to detect relay stations which simply drop messages, as well; message signing doesn't help here, I'd think.

If you want simplicity, I would again suggest a dchub-type hub cluster (maybe other hubs implement it too?), where the hubs mirror all messages to each other, in a transparent manner to clients, which may view it as one logical hub.

It helps bandwidthwise because:
With n clients, each producing an average of x kB/s of messages to be broadcast to all connected users, the required hub downstream bandwidth / rate of messages to be broadcast is:

Code: Select all

(n*x)
It then has to send that (n*x) back out to all n users (including the one who sent it out originally):

Code: Select all

n*(n*x) = n^2*x
Thus, main chat, $MyINFO, $Hello, $Quit, active search, and all the other broadcast parts of the client-hub protocol require n^2*x upstream bandwidth (passive search is worse, but harder to analyze without assuming a certain response rate by clients, and O(n^2) scaling is bad enough).
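As a quick numeric illustration of that O(n^2) growth (the figures below are made up for illustration, not measurements):

```python
def hub_upstream(n, x):
    """kB/s a single hub must upload to rebroadcast n users' x kB/s each
    back to all n users: n*(n*x), per the derivation above."""
    return n * (n * x)

# e.g. 900 users at a hypothetical 0.01 kB/s of broadcast traffic each
# already demands 8100 kB/s of hub upstream; doubling n quadruples it.
```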

To generalize this to a hub cluster of h hubs (holding a total of n users), note that each peer hub requires as much upstream bandwidth as a client connected to the hub - but it represents, say, n/h users. Now, assuming every hub is connected to Lg(h) other hubs, each hub needs to send x to its Lg(h) peers: x*Lg(h). However, it only has n/h users, so total upload bandwidth required is:

Code: Select all

x*(effective_users)^2 = x*(n/h+Lg(h))^2
To compare this with the single hub case (which now appears as a special case when h=1):

Code: Select all

n^2*x
and find the decrease in bandwidth k with a hub cluster with n users and h hubs over one hub with those n users:

Code: Select all

n^2/k = (n/h+Lg(h))^2
n/sqrt(k) = n/h+Lg(h)
k = (h*n)^2/(n+h*Lg(h))^2
Which is quite favorable.

Further, it doesn't introduce any new trust issues, as hubs are already trusted by clients on the DC network.
The relaying logic would be simple too: client A has a list of Users connected to him, if he gets a mass-message he simply has to forward it on those connections. By the way, the number of accepted relay connections should be configurable on a per-hub basis.
Unless there's a loop in the graph. You can avoid this by design (and thus not have to design a protocol able to withstand it) with a controlled hub cluster, but it's harder with autonomous clients, so in practice the protocol would have to be able to detect & handle it, a complication of note.

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-19 21:27

Whoops.

Code: Select all

x*(effective_users)^2 = x*(n/h+Lg(h))^2
is wrong. It should be:

Code: Select all

(n/h+Lg(h))*(n*x)
because it still has to send everyone's messages out to its users, not just those from its own hub. Thus,

Code: Select all

n^2/k = (n/h+Lg(h))*n
k = h*n/(n+h*Lg(h))
a much prettier formula, which can be approximated as k roughly proportional to h when h is small compared to the number of users. For an average of u users/hub:

Code: Select all

h = n/u
k = (n/u)*n/(n+(n/u)*Lg(n/u))
k = n/(u+Lg(n/u))
Which exhibits fairly proportional growth with reasonable u's.
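A quick numeric check of this corrected formula (assuming Lg means log base 2, which the posts don't state explicitly):

```python
import math

def savings_factor(n, u):
    """k = n/(u + Lg(n/u)) from the corrected derivation above:
    bandwidth savings of a cluster of h = n/u hubs over a single hub
    serving all n users. Lg is taken as log base 2 (an assumption)."""
    return n / (u + math.log2(n / u))

# single-hub special case: u == n gives h == 1 and k == 1 (no savings);
# 2000 users over hubs of 500 give roughly a 4x reduction.
```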

HaArD
Posts: 147
Joined: 2003-01-04 02:20
Location: Canada http://hub-link.sf.net
Contact:

Post by HaArD » 2003-04-19 22:21

cologic,

Can't get my head wrapped around your math right now but I'll read it later...

In the meantime, I think I read something about passing chat between two hubs seamlessly...

Try this... http://flixd.no-ip.com Click the MHC BOT link.

seeSharp
Posts: 24
Joined: 2003-04-19 10:03

ddcp

Post by seeSharp » 2003-04-20 02:21

Well,

the simpler one first :)

HaArD: we not only want to connect the chat, but to expand the maximum user limit on the HUB. The Linux HUB which we are running now supports this; the problem is that each HUB runs well with 600-900 users, but in linked mode each can hold just a few hundred. Above that, they get out of sync in just a few minutes, and then every user is disconnected and we had to restart both HUBs.

And the incoming traffic and user handling are not a problem, so I think it would be easier to solve the problem of the outgoing traffic.

cologic:
your math is really long, and I have to go now, so I will check it later, but:

Active/Passive mode:
Passive-mode clients, and those who don't want to relay, would send something like this:
$ddcReady2Relay 127.0.0.1:1413 0|
to indicate that they don't want anybody to connect to them. Or they simply don't send this message.

Sorry, I have to go now, I will come back later today.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-04-20 02:56

A move to a more distributed system would more naturally come after certain other features have been put in place.

There are some stages to evolving DC into something much better than what it is now. I used to have a really good idea of these stages in my head, but I don't participate in these forums as often anymore. It's something like the following, though:

1. Ratings system (or some other game theoretic addition)
2. multisource downloading, hash-based searching
3. Further distribution of resources (what you're suggesting)

The first two will help lend you global namespaces, which would be very useful in a distributed system. They would also help formalize search vectors (heh, couldn't help myself), making distributed systems able to be more efficient and scalable.

So, if you want distributed DC, which we all should, work for the accomplishment of these other stages.

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-20 05:26

HaArD: we not only want to connect the chat, but to expand the maximum user limit on the HUB. The Linux HUB which we are running now supports this; the problem is that each HUB runs well with 600-900 users, but in linked mode each can hold just a few hundred. Above that, they get out of sync in just a few minutes, and then every user is disconnected and we had to restart both HUBs.
That sounds like a poor implementation; I've seen such clusters work well, and IRC networks scale to 100,000 users, so the concept is sound.
Active/Passive mode:
Passive-mode clients, and those who don't want to relay, would send something like this:
$ddcReady2Relay 127.0.0.1:1413 0|
to indicate that they don't want anybody to connect to them. Or they simply don't send this message.
Well, there's nothing inherently wrong with that - I just see that a similar effect is (should be) achievable through more compatible means, such as the hub clustering with which you've apparently had experience. Clearly, if you change the protocol, you can make the clients inform the servers of their ability or inability to function as a relay.

Other related protocols of interest here are eDonkey and IRC: eDonkey servers appear to handle a couple of orders of magnitude more users than DC servers, despite the protocol looking pretty similar. However, the Lugdunum eDonkey server web page refers to a default soft limit of 2,000 files/user, after which files are ignored, and a hard limit of 5,000, after which that user is kicked. It sounds like there's some nonscalable part - servers storing search indexes for everyone or something?

Of a different sort of interest is IRC: it's solved the scaling problem for medium-sized networks, still orders of magnitude beyond DC; one approach, then, is to ignore the current DC client-hub protocol entirely and create a CTCP protocol extension for IRC. Since CTCP is extensible, a stock IRC server should suffice. This doesn't do much for decentralization, but I'm not sure how much bigger I want a single DC entity to become than that anyway, and there are suddenly many clients, for many platforms, available.

From logs I've seen, search dominates hub bandwidth by a huge margin: 20:1 over main chat, the nearest competitor, last time I looked. Thus, search is the main problem to solve bandwidthwise, and splitting the search server from the chat server will produce a chat server using negligible bandwidth and a search server whose bandwidth requirements have been only negligibly reduced from those of a full DC hub. It seems to me that's not the most effective approach, hence my other suggestion.

Finally, in any decentralization of DC, security and trust are my primary concerns. DC currently exists such that hubs are given full trust from the point of view of a user's client, and until/unless hubs are dropped entirely (in which case, why use DC?), any additional routing nodes added serve as extra potential points of vulnerability.

From my scanning of the G2/Mike's protocol released a month ago or so, it only vaguely addresses this issue: its iterative search style strikes me as less vulnerable to jamming, so to speak, but I don't recall if he addresses the issue of corrupt (supernodes? whatever the search responding things are) refusing to cooperate, beyond hoping that clients have enough connections to honest nodes to provide them with search results. There are research P2P systems which more rigorously address these issues, of course...

You mention briefly in your initial message that security can be addressed; I'm curious more precisely how.

HaArD
Posts: 147
Joined: 2003-01-04 02:20
Location: Canada http://hub-link.sf.net
Contact:

Post by HaArD » 2003-04-20 11:48

The Linux HUB which we are running now supports this; the problem is that each HUB runs well with 600-900 users, but in linked mode each can hold just a few hundred. Above that, they get out of sync in just a few minutes, and then every user is disconnected and we had to restart both HUBs.
What version of MHC BOT are you using? Anything below v2.5 has a bug that causes a lot of sync problems and dropped messages.

Are you running MHC on the same machine as the hub? You don't HAVE to; if you have another trusted user with spare bandwidth, they can run MHC and simply act as the relay point, thus distributing the load.

I am currently working on enabling searching/downloading using the same UDP messages as NMDC hub thus making MHC a VIRTUAL hub that links REAL hubs. I'm doing this in MHC because DC++ does not support $Multisearch/$MultiConnectToMe and >80% of our users are running DC++

seeSharp
Posts: 24
Joined: 2003-04-19 10:03

ddcp

Post by seeSharp » 2003-04-20 12:52

Hi volkris! I've been there when we were speaking about the ratings system, on a different nick though (I didn't have time for a while to log in, and now, I forgot my password + mail address, which I used:).

I don't think that the evolution has to go that way. We have been toying with some ratings systems; in the end it came out that it is a lot of work to do, and many users (not only fakers, but good sources too) disliked the idea. Many people hate being "watched".

As for number 2: that is a different question, I think.

Number 3: It's a problem that we have now, and it's been there for months. We simply can't find a place/hardware which could host more than 900 users. And there are more than 1300 users waiting to log in. The number keeps rising...

Distributed HUB load seems easier to do, and it has much higher acceptance than the first and second ones.

IRC servers can serve more users because users generate less traffic (no search, many rooms). And most of them are running on dedicated, professional servers. DC hubs are running mostly on simple PCs and on different connections. Our problem was many times the router: it simply gave up after a while.

In my first post I mentioned two ways: clients relaying mass traffic AND/OR specialized HUBs. The second one is more secure; the first one is "cheaper" and gives you greater availability. Now we have 1000 users, and we could name at least 50-100 as trusted. With that many relaying clients we could go over 2000 users, where we could find more "trusted" users. It surely should be left to the HUB to choose the relaying agents.

I wouldn't like to change the whole protocol, because it's not necessary. I don't want to build a new network (DC2 or so), just extend the existing one. Support the same users, let in the existing clients.

The addition of DDCP should be as easy as possible to the existing clients.

And by the way, I don't want to change the Client/Hub model at all. I do believe that this is part of the "DC" spirit, and that's the one that makes it so popular, and makes it WORK.

seeSharp
Posts: 24
Joined: 2003-04-19 10:03

Security

Post by seeSharp » 2003-04-20 13:06

My ideas to security issues:

1. (The easier one... :) Hubs shouldn't let users/clients be relays; Hub owners should set up relay servers and let some users connect to them.

This is easy because it won't bring up new threats; the trusted party in the communication is still the Hub.

The clients don't have to support relaying; they just have to be able to connect to a second address and parse mass messages coming from there too. It's not a big deal.

Writing a HUB relay server is not a big thing either.

2. (could be done some time later): Include some two-key encoding in the system, a PGP-like one. The HUB gives its public key to the client at login. The client connects to a relaying client (to a supernode) and checks every incoming message. (That means that outgoing mass messages should be signed on the Hub's side.)

My questions are:
I don't know how much processing power is needed to sign the messages (keeping the key relatively small should make it faster - we can change it quite often, say every hour or so).

Is there any private/public key encryption implementation that is available for inclusion in open-source projects and can be compiled on both Win32 and Linux?

2HaArD: we didn't use MHC. Our hacked NMDC hubs went down too often with more than 500 users, and they were always running at 100% processor usage.

And the Linux hub does support HUB linking - you see all the users on the linked HUBs, you can search/download. The problem is that this implementation is done so that there is no need for changes on the client side. And because of that, the server side got a bit hard to stabilize.

With some small changes in the clients, it should be possible to achieve the same and stay more stable.

And older/other clients could connect directly to the Hub as now, utilizing the resources available there.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Re: ddcp

Post by volkris » 2003-04-20 17:45

seeSharp wrote:I don't think that the evolution has to go that way. We have been toying with some ratings systems; in the end it came out that it is a lot of work to do, and many users (not only fakers, but good sources too) disliked the idea. Many people hate being "watched".
No, it doesn't have to go that way, but that way is probably the most efficient, natural way to get from here to there with the most utility being "grown" in the meantime. If you have the time to wait, it is definitely the right direction to go. If not, it would take more time, and so perhaps you're better off jumping right in. However, by jumping right in to distributed hubs you'll have a less efficient, less scalable system without nearly as many benefits as if you followed the more natural evolutionary path.

Specifically addressing what you said about ratings systems: it isn't a lot of work to do at all, and users dislike the very idea of uploading in the first place. For DC to work, sacrifices have to be made; SOMEONE has to upload. This is similar.
As for number 2.: that is a different question I think.
Number two will get you global namespaces, etc, which are very useful in any distributed application. Multisource downloading will enable the system to scale much better. Without things like this you're going to hit technological walls where the system simply can't grow any further and begins to buckle under its own size. You could say that #1 removes social blockages to expansion while #2 removes technological walls. See Gnutella for an example of a distributed system that hit these walls and failed.

<i>And by the way, I don't want to change the Client/Hub model at all. I do believe that this is part of the "DC" spirit, and that's the one that makes it so popular, and makes it WORK.</i>

To be sure, none of the steps I mention require an overhaul of the core of DC. And then I would disagree with your contention that it is actually working. I see DC as barely limping along, myself, especially considering where it COULD be.

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-20 18:24

1. (The easier one... :) Hubs shouldn't let users/clients be relays; Hub owners should set up relay servers and let some users connect to them.
Okay, I have no issue with this, but it sounds much more like what I suggested anyway than this system of, for example:

Code: Select all

$Supports ... DDCP| 
$ddcReady2Relay 127.0.0.1:1413 30|
$Supports ... DDCP|
$ddcRelayServer 127.0.0.1:1413|
$ddcRelayClient client B|
$ddcRelayConnected nick|
(and I'm still curious why you don't just make them full hubs, if they're now special nodes trusted by ops anyway.)
2. (could be done some time later): Include some two-key encoding in the system, a PGP-like one. The HUB gives its public key to the client at login. The client connects to a relaying client (to a supernode) and checks every incoming message. (That means that outgoing mass messages should be signed on the Hub's side.)
Okay, with basically only other hubs having access to the distribution of messages, I'll agree this isn't strictly necessary - hubs are indeed already trusted. As soon as you bring in "at least 50-100 as trusted" users, though, I wouldn't accept a system without such cryptographic support; I don't believe you can reliably find that many truly trustworthy users.
And the Linux hub does support HUB linking - you see all the users on the linked HUBs, you can search/download. The problem is that this implementation is done so that there is no need for changes on the client side. And because of that, the server side got a bit hard to stabilize.

With some small changes in the clients, it should be possible to achieve the same and stay more stable.
I would again point to IRC. Whilst IRC clients are somewhat aware of the distributed nature of the network, they're only vaguely so, as they don't act as relays in any manner. For example, they can message users on a specific server, but the IRC server network actually routes that message, and the client simply sends it to its local server. Given competent implementations of linking, I still see no need to modify existing clients.
However, by jumping right in to distributed hubs you'll be having a less efficient, less scalable system without nearly as many benefits as if you followed the more natural evolutionary path.
I'm not sure how doing this precludes following the path of evolution you propose, or makes it any more awkward.
Is there any private/public key encryption implementation that is available for inclusion in open-source projects and can be compiled on both Win32 and Linux?
Crypto++

distiller
Posts: 66
Joined: 2003-01-05 18:05
Location: Sweden
Contact:

Post by distiller » 2003-04-21 10:35

The main bandwidth problem has been said to be searches so here are my views on this excellent conversation.

A search request coming from ClientA uses virtually no bandwidth compared to the request sent from the hub to every user, telling them to return results to ClientA.

The simplest solution here is the relay system mentioned above; let's walk through the ideas presented on this topic again.

ClientA: Connects to hub
ClientA: Notifies it's compatible with relay listening and/or sending
--Hub decides which RelayClient ClientA should connect to or if ClientA should become a RelayClient
Hub: Sends message "Connect to RelayClientA" or "You are now a RelayClient"
Hub: Sends message to RelayClients about new incoming RelayListeners

Any search request to hub is then only sent to:
Passive users
Non Relay compatible clients (hub could allow only compatible clients...)
RelayClients (obviously)

If my understanding of the search system is correct, this would help a lot to lower bandwidth usage on the server and not be too hard to implement, but it requires upgrading both DC++ and the hub software.
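The hub-side decision in the walkthrough above might be sketched as follows (all class, method, and command names here are illustrative, not part of any implemented protocol):

```python
class Hub:
    """Toy relay assignment: hand a new DDCP-capable client to a relay
    with spare capacity, or promote it to relay itself."""
    def __init__(self):
        self.relays = {}                       # relay nick -> free slots

    def on_connect(self, nick, can_relay, capacity=0):
        for relay, free in self.relays.items():
            if free > 0:                       # a relay has room: use it
                self.relays[relay] = free - 1
                return f"$ddcRelayServer {relay}|"
        if can_relay and capacity > 0:         # no room: promote this client
            self.relays[nick] = capacity
            return "$ddcYouAreRelay|"          # hypothetical command name
        return None                            # plain client, served directly
```

Passive and non-DDCP clients would simply fall through to the last case and keep receiving broadcasts directly, as the list above describes.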

Which means we need Arne and DCH++ on this. =)
It's a good start to create larger hubs.

seeSharp
Posts: 24
Joined: 2003-04-19 10:03

Post by seeSharp » 2003-04-21 18:54

Our experience has shown that it's far from easy to synchronize two big HUBs with each other. The first and smaller problem is that both hubs have to be in the same "initial state" - HUB SW version, config, scripts, op logins/passwords.

After that, each HUB has to make decisions - with its own scripts, its own requests and state information, and with what it gets from the other HUB (or HUBs if we link 3 of them).

The root of the problem is the outgoing traffic. And that can be reduced by adding simple relays into the system, which is technically a lot simpler to do than syncing 2-3 or more HUBs together.

Besides, writing a relaying station is not a big deal, and it can be made stable and working. After that, you just have to install it and run it, and you are ready. You don't have to update your relay station when you update the HUB. You don't have to reconfigure it so often.

And you don't have to stay on the same platform - you can run your HUB on win or linux and connect relay stations running on win AND linux.

I've checked Crypto++'s page; it seems to suit the needs. It could be used when (or if ever :) some clients become relay stations.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-04-21 22:08

There are some factual issues that need to be clarified, IMO:

seeSharp, when one of your hubs is running at its absolute peak, are all of the bottlenecks bandwidth related? How is it doing on CPU and memory?

How does the bandwidth usage break down more specifically? About how much is used by each of searching, chat, and administration? Nobody here has done more than glance off this question.

I think there really might be some better solutions than the ones tentatively proposed here; in particular I get a bad feeling that the proposals so far are more client side than they should be.

You talk about problems with synchronization in these distributed setups, but I don't see why the things that require synchronization have to be distributed in the first place. Surely chat isn't a big enough load on the hubs that it has to be distributed. The same with authentication and such. Searching itself doesn't really need to be synchronized.

Just thinking out loud...

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-21 23:47

How does the bandwidth usage break down more specifically? About how much is used by each of searching, chat, and administration?
Argh, stupid Java implementation hung IE on the Mac just now, lost an almost complete post. Anyway, I'll try to do more than glance off the question:

Code: Select all

Total messages:
bunzip2 -c | wc: 1373324 4215281 64489022

Broadcast messages:
bunzip2 -c | grep "^\$Search" | wc: 60268 180818 4601116
bunzip2 -c | grep "^\$Search Hub:" | wc: 22806 68418 1689840
bunzip2 -c | grep "^\$MyINFO" | wc: 38887 30139 366028

Non-broadcast messages:
bunzip2 -c | grep "^\$ConnectToMe" | wc: 869538 2608614 40024878
bunzip2 -c | grep "^\$RevConnectToMe" | wc: 407685 1223055 18293513
bunzip2 -c | grep "^\$MultiConnectToMe" | wc: 36 144 2412
This was on a hub of 300-400 users; I forget the exact count at the time. Search, to all 300 users, consumes 24 times as much upstream bandwidth as all the ConnectToMe variations combined. This is the basis for my 20:1 ratio cited earlier. I'll see if I can get statistics on main chat from the log now, but I'll post this message first for safety...

seeSharp
Posts: 24
Joined: 2003-04-19 10:03

Post by seeSharp » 2003-04-22 09:52

Well, our sync problems came when we tried to connect two HUBs. This setup wasn't a HUB - RelayServer - Client setup, but simply two HUBs connected with each other.

It's been mentioned here before that this is possible, so I wrote about the problems we had with it...

By the way, I never said that I want to distribute user management (login, kick, etc...) or private chat. It's only about splitting off the mass traffic.

We also see about the same share of Search requests (maybe a bit less, 20:1-15:1); there are some search rules, so the HUB is dropping a few of them :).

And yes, the problem is the outgoing traffic. After abandoning the NMDC HUB we have plenty of spare RAM and CPU time... :)) Users simply get disconnected from the HUB under heavy load. On some occasions we had to reset the router too.

And there are two proposals, by the way: the first one won't affect clients too much (just one more socket connection), and the trust isn't shifted to them in any way. And I don't know a better load-balancing solution (every one I checked needs some client-side support - at least consuming commands and connecting to the right server).

One more thing I could imagine, and this one wouldn't need any client-side modification:
With an advanced relay server, it would be possible for it to act as a HUB. Clients could simply connect to it, every command would be sent over to the HUB, and the HUB would send mass messages only once.

The drawback would be that the HUB-relayServer traffic would be higher, and we would introduce some latency by letting each command pass through the "middleman".
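That "middleman" variant could be sketched like this (Python; the class and method names are invented for illustration):

```python
class TransparentRelay:
    """Looks like a hub to clients: passes every client command upstream
    unchanged, and fans each mass message received once from the real hub
    out to all attached clients."""
    def __init__(self, hub_conn):
        self.hub = hub_conn          # single upstream connection to the hub
        self.clients = []            # downstream client connections

    def on_client_data(self, data: str):
        self.hub.send(data)          # all commands pass through to the hub

    def on_hub_data(self, data: str):
        for c in self.clients:       # hub sends once; the relay duplicates
            c.send(data)
```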

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-22 12:17

On some ocassions we had to reset the router too.
Um, you weren't running the hubs on the same connection, behind the same router, were you? That'd be kind of useless...
With an advanced relayServer it would be possible, that it could act as a HUB. Clients could simply connect to it, every command should be sent over to the HUB, and the HUB should send mass-messages only once.
Huh? Isn't that what your previously stated notion of a relay server did too, or have I misunderstood something?

Why not, as I suggested in my first or second post to this thread, simply use a full hub with competent linking capabilities? You haven't stated the need that I can see to make a relay server special, and doing so limits the topologies using which you can connect the relay servers and hubs. True, you need to keep login information synchronized across them, but with any hub software that stores its user information in a separate file (all the ones I've personally tried do), that's pretty trivial. The hubs you tried didn't work so well, but of the three you listed:

-NMDC hub scales poorly anyway, even with one hub, so I wouldn't expect it to be at all usable in a multihub configuration.
-Ptokax doesn't come, last I checked, with built-in multihub support at the level you want, so you must've used some script or extension for it, which could have been programmed poorly.
-I'm most curious about the Linux DC hub software you mention; I know dchub's linking has worked fine for me.

However, none of those working satisfactorily implies only that, well, none of them worked satisfactorily. Two of them (NMDC and Ptokax) don't even have native linking support that works in any real sense, so their failure here doesn't, in my mind, cast doubt on the pure-hub approach.
And the linux hub does support HUB linking - you see all the users on the linked HUBs, and you can search/download. The problem is that this implementation is done so that there is no need to change anything on the client side. And because of that, the server side got a bit hard to stabilize.

With some small changes in the clients, it should be possible to achieve the same and stay more stable.
I don't see the causal relationship here - please explain how increasing the diversity of the network by adding separate relay servers, or introducing client changes helps stabilize things.

Oh, and one final statistic I curiously left out, the number of non-$ commands, i.e. chat messages:

Code:

$ bzgrep "^<" messages.txt.bz2 | wc
  18451  135822  954428
which is not actually 20 times less, but still about 5 times less than the number of searches, as well as potentially containing private messages [for which I'm not going to check, obviously] which aren't broadcast, thus reducing the real bandwidth numbers somewhat.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-04-24 22:36

I'd like to see what would happen if you could multicast out search requests. If you could get search requests out to the general Internet before they get split to all of the different listeners, efficiency could improve drastically.

Not that that's horribly possible with today's Internet... but it's not completely out of the question either.

As I understand it ip6 has multicasting built in. Perhaps DC++ could be a driving force to get the word out about ip6. Considering the availability of tunnels and 6to4 type things it wouldn't necessarily be a horrible transition either. Only probably :)

But, another solution that keeps in mind the availability of more CPU and memory would be to do some of the tracking on the hub. Some say it's a bad idea legally to track filenames and locations on the hub, but it's very much a grey area. I'd say a good compromise would be to keep track of hashes on the hub and encourage their use.

Anyway, just some thoughts. It kind of sounds like there isn't a huge need for a new architecture right now, more of a need to get the existing one working properly. That you can't seem to get hub linking working now suggests some crappy software.

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-24 23:02

But, another solution that keeps in mind the availability of more CPU and memory would be to do some of the tracking on the hub. Some say it's a bad idea legally to track filenames and locations on the hub, but it's very much a grey area. I'd say a good compromise would be to keep track of hashes on the hub and encourage their use.
Well, beyond the legal aspect (which I agree would be solved to my satisfaction by using hashes in place of whole filenames), it would require clients to upload userlists periodically to the hub in order to keep it in sync; this may overcome any bandwidth savings, though it is downstream bandwidth, generally easier to come by.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-04-25 11:28

cologic wrote:it would require clients to upload userlists periodically to the hub in order to keep it in sync; this may overcome any bandwidth savings, though it is downstream bandwidth, generally easier to come by.
Good point, though it would be nice if clients would be so kind as to inform the hub to add hash h to its list of hashes. Otherwise, yes, the hub would have to occasionally request a whole new list.

distiller
Posts: 66
Joined: 2003-01-05 18:05
Location: Sweden
Contact:

Post by distiller » 2003-04-26 09:00

I see no reason for the hub to keep the filelists! Then the hub should return search results? Nah, won't work.

The idea in this thread was to lower the bandwidth used by searching by distributing the load through relaying - let's return to that, shall we?

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-26 09:06

The hub could use that information to decide which users the search request is forwarded to - if a user is known not to have a particular file, then don't send him the search.

That said, I do find that direction of discussion odd in a thread titled "Distributed DC".

distiller
Posts: 66
Joined: 2003-01-05 18:05
Location: Sweden
Contact:

Post by distiller » 2003-04-26 09:14

Just imagine the amount of string comparing the hub would have to do - it would have to do all the connected clients' work. You'd need a 50 GHz CPU to do the job, not to mention the memory requirements. No, that's not the way to go. Relaying is a lot better and should be pretty easy to implement in both clients and the server.

Arne: Could it be implemented in DCH++?

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-26 09:35

Could someone explain again what the role of the client is in relaying? If you want to involve it in any substantial way in this, I'd like to see much more acknowledgement of the security issues.

Further, much as it kept getting brought up, I see absolutely no need for any client-side modifications: this should be something handled completely by the hub network.

distiller
Posts: 66
Joined: 2003-01-05 18:05
Location: Sweden
Contact:

Post by distiller » 2003-04-26 11:24

There are 2 things that need to be changed in clients:

1: The client must be able to recognize a new command, which states what RelayStation IP and port it should listen to. Any client that does not recognize this command will receive the search requests from the hub as usual.

2: Optionally, the client could incorporate the RelayStation code, meaning the server could ask the client to become a RelayStation. This means that hubs can grow a lot bigger, since every time the bandwidth starts to get too saturated, the hub asks a new client to become a RelayStation, then asks incoming clients or some already-connected clients to connect to the new RelayStation.

Clients can select in the settings whether they will accept the role of RelayStation or not. A modem user should obviously not become a RelayStation...
______________________________________________________

Change #1 is obviously needed in the client, since in the normal protocol the client only listens to the server IP. It is impossible for another IP (computer/connection) to send the search request to the client, as it would not be received (not even sent).

Change #2 is optional; the hub owner could set up RelayStations on other computers & connections. But having the support in the hub and client alone means that any hub could have RelayStations, even if the owner doesn't have any other connections and computers.
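Change #1 is small enough to sketch. The command name `$RelayTo` below is purely hypothetical - NMDC defines no such command; this only illustrates the kind of parsing a client would need to add, and how unmodified clients stay compatible by ignoring commands they don't recognize.

```python
# Sketch of change #1 from the client's side. "$RelayTo" is an invented
# command name, used here only to illustrate the idea.

def parse_relay_command(raw):
    """Parse a hypothetical '$RelayTo <ip>:<port>|' hub command.

    Returns (ip, port) on a match, else None - so an old client that
    drops unknown commands keeps working against the same hub."""
    raw = raw.rstrip("|")
    if not raw.startswith("$RelayTo "):
        return None
    addr = raw[len("$RelayTo "):]
    ip, sep, port = addr.partition(":")
    if not sep or not port.isdigit():
        return None
    return ip, int(port)

# a modified client would then also accept search/chat traffic
# arriving from this second address, not just from the hub's IP
print(parse_relay_command("$RelayTo 192.168.0.5:4111|"))  # ('192.168.0.5', 4111)
print(parse_relay_command("$Hello someuser|"))            # None
```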

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-26 13:56

Change #1 is obviously needed in the client, since in the normal protocol the client only listens to the Server IP. It is impossible for another IP(computer/connection) to send the search request to the client, as they would not be recieved. (not even sent).
Which is as it should be: clients connect to one hub (what exactly is a relay station, besides a castrated hub that can more or less only live in a tree topology?), and all search requests to that client should come through that hub (or relay station).

If you want people to be able to connect to one IP, and still be distributed among different hubs, set up an empty front hub that just redirects based on some criteria (or randomly). People will have an incentive to allow themselves to be redirected because that hub won't itself be part of the network, and so won't have access to its resources. Thus, no client change is necessary.

There has to be some protocol between hubs and relay stations, but any linking configuration must introduce that; further, it should not leak out, and has no reason to do so, to the clients.
Change #2 is optional, the hub owner could set up RelayStations on other computers&connections. But having the support in the hub and client alone means that any hub could have RelayStations, even if the owner doesn't have any other connections and computers.
This is a very big change, and demands that clients trust each other, a bad assumption. (If nothing else, not to drop messages intentionally, even provided the message signing/encryption that would become mandatory.)

Whilst it has been demonstrated to work to varying degrees, it also makes completely different assumptions about relative trust that would break DC's model - it needs to be much more thought out than I've seen evidence for as of yet.

HaArD
Posts: 147
Joined: 2003-01-04 02:20
Location: Canada http://hub-link.sf.net
Contact:

Post by HaArD » 2003-04-26 16:42

I don't see the benefit of the RELAY STATION approach over an improved MULTI-HUB LINKING using Redirection to balance the load.

I do see increased security concerns in having the hub ask users if they want to be relay stations and, after they accept, having other clients connect to them as a trusted source... I can just imagine the hacked clients that'll take advantage of this... Yikes!

HaArD

seeSharp
Posts: 24
Joined: 2003-04-19 10:03

Post by seeSharp » 2003-04-27 16:09

Hi, back again:)

We've been playing around with the newest Linux HUB, and it works somehow. We can stay online for almost 24 hours :) with over 1000 people. There are some stability problems, but it seems to work a bit now :).

I've checked out the PtokaX dev forum too; there was already a thread about HUB linking. The PtokaX guys said they could imagine it, but they also want some client modifications. That idea seems to be more like the first proposal here (1 HUB SW - more RelayStations, operated by the HUB owners). They just want to make that HUB SW able to operate in two different modes.

Their problem is the same as ours: it is quite hard to synchronize all user and HUB state data between two HUBs; it's much easier to concentrate on mass-messages and set up a relay channel for them.

IPv6 will have some new improvements, which will be good for the DC network, but that way, you will have to make a lot of changes again - and there are a lot of OSs and ISPs not supporting IPv6 yet. So, patience...


Something, that wasn't clear I think:

Clients should NOT become relaying agents, unless there is a good protection mechanism (like some PK encryption...). That's why it was a 2nd version at the beginning (v2.0 :).


One more thing:
The PtokaX hub SW already has a separate internal out-queue for messages, according to their forums, so it wouldn't be too hard to make the server-side changes there. I don't know how DCH++ works, so maybe a bit more struggle there. DC++ shouldn't be too hard to change... (says me, who never knew C++ :). Maybe it's time to learn that one too?

distiller
Posts: 66
Joined: 2003-01-05 18:05
Location: Sweden
Contact:

Post by distiller » 2003-04-28 05:52

Many P2P programs use other clients to propagate information.

If a RelayStation would just send out search requests and hub chat, I see no real problem with "hacked clients". What could they do? Send faked search requests? Well, it would become known quickly and the client would be kicked. It's entirely possible to build this into the client/hub as well.

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-28 17:45

We can stay online for almost 24 hours with over 1000 people. There are some stability problems, but it seem to work a bit now:).
This would seem to argue pretty strongly that client knowledge/involvement is not, in fact, necessary.
The PtokaX guys told, they could imagine the thing, but they also want some client modifications. That idea seems to be more like the first proposal here (1 HUB sw - more relayStations, operated by the HUB owners). They just want to make that HUB SW to be able to operate in two different modes.
I assume you're referring to this post. (I should note that daniele_dll pretty accurately seems to convey my preferred configuration.)

To answer some of his questions:
ptaczek wrote:- how will you solve the private messages, passive searches, etc. in the case above ?
I assume he's referring to routing within the network - I don't see, in any event, how this can be avoided unless the client is restricted to searching across hubs, rather than chatting across hubs as well.
ptaczek wrote:- how you will assure data checking and how you will avoid data duplication (therefore the lowest dataflow poossible) ?
For broadcast messages, just set up a minimal spanning tree of hubs; each hub can then send out the messages to every hub but that on which it came in. Actual user to user routing is more difficult, but well known algorithms exist to solve it.
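That forwarding rule can be sketched in a few lines (a toy model; hub names and the link table are invented). On a tree, "send to every neighbor except the one it came from" reaches each hub exactly once, with no duplicate-suppression bookkeeping.

```python
# Toy model of broadcast over a spanning tree of hubs: each hub forwards
# a message to every linked hub except the one it arrived from.

def flood(links, origin):
    """links: hub -> list of neighboring hubs (assumed to form a tree).
    Returns the set of hubs a broadcast from `origin` reaches."""
    reached = {origin}
    frontier = [(origin, None)]            # (hub, hub it heard the message from)
    while frontier:
        hub, came_from = frontier.pop()
        for neighbor in links[hub]:
            if neighbor != came_from:      # never echo back toward the sender
                reached.add(neighbor)
                frontier.append((neighbor, hub))
    return reached

# a small tree: A - B - C, with D also hanging off B
links = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
```

A broadcast started anywhere reaches all four hubs; only the tree assumption keeps it loop-free, which is why a spanning tree (rather than an arbitrary mesh) is the cheap option.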
ptaczek wrote:- how you will check for crash of one linked hub and how you clean the users that were connected to that hub from nicklists of the other hubs?
First, only neighboring hubs should have to directly discover this fact; they can do so by assuming a hub has died unless it sends them an "I'm alive" message every X seconds.

This information can then be propagated through the network through the usual broadcast means ($HubQuit <some_hub_ID>| or something, never sent to clients, only to other hubs), and assuming each foreign user's home hub is tagged, a hub iterates through its currently connected users from that hub and removes them.
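The two steps above - presuming a neighbor dead after a missed keepalive window, then purging its users by home-hub tag - might look like this (timeout value and all names invented for illustration):

```python
# Sketch of crash detection and cleanup. The 30s timeout is an arbitrary
# example of the "every X seconds" keepalive interval mentioned above.

ALIVE_TIMEOUT = 30  # seconds without an "I'm alive" before a hub is presumed dead

def dead_hubs(last_alive, now, timeout=ALIVE_TIMEOUT):
    """last_alive: neighbor hub id -> timestamp of its last keepalive."""
    return {hub for hub, t in last_alive.items() if now - t > timeout}

def purge_users(users, dead):
    """users: nick -> home hub id. Drop everyone whose home hub died."""
    return {nick: hub for nick, hub in users.items() if hub not in dead}

last_alive = {"hub2": 100.0, "hub3": 70.0}
users = {"alice": "hub1", "bob": "hub2", "carol": "hub3"}

dead = dead_hubs(last_alive, now=110.0)   # hub3 last seen 40s ago -> presumed dead
users = purge_users(users, dead)          # carol (home hub hub3) is removed
```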

It appears they want DC++ to implement $MultiSearch and $MultiConnectToMe, as a search-only hub network is the only way I can see to avoid their concerns about every user appearing in some form on every other hub; that latter situation would seem necessary, along with its incurred bandwidth, memory, and CPU usage costs, if a chat network were being run.

Whilst this isn't necessarily a less valid form of hub linking, it's more restricted/less interesting, and most importantly, isn't transparent to users. I believe it would produce a nicer experience if users could treat the hub network as a single, larger hub, and ensure the hub network maintains that illusion. One can _still_ get significant bandwidth savings per hub over a truly single hub of that size, even with that more feature-complete configuration.
distiller wrote:If a RelayStation would just send out searchrequests and hub-chat, I see no real problem with "hacked-clients". What could they do? Send faked search requests?
It could prevent the search requests from reaching those users who might respond to them (saving those relaying users a significant amount of competition for slots, giving them an incentive to do so), and it might intercept any chat messages from users behind them. Without any other means, necessarily, of contacting ops or somesuch, I don't see how "it would become known quickly and the client would be kicked", for that relay station is in control of most communications by which it would become known.

distiller
Posts: 66
Joined: 2003-01-05 18:05
Location: Sweden
Contact:

Post by distiller » 2003-04-29 07:18

You have to agree that it would be simpler for hub owners to get a bigger hub, since more clients would also mean more RelayStations (RS). Is it not, in a way, a beautiful thought?

Otherwise a hub owner has to have several separate computers, and above all several separate internet connections, for the hub and relaying, which is hard in a single house. By letting the clients take care of relaying themselves, the hub can grow beyond 1000 users easily, without the hub owner paying half his salary in ISP bills...

The security issues are of course important. It might be possible to have a list of accepted nicks in the hub that could act as RS - say, friends that you personally trust. These clients could then become RelayStations as they come online, but other ones would not be asked to become RS.

I believe hub owners would be quite happy if the hub/clients took care of the relaying themselves instead of the hub owner having to ask his friends to set up RelayStations.

If relaying was implemented your way it would seldom be used, because of cost and the need to trust your friends to be online...

If relaying is a part of the clients and hub it would become automatic, and every hub would automatically benefit from the new feature.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-04-29 08:42

distiller wrote:I see no reason for the hub to keep the hublists! Then the hub should return searchresults? Nah, won't work.

The idea in this thread was to lower the bandwidth used by searching by distributing the load thorugh relaying, let's return to that shall we??
Relaying is a solution that's to be avoided if possible because it complicates the system. Trying to use relaying to distribute bandwidth in particular is a case where other solutions should be considered first.

Having the hub return results is actually a good thing on many levels. As long as the hub has resources to do it, these searches would necessarily be faster and a heck of a lot more efficient than the current way of doing things.
Just imagine the amount of string comparing the hub would have to do? I would have to do all the connected clients work. You'd need a 50 GHz cpu to do the job. Not to mention the memory requirements. No, that's not the way to go. Relaying is a lot better and should be pretty easy to implement in both clients and the server.
Seriously, recheck your algorithms. Searching through a single optimized list of all files being shared in a hub is not a horrible task. And I'm only suggesting hashes, which would mean even more efficient searches.

On the other hand, no matter what, if you're using relaying you will be spending a whole lot more total CPU time, while requiring more bandwidth, adding more latency, and creating more sources of failure.

There are reasons not to have hubs doing lookups, but these aren't they. It REALLY sounds like you want relaying just for the sake of having it.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-04-29 08:51

HaArD wrote:I don't see the benefit of the RELAY STATION approach over an improved MULTI-HUB LINKING using Redirection to balance the load.
Right
seeSharp wrote: Their problem is the same as ours: it is quite hard to synchronize every user and HUB state data between two HUBs, it's much easier to concentrate on mass-messages, and setup a relay channel for them.
What precisely needs to be synchronized? Or more to the point, what that should be distributed needs to be synchronized?
IPv6 will have some new improvements, which will be good for the DC network, but that way, you will have to make a lot of changes again - and there are a lot of OSs and ISPs not supporting IPv6 yet. So, patience...
In general, if a program has been written properly within the last three or four years it should make the switch to IPv6 automatically with the OS. Win2000 and on have native IP6 support. The main problem is not with the user's software but with ISP's routing.

It seems more and more clear that what people are looking for in this discussion is not a change in technology or the system, but rather higher quality of what's already around.

Linking hubs together is the way to go - or haven't you learned the lesson of gnutella?

distiller
Posts: 66
Joined: 2003-01-05 18:05
Location: Sweden
Contact:

Post by distiller » 2003-04-29 18:25

Well, RS just seemed to be a simple way, but is probably not the best, as you say. Sorry... If linking hubs together is the real deal, we will need some intelligent hubs, I tell you. =) And still, changes are needed in at least the DC++ client.

The hub will need to have (for, say, 500 users per hub):
* Roughly 256 MB of RAM for uncompressed filelists in RAM, or
* Roughly 50 MB of RAM for bz2-compressed filelists in RAM and a fast CPU for on-the-fly decompressing and searching
* Receive filelists if anyone connects or does a "/refresh"

Will receiving everyone's filelists take up more or less bandwidth than the current search system? (probably less, but...)

The hub will have to:
* Receive search requests from other hubs
* Search the filelists and send back correct results to the appropriate user (or hub?)
* Receive filelists into RAM for searching (What if RAM runs out - do we put a restriction on how MUCH people can share now? :shock: )

The client will have to:
* Be able to send files to users not on the local hub (DC++ has no support)
* Be able to use multi-hub search and receive files from a linked hub (again, no support)
* Send filelists to the hub when connecting and refreshing (no client has support)

All users should be able to see each other even if they are connected to different hubs (that are linked). One might want to download a filelist from a user on another linked hub. So the "multi-hub" concept as known from NMDC is not good enough IMHO.

Relaying is something that will make one computer with one connection able to support more users than what is currently possible, probably many times more.

There are issues, as has been pointed out, but solutions as well.
I don't know what solution is the best, you have my ideas anyway. I'm just trying to help out here, not spread propaganda for a special solution... :lol:

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-29 19:48

The hub will need to have (for say 500 users/per hub):
* Roughly 256 MB of RAM for uncompressed filelists in RAM
It wouldn't store filelists, per se; what it really needs is a mapping from hash values to lists of users.
* Recieve filelists if anyone connects or does a "/refresh"
It should receive a delta from what it previously knew, rather than a complete file list (which would, of course, require support from DC++ to identify which files are new).

However, the above modifications are entirely optional, and a multihub network can be built without them, so that it does not require special client support.
* Search the filelists and send back correct results to the appropiate user (or hub?)
Well, or just send the search query to all locally connected users; this is the compatible approach, which, even should one implement volkris's hub hash index, would be a necessary compromise in the meantime, and to support non-hashing clients in general.
* Recieve filelist to RAM for searching (What if RAM runs out, we put a restriction on how MUCH people can share now? )
This is my greatest concern, but I imagine the hub would simply always send searches to those users, should they have too many files for it to want to index.
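The combination of the two answers above - a hash-to-users index, with a permanent broadcast fallback for users whose lists the hub declines to index - is small enough to sketch (all names are illustrative; the `TTH:` prefix just mimics how DC hashes are commonly written):

```python
# Sketch of a hub-side hash index: file hash -> users sharing that file,
# plus a fallback set of users who always receive searches anyway
# (too many files to index, or a client that never uploaded hashes).

hash_index = {}        # file hash -> set of nicks sharing that file
unindexed = set()      # users every hash search is still forwarded to

def add_share(nick, file_hash):
    hash_index.setdefault(file_hash, set()).add(nick)

def search_targets(file_hash):
    """Users a hash search must be forwarded to."""
    return hash_index.get(file_hash, set()) | unindexed

add_share("alice", "TTH:AAAA")
add_share("bob", "TTH:AAAA")
add_share("carol", "TTH:BBBB")
unindexed.add("dave")   # shares too much to index; always gets the search

# a search for TTH:AAAA goes only to alice, bob, and the unindexed dave,
# instead of being broadcast to every connected user
```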

But I don't care too much either way about the hashes-on-hub thing - it's the client modification requirement claims with which I really disagree. As soon as one accepts that
All users should be able to see each other even if they are connected to different hubs (that are linked).
then
* Be able to send files to users not on the local hub (DC++ has no support)
* The client will have to be able to use multi-hub-search and recieve files from a linked hub (again no support)
are completely unnecessary, as one has a single, larger virtual hub, and as long as that illusion is maintained, all client functions, even of unmodified clients, will function fine. No $MultiConnectToMe capability, for example, is required: the hubs handle the routing. Your third item,
Send filelists to hub when connecting and refreshing (no client has support)
is only necessary if one implements volkris's idea, which I've already acknowledged requires client changes.

However, throughout your post, you conflate the orthogonal ideas of the server holding file hashes and multihub linking. The former requires client modification, whereas the latter does not.
So the "multi-hub" concept as known from NMDC is not good enough IMHO.
Oh, hey, look, we agree on something.
Relaying is something that will make one computer with one connection able to support more users than what is currently possible, probably many times more.
No, it doesn't.

It relies on people being relays.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-04-29 20:36

distiller wrote:If linking hubs together is the real deal we will need some intelligent hubs I tell you. =) And still, changes are needed in at least the DC++ client.
Not really. It doesn't take all that much intelligence to route messages to multiple listeners. And why do you say changes are needed in the clients? Hub linking is already done without any changes.
The hub will need to have (for say 500 users/per hub):
* Roughly 256 MB of RAM for uncompressed filelists in RAM or
* Roughly 50 MB of RAM for bz2-compressed filelists in RAM and a fast CPU for on-the-fly decompressing and searching
* Recieve filelists if anyone connects or does a "/refresh"
I don't think these numbers are founded at all. Where do you get them?

Anyway, one mistake you're making is confusing the distributed issue with the suggestion for searching hashes on the hub. They're two entirely independent issues.

HaArD
Posts: 147
Joined: 2003-01-04 02:20
Location: Canada http://hub-link.sf.net
Contact:

Post by HaArD » 2003-04-30 06:04

If multi-hub linking could address this issue then what are the requirements?

I think most of this could be done in modifications to the Hub software alone. If mutli-hub protocol extensions were used by different hubs then linking a DCH++-->PkotaX-->OpenDC--->SDCH-->etc should be possible. The users/clients don't need to know or care...

Basic functionality...
1. Share a common Main Chat with Linked Hubs
2. ACTIVE Search Linked Hub Users
3. PASSIVE Search Linked Hub Users (bandwidth issues)
4. ACTIVE/PASSIVE Connect to Linked Hub Users (Uploads/Downloads/Filelists)
5. "See" Linked Hub Users (Request Filelists and PM's)

Op support...
Remote Op Commands?
OPChat for linked hubs?

anything else?

distiller
Posts: 66
Joined: 2003-01-05 18:05
Location: Sweden
Contact:

Post by distiller » 2003-04-30 11:33

Ok, I forgot to point out some things I had already realized, which led to some confusion - I'm sorry. =) When a client connects to a hub (virtual or real) it will only accept incoming messages from that IP, true?

If we have multiple hubs, every hub sends its chat/search/MyINFO and whatever to the other hubs, and then each hub has to send that to its connected clients. Every hub will send out more data, requiring more bandwidth, and we have not solved the bandwidth issue - rather the contrary.

Having one hub supporting 500 users or 5 hubs supporting 100 users each should be the same, just more complicated... =) (don't take those numbers for absolute fact now, it's just to give a simple idea, ok...)
volkris wrote:Anyway, one mistake you're making is confusing the distributed issue with the suggestion for searching hashes on the hub. They're two entirely independent issues.
They are not entirely separate, since if we do not lower the outgoing bandwidth needed per hub, we cannot link together hubs and have more users in a virtual hub either - and then what is the point of linking hubs together?
cologic wrote:It wouldn't store filelists, per se; what it really needs is a mapping from hash values to lists of users.
That would indeed lower memory usage. But then we require hashing, which is not really being worked on. =) And hashes are probably not that much smaller than a filename and its size, maybe even larger in some cases. Sure:
hash, user, user,user
hash, user, user
hash, user, user, user, user, user
might not use too much mem, but I still guess somewhere in the 50 MB range for a 500+ user hub.
cologic wrote:However, the above modifications are entirely optional, and a multihub network can be built without them, so that it does not require special client support.
A multihub network can and has been built, but it's not practical if we want to have more users per virtual hub than we have per real hub today. Have you seen a hub with more than 1500 users?
distiller wrote:Relaying is something that will make one computer with one connection able to support more users than what is currently possible, probably many times more.
cologic wrote:No, it doesn't.
It relies on people being relays.
Well, it does actually make it possible to have larger hubs, since outgoing bandwidth requirements are lowered a lot for the hub. But you're right, it relies on people being relays - that was the point anyway, don't shoot me... =)
volkris wrote:I don't think these numbers are founded at all. Where do you get them?
By doing some quick and really rough math. =) My DcLst is 500 kB and my bz2 list is 100 kB, and an uncompressed filelist would of course be larger. In the case of the bz2 list: 100 kB * 500 users ≈ 49 MB. Ok? But as cologic pointed out, the hub only has to keep track of the hashes for the users, which requires less memory. (still a lot of mem though)

Why are you trying so hard to put down my thoughts instead of reaching the goal? It's supposed to be a fun journey in my opinion. :wink:

HaArD
Posts: 147
Joined: 2003-01-04 02:20
Location: Canada http://hub-link.sf.net
Contact:

Post by HaArD » 2003-04-30 13:10

So then the idea of the RS is to create a single destination for the hub to send a string that must be distributed to 'n' clients.

The modification to the client is to accommodate the fact that Hub "X" has split its upstream traffic onto multiple IPs: the one used to connect to the hub, and 'n' potential Relay Stations.

Hub "X" can now handle unmodified client #1 as it does today, but, in theory, client #2 - and hopefully a growing number of clients - would support the RS concept, and some traffic to them would be sent once to RS #1 instead of once to each of those users. (The hub will obviously have to keep track of which users support RS.) As RS #1 starts to hit its upstream capacity, we add RS #2 and start broadcasting that IP to client #3 when they connect.

The effect is to reduce upstream bandwidth usage on Hub "X" and therefore this will permit more users to connect?

Am I getting this right?


cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-04-30 16:31

If we have multiple hubs, every hub sends its chat/search/MyINFO and whatever to the other hubs, and then each hub has to send that on to its connected clients. Every hub will send out more data, requiring more bandwidth, and we have not solved the bandwidth issue; rather the contrary.
A multihub network can and has been built, but it's not practical if we want to have more users per virtual hub than we have per real hub today.
Actually, multihub would still be an improvement, the advantages of which would grow roughly linearly with the number of hubs involved, and it is thus capable of handling greater numbers of users. See earlier in this very thread for an analysis of why. (See, next time it might help for people to read the math as opposed to just saying "it's math, oh no, get it away", which more or less summarizes the next couple of responses :wink: )

Thus, volkris's idea truly is separate. No client modifications are necessary to enable larger virtual hubs by saving each hub's outgoing bandwidth (in essence, because every peered hub replaces all of the users it holds with one virtual user, upload bandwidth-wise).
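The "one virtual user per peered hub" point can be illustrated with a toy model. The user counts are made up, and this deliberately ignores everything except how many copies of one broadcast a single hub must upload:

```python
# Illustrative model of hub peering: from one hub's perspective, each peer
# hub replaces all of its users with a single virtual user, so a broadcast
# costs one upload per local client plus one per peer hub.

def per_hub_uploads(total_users, hubs):
    local = total_users // hubs   # clients connected directly to this hub
    peers = hubs - 1              # one forwarded copy per peered hub
    return local + peers

print(per_hub_uploads(2000, 1))   # 2000 uploads: single-hub baseline
print(per_hub_uploads(2000, 4))   # 503 uploads per hub with 4 peered hubs
```

Each hub's upstream load shrinks roughly in proportion to the number of peered hubs, which is the linear improvement cologic refers to.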

HaArD
Posts: 147
Joined: 2003-01-04 02:20
Location: Canada http://hub-link.sf.net
Contact:

Post by HaArD » 2003-04-30 20:05

cologic wrote:Actually, multihub would still be an improvement the advantages of which would grow roughly linearly with the number of hubs involved, is thus capable of handling greater numbers of users.
Which is why I said I didn't see an advantage to RS over a functional Multi-Hub Linking.
cologic wrote:See, next time it might help for people to read the math as opposed to just saying "it's math, oh no, get it away", which more or less summarizes the next couple of responses :wink:
Touche. :p
cologic wrote:Thus, volkris's idea truly is separate. No client modifications are necessary to enable larger virtual hubs by saving each hub's outgoing bandwidth (in essence, because every peered hub replaces all of the users it holds with one virtual user, upload bandwidth-wise).
Bingo! That is my preferred approach to this. I rather leave the client out of it.

I am trying to understand the RS idea and see its merits... but I keep thinking of the RS as a specialized hub, not as a client, which just takes me back to linking hubs again...

HaArD

distiller
Posts: 66
Joined: 2003-01-05 18:05
Location: Sweden
Contact:

Post by distiller » 2003-04-30 21:09

I've done the math now and it is truly favorable. :oops:

A
Posts: 17
Joined: 2003-02-02 05:55

Post by A » 2003-05-01 04:21

Have you tried PtokaX? That hub software supports a lot more users with fewer resources.

ButterflySoul
Posts: 210
Joined: 2003-01-23 17:24
Location: Nevada
Contact:

Post by ButterflySoul » 2003-05-01 07:00

Have you tried PtokaX? That hub software supports a lot more users with fewer resources.
Ok, that PtokaX thing is really getting on my nerves. Just for your info, there are several hubs other than PtokaX that support a lot more users with fewer resources. There are even 3 of them that, unlike P., support VBS scripting, for people who don't feel like learning Lua from scratch, and a heck of a bunch of them that are open source (which is the spirit of SourceForge, DC++, and to a lesser extent of these boards).
Now PtokaX happens to be good software, but I don't see what gives you guys the right to keep up that new trend, going on for a few weeks now, of empty posts just for the sake of plain blatant advertising. Heck, there are even people posting stuff along the lines of "switch to PtokaX" in answer to VBS scripting questions.

In case you had bothered looking at who posted in this thread so far, you would have seen that they're all posters who've been around quite some time and who have already suffered through dozens of shameless PtokaX advertisements, so YES, we do know about it, alright.
And if you had bothered actually reading this very interesting thread before spoiling it with garbage, you would have noticed that HaArD mentioned «If multi-hub protocol extensions were used by different hubs then linking a DCH++-->PtokaX-->OpenDC--->SDCH-->etc should be possible. The users/clients don't need to know or care...». So yes, we know about it, but we're talking about something completely different from switching to another hub software than NMDC.
Talk about Ptokax all you like, but keep it in relevant threads, and make relevant posts, please.
[CoZ] Children of Zeus
-----
Shadows DC Hub - VBS and JS scripting at their best

A
Posts: 17
Joined: 2003-02-02 05:55

Post by A » 2003-05-01 07:22

As far as I understand, this thread is about the NMDC hub or hacked NMDC hubs. I think that software is just a dead end.

It's DCH++ or PtokaX.

ButterflySoul
Posts: 210
Joined: 2003-01-23 17:24
Location: Nevada
Contact:

Post by ButterflySoul » 2003-05-01 08:45

Re-read first lines of the very first post of the thread, all the way at the top :
Background: we are running a semi-closed HUB network, with 900+ users on the biggest hubs. We can't let in more users, we have tried Hacked NMDC HUBs, PtokaX and now a Linux HUB SW. It seems, that the network utilization goes to high with that many users.
This is a thread about protocol issues (as you would expect in "Protocol Alley"), not about a specific hub, and definitely not about NMDCH.
"Switch to hub software such and such" is definitely out of place, and the fact that you missed PtokaX already being mentioned as "not doing the trick" proves you didn't even bother reading the very first post halfway through, but directly jumped in to do some disgustingly irrelevant advertising based purely on the thread title.

The only thing that would really help in the current state of the protocol is some kind of "master" hub that wouldn't accept any user connections, but simply communicates with "slave hubs" and keeps track of all the users on all the slave hubs, while they take care of the outgoing traffic to clients, sending the bulk (searches) to each other and the rest to the master hub. To the best of my knowledge, PtokaX doesn't support this (or any other) bandwidth-saving way of distributed operation (and bandwidth is the issue here, not CPU or RAM).
The thread itself has more elaborate examples than my own, which would allow 3 machines to be used effectively as 3 hubs accepting client connections, but they require (a minimum of reading and) a few protocol changes, so it's a good thing to first agree on a standard, common and efficient way to handle the issue if we go that way.
I wish the whole thing were as simple as "use software so and so" instead, but so far, all current software is an equal dead end when it comes to handling 900+ users.
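As a rough illustration of that master/slave split, here is a toy master that only tracks which slave hub serves each nick; all names and structure are hypothetical and not taken from any real hub software:

```python
# Toy sketch of the "master hub" role: it accepts no client connections,
# only global bookkeeping. Slave hubs serve clients and exchange the heavy
# broadcast traffic (searches, chat) among themselves.

class MasterHub:
    def __init__(self):
        self.users = {}            # nick -> name of the slave hub serving it

    def login(self, nick, slave):
        """Register a user; reject nicks already taken anywhere on the network."""
        if nick in self.users:
            return False           # global nick collision
        self.users[nick] = slave
        return True

    def logout(self, nick):
        self.users.pop(nick, None)

master = MasterHub()
print(master.login("seeSharp", "slave-1"))   # True: first login succeeds
print(master.login("seeSharp", "slave-2"))   # False: duplicate nick rejected
```

The point of the design is that only tiny control messages (login/logout) reach the master, keeping it in sync, while the bandwidth-heavy broadcasts never touch it.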
[CoZ] Children of Zeus
-----
Shadows DC Hub - VBS and JS scripting at their best

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-05-01 09:04

To be a little less blunt than butterflysoul, a lot of the things we're discussing here have nothing to do with specific hubs or implementations. The wall many are hitting is in bandwidth; it doesn't matter how good your software is if the DC protocol itself is choking to death.

So we need to find the best ways to do more with our finite bandwidth. Various solutions have been proposed in this thread.

To specifically address something butterflysoul said, there's no real reason to demand that a hub can only connect to other hubs. It'd be better to say that the hub should give priority to other hubs, if that's what's wanted, but to make the "only other hubs" rule is to slightly cripple the network.

ButterflySoul
Posts: 210
Joined: 2003-01-23 17:24
Location: Nevada
Contact:

Post by ButterflySoul » 2003-05-01 09:13

there's no real reason to demand that a hub can only connect to other hubs. It'd be better to say that the hub should give priority to other hubs, if that's what's wanted, but to make the "only other hubs" rule is to slightly cripple the network.
If I remember properly, the few approaches to distributed hubs so far apparently suffer from an "out of sync" issue after a while. Having one single hub keep track of all the users on all the slave hubs is the simplest way to fix that.
However, I'll definitely agree with you that it's not the most elegant =)
[CoZ] Children of Zeus
-----
Shadows DC Hub - VBS and JS scripting at their best

Locked