Ratings server and protocol

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden
Contact:

Ratings server and protocol

Post by sarf » 2003-02-08 06:20

Greetings.

This is a new topic where information about the ratings server and the protocol can be posted. Please keep the posts mainly on topic (but feel free to write long posts that ramble on for ages).

These questions need to be answered:

How to...
  • ...identify someone to the ratings server?
  • ...retrieve the rating for someone?
  • ...update the rating for someone?
  • ...get the location of a ratings server? (DNS name / IP address, with port)
More will probably be added later on.

If someone has the old discussions (from dcpp.lichlord.org) about the ratings server saved, then please post a link or whatnot so that we can be inspired by them.

Let's get some thoughts rolling, people.

Sarf
---
"A conclusion is the place where you get tired of thinking." -- Arthur Bloch

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Re: Ratings server and protocol

Post by volkris » 2003-02-09 01:49

We had working answers for all of this stuff in the old threads. I can't remember any of them, though, so I guess we're starting over.

To reiterate, one of the goals for this, I think, should be that it can work with or without the cooperation of a hub. It would be preferable to have that cooperation, but it should not be required.

It should also not be much trouble for the reporters. The benefits go to the senders, and so as much as possible the expense should be confined there as well.

So, throwing out some thoughts:
identify someone to the ratings server?
I think it's fair to assume that "good" people will want to be rated. "Bad" people will probably be able to simply become unrated. The result is that since people will want to be rated they will agree to register for a ratings account. This registration can be either explicit or implicit: people can actually click somewhere to agree to it, or they can be registered as "anonymous" during the sign-on process, perhaps even upgrading to an explicit registration later. In the case of explicit registration each account can be either stateful or stateless--it can be maintained across sessions or restarted with each one. Implicit, anonymous logins can, of course, only be stateless.

What it means to me is that users can be individually registered with the ratings server. Their identification can therefore be username@ratingsserver.
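A minimal sketch of how a client might split such an identifier, assuming the form is literally username@ratingsserver and that the server part may carry a port. The names RatingsId and parse_ratings_id are made up for illustration; nothing like them exists in DC++.

```python
from typing import NamedTuple


class RatingsId(NamedTuple):
    username: str
    server: str  # host or host:port of the ratings server (assumed form)


def parse_ratings_id(identifier: str) -> RatingsId:
    """Split 'username@ratingsserver' at the last '@'."""
    username, sep, server = identifier.rpartition("@")
    if not sep or not username or not server:
        raise ValueError(f"not a username@ratingsserver identifier: {identifier!r}")
    return RatingsId(username, server)
```

Splitting at the last "@" is a design guess: it lets a nick contain "@" while the server name cannot.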
retrieve the rating for someone?
get the location of a ratings server?
First of all there is a weakness here in that someone can set up a fake ratings server to give false results. I would suggest that hubs specify a preferred ratings server and then a list of zero or more trusted ones. Eventually the system might evolve into a PGP-style "web of trust" thing where people vouch for each other's servers (if I trust you but don't know Bill I'll still trust Bill if you vouch for him).
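To make the "web of trust" idea concrete, here is a rough sketch that treats vouching as a directed graph and accepts a server if it can be reached within a few hops. The hop limit and the dictionary layout are assumptions of this sketch, not part of any proposal in the thread.

```python
from collections import deque
from typing import Dict, Set


def is_trusted(vouches: Dict[str, Set[str]], me: str, target: str, max_hops: int = 2) -> bool:
    """Breadth-first search over 'who vouches for whom', limited to max_hops."""
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return True
        if hops == max_hops:
            continue  # do not follow vouching chains any further
        for nxt in vouches.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return False
```

With max_hops=2 this matches the Bill example: I trust you, you vouch for Bill, so Bill is accepted, but someone Bill vouches for is not.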

I'd say that anyone should be able to query the ratings server for a user's rating. The asking client should get the target's ratings server by first asking the one specified by the hub and falling back to querying the subject itself for its ratings server, accepting the answer if it's on the hub's trust list. That's if the hub is cooperating. If it's not, then the asking client should query the target client for its ratings server and go directly to it.

Of course the client should be able to have its own trust and even anti-trust list that override the hub's even if the hub has zero servers on its list.
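The lookup order described above could be sketched roughly as follows. This is only one reading of the proposal: the function name, the parameters, and the rule that a non-cooperating hub is modelled as "no preferred server and no trust list" are all assumptions.

```python
from typing import AbstractSet, List, Optional


def servers_to_try(
    subject_claimed: Optional[str],
    hub_preferred: Optional[str] = None,
    hub_trusted: Optional[AbstractSet[str]] = None,
    my_trusted: AbstractSet[str] = frozenset(),
    my_distrusted: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Return ratings servers to query, in order, for one subject."""
    # A hub that supplies neither value is treated as not cooperating.
    hub_cooperating = hub_preferred is not None or hub_trusted is not None
    hub_vouched = set(hub_trusted or ())
    if hub_preferred is not None:
        hub_vouched.add(hub_preferred)

    ordered: List[str] = []

    def consider(server: Optional[str]) -> None:
        if server is None or server in ordered:
            return
        if server in my_distrusted:
            return  # the client's anti-trust list overrides everything
        if server in my_trusted or server in hub_vouched or not hub_cooperating:
            ordered.append(server)

    consider(hub_preferred)    # first the server the hub specifies
    consider(subject_claimed)  # then whatever the subject itself names
    return ordered
```

An empty result means no acceptable server was found, in which case the subject would simply be shown as unrated.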
update the rating for someone?
How about this:
Each client is assigned a private key. It then generates a list of public keys from the private one, which it sends to the receiving client as a file transfer is being set up. When the client wants to report on the sender it will include the sender's username and public key, sort of authenticating that the report was about a real transfer. Meanwhile the public key is discarded and never used again. Any attempt to reuse it will be rejected.

This doesn't guard against false reports, it only means that those reporting have to at least go through the trouble of initializing a transfer with the source. It limits the amount of damage false reporters can do because the source can decide whether to open itself up to the potential for false reports.

One problem is that the client will have to maintain a constant supply of public keys. These used to take a significant amount of time to compute on old computers, but I suppose it shouldn't be that big a deal now.

Post by sarf » 2003-02-09 05:27

volkris wrote:We had working answers for all of this stuff in the old threads. I can't remember any of them, though, so I guess we're starting over.
Well, I did ask for people to post here if they remembered how we solved all the world's problems on dcpp.lichlord.org, but in the meantime, let's do what we can.
volkris wrote:To reiterate, one of the goals for this, I think, should be that it can work with or without the cooperation of a hub. It would be preferable to have that cooperation, but it should not be required.
Definitely. I'd like some sort of server that had a list of ratings servers, preferably linked from DC++'s version.xml so that everyone is guaranteed to get the address of the meta-server.
volkris wrote:It should also not be much trouble for the reporters. The benefits go to the senders, and so as much as possible the expense should be confined there as well.
I agree.
volkris wrote:I think it's fair to assume that "good" people will want to be rated. "Bad" people will probably be able to simply become unrated.[snip register ratings account]
What it means to me is that users can be individually registered with the ratings server. Their identification can therefore be username@ratingsserver.
My question was mainly how client B knows what client A has for a username at ratingsserver C. The whole ratings server registering would have to be easy and quick (at least the anonymous part).
volkris wrote:
retrieve the rating for someone?
get the location of a ratings server?
First of all there is a weakness here in that someone can set up a fake ratings server to give false results. I would suggest that hubs specify a preferred ratings server and then a list of zero or more trusted ones. Eventually the system might evolve into a PGP-style "web of trust" thing where people vouch for each other's servers (if I trust you but don't know Bill I'll still trust Bill if you vouch for him).
How do we handle ratings server <-> ratings server communication? Or do clients report their activities to as many ratings servers as they like?
volkris wrote:I'd say that anyone should be able to query the ratings server for a user's rating. The asking client should get the target's ratings server by first asking the one specified by the hub and falling back to querying the subject itself for its ratings server, accepting the answer if it's on the hub's trust list. That's if the hub is cooperating. If it's not, then the asking client should query the target client for its ratings server and go directly to it.
Ah... I see you assume that each client only uses one ratings server. Never mind, we now have an extension of the DC++ protocol, GetRatingServer or somesuch.
volkris wrote:Of course the client should be able to have its own trust and even anti-trust list that override the hub's even if the hub has zero servers on its list.
Naturally. I think that each client will report its activities to several ratings servers. This list of ratings servers will include both client-specified ratings servers, on which the client would most likely get an explicit account, and hub-specified ratings servers in which the client would get an implicit account (until the client decides to register with one of the ratings servers).
volkris wrote:
update the rating for someone?
How about this:
Each client is assigned a private key. It then generates a list of public keys from the private one, which it sends to the receiving client as a file transfer is being set up. When the client wants to report on the sender it will include the sender's username and public key, sort of authenticating that the report was about a real transfer. Meanwhile the public key is discarded and never used again. Any attempt to reuse it will be rejected.
This would solve the problem of validation at a pretty high cost (to a CPU). The ratings server would have to keep the "next public key" calculated for every account. We'll have to see if this is feasible.
volkris wrote:This doesn't guard against false reports, it only means that those reporting have to at least go through the trouble of initializing a transfer with the source. It limits the amount of damage false reporters can do because the source can decide whether to open itself up to the potential for false reports.
Well, false reports can be handled in this way - when someone asks for a "receipt" the receipt is calculated including (somehow) the size of the file transfer. This would mean that a client would be safe from someone else reporting a transfer to/from itself at a lesser/higher number. It doesn't prevent the ratings server from getting false reports, no, but it safeguards the client's interests. Exactly how to do this is still unknown to me... being less than familiar with cryptography.
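One possible shape for such a receipt, offered only as a sketch: the uploader and its ratings server share a secret, and the receipt is a keyed hash (HMAC-SHA256) over the two nicks and the transfer size, so that changing the size invalidates the receipt. The shared secret, the field layout, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def make_receipt(secret: bytes, uploader: str, downloader: str, size_bytes: int) -> str:
    """Keyed hash binding a receipt to both parties and the transfer size."""
    # JSON list encoding keeps the three fields unambiguous.
    message = json.dumps([uploader, downloader, size_bytes], separators=(",", ":"))
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


def receipt_matches(secret: bytes, uploader: str, downloader: str,
                    size_bytes: int, receipt: str) -> bool:
    """Server-side check: recompute the receipt and compare in constant time."""
    expected = make_receipt(secret, uploader, downloader, size_bytes)
    return hmac.compare_digest(expected, receipt)
```

A downloader who reports a different size than the one in the receipt would fail the check, which is the protection asked for here; it says nothing about whether the transfer was any good.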
volkris wrote:One problem is that the client will have to maintain a constant supply of public keys. These used to take a significant amount of time to compute on old computers, but I suppose it shouldn't be that big a deal now.
Well, as said, if the ratings server should verify the key then the ratings server needs to have calculated the newest key whenever it needs to verify a transfer. That would be a bit more CPU-intensive.

Sarf
---
If you're happy and you know it, clunk your chains.

Post by volkris » 2003-02-09 06:17

sarf wrote:My question was mainly how client B knows what client A has for a username at ratingsserver C.
Good question.
I'd hate to put yet more information on already (apparently) overburdened hubs. Does DC currently give clients any way of knowing if another user is authenticated?
sarf wrote:The whole ratings server registering would have to be easy and quick (at least the anonymous part).
Oh yeah.
The anonymous part doesn't even have to involve user input. It could pop up a little window after the first transmission saying "Hey, you just scored some points, wanna register so you can keep them?" and thus turn anonymous into registered.
sarf wrote: How do we handle ratings server <-> ratings server communication? Or do clients report their activities to as many ratings servers as they like?
Ah, I see a primary misunderstanding.
I see the system as using the ratings server of the subject, while you're seeing it use the server of the reporter. I would say that the reporter would only report to the server from which it got the initial rating. This simplifies a lot of stuff, including the public key stuff, though it leads to other problems...
sarf wrote: Ah... I see you assume that each client only uses one ratings server. Nevermind, we now have an extension of the DC++ protocol, GetRatingServer or somesuch.
Ah... I see you figured out our mismatch above :)
There will have to be protocol extensions, though you might as well call them simply a new ratings protocol because they shouldn't really overlap the normal DC protocol. At least at this very moment I don't see it as overlapping.
sarf wrote:I think that each client will report its activities to several ratings servers. This list of ratings servers will include both client-specified ratings servers, on which the client would most likely get an explicit account, and hub-specified ratings servers in which the client would get an implicit account (until the client decides to register with one of the ratings servers).
I don't think there needs to be more than one ratings server involved here. If a client has an anonymous ("guest" would probably be a better term) registration on a server it still has that on only one server.
sarf wrote:This would solve the problem of validation at a pretty high cost (to a CPU). The ratings server would have to keep the "next public key" calculated for every account. We'll have to see if this is feasible.
All the ratings server would have to keep is the private key. All public keys have to, by definition, validate against the private key. It would have to keep a list of used public keys, though, and check against the list, which kind of sucks except that the list should be pretty balanceable into a search tree (the keys should be pretty well distributed by their nature). There are definitely performance questions here.
sarf wrote:Well, false reports can be handled in this way - when someone asks for a "receipt" the receipt is calculated including (somehow) the size of the file transfer. This would mean that a client would be safe from someone else reporting a transfer to/from itself at a lesser/higher number. It doesn't prevent the ratings server from getting false reports, no, but it safeguards the client's interests. Exactly how to do this is still unknown to me...
Could you clarify what you meant here? I don't get what specific problem you're talking about.
Sarf wrote:Well, as said, if the ratings server should verify the key then the ratings server needs to have calculated the newest key whenever it needs to verify a transfer. That would be a bit more CPU-intensive.
See above, the order of keys doesn't matter in verifying the keys. The only thing to keep a key from being reused would be keeping track of what has already been used.

An alternative to PKI (public key infrastructure, the public-private key thing described above) is a one-way hash. Basically a one-way hash takes string A and makes it into string B so that you cannot get string A from string B but you can tell if string C matches string A by seeing if its hash equals string B. For example, if "Chris" turns into "fjdie" you cannot take "fjdie" and figure out "Chris" but you can take "Mike" and see that it doesn't turn into "fjdie" so "Mike" != "Chris".

Anyway, the benefits to one way hashes are that they are (I believe) less computationally intensive but you don't get a supply of different public keys to hand out. The client could, perhaps, hash password + time and send the time to the receiver along with the hash. When the ratings server eventually gets the hash it could use its stored password + the reported time and see if it matches. This would both validate the report and let the server know of the time. If the reporter is required to contact the ratings server within, say, 15 minutes of getting a hash then it will be impossible to reuse these hashes because they will expire automatically.
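Here is a minimal sketch of the password + time idea with the 15-minute window. It swaps in HMAC-SHA256 keyed with the password rather than hashing a plain concatenation, which is a substitution made for this sketch and not something proposed in the thread; the function names and the use of integer Unix timestamps are also assumptions.

```python
import hashlib
import hmac

MAX_AGE_SECONDS = 15 * 60  # the "say, 15 minutes" from the post above


def make_token(password: bytes, issued_at: int) -> str:
    """Uploader side: keyed hash over the issue time (Unix seconds)."""
    return hmac.new(password, str(issued_at).encode("ascii"), hashlib.sha256).hexdigest()


def verify_report(password: bytes, issued_at: int, token: str, now: int) -> bool:
    """Ratings server side: reject expired tokens, then recompute and compare."""
    if not 0 <= now - issued_at <= MAX_AGE_SECONDS:
        return False
    expected = make_token(password, issued_at)
    return hmac.compare_digest(expected, token)
```

Note that expiry alone only stops reuse after the window closes; within the 15 minutes the same token would still verify, so a server wanting strict single use would have to remember recent tokens for that long.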

Post by sarf » 2003-02-10 07:35

volkris wrote:I'd hate to put yet more information on already (apparently) overburdened hubs. Does DC currently give clients any way of knowing if another user is authenticated?
Ehm... no. Well, there is the OpList with operators' nicks in it, and then you could try to log in as someone to see if they had a password (and thus are registered).
volkris wrote:The anonymous part doesn't even have to involve user input. It could pop up a little window after the first transmission saying "Hey, you just scored some points, wanna register so you can keep them?" and thus turn anonymous into registered.
Or it could highlight a button or whatnot... I personally dislike popup windows as a part of any program which is why I rebuilt earlier versions of my client to make the "new version window" optional and/or only to pop it up whenever a new version of DC++ arrived.
volkris wrote:Ah, I see a primary misunderstanding.
I see the system as using the ratings server of the subject, while you're seeing it use the server of the reporter.
Hmmm... so you mean that every reporter had to get the server of the person it is uploading to and use that? Or am I totally out in the blue here?
volkris wrote:I would say that the reporter would only report to the server from which it got the initial rating. This simplifies a lot of stuff, including the public key stuff, though it leads to other problems...
Ahhhuh? Total lack of understanding. Thought terminated. Please clarify whom it got the initial rating for/from.
volkris wrote:There will have to be protocol extensions, though you might as well call them simply a new ratings protocol because they shouldn't really overlap the normal DC protocol. At least at this very moment I don't see it as overlapping.
Currently, any change in client <-> client or client <-> hub communication requires (or "should use") the protocol extension.
volkris wrote:I don't think there needs to be more than one ratings server involved here. If a client has an anonymous ("guest" would probably be a better term) registration on a server it still has that on only one server.
OK, so what happens when you upload a file to a user that uses a different ratings server than your own? Use cases are a Good Thing (tm) here methinks.
volkris wrote:All the ratings server would have to keep is the private key. All public keys have to, by definition, validate against the private key. It would have to keep a list of used public keys, though, and check against the list, which kind of sucks except that the list should be pretty balanceable into a search tree (the keys should be pretty well distributed by their nature). There are definitely performance questions here.
OK, let me get this straight here... you want the ratings server to keep ALL used public keys? How many users did you envision supporting on a "typical" ratings server? For how long? Remember, that although ratings may deteriorate, accounts remain. At least, we haven't specified accounts as being volatile or transitory.
volkris wrote:Could you clarify what you meant here? I don't get what specific problem you're talking about.
I want to use a hash, but I didn't have the words at the time I wrote the message (or rather, I didn't know I wanted to use a hash).
volkris wrote:[snip one way hash introduction]
Anyway, the benefits to one way hashes are that they are (I believe) less computationally intensive but you don't get a supply of different public keys to hand out. The client could, perhaps, hash password + time and send the time to the receiver along with the hash. When the ratings server eventually gets the hash it could use its stored password + the reported time and see if it matches. This would both validate the report and let the server know of the time. If the reporter is required to contact the ratings server within, say, 15 minutes of getting a hash then it will be impossible to reuse these hashes because they will expire automatically.
This is, in my opinion at least, a far more feasible way to make sure that faked results are not reported, but it does require both clients to use the ratings extension. Of course, one could allow NULL values and merely trust them less, or something to that effect.

Sarf
---
After things go from bad to worse, the cycle will repeat itself.

Post by volkris » 2003-02-10 17:24

sarf wrote: Or it could highlight a button or whatnot...
Maybe it could scroll a notification somewhere. Eh, it's for the interface designers. I just think that if someone saw that they could gain by registering, they would.
sarf wrote:
volkris wrote:Ah, I see a primary misunderstanding.
I see the system as using the ratings server of the subject, while you're seeing it use the server of the reporter.
Hmmm... so you mean that every reporter had to get the server of the person it is uploading to and use that? Or am I totally out in the blue here?
Every reporter has to get the server from the person it is uploading from.

I mean there could be some sort of relaying between ratings servers so that the reporter could tell his server who will then tell the subject's server, but the information will be held on one specific server in my current view.
sarf wrote:
volkris wrote:I would say that the reporter would only report to the server from which it got the initial rating. This simplifies a lot of stuff, including the public key stuff, though it leads to other problems...
Ahhhuh? Total lack of understanding. Thought terminated. Please clarify whom it got the initial rating for/from.
I remove the comment. I'm not sure what I was thinking of.
sarf wrote:Currently, any change in client <-> client or client <-> hub communication require (or "should use") the protocol extension.
Most of the communications added by a ratings server would be client <-> ratings server, out of band from the client <-> client/hub protocols in place now. That's all I meant.
Sarf wrote:OK, so what happens when you upload a file to a user that uses a different ratings server than your own? Use cases are a Good Thing (tm) here methinks.
When the downloader reports he reports to my server, not his. Either that or, as I said above, his server relays the report over to my server. The thing to avoid would be having each downloader talking to his own ratings server and having one hundred different reports sitting on one hundred different servers.
Sarf wrote:OK, let me get this straight here... you want the ratings server to keep ALL used public keys? How many users did you envision supporting on a "typical" ratings server? For how long? Remember, that although ratings may deteriorate, accounts remain. At least, we haven't specified accounts as being volatile or transitory.
Ideally, yes. The public keys will be very efficient to search because they would be naturally distributed over the range of possible keys, so they would fit well into search trees. Checking whether a key has already been used would be fast. It would be a case where some voodoo in the code could work miracles for scalability.

It would amount to a hundred byte string for every transfer to go through. Not a huge amount of space. Also, if private keys expired occasionally the used public keys could be flushed. But you're right, it is a problem with the proposal and only one possible solution. Mainly public keys are just cool :) If there was no concern over people reusing old keys then it would possibly be the "correct" way to go.
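The bookkeeping side of this (remember used keys, reject reuse, flush when the private key expires) might be sketched like so. It uses a hash set per account rather than the search tree mentioned above, and it does not attempt the cryptographic validation at all; the class and method names are made up.

```python
from typing import Dict, Set


class UsedKeyRegistry:
    """Tracks which one-time keys have already been redeemed, per account."""

    def __init__(self) -> None:
        self._used: Dict[str, Set[bytes]] = {}

    def redeem(self, account: str, key: bytes) -> bool:
        """Return True the first time a key is seen, False on any reuse."""
        used = self._used.setdefault(account, set())
        if key in used:
            return False
        used.add(key)
        return True

    def rotate_private_key(self, account: str) -> None:
        """Flush the used list; keys from the old private key would no longer
        validate anyway (that validation step is outside this sketch)."""
        self._used.pop(account, None)

    def used_count(self, account: str) -> int:
        return len(self._used.get(account, ()))
```

Lookups and inserts are constant time on average, so the speed question is mostly settled by the data structure; what remains is how much memory the sets take between rotations.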
Sarf wrote:This is, in my opinion at least, a far more feasible way to make sure that faked results are not reported, but it does require both clients to use the ratings extension. Of course, one could allow NULL values and merely trust them less, or something to that effect.
I think we should allow null values for when there is no authenticating hash to allow for times when the subject's client either doesn't have any ratings extensions or isn't registered at any ratings servers. As for trusting less, I don't know that it would make that much of a difference. I'd leave it up to the administrator of the ratings server to decide, though, just like scoring of different aspects of the actual rating.

I think it only really matters in the case that the anonymous user registers and wants to keep his points. I've always thought ratings need to drop over time to ensure that people keep behaving well; ratings given during the anonymous phase could simply drop at a little faster rate, the result of less trust.
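One simple way to get "drops over time, and faster for anonymous points" is exponential decay with a shorter half-life for anonymous ratings. The half-life values below are invented for illustration; the thread does not specify any decay formula.

```python
REGISTERED_HALF_LIFE_DAYS = 30.0  # assumed value
ANONYMOUS_HALF_LIFE_DAYS = 15.0   # assumed value; shorter means less trust


def decayed_points(points: float, age_days: float, anonymous: bool) -> float:
    """Value today of points that were awarded age_days ago."""
    half_life = ANONYMOUS_HALF_LIFE_DAYS if anonymous else REGISTERED_HALF_LIFE_DAYS
    return points * 0.5 ** (age_days / half_life)
```

With these numbers, 100 points earned 30 days ago are worth 50 if earned while registered and 25 if earned while anonymous.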

Post by sarf » 2003-02-11 10:28

volkris wrote:Every reporter has to get the server from the person it is uploading from.
Ehm... uploading from? AFAIK, you upload to someone, not from them.
Do you mean that the downloading person reports to the same server as the person it is downloading from (this seems likely to be the case) ?
volkris wrote:I mean there could be some sort of relaying between ratings servers so that the reporter could tell his server who will then tell the subject's server, but the information will be held on one specific server in my current view.
Yes. Let's keep it simple, no global networks of rating servers talking to each other, no matter how kewl that would be. :)
volkris wrote:Most of the communications added by a ratings server would be client <-> ratings server, out of band from the client <-> client/hub protocols in place now. That's all I meant.
Yes, but I was asking it due to the fact that the rating server the uploader uses has to be passed to the downloader somehow, thus an extension to the client <-> client protocol seemed necessary.
volkris wrote:When the downloader reports he reports to my server, not his. Either that or, as I said above, his server relays the report over to my server. The thing to avoid would be having each downloader talking to his own ratings server and having one hundred different reports sitting on one hundred different servers.
Yep. Sure agree on this one. Yet... does the downloader's report affect his/her own rating, or is it only used as verification? And what happens if the uploader gives out an invalid rating server?
volkris wrote:Ideally, yes. The public keys will be very efficient to search because they would be naturally distributed over the range of possible keys, so they would fit well into search trees. Checking whether a key has already been used would be fast. It would be a case where some voodoo in the code could work miracles for scalability.
It would amount to a hundred byte string for every transfer to go through. Not a huge amount of space. Also, if private keys expired occasionally the used public keys could be flushed. But you're right, it is a problem with the proposal and only one possible solution. Mainly public keys are just cool :) If there was no concern over people reusing old keys then it would possibly be the "correct" way to go.
Fast, yes, indeed. Yet it was space that I was worried about. Remember that many users (in my experience) transfer at least several hundred files each day. That's <several hundred> times <a hundred bytes>, which is <at least tens of thousands of bytes per user per day>, and that then gets multiplied by <users> and <days>. Lots of bytes per day, in other words. If the keys had built-in expiry dates this would be more than a feasible solution (as long as the number of users is not too large). The way it will work in the beginning will probably mean one or perhaps two rating servers for all clients. The more clients that use a rating server, the more reliable it is (since there are more people that will "honor" the ratings of the server).
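To put rough numbers on the space concern, here is the multiplication spelled out. The figures of 300 files per user per day and 10,000 users are invented purely as an example; only the hundred bytes per key comes from the thread.

```python
def used_key_storage_bytes(files_per_user_per_day: int, bytes_per_key: int,
                           users: int, days: int) -> int:
    """Raw storage for used keys if nothing is ever flushed."""
    return files_per_user_per_day * bytes_per_key * users * days


per_user_per_day = used_key_storage_bytes(300, 100, users=1, days=1)        # 30,000 bytes
whole_server_per_day = used_key_storage_bytes(300, 100, users=10_000, days=1)   # 300,000,000 bytes
whole_server_per_year = used_key_storage_bytes(300, 100, users=10_000, days=365)
```

Under those assumptions a single server accumulates about 300 MB of used keys per day, roughly 110 GB per year, before any index overhead, which is why expiry matters.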
volkris wrote:I think we should allow null values for when there is no authenticating hash to allow for times when the subject's client either doesn't have any ratings extensions or isn't registered at any ratings servers. As for trusting less, I don't know that it would make that much of a difference. I'd leave it up to the administrator of the ratings server to decide, though, just like scoring of different aspects of the actual rating.
I'd just think it would be logical to count a non-authenticated hash less than an authenticated one.
volkris wrote:I think it only really matters in the case that the anonymous user registers and wants to keep his points. I've always thought ratings need to drop over time to ensure that people keep behaving well; ratings given during the anonymous phase could simply drop at a little faster rate, the result of less trust.
Well... the problem is that we have to use some good trust algorithms to make sure that it is hard to fake your ratings... If a ratings server merely halved the benefit given by non-authenticated ratings compared to authenticated, it would still mean that faking your ratings was doable. The rating server should be deployed with a good rating system (of course it would be optional, but it should be there to be used).

Hmm... I'd like to have a system somewhat like slashdot's karma system, where I could burn some of my rating to lessen another client's rating. Just having some ideas though, nothing necessary or anything. Oh well.

Sarf
---
Spam was, Spam is and Spam shall be. After summer is winter, and after winter, summer. It ruled once where Man rules now; where Man rules now, it shall rule again. As a foulness shall ye know it.

Post by volkris » 2003-02-11 11:13

sarf wrote:
volkris wrote:Every reporter has to get the server from the person it is uploading from.
Do you mean that the downloading person reports to the same server as the person it is downloading from (this seems likely to be the case) ?
The downloading person reports to the server specified by the person he is downloading from. Better? :)
sarf wrote: Yep. Sure agree on this one. Yet... does the downloader's report affect his/her own rating, or is it only used as verification? And what happens if the uploader gives out an invalid rating server?
I'm tempted to say that no, reporting does not affect the reporter's ratings in order to prevent people from, say, sitting around downloading things only so they can file reports. It would also give reporters more incentive to send in faked ratings, though this incentive can be mitigated by not giving them much of a boost for each report. In the end I'd be tempted to say no because we don't want any users to have any direct control over their own ratings, which this gives to an extent. It's certainly not a closed question, though.
sarf wrote:I'd just think it would be logical to count a non-authenticated hash less than an authenticated one.
If you award any points based on total time online (the more a user is actually available, the more valuable it is, after all) the authenticated users' scores will automatically rise.
Sarf wrote:Well... the problem is that we have to use some good trust algorithms to make sure that it is hard to fake your ratings... If a ratings server merely halved the benefit given by non-authenticated ratings compared to authenticated, it would still mean that faking your ratings was doable.
Oh, were you talking about authenticating the reporters? I'd never gotten around to that part of the deal yet. I've only been talking about authenticating the report by ensuring that the subject was actually conducting a transfer.

One thing I thought about last night was the role of the hubs in the rating system. Before I had always figured that cooperating hubs could play a large role in the deal, but I'm reconsidering that now. Due to various problems things like hub authentication bring about, some due to the limitations of existing DC protocols, I now think that the hub should do little more than suggest a default ratings server to clients and express trust for lists of trusted ratings servers.

All registered users would then log in to their ratings server(s) as they go online before they even log on to hubs (some hubs might choose to only let people above a certain rating come in). The key thing this guards against is a person using one ratings account from multiple computers at the same time. Otherwise every computer I'm using will get the points awarded to any single one of them. It also allows people to access their accounts from other computers, which I think is just a good thing.
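The "one account, one computer at a time" rule could be enforced with something as small as the table below. Whether a second login should be refused (as here) or should kick out the first is a policy choice the thread leaves open; the names are hypothetical.

```python
from typing import Dict


class SessionTable:
    """Allows at most one active session per ratings account."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}

    def login(self, account: str, session_id: str) -> bool:
        current = self._active.get(account)
        if current is not None and current != session_id:
            return False  # already logged in from somewhere else
        self._active[account] = session_id
        return True

    def logout(self, account: str, session_id: str) -> None:
        # Only the session that holds the account may release it.
        if self._active.get(account) == session_id:
            del self._active[account]
```

A real server would also need a timeout so that a crashed client does not lock its owner out forever.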

I was also thinking about different implementation choices last night. Because the data being reported is hierarchical (i.e. n hosts uploading m files each), it might be positive to encourage XML-style technologies in the foundation of the system. In particular I was toying with the idea of using XML-RPC as the main communications method. Do you have any experience with it? Any thoughts on its suitability?
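For a feel of what XML-RPC would put on the wire, this sketch marshals and unmarshals a single report using Python's standard xmlrpc.client module, with no network involved. The method name ratings.report and the field names are invented for illustration.

```python
import xmlrpc.client
from typing import Dict, Tuple


def encode_report(subject: str, size_bytes: int, completed: bool, token: str) -> str:
    """Build the XML-RPC request body for one transfer report."""
    report = {
        "subject": subject,
        # XML-RPC integers are 32-bit, so the size travels as a string
        # to survive files larger than 2 GiB.
        "size": str(size_bytes),
        "completed": completed,
        "token": token,
    }
    return xmlrpc.client.dumps((report,), methodname="ratings.report")


def decode_report(payload: str) -> Tuple[str, Dict[str, object]]:
    """Parse a request body back into (method name, report dict)."""
    params, method = xmlrpc.client.loads(payload)
    return method, params[0]
```

The 32-bit integer limit is worth knowing about before committing to XML-RPC, since file sizes on DC routinely exceed it.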

In any case, it would be really nice if all reports could be logged in minimal form. That way you could run different ratings algorithms on the same data to determine scores in different ways. One user, for example, might say that the global score is what he chooses to judge by while another only wants to judge by the amount that the person has shared with a specific hub. Also, this would include the ability to export a user's activities to a file to be imported at a different ratings server, allowing people to move between them. Depending on how much you log it will be a good amount of data. I can't say exactly how much, but maybe something like this for each transfer:

<transfer hub="dsfsdfs" size="123456789" time="5:56:43" completed="y">

The data will be highly compressible, and if nothing else we should probably keep all of it no matter what until we've tweaked our ratings algorithm at least.
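To make the "same log, different algorithms" idea concrete, here is a minimal sketch in Python. It assumes the records are stored in the attribute form shown above; the `<transfers>` wrapper element, the sample values, and the two metric functions are made up for illustration and are not a proposed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical log: one <transfer> element per upload. Attribute names follow
# the example record above; the wrapper element and values are invented.
LOG = """
<transfers>
  <transfer hub="hubA" size="700000000" time="1:10:00" completed="y"/>
  <transfer hub="hubA" size="5000000" time="0:02:30" completed="n"/>
  <transfer hub="hubB" size="123456789" time="5:56:43" completed="y"/>
</transfers>
"""

def parse_log(text):
    """Turn the XML log into a list of plain dicts."""
    records = []
    for node in ET.fromstring(text.strip()).iter("transfer"):
        hours, minutes, seconds = (int(p) for p in node.get("time").split(":"))
        records.append({
            "hub": node.get("hub"),
            "size": int(node.get("size")),
            "seconds": hours * 3600 + minutes * 60 + seconds,
            "completed": node.get("completed") == "y",
        })
    return records

def global_bytes(records):
    """Metric 1: total bytes in completed transfers, across all hubs."""
    return sum(r["size"] for r in records if r["completed"])

def hub_bytes(records, hub):
    """Metric 2: the same total, restricted to a single hub."""
    return sum(r["size"] for r in records
               if r["completed"] and r["hub"] == hub)

records = parse_log(LOG)
print(global_bytes(records))
print(hub_bytes(records, "hubA"))
```

Because the raw records are kept, a new metric is just another function over `records`; nothing already logged has to change.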

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden
Contact:

Re: Ratings server and protocol

Post by sarf » 2003-02-12 16:42

volkris wrote:The downloading person reports to the server specified by the person he is downloading from. Better? :)
Much better. :)
volkris wrote:I'm tempted to say that no, reporting does not affect the reporter's ratings in order to prevent people from, say, sitting around downloading things only so they can file reports. It would also give reporters more incentive to send in faked ratings, though this incentive can be mitigated by not giving them much of a boost for each report. In the end I'd be tempted to say no because we don't want any users to have any direct control over their own ratings, which this gives to an extent. It's certainly not a closed question, though.
Ah... I never thought about giving credit (or whatnot) for reporting - I just assumed you meant that they would get their rating "fined" the appropriate amount, which would be just such a negative feedback thingy we agree on (I hope).
volkris wrote:If you award any points based on total time online (the more a user is actually available, the more valuable it is, after all) the authenticated users' scores will automatically rise.
Hmm... yes, that'd work. I had planned some sort of direct feedback thing, you know, "Join now and get a 25% increase in your ratings from now on to forever!" or whatever. :)
volkris wrote:Oh, were you talking about authenticating the reporters? I'd never gotten around to that part of the deal yet. I've only been talking about authenticating the report by ensuring that the subject was actually conducting a transfer.
Well, why not kill two birds with one stone (except that I prefer guns) ?
This seems to be a Good Feedback Thingy (tm) that would both certify the uploader's report as well as authenticating that a transfer has been done.
volkris wrote:One thing I thought about last night was the role of the hubs in the rating system. Before I had always figured that cooperating hubs could play a large role in the deal, but I'm reconsidering that now. Due to the various problems that things like hub authentication bring about, some due to the limitations of existing DC protocols, I now think that the hub should do little more than suggest a default ratings server to clients and express trust for lists of trusted ratings servers.
My, my. I don't even expect hubs to do that. Of course, you could let the hubs give suggestions and trust lists to the client using an extension or whatever. I'm not against this, just against it being needed or required by the client.
volkris wrote:All registered users would then log in to their ratings server(s) as they go online before they even log on to hubs (some hubs might choose to only let people above a certain rating come in). The key thing this guards against is a person using one ratings account from multiple computers at the same time.
Why must we guard against this? While the person would be able to "fool" a number of clients that he had a rating temporarily (before any results have been reported), he'd still get a massive change in his rating when all the transfers are completed.
volkris wrote:Otherwise every computer I'm using will get the points awarded to any single one of them. It also allows people to access their accounts from other computers, which I think is just a good thing.
Me too. My only concern is the rating server and what it needs to be equipped with. Currently, I am considering using a medium-independent Java class structure to store information (since, in the beginning, I'll use files to minimize the possible errors).
volkris wrote:I was also thinking about different implementation choices last night. Because the data being reported is hierarchical (i.e. n hosts uploading m files each), it might be positive to encourage XML-style technologies in the foundation of the system. In particular I was toying with the idea of using XML-RPC as the main communications method. Do you have any experience with it? Any thoughts on its suitability?
I do not have any experience with using XML-RPC professionally in person, but a person I know and trust (my brother) has used it in a product professionally. His conclusion was that, as long as XML is not used in such a way as to allow its overhead to bloat message traffic too much, it is a Good Thing (tm). In the project he was involved in every message used XML to convey information. As the product the project was supposed to create was a gigantic data-shuffler, the usage of XML bloated the traffic too much (mainly due to the use of legacy periphery systems which had to have their input/output translated).
volkris wrote:In any case, it would be really nice if all reports could be logged in minimal form. That way you could run different ratings algorithms on the same data to determine scores in different ways. One user, for example, might say that the global score is what he chooses to judge by while another only wants to judge by the amount that the person has shared with a specific hub. Also, this would include the ability to export a user's activities to a file to be imported at a different ratings server, allowing people to move between them. Depending on how much you log it will be a good amount of data. I can't say exactly how much, but maybe something like this for each transfer:

<transfer hub="dsfsdfs" size="123456789" time="5:56:43" completed="y">
I do agree that such a form is usable. I'd think that some changes are in order, though (mainly to speed up information processing and collaboration):

Code:

<transfer hub="dsfsdfs" transferredSize="123456789" timeInSeconds="21403" type="download" targetHash="123EAC46EBD8" />
... or something like that. The information should be kept down to a possible minimum without sacrificing important information.
As to transfer to other servers... well, if the number of rating servers would be small enough, it might be possible to have each of them registered with the PGP key servers, and for them to use GPG (or somesuch) to sign a user "history" and for another server to import this information if it passes a verification check. This would mean that we could trust the user with this information, allowing him/her to get their signed information any time they wish... actually, this would allow us to use the user as a backup source in case of rating server failure (as long as the private key was still intact). Whew! Too many ideas boiling in my head! Must... Expound... Upon... Complicated... Ideas... <Brain Reset>
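A rough sketch of the export/import check, in Python. Big caveat: this uses an HMAC from the standard library as a stand-in for the GPG signature, so it only works between parties that share the secret key. A real version would use public-key signatures so that any server could verify without being able to forge. The function names and the history layout are invented for the example.

```python
import hashlib
import hmac
import json

def sign_history(history, server_key):
    """Serialize a user's history deterministically and attach a tag.

    HMAC-SHA256 is only a stand-in here for a GPG signature.
    """
    payload = json.dumps(history, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(server_key, payload.encode("utf-8"),
                   hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def verify_history(signed, server_key):
    """Return the history if the tag checks out, otherwise None."""
    expected = hmac.new(server_key, signed["payload"].encode("utf-8"),
                        hashlib.sha256).hexdigest()
    if hmac.compare_digest(expected, signed["tag"]):
        return json.loads(signed["payload"])
    return None

# The user carries `blob` around; the importing server checks it.
blob = sign_history([{"hub": "hubA", "size": 123456789}], b"demo-secret")
print(verify_history(blob, b"demo-secret") is not None)
```

The user can hold the signed blob without being trusted, since any edit to the payload makes verification fail.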

Right. Let's get back to Earth for a while, boring though it may be.
volkris wrote:The data will be highly compressible, and if nothing else we should probably keep all of it no matter what until we've tweaked our ratings algorithm at least.
Yes.

Using certain tools we could make the whole process of trusting a rating server a matter for the other servers. Perhaps we need to have a Master Rating Server that authenticates new servers (if they pass its tests). Every client could come equipped with the address of this master rating server, then it would be up to each new user to authenticate the master server using its fingerprint and a friend that has a trusted public key of the server. This way, each new rating server would only be authenticated if it was found to be running to "spec", that is, disallowing fake ratings and whatnot. While an authenticated rating server is neat, it is not too much of a worry for an unauthenticated server since its users simply won't be able to move from it. There would have to be some controls put into the server so that it can't be DOS:ed into oblivion by signing an infinite number of transfers, but other than this the solution is... elegant.

Getting too caught up in my own ideas again.

Give me feedback on my ideas and assumptions - and let's hope some other clueful person decides to join in - it's getting kind of... well... disconcerting discussing issues such as these and not getting yelled at, harassed, insulted and...

Kinda neat, actually. :)

Sarf
---
Being good at being stupid doesn't count.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Re: Ratings server and protocol

Post by volkris » 2003-02-12 19:43

sarf wrote:Ah... I never thought about giving credit (or whatnot) for reporting - I just assumed you meant that they would get their rating "fined" the appropriate amount, which would be just such a negative feedback thingy we agree on (I hope).
What am I agreeing on here?

Oh, and to answer the other question you asked, a user has no incentive to give out any false information that would preclude him from getting credit for ratings. If a user chooses to give out an invalid ratings server that's his choice, though it would be pretty stupid of him to do.
Sarf wrote:
volkris wrote:Oh, were you talking about authenticating the reporters? I'd never gotten around to that part of the deal yet. I've only been talking about authenticating the report by ensuring that the subject was actually conducting a transfer.
Well, why not kill two birds with one stone (except that I prefer guns) ?
This seems to be a Good Feedback Thingy (tm) that would both certify the uploader's report as well as authenticating that a transfer has been done.
Well the thing is that it adds complexity, computation time, and bandwidth requirements. It also raises questions about how to exchange authentication keys with foreign servers. So, is it worth the complexity?
Sarf wrote:My, my. I don't even expect hubs to do that. Of course, you could let the hubs give suggestions and trust lists to the client using an extension or whatever. I'm not against this, just against it being needed or required by the client.
One thing hubs really should be involved in is choosing the metrics for the ratings. Different hubs specializing in different things will find different metrics to be best. We'll see how it plays out.
Sarf wrote:Why must we guard against this? While the person would be able to "fool" a number of clients that he had a rating temporarily (before any results have been reported), he'd still get a massive change in his rating when all the transfers are completed.
In case we're on different levels, I don't think it's a good idea for a person's downloads to have any effect on his own ratings because of some trust issues. Plus I simply don't think it's needed. Encourage more sources but don't discourage consumption, I say.

With this system one person uploading will have his positive rating shared among the ten others who are only downloading. That's the problem.
Sarf wrote: Me too. My only concern is the rating server and what it needs to be equipped with. Currently, I am considering using a medium-independent Java class structure to store information (since, in the beginning, I'll use files to minimize the possible errors).
I'm not sure Java has the balls for a "real" server :)
I've just not had good experience with trying to get Java performance high enough. As a UNIX guy I look at things like BerkeleyDB as the workhorses underneath these datastores.
Sarf wrote:I do agree that such a form is usable. I'd think that some changes are in order, though (mainly to speed up information processing and collaboration):

Code:

<transfer hub="dsfsdfs" transferredSize="123456789" timeInSeconds="21403" type="download" targetHash="123EAC46EBD8" />
I think it would be best for everyone if we kept all records of the actual file being transferred out of the logs. We could even do a little munging of the transferred size, adding or subtracting a random percentage to it just to make sure nobody can ever tell who transferred what by legally demanding the records.
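The size munging could be as small as this. It's a sketch only: the 10% bound, the uniform distribution, and the function name are assumptions picked for illustration, not something we've settled on.

```python
import random

def munge_size(size, max_fraction=0.10, rng=random):
    """Shift `size` up or down by a random fraction of at most max_fraction.

    `rng` can be the random module or a seeded random.Random instance.
    """
    factor = 1.0 + rng.uniform(-max_fraction, max_fraction)
    return max(0, round(size * factor))

# Log the munged value instead of the true transferred size.
print(munge_size(123456789))
```

Since the noise is symmetric around zero, totals over many transfers should stay close to the real figure while any single logged size no longer matches a specific file.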
Sarf wrote: Using certain tools we could make the whole process of trusting a rating server a matter for the other servers. Perhaps we need to have a Master Rating Server that authenticates new servers (if they pass its tests).
Getting too caught up in my own ideas again.
Eh, I don't really like the idea of an official master ratings server. Just like the DC++ master hub it's a weakness. I say while these master servers can play a part the main authentication system should be more like the PGP web of trust model.

One thing is that it will be very close to impossible to test with 100% accuracy whether a server is performing up to its promises. This is only made more complex by the use of various different metrics. I can easily see the human element having to play a part in the decision of trust of the ratings servers, unfortunately. By setting a precedent of being very transparent with ratings assignments, allowing anyone to view just about anything involving how it's been done, hopefully this situation will be helped a bit. People can do spot checks, too, which will probably also play a big part in deciding who to trust. Perhaps we can encourage people to use multiple unrelated servers somehow?

I'm looking right now for a way to merge an XML-RPC server into my Apache installation. If I can figure out how it's done I might just do some coding this weekend. I'd use XML-RPC to handle the client communications to a Python server side program that communicates with BerkeleyDB's XML project for a backend. Should be high performance and not too hard to write.
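For a feel of what the Python side might look like, here is a bare-bones sketch using the XML-RPC server from Python's standard library rather than Apache. The method name `report_transfer`, its parameters, and the in-memory list standing in for the BerkeleyDB backend are all assumptions for the example.

```python
from xmlrpc.server import SimpleXMLRPCServer

# In-memory stand-in for the real datastore (BerkeleyDB in the plan above).
REPORTS = []

def report_transfer(subject, hub, size, seconds, completed):
    """Hypothetical RPC method: record one transfer report about `subject`."""
    if size < 0 or seconds < 0:
        return False  # reject obviously malformed reports
    REPORTS.append({
        "subject": subject,
        "hub": hub,
        "size": size,
        "seconds": seconds,
        "completed": bool(completed),
    })
    return True

def build_server(host="127.0.0.1", port=8000):
    """Create the server and register the method; the caller starts it."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(report_transfer, "report_transfer")
    return server

# To run it:
#   build_server().serve_forever()
# A client would then call:
#   import xmlrpc.client
#   proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8000")
#   proxy.report_transfer("some-user", "hubA", 123456789, 21403, True)
```

One thing to watch: XML-RPC's int type is a four-byte signed integer, so transfer sizes of 2 GiB and up would have to travel as a string or a double.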

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden
Contact:

Re: Ratings server and protocol

Post by sarf » 2003-02-13 10:54

volkris wrote:What am I agreeing on here?
Well, there was that blood-signed contract we talked about, last night... :)

I meant that we do not want clients to report things that will directly lower their own ratings.
volkris wrote:Oh, and to answer the other question you asked, a user has no incentive to give out any false information that would preclude him from getting credit for ratings. If a user chooses to give out an invalid ratings server that's his choice, though it would be pretty stupid of him to do.
Well, it has to be taken into account so that the implementation does not crash if it receives an invalid address or whatnot.
volkris wrote:Well the thing is that it adds complexity, computation time, and bandwidth requirements. It also raises questions about how to exchange authentication keys with foreign servers. So, is it worth the complexity?
Depends on how much security you'd want. I want automated security (instead of relying on rumour and hearsay), so I vote "aye" on the issue. Well, if I am not supposed to write the darn thing, that is.
volkris wrote:One thing hubs really should be involved in is choosing the metrics for the ratings. Different hubs specializing in different things will find different metrics to be best. We'll see how it plays out.
Umm... choosing metrics? Why should hubs do that? Why should a client allow a hub to choose for the client?
volkris wrote:In case we're on different levels, I don't think it's a good idea for a person's downloads to have any effect on his own ratings because of some trust issues. Plus I simply don't think it's needed. Encourage more sources but don't discourage consumption, I say.
Eh... I don't think we are on the level on this issue, but I do agree that it would be bad if you could raise your own ratings somehow.
volkris wrote:With this system one person uploading will have his positive rating shared among the ten others who are only downloading. That's the problem.
But the ten others would decrease his rating, so that's alright by me... Unless you mean they somehow stole access to his rating?
volkris wrote:I'm not sure Java has the balls for a "real" server :)
Depends on what you want it to do. Java is pretty good at data-shuffling as long as it does not have to handle the data itself too much. Using Java as a bridge between a database and the "outside world", validating users and authenticating transfers, could be done. Trying to get Java to fiddle with every data piece that flows through its system will not make Java happy.
volkris wrote:I've just not had good experience with trying to get Java performance high enough. As a UNIX guy I look at things like BerkeleyDB as the workhorses underneath these datastores.
Well, I prefer MySQL as a base, but that's because I'm Swedish and because I don't think we need lots of foreign keys and stuff in the database.
volkris wrote:I think it would be best for everyone if we kept all records of the actual file being transferred out of the logs. We could even do a little munging of the transferred size, adding or subtracting a random percentage to it just to make sure nobody can ever tell who transferred what by legally demanding the records.
I hadn't considered the security issues of saving the hash... perhaps a compromise could be had? Exchange the hash thingy with a simple boolean variable, "hadvalidhash" or somesuch.
volkris wrote:Eh, I don't really like the idea of an official master ratings server. Just like the DC++ master hub it's a weakness. I say while these master servers can play a part the main authentication system should be more like the PGP web of trust model.
Hmm... well, of course that would be a problem. The reason I'd like this is since it makes things easier if there is someone you can always trust. Oh well.
volkris wrote:[snip]People can do spot checks, too, which will probably also play a big part in deciding who to trust. Perhaps we can encourage people to use multiple unrelated servers somehow?
Well, I'd expect a master server to do spot checks every now and then, but how would you discredit a rating server? Automatically, mind you, no ideas about "well, they could have special forum for that". :)
By the by, using multiple unrelated servers should not be a problem if we keep down the message size.
volkris wrote:I'm looking right now for a way to merge an XML-RPC server into my Apache installation. If I can figure out how it's done I might just do some coding this weekend. I'd use XML-RPC to handle the client communications to a Python server side program that communicates with BerkeleyDB's XML project for a backend. Should be high performance and not too hard to write.
Let me know how it works out. I'm eager to start piling up some ratings... :)

Sarf
---
The plural of spouse is spice.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Wait a sec... the beginning first?

Post by GargoyleMT » 2003-02-13 12:04

While I definitely see the merit in the discussion so far, it seems to me (not having been around/aware of the lichlord forums) that this is the middle of the conversation, not the beginning of it.

What problems do having a ratings system solve?

Basically, what is rated, and to what end?

Sarf, you mentioned this in the upload throttling thread. Would the rating somehow provide an indicator to the other users / hub ops that a user can be "trusted" to run a "hacked" (or upload-limited) client?

eMule identifies other clients uniquely (to keep track of up/download karma), perhaps it's worth inspecting how they do it - though I doubt their system is cryptographically secure.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Re: Wait a sec... the beginning first?

Post by volkris » 2003-02-13 13:01

GargoyleMT wrote:While I definitely see the merit in the discussion so far, it seems to me (not having been around/aware of the lichlord forums) that this is the middle of the conversation, not the beginning of it.
Everyone who knows how I write knows how longwinded I can be. :)
Excuse me for being pedantic with some of what's written below. Hopefully it will answer questions people didn't even know they had.

I should write a webpage or something explaining the motivation. Sarf and I are following up on this ratings idea which was actually proposed two message boards ago, developed in the last message board system, and never completely implemented.

To be hyperbolistic, a ratings system wouldn't solve problems so much as it would completely invalidate all of the other problems being discussed in the other forums.

A more down to earth response would be that a ratings system would, by identifying the actual goals of DC and working directly towards them, make many of the other concerns irrelevant because they tend to be corrections and countercorrections to a system with specific fundamental flaws. For example, you could implement the upload limiting being discussed to solve one stated problem, but then people would begin to require another solution that would ensure that the ratings system was used properly. Corrections to corrections abound in DC as people have tried to pull more use out of what's really a pretty unoptimized system.

The basic problem in DC, as well as all filesharing, is the goal of allowing people to download as much as possible. Outside of this goal everything else is secondary.

Luckily, people are good at figuring out ways to suck more files out of a system. For one thing, if only one person thinks up a new way to squeeze another kb/s out of the network, other people will duplicate this advancement, leading to what should be a natural continual advancement in the downloading side of things.

The discussion then focuses on the supply side of the equation (after noting that the location part of it is working well enough in DC). People naturally have no reason to share. They can be influenced to do so by other peoples' proddings, but left alone not many people are going to be trying to figure out ways to optimize their upload capacity. They gain no benefit from doing so. Not only that, but they actually lose a good deal of the time, as uploading can often interfere with a user's accomplishing of a direct goal. It's pretty obvious that people don't naturally want to share.

To combat this situation, DC generally has instituted attacks on both sides of the equation. It tries to force people to share by making certain amounts of information available, and it tries to mandate that clients agree to transfer the information to a pretty arbitrary minimum number of hosts at once. Somewhat implicit in these supply requirements are restrictions on use of incoming bandwidth as well. In fact, discussion on the subject often proposes that these restrictions be explicit.

So basically, DC solves the problem of providing transfers by interfering with the transfers themselves. DC cripples all of its clients, forcing them to adhere to a certain average of service in order to ensure that they're all at least at that average. Sure it works, but at a pace that is far from maximal efficiency.

Or, at least it would work except that such rules make assumptions that are flat out wrong. The entire method of controlling DC, which requires the penalization of all clients, assumes that all clients will penalize themselves. In fact, it only assumes that they actually do. There is no way to ensure that the clients can be forced to abide by the rules, and there is no way to go back later and check to see if they are.

So, in the current situation horribly inefficient rules keep DC at a mediocre level of usability assuming that clients agree to penalize themselves. It's amazing that the thing still works at all after all of this time. It's also amazing (I suppose not THAT amazing, though) that users will simply roll over and accept this situation. The discussion in the other forums here pretty clearly shows that users are perfectly content in just patching the system to deal with its deficiencies, fatal and not.

A ratings system, properly done (and I'm not claiming that I know how to do it properly...), will provide a feedback loop giving people a natural reason to transfer more information to the people who want it. After this change is in place so many other problems will simply drop off the face of the earth, and with the handicap removed the natural progression towards better downloads will be free to continue. This time it will be joined by another natural progression that seeks out ways to improve uploads, though.
Basically, what is rated, and to what end?
A ratings server will rate the value of a particular client to a group, be that group DC as a whole, a hub, or whatever. This rating will be based on various behavioral factors such as total amount uploaded, stability of connection to the hub (quality of availability of information), and completion rate of uploads. The end is that this value calculation can be used to reward those who contribute more by putting them at higher priority for scarce resources, be they upload bandwidth or user spaces on a hub.
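As a toy version of that calculation, here is a Python sketch that folds the three factors into one number. The `Activity` fields, the weight names, and the weight values are all invented for the example; picking real ones is exactly the tuning we haven't done yet, and different hubs might weight them differently.

```python
from dataclasses import dataclass

@dataclass
class Activity:
    bytes_uploaded: int      # total amount uploaded
    uptime_fraction: float   # 0..1, share of time the client was reachable
    uploads_started: int
    uploads_completed: int

def rating_for(activity, weights):
    """Weighted sum of upload volume (in GiB), stability and completion rate."""
    if activity.uploads_started:
        completion = activity.uploads_completed / activity.uploads_started
    else:
        completion = 0.0  # nothing attempted yet, so no completion credit
    gib = activity.bytes_uploaded / 2**30
    return (weights["volume"] * gib
            + weights["stability"] * activity.uptime_fraction
            + weights["completion"] * completion)

# Illustrative weights only.
WEIGHTS = {"volume": 1.0, "stability": 10.0, "completion": 5.0}

user = Activity(bytes_uploaded=2**31, uptime_fraction=0.5,
                uploads_started=4, uploads_completed=3)
print(rating_for(user, WEIGHTS))
```

A hub or client could then sort the people waiting for a slot by this number and serve the highest first.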

At the same time it will allow DC to drop all of the other nonsense that it currently uses to keep the system alive (share size requirements, etc).

Just to provide a specific example, if I had a copy of a movie that everyone wanted, say it hadn't come out in theaters yet but somehow I had gotten a high quality copy, I should be allowed to enter your movie sharing hub with only that single 700meg movie shared. The users of your hub, under the current system, would lose out because I didn't share 100G, while a ratings system would reveal the popularity of my offer and your hub would let me in.
Sarf, you mentioned this in the upload throttling thread. Would the rating somehow provide an indicator to the other users / hub ops that a user can be "trusted" to run a "hacked" (or upload-limited) client?
The issue of hack clients is one that becomes completely irrelevant with a ratings system. There would no longer ever be any need to trust a specific user. Who cares what client he uses or if it's hacked or not? If it participates, uploads lots, and generally proves itself valuable in the hub then it will have a high rating. In fact, users who stick with the current non-upload limited clients would probably be penalized. This is only right, though, because the current clients don't give users the ability to be a greater asset to the system for (rightful) fear that they'll be worse. Some "hacked" clients will make DC a better place, and those people will be rewarded.
eMule identifies other clients uniquely (to keep track of up/download karma), perhaps it's worth inspecting how they do it - though I doubt their system is cryptographically secure.
I personally don't believe in tracking download amounts, for various practical and idealistic reasons, and so an upload/download ratio would never be known. I don't believe this ratio is that important in the first place, though, because a person who is encouraged to upload will be sharing whatever they download, making his offering be something he actually liked himself, potentially raising his value to the group. And anyway, it would be penalizing someone for using the system in ways that contribute to its goals (transferring as much as possible).

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Re: Ratings server and protocol

Post by volkris » 2003-02-13 17:03

sarf wrote:I meant that we do not want clients to report things that will directly lower their own ratings.
It's more important that clients not be able to raise their own ratings. I mean, if someone really wants to I suppose I wouldn't have a problem with them voluntarily lowering their own levels, after all :)
Well, it has to be taken into account so that the implementation does not crash if it receives an invalid address or whatnot.
Oh, of course. Implementation issue. If a user reports an invalid address then it should be counted as not being registered to be rated. We just need to make sure clients don't really have any incentive to avoid being rated.
Depends on how much security you'd want. I want automated security (instead of relying on rumour and hearsay), so I vote "aye" on the issue. Well, if I am not supposed to write the darn thing, that is.
Without reporter signing we know that the subject has authorized a transmission and a ratings report. We also know that the authorization was probably given to the one who is actually sending in the report as there would only be a small window of opportunity for someone else to steal the key and use it themselves. Of course as soon as two different reporters try to report using the same key we can throw out both reports.
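The "throw out both reports" rule is easy to sketch in Python. It assumes each report carries the one-time transfer key and some identifier for whoever sent it; the field names `key` and `reporter` are made up for the example.

```python
from collections import defaultdict

def accepted_reports(reports):
    """Keep one report per transfer key, and only for keys that a single
    reporter used. Keys claimed by two or more reporters are dropped."""
    reporters_by_key = defaultdict(set)
    for report in reports:
        reporters_by_key[report["key"]].add(report["reporter"])

    accepted = {}
    for report in reports:
        if len(reporters_by_key[report["key"]]) == 1:
            # setdefault ignores a repeat submission by the same reporter
            accepted.setdefault(report["key"], report)
    return list(accepted.values())

reports = [
    {"key": "k1", "reporter": "bob", "size": 10},
    {"key": "k2", "reporter": "carol", "size": 20},
    {"key": "k2", "reporter": "dave", "size": 20},   # conflict on k2
]
print(accepted_reports(reports))
```

In a live server the second report may arrive after the first has already been counted, so the first one would have to be held for a while or be reversible.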

If you add reporter signing you will be able to somehow verify who the reporter is, according to some place that's storing the key.

How much additional security does it actually add to verify this? I don't think much.
volkris wrote:One thing hubs really should be involved in is choosing the metrics for the ratings. Different hubs specializing in different things will find different metrics to be best. We'll see how it plays out.
Umm... choosing metrics? Why should hubs do that? Why should a client allow a hub to choose for the client?
They should be involved in choosing metrics. If I'm running a hub sharing rare movies then I should suggest to clients that they should use a ratings server giving high ratings to people with stable connection properties. If my hub's doing mp3s then I should rank it more with pure speed (perhaps). Hubs know what they're aiming for, they should work with ratings server operators to tune the ratings and then suggest that users use them.
Eh... I don't think we are on the level on this issue, but I do agree that it would be bad if you could raise your own ratings somehow.
Well, just consider the trust.
The person downloading won't want to send in the report that he's just downloaded because it would penalize him. The person uploading could do it except that he would have to find a way to get the downloader's ratings ID without asking the guy being rated, who could lie.

It's just one consideration. I simply don't think penalizing people for downloading will help the situation very much at all for a lot more reasons.
volkris wrote:With this system one person uploading will have his positive rating shared among the ten others who are only downloading. That's the problem.
But the ten others would decrease his rating, so that's alright by me... Unless you mean they somehow stole access to his rating?
The ten others would only decrease his rating if you're lowering ratings because of downloading, which I'm against :)
Well, I prefer MySQL as a base, but that's because I'm Swedish and because I don't think we need lots of foreign keys and stuff in the database.
MySQL is also plenty good. I only mention BerkeleyDB because I doubt we're going to need real relational functionality.
I hadn't considered the security issues of saving the hash... perhaps a compromise could be had? Exchange the hash thingy with a simple boolean variable, "hadvalidhash" or somesuch.
Why do we need to keep track of the file at all?
I could see keeping track of some small things about the file, such as if it passed an integrity check, but not identification of the file itself.
Hmm... well, of course that would be a problem. The reason I'd like this is since it makes things easier if there is someone you can always trust. Oh well.
Trust no one :)
Well, I'd expect a master server to do spot checks every now and then, but how would you discredit a rating server? Automatically, mind you, no ideas about "well, they could have special forum for that". :)
I'd file this under implementation question.
Anyone who wants to can publish a list of servers specifically not to trust, and the client can check the list on its own. I don't think a fully automated trust checker would be very successful. After all, one person's metric will be another one's horrible misuse of the system.
By the by, using multiple unrelated servers should not be a problem if we keep down the message size.
It's something to keep in mind for the future.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Re: Wait a sec... the beginning first?

Post by GargoyleMT » 2003-02-13 20:41

volkris wrote:Everyone who knows how I write knows how longwinded I can be. :)
Excuse me for being pedantic with some of what's written below. Hopefully it will answer questions people didn't even know they had.
Thank you for being so explanatory about the back story behind this proposal. I agree with most of what you said, or I understand what you're referring to. A couple of your assumptions don't fit my motivations or needs as a DC user, but that doesn't take away from the validity of a ratings system, nor does it change that ratings would shift the behavior in the DC community, essentially allowing it to police itself, if it was technically impossible for someone bent on abuse of the system to acquire the fruits of being a dependable contributing member of the DC society.
volkris wrote:A ratings server will rate the value of a particular client to a group, be that group DC as a whole, a hub, or whatever. This rating will be based on various behavioral factors such as total amount uploaded, stability of connection to the hub (quality of availability of information), and completion rate of uploads. The end is that this value calculation can be used to reward those who contribute more by putting them at higher priority for scarce resources, be they upload bandwidth or user spaces on a hub.
I understand why you prefaced the specifics with the spirit behind them; it's too easy to get lost in the details, or forget that the details are just the tangible component of the ideas. Specifics are also the easiest things for people to wrap their minds around. :-D
volkris wrote:The issue of hack clients is one that becomes completely irrelevant with a ratings system. There would no longer ever be any need to trust a specific user. Who cares what client he uses or if it's hacked or not? If it participates, uploads lots, and generally proves itself valuable in the hub then it will have a high rating. [snip]
Here it boils down to the realization that a proper ratings system will give hub operators a more substantive set of criteria to determine someone's "worth" and (perhaps) "intentions."
volkris wrote:I personally don't believe in tracking download amounts, for various practical and idealistic reasons, and so an upload/download ratio would never be known. [snip]
I brought up eMule not because upload/download ratios are important to me, but as an example of looking at other systems that are designed to solve the same problems and taking those into account when thinking of your own.

So part of what eMule does, on a per-client basis, is keep track of who let you download from them (via a userhash, so you can change IPs, names, etc.) and then uses that number, as well as the priority of the file they want to download to determine how to weight them in your upload queue. Clients which yours considers valuable (going by their contribution to your downloading ability) wait less time than someone who has contributed nothing to you.
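To make the queue-weighting idea concrete, here is a minimal sketch of that kind of scheme. It is not eMule's actual formula; the field names, the 100 MiB step, and the cap of 10 are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class QueueEntry:
    user_hash: str         # stable identifier that survives IP and nick changes
    waited_seconds: float  # time the peer has spent waiting in our upload queue
    file_priority: float   # 1.0 = normal; higher = a file we prefer to share

def queue_score(entry, bytes_received_from):
    """Higher score means served sooner. All constants are made up."""
    contributed_mib = bytes_received_from.get(entry.user_hash, 0) / 2**20
    # Credit grows with what the peer has sent us, capped so that
    # no single peer can monopolise the queue.
    credit = min(1.0 + contributed_mib / 100.0, 10.0)
    return entry.waited_seconds * entry.file_priority * credit

# Peer "aaa" has previously sent us 500 MiB; peer "bbb" has sent nothing.
received = {"aaa": 500 * 2**20}
queue = [QueueEntry("aaa", 60, 1.0), QueueEntry("bbb", 120, 1.0)]
queue.sort(key=lambda e: queue_score(e, received), reverse=True)
```

With these made-up numbers the contributing peer moves ahead of the one who has waited twice as long but contributed nothing.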

You could take a look at Kazaa's "user participation" level, etc. too. Heck, even taking a look at how MojoNation did things might be a useful exercise, since that did have a centralized repository of "karma." This (in general) seems like something the p2p-hackers list might have touched on in the past as well.

You and Sarf seem to have an interesting dynamic for idea exchange. I'm not sure where I fit in, other than being more interested in implementable details rather than concepts.... w00t.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Re: Wait a sec... the beginning first?

Post by volkris » 2003-02-14 06:13

Gargoyle wrote: Here it boils down to the realization that a proper ratings system will give hub operators a more substantive set of criteria to determine someone's "worth" and (perhaps) "intentions."
It's even more basic than that. It will give anyone the ability to get a feel for a user's behavior. I mean I'm sure right now hub operators can come up with plenty of criteria, but there's no system in place to collect data upon which to apply these criteria.

I understand why you prefaced the specifics with the spirit behind them; it's too easy to get lost in the details, or forget that the details are just the tangible component of the ideas. Specifics are also the easiest things for people to wrap their minds around. :-D
volkris wrote:The issue of hack clients is one that becomes completely irrelevant with a ratings system. There would no longer ever be any need to trust a specific user. Who cares what client he uses or if it's hacked or not? If it participates, uploads lots, and generally proves itself valuable in the hub then it will have a high rating. [snip]
Here it boils down to the realization that a proper ratings system will give hub operators a more substantive set of criteria to determine someone's "worth" and (perhaps) "intentions."
You and Sarf seem to have an interesting dynamic for idea exchange. I'm not sure where I fit in, other than being more interested in implementable details rather than concepts.... w00t.
There are still questions that need to be hashed out. For example, I'd like it if we could come to a consensus as to whether or not to lower a user's rating based on his amount of downloading. I'm against, Sarf's for, but we haven't really discussed it all that much yet.

That's not even getting into the implementation issues. I'm currently thinking about using XML-RPC for communication, but there are plenty of problems with this. It would require more bandwidth than a custom protocol, probably more processing, and it doesn't really fit in with the rest of the current system. At the same time, the communications around ratings are fundamentally different from the rest of the system. They're not as time critical, for example. Also, XML-RPC can go through web proxies (I believe), adding various benefits there.

There's plenty to discuss.

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden
Contact:

Re: Wait a sec... the beginning first?

Post by sarf » 2003-02-14 06:50

GargoyleMT wrote:So part of what eMule does, on a per-client basis, is keep track of who let you download from them (via a userhash, so you can change IPs, names, etc.) and then uses that number, as well as the priority of the file they want to download to determine how to weight them in your upload queue. Clients that you consider are valuable (going by their contribution to your downloading ability) wait less time than someone who has contributed nothing to you.
This is similar to what I want - that your rating will influence how others will treat you.
GargoyleMT wrote:You could take a look at Kazaa's "user participation" level, etc. too. Heck, even taking a look at how MojoNation did things might be a useful exercise, since that did have a centralized repository of "karma." This (in general) seems like something the p2p-hackers list might have touched on in the past as well.
Centralization is bad. That said, some degree of centralization is almost always necessary. The rating server is a, to me at least, acceptable compromise.
GargoyleMT wrote:You and Sarf seem to have an interesting dynamic for idea exchange. I'm not sure where I fit in, other than being more interested in implementable details rather than concepts.... w00t.
Well, just post what you would like the ratings system to be. That's what I've done, and that's what I think people should do. Of course, putting a bit of thought into forming your opinions into an understandable message is a bonus.
volkris wrote:There are still questions that need to be hashed out. For example, I'd like it if we could come to a consensus as to whether or not to lower a user's rating based on his amount of downloading. I'm against, Sarf's for, but we haven't really discussed it all that much yet.
Well, I'm not so much against it as I am concerned about the multiple user exploit we talked about. Reporting downloads would solve this problem. The only thing is that I do not want users to report stuff that would lower their own rating. That feels... wrong.

Sarf
---
Deliverance through annihilation.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Re: Wait a sec... the beginning first?

Post by volkris » 2003-02-14 08:24

I created a Subversion (see http://subversion.tigris.org/) repository on my server for holding docs, code, etc, for a Ratings project. Check out http://volkstar.dyndns.org/~volkris/sub ... DCRatings/. If I can figure out how to do .htaccess files on a repository by repository basis I'll give out write access passwords so others can make changes and additions.

Sarf, as long as users log on to the ratings server it can check to make sure only one person at a time is using a rating. The exploit remaining involves users moving from one computer to another with the ratings account as it benefits them, an exploit that would still exist if downloads were considered. The solution to this might involve some sort of penalty for repeatedly signing on and off. If a person changes IP addresses more than something like three times in an hour then he should probably be penalized anyway for having an unstable connection. If someone can only switch computers three times an hour they can't just move their ratings around where it would be fraud. If I need to go into detail about this exploit tell me.
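The single-session check and the IP-change penalty described above could be sketched roughly as follows. The class and method names are hypothetical, and the limit of three changes per hour is simply the figure floated in the post, not an agreed value.

```python
import time
from collections import defaultdict, deque

MAX_IP_CHANGES_PER_HOUR = 3  # "something like three times in an hour"
WINDOW_SECONDS = 3600

class SessionTracker:
    def __init__(self):
        self.current_ip = {}                  # account -> IP of the active session
        self.ip_changes = defaultdict(deque)  # account -> timestamps of recent IP changes

    def login(self, account, ip, now=None):
        """Record a login; return True if the account should be penalised."""
        now = time.time() if now is None else now
        changes = self.ip_changes[account]
        previous = self.current_ip.get(account)
        if previous is not None and previous != ip:
            changes.append(now)
        # The new login replaces the old session, so only one is ever active.
        self.current_ip[account] = ip
        # Forget changes that have fallen out of the one-hour window.
        while changes and now - changes[0] > WINDOW_SECONDS:
            changes.popleft()
        return len(changes) > MAX_IP_CHANGES_PER_HOUR
```

Because a later login overwrites the stored IP, the server never has to rely on the user logging out; it only needs to see logins.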

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden
Contact:

Re: Wait a sec... the beginning first?

Post by sarf » 2003-02-14 16:40

volkris wrote:[snip repository info]
Great! I'll check it out as my time allows, sending you scathing comments on your grammar and such highly important issues. :)
volkris wrote:Sarf, as long as users log on to the ratings server it can check to make sure only one person at a time is using a rating.
Yes. Yet another assumption on my part, since I wanted to use a stateless server. Ah well. I guess that there is really no point to logging out from a server (since only logins would be needed)?

Sarf
---
Our god's the FUN god! Our god's the SUN god! Ra! Ra! Ra!

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Re: Wait a sec... the beginning first?

Post by volkris » 2003-02-14 18:38

sarf wrote:
volkris wrote:Sarf, as long as users log on to the ratings server it can check to make sure only one person at a time is using a rating.
Yes. Yet another assumption on my part, since I wanted to use a stateless server. Ah well. I guess that there is really no point to logging out from a server (since only logins would be needed)?
I'd like to keep stats on users' uptimes, but relying on them logging themselves out is dangerous. I was trying to figure out another way. I used to think that cooperating hubs would make sure people were reported as logged out when they were.

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden
Contact:

Re: Wait a sec... the beginning first?

Post by sarf » 2003-02-16 17:54

volkris wrote:I'd like to keep stats on users' uptimes, but relying on them logging themselves out is dangerous. I was trying to figure out another way. I used to think that cooperating hubs would make sure people were reported as logged out when they were.
Hmmm... we could use the facts we do have to make some statistics - e.g. we know when user X logged in and we know when X reported transfers. Any reported transfers "belong" to the log-in made prior to reporting the transfer. Is this useful in any way?

Sarf
---
"How many tentacles has Great Cthulhu got?"
"Too many."

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Re: Wait a sec... the beginning first?

Post by GargoyleMT » 2003-02-16 20:15

sarf wrote:Hmmm... we could use the facts we do have to make some statistics - e.g. we know when user X logged in and we know when X reported transfers. Any reported transfers "belong" to the log-in made prior to reporting the transfer. Is this useful in any way?
It seems that the time period during which users were actively uploading has more weight to it than a period during which they were not. I think for any contributing user, their uptime and their periods of transfers should be identical.

I'm with you on the issue of whether or not to measure downloads, Sarf. I don't see it as penalty, however. I just see it as fair. If two people have uploaded the same amount at the same speed, I'd say that the person who has asked for less (ie. downloaded less) should be recognized for that.

I think, in practice, that even *after* determining all these metrics, there'll need to be a lot of experimentation to determine the right way to balance them all. A similar example of having to adjust algorithms to weight certain measurements is the spam detection tool, SpamAssassin.

One thing that I've seen mentioned, but not talked about too much is whether, and how, to measure types of shared files. A real example is a Video Game Music themed hub - how'd you measure that? Would you just measure generic information about transfers, like which auto-search category they fell into, and leave themes beyond that to the hub? You could, in theory, mine ID3 tags for genre information, but I wouldn't trust that very far at all.

Food for thought.

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden
Contact:

Re: Wait a sec... the beginning first?

Post by sarf » 2003-02-17 08:24

GargoyleMT wrote:It seems that the time period during which users were actively uploading has more weight to it than a period during which they were not. I think for any contributing user, their uptime and their periods of transfers should be identical.
Really? What if I upload 900 megs of data to someone in about 30 minutes, then idle in the hub for 90 minutes, and another user uploads 900 megs in 120 minutes? Is my rating going to suffer because I uploaded data faster than the other user? Is this fair? Do we care about it? :)
GargoyleMT wrote:I'm with you on the issue of whether or not to measure downloads, Sarf. I don't see it as penalty, however. I just see it as fair. If two people have uploaded the same amount at the same speed, I'd say that the person who has asked for less (ie. downloaded less) should be recognized for that.
But the problem is this - I, personally, would very much like to report my uploads, but I would not want to report my downloads. If we require the user to report downloads we require the user to report information that will be directly detrimental to the user. The user would not want to report the downloads, so how are we going to force him/her to do that?
GargoyleMT wrote:I think, in practice, that even *after* determining all these metrics, there'll need to be a lot of experimentation to determine the right way to balance them all. A similar example of having to adjust algorithms to weight certain measurements is the spam detection tool, SpamAssassin.
Ummmm... I've interpreted some of Volkris' posts to mean that the ratings server could serve as a sort of database where the clients either could do requests using some sort of SQL-like language (I'd guess) or you could simply pump the pertinent data to the clients that request it, thus allowing the client to choose what metrics it wants to evaluate users with. Someone might want to evaluate people by their time spent online coupled with their upload speed, another using the amount of data they have uploaded and the current position of Saturn, and so on...
If I've misread Volkris' intentions, then I hope he won't roast me on a spit for it. :)
It might also be that the ratings server only gives out a rating (say, a float or a double) and uses internal algorithms that evaluate the uploads. In that case, the Video Game Music hub had better get its own rating server and recode the evaluation algorithms.
GargoyleMT wrote:One thing that I've seen mentioned, but not talked about too much is whether, and how, to measure types of shared files. A real example is a Video Game Music themed hub - how'd you measure that? Would you just measure generic information about transfers, like which auto-search category they fell into, and leave themes beyond that to the hub? You could, in theory, mine ID3 tags for genre information, but I wouldn't trust that very far at all.
The rating server should not have to request parts of files to build its ratings, so your Video Game Music hub would not have any way (currently) to influence the ratings depending on what people shared. If they ran their own rating server they could go in and manually up the ratings of people who share pertinent stuff, I'd guess.

Sarf
---
Thought for food.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Re: Wait a sec... the beginning first?

Post by GargoyleMT » 2003-02-17 09:39

sarf wrote:Really? What if I upload 900 megs of data to someone in about 30 minutes, then idle in the hub for 90 minutes, and another user uploads 900 megs in 120 minutes? Is my rating going to suffer because I uploaded data faster than the other user? Is this fair? Do we care about it? :)
Ah, good example. I guess my basic intuition is that someone who is always transferring (has overlapping uploads) has more "good stuff" than someone who idles. But that might not be true at all, if the file sizes are large enough and the transfer speeds low enough. What do you think?
sarf wrote:But the problem is this - I, personally, would very much like to report my uploads, but I would not want to report my downloads. If we require the user to report downloads we require the user to report information that will be directly detrimental to the user. The user would not want to report the downloads, so how are we going to force him/her to do that?
Maybe the reporting downloads issue should be tabled for a future revision... It might simply be too hard to do it perfectly the first time. An extreme example of why this might be necessary is: think of an uber-leech, maybe on a nice internet connection, or not. Somehow he's managed to upload more than anyone else in the hub, but he also downloads everyone's whole collection - ie. he downloads a lot more than anyone else too. Should he get preferential treatment?

Hahaha, nice example, self. I think the solution to that is, for any download statistics, keep that client side. Hmm, but then you could abuse that, even if you exchanged some sort of validity token during the upload/download (that the ratings server could verify). But anyone looking to abuse the system like that would be going through a lot of trouble if they just wanted to limit how much someone downloaded from them.
sarf wrote:Ummmm... I've interpreted some of Volkris' posts to mean that the ratings server could serve as a sort of database where the clients either could do requests using some sort of SQL-like language (I'd guess) or you could simply pump the pertinent data to the clients that request it, thus allowing the client to choose what metrics it wants to evaluate users with. Someone might want to evaluate people by their time spent online coupled with their upload speed, another using the amount of data they have uploaded and the current position of Saturn, and so on...
If I've misread Volkris' intentions, then I hope he won't roast me on a spit for it. :)
I agree here, there's probably just been a little ambiguity in how I've communicated. Ultimately, it's the users who decide how they weight the various factors (when people are competing for their resources). To that end, there should probably be a couple of preset algorithms (as well as a facility for making your own formula) for those unable or unwilling to experiment on their own, shouldn't there? I think after "enough" clients support it, the hubs would also need default settings, but we can leave that problem for the hub software writers. :)
sarf wrote:It might also be that the ratings server only gives out a rating (say, a float or a double) and uses internal algorithms that evaluate the uploads. In that case, the Video Game Music hub had better get its own rating server and recode the evaluation algorithms.
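One simple way to picture "preset algorithms plus your own formula" is a weighted sum over whatever statistics are available. This is only a sketch under assumptions: the statistic names, the two presets, and every weight below are invented, and a real formula would need the experimentation mentioned earlier.

```python
# Hypothetical presets: each maps a statistic name to a weight.
PRESETS = {
    "volume": {"uploaded_mb": 1.0, "completed_uploads": 0.0, "avg_speed_kbps": 0.0},
    "reliability": {"uploaded_mb": 0.2, "completed_uploads": 5.0, "avg_speed_kbps": 0.1},
}

def rate(stats, weights):
    """Weighted sum over the statistics the weights mention.

    Statistics missing from `stats` count as zero, so a formula can
    reference a metric that not every report carries.
    """
    return sum(w * stats.get(name, 0.0) for name, w in weights.items())

stats = {"uploaded_mb": 900.0, "completed_uploads": 12, "avg_speed_kbps": 50.0}

volume_rating = rate(stats, PRESETS["volume"])
reliability_rating = rate(stats, PRESETS["reliability"])

# A user who does want to experiment supplies their own weights.
my_formula = {"uploaded_mb": 0.5, "completed_uploads": 10.0}
custom_rating = rate(stats, my_formula)
```

Keeping the formula as plain data (a name-to-weight table) rather than code means a preset can be shipped, described, and checked by anyone.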
True. Does "the hub I was on when I downloaded X from user Y" belong at all in the database of ratings? That would be a workaround of sorts, because hubs could look for that (even if it was just their own hashed private identifier) when querying/using their formula.
sarf wrote:The rating server should not have to request parts of files to build its ratings, so your Video Game Music hub would not have any way (currently) to influence the ratings depending on what people shared. If they ran their own rating server they could go in and manually up the ratings of people who share pertinent stuff, I'd guess.
Well, I never envisioned the ratings server getting the file, but relying on the client to somehow determine a genre or type to send along with it (such as the genre from the ID3 tag... ick).

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Re: Wait a sec... the beginning first?

Post by volkris » 2003-02-17 09:50

sarf wrote:Hmmm... we could use the facts we do have to make some statistics - e.g. we know when user X logged in and we know when X reported transfers. Any reported transfers "belong" to the log-in made prior to reporting the transfer. Is this useful in any way?
The thinking was that a user who was online (meaning a combination of being logged into the hub accepting search requests and having a stable connection for quality uploads) would be more valuable, but now that I think further into this I don't suppose it's that great a metric anyway. The stability of connection, for example, can be rewarded by upload speed and granting bonus points for successful completion of each upload. The accepting search requests part is automatically rewarded because uploads simply won't happen without a user reporting hits.

The thing that might be worth measuring, though, is time in a certain hub. For various reasons it's ok for users to want to reward people sharing in their same hubs. Time online in a hub would be worth watching. This is probably best measured through transfers in a hub, though, as being signed on in a hub doesn't guarantee that a client is actually accepting search requests from it.

Of course this continues to add complexity to the information being collected....
GargoyleMT wrote:One thing that I've seen mentioned, but not talked about too much is whether, and how, to measure types of shared files.
I simply wouldn't :)
For one thing, ratings can be tuned to reward certain transfer characteristics. One server can reward for smaller files, making it an mp3 sharing hub, while another can reward for larger ones, for trading movies. But more than anything else, I trust the users of the hub to only download what they want. If a hub is officially for anime movie trading but its users seem to download a lot of porn, then should a person really be penalized for offering what users REALLY want?

Anyway, I'm under the thinking that we really, really need to avoid giving the ratings server information about exactly what's being downloaded. It discourages downloaders from reporting stats if they think they're being tracked. Heck, the RIAA could come in here and set up a ratings server and watch people admit to downloading Bad Things. So I'm for giving the servers no more information (perhaps) than a munged filesize.
sarf wrote: It might also be that the ratings server only gives out a rating (say, a float or a double) and uses internal algorithms that evaluate the uploads.
The ratings server should only transmit a numerical rating normally. I envision a system where the requesting client can specify a metric to use. There's no sense in transmitting a user's entire history to a client when the single number is all it really wants. Also, this way the Powers That Be can tweak the rating method without having to update every client.
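The post does not say how a filesize would be munged. One possible reading, shown here purely as an assumption, is to round the size up to the next power of two so the server learns the rough magnitude of a transfer but cannot match it to a specific file.

```python
def munge_size(size_bytes):
    """Round a file size up to the next power of two (hypothetical scheme).

    Many different files share each bucket, so the reported value is
    still useful for rating transfer volume without identifying the file.
    """
    if size_bytes <= 0:
        return 0
    return 1 << (size_bytes - 1).bit_length()

# Two quite different files end up reporting the same value.
small_album = munge_size(4_200_000)
large_album = munge_size(8_000_000)
```

The trade-off is precision: the coarser the bucket, the less the server can infer, but the less exactly uploads are credited.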

Implementation questions involve whether the rating should be calculated on the fly from stored logs (which requires the server to store logs and may cause delays on busy servers) and methods of specifying rating metrics. The second problem is the harder one; PERHAPS clients should be able to upload some sort of ratings profile, though this has trust issues.

In the end I think clients should be able to request the full logs for whatever reason. They could calculate their own ratings and then compare the calculated ones to the ones given by the server to verify that the server is being honest, for example. If this was the normal way of getting ratings information it would hurt bandwidth and perhaps processing performance, though.
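The honesty check could look something like the sketch below: the client recomputes the rating from the raw log with the published formula and compares it to the number the server reported. The log record layout and the metric (one point per megabyte plus a completion bonus) are assumptions made for the example.

```python
import math

def rating_from_log(log, points_per_mb=1.0, completion_bonus=10.0):
    """Recompute a rating from raw transfer records (hypothetical metric)."""
    total = 0.0
    for entry in log:
        total += points_per_mb * entry["uploaded_mb"]
        if entry["completed"]:
            total += completion_bonus  # reward uploads that finished
    return total

def server_is_honest(reported_rating, log, tolerance=1e-6):
    """True if the server's number matches what the log implies."""
    return math.isclose(reported_rating, rating_from_log(log), abs_tol=tolerance)

log = [
    {"uploaded_mb": 700.0, "completed": True},
    {"uploaded_mb": 200.0, "completed": False},
]
```

Since pulling full logs is expensive, a client would presumably do this only as an occasional spot check rather than on every lookup.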


And yes, if users can be rated on a hub by hub basis then there's no reason why hub ops wouldn't be allowed to artificially increase users' ratings. Perhaps hub ops could authenticate their submitted ratings increases using a key assigned to the hub.
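A keyed signature is one way a hub could authenticate its submissions. The sketch below uses HMAC-SHA256 from Python's standard library as a stand-in; the message layout and function names are hypothetical, and nothing in the thread fixes a particular algorithm. The hub ID is part of the signed message, so an adjustment is tied to the hub that issued it.

```python
import hashlib
import hmac

def sign_adjustment(hub_key: bytes, hub_id: str, user_id: str, delta: int) -> str:
    """Hub side: sign a rating adjustment with the key assigned to the hub."""
    message = f"{hub_id}|{user_id}|{delta}".encode("utf-8")
    return hmac.new(hub_key, message, hashlib.sha256).hexdigest()

def accept_adjustment(hub_keys, hub_id, user_id, delta, signature):
    """Ratings-server side: accept only adjustments signed with that hub's key."""
    key = hub_keys.get(hub_id)
    if key is None:
        return False  # unknown hub, nothing to verify against
    expected = sign_adjustment(key, hub_id, user_id, delta)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, signature)

hub_keys = {"hubA": b"key-assigned-to-hubA"}
signature = sign_adjustment(hub_keys["hubA"], "hubA", "sarf", 5)
```

A server that stores ratings per hub could then apply an accepted adjustment only to the rating for `hub_id`, leaving the user's standing elsewhere untouched.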

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden
Contact:

Re: Wait a sec... the beginning first?

Post by sarf » 2003-02-17 15:09

GargoyleMT wrote:Ah, good example. I guess my basic intuition is that someone who is always transferring (has overlapping uploads) has more "good stuff" than someone who idles. But that might not be true at all, if the file sizes are large enough and the transfer speeds low enough. What do you think?
I think that the person who transfers a file in 120 minutes and the person who transfers for 30 and idles for 90 minutes will get the same rating, as everyone's rating decays no matter what they're doing... :) I did not take that into consideration.
GargoyleMT wrote:Maybe the reporting downloads issue should be tabled for a future revision... It might simply be too hard to do it perfectly the first time.
Well... it could be implemented but only used to verify the uploader integrity. At least in the Standard Issue Mark I Rating Server.
GargoyleMT wrote:An extreme example of why this might be necessary is: think of an uber-leech, maybe on a nice internet connection, or not. Somehow he's managed to upload more than anyone else in the hub, but he also downloads everyone's whole collection - ie. he downloads a lot more than anyone else too. Should he get preferential treatment?
Well... yes. He's not a leech, since he has transferred lots of good stuff (or at least, people have downloaded stuff from him). He also increases everyone else's ratings when he downloads from them (if they have the rating server).
GargoyleMT wrote:Hahaha, nice example, self. I think the solution to that is, for any download statistics, keep that client side. Hmm, but then you could abuse that, even if you exchanged some sort of validity token during the upload/download (that the ratings server could verify). But anyone looking to abuse the system like that would be going through a lot of trouble if they just wanted to limit how much someone downloaded from them.
As said, send everything to the server but do not decrease the rating of the downloader just because they download stuff. Let us not make this into a quota system - it's a ratings system, darn it! :)
GargoyleMT wrote:I agree here, there's probably just been a little ambiguity in how I've communicated. Ultimately, it's the users who decide how they weight the various factors (when people are competing for their resources). To that end, there should probably be a couple of preset algorithms (as well as a facility for making your own formula) for those unable or unwilling to experiment on their own, shouldn't there?
I'd envisioned some sort of limited metric system (the rating supports metric system QWE, ASD and DFG, and the user gets a description of the rating metric as well as the formula used so that it can be verified).

If you combine this with the ability to download the user logs from the server via some sort of low-priority transfer system (so that it doesn't bring down the server) we can make the rating server almost fool-proof (or is it foul-proof?).
GargoyleMT wrote:I think after "enough" clients support it, the hubs would also need default settings, but we can leave that problem for the hub software writers. :)
Hmmm... the hubs are, in my humble opinion, not relevant to the rating system except that they may suggest or enforce a rating server - shouldn't prove too hard to make a script/client that checked if a user had an account with a rating server and warned/kicked/banned them. That's work for the script-kiddies (oops! wrong term for DC script writers, perhaps? :)).
GargoyleMT wrote:True. Does "the hub I was on when I downloaded X from user Y" belong at all in the database of ratings? That would be a workaround of sorts, because hubs could look for that (even if it was just their own hashed private identifier) when querying/using their formula.
Yes! More data in the database! Ehrm... well, sure, why not?
GargoyleMT wrote:Well, I never envisioned the ratings server getting the file, but relying on the client to somehow determine a genre or type to send along with it (such as the genre from the ID3 tag... ick).
This should not be used... but that is my humble opinion. It is simply not relevant, and could be used for malicious purposes, as Volkris states.
Volkris wrote:The thing that might be worth measuring, though, is time in a certain hub. For various reasons it's ok for users to want to reward people sharing in their same hubs. Time online in a hub would be worth watching. This is probably best measured through transfers in a hub, though, as being signed on in a hub doesn't guarantee that a client is actually accepting search requests from it.
Uhm... not to be mean or anything, but isn't that the hub's responsibility? I mean, the hub has all sorts of ways to check on a user that the rating server never could hope to use since it (in my opinion) would serve many more clients than a hub does. Thus it seems more reasonable for the hub to do such things for its users. If we are talking about tallying the amount of time spent in a hub as the time the client has served the users of the hub (the time spent transferring stuff to people in the hub) then I could admit that this is something for the server. Doing it any other way seems to require additional data, as you say.
Volkris wrote:And yes, if users can be rated on a hub by hub basis then there's no reason why hub ops wouldn't be allowed to artificially increase users' ratings. Perhaps hub ops could authenticate their submitted ratings increases using a key assigned to the hub.
If we combine this with the hub-hash idea submitted by GargoyleMT we could do some checking so that the OPs do not lower ratings on users that do not use their hub... yes, yes, I like this, I like it a lot.

Sarf
---
Ban Censorship!

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Re: Wait a sec... the beginning first?

Post by volkris » 2003-02-17 18:17

sarf wrote:Uhm... not to be mean or anything, but isn't that the hub's responsibility? I mean, the hub has all sorts of ways to check on a user that the rating server never could hope to use since it (in my opinion) would serve many more clients than a hub does.
Some might see it as the hub's responsibility, but in the end a ratings server would be able to do it much more meaningfully and noninvasively than the hub itself. A hub would have to poll each sharer, somehow, hoping that the sharer isn't smart enough to realise that it's the hub doing it. While the hub is probing everyone so it can punish those who are not complying, the ratings server can just sit back and accept reports that clients are complying, rewarding them for doing so. In the end we get it for free anyway, so we might as well track and offer to use it in ratings.

In other words, yes, the download information we get that specifies the source hub can be used to imply that the client is serving query replies :)
If we combine this with the hub-hash idea submitted by GargoyleMT we could do some checking so that the OPs do not lower ratings on users that do not use their hub... yes, yes, I like this, I like it a lot.
Oh, certainly, under normal conditions I would say that one hub's ops should never affect a user's rating other than for that single hub. I would offer it as an option, but it would be pretty irresponsible for anyone to actually use such a rating.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-02-17 22:24

Well it only ended up taking a couple of minutes to do in Python, so I now have an operational XMLRPC server running at http://volkstar.dyndns.org/XMLRPC/ . It's currently serving two functions, getRating(name) and submitRating(rating,name). It will return just a little text basically echoing the input.
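For readers who want to try something similar, here is a minimal sketch of an echo-style XML-RPC server with the two functions named above. It uses the modern standard-library module `xmlrpc.server` (in 2003 the class lived in a module called `SimpleXMLRPCServer`). The return strings, the `make_server` helper, and the host and port defaults are assumptions for illustration, not the code actually running on the server mentioned in the post.

```python
from xmlrpc.server import SimpleXMLRPCServer


def getRating(name):
    """Placeholder: echo the request instead of looking up a real rating."""
    return "getRating called for %s" % name


def submitRating(rating, name):
    """Placeholder: echo the submission instead of storing it."""
    return "submitRating called with rating %s for %s" % (rating, name)


def make_server(host="127.0.0.1", port=8000):
    """Build a server exposing both functions; host and port are assumed values."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(getRating)
    server.register_function(submitRating)
    return server
```

Calling `make_server().serve_forever()` would start answering requests; a client could then use `xmlrpc.client.ServerProxy("http://127.0.0.1:8000/")` and call `getRating("sarf")` as if it were a local method.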

If there is agreement to proceed in this direction, I'll look into hooking it up to a database backend and we'll hammer out an API. We could consider the XMLRPC to be the reference platform and maybe add on lighterweight protocols later. Someone else would need to hack support into a client.

If we decide not to go with XMLRPC I won't be able to really contribute much as I can only open port 80 to my server (gotta love web services, though). That's fine too since I've gained some knowledge in setting up a Python XMLRPC server. I'm happy either way.

Sapporo
Posts: 36
Joined: 2003-02-09 23:10
Location: AZ, USA

Post by Sapporo » 2003-02-17 22:36

IMO, I would highly recommend that a proposed protocol would use XML. Yes it's a slightly fatter stream, and no I'm not recommending it because it's a "cool" thing.

(The following may be incorrect as I don't know much about the DC protocol atm)
I think one of the reasons that the DC protocol hasn't grown is because it's difficult to extend. From what I understand, most of the ad hoc DC protocol extensions either break compatibility or are seriously close to it. With an XML based protocol at least the application could easily parse and then ignore tokens/commands that it doesn't understand, without affecting the tokens/commands that it does understand in that packet.

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden
Contact:

Re: Wait a sec... the beginning first?

Post by sarf » 2003-02-18 05:40

volkris wrote:[snip]In other words, yes, the download information we get that specifies the source hub can be used to imply that the client is serving query replies :)
Well, sure, but, to my knowledge, that means that a hub hash (or something that uniquely identifies a hub anyhow) has to be sent with the rating data, and we will have to judge a client from a certain hub's perspective (e.g. when we ask for a rating we'd have to send along the hub hash).
volkris wrote:Oh, certainly, under normal conditions I would say that one hub's ops should never affect a user's rating other than for that single hub. I would offer it as an option, but it would be pretty irresponsible for anyone to actually use such a rating.
The "default" implementation has to be trustworthy, otherwise very few people will start rating servers or use them.
volkris wrote:If there is agreement to proceed in this direction, I'll look into hooking it up to a database backend and we'll hammer out an API. We could consider the XMLRPC to be the reference platform and maybe add on lighterweight protocols later.
Sounds like a plan. I'm willing to go with XMLRPC. What, exactly, is necessary to "host" a rating server on a machine currently, by the way?
volkris wrote:Someone else would need to hack support into a client.
Well, I can do that as well as posting "diffs" in another topic. I can also host a modified "vanilla" DC++ client (e.g. not DC++k, only the rating mods) on my pages. They should stand up to any reasonable loads (as long as they're not /.:ed :)).
Sapporo wrote:I think one of the reasons that the DC protocol hasn't grown is because it's difficult to extend. From what I understand, most of the ad hoc DC protocol extensions either break compatibility or are seriously close to it. With an XML based protocol at least the application could easily parse and then ignore tokens/commands that it doesn't understand, without affecting the tokens/commands that it does understand in that packet.
Well, NMDC does ignore commands that it does not understand as does DC++, but I do agree that from the extension viewpoint, XML is preferable.

Sarf
---
Ignorance is the pillar of optimism.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-02-18 08:25

Sapporo wrote:IMO, I would highly recommend that a proposed protocol would use XML. Yes it's a slightly fatter stream, and no I'm not recommending it because it's a "cool" thing.
Well, IIRC the protocol would only be broken if the specific implementation doesn't ignore unknown commands. XML would be no different, it would be like a client crashing if it receives a tag it doesn't recognize.

But anyway. One of the main differences between a ratings system and the normal DC protocols is that the ratings stuff can afford to be much higher latency and lower efficiency. You know, a user wants his download to start NOW and go as fast as possible but it's no big deal if a rating takes a little while to trickle in. XML is therefore not as bad a tradeoff as it is in other places.

The current recommendation being floated is to do the ratings stuff over XMLRPC, which actually makes the entire question of protocols kind of hidden. Transmissions will look like XML on the line, but programmers will see only method calls that might as well be on their own computers.

Wow, I've written a lot above and said little :)

In short XML or not XML isn't really a question right now, but in the future if it becomes one there wouldn't be strong arguments against it from me.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Re: Wait a sec... the beginning first?

Post by volkris » 2003-02-18 08:38

sarf wrote:Well, sure, but, to my knowledge, that means that a hub hash (or something that uniquely identifies a hub anyhow) has to be sent with the rating data, and we will have to judge a client from a certain hub's perspective (e.g. when we ask for a rating we'd have to send along the hub hash).
Well I had just thought that reporters could mention the hub by its address or something. Then if another client asked for the ratings of a user with respect to a specific hub it would be calculated using information that includes a reference to the hub. I wonder now if there needs to be any security there. I mean, are there really exploits through specifying a different hub?
volkris wrote:Oh, certainly, under normal conditions I would say that one hub's ops should never affect a user's rating other than for that single hub. I would offer it as an option, but it would be pretty irresponsible for anyone to actually use such a rating.
To clarify, I meant that it should be an option for users to request a user's rating including all artificial bumps from any hub, not that it should be an option for an op to bump someone's global rating.
Sounds like a plan. I'm willing to go with XMLRPC. What, exactly, is necessary to "host" a rating server on a machine currently, by the way?
Well, anyone hosting would need an XMLRPC server with enough programming to handle queries and calculation of the actual ratings and somewhere to store the data. I'm using Python and its very easy-to-use SimpleXMLRPCServer class, and I'm going to look into using Berkeley's new XML database to hold data. I'm not sure if it's publicly released yet, but I'm a beta tester.
Well, I can do that as well as posting "diffs" in another topic. I can also host a modified "vanilla" DC++ client (e.g. not DC++k, only the rating mods) on my pages. They should stand up to any reasonable loads (as long as they're not /.:ed :)).
Well don't let the vanilla client take too much effort. It would be nice to have a whole hub full of people "Without Limits" (quote the DC++ homepage), but still working.

Now, for the API....

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-02-18 12:46

Well I suppose more design needs to be done before approaching an API. Timelines and such....

I'll post documents to http://volkstar.dyndns.org/~volkris/sub ... DCRatings/ eventually.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-02-18 22:07

I'm honestly pretty afraid of overbloating this whole ratings thing, it's already heavier than I particularly liked, but I suppose a reference platform is allowed to be bulky. Just look at Freenet :)

So, here's my proposition: users should submit data in XML. The API for submitting information can therefore consist of a single argument and we can never worry about changing that again. The obvious problems for the client will be the cost of creating an XML document and a bandwidth penalty.

I also want to reiterate my idea from the old boards that there be two classes of information submission, the main verbose one and a highly simplified update one. The verbose one would give all of the information at the beginning of the transfer while the update one would occasionally let the server know how the transfer is going. All it really needs is a single integer: the number of bytes received. This way the ratings server can immediately begin to award the client with points without having to wait an hour for a 700 meg movie to complete. This update should be very low priority, I had said it should even be UDP, but that doesn't fit in with XMLRPC. So, just a quick XMLRPC call sending the single integer would do.
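A rough sketch of the two submission classes, using only the Python standard library. The element names (`transfer`, `uploader`, `hub`, `file`, `size`) and the method name `updateTransfer` are invented for illustration; no schema or API has been agreed in this thread.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client


def build_verbose_report(uploader, hub, filename, size_bytes):
    """Verbose class: one XML document describing the transfer at its start.

    All element names here are hypothetical placeholders.
    """
    root = ET.Element("transfer")
    ET.SubElement(root, "uploader").text = uploader
    ET.SubElement(root, "hub").text = hub
    ET.SubElement(root, "file").text = filename
    ET.SubElement(root, "size").text = str(size_bytes)
    return ET.tostring(root, encoding="unicode")


def build_update_call(bytes_received):
    """Update class: the XML-RPC request body carrying a single integer.

    "updateTransfer" is an assumed method name.
    """
    return xmlrpc.client.dumps((int(bytes_received),), methodname="updateTransfer")
```

One thing to keep in mind with the single-integer update: the XML-RPC `int` type is a 32-bit signed value, so a byte count above roughly 2 GB would not fit and would have to be sent some other way, for example as a string or in larger units.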

Just something to keep in mind...

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden
Contact:

Post by sarf » 2003-02-19 08:17

volkris wrote:I'm honestly pretty afraid of overbloating this whole ratings thing, it's already heavier than I particularly liked, but I suppose a reference platform is allowed to be bulky. Just look at Freenet :)
That's one of the beauties of making a reference implementation... unfortunately some hub owners do not understand this and thus refuse the use of DC++. ;)
volkris wrote:So, here's my proposition: users should submit data in XML. The API for submitting information can therefore consist of a single argument and we can never worry about changing that again. The obvious problems for the client will be the cost of creating an XML document and a bandwidth penalty.
Sure. There already are XML document classes in DC++ (SimpleXML) which should prove sufficient, so this should prove no problem.
I will try to do a RatingManager.cpp/h file later this week, but I'll probably throw in a few helper functions just because I can. :)
volkris wrote:I also want to reiterate my idea from the old boards that there be two classes of information submission, the main verbose one and a highly simplified update one. The verbose one would give all of the information at the beginning of the transfer while the update one would occasionally let the server know how the transfer is going. All it really needs is a single integer: the number of bytes received. This way the ratings server can immediately begin to award the client with points without having to wait an hour for a 700 meg movie to complete. This update should be very low priority, I had said it should even be UDP, but that doesn't fit in with XMLRPC. So, just a quick XMLRPC call sending the single integer would do.
This should prove no problem, but there should be a minimum allowed time between updates. I'll probably just put the update thing in a onTimerMinute somewhere.
volkris wrote:Well I had just thought that reporters could mention the hub by its address or something. Then if another client asked for the ratings of a user with respect to a specific hub it would be calculated using information that includes a reference to the hub. I wonder now if there needs to be any security there. I mean, are there really exploits through specifying a different hub?
Well, not really, just thought of how the rating server would store it, but that's now officially under an APP-field (Another Person's Problem, from Douglas Adams but I am not sure it is named this since I base it on the Swedish translation of his books) as far as I am concerned. 8)
volkris wrote:To clarify, I meant that it should be an option for users to request a user's rating including all artificial bumps from any hub, not that it should be an option for an op to bump someone's global rating.
Ahh... the clients should be able to get a "raw" or "true" rating, ignoring the OPs modifications? This will not go over well with the OPs and hub-owners, but I am all for it.
volkris wrote:Well, anyone hosting would need an XMLRPC server with enough programming to handle queries and calculation of the actual ratings and somewhere to store the data. I'm using Python and its very easy-to-use SimpleXMLRPCServer class, and I'm going to look into using Berkeley's new XML database to hold data. I'm not sure if it's publicly released yet, but I'm a beta tester.
Ah... I was more in line of thinking what is needed in hardware/software to set up a server using the reference implementation.
volkris wrote:Well don't let the vanilla client take too much effort. It would be nice to have a whole hub full of people "Without Limits" (quote the DC++ homepage), but still working.
I'll just take a "clean" DC++ source, modify it, then propagate the changes into DC++k from that one. No extra work for me, no sirree. I might even have the diffs as a separate download to make it easier to merge into other DC++-based projects.

Sarf
---
Whipping the llama into a new shape.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-02-19 11:39

sarf wrote: Sure. There already are XML document classes in DC++ (SimpleXML) which should prove sufficient, so this should prove no problem.
Really? Great! What was it being used for? Config files?
Anyway, take your time.
This should prove no problem, but there should be a minimum allowed time between updates. I'll probably just put the update thing in a onTimerMinute somewhere.
We could do a sort of "fast start" type of thing where the reports come in a little bit faster at the beginning. You know, it starts off sending a message four times a minute, then twice a minute, then once a minute, then finally once every five minutes for the rest of the time. Perhaps the ratings server can even (someday) mention that it's overloaded and to scale back the reports. In any case, drastic changes in transfer speed could also trigger an immediate report, preventing uploaders from sending at high speeds and then lowering them when the reports are spaced out further. Or is this too complex for the benefit?
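The "fast start" schedule could be sketched roughly as follows. The post gives the four reporting rates but not how long each phase lasts, so the one-minute phases here are an assumption, as are the function names and the 50% threshold for a "drastic" speed change.

```python
def report_interval(elapsed_seconds):
    """Seconds to wait before the next progress report.

    Phase lengths (one minute each) are assumed; only the rates come from the post.
    """
    if elapsed_seconds < 60:
        return 15   # four times a minute
    if elapsed_seconds < 120:
        return 30   # twice a minute
    if elapsed_seconds < 180:
        return 60   # once a minute
    return 300      # once every five minutes for the rest of the transfer


def speed_changed_drastically(previous_bps, current_bps, threshold=0.5):
    """True if the speed moved by at least `threshold` relative to the last report.

    The 0.5 default is an arbitrary illustrative value.
    """
    if previous_bps <= 0:
        return current_bps > 0
    return abs(current_bps - previous_bps) / previous_bps >= threshold
```

A client would send a report whenever the interval has elapsed or `speed_changed_drastically` returns true, whichever comes first. A server-side "scale back" hint could later be layered on by multiplying the returned interval.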
Well, not really, just thought of how the rating server would store it, but that's now officially under an APP-field (Another Person's Problem, from Douglas Adams but I am not sure it is named this since I base it on the Swedish translation of his books) as far as I am concerned. 8)
I don't remember seeing APP, what book was that from?
Anyway, if the server is willing to store a lot of information then it won't be a problem storing hub names with data.
Ah... I was more in line of thinking what is needed in hardware/software to set up a server using the reference implementation.
I have no idea what the hardware requirements will be :) That will be completely empirical.

To run what I'm looking at writing someone will need a Python installation (I'm 99% sure it will run under Windows, Mac, and everything else) and a database backend, which is yet to be determined.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2003-02-19 11:50

sarf wrote:Well, not really, just thought of how the rating server would store it, but that's now officially under an APP-field (Another Person's Problem, from Douglas Adams but I am not sure it is named this since I base it on the Swedish translation of his books) as far as I am concerned. 8)
That would be the SEP - Somebody Else's Problem. It was used to essentially cloak a spaceship by making it look like an Italian Bistro. That is so horribly out of place in many places (such as a Rugby Field) that many people just pretend that it doesn't exist.

Quite nifty, actually.
sarf wrote:I'll just take a "clean" DC++ source, modify it, then propagate the changes into DC++k from that one. No extra work for me, no sirree. I might even have the diffs as a separate download to make it easier to merge into other DC++-based projects.
If you want me to borrow the queueing code from BCDC, I can see if I can separate that out into something nice (and maybe add a simple frame, à la Finished Downloads).

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-02-19 14:55

GargoyleMT wrote: If you want me to borrow the queueing code from BCDC, I can see if I can separate that out into something nice (and maybe add a simple frame, à la Finished Downloads).
What does the queueing of BCDC do? I know I could probably go look it up myself, but I'm lazy :)

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2003-02-19 19:13

volkris wrote:What does the queueing of BCDC do? I know I could probably go look it up myself, but I'm lazy :)
Well, I haven't looked at it since I played around in the ConnectionManager a couple days ago (and thus expanded my knowledge of how DC++ works), but I think it just keeps track of people who've been denied slots because yours were taken. Then, when it's their turn, it connects directly back to them (no $[Rev]ConnectToMe through the server) and starts their uploads. There's no GUI for it as far as I saw, just some /commands in main chat (/showqueue or something similar).

It also strikes me that we can't give preferential treatment (ie. better speeds) to people unless we have upload throttling in the code. I sent Sarf a copy of what I'd done based on his mods in DC++k a bit ago, and those could pretty easily be extended to give higher rated people more slices of the upload pie. Of course, if we just want to use ratings to weight how long people wait for queue slots, preferential transfer speeds aren't an issue...

Sapporo
Posts: 36
Joined: 2003-02-09 23:10
Location: AZ, USA

Post by Sapporo » 2003-02-20 01:46

After reading this thread I feel that some of the fundamentals have been omitted. I see discussion of physical implementation and User Interface elements without the logical system being completed first. This may be an error on my part as I never read the previous threads on the topic. I strongly believe that the logical design needs to be completed and available for review before proceeding with any physical implementation. I'm sure this thread has generated a lot of interest but if people don't understand the system then people won't use it. I have been putting off reading this thread because I had no idea what was really being discussed. When I read "Ratings Server and protocol", I think of something completely different than what is being discussed here.

In reality this whole system being discussed is a Karma system, and there is no use in calling it anything else. People are used to the word Karma and know what it's associated with. I think that this whole project should be renamed from "Ratings" to "Karma". It's taken me a couple of hours of reading to figure out just what this "ratings" system will cover (all in one sitting too!). Anyways, moving on. Said simply, Karma is a representation of your contribution to the community. People with a good/high Karma should be rewarded as they have helped the community grow. The fact that people with good/high Karma should be rewarded is a simple concept. People with a lower Karma should not be negatively affected, but they won't get the VIP treatment. The concept of Karma also has very few interpretations; however, there are many ways of implementing such a system.

Defining System Boundaries

First I think the boundaries of the Karma system need to be defined. The DirectConnect network is a decentralized Peer2Peer network which I'm sure you're all aware of. With that being restated, a HUB considers itself to be the only HUB in the entire universe. There are no alien lifeforms, and it doesn't communicate with them. Implementing a Karma server that multiple HUBs can utilize will inevitably fail in my opinion, for a few reasons.

Since each HUB considers itself to be alone, it implements its own rules. This should be obvious as no two HUBs have the same rules and there is no consensus on a standard set of rules either. If I have a HUB, why would I want to share my users' ratings with another HUB? Especially considering there is a good chance a user can't even connect to that other HUB based on differing rules?

Since a Karma system is based on a community, I think that a HUB should be the level of granularity for what is considered a community. This also makes sense, considering that most HUBs are theme based communities. For example if I join an Anime HUB and share only porn, am I contributing to that community? Most likely not as there are a lot of HUBs that have scripts that kick users sharing porn. However, I think this would be represented in your Karma. I think I would have a lower Karma in that HUB because there would most likely be fewer downloads from me, as I'm not sharing the content that that community is looking for. (Well, shouldn't be anyways .. maybe the porn will make me the #1 most downloaded from user lol) It doesn't really matter, or maybe it furthers my point about communities in the DC network. One thing that is nice about DC is the fact that there ARE communities. It's definitely not a free for all leeching orgy like other P2P. You can actually chat with the other users, and I think this system should complement that concept.

Another thing to consider is the fact that I bet most HUB operators will implement their own Karma server (which might be a moot point see below about Karma Benefits). I do think from a technological standpoint it is more logical and easier to implement a Karma system that works directly with the HUB, either through direct implementation in the HUB software, a script or HUB<->Karma communication. That's an implementation issue to be discussed in the future. Remember we are creating a protocol and thus a standard. The standard must be designed and written in a logical manner regardless of a physical implementation.

Also, I should point out that the Karma system should be an optional system that the HUB can implement. It should also be implemented so that clients that don't support Karma will still continue to work. It still could be required by a HUB though.

Defining Karma

Next I think Karma should be defined. What is it? What contributes to having a good Karma? How is Karma calculated? How does time and history affect my Karma? Some basic questions that I don't think have been answered because we are stuck on this 'rating' concept. I don't think this should be based on a value that can go up or down like a rating. A Karma shouldn't be a value that goes up or down either. Karma should be an indicator that is calculated from well defined attributes. How this is calculated should be discussed later, we need to first define what attributes we would like to contribute to Karma.

Potential Karma Attributes:

Online Time/Availability (See next paragraph for explanation)
Contributed Bytes (Amount of data sent to the community via uploads)
Transfer Speed per upload (Based on avg Kb/s of uploads)

I had a few other attributes listed but I removed them. A Karma attribute MUST be a representation of a behaviour that directly helps the community.

I haven't liked any proposed way of tracking online time so far. Using a straight "Connected to HUB" time is not enough for a couple of reasons. What if I have been connected to the hub for the last 120 hours (5 days), got disconnected and immediately reconnected? Now my "Connected to HUB" time is 5 mins. Is it fair to call me a leech because I've only been on the HUB for 5 mins? No. Availability time should be based on a time frame (configurable, let's say 1 week for discussion). All of my "Connected to HUB" time should be summarized into a single value for the last 7 days. So even if I get disconnected and reconnect it will still show that I have been highly available for the last week. This metric should only be based on history going back one week; after that it doesn't count. So if I don't connect to a HUB for a couple of weeks, when I come back my Availability metric would be zero again.
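The rolling-window idea can be sketched like this. It assumes the connection history is available as a list of non-overlapping `(connected_at, disconnected_at)` pairs; that data layout and the function name `availability` are illustrative assumptions.

```python
from datetime import datetime, timedelta


def availability(sessions, now, window=timedelta(days=7)):
    """Fraction of the last `window` that the user was connected to the hub.

    `sessions` is assumed to be a list of non-overlapping (start, end)
    datetime pairs; anything older than the window is ignored.
    """
    window_start = now - window
    connected = timedelta()
    for start, end in sessions:
        start = max(start, window_start)  # clip to the window
        end = min(end, now)
        if end > start:
            connected += end - start
    return connected / window
```

With this, the user who was online for 120 hours, dropped, and has been back for 5 minutes scores about 0.71 rather than looking like a newcomer, while someone whose only sessions ended two weeks ago scores 0.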

I think Karma should only be used to represent good and thus is relative. How good is this user compared to another user? It should not be used to enforce a consequence or penalty directly, for instance disconnecting you because your Karma is too low. Karma can only give you stuff, it can't take away stuff. For example, if you go to a club and have VIP treatment, you don't have to wait in line and you get to sit at a better table. Now, if I go to the same club and am not a VIP, I can still get into the club, but I have to wait in line and my table isn't as good. Again, Karma should not deny you access, only affect the availability and quality.

For discussion purposes I'll try to define a way of calculating a Karma value based on my above mentioned attributes.

User Carl: High Availability, contributed 20Gb at an avg of 50Kb/s per transfer
User Zapi: High Availability, contributed 40Gb at an avg of 25Kb/s per transfer
User Mary: High Availability, contributed 60Gb at an avg of 5Kb/s per transfer
User Burt: Med. Availability, contributed 100Gb at an avg of 3Kb/s per transfer
User Lurk: Low Availability, contributed 10Gb at an avg of 100Kb/s per transfer

(Ok, I'm too tired to figure out a way of doing this right now. Basically, all of the above people should have a similar Karma. Looking at that I wonder if the contributed amount should also be limited to a week of data like availability, with a total contributed stored separately.)

Karma Benefits

Obviously the whole point of using this system is to give benefits to users with a good/high Karma. So, how can we do this? The following is slightly oriented toward a physical implementation because it has to be incorporated into an existing system. From every way I look at this, Karma Benefits have to be implemented in the client software directly. There are not many benefits that a HUB can grant, other than possibly exemption from certain rules.

Karma Benefits that I can think of are priority queueing and a larger chunk of available bandwidth. I really can't think of anything else atm.

A priority queueing system would have to be carefully implemented. There is the potential that I could never download a file from someone if people with a higher Karma constantly request the same file. Remember that Karma can't take away. Just because people can take cuts in line in front of me doesn't mean that I should let 100 people take cuts in line in front of me.
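One possible way to keep priority queueing from starving low-Karma users is to cap how many times any waiting user can be overtaken. The sketch below is only an illustration of that idea: the class name `SlotQueue`, the `max_cuts` limit, and the integer Karma values are assumptions, not anything taken from DC++ or BCDC.

```python
class SlotQueue:
    """Waiting list for upload slots where higher Karma may cut in line,
    but each waiting user can be overtaken at most `max_cuts` times."""

    def __init__(self, max_cuts=3):
        self.max_cuts = max_cuts
        self._waiting = []  # entries are [nick, karma, times_overtaken], front first

    def join(self, nick, karma):
        pos = len(self._waiting)
        # Move forward past users with lower Karma who can still be overtaken.
        while pos > 0:
            ahead_nick, ahead_karma, ahead_cuts = self._waiting[pos - 1]
            if ahead_karma >= karma or ahead_cuts >= self.max_cuts:
                break
            pos -= 1
        # Everyone now behind the newcomer has been overtaken once more.
        for entry in self._waiting[pos:]:
            entry[2] += 1
        self._waiting.insert(pos, [nick, karma, 0])

    def pop_next(self):
        """Return the nick at the front of the line, or None if nobody waits."""
        return self._waiting.pop(0)[0] if self._waiting else None
```

Once a user has been overtaken `max_cuts` times, later arrivals queue behind them regardless of Karma, so Karma speeds people up without ever locking anyone out.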

Obviously allocating more bandwidth to good/higher Karma users is an extremely controversial issue to say the least, and one which may be dropped entirely. Either way I'll avoid it for the moment.

Anyways, I think that Karma Benefits need to be defined as they are the reason to use the system in the first place. If it's impossible or difficult to award the benefits I want, then why use the system? Suggestions on this? (I may be forgetting some others that were mentioned.)

(Some other things I have ideas on, but don't want to put into words right now. Too damn tired.)

When to report Karma attributes

Karma Security & Verification

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-02-20 13:40

Sapporo wrote:In reality this whole system being discussed is a Karma system, and there is no use in calling it anything else.
While the word "Karma" is cute and all, it has connotations that are completely opposite what the aim here is. If you don't pick up these differences with the rest of my response I can go into more detail on this.
It's taken me a couple of hours of reading to figure out just what this "ratings" system will cover (all in one sitting too!).
Yes, we need to generate some webpages clearly expressing the purpose and goals of the project. Rooting through a developers' mailing list is never the best way to get to the heart of a project.
Said simply, Karma is a representation of your contribution to the community. People with a good/high Karma should be rewarded as they have helped the community grow. [...] People with a lower Karma should not be negatively affected, but they won't get the VIP treatment.
I could hardly disagree with more things in the above quote :)
Karma is a representation of.... ummm.... some abstract concept of amount of good. As such, people who have been good and thus have high Karma should be rewarded, yes, but most interpretations of Karma would see nothing wrong with punishing people with bad Karma. In fact, most day to day use of the word Karma (at least in my part of the world) is associated with bad things happening to a person with bad karma.

It's quibbling over words, but a ratings system, to me, implies a much more mechanistic, deterministic, and practical device. In my opinion we WANT the ratings system to sink into the background where people will hardly ever think of it again. Once they get the picture that sharing more is good the rest of the effects of the system will only be relevant to client developers. "Ratings system" should be on the same level as "IP address", not "client skin".
First I think the boundaries of the Karma system need to be defined. The DirectConnect network is a decentralized Peer2Peer network which I'm sure you're all aware of. With that being restated, a HUB considers itself to be the only HUB in the entire universe. There are no alien lifeforms, and it doesn't communicate with them. Implementing a Karma server that multiple HUBs can utilize will inevitably fail in my opinion, for a few reasons.
It's ironic that you start right in saying that DC is decentralized and then immediately throw up a centralized view of the thing. But no matter.

Hubs don't necessarily communicate with the ratings system. The ratings are there for clients to use, not hubs. One of the biggest flaws in DC is the mindset that the hub is the center of the universe. A proper view of the system would be much more considerate of the clients. One of the things that the ratings system does is to empower clients, irrespective of their hubs. This actually ends up being good for both clients AND hubs.
Since each HUB considers itself to be alone, it implements its own rules. [...] If I have a HUB, why would I want to share my users' ratings with another HUB?
Hubs currently implement these rules as a very, very poor measure to improve quality of sharing going on. DC is flawed in that such patches are necessary to keep the whole thing from crashing. With a ratings system in place these rules become much less needed and the quality of sharing can go exponentially beyond what even the most effective rules could hope to provide.

As for the question of why share user ratings, who cares? You're a hub. Your job is to route messages from one client to another. You mind your business and let the clients mind theirs.
I think that a HUB should be the level of granularity for what is considered a community.
Of course you do, you have a centralized, hub-as-the-center view of the world. I, on the other hand, recognize the value of the clients and let the clients choose what communities they consider to be the proper level of granularity. Normally this will probably be at the hub level, but not always.
This also makes sense, considering that most HUBs are theme based communities. For example if I join an Anime HUB and share only porn, am I contributing to that community?
If people download the porn from you, then yes, you are contributing. If people don't download from you because you don't have Anime then no, you're not contributing. The ratings system will rate you as to your ACTUAL value to the community, not the value you're "supposed" to have based on the community's stated rules.
Another thing to consider is the fact that I bet most HUB operators will implement their own Karma server (which might be a moot point see below about Karma Benefits).
And under the proposed system they have every opportunity to. However, also under the proposed system the actual location of the server and its relationship to other servers is completely irrelevant. What I mean is that I could set up my server and you could set up a different server, but in the end it will work no differently from us setting up a single server. The decision as to which way to go here becomes a matter of trust in the server admin and of hardware constraints.
Also, I should point out that the Karma system should be an optional system that the HUB can implement. It should also be implemented so that clients that don't support Karma will still continue to work. It still could be required by a HUB though.
The proposed system is completely optional, benefiting those who participate while not directly injuring those who don't. However, it isn't the hub implementing it. The user is implementing it. Like I said, the hub needs to go back to routing messages and let the user make sense of ratings. Finally, it is technologically impossible for a hub to require it in any implementation without causing massive problems. And you wouldn't get any real benefit out of it anyway.
Next I think Karma should be defined. What is it? What contributes to having a good Karma? How is Karma calculated? How does time and history affect my Karma? Some basic questions that I don't think have been answered because we are stuck on this 'rating' concept.
Not at all. They are answered, and the answer is "let the user decide". Problem solved :)
I don't think this should be based on a value that can go up or down like a rating.
Of course they should go up and down in response to the user's activities on DC. Otherwise you'd have all of the massive misuse of the system that exists now.
(Ok, I'm too tired to figure out a way of doing this right now. Basically, all of the above people should have a similar Karma. Looking at that I wonder if the contributed amount should also be limited to a week of data like availability. With a total contributed stored separately.)
Answer: let the user decide.
Anyway, the users you mentioned above got their ratings in very different ways. That they have the same ratings would mean a very poor metric is being used, seeing as different users in different situations WILL want different aspects of the subject's profile to be positive.

Take for example the difference between an mp3 fan wanting high burst rates and a movie fan wanting stable connections.
Anyways, I think that Karma Benefits need to be defined as they are the reason to use the system in the first place. If it's impossible or difficult to award benefits I want, then why use the system? Suggestions on this? (I may be forgetting some others that were mentioned.)
Well here's the simple answer: we'll bully you into using it.
If by not using the system you are having a 2% harder time downloading, then by holding out you're hurting yourself.

Here's a longer answer: you have no choice in the matter. If you upload to someone who is participating then you're in the system. You will never necessarily have any way of knowing if you're uploading to someone participating, and so you are forced to accept it. If you're going to be getting points for it you then might as well step up and accept them.

Ratings benefits CANNOT be defined because we're not omnipotent and therefore cannot foresee all possible benefits to having higher ratings. All we can do is say that at least you might stand a higher chance of grabbing download slots if you participate.

A lot of what you went into detail about above has already been discussed in this thread. That the rewarding of priority on downloads has to occur on the clients, for example, is a fundamental part of the proposal.

Another fundamental part addresses your question of when to report. Reports will be sent at various times during file transfers. At the beginning and end, certainly, but also occasionally throughout.

Security is not a big deal to us simply because in such a distributed system it will be almost impossible to make a significant false impact on a user's ratings without basically owning most of the hub, at which point you have bigger problems to worry about. We have discussed a couple of simple measures to make it more difficult to forge reports, but on the whole there is only so secure that a system can get and this is not a problem.

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden
Contact:

Post by sarf » 2003-02-20 14:04

Well, I thought I'd go and comment on Sapporo's interesting post, but then the darn volkris beat me to it. Grr. :)
GargoyleMT wrote:It also strikes me that we can't give preferential treatment (ie. better speeds) to people unless we have upload throttling in the code. I sent Sarf a copy of what I'd done based on his mods in DC++k a bit ago, and those could pretty easily be extended to give higher rated people more slices of the upload pie. Of course, if we just want to use ratings to weight how long people wait for queue slots, preferential transfer speeds aren't an issue...
Thanks for the code, by the way - I'll have to look into it when I have not had a three-hour marathon meeting with my project customer. :roll:

We could, as a start, use a queue and make clients with good ratings go ahead of their unfortunate brethren, or we could make a fully fledged upload controlling client. I like the queue, and it is needed in the DC network, so I'd be glad to include a queue as well as introducing upload controlling code into the code. We do need to be aware of the fact that the client would be upload controlling, which means that the user decides how much upload bandwidth the client should maximally use and that the client then decides how to divvy the upload bandwidth between the clients that want a piece of the pie.
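A minimal sketch of the "divvy the upload bandwidth" idea, assuming the user sets one upload cap and the client splits it in proportion to each connected client's rating. The function name `divide_bandwidth`, the floor weight of 1 for unrated users, and the kB/s unit are all assumptions made for illustration; nothing in this thread fixes them.

```python
def divide_bandwidth(max_upload_kbps, ratings):
    """Split a user-set upload cap among clients in proportion to rating.

    ratings maps nick -> rating. Hypothetical policy: every client gets a
    weight of at least 1 so that unrated users are not starved completely.
    """
    if not ratings:
        return {}
    weights = {nick: max(1.0, float(r)) for nick, r in ratings.items()}
    total = sum(weights.values())
    return {nick: max_upload_kbps * w / total for nick, w in weights.items()}


if __name__ == "__main__":
    # A well-rated client and an unrated one sharing a 100 kB/s cap.
    print(divide_bandwidth(100, {"good": 3, "unrated": 0}))
```

The same weights could just as well order a queue instead of throttling speeds; the proportional split is only one of the two options mentioned above.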

Because I had too little time today, I have to put off the ratings manager for another day, but I will hopefully be able to make the .h file at least, so that we have something to work with.
It will be a draft, though, so let's not rush ahead of ourselves.

Meanwhile, if someone (GargoyleMT?) could take a peek at BCDC and determine whether a queue could be implemented in a snap (especially if some nifty algorithm could be used to "bump" clients with good ratings to better positions) this'd help, because we probably need to do some distributed work to get a modified client up and running.

Lastly, and most importantly, what is to be the client's name?

DC++ Ratings Enabled Version seems so... boring, in my humble opinion.

Does anyone have a name for our pet client?

Sarf
---
Never sign a contract including any of the phrases "sort of", "kind of", or "and stuff".

sandos
Posts: 186
Joined: 2003-01-05 10:16
Contact:

Post by sandos » 2003-02-20 17:54

sarf wrote:Meanwhile, if someone (GargoyleMT?) could take a peek at BCDC and determine whether a queue could be implemented in a snap (especially if some nifty algorithm could be used to "bump" clients with good ratings to better positions) this'd help, because we probably need to do some distributed work to get a modified client up and running.
This seems to be a pretty easy and fair idea, and it should have less state than a full queue-system (maybe not?):

Define a constant time period T that is long enough to catch any client that is trying to connect (use the interval DC++ uses * 2?). For any client that gets a maxed-out reply, hash "$nick+$ip" down to a byte and keep that byte around, mapped together with a rating and a timestamp (remark: ouch). Discard any values older than T.

Now, when there is a free slot, just go through this mini-hashtable, add up all ratings, and generate a random number between zero and that sum. Then calculate which client got it by subtracting each client's rating one by one. (Each client will have a chance proportional to its rating.)

This all sounded so good when I thought about it, but it seems it was about 10x more complicated than I thought, and uses a lot of state, especially when there are many hammering clients. Sometimes you need to get an idea down on paper to understand it thoroughly. This was a plain bad one. :)
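For reference, the lottery described in this post could be sketched roughly as follows. The class name `SlotLottery`, the use of SHA-1 to get the one-byte key, and the injectable random generator are assumptions for illustration only; the post specifies just "hash down to a byte", a rating, a timestamp, and the period T.

```python
import hashlib
import random
import time


class SlotLottery:
    """Weighted lottery over clients that were recently refused a slot."""

    def __init__(self, period_seconds, rng=None):
        self.period = period_seconds   # the period T from the post
        self.entries = {}              # one-byte key -> (rating, timestamp)
        self.rng = rng or random.Random()

    @staticmethod
    def key(nick, ip):
        # Hash "$nick+$ip" down to one byte (0..255); collisions are possible.
        return hashlib.sha1(f"{nick}+{ip}".encode("utf-8")).digest()[0]

    def record_refusal(self, nick, ip, rating, now=None):
        now = time.time() if now is None else now
        self.entries[self.key(nick, ip)] = (rating, now)

    def pick_winner(self, now=None):
        now = time.time() if now is None else now
        # Discard any values older than T.
        self.entries = {k: (r, t) for k, (r, t) in self.entries.items()
                        if now - t <= self.period}
        candidates = [(k, r) for k, (r, _) in self.entries.items() if r > 0]
        total = sum(r for _, r in candidates)
        if total <= 0:
            return None
        # Draw a number in [0, total] and subtract ratings one by one.
        draw = self.rng.uniform(0, total)
        for k, r in candidates:
            draw -= r
            if draw <= 0:
                return k
        return candidates[-1][0]  # guard against floating-point rounding
```

The table is bounded at 256 entries by the one-byte key, but colliding clients overwrite each other, which is part of the state problem the post complains about.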

Sapporo
Posts: 36
Joined: 2003-02-09 23:10
Location: AZ, USA

Post by Sapporo » 2003-02-20 20:27

There is obviously a lot more to this discussion than what is listed in this thread, then. All I did was read through the thread and rehash it. As mentioned before, a forum is a tough if not horrible way of working out a standard. I assume that there is further information here http://volkstar.dyndns.org/~volkris/sub ... DCRatings/ but I have never seen that link being up.

Looking back, Karma isn't the word I really wanted to use as I completely glossed over the concept of Bad Karma.
volkris wrote:It's ironic that you start right in saying that DC is decentralized and then immediately throw up a centralized view of the thing. But no matter.
I don't see it as centralized because I'm not considering a "global" rating system like you are proposing.
volkris wrote:The proposed system is completely optional, benefiting those who participate while not directly injuring those who don't. However, it isn't the hub implementing it. The user is implementing it.
--
We have discussed a couple of simple measures to make it more difficult to forge reports, but on the whole there is only so secure that a system can get and this is not a problem.
I think this is the biggest flaw of the entire proposed system, the fact that the client is implementing it. Personally, I think it would be quite easy to exploit this system. I could participate in the ratings system and get points for each person downloading from me, thus eventually gaining a high rating and its advantages. Nothing is forcing me to report points for other users and more importantly, grant the advantages to them. The result of this give/take is that I only take from it. Hell, I didn't even have to fake or forge data to get a good rating and its benefits.

Unless this has all fused into one clump for me and I'm missing something. I don't remember seeing any sort of check and balance that would prevent this.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-02-20 23:03

Yeah, that was my fault. I was doing some work on my repositories. It should be working now.
Sapporo wrote:I don't see it as centralized because I'm not considering a "global" rating system like you are proposing.
No, I'm referring to your view of the hub being the center of the universe for each distinct community. That's a very centralized model. Not that there's necessarily anything wrong with it, but let's not refer to it as a decentralized system when it's hardly that.
I think this is the biggest flaw of the entire proposed system, the fact that the client is implementing it. Personally, I think it would be quite easy to exploit this system. I could participate in the ratings system and get points for each person downloading from me, thus eventually gaining a high rating and its advantages. Nothing is forcing me to report points for other users and more importantly, grant the advantages to them. The result of this give/take is that I only take from it. Hell, I didn't even have to fake or forge data to get a good rating and its benefits.
But you're blind to the other side of the picture. If people are giving you ratings, then there's a good chance that they're also rating other people. Just because you chose not to take an action that would increase other people's ratings doesn't really mean much as long as the people rating you are rating others. I can provide a hypothetical case if you want...

The point is that even if you don't submit rating information you're still being judged against the others. You neither gain nor lose by not submitting information, so you might as well.

Not only that, but it's entirely conceivable that extra consideration would be awarded to people who are submitting ratings.

Client implementation is the greatest part of the entire proposal. The hub-centric nature of DC currently fights against the clients every step of the way, with its rules and limits. Client-implemented solutions, when they're real solutions, move the client from an enemy to an ally.

In any case it would be impossible to do this in a hub-centric way unless all transfers were masked and routed through the hub. Even then clients could subvert it more easily than they could here.

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-02-21 10:52

sarf wrote:Meanwhile, if someone (GargoyleMT?) could take a peek at BCDC and determine whether a queue could be implemented in a snap (especially if some nifty algorithm could be used to "bump" clients with good ratings to better positions)
Dunno if this'll help, but the upload queue consists of three main parts: one stores the IPs and ports of active users who attempt, successfully or not, to download something; the second is the queue itself, which holds nicks (I don't use users, but that could work too); and the third is code distributed in the various places where the number of free slots changes.

These amount to providing the client a way of restarting the connection protocol. It restarts an active user's attempt by connecting again to the IP address and port of the nick who is on the queue (the normal sequence is $ConnectToMe/a connection attempt; the first step is skipped). It restarts a passive connection by sending the $ConnectToMe command to connect to oneself (the normal sequence being $RevConnectToMe/$ConnectToMe/the connection attempt, and again the first step is skipped). I call the function that does this halfconnect.
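A rough sketch of the three parts described above, assuming a simple FIFO of nicks. This is not BCDC++ code: the class `UploadQueue`, its method names, and the returned action tuples are hypothetical, and the sketch only builds the command string rather than doing any network I/O. The `$ConnectToMe <nick> <ip>:<port>|` layout follows the usual NMDC form.

```python
from collections import deque


class UploadQueue:
    """Remembers who asked for a slot and how to reach them again."""

    def __init__(self, my_ip, my_port):
        self.my_ip = my_ip
        self.my_port = my_port
        self.active_addresses = {}   # nick -> (ip, port) of active users
        self.waiting = deque()       # the queue itself, holding nicks

    def note_attempt(self, nick, address=None):
        # address is known only for active users, who sent $ConnectToMe.
        if address is not None:
            self.active_addresses[nick] = address
        if nick not in self.waiting:
            self.waiting.append(nick)

    def half_connect(self):
        """Called wherever a slot frees up; restarts the connection protocol."""
        if not self.waiting:
            return None
        nick = self.waiting.popleft()
        address = self.active_addresses.pop(nick, None)
        if address is not None:
            # Active user: skip their $ConnectToMe and connect straight back.
            return ("connect", nick, address)
        # Passive user: skip their $RevConnectToMe and ask them to connect to us.
        command = f"$ConnectToMe {nick} {self.my_ip}:{self.my_port}|"
        return ("send_to_hub", nick, command)
```

A ratings-aware variant would only need to change which nick `half_connect` takes from `waiting`; everything else stays the same.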

I've vacillated over upload management policy, and I'm still not sure I like it (i.e. when is a user deleted from the queue, when are they placed on the end of the queue).

The primary weakness of the queue as I implement it is that it doesn't lock out traditional download attempts. This means that if the notified user off the front of the queue doesn't respond quickly enough, someone else will get the slot. However, I'm uncomfortable with sending "no slots available" messages for a given time period after the user's notified, as they're lies.

(I am interested in incorporating this ratings server protocol code into BCDC++ at some point, as well.)

*shrug*

sarf
Posts: 382
Joined: 2003-01-24 05:43
Location: Sweden
Contact:

Post by sarf » 2003-02-21 15:10

cologic wrote:[snip quote and info about queue]
These amount to providing the client a way of restarting the connection protocol.
[snip rest of explanation]
Ah. Interesting.

This way we'd be controlling exactly who gets the slot. I like it.
cologic wrote:I've vacillated over upload management policy, and I'm still not sure I like it (i.e. when is a user deleted from the queue, when are they placed on the end of the queue).
Well, it is something that is easily done, and a rating system without upload management is kind of... pointless.
cologic wrote:The primary weakness of the queue as I implement it is that it doesn't lock out traditional download attempts. This means that if the notified user off the front of the queue doesn't respond quickly enough, someone else will get the slot. However, I'm uncomfortable with sending "no slots available" messages for a given time period after the user's notified, as they're lies.
No, they are not if we consider "No slots available" to mean "No slots available for you, loser. Try again.". Unfortunately there is no way of indicating another message without losing all the old clients.
I'd prefer to "lie" since that way the client can distribute slots more efficiently. Besides, isn't it "lying" to tell a client "No slots available" if they hammer you? I, for one, do not want to reward hammering. Currently, there is nothing stopping anyone from coding a client that will send connect messages every millisecond, and this would be rather wearying on the client being hammered.
cologic wrote:(I am interested in incorporating this ratings server protocol code into BCDC++ at some point, as well.)
Good! Another client that'll join the crusade! :)

volkris took up most of the things I wanted to say about Sapporo's post, but here are my quotes and relevant comments.
Sapporo wrote:I think this is the biggest flaw of the entire proposed system, the fact that the client is implementing it.
Maybe so, but I do not think that many people will want to install OverMind 2.0 that would limit them from doing anything whatsoever to the client program and that would take their system files as hostages to the user's good behaviour - which is what would be needed if we decide to remove all power from the client. A wee bit dramatized but I hope you get the gist of it. If we do not trust the client at all then we must make sure that either a) the client can do nothing with the data it has or b) the consequences of modifying the client would be so horrendous as to discourage all but the most dedicated.
Sapporo wrote:Personally, I think it would be quite easy to exploit this system.
Of course it is easy to exploit it, but you do not benefit as much from doing so compared to what you gain from exploiting the current system. Slot blocking, fake sharing using dummy files, upload limiting... all are ways of decreasing the work you need to do to gain the goodies, and they are all quite viable in today's DC network.
If you combined slot faking with upload limiting you could use that as an explanation for why someone's download went so slowly ("dude, I have eight other slots and they're all getting 25-50 kb/s each - be happy with your 32 kb/s"), which would work quite well with the OPs, if the other user decided to bring them in ("well, I can get his filelist so he's not fakesharing").

In short, no system is perfect, but the rating system is, in my opinion, a step in the right direction (or was it the left direction? hmmmm).

Sarf
---
His philosophy was a mixture of three famous schools -- the Cynics, the Stoics and the Epicureans - and summed up all three of them in his famous phrase, "You can't trust any bugger further than you can throw him, and there's nothing you can do about it, so let's have a drink."

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32
Contact:

Post by cologic » 2003-02-21 16:39

volkris wrote:Security is not a big deal to us simply because in such a distributed system it will be almost impossible to make a significant false impact on a user's ratings without basically owning most of the hub, at which point you have bigger problems to worry about. We have discussed a couple of simple measures to make it more difficult to forge reports, but on the whole there is only so secure that a system can get and this is not a problem.
And since I do control a couple of hubs, that's not a problem for me. :D
The best way I can think of to exploit this, though, that I haven't seen covered in this thread relies on users crediting each other for uploads that never took place.

One method of doing this relies on a hub actually containing a community of users willing to help each other. In this instance, they'd agree to give users within the hub, say, 10 times the upload credit they'd give anyone outside the hub. The downloading users report this to the ratings servers, and those users can then more effectively leech from other hubs whose members haven't conspired to disproportionately raise each other's ratings. Though this does rely on people desiring to help each other, that DC is distinguished from many other peer to peer filesharing systems due to the existence of such communities seems to be an operating assumption by volkris, Sapporo, and GargoyleMT at least.

However, I'll acknowledge that such communities may be rare; I believe that's an overly charitable view of DC users, anyway. The variation that works even for individual users without a hub to support them relies on their creating bots, essentially, to download from on localhost continuously. Less honestly, they could not even bother to download and just send in reports of their having uploaded to other users. This produces a similar effect of boosting those users' ratings disproportionately to their true or useful uploads.
Sapporo wrote:A priority queueing system would have to be carefully implemented. There is the potential that I could never download a file from someone if people with a higher Karma constantly request the same file. Remember that Karma can't take away. Just because people can take cuts in line in front of me doesn't mean that I should let 100 people take cuts in line in front of me.
This is getting into the realm of more complex scheduling concerns, and I'm most familiar with operating system scheduling, so I'll use that analogy.

I would liken the current situation, whether upload queuing is used or not, to cooperative multitasking (Windows 3.x, MacOS 6, 7, 8, 9): a client gives up a slot when it's done, as a program gives up its execution slot when it's done by yielding to the OS.

Well, this was found to be vulnerable to programs not giving up their slot. In the case of OSes, the most extreme cases of this were, of course, accidental: programs hanging and such. In the case of a DC client, the effect occurs intentionally, because why would a downloading client voluntarily disconnect from an uploading client? However, the result is analogous: that downloading client monopolizes the slot the way a hung program monopolizes CPU time.

Preemptive multitasking (any half-decent OS, as well as Windows 95, 98, and Me when dealing with Win32 programs :wink: ) solves this for operating systems, and I believe it could also help the DC network by taking the form of someone's slot running out after they've downloaded either a certain amount of data or done so for a certain amount of time. Doing so by data helps slow connections and relatively penalizes fast connections, whilst limiting it by time does the reverse. By data also has the virtue of suggesting more universal default settings (e.g. 600MB or something, roughly the size of a CD).

Once a client's hit whatever limit one chooses to define, it'll have lost its slot and will receive "No slots available" next time it tries to download something.
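As a sketch of what such a preemptive slot could look like, assuming the uploader tracks bytes and seconds per slot: the class `SlotLease` and its parameters are made up for illustration, and either limit can be left off to get the by-data or by-time variant.

```python
class SlotLease:
    """A slot that runs out after a data budget, a time budget, or both."""

    def __init__(self, max_bytes=None, max_seconds=None):
        self.max_bytes = max_bytes        # e.g. 600 * 1024 * 1024, about one CD
        self.max_seconds = max_seconds    # None means "no time limit"
        self.bytes_sent = 0
        self.seconds_used = 0.0

    def record(self, bytes_sent, seconds):
        # Called periodically by the uploader while the transfer runs.
        self.bytes_sent += bytes_sent
        self.seconds_used += seconds

    def expired(self):
        over_data = self.max_bytes is not None and self.bytes_sent >= self.max_bytes
        over_time = self.max_seconds is not None and self.seconds_used >= self.max_seconds
        return over_data or over_time
```

Once `expired()` returns true the uploader would drop the connection and answer the next request from that client with the usual "No slots available".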

Well, this was sort of off-topic, but it's related to upload queuing, and I wanted to get it out somewhere. A mod can feel free to move it/split it off this thread.

Sapporo
Posts: 36
Joined: 2003-02-09 23:10
Location: AZ, USA

Post by Sapporo » 2003-02-21 16:40

sarf wrote:Maybe so, but I do not think that many people will want to install OverMind 2.0 that would limit them from doing anything whatsoever to the client program and that would take their system files as hostages to the user's good behaviour - which is what would be needed if we decide to remove all power from the client. A wee bit dramatized but I hope you get the gist of it. If we do not trust the client at all then we must make sure that either a) the client can do nothing with the data it has or b) the consequences of modifying the client would be so horrendous as to discourage all but the most dedicated.
This isn't what I was thinking. I was referring to the fact that a good system will work as it was designed regardless of whether people have access to the source.
volkris wrote:Not only that, but it's entirely conceivable that extra consideration would be awarded to people who are submitting ratings.
This is a step in the right direction. The fact that a client is or is not submitting should also be indicated in their rating (a separate metric). That way if someone connects to me to download something, I can then look up their rating. If they have a high rating but have never submitted metrics themselves I can deny them VIP treatment, like sticking them in the queue like a normal user without a rating.

Since it's up to the client to choose who and how to reward, I would set mine up to only reward people that are submitting rating metrics on other users.

volkris
Posts: 121
Joined: 2003-02-02 18:07
Contact:

Post by volkris » 2003-02-21 21:01

Sapporo wrote: This is a step in the right direction. The fact that a client is or is not submitting should also be indicated in their rating (a separate metric). That way if someone connects to me to download something, I can then look up their rating. If they have a high rating but have never submitted metrics themselves I can deny them VIP treatment, like sticking them in the queue like a normal user without a rating.
Hey, you can reward whatever you want :)
I personally don't think this is a great idea, partially because I wouldn't know WHY a user isn't participating. The whole point of DC is not to amass ratings, it's to transfer files, and as long as a user is transferring files he's good in my book. There's also an issue as to how exactly we'd know if a user has submitted ratings. Just because his own server doesn't know about the submission doesn't mean that he hasn't been rating like mad at another. At the same time, just because another server says he's been rating like mad doesn't mean that the server can be trusted.

cologic wrote:And since I do control a couple of hubs, that's not a problem for me. :D
Ahh, I misspoke. When I said "it will be almost impossible to make a significant false impact on a user's ratings without basically owning most of the hub" what I should have said was "it will be almost impossible to make a significant false impact on a user's ratings without basically populating a hub entirely with people set to go out of their way to mess with that user." See, ownership of the hub itself has nothing to do with it, seeing as a hub can only really control its users in matters involving the hub (device not community) itself.
The best way I can think of to exploit this, though, that I haven't seen covered in this thread relies on users crediting each other for uploads that never took place.
Or giving more credit than they should for a download. This is the one remaining exploit I've found and it does trouble me. Sarf and others, pay attention, as I haven't mentioned it before.

What is to stop users from saying "If you let me download from you I'll credit you with double the upload points"? I honestly don't have a final solution to this, though I don't know how big a problem it actually is.

To an extent such practices can be discouraged by limiting the amount of information that clients can get about how their ratings were decided. In order for a deal like this to work, the source has to trust the client at least a little bit. The main way to create this trust is to allow the source to verify that the inflated points were actually reported. A single person's report should not normally make a big enough difference in ratings to allow this checking, but if a client can download reporting history, even with reporter identifications missing, he can be reasonably sure of whether or not the misrating occurred.

Basically my premise here is that I'm not going to put much effort into upping a user to the front of the line if he's offering to double the points if I can't verify that he actually does it. Of course if he's offering then it might be worth taking the chance, right? Unless people start running clients that report that a user is accepting these offers, punishing them for it.

The problem is that this is at odds with another feature of the proposed system: allowing clients to download ratings histories in order to verify that the server remains honest. Not to mention downloading them in order to do external ratings.

But that's only really considering the technological side anyway.
One method of doing this relies on a hub actually containing a community of users willing to help each other. In this instance, they'd agree to give users within the hub, say, 10 times the upload credit they'd give anyone outside the hub. The downloading users report this to the ratings servers, and those users can then more effectively leech from other hubs whose members haven't conspired to disproportionately raise each other's ratings. Though this does rely on people desiring to help each other
Ahh, but if users of a hub are able to grab more stuff from other hubs they can come back and share new stuff in their own hubs, giving hubs incentive to participate in these schemes.

So that's the problem: users socially trusting each other, cooperating to conspire against the rest of the world, and users brokering deals that influence the ratings system (this only really matters technologically).

We can break down the technological trust at a price, but the social side is still a problem.

How big a problem? I don't know :)
It is very dependent on scope. For example, for very small total groups it would be easier to corrupt the system without lots of valid results to average in. For larger groups it would be harder, as part of the rating can rely on the diversity of reporters (which can't completely solve the problem, of course).
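One possible reading of "rely on the diversity of reporters", sketched under the assumption that a rating is a sum of reported credit: cap how much any single reporter can contribute, so that a rating only grows large when many distinct reporters vouch for the user. The function name and the cap value are hypothetical, and as noted above this cannot completely solve the problem, since a whole conspiring hub still counts as many reporters.

```python
from collections import defaultdict


def diversity_weighted_credit(reports, per_reporter_cap):
    """Sum upload credit for one user, limiting each reporter's influence.

    reports is an iterable of (reporter_id, credit) pairs about that user.
    """
    per_reporter = defaultdict(float)
    for reporter, credit in reports:
        per_reporter[reporter] += credit
    # One reporter inflating credit tenfold still counts for at most the cap.
    return sum(min(total, per_reporter_cap) for total in per_reporter.values())
```

With a cap of 50, for instance, reports of 10 and 1000 from one reporter plus 20 from another count as 50 + 20 rather than 1030.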

So, any genius solutions?
