[dcdev] Text protocol draft

I've already posted the first part of this in a reply, but I'll repost
it for clarity.

I suggest using a line-based protocol with CRLF line termination (the
CRLF can also be thought of as a start code, and it can also be
quoted). Within lines, words are separated by whitespace. I was
planning on simply using isspace(3) to detect whitespace, but I can
agree to allowing only ASCII 32 spaces instead, for efficiency. Any
thoughts on that? As for quoting, I suggest allowing both double
quotes and backslash escaping, for good reasons. Double-quote
escaping is, for optimization reasons, limited to whole words,
i.e. no sub-word quoting like a"b c"d instead of "ab cd". Backslashes
can quote anything. CRLF can be quoted either by a backslash or by
double quotes. The first word of each command is the command name.
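
To make the quoting rules concrete, here is a minimal sketch of a word
splitter in Python (chosen purely for illustration). The function name
split_words is hypothetical, and the sketch assumes that only ASCII 32
separates words, that a backslash also works inside double quotes, and
that the terminating CRLF has already been stripped. Input that breaks
the whole-word rule is not rejected; text right after a closing quote
simply starts a new word.

```python
def split_words(command: str) -> list[str]:
    """Split one command (terminating CRLF removed) into dequoted words."""
    words = []
    i, n = 0, len(command)
    while i < n:
        if command[i] == " ":            # skip separators (ASCII 32 only)
            i += 1
            continue
        quoted = command[i] == '"'       # double quotes must open a word
        if quoted:
            i += 1
        chars = []
        while i < n:
            c = command[i]
            if c == "\\" and i + 1 < n:  # backslash quotes the next char
                chars.append(command[i + 1])
                i += 2
            elif quoted and c == '"':    # closing quote ends the word
                i += 1
                break
            elif not quoted and c == " ":
                break
            else:                        # ordinary char, or quoted CR/LF
                chars.append(c)
                i += 1
        words.append("".join(chars))
    return words
```

For example, split_words('CHAT "Hey all" Dolda\\ 2000') gives the three
words CHAT, Hey all and Dolda 2000.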

When it comes to optimization, the only thing that requires moving or
copying data is backslash removal. For everything else, just overwrite
the delimiters with NULs where you want the words to end. Since double
quotes can quote everything but themselves, and words are unlikely to
contain double quotes or backslashes very often, backslash removal
will probably be a rather rare procedure (yes, I want pathnames to
consist of slashes, not backslashes). Also, if you want to optimize it
on the hub side, you can simply choose not to dequote words that you
don't need to look at.
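
A rough sketch of that lazy approach, in Python for illustration only
(a C implementation would write NULs into the buffer instead of
returning offsets). The names word_spans and dequote are hypothetical.
The scanner only records where each word starts and ends and whether
it contains a backslash escape; copying happens only when a word with
an escape is actually dequoted.

```python
def word_spans(command: str) -> list[tuple[int, int, bool]]:
    """Return (start, end, has_backslash) per word; no text is copied."""
    spans = []
    i, n = 0, len(command)
    while i < n:
        if command[i] == " ":
            i += 1
            continue
        quoted = command[i] == '"'
        start = i + 1 if quoted else i
        i = start
        has_backslash = False
        while i < n:
            c = command[i]
            if c == "\\" and i + 1 < n:
                has_backslash = True     # this word needs real dequoting
                i += 2
            elif (quoted and c == '"') or (not quoted and c == " "):
                break
            else:
                i += 1
        spans.append((start, i, has_backslash))
        if quoted and i < n:
            i += 1                       # step over the closing quote
    return spans


def dequote(command: str, span: tuple[int, int, bool]) -> str:
    """Dequote one word on demand; plain words are just a slice."""
    start, end, has_backslash = span
    raw = command[start:end]
    if not has_backslash:
        return raw                       # common case: nothing to remove
    out = []
    j = 0
    while j < len(raw):
        if raw[j] == "\\" and j + 1 < len(raw):
            j += 1                       # drop the backslash, keep next char
        out.append(raw[j])
        j += 1
    return "".join(out)
```

A hub could call dequote on the first span only, to read the command
name, and forward the rest of the line untouched.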

eric, your Search challenge had some trouble in its own example.
Quote:

1) an mp3
2) an avi bigger than 2MB
3) an mp3 bigger than 2 MB but smaller than 10MB
4) an avi or an mpg being bigger than 350MB but smaller than 700MB
The 2 first cases are easy, show me the third and the fourth :)
With a binary protocol like the one I describe above, it is easy:
* case 3) 1 parameter has "string" type and value "mp3", the 2 nd
* has "size" type and value "2MB", the third has "size" type and
* value "-700MB".
* case 4) Just provide 2 strings parameters, 1 is "avi" and the
* second is "mpg".

In your case 3, you imply that the conditions are "and"ed, while in
case 4, you imply that they are "or"ed (you also omitted the size
criteria from case 4, but I guess that was accidental).

Instead, I suggest a "Search" command (SRCH) with an arbitrary number
of words, in each of which the first two characters describe a
selector and an operator. Then it depends on how advanced a search
algorithm you want the clients to implement, but if you want a search
algorithm that can do what you wanted above, look at this example SRCH
command:

SRCH ( N~.avi | N~.mpg ) & S>350M & S<700M

Your examples require condition precedence, so I introduced grouping
operators as well, as you can see.
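
As a sketch of what a client would have to implement for the grouped
form, here is a small recursive-descent evaluator in Python. Everything
beyond the example command is an assumption made for illustration: the
names matches, match_term and parse_size, that N~ means a
case-insensitive substring match, that the M suffix means 2^20 bytes,
and that & binds tighter than |.

```python
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}  # assumed multiples


def parse_size(text: str) -> int:
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def match_term(word: str, name: str, size: int) -> bool:
    """Evaluate one word: selector char, operator char, then argument."""
    if len(word) < 3:
        raise ValueError(f"bad term: {word!r}")
    selector, op, arg = word[0], word[1], word[2:]
    if selector == "N" and op == "~":
        return arg.lower() in name.lower()   # assumed: substring match
    if selector == "S" and op == ">":
        return size > parse_size(arg)
    if selector == "S" and op == "<":
        return size < parse_size(arg)
    raise ValueError(f"unknown term: {word!r}")


def matches(words: list[str], name: str, size: int) -> bool:
    """Evaluate the SRCH argument words against one file."""
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        result = parse_and()
        while pos < len(words) and words[pos] == "|":
            pos += 1
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and() -> bool:
        nonlocal pos
        result = parse_atom()
        while pos < len(words) and words[pos] == "&":
            pos += 1
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom() -> bool:
        nonlocal pos
        if pos >= len(words):
            raise ValueError("unexpected end of search expression")
        word = words[pos]
        pos += 1
        if word == "(":
            result = parse_or()
            if pos >= len(words) or words[pos] != ")":
                raise ValueError("missing )")
            pos += 1
            return result
        return match_term(word, name, size)

    result = parse_or()
    if pos != len(words):
        raise ValueError("trailing words in search expression")
    return result
```

With the words of the example command above, a 500 MB movie.avi
matches, while an 800 MB movie.mpg or a 500 MB song.mp3 does not.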

Personally, I believe that the more advanced filtering should be left
to the client that receives the results, to relieve the search
algorithm a bit.

I would also _love_ to see regexp searching, but I guess the Windows
folks won't be too fond of that. If it would come to pass, I would use
a SRCH command like this instead, where all parameters are always
"and"ed (removing comparison groupings does relieve clients a lot, as
well):

SRCH N~(.mpg|.avi)$ S>350M S<700M
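
Without grouping, the matching side shrinks to a single loop. A
minimal sketch in Python, assuming that N~ takes a regexp applied
case-insensitively with search (not full-match) semantics, that
Python's re syntax is close enough to whatever regexp flavour would be
chosen, and that M means 2^20 bytes. The names matches_all and
parse_size are hypothetical.

```python
import re

UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}  # assumed multiples


def parse_size(text: str) -> int:
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def matches_all(words: list[str], name: str, size: int) -> bool:
    """True only if every SRCH word matches (all terms are "and"ed)."""
    for word in words:
        selector, op, arg = word[:1], word[1:2], word[2:]
        if selector == "N" and op == "~":
            ok = re.search(arg, name, re.IGNORECASE) is not None
        elif selector == "S" and op == ">":
            ok = size > parse_size(arg)
        elif selector == "S" and op == "<":
            ok = size < parse_size(arg)
        else:
            raise ValueError(f"unknown term: {word!r}")
        if not ok:
            return False
    return True
```

Note that the dots in the example are unescaped, so they match any
character, not only a literal ".".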

For those who aren't used to regexps, they also allow some pretty
interesting stuff. For example, take a very popular series like Ranma
1/2, which has lots of variants. If I want to search for episode 5 of
season 4 and want to avoid sizes below 100 MB (to get rid of the
dubs), I could use this to make sure I get the right one:

SRCH N~ranma[^0-9]*4[^0-9]+0?5[^0-9]*.avi$ S>100M
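
To show what that pattern accepts, here is a quick check with Python's
re module, assuming case-insensitive matching and made-up file names.
One thing it brings out: because [^0-9]* cannot skip digits, a name
that spells out "1-2" or "1/2" right after "Ranma" is rejected by the
pattern as written.

```python
import re

# The name pattern from the SRCH example, used unchanged.
PATTERN = re.compile(r"ranma[^0-9]*4[^0-9]+0?5[^0-9]*.avi$", re.IGNORECASE)

# Hypothetical file names, for illustration only.
names = [
    "Ranma - 4x05 - title.avi",   # season 4, episode 05
    "ranma_s4e5.avi",             # same episode, no leading zero
    "Ranma - 4x15 - title.avi",   # episode 15
    "Ranma - 3x05.avi",           # season 3
    "Ranma - 4x05.mpg",           # wrong extension
    "Ranma 1-2 - 4x05.avi",       # digits before the season number
]

hits = [n for n in names if PATTERN.search(n)]
```

Here hits ends up holding only the first two names.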

Then again, I guess those writing Windows clients won't love that (or
does Windows have regex parsing these days?), so I guess I'll have to
do it as I do it today, ie. let my client filter on that.

As for the chat example, it's really easy. Just let the first word be
the chat message, and the rest be destination nicks. (I don't think a
source nick should be specified; it should be registered with the
hub.) Then:

CHAT "Hey all, whaddaya think about this so called \"binary\" protocol
they're cooking up on dcdev?" "ASCII Lover" bin_hater Dolda\ 2000

;-)
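
On the sending side, producing such a line is just a matter of quoting
each word. A minimal sketch in Python, with hypothetical helper names
quote_word and build_chat. It assumes a backslash escapes both
backslashes and double quotes, and it falls back to double quotes
whenever a word is empty or contains a space, CR or LF.

```python
def quote_word(word: str) -> str:
    """Quote one word so that the receiver can recover it exactly."""
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    if word == "" or " " in word or "\r" in word or "\n" in word:
        return '"' + escaped + '"'       # whole-word double quoting
    return escaped                       # plain word, escapes only


def build_chat(message: str, recipients: list[str]) -> str:
    """Build a CHAT line: the message first, then the recipients."""
    words = ["CHAT", message, *recipients]
    return " ".join(quote_word(w) for w in words) + "\r\n"
```

For instance, build_chat('Hey "all"', ["ASCII Lover", "bin_hater"])
yields the line CHAT "Hey \"all\"" "ASCII Lover" bin_hater followed by
CRLF.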

Of course, UUIDs should be used rather than nicks, I just used
nicknames here for reference. As for actually sending to nicks rather
than UUIDs, I don't think that should be implemented anyway. Wasn't
one of the points of this new protocol to have unambiguous user IDs?
If you really want to be able to use nicks, it's no big problem
either, though. Just make the parser recognize the UUID syntax, or
prefix the recipients with either some character or another short
word. Also, if you want to optimize it further, you can make the chat
message the last word instead; that makes no difference to the
protocol.

That's my suggestion. Feel free to comment on it if you think
something's wrong/missing.

Fredrik Tolf

--
DC Developers mailinglist
http://3jane.ashpool.org/cgi-bin/mailman/listinfo/dcdev