Re: [dcdev] Text protocol draft
Fredrik Tolf
2003-12-04 2:03
Direct Connect developers <[email protected]>

eric writes:
> >  > <very serious mode>
> >  > Why not using something more powerful like expression that can be
> >  > processed by perl ( you are not very far with your N~ and S> ). Only few
> >  > things are missing like &&, || or parenthesis.
> >  > </very serious mode>
> >
> > I have nothing at all against using PCRE instead of POSIX RE, if
> > that's what you mean. And sure, I can most certainly agree to using a
> > full-sized expression syntax; a parser can be written in five minutes
> > using flex and bison anyway (and with a word based protocol, you
> > probably won't even need flex).
>
> or if it is exactly a perl expression returning TRUE or FALSE, you
> can let it parsed by an embedded perl.

While I admittedly like the idea on one level, I'm not sure that
everyone wants to embed Perl in their clients. Especially on Windows,
where not everyone has Perl to begin with, it could become somewhat
of a distribution problem.

> > Basically, you just need to define some basic criterias that can be
> > specified by the syntax that don't use too much CPU, like substring
> > matching (or, rather, regex matching), size comparison, hash
> > comparison.
>
> what about:
> N for name of the file
> S for size of the file
> H for hash of the file
> T for type of the file
>
> with this it is at least possible to build big query like
> ( N=~avi$ || N=~ogm$) && (T==video) && (S>450000000)

Just to clarify a small detail: I was thinking of letting the protocol
do the word splitting, which more or less eliminates the need for a
lexer in the expression scanner. So that query IMHO should look like
this:

( N =~ avi$ || N =~ ogm$ ) && ( T =~ ^video/ ) && ( S > 450000000 )

All you have to do then is classify the tokens before handing them to
the parser, which IMHO is better than doing the word splitting
twice. Also, since we probably won't need many operators, we can
probably shorten them to one character each (=~ becomes ~, || becomes
|, etc.). In any case, having a complex syntax isn't really much of a
problem, since it doesn't need to be parsed that often. The problem,
if anything, is writing a good, optimized search algorithm to
evaluate the parsed expressions. Oh, and if anyone doubts the
efficiency of regexes, please don't. They're really fast.
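
To make the "classify, then parse" idea concrete, here is a rough
sketch in Python. It is only an illustration under my own assumptions,
not a proposal for the actual protocol: the field letters and operators
are the ones from the example query, while the function names
(classify, parse), the token kinds and the dict used as a file record
are all made up for the sketch. query.split() merely stands in for the
word splitting that the protocol itself would do.

```python
import re

# Assumed for illustration: field letters and operators from the example query.
FIELDS = {"N", "S", "H", "T"}
OPERATORS = {"=~", "==", ">", "<"}

def classify(words):
    """Tag each already-split word; no character-level lexer is needed."""
    tokens = []
    for w in words:
        if w in ("(", ")", "&&", "||"):
            kind = w
        elif w in OPERATORS:
            kind = "OP"
        elif w in FIELDS:
            kind = "FIELD"
        else:
            kind = "VALUE"
        tokens.append((kind, w))
    return tokens

def both(a, b):
    return lambda f: a(f) and b(f)

def either(a, b):
    return lambda f: a(f) or b(f)

def parse(tokens):
    """Recursive descent parser returning a predicate over a file record.

    expr   := term ('||' term)*
    term   := factor ('&&' factor)*
    factor := '(' expr ')' | FIELD OP operand
    """
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(kind=None):
        nonlocal pos
        if pos >= len(tokens) or (kind is not None and tokens[pos][0] != kind):
            raise ValueError(f"unexpected or missing word at position {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        if peek() == "(":
            take("(")
            inner = expr()
            take(")")
            return inner
        field = take("FIELD")[1]
        op = take("OP")[1]
        value = take()[1]  # whatever word follows an operator is the operand
        if op == "=~":
            rx = re.compile(value)  # compiled once, when the query is parsed
            return lambda f: rx.search(str(f[field])) is not None
        if op == "==":
            return lambda f: str(f[field]) == value
        number = int(value)
        if op == ">":
            return lambda f: f[field] > number
        return lambda f: f[field] < number

    def term():
        left = factor()
        while peek() == "&&":
            take("&&")
            left = both(left, factor())
        return left

    def expr():
        left = term()
        while peek() == "||":
            take("||")
            left = either(left, term())
        return left

    result = expr()
    if pos != len(tokens):
        raise ValueError("trailing words after expression")
    return result

query = "( N =~ avi$ || N =~ ogm$ ) && ( T =~ ^video/ ) && ( S > 450000000 )"
matches = parse(classify(query.split()))
print(matches({"N": "film.avi", "S": 700000000, "H": "abc",
               "T": "video/x-msvideo"}))  # prints True
```

The expression is parsed once into a predicate, and each regex is
compiled at parse time, so the per-file cost is just the evaluation.
Whether that is fast enough for a large share list is exactly the
search algorithm question above; the sketch does not try to answer it.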

So, any more opinions on this, besides just mine and eric's?

Fredrik Tolf

DC Developers mailinglist