Todd Pederzani wrote:
Carl-Adam Brengesjö wrote:
Pattern can also have two meanings: either it is a regex (POSIX
or Perl, doesn't matter) or it is the exact (or wildcarded) name of the
file. Clients can decide, upon submitting, if they want to replace
spaces with a * to work the way searching currently does.
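A minimal sketch of the two pattern interpretations described above, in Python. The function names and the is_regex flag are hypothetical and not part of any proposed protocol; the standard fnmatch module stands in for wildcard matching, and Python's re stands in for whichever regex flavour ends up being chosen.

```python
import fnmatch
import re

def spaces_to_wildcard(query):
    """Turn 'foo bar' into '*foo*bar*' so anything may sit between the words."""
    words = query.split()
    return "*" + "*".join(words) + "*" if words else "*"

def matches(pattern, filename, is_regex=False):
    """Match a filename either as a regex or as an exact/wildcarded name."""
    if is_regex:
        return re.search(pattern, filename, re.IGNORECASE) is not None
    # fnmatchcase avoids OS-dependent case folding; lowercase both sides instead.
    return fnmatch.fnmatchcase(filename.lower(), pattern.lower())

print(matches(spaces_to_wildcard("foo bar"), "My Foo and Bar.rar"))
print(matches(r"foo\.r\d\d$", "foo.r01", is_regex=True))
```

Note that fnmatch also treats ? and [ ] as special characters, so a real client would have to decide whether those are part of the wildcard syntax or need escaping.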
[2004-01-15 14:51] <nysin> (Might someone suggest on dcdev-list that
these people try some sample queries in their regex syntax on a largeish
If any of you need it, I have several large filelists (in bz2 format)
that I passed along to cologic (nysin). One in particular is 493gb and
172,000 files. I think modeling a typical search load on this share
might quickly tell if regexp searching is feasible.
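A rough sketch of how such a test could be set up, in Python. The file names are synthetic stand-ins (172,000 of them, to mirror the share mentioned above), and the query is a made-up example rather than a typical search load, so the timings only hint at the relative cost of a plain comparison versus a regex.

```python
import re
import time

def make_filelist(n):
    """Build n synthetic file names (a stand-in for a real shared filelist)."""
    exts = ["rar", "mp3", "avi", "txt"]
    return ["dir%d/file_%d.%s" % (i % 100, i, exts[i % len(exts)]) for i in range(n)]

def time_search(predicate, files):
    """Return (number of hits, elapsed seconds) for one pass over the list."""
    start = time.perf_counter()
    hits = sum(1 for name in files if predicate(name))
    return hits, time.perf_counter() - start

files = make_filelist(172000)
regex = re.compile(r"file_\d+\.rar$")
plain_hits, plain_secs = time_search(lambda name: name.endswith(".rar"), files)
regex_hits, regex_secs = time_search(lambda name: regex.search(name) is not None, files)
print("plain: %d hits in %.3fs" % (plain_hits, plain_secs))
print("regex: %d hits in %.3fs" % (regex_hits, regex_secs))
```

To run this against a real list, make_filelist could be replaced by lines read with bz2.open(path, "rt"), assuming one path per line, which may not match the actual filelist format.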
A previous discussion on regexps also yielded concerns about... ahem...
resource intensive expressions. Nothing quite like being able to DOS
all the clients in a hub simply by sending a well crafted regexp search.
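To illustrate the concern, here is a small Python sketch of a "resource intensive" expression. The pattern is a textbook case of nested quantifiers and is my own example, not one taken from the earlier discussion; the behaviour shown applies to backtracking engines such as Python's re, and other engines may behave differently.

```python
import re
import time

def time_match(pattern, text):
    """Return (matched?, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    matched = re.match(pattern, text) is not None
    return matched, time.perf_counter() - start

# Nested quantifiers force a backtracking engine to try exponentially many
# ways of splitting the run of 'a's before it can report failure.
evil = r"(a+)+$"
for n in (14, 16, 18, 20):
    matched, secs = time_match(evil, "a" * n + "b")
    print("n=%d matched=%s time=%.4fs" % (n, matched, secs))
```

Each extra pair of characters should roughly quadruple the running time, so a short search string is enough to keep a client busy for a very long time.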
Hm, 439GB ... what is that guy sharing exactly? Piracy is a crime...
no, silly answer :)
Any form of more advanced query requires some time to execute (more
or less). So how much time does the suggested query type (Fredrik's,
wasn't it?) require?
I'm not trying to promote regex now, just wanna know.
I can run a test with various filelists. (In C#; I don't know C/C++
that well.) Results coming up in a few days (or hours, depending on how
little I plan to sleep tonight ;)
a part of a WinRAR packed file name "foo" with exact size 15000000 bytes
"SEARCH 126.96.36.199:412 03 text/* =15000000
Ah yes. A wonderful example of why searching by MIME types is not so
useful. What exactly is the proper type for RAR files?
> application/rar ?
> application/x-rar-compressed ?
> How about .MKV? OGM? OGG?
sorry about that one, it was a really bad example, I know. I was in a
hurry and didn't much bother to check what I had written :(
check this out:
A more 'readable' list of MIME types has been put together here:
This is specified in RFC2045 and RFC2046 (as stated in the latter link)
These lists ARE standard. Unfortunately, rar is not listed, so we
can decide on a type ourselves (if we want to), as long as the decision
conforms to the RFCs. Thus it can be, as Todd wrote,
application/x-rar-compressed (RFC2046, page 9).
I say (if we use MIME types) we stick strictly to the standard, meaning
if a client searches for "application/x-rar", well, too bad for him if he
doesn't get any results.
RFC2046 specifies that application/octet-stream should be used for
"uninterpreted binary data". And don't rar files come under that
If the user really wants to search for .rar files, well use
the filename pattern instead!
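A small Python sketch of what strict type matching with an octet-stream fallback might look like. The extension table is hypothetical (a real client would need an agreed-upon list), and rar is deliberately left out of it so that it falls through to application/octet-stream as suggested above.

```python
import fnmatch
import os

# Hypothetical extension table; anything not listed counts as binary data.
KNOWN_TYPES = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".mp3": "audio/mpeg",
}

def mime_type_for(filename):
    """Look up a type by extension, falling back to application/octet-stream."""
    ext = os.path.splitext(filename)[1].lower()
    return KNOWN_TYPES.get(ext, "application/octet-stream")

def type_matches(query_type, filename):
    """Strict comparison, except that a '*' subtype such as 'text/*' is allowed."""
    return fnmatch.fnmatchcase(mime_type_for(filename), query_type.lower())

print(type_matches("text/*", "readme.TXT"))
print(type_matches("application/x-rar", "foo.rar"))
print(fnmatch.fnmatchcase("foo.rar", "*.rar"))
```

With this table a search for "application/x-rar" finds nothing, while the filename pattern *.rar still finds the file.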