Re: [dcdev] Searching
Carl-Adam Brengesjö <[email protected]>
2004-01-16 3:39
Direct Connect developers

Todd Pederzani wrote:

Carl-Adam Brengesjö wrote:

Pattern can also have two meanings: either it is a regex (POSIX or Perl, doesn't matter) or it is the exact (or wildcarded) name of the file. Clients can decide, upon submitting, whether to replace spaces with a * so that it works the way searching currently does.
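
A minimal sketch of the two pattern meanings described above, in Python. The /.../ delimiter test, the function name make_matcher and the spaces_as_wildcards flag are assumptions made for illustration; nothing here is an agreed part of the protocol.

```python
import fnmatch
import re


def make_matcher(pattern, spaces_as_wildcards=False):
    """Return a function name -> bool for one search pattern.

    Assumption: a pattern written as /.../ is a regex; anything else is
    an exact or wildcarded file name.
    """
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        rx = re.compile(pattern[1:-1])
        return lambda name: rx.search(name) is not None

    if spaces_as_wildcards:
        # Client-side choice: turn every space into a '*' wildcard.
        pattern = pattern.replace(" ", "*")

    # fnmatch.translate converts a shell-style wildcard into an anchored regex.
    rx = re.compile(fnmatch.translate(pattern), re.IGNORECASE)
    return lambda name: rx.match(name) is not None


if __name__ == "__main__":
    print(make_matcher("foo bar.txt", spaces_as_wildcards=True)("foo_and_bar.txt"))
```

Without the flag, "foo bar.txt" only matches a file with exactly that name (ignoring case).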

[2004-01-15 14:51] <nysin> (Might someone suggest on dcdev-list that these people try some sample queries in their regex syntax on a largeish share?)

If any of you need it, I have several large filelists (in bz2 format) that I passed along to cologic (nysin).  One in particular is 493 GB and 172,000 files.  I think modeling a typical search load on this share might quickly tell us whether regexp searching is feasible.

A previous discussion on regexps also yielded concerns about... ahem... resource-intensive expressions.  Nothing quite like being able to DoS all the clients in a hub simply by sending a well-crafted regexp search.

Hm, 493GB ... what is that guy sharing exactly? Piracy is a crime...
no, silly answer :)

Any form of more advanced query requires some time to execute (more or less). So how much time does the suggested query type (Fredrik's, wasn't it?) require?
I'm not trying to promote regex now, just wanna know.

I can run a test with various filelists. (In C#; I don't know C/C++ that well.) Results coming up in a few days (or hours, depending on how little I plan to sleep tonight ;)
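
A rough sketch of what such a test could look like, in Python rather than C#. The file list is synthetic (172,000 made-up names, matching only the file count mentioned above), and the "current" search is assumed to be a case-insensitive substring match on every term; real filelists and real queries would be needed for meaningful numbers.

```python
import re
import time


def substring_search(names, terms):
    """Assumed current behaviour: every term must occur in the name."""
    terms = [t.lower() for t in terms]
    return [n for n in names if all(t in n.lower() for t in terms)]


def regex_search(names, pattern):
    """Regex search: the compiled pattern is tried against every name."""
    rx = re.compile(pattern)
    return [n for n in names if rx.search(n)]


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Synthetic stand-in for a real filelist: 17,200 releases x 10 parts.
names = [f"release{i:06d}.part{p:02d}.rar"
         for i in range(17200) for p in range(10)]

hits_sub, t_sub = timed(substring_search, names, ["release000042", "rar"])
hits_rx, t_rx = timed(regex_search, names,
                      r"^release000042\.part[0-9]{2,3}\.rar$")

print(f"substring: {len(hits_sub)} hits in {t_sub:.4f}s")
print(f"regex:     {len(hits_rx)} hits in {t_rx:.4f}s")
```

This only times one well-behaved expression; it says nothing about deliberately expensive ones.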

a part of a WinRAR-packed file named "foo" with an exact size of 15000000 bytes
"SEARCH 03 text/* =15000000 :/^foo\.part[0-9]{2,3}\.rar$/\r\n"

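To make the example line above concrete, here is a small Python sketch that splits it into fields and tests one file against it. The field layout is inferred from that single example only; the meaning of "03" is not stated here, so it is kept as an opaque token, and the names SearchQuery, parse_search and matches are hypothetical.

```python
import re
from typing import NamedTuple


class SearchQuery(NamedTuple):
    token: str    # "03" in the example; meaning not stated, kept opaque
    mime: str     # e.g. "text/*"
    size_op: str  # only "=" appears in the example
    size: int     # size in bytes
    pattern: str  # e.g. "/^foo\.part[0-9]{2,3}\.rar$/"


def parse_search(line):
    """Split 'SEARCH <token> <mime> <op><size> :<pattern>' into fields."""
    line = line.rstrip("\r\n")
    # The pattern may contain spaces, so split at most four times.
    command, token, mime, size_field, pattern_field = line.split(" ", 4)
    if command != "SEARCH" or not pattern_field.startswith(":"):
        raise ValueError("not a SEARCH line in the assumed format")
    return SearchQuery(token, mime, size_field[0],
                       int(size_field[1:]), pattern_field[1:])


def matches(query, name, size):
    """Check size and regex only; MIME matching is left out of this sketch."""
    if query.size_op == "=" and size != query.size:
        return False
    # Assumption: the pattern is a regex wrapped in /.../.
    return re.search(query.pattern[1:-1], name) is not None


if __name__ == "__main__":
    q = parse_search(
        "SEARCH 03 text/* =15000000 :/^foo\\.part[0-9]{2,3}\\.rar$/\r\n")
    print(q)
    print(matches(q, "foo.part01.rar", 15000000))
```
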
Ah yes.  A wonderful example of why searching by MIME types is not so useful.  What exactly is the proper type for RAR files?
> application/rar ?
> application/x-rar-compressed ?
> How about .MKV?  OGM?  OGG?

sorry about that one, it was a really bad example, I know. I was in a hurry and didn't much bother to check what I had written :(

check this out:

a more 'readable' list of MIME types has been put together here:

This is specified in RFC2045 and RFC2046 (as stated in the latter link)

These lists ARE standard. Unfortunately, rar is not listed, so we can decide on a type ourselves (if we want to), as long as the decision conforms to the RFCs. Thus it can be, as Todd wrote, application/x-rar-compressed (RFC2046, page 9).
I say (if we use MIME types) we stick strictly to the standard, meaning that if a client searches for "application/x-rar", well, too bad for him if he doesn't get any results.
RFC2046 specifies that application/octet-stream should be used for "uninterpreted binary data". And don't rar files come under that category? :) If the user really wants to search for .rar files, well, use the filename pattern instead!
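
A sketch of how a client might apply that rule, in Python. The extension table is a tiny hypothetical stand-in for a real list of registered types, and mime_for and type_matches are made-up names; anything not in the table falls back to application/octet-stream.

```python
import fnmatch
import os

# Hypothetical, deliberately tiny extension table for illustration.
KNOWN_TYPES = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".mp3": "audio/mpeg",
    ".jpg": "image/jpeg",
}


def mime_for(name):
    """Look up a type by extension; unknown files are uninterpreted binary."""
    ext = os.path.splitext(name)[1].lower()
    return KNOWN_TYPES.get(ext, "application/octet-stream")


def type_matches(query_type, name):
    """Match a query type such as 'text/*' against the file's type."""
    return fnmatch.fnmatchcase(mime_for(name), query_type.lower())


if __name__ == "__main__":
    print(mime_for("foo.part01.rar"))
    print(type_matches("application/x-rar", "foo.part01.rar"))
    print(type_matches("application/octet-stream", "foo.part01.rar"))
```

With this table a search for "application/x-rar" finds nothing, while "application/octet-stream" or "application/*" does match the rar parts.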

