Subject:
Re: [dcdev] adc
From:
Carl-Adam Brengesjö <[email protected]>
Date:
2004-01-22 7:37
To:
Direct Connect developers


As I demonstrated above, not without special support from the RE engine;
that 24-way or'd expression is going to be noticeably slower than the
semantically equivalent multiple substring search. The Perl5 method should
be similar in speed, but requires an NFA RE implementation exhibiting
unbounded-time behaviour and thus potentially vulnerable to the attacks
described above.


There is a small flaw in your demonstration. You have assumed that when a client searches for "a b c d", it means "a" & "b" & "c" & "d", but that is wrong. In terms of probability, a.*b.*c.*d is most likely what the user wants. For example, a user searching for an album won't bother to type the album title in reverse order. That's why, most of the time, "a b c d" means "a.*b.*c.*d" (when it doesn't mean exactly "a b c d"). Any user with enough experience of any search engine (even Google) knows not to use common words like "a", "the" or "is", so to find "heaven is a place on earth" he will search for "heaven place earth" (which is also shorter to type :) ).
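To make the difference between the two readings concrete, here is a rough Python sketch. The function names are made up for illustration, and case-insensitive matching is my own assumption; nothing here is part of any protocol draft.

```python
import re

def ordered_pattern(query):
    """Turn 'a b c d' into the regex a.*b.*c.*d (terms must appear in this order)."""
    terms = query.split()
    # re.escape keeps characters like '.' in a term from acting as regex syntax.
    return re.compile(".*".join(re.escape(t) for t in terms), re.IGNORECASE)

def matches_in_order(query, name):
    """Ordered reading: 'a b c d' means a.*b.*c.*d."""
    return ordered_pattern(query).search(name) is not None

def matches_all_terms(query, name):
    """Unordered reading: 'a' & 'b' & 'c' & 'd', each as a plain substring."""
    lowered = name.lower()
    return all(t.lower() in lowered for t in query.split())
```

With a file named "Heaven Is a Place on Earth.mp3", the query "heaven place earth" matches under both readings, while "earth place heaven" matches only under the unordered one.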

Moreover, both substring and regex searches can be supported. Why argue? The question is not whether to use regex or substring, but whether to support regex at all.
Like I said in my earlier mail, simply add a / (0x2F) on both sides of the regex. The reason? It's a magic character in filesystems: UNIX uses it as the path separator and Windows accepts it as one, so it can't appear in a filename pattern. If it does, the client simply escapes it before sending. And if you want to search for directories, say so through the filetype/category.
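A minimal sketch of how a client might tell the two kinds of search apart, assuming the slash-delimited form described above. The helper names and the backslash escaping rule are my own assumptions (the escape mechanism is not specified here), and wrap_regex expects a pattern whose slashes have not been escaped yet.

```python
import re

def wrap_regex(pattern):
    """Sender side: escape every '/' (assumed rule: backslash), then delimit with '/'."""
    return "/" + pattern.replace("/", "\\/") + "/"

def parse_search(term):
    """Receiver side: '/.../' marks a regex, anything else is a plain substring."""
    if len(term) >= 2 and term.startswith("/") and term.endswith("/"):
        return ("regex", term[1:-1])
    return ("substring", term)

def matches(term, name):
    """Apply a received search term to one file name."""
    kind, body = parse_search(term)
    if kind == "regex":
        return re.search(body, name) is not None
    return body.lower() in name.lower()
```

A basic user's term such as "heaven" goes through the substring branch untouched, while "/heaven.*earth/" is handed to the regex engine.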

Regex is good for advanced searches; substring is enough for the basic user. Both can be supported. Support it! :p

As for resource limitations (resources = bandwidth, CPU, memory, whatever), the client would simply refuse to perform the search if it gets too heavy, based on factors that are up to the client to decide and are not part of the actual protocol. And I believe everyone agrees that a client refusing to perform a search because of resource limits is fully acceptable.
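For what it's worth, such a refusal policy could be as simple as the following sketch. The class, its limits (pattern length and searches per minute) and their default values are purely hypothetical; each client would pick its own factors.

```python
import time

class SearchBudget:
    """Hypothetical client-side policy: refuse searches that look too expensive."""

    def __init__(self, max_pattern_len=64, max_per_minute=30, clock=time.monotonic):
        self.max_pattern_len = max_pattern_len
        self.max_per_minute = max_per_minute
        self.clock = clock      # injectable so the policy can be tested
        self.recent = []        # timestamps of recently accepted searches

    def allow(self, pattern):
        """Return True if the search should run, False if the client refuses it."""
        now = self.clock()
        # Forget accepted searches older than one minute.
        self.recent = [t for t in self.recent if now - t < 60.0]
        if len(pattern) > self.max_pattern_len:
            return False        # pattern too long to evaluate cheaply
        if len(self.recent) >= self.max_per_minute:
            return False        # too many searches handled recently
        self.recent.append(now)
        return True
```

The client calls allow() before running a search and silently drops (or declines) the request when it returns False.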

/Carl-Adam

-- 
DC Developers mailinglist
http://3jane.ashpool.org/cgi-bin/mailman/listinfo/dcdev