Subject: Re: [dcdev] file list, regexp, and mailing list
From: eric
Date: 2004-01-23 3:49
To: Direct Connect developers <[email protected]>, Fredrik Tolf <[email protected]>
I'm all for .xml.bz2. I don't even see a reason for a binary file
list. Even if it truly is smaller, the difference won't be more than a
few bytes once the XML is bzip2-compressed. So I agree with XML, since
it's much more widely accepted.
> About regexp library choice, I'd say the support for wide charsets
> should not only be considered, but required. Regex++ supports it, that's
> all I know for now.
Indeed, it should be that way. However, it's not usually a
problem. I'm not sure how Windows works in this area, but on *ix
systems, filenames are still stored as 8-bit byte strings, encoded
using the character set of the current locale. So when a regex comes
in over the protocol in UTF-8 and cannot be converted into a
multi-byte string in the current locale's charset, it can be treated
as an automatic non-match: if the regex contains characters that
aren't in the locale's charset, then no filenames can exist that
contain those characters anyway.
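
A rough sketch of that idea in Python, just to illustrate it. The
function names (compile_for_locale, search_names) and the explicit
charset parameter are made up for this example and are not part of any
DC client or of the protocol; a real client would presumably take the
charset from the current locale. It assumes the regex arrives as UTF-8
bytes and that filenames are raw bytes in the local charset.

```python
import re
from typing import Iterable, List, Optional, Pattern


def compile_for_locale(pattern_utf8: bytes, charset: str) -> Optional[Pattern[bytes]]:
    """Re-encode a UTF-8 regex into `charset` (hypothetical helper).

    Returns None when the pattern contains characters that the local
    charset cannot represent, meaning no local filename can match it.
    """
    try:
        local = pattern_utf8.decode("utf-8").encode(charset)
    except UnicodeError:
        # Invalid UTF-8, or a character that does not exist in `charset`.
        return None
    # A bytes pattern is matched against the raw 8-bit filenames.
    return re.compile(local)


def search_names(pattern_utf8: bytes, names: Iterable[bytes], charset: str) -> List[bytes]:
    """Return the filenames (raw bytes in `charset`) that the regex matches."""
    rx = compile_for_locale(pattern_utf8, charset)
    if rx is None:
        # Unconvertible regex: automatic non-match, no need to scan anything.
        return []
    return [name for name in names if rx.search(name)]
```

Note that this matches byte-wise, so in a multi-byte local charset
something like "." would match a single byte rather than a whole
character. That is the case where wide-character support in the regex
library would still matter.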