Re: [dcdev] file list, regexp, and mailing list
Fredrik Tolf
2004-01-23 3:36
Direct Connect developers

Opera writes:
> About the file list format: one can discuss the best compression ratios
> back and forth (I personally think .xml.bz2 would be perfectly fine and
> small enough, and prolly a lot smaller than EBML), but it doesn't really
> matter. The most important issue (partly since bz2 _is_ a good
> compressor) is usability, maintainability and broad support for
> the format. In that respect, nothing beats XML. Also, there are _several_
> libraries for both zlib and bzip2, and a h*ll of a lot of libraries for XML,
> for any operating system, in any programming language, and even for
> web things such as PHP etc...

I'm all for .xml.bz2. I don't even see a reason for a binary file
list. Even if it truly is smaller, the difference won't be more than a
few bytes once the XML is bzip2-compressed. Therefore, I agree with XML,
since it's much more widely accepted.
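For illustration, the .xml.bz2 approach needs nothing beyond stock libraries. The sketch below uses Python's standard `xml.etree.ElementTree` and `bz2` modules; the element and attribute names (`FileListing`, `File`, `Name`, `Size`) are hypothetical and not an agreed DC file list format.

```python
import bz2
import xml.etree.ElementTree as ET


def build_file_list(entries):
    """Serialize (name, size) pairs as UTF-8 XML.

    Hypothetical schema: <FileListing><File Name=".." Size=".."/>...</FileListing>
    """
    root = ET.Element("FileListing")
    for name, size in entries:
        ET.SubElement(root, "File", Name=name, Size=str(size))
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def compress_file_list(xml_bytes):
    # bzip2 does the heavy lifting; repetitive XML markup compresses well.
    return bz2.compress(xml_bytes)


def read_file_list(blob):
    # Decompress, parse, and turn the File elements back into (name, size) pairs.
    root = ET.fromstring(bz2.decompress(blob))
    return [(f.get("Name"), int(f.get("Size"))) for f in root.iter("File")]


if __name__ == "__main__":
    entries = [("Music/track%03d.mp3" % i, 4000000 + i) for i in range(500)]
    entries.append(("Musik/Björk.mp3", 123))  # non-ASCII names survive as UTF-8
    raw = build_file_list(entries)
    packed = compress_file_list(raw)
    print("xml bytes:", len(raw), "xml.bz2 bytes:", len(packed))
    assert read_file_list(packed) == entries
```

Any XML parser plus any bzip2 library can read the result, which is the portability argument made above.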

> About the choice of regexp library, I'd say support for wide charsets
> should not only be considered, but required. Regex++ supports it; that's
> all I know for now.

Indeed, it should be that way. However, it's not usually a
problem. I'm not sure how Windows works in this area, but on *ix
systems, filenames are still stored as 8-bit byte strings, encoded
using the character set of the current locale. Therefore, when a regex
arrives over the protocol encoded in UTF-8 and cannot be converted into
a multi-byte string in the current locale's charset, the search can
simply be treated as an automatic non-match: if the regex contains
characters that aren't in the locale's charset, then no filename
containing those characters can exist anyway.
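A minimal sketch of that shortcut, in Python for illustration. It assumes filenames arrive as raw bytes in the filesystem charset (defaulting to the locale's preferred encoding) and that the regex arrives as UTF-8 bytes; `make_filename_matcher` is a hypothetical helper, not part of any DC client. One caveat: the shortcut treats every unrepresentable character as mandatory, so a pattern like `foo|ü` or `[^ü]` that could still match is also rejected.

```python
import locale
import re


def _never_matches(raw_name):
    return False


def make_filename_matcher(pattern_utf8, fs_charset=None):
    """Return a predicate over raw (bytes) filenames for a UTF-8 regex."""
    if fs_charset is None:
        # Assumption: the filesystem uses the current locale's charset.
        fs_charset = locale.getpreferredencoding(False)
    try:
        pattern = pattern_utf8.decode("utf-8")
        # Only checks representability; matching is done on decoded text.
        pattern.encode(fs_charset)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Invalid UTF-8, or characters no local filename can contain:
        # treat the whole search as an automatic non-match.
        return _never_matches
    regex = re.compile(pattern)

    def matches(raw_name):
        try:
            name = raw_name.decode(fs_charset)
        except UnicodeDecodeError:
            return False  # filename not valid in the assumed charset
        return regex.search(name) is not None

    return matches
```

Matching on decoded text rather than on re-encoded bytes avoids byte-level surprises with multi-byte locale charsets.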

Fredrik Tolf

DC Developers mailinglist