Subject:
[dcdev] Re: [dddev] Searching
From:
Carl-Adam Brengesjö <[email protected]>
Date:
2004-01-16 9:43
To:
Direct Connect developers


I made a test of regex matching. The source code, file lists and binaries used are attached to this mail. If that doesn't work (I don't know whether attachments work on this mailing list), they can be downloaded from <http://ptha.mine.nu/~ptha/regextest.tar.bz2>. Be nice to the server though, it's hosted on my personal 0.5 Mbit home connection...

There is no CPU usage limit, so the process will climb to the top of the process list while it runs. If you want to implement one, please do and mail the results.
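
One crude way to add such a limit would be to pause briefly after every batch of lines. The sketch below is Python, not the .NET code from the attachment; the function name `throttled_count` and the default batch size and pause length are arbitrary assumptions, not measured values.

```python
import re
import time


def throttled_count(lines, pattern, batch_size=1000, pause_seconds=0.01):
    """Count lines matching the regex, sleeping after every batch_size lines."""
    regex = re.compile(pattern)
    matches = 0
    for index, line in enumerate(lines, start=1):
        if regex.search(line):
            matches += 1
        if index % batch_size == 0:
            # Hypothetical throttle: give other processes a turn on the CPU.
            time.sleep(pause_seconds)
    return matches
```

A larger pause or a smaller batch lowers the CPU load further, at the cost of a longer search.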

I decided to first read the file and store each line as an entry in memory, then loop over the entries in memory and match the regex against each one. I don't know whether clients do it that way, but if you want to read and match on the fly instead, the source is free to use.
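
The read-then-match approach described above could be sketched roughly like this. It is Python, not the actual RegexTest source, and the names `load_lines`, `count_matches` and `run_test` are made up for illustration. It only shows the two phases being timed separately; treating a `.bz2` suffix as bzip2-compressed input and matching case-sensitively are both assumptions.

```python
import bz2
import re
import time


def load_lines(path):
    """Phase 1: read every line of the file list into memory."""
    # Assumption: a .bz2 suffix means bzip2-compressed, anything else is plain text.
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\r\n") for line in handle]


def count_matches(lines, pattern, ignore_case=False):
    """Phase 2: loop over the in-memory lines and count regex matches."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return sum(1 for line in lines if regex.search(line))


def run_test(path, pattern):
    """Run both phases and report how long each one took."""
    start = time.perf_counter()
    lines = load_lines(path)
    read_seconds = time.perf_counter() - start

    start = time.perf_counter()
    matches = count_matches(lines, pattern)
    search_seconds = time.perf_counter() - start

    return len(lines), matches, read_seconds, search_seconds
```

Calling `run_test("small.txt", ".*microsoft.*")` would return the line count, the match count and the two timings, mirroring the fields printed in the logs below.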

The tests were made on a 2.18TB share (named huge), a 669.58GB share (large) and a 34.31GB share (small).

The machines used are
 *nix:
   Intel Pentium 2, 333MHz. 192 MB SDRAM. Slackware 9.1 (Linux 2.4.22)
 windows:
   Intel Celeron, 2GHz. 768MB DDR-RAM. Windows XP, SP1.


There is another .NET implementation for *nix, DotGNU, but I don't have it installed.

Anyway, now I have at least done some /real/ work on this, and now I'm (finally) going to bed! Judging the results is a job I leave to you.

This tool is really slow though, using .NET and all... but I don't know any other language as well (newb ;)


---- SMALL (*nix) ----
$ mono RegexTest.exe small.bz2 ".*microsoft.*"
      file: small.bz2
   pattern: .*microsoft.*
Begin decompression... (`bzip2 -dc "small.bz2"')OK!
Reading...OK! reading took 0.326057 seconds.
Beginning regex test of against lines in memory (854 lines to test)
Test completed. 0 matches where found.
The search took 0.202791 seconds!

---- SMALL (windows) ----
>RegexTest.exe small.txt ".*microsoft.*"
      file: small.txt
   pattern: .*microsoft.*
Reading... 854 lines read
OK! reading took 0,09375 seconds.
Beginning regex test of against lines in memory (854 lines to test)
Test completed. 0 matches where found.
The search took 0 seconds!

---- LARGE (*nix) ----
$ mono RegexTest.exe large.bz2 ".*microsoft.*"
      file: large.bz2
   pattern: .*microsoft.*
Begin decompression... (`bzip2 -dc "large.bz2"')OK!
Reading...OK! reading took 9.072878 seconds.
Beginning regex test of against lines in memory (28453 lines to test)
Test completed. 1 matches where found.
The search took 19.938845 seconds!

---- LARGE (windows) ----
>RegexTest.exe large.txt ".*microsoft.*"
      file: large.txt
   pattern: .*microsoft.*
Reading... 28453 lines read
OK! reading took 3,0625 seconds.
Beginning regex test of against lines in memory (28453 lines to test)
Test completed. 1 matches where found.
The search took 1,3125 seconds!

---- HUGE (*nix) ----
$ mono RegexTest.exe huge.bz2 ".*microsoft.*"
      file: huge.bz2
   pattern: .*microsoft.*
Begin decompression... (`bzip2 -dc "huge.bz2"')OK!
Reading...OK! reading took 36.705775 seconds.
Beginning regex test of against lines in memory (121028 lines to test)
Test completed. 76 matches where found.
The search took 45.15112 seconds!

---- HUGE (windows) ----
>RegexTest.exe huge.txt ".*microsoft.*"
      file: huge.txt
   pattern: .*microsoft.*
Reading... 121028 lines read
OK! reading took 13,25 seconds.
Beginning regex test of against lines in memory (121028 lines to test)
Test completed. 76 matches where found.
The search took 12,484375 seconds!

/Carl-Adam

PS. Reading the files/streams is _really_ slow... "Men orka!" (roughly "can't be bothered!") as we say in Swedish.

-- 
DC Developers mailinglist
http://3jane.ashpool.org/cgi-bin/mailman/listinfo/dcdev