Please have a look at this tutorial in case you’re interested in scriptutil usage examples.
mergesort_function now does not use the haskell list
lengthoperator any more. Instead, the length of the list to be sorted is passed down the recursive chain.
Please see the optimised mergesort implementation for details.
These improvements reduced the RAM utilisation and improved the run-time performance by another 20% respectively.
Nothing to scoff at, eh?
After my (naive) mergesort implementation from yesterday used around 730 MB of RAM to sort a (26 MB) file containing approx. 400,000 strings I consulted the good folks on the
#haskell IRC channel.
Their advice was to use byte strings as opposed to normal strings since the former perform much better.
I tried that and observed that the RAM utilisation and run-time went down by approximately 85% !
The reduced RAM utilisation was attributed to the more efficient byte strings and the improved run-time performance to the reduced garbage collection overhead respectively.
I have been reading about haskell for a while now and felt that it’s time to “get my hands dirty” by (re-)implementing some of the well known algorithms in computer science. For my first exercise I chose merge sort.
The resulting code (sans command line parameter handling, auxiliary I/O functions etc.) is quite neat and was written in little time. Haskell is very expressive and its attractiveness immediately obvious when formulating pseudo-mathematical problem solutions.
1 module Main(main) where 2 3 import qualified IO 4 import System(getArgs) 5 import Monad(mapM_) 6 import qualified Scaffolding 7 8 mergesort l = 9 if (length l) <= 1 then l -- The list is already sorted. 10 else -- Split the list into two halves and sort these. 11 merge (mergesort lpart) (mergesort rpart)  12 where llen_half = (length l) `div` 2 13 (lpart, rpart) = splitAt llen_half l 14 15 -- Case #1: the right list is empty; just append the left list to the 16 -- accumulator. The latter is reversed because we were prepending 17 -- to it in case #3. 18 merge lpart  acc = (reverse acc) ++ lpart 19 20 -- Case #2: the left list is empty; just append the right list to the 21 -- (reversed) accumulator list. 22 merge  rpart acc = (reverse acc) ++ rpart 23 24 -- Case #3: neither of the left/right lists to be merged is empty; 25 -- prepend the lesser head element to the accumulator list. 26 merge (l:ls) (r:rs) acc = 27 if l <= r then merge ls (r:rs) (l:acc) 28 else merge (l:ls) rs (r:acc) 29 30 -- The main method, handles command line arguments, input and output. 31 main = do 32 args <- getArgs 33 fileh <- case head args of 34 -- The text to be sorted is to be read from stdin. 35 "-" -> do return (Just IO.stdin) 36 -- The text to be sorted is to be read from the file 37 -- specified. 38 "-f" -> do Scaffolding.openFile (head (tail args)) 39 -- Just sort the command line parameters. 40 _ -> do let sorted_args = mergesort args 41 IO.putStrLn ("mergesort(" ++ show args ++ 42 ") = " ++ show sorted_args) 43 return Nothing 44 case fileh of 45 -- Read text from file handle, sort it and print to stdout. 46 Just fileh -> do text_to_sort <- Scaffolding.readLines fileh  47 mapM_ IO.putStrLn (mergesort text_to_sort) 48 -- We failed to open the file specified (if any). 49 Nothing -> do return ()
I also liked the fact that the haskell standard library is rich and nicely documented.
Just to get an idea of how my naive
mergesort implementation performs I created a file listing all the files on my Dell D630 laptop running Hardy Heron (please ignore the recursive nature of this problem 🙂 ).
1 u804: haskell $ ghc --version 2 The Glorious Glasgow Haskell Compilation System, version 6.8.2 3 u804: haskell $ ls -l all-files.txt 4 -rw-r--r-- 1 mhr mhr 26245382 2008-05-31 17:18 all-files.txt 5 u804: haskell $ wc -l all-files.txt 6 399754 all-files.txt 7 u804: haskell $ uname -a 8 Linux u804 2.6.24-18-generic #1 SMP Wed May 28 20:27:26 UTC 2008 i686 GNU/Linux
I then let loose the compiled haskell binary on that file (
all-files.txt) and compared how it performed against
1 u804: haskell $ time ./mergesort -f all-files.txt >/dev/null 2 3 real 0m12.994s 4 user 0m12.349s 5 sys 0m0.632s 6 u804: haskell $ time sort all-files.txt >/dev/null 7 8 real 0m10.675s 9 user 0m10.613s 10 sys 0m0.052s 11 u804: haskell $ time sort all-files.txt >/dev/null
The difference in performance was moderate (22%) given that I did not tune the haskell implementation in any way.
However, I gasped when I observed the RAM utilisation while the haskell program was running: it used 726 MB of RAM!
Compare this to 34 MB that were used by the standard
I guess it’s time to get acquainted with
ghc -prof and friends 🙂
Last but not least I’d like to point to a great haskell resource, the Real World Haskell beta book.