I have been reading about haskell for a while now and felt that it’s time to “get my hands dirty” by (re-)implementing some of the well known algorithms in computer science. For my first exercise I chose merge sort.
The resulting code (sans command line parameter handling, auxiliary I/O functions etc.) is quite neat and was written in little time. Haskell is very expressive and its attractiveness immediately obvious when formulating pseudo-mathematical problem solutions.
1 module Main(main) where
2
3 import qualified IO
4 import System(getArgs)
5 import Monad(mapM_)
6 import qualified Scaffolding
7
8 mergesort l =
9 if (length l) <= 1 then l -- The list is already sorted.
10 else -- Split the list into two halves and sort these.
11 merge (mergesort lpart) (mergesort rpart) []
12 where llen_half = (length l) `div` 2
13 (lpart, rpart) = splitAt llen_half l
14
15 -- Case #1: the right list is empty; just append the left list to the
16 -- accumulator. The latter is reversed because we were prepending
17 -- to it in case #3.
18 merge lpart [] acc = (reverse acc) ++ lpart
19
20 -- Case #2: the left list is empty; just append the right list to the
21 -- (reversed) accumulator list.
22 merge [] rpart acc = (reverse acc) ++ rpart
23
24 -- Case #3: neither of the left/right lists to be merged is empty;
25 -- prepend the lesser head element to the accumulator list.
26 merge (l:ls) (r:rs) acc =
27 if l <= r then merge ls (r:rs) (l:acc)
28 else merge (l:ls) rs (r:acc)
29
30 -- The main method, handles command line arguments, input and output.
31 main = do
32 args <- getArgs
33 fileh <- case head args of
34 -- The text to be sorted is to be read from stdin.
35 "-" -> do return (Just IO.stdin)
36 -- The text to be sorted is to be read from the file
37 -- specified.
38 "-f" -> do Scaffolding.openFile (head (tail args))
39 -- Just sort the command line parameters.
40 _ -> do let sorted_args = mergesort args
41 IO.putStrLn ("mergesort(" ++ show args ++
42 ") = " ++ show sorted_args)
43 return Nothing
44 case fileh of
45 -- Read text from file handle, sort it and print to stdout.
46 Just fileh -> do text_to_sort <- Scaffolding.readLines fileh []
47 mapM_ IO.putStrLn (mergesort text_to_sort)
48 -- We failed to open the file specified (if any).
49 Nothing -> do return ()
I also liked the fact that the haskell standard library is rich and nicely documented.
Just to get an idea of how my naive mergesort implementation performs I created a file listing all the files on my Dell D630 laptop running Hardy Heron (please ignore the recursive nature of this problem
).
1 u804: haskell $ ghc --version
2 The Glorious Glasgow Haskell Compilation System, version 6.8.2
3 u804: haskell $ ls -l all-files.txt
4 -rw-r--r-- 1 mhr mhr 26245382 2008-05-31 17:18 all-files.txt
5 u804: haskell $ wc -l all-files.txt
6 399754 all-files.txt
7 u804: haskell $ uname -a
8 Linux u804 2.6.24-18-generic #1 SMP Wed May 28 20:27:26 UTC 2008 i686 GNU/Linux
I then let loose the compiled haskell binary on that file (all-files.txt) and compared how it performed against /usr/bin/sort.
1 u804: haskell $ time ./mergesort -f all-files.txt >/dev/null
2
3 real 0m12.994s
4 user 0m12.349s
5 sys 0m0.632s
6 u804: haskell $ time sort all-files.txt >/dev/null
7
8 real 0m10.675s
9 user 0m10.613s
10 sys 0m0.052s
11 u804: haskell $ time sort all-files.txt >/dev/null
The difference in performance was moderate (22%) given that I did not tune the haskell implementation in any way.
However, I gasped when I observed the RAM utilisation while the haskell program was running: it used 726 MB of RAM!
Compare this to 34 MB that were used by the standard sort program.
I guess it’s time to get acquainted with ghc -prof and friends
Last but not least I’d like to point to a great haskell resource, the Real World Haskell beta book.