Finger exercises in haskell

I have been reading about haskell for a while now and felt that it’s time to “get my hands dirty” by (re-)implementing some of the well known algorithms in computer science. For my first exercise I chose merge sort.

The resulting code (sans command line parameter handling, auxiliary I/O functions etc.) is quite neat and was written in little time. Haskell is very expressive and its attractiveness immediately obvious when formulating pseudo-mathematical problem solutions.

  1 module Main(main) where
  2 
  3 import qualified IO
  4 import System(getArgs)
  5 import Monad(mapM_)
  6 import qualified Scaffolding
  7 
  8 mergesort l =
  9     if (length l) <= 1 then l -- The list is already sorted.
 10     else -- Split the list into two halves and sort these.
 11         merge (mergesort lpart) (mergesort rpart) []
 12         where llen_half = (length l) `div` 2
 13               (lpart, rpart) = splitAt llen_half l
 14 
 15 -- Case #1: the right list is empty; just append the left list to the
 16 -- accumulator. The latter is reversed because we were prepending
 17 -- to it in case #3.
 18 merge lpart [] acc = (reverse acc) ++ lpart
 19 
 20 -- Case #2: the left list is empty; just append the right list to the
 21 -- (reversed) accumulator list.
 22 merge [] rpart acc = (reverse acc) ++ rpart
 23 
 24 -- Case #3: neither of the left/right lists to be merged is empty;
 25 -- prepend the lesser head element to the accumulator list.
 26 merge (l:ls) (r:rs) acc =
 27     if l <= r then merge ls (r:rs) (l:acc)
 28     else merge (l:ls) rs (r:acc)
 29 
 30 -- The main method, handles command line arguments, input and output.
 31 main = do
 32     args <- getArgs
 33     fileh <- case head args of
 34                 -- The text to be sorted is to be read from stdin.
 35                 "-" -> do return (Just IO.stdin)
 36                 -- The text to be sorted is to be read from the file
 37                 -- specified.
 38                 "-f" -> do Scaffolding.openFile (head (tail args))
 39                 -- Just sort the command line parameters.
 40                 _ -> do let sorted_args = mergesort args
 41                         IO.putStrLn ("mergesort(" ++ show args ++
 42                                      ") = " ++ show sorted_args)
 43                         return Nothing
 44     case fileh of
 45         -- Read text from file handle, sort it and print to stdout.
 46         Just fileh -> do text_to_sort <- Scaffolding.readLines fileh []
 47                          mapM_ IO.putStrLn (mergesort text_to_sort)
 48         -- We failed to open the file specified (if any).
 49         Nothing -> do return ()

I also liked the fact that the haskell standard library is rich and nicely documented.

Just to get an idea of how my naive mergesort implementation performs I created a file listing all the files on my Dell D630 laptop running Hardy Heron (please ignore the recursive nature of this problem :-) ).

 1 u804: haskell $ ghc --version
 2 The Glorious Glasgow Haskell Compilation System, version 6.8.2
 3 u804: haskell $ ls -l all-files.txt
 4 -rw-r--r-- 1 mhr mhr 26245382 2008-05-31 17:18 all-files.txt
 5 u804: haskell $ wc -l all-files.txt
 6 399754 all-files.txt
 7 u804: haskell $ uname -a
 8 Linux u804 2.6.24-18-generic #1 SMP Wed May 28 20:27:26 UTC 2008 i686 GNU/Linux

I then let loose the compiled haskell binary on that file (all-files.txt) and compared how it performed against /usr/bin/sort.

  1 u804: haskell $ time ./mergesort -f all-files.txt >/dev/null
  2 
  3 real    0m12.994s
  4 user    0m12.349s
  5 sys 0m0.632s
  6 u804: haskell $ time sort all-files.txt >/dev/null
  7 
  8 real    0m10.675s
  9 user    0m10.613s
 10 sys 0m0.052s
 11 u804: haskell $ time sort all-files.txt >/dev/null

The difference in performance was moderate (22%) given that I did not tune the haskell implementation in any way.

However, I gasped when I observed the RAM utilisation while the haskell program was running: it used 726 MB of RAM!

Compare this to 34 MB that were used by the standard sort program.

I guess it’s time to get acquainted with ghc -prof and friends :-)

Last but not least I’d like to point to a great haskell resource, the Real World Haskell beta book.

About these ads

8 thoughts on “Finger exercises in haskell

  1. After an interesting discussion with the nice folks on #haskell I was able to reduce the run time and the RAM usage by 84% .. I will be posting a follow-up article shortly. Stay tuned!

  2. I am puzzled as to why your merge function includes an accumulating parameter, it’s easier and faster without.

    Furthemore, partitioning the list “top-down” via and (length l) `div` 2 and splitAt traverses the list two times, one for length and one for splitAt. It’s possible to do that “bottum-up” with only one traversal per recursion level.

    See Re: Quicksearch VS Laziness for an example that incorporates both.

  3. Hello everyone…

    Have seen both implementations, actually they are both very nice, but there is something that I quite not understand in the apfelmus implementation… what should be this `return` function you are using in this piece of code:

    mergesort [] = []
    mergesort xs = foldtree1 merge $ map return xs

    When I try to use it in a normal function (just testing) it get me a wrong use, because return is reserved for Monads (See http://pastie.org/212186 for the whole interaction)

    Best Regards.

  4. Hello Román,

    I have to admit that I didn’t fully understand the apfelmus code either. However, in my most recent mergesort version (http://hrnjad.net/src/s/optimized-mergesort.hs.html) I addressed both of his criticisms:

    1 – the merge() function does not use an accumulator argument any more
    2 – the mergesort() function now does not use (length list) any more. Instead the length of the list is passed down the recursive chain.

    These improvements lead to another 20% reduction in RAM utilisation and run-time performance improvement respectively.

  5. Hey Roman-

    The return in

    mergesort xs = foldtree1 merge $ map return xs

    creates a one element list for each element in the initial list. It is being used “monadically” here, see

    http://www.haskell.org/all_about_monads/html/listmonad.html

    for details. The reason you got an error when you tried to use it in ghci is that the interpreter had no context to figure out which monad the result of return should be in. In the above line there is enough context for the compiler to figure out that the result of return should be in the list monad (i.e. a list).

  6. Ah, sorry about the return. It’s the same as the “smiley function” :[] or the less cryptical \x->[x], it simply creates a list of one element.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s