My new favourite book

I was attending the UDS in Mountain View last week and it proved to be one of those fascinating albeit somewhat exhausting events (please see either of these resources for UDS reports and commentary).

Anyway, while being there I managed to get my hands on a paper copy of Real World Haskell. Being busy with the UDS I only started reading it on the plane back to Frankfurt. Despite being very, very tired I enjoyed it thoroughly and have to say it’s one of the best technical books I have ever perused.

All successful (technical) projects have a vibrant community and great documentation. This book makes Haskell so accessible, it may well be the last bit that’s needed for a great break-through for Haskell and functional programming in general!

Whether you are interested in functional programming or merely seeking to broaden your horizon, don’t delay, go out and grab a copy of this book. You won’t be disappointed.


Wrap-up: mergesort in haskell

I have to admit that I didn’t fully understand the apfelmus example code. Nevertheless, I made an effort to address both of his criticisms:

  1. The merge() function does not use an accumulator argument any more and is indeed much simpler now.
  2. The recursive mergesort_ function now does not use the haskell list length operator any more. Instead, the length of the list to be sorted is passed down the recursive chain.

Please see the optimised mergesort implementation for details.

These improvements reduced the RAM utilisation and improved the run-time performance by another 20% respectively.

Nothing to scoff at, eh?

Haskell byte strings to the rescue!

After my (naive) mergesort implementation from yesterday used around 730 MB of RAM to sort a (26 MB) file containing approx. 400,000 strings I consulted the good folks on the #haskell IRC channel.

Their advice was to use byte strings as opposed to normal strings since the former perform much better.

I tried that and observed that the RAM utilisation and run-time went down by approximately 85% !

The reduced RAM utilisation was attributed to the more efficient byte strings and the improved run-time performance to the reduced garbage collection overhead respectively.

The difference between the source files (mergesort.hs, Scaffolding.hs) is minimal and switching over to byte strings was facilitated by the fact that they expose the same interface as normal strings.


Laziness is a virtue (in programming)


One of the things I really appreciate about Python is that it allows me to explore different programming styles. When I started with Python (coming from a C/C++/Java background) I would mostly use procedural style constructs.

These days, however, I am usually inclining towards a more functional pogramming style.

One of the key books that swayed me towards the latter was Mark Pilgrim’s “Dive Into Python”, one of the best (technical) books I have read so far. If you haven’t come across it yet, you should definitely take a look at it.

Another Python gem you may want to check out while experimenting with functional programming ideas is the itertools module.


In the brief section of code that follows below I am contrasting the procedural versus the functional pogramming style through an example in which I am interested in obtaining the first in a (potentially large) range of numbers whose 8th root is an integer.

  1 from math import (pow, modf)
  2 from itertools import dropwhile
  4 def calculate(x):
  5     """calculates the 8th root of the number 'x'"""
  6     return(pow(x, 0.125))
  8 def wanted(v):
  9     """integer check: returns 'True' if fractional part of a number is 0"""
 10     return(not modf(v)[0])
 12 def unwanted(v):
 13     """returns 'False' if fractional part of a number is > 0"""
 14     return(modf(v)[0])

The lines 16-25 show a procedural style function using a loop and related control flow constructs to get the job done.

 16 def proceduralStyle(f, t):
 17     """
 18     finds the first in the range of numbers [f..t[ whose 8th root is an
 19     integer; procedural style
 20     """
 21     for x in xrange(f, t):
 22         v = calculate(x)
 23         if wanted(v): break
 24     else: v = None
 25     return(v)

The functional style solution (see lines 27-34 below) uses the dropwhile() method (from the itertools module) to get rid of any unwanted initial values (line 32).

Please note also how a generator expression (as opposed to normal list comprehension) is used (on line 32) to achieve lazy evaluation behaviour.

The syntactical differences are minute (parentheses versus square brackets) but the effects are huge.

A normal list comprehension will construct the entire list (and perform any calculations needed in the process) before completing. A generator expression on the other hand is comparable to an iterator, values are returned in piecemeal fashion and calculations are performed only as needed.

 27 def functionalStyle(f, t):
 28     """
 29     finds the first in the range of numbers [f..t[ whose 8th root is an
 30     integer; functional style
 31     """
 32     r8iter = dropwhile(unwanted, (calculate(x) for x in xrange(f, t)))
 33     try: return(
 34     except StopIteration: return(None)

Finally, by experimenting with the source code we can see that lazy evaluation is in effect for the functionalStyle() function. It returns straightaway with the result anticipated (256 is the first number in the range whose 8th root is an integer (2)).

  1 bbox33:published $ python
  2 Python 2.4.4 (#1, May 22 2007, 13:30:14)
  3 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
  4 Type "help", "copyright", "credits" or "license" for more information.
  5  >>> import evaltest2
  6  >>> from time import asctime as t
  7  >>> print t(); evaltest2.functionalStyle(2, 50000000); print t()
  8 Sun May 27 10:33:18 2007
  9 2.0
 10 Sun May 27 10:33:18 2007
 11  >>> print t(); evaltest2.proceduralStyle(2, 50000000); print t()
 12 Sun May 27 10:33:30 2007
 13 2.0
 14 Sun May 27 10:33:30 2007