Handy bash function for merge conflicts

June 26, 2009 by muharem

I have been merging packages from debian unstable to ubuntu karmic lately and every so often I need to look at a merge conflict.

Sometimes it’s not so obvious why a conflict occurred at all. Here’s an example conflict from the libsepol package:

  1 <<<<<<< libsepol-2.0.32-1ubuntu1 (ubuntu)
  3     /* Get the scope info for this boolean to see if this is the declaration,
  4      * if so set the state */
  5     scope = hashtab_search(state->cur->policy->p_bools_scope.table, id);
  6     if (!scope)
  7         return SEPOL_ERR;
  8     if (scope->scope == SCOPE_DECL)
  9         base_bool->state = booldatum->state;
 10 
 11 =======
 13     /* Get the scope info for this boolean to see if this is the declaration,
 14      * if so set the state */
 15     scope = hashtab_search(state->cur->policy->p_bools_scope.table, id);
 16     if (!scope)
 17         return SEPOL_ERR;
 18     if (scope->scope == SCOPE_DECL)
 19         base_bool->state = booldatum->state;
 20 
 21 >>>>>>> libsepol-2.0.36-1 (debian)

In order to avoid staring at the screen until my eyes start to bleed :) I came up with the following bash function:

  1 function dsec() {
  2     xclip -selection clipboard -o > /tmp/patch-seg-to-diff.txt
  3     rm -f /tmp/ubuntu-snippet.txt /tmp/debian-snippet.txt
  4     ed /tmp/patch-seg-to-diff.txt > /dev/null 2>&1 <<-EOUBUNTU
  5         1 d
  6         /^===/,$ d
  7         wq /tmp/ubuntu-snippet.txt
  8 EOUBUNTU
  9     ed /tmp/patch-seg-to-diff.txt > /dev/null 2>&1 <<-EODEBIAN
 10         1,/^===/d
 11         $ d
 12         wq /tmp/debian-snippet.txt
 13 EODEBIAN
 14     echo 'diff -Nru /tmp/ubuntu-snippet.txt /tmp/debian-snippet.txt | colordiff | less'
 15     diff -Nru /tmp/ubuntu-snippet.txt /tmp/debian-snippet.txt | colordiff | less
 16 }

I can now use my favourite editor to copy the conflicting code to the clipboard. When invoked on the command line the dsec function reads the clipboard content and chops it apart (using the ed utility) into a debian and an ubuntu snippet respectively. Subsequently it invokes the diff utility on the two snippets showing me how they differ.

 1 --- /tmp/ubuntu-snippet.txt 2009-06-26 12:42:04.000000000 +0200
 2 +++ /tmp/debian-snippet.txt 2009-06-26 12:42:04.000000000 +0200
 3 @@ -1,5 +1,5 @@
 4     /* Get the scope info for this boolean to see if this is the declaration,
 5 -    * if so set the state */
 6 +    * if so set the state */
 7     scope = hashtab_search(state->cur->policy->p_bools_scope.table, id);
 8     if (!scope)
 9         return SEPOL_ERR;

Hmm .. this looks like a white space issue. Since the dsec function printed the actual diff command to the terminal, I can copy and paste that to the command line and add the -b parameter so that the diff utility ignores changes in the amount of white space and ..

 1 $ diff -Nrub /tmp/ubuntu-snippet.txt /tmp/debian-snippet.txt
 2 $ echo $?
 3 0

.. voila! the diff output this time around is empty.

In order for this to work the utilities used must be installed, so e.g. on a ubuntu system run:

sudo apt-get install xclip colordiff

And .. don't forget to have fun :)

My new favourite book

December 18, 2008 by muharem

I was attending the UDS in Mountain View last week and it proved to be one of those fascinating albeit somewhat exhausting events (please see either of these resources for UDS reports and commentary).

Anyway, while being there I managed to get my hands on a paper copy of Real World Haskell. Being busy with the UDS I only started reading it on the plane back to Frankfurt. Despite being very, very tired I enjoyed it thoroughly and have to say it’s one of the best technical books I have ever perused.

All successful (technical) projects have a vibrant community and great documentation. This book makes Haskell so accessible, it may well be the last bit that’s needed for a great break-through for Haskell and functional programming in general!

Whether you are interested in functional programming or merely seeking to broaden your horizon, don’t delay, go out and grab a copy of this book. You won’t be disappointed.

This *is* much better indeed :)

December 4, 2008 by muharem

My moaning about Python(2) was apparently “snow of yesterday”. Python3 just came out featuring extended iterable unpacking.

  1 Python 3.0rc1+ (py3k, Oct 28 2008, 09:22:29)
  2 [GCC 4.3.2] on linux2
  3 Type "help", "copyright", "credits" or "license" for more information.
  4  >>> def f(a, b, c):
  5  ...   print('%s, %s, %s' % (a, b, c))
  6 ...
  7  >>> f(1,2,3)
  8 1, 2, 3
  9  >>> f(1,*(2,3))
 10 1, 2, 3
 11  >>> f(*(1,2),3)
 12   File "<stdin>", line 1
 13 SyntaxError: only named arguments may follow *expression
 14  >>> f(*(1,2),c=3)
 15 1, 2, 3

Hmm, not ideal but it does the job :-)

Why is consistency so difficult to achieve?

December 4, 2008 by muharem
  1 Python 2.5.2 (r252:60911, Oct  5 2008, 19:29:17)
  2 [GCC 4.3.2] on linux2
  3 Type "help", "copyright", "credits" or "license" for more information.
  4 
  5  >>> def f(*args):
  6  ...   print ' '.join([str(a) for a in args])
  7 ...
  8  >>> f(1,2,3)
  9 1 2 3
 10  >>> f(1,*(2,3))
 11 1 2 3
 12  >>> f(*(1,2),3)
 13   File "<stdin>", line 1
 14     f(*(1,2),3)
 15              ^
 16 SyntaxError: invalid syntax

Sigh.

Minor scriptutil enhancements

June 16, 2008 by muharem

I have cleaned up the documentation for the scriptutil module which is available on the web now. If you happen to run ubuntu you can also install it as a package straight from my PPA.

Please have a look at this tutorial in case you’re interested in scriptutil usage examples.

Enjoy!

Wrap-up: mergesort in haskell

June 10, 2008 by muharem

I have to admit that I didn’t fully understand the apfelmus example code. Nevertheless, I made an effort to address both of his criticisms:

  1. The merge() function does not use an accumulator argument any more and is indeed much simpler now.
  2. The recursive mergesort_ function now does not use the haskell list length operator any more. Instead, the length of the list to be sorted is passed down the recursive chain.

Please see the optimised mergesort implementation for details.

These improvements reduced the RAM utilisation and improved the run-time performance by another 20% respectively.

Nothing to scoff at, eh?

Haskell byte strings to the rescue!

June 6, 2008 by muharem

After my (naive) mergesort implementation from yesterday used around 730 MB of RAM to sort a (26 MB) file containing approx. 400,000 strings I consulted the good folks on the #haskell IRC channel.

Their advice was to use byte strings as opposed to normal strings since the former perform much better.

I tried that and observed that the RAM utilisation and run-time went down by approximately 85% !

The reduced RAM utilisation was attributed to the more efficient byte strings and the improved run-time performance to the reduced garbage collection overhead respectively.

The difference between the source files (mergesort.hs, Scaffolding.hs) is minimal and switching over to byte strings was facilitated by the fact that they expose the same interface as normal strings.

Nice!

Finger exercises in haskell

June 5, 2008 by muharem

I have been reading about haskell for a while now and felt that it’s time to “get my hands dirty” by (re-)implementing some of the well known algorithms in computer science. For my first exercise I chose merge sort.

The resulting code (sans command line parameter handling, auxiliary I/O functions etc.) is quite neat and was written in little time. Haskell is very expressive and its attractiveness immediately obvious when formulating pseudo-mathematical problem solutions.

  1 module Main(main) where
  2 
  3 import qualified IO
  4 import System(getArgs)
  5 import Monad(mapM_)
  6 import qualified Scaffolding
  7 
  8 mergesort l =
  9     if (length l) <= 1 then l -- The list is already sorted.
 10     else -- Split the list into two halves and sort these.
 11         merge (mergesort lpart) (mergesort rpart) []
 12         where llen_half = (length l) `div` 2
 13               (lpart, rpart) = splitAt llen_half l
 14 
 15 -- Case #1: the right list is empty; just append the left list to the
 16 -- accumulator. The latter is reversed because we were prepending
 17 -- to it in case #3.
 18 merge lpart [] acc = (reverse acc) ++ lpart
 19 
 20 -- Case #2: the left list is empty; just append the right list to the
 21 -- (reversed) accumulator list.
 22 merge [] rpart acc = (reverse acc) ++ rpart
 23 
 24 -- Case #3: neither of the left/right lists to be merged is empty;
 25 -- prepend the lesser head element to the accumulator list.
 26 merge (l:ls) (r:rs) acc =
 27     if l <= r then merge ls (r:rs) (l:acc)
 28     else merge (l:ls) rs (r:acc)
 29 
 30 -- The main method, handles command line arguments, input and output.
 31 main = do
 32     args <- getArgs
 33     fileh <- case head args of
 34                 -- The text to be sorted is to be read from stdin.
 35                 "-" -> do return (Just IO.stdin)
 36                 -- The text to be sorted is to be read from the file
 37                 -- specified.
 38                 "-f" -> do Scaffolding.openFile (head (tail args))
 39                 -- Just sort the command line parameters.
 40                 _ -> do let sorted_args = mergesort args
 41                         IO.putStrLn ("mergesort(" ++ show args ++
 42                                      ") = " ++ show sorted_args)
 43                         return Nothing
 44     case fileh of
 45         -- Read text from file handle, sort it and print to stdout.
 46         Just fileh -> do text_to_sort <- Scaffolding.readLines fileh []
 47                          mapM_ IO.putStrLn (mergesort text_to_sort)
 48         -- We failed to open the file specified (if any).
 49         Nothing -> do return ()

I also liked the fact that the haskell standard library is rich and nicely documented.

Just to get an idea of how my naive mergesort implementation performs I created a file listing all the files on my Dell D630 laptop running Hardy Heron (please ignore the recursive nature of this problem :-) ).

 1 u804: haskell $ ghc --version
 2 The Glorious Glasgow Haskell Compilation System, version 6.8.2
 3 u804: haskell $ ls -l all-files.txt
 4 -rw-r--r-- 1 mhr mhr 26245382 2008-05-31 17:18 all-files.txt
 5 u804: haskell $ wc -l all-files.txt
 6 399754 all-files.txt
 7 u804: haskell $ uname -a
 8 Linux u804 2.6.24-18-generic #1 SMP Wed May 28 20:27:26 UTC 2008 i686 GNU/Linux

I then let loose the compiled haskell binary on that file (all-files.txt) and compared how it performed against /usr/bin/sort.

  1 u804: haskell $ time ./mergesort -f all-files.txt >/dev/null
  2 
  3 real    0m12.994s
  4 user    0m12.349s
  5 sys 0m0.632s
  6 u804: haskell $ time sort all-files.txt >/dev/null
  7 
  8 real    0m10.675s
  9 user    0m10.613s
 10 sys 0m0.052s
 11 u804: haskell $ time sort all-files.txt >/dev/null

The difference in performance was moderate (22%) given that I did not tune the haskell implementation in any way.

However, I gasped when I observed the RAM utilisation while the haskell program was running: it used 726 MB of RAM!

Compare this to 34 MB that were used by the standard sort program.

I guess it’s time to get acquainted with ghc -prof and friends :-)

Last but not least I’d like to point to a great haskell resource, the Real World Haskell beta book.

Text filtering with erlang

April 30, 2008 by muharem

Introduction

After a long break I picked up the Erlang book again and my appetite for writing some erlang code was soon kindled.

A small Python component I produced at work seemed like a good candidate for my (sequential) erlang exercises. It is a fairly simple component that removes user/password data embedded in URLs.

Just so you know where I am coming from:

  • my main/favourite programming language is Python
  • my exercises are mainly about sequential, non-distributed and non-telecoms-related problems whereas erlang’s main strength and appeal lies in the area of parallel/distributed telecoms/networking systems
  • I have played with erlang a little bit before (ring benchmark, XML parsing) and liked it in general although IMHO it lacks severely when it comes to the availability and quality of standard library components.

Now that my particular set of preconceptions is clear and in the open, let’s look at the stuff below :-)

File processing with erlang’s regexp module

The initial implementation of the URL filter in erlang used its regexp library.

  1 -module(regex).
  2 -export([main/1]).
  3 
  4 isalphanum(C) when C > 47, C < 58; C > 64, C < 91; C > 96, C < 123 -> true;
  5 isalphanum(_) -> false.
  6 
  7 %% Generate a temporary file name of length N
  8 genname(0, L) -> L;
  9 genname(N, L) ->
 10     R = random:uniform(123),
 11     case isalphanum(R) of
 12         true -> genname(N-1, [R|L]);
 13         false -> genname(N, L)
 14     end.
 15 
 16 %% Returns a randomly generated temporary file path where the basename is
 17 %% of length N
 18 mktemppath(Prefix, N) -> Prefix ++ "/" ++ genname(N, []).
 19 

Please note how I had to implement functionality absent from the standard library above.

 20 %% Removes passwords embedded in URLs from a log file.
 21 scrub_file(Tmpdir, F) ->
 22     %% make a temporary directory if it does not exist yet.
 23     case file:make_dir(Tmpdir) of
 24         ok -> ok;
 25         {error,eexist} -> ok;
 26         _ -> exit({error, failed_to_make_tmpdir})
 27     end,
 28 
 29     %% Move the original file out of the way.
 30     T = mktemppath(Tmpdir, 16),
 31     case file:rename(F, T) of
 32         ok -> ok;
 33         _ -> exit({error, failed_to_move_file})
 34     end,
 35 
 36     %% Now open it for reading.
 37     {_, In} = file:open([T], read),
 38     %% Open the original path for writing.
 39     {_, Out} = file:open([F], write),
 40 
 41     %% Call the function that will scrub the lines.
 42     scrub_lines(In, Out),
 43 
 44     %% Close the file handles and return the path to the original file.
 45     file:close(Out),
 46     file:close(In),
 47     T.
 48 

The code that scrubs the URLs is below, the scrub_lines() function is tail recursive.

 49 %% This is where the log file is actually read linewise and where
 50 %% the scrubbing function is invoked for lines that contain URLs.
 51 scrub_lines(In, Out) ->
 52     L = io:get_line(In, ''),
 53     case L of
 54         eof -> ok;
 55         _ ->
 56             %% Does the line contain URLs?
 57             case string:str(L, "://") of
 58                 0 -> io:format(Out, "~s", [L]);
 59                 _ ->
 60                     case regexp:gsub(L, "://[^:]+:[^@]+@", "://") of
 61                         {ok, S, _} -> io:format(Out, "~s", [S]);
 62                         {R, S, _} -> io:format("Failed: {~p,~p}", [R,S])
 63                     end
 64             end,
 65             %% Continue with next line.
 66             scrub_lines(In, Out)
 67     end.
 68 
 69 %% Main function.
 70 main([A]) ->
 71     {A1,A2,A3} = now(),
 72     random:seed(A1, A2, A3),
 73 
 74     %% A single argument (the name of the file to be scrubbed) is expected.
 75     F = atom_to_list(A),
 76     T = scrub_file("tmp", F),
 77 
 78     %% The scrubbed file content will be written to a new file that's
 79     %% in the place of the original file. Where was the latter moved to?
 80     io:format("~s~n", [T]),
 81 
 82     init:stop().

The cursory benchmarks performed (on log files of varying size) using the python and the erlang code confirmed other people’s experiences with erlang’s regex performance (but see also this interesting “rebuttal”).

Log file size Python times Erlang times
1 MB 0m0.230s 0m1.896s
10 MB 0m1.510s 0m8.766s
100 MB 0m14.793s 1m17.662s
1 GB 2m55.012s 13m54.588s

The do-it-yourself construction

Curious to learn whether the performance can be improved by abstaining from regular expressions I came up with an alternative implementation that does not use regexp.

As you can see below the do-it-yourself construction is indeed performing slightly better at the expense of being very specialized and requiring 60% more code.

Log file size Python times Erlang regexp Erlang do-it-yourself
1 MB 0m0.230s 0m1.896s 0m1.969s
10 MB 0m1.510s 0m8.766s 0m8.459s
100 MB 0m14.793s 1m17.662s 1m12.448s
1 GB 2m55.012s 13m54.588s 13m3.360s

In conclusion

Every couple of months or so I develop a euphoria towards erlang which is consistently dampened by using the language to tackle problems for which the language admittedly was not designed in first place.

I guess most people start using a language for simple programming exercises first as opposed to building something like a Jabber/XMPP instant messaging server straightaway.

I hate to repeat myself but improving the standard library (by adding common functionality and making sure it performs decently) would do a lot to attract fresh talent to the erlang community and I hear that a certain rate of influx of “fresh blood” is a necessary prerequisite for success.

Ah, and no, you were not supposed to grok the sentence above unless you read it three times :-)

My new “baby laptop”

January 26, 2008 by muharem

I am using a MacBook Pro for roughly 10 months now and I am generally very happy with it. Nevertheless, I like experimenting and playing with other operating systems, primarily with various linux distributions and with members of the *BSD family.

Running Ubuntu or FreeBSD in a virtual machine (using VMWare’s fusion or Parallels desktop) is a good way to get a first impression of such a system but at some point you will want real hardware. I reached that point with Ubuntu when I took an interest in hard disk encryption (why you should encrypt your hard disk).

It just so happened that around that time Aldi (a German retailer) had a nice 12 inch subnotebook on offer. Aldi was targeting the female clientele (the notebook was even adorned with rhinestones etc.) which resulted in a 100 Euro price markup. The system was still a very good value for the money, however, and I even managed to get my hands on the black model :-)

I booted the system up (it had Windows Vista Home Premium pre-installed) in order to take a backup of the system in case I had to restore the original state (prior to sending in it for service for example).

Vista was atrocious, totally crippling the poor thing. I mean, it was horrible, even the simplest interactions or commands took a ridiculous amount of time to complete.
Anyway, I managed to take a snapshot of the hard disk, wiped it clean and installed Ubuntu-7.10.

After running Vista on the box for a few hours, Ubuntu was a relief :-) it felt very snappy and was generally performing very well. I could do all my development work on the notebook without any problems.

I thought that was very cool :-) I finally had a second box, a “baby laptop”, I could use for my experiments. Having two laptops may sound a little bit excessive unless you tamper frequently with all sorts of operating systems. You can keep one machine in a “stable” condition and do all the proper (money generating) work on it while totally rebuilding the other.

I am a slightly paranoid person :-) and like to have as much security on my machines as I can get. I hence wanted to install Ubuntu with full hard disk encryption but more on that topic in a forthcoming article..