Shifting gears

October 15, 2012

After almost two years with the OpenQuake project, I will be joining Rackspace as a technical cloud advocate on 1 November 2012.

This is novel and exciting in many ways, as I will have the opportunity to pursue long-standing interests and passions (cloud computing, scalable and robust IT architectures, open source, strategic thinking, reaching out to technical audiences, etc.) as part of my *day* job.

I am looking forward to working with the good folks at Rackspace, the cloud community at large and anybody interested in putting cloud technology to good use!

See you around!

First experiments with golang

August 1, 2011

I finally found some time to look at the Go programming language (aka golang). To get a feel for it, I picked a random Google Code Jam problem and programmed it in Go.

The code used in the experiments that follow is pretty simple.

first impressions

My first impressions were mostly positive: Go has

  • decent documentation covering the language proper as well as the standard library
  • a fast compiler resulting in short edit-compile-test cycles
  • a nice standard library and a wealth of packages (provided by the community)
  • a lively and friendly mailing list and irc channel

The language has quite a “direct” feel to it: I could get to work and be productive almost immediately.
This is in stark contrast to other languages I have tried to learn recently, e.g. Scala (back in January): it required a lot of reading, and even a couple of days in I was not really productive in it.

Go is quite the opposite: the barrier to entry is low and the language is clean and simple. The combined declaration and initialisation operator (':=') alone is a godsend.
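For readers new to Go, here is a tiny illustrative sketch (the function name describe is made up) contrasting the long-form declaration with the ':=' shorthand, where the type is inferred from the initial value:

```go
package main

import "fmt"

// describe contrasts a long-form variable declaration with the := shorthand.
func describe() string {
	var name string = "gopher" // explicit declaration with type and initial value
	count := 3                 // declaration plus initialisation; type inferred as int
	return fmt.Sprintf("%s x %d", name, count)
}

func main() {
	fmt.Println(describe()) // prints "gopher x 3"
}
```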

Coming from a Python background, the main thing I missed was a REPL. Who knows, maybe there is one out there and I just have not found it yet?

playing with goroutines

One of the most attractive golang features is its support for concurrent programming via goroutines, and I wanted to play with these.

The chosen problem came with an input file of 50 calculations. I used it to create inputs with 50, 100 and 200 *thousand* calculations. All calculations are independent of each other, i.e. ideally parallelisable.

Go, still being a fairly young language, does not parallelise code by default: if CPU parallelism is desired, one must tell the run-time how many goroutines may execute simultaneously.

The code I wrote starts each calculation in a separate goroutine and allows the user to specify the number of CPUs/cores that should be used to execute the program.
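The original program is not reproduced in this post, so the following is only a sketch of the pattern described: one goroutine per calculation, results sent back over a channel, and a command-line flag for the number of cores. The flag name -cpus, the result type and the solve function (which merely sums its inputs instead of solving the actual Code Jam problem) are all hypothetical:

```go
package main

import (
	"flag"
	"fmt"
	"runtime"
)

// result pairs a case index with its answer so output can be re-ordered.
type result struct {
	index  int
	answer int
}

// solve is a hypothetical stand-in for the real calculation.
func solve(input []int) int {
	sum := 0
	for _, v := range input {
		sum += v
	}
	return sum
}

// runAll starts one goroutine per case and collects the answers in input
// order. bufSize is the capacity of the result channel (0 = unbuffered).
func runAll(cases [][]int, bufSize int) []int {
	results := make(chan result, bufSize)
	for i, c := range cases {
		go func(i int, c []int) {
			results <- result{i, solve(c)}
		}(i, c)
	}
	answers := make([]int, len(cases))
	for range cases {
		r := <-results
		answers[r.index] = r.answer
	}
	return answers
}

func main() {
	cpus := flag.Int("cpus", runtime.NumCPU(), "number of CPU cores to use")
	flag.Parse()
	runtime.GOMAXPROCS(*cpus)

	cases := [][]int{{1, 2, 3}, {4, 5}, {6}}
	for i, a := range runAll(cases, 0) {
		fmt.Printf("Case #%d: %d\n", i+1, a)
	}
}
```

Running it with e.g. "go run main.go -cpus 8" prints "Case #1: 6", "Case #2: 9" and "Case #3: 6". Collecting results by index keeps the output in input order even though the goroutines finish in arbitrary order.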

Using a bash script, I ran the resulting program varying both the number of calculations and the number of CPU cores.

These experiments were conducted on a 32-core server (Quad-Core AMD Opteron Processor 8356) with 64GB of RAM running Ubuntu 11.04 server. I ran each configuration three times in a row and used the average duration in the graph below.

Apparently the golang run-time was not able to utilise more than 8 cores when running this particular program.

Graph: running times for 50, 100 and 200 thousand calculations on 1 through 16 CPU cores

As can be seen from the graph above, running the program on more than 8 cores did not reduce its running time any further.

The 200K calculations input file is a bit over half a gigabyte, so I suspected that the program was dominated by I/O and that the goroutines could not execute because the result channel was full.

That led me to experiment with different result channel sizes. The resulting running times (for 200K calculations) can be seen in the graph below.

Graph: running times for 200K calculations on 1 through 16 CPU cores with varying result channel sizes

However, varying the result channel size did not seem to have much of an effect.
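The reasoning behind the channel size experiment is that a send on a channel blocks once its buffer is full (or immediately, for an unbuffered channel with no receiver waiting). The following sketch makes that visible; sendsBeforeBlocking is a hypothetical helper that uses select with a default case so it can report how many sends would have succeeded instead of actually blocking:

```go
package main

import "fmt"

// sendsBeforeBlocking tries up to n sends on ch with no receiver and
// returns how many succeeded before a send would have blocked.
func sendsBeforeBlocking(ch chan int, n int) int {
	for i := 0; i < n; i++ {
		select {
		case ch <- i:
			// The buffer still had room.
		default:
			// The buffer is full: a plain send would block here.
			return i
		}
	}
	return n
}

func main() {
	for _, size := range []int{0, 1, 100} {
		ch := make(chan int, size) // size 0 gives an unbuffered channel
		fmt.Printf("buffer %d: %d sends before blocking\n", size, sendsBeforeBlocking(ch, 1000))
	}
}
```

A larger buffer only lets producers run further ahead of the consumer; if the consumer (here, the I/O-bound output loop) is the bottleneck, the buffer eventually fills regardless of its size.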

Anyway, I am pretty happy with the code at this point, but suggestions are always welcome, particularly those aimed at improving the degree of parallelism :-)
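One commonly used alternative to starting one goroutine per calculation is a fixed-size worker pool fed through a jobs channel. The sketch below is not benchmarked against the numbers above and is not claimed to be faster; it only illustrates the pattern. The job type and solve function (which merely sums its inputs) are hypothetical placeholders:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// job is one independent calculation (hypothetical payload).
type job struct {
	index int
	input []int
}

// solve is a hypothetical stand-in for the real calculation.
func solve(input []int) int {
	sum := 0
	for _, v := range input {
		sum += v
	}
	return sum
}

// runPool processes cases with a fixed number of worker goroutines and
// returns the answers in input order.
func runPool(cases [][]int, workers int) []int {
	if workers < 1 {
		workers = 1 // at least one worker, otherwise sending would deadlock
	}
	jobs := make(chan job)
	answers := make([]int, len(cases))
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				// Each index is written by exactly one worker, so no lock is needed.
				answers[j.index] = solve(j.input)
			}
		}()
	}
	for i, c := range cases {
		jobs <- job{i, c}
	}
	close(jobs) // lets the workers' range loops terminate
	wg.Wait()
	return answers
}

func main() {
	cases := [][]int{{1, 2, 3}, {4, 5}, {6}}
	fmt.Println(runPool(cases, runtime.GOMAXPROCS(0))) // prints [6 9 6]
}
```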

conclusions

I am amazed at how far I got after investing approximately 10 hours in learning Go and programming in it.

Having used Python almost exclusively for the last five years, I am pretty spoiled when it comes to code conciseness and productivity.
Go is not too far off, though, and programming in it was fun and enjoyable.

I will definitely continue to explore it. Maybe you should give it a whirl as well :-)

EuroPython talk info

June 21, 2011

The slides from the EuroPython talk (Python & AMQP) I gave this morning are here. I’ll post a link to the video when it becomes available.

There are two things I wanted to mention but did not get to:

  1. python-celery: if you are looking to partition and distribute computations, do take a look at it. We are using it in the OpenQuake project and are very happy with it.
  2. RabbitMQ in Action: if you are using RabbitMQ or plan to, get this book. I started reading it recently and have gotten a lot of value from it.

How to dual-boot a stable/experimental system with minimal breakage

April 30, 2011

For a couple of years now, I have been using a scheme that allows me to dual-boot a stable system (for work) and an experimental system (for fun) with minimal breakage.
Recent reports of people who upgraded their Linux machines and ended up with a broken system prompted me to share it.

The idea is to divide your hard disk into at least 7 partitions:

    root/usr system 1            12GB
    root/usr system 2            12GB
    /var partition system 1       6GB
    /var partition system 2       6GB
    shared /tmp partition         4GB
    shared swap partition         2GB
    shared home partition       100GB

Just in case you are wondering about the small partition sizes: I am using a 160 GB SSD. It was the best hardware investment in a long time and really makes a difference.
If you are using e.g. a 320/500GB hard disk, feel free to double the partition sizes (and/or triple the size of the home partition).

When installing a new Linux system, only two partitions dedicated to that particular installation are needed:

  • a root/usr partition
  • a /var partition

All the others (/tmp, swap, and /home) are shared. This works particularly well when the two installed systems are reasonably similar (e.g. Ubuntu 10.10 and 11.04). The set-up described above lets you do a full, proper installation of the desired system instead of an upgrade.

Please note that backing up data you cannot afford to lose is a standard procedure before you tinker with your system (e.g. prior to OS installations and/or upgrades).

Sometimes the experimental system is so unstable that I use another technique: a chroot/schroot combination.

There was, for example, a period during which an installed Ubuntu 11.04 was “unusable” (for me), but I needed to run it for a number of reasons.
I resorted to running Ubuntu 10.10 as my main work system and having an 11.04 chroot. Entering the latter via the schroot utility made for a fairly seamless experience.

I hope this helps :-)

SSDs are the way to go!

March 12, 2011

I bought an Intel X25-M SSD last week and it does make a *big* difference! It is faster, produces less noise and heat, and the battery lasts longer.

I am using it with a Lenovo ThinkPad T410 laptop running Ubuntu 10.10 and it’s just great!

For what it’s worth, I am running a pretty recent kernel in conjunction with the Ubuntu Maverick userland. I am not sure how well the stock 2.6.35 kernel supports SSDs.

Anyway, SSDs are the way to go :-)

What is the best way to reset a file in a git topic branch?

March 9, 2011

Sometimes, when reviewing topic branches, I like to reset a file (to whatever it was in the master branch) and play around with it.

I figured out how to do that (see below) but it’s a bit clunky. Please take a look and comment if you know of a better way.

Here goes the example: first a repository is initialised and a file is added to it.

$ mkdir -p gitreset

$ cd gitreset/

$ git init .
Initialized empty Git repository in /home/muharem/tmp/gitreset/.git/

$ cat > a
This is file a, rev. 1
^D

$ git add a

$ git commit -a -m "initial commit"
[master (root-commit) 3e74747] initial commit
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 a

Next, a topic branch is created and the file is modified in it.

$ git checkout -b topic-branch master 
Switched to a new branch 'topic-branch'

$ cat > a
This is file a, rev. 2
^D

$ git diff
diff --git a/a b/a
index 595c3aa..7eb0dca 100644
--- a/a
+++ b/a
@@ -1 +1 @@
-This is file a, rev. 1
+This is file a, rev. 2

$ git commit -a -m "change to file a"
[topic-branch 300108e] change to file a
 1 files changed, 1 insertions(+), 1 deletions(-)

Now I would like to reset the file to whatever it was in the master branch.

$ git reset master a
Unstaged changes after reset:
M	a

$ cat a
This is file a, rev. 2

$ git diff
diff --git a/a b/a
index 595c3aa..7eb0dca 100644
--- a/a
+++ b/a
@@ -1 +1 @@
-This is file a, rev. 1
+This is file a, rev. 2

$ git diff --staged
diff --git a/a b/a
index 7eb0dca..595c3aa 100644
--- a/a
+++ b/a
@@ -1 +1 @@
-This is file a, rev. 2
+This is file a, rev. 1

It appears the file was reset, but only in the staging area (the index): the working tree still contains rev. 2. To get rev. 1 into the working tree I need to do additional work.

$ git diff --staged | patch -p1
patching file a

$ cat a
This is file a, rev. 1

Is there a way to have the changes resulting from git reset in the working tree straightaway?

OpenQuake is hiring in Zürich and in Pavia

March 3, 2011

The OpenQuake project is looking to hire two Python developers (one in Zürich/Switzerland, the other in Pavia/Italy).

We are a global and public project, do our development in accordance with agile principles and all our code is open.

Please see [1] and [2] below for more details on what we do.

In case you are interested, please send me an email (muharem SPAM-SUCKS linux.com) with your date of availability, your CV as well as some (python) code samples.

[1] http://www.globalquakemodel.org
[2] http://www.openquake.org

The world has changed

February 20, 2011

Whether for better or worse is left as an exercise for the reader, but it has definitely changed.

The other day I was whining vaguely about the breakage of my new Lenovo ThinkPad T410. A day later I got a response, and a useful one at that!

Think about it! When did we ever have this before? Random people from a different continent taking note of one’s utterances and sharing their knowledge?

Just in case you missed it, the world has changed. And in this particular case I quite like it :-)

Vim mappings for the win

September 27, 2010

I mostly work in source code hierarchies where, for a given source file X.source, the file with the unit tests is located at tests/test_X.source, and more often than not I need to edit the unit tests after having opened the actual source file.

Being the geek that I am I *obviously* need to come up with some sort of optimisation or shortcut even if it takes 10x as long as stupidly typing in ":e tests/test_X.source" all the time :-)

Thankfully, the solution in vim turns out to be quite straightforward. The following mapping (conveniently added to your $HOME/.vimrc file) will open the unit test file when you type %%:

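nnoremap %% :e <C-R>=fnameescape(expand("%:h")."/tests/test_".expand("%:t"))<CR><CR>

Please note that <C-R>= lets you type an expression whose result is inserted into the command line: the first <CR> (the Enter key) ends the expression and the second one executes the :e command. fnameescape() takes care of spaces and other special characters in the path.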

I guess what I should really do is write a configurable vim plugin that opens arbitrary files/locations based on the current buffer/location. Oh well, so much to do and so little time :P

Tool of the year!

June 12, 2010

I have been suffering from the multiple clipboards “feature” under X Window/GNOME for quite a while and finally found a solution: parcellite.

Install it, run it, right-click on the icon, select “preferences” from the menu, check “Use Primary (Selection)” and “Synchronize clipboards” in the “Behavior” tab.

.. and enjoy life again :-)
