Erlang vs. Stackless Python: a first benchmark

Change of Heart

Mon Aug 17 17:51:58 CEST 2009

PLEASE NOTE:

I have had more time to study Erlang since writing the article below and
have come to realize that the conclusions I drew are skewed.

Erlang

  - is a language and a high-quality runtime system optimized for
    concurrent and distributed execution. The examples below do not
    really exercise any of these strengths.
  - has proven itself in quite a number of high-profile projects and
    deployments (ejabberd, CouchDB and RabbitMQ, to mention just a few).
  - has a strong, vibrant and competent community that's driving the
    Erlang system and improving it continuously.

For all these reasons Stackless Python -- although an interesting
piece of technology in its own right -- is certainly no match for
Erlang.

Please see http://pseudogreen.org/blog/erlang_vs_stackless_vs_multitask.html
for what appears to be a more even-handed comparison.

Introduction

I obtained a copy of Joe Armstrong’s Erlang book recently. After reading through the first half, I have found it to be a very commendable work, making a strong case for message passing concurrency, a different approach to building large-scale, highly concurrent systems.

After finishing chapter 8, I came across the following exercise:

Write a ring benchmark. Create N processes in a ring. Send a message round the ring M times so that a total of N*M messages get sent. Time how long this takes [..]

Write a similar program in some other programming language you are familiar with. Compare the results. Write a blog, and publish the results on the internet!

I liked it; it looked like a great opportunity to get my feet wet with Erlang and message passing concurrency :-)

So, here we are, that’s precisely what this article is about.

The “other programming language” is of course Python, in this particular case Stackless Python featuring tasklets and channels to counter Erlang’s concurrency machinery.

The source code

Both the Erlang and the Stackless Python code set up a ring of N processes so that each process knows its successor. The resulting “ring” has the shape of an open necklace:

  • there is a first and a last bead in the chain
  • each message
    • is sent to the first bead
    • is relayed by beads number [1 .. N-1] to their immediate neighbour
    • stops at bead number N

The Erlang program

This is my very first Erlang program, so please bear with me. If there’s anything that could be done better, feel free to comment on it (in a polite way).

  1 % sets up a ring of N processes, a message is passed M times around the ring.
  2 -module(oringb).
  3 -export([run_benchmark/2, loop/2, mloop/1, main/1]).
  4 % builds a ring of N processes and sends a message around the ring M times.
  5 run_benchmark(N, M) ->
  6     io:format(">> Erlang R11B-5 here (N=~w, M=~w)!~n~n", [N, M]),
  7     % spawn the other N-1 processes needed for the ring
  8     LastP = lists:foldl(fun(S, Pid) -> spawn(?MODULE, loop, [N-S, Pid]) end,
  9                         self(), lists:seq(1, N-1)),
 10     spawn(fun() -> [ LastP ! R || R <- lists:reverse(lists:seq(0, M-1)) ] end),
 11     mloop(N).

The function above sets up the ring. The lists:foldl construct is equivalent to Python’s built-in reduce() function.
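To make the parallel concrete, the fold can be mimicked in a few lines of modern Python (functools.reduce; note that `spawn_link` below is a purely hypothetical stand-in for Erlang's `spawn(?MODULE, loop, [N-S, Pid])`, not real Stackless API):

```python
from functools import reduce  # reduce() was still a built-in in Python 2.5

def spawn_link(pid, s, n):
    # Pretend to spawn a process with sequence number n - s whose
    # successor is pid; return an identifier for the new "process".
    return ("proc", n - s, pid)

n = 4
# Thread the accumulator (the previously spawned pid) through 1..n-1,
# creating one link of the chain per step -- just like lists:foldl.
last_p = reduce(lambda pid, s: spawn_link(pid, s, n), range(1, n), "self")
print(last_p)  # ('proc', 1, ('proc', 2, ('proc', 3, 'self')))
```

The nesting of the result mirrors how each spawned process holds a reference to its successor, with the original process (`self`) innermost.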

The process spawned on line 10 sends the M messages around the ring. The messages are just integers taken from the [M-1 .. 0] sequence.

The main process acts as process number N and invokes mloop(), thus entering the appropriate receive loop on line 11.

 12 % receive loop executed by the first N-1 processes in the ring
 13 %   - 'S' is the sequence number allowing us to keep track of a
 14 %     process' position in the ring
 15 %   - 'NextP' is the next process in the ring
 16 %   - 'R' is the round/message number, zero indicates that the processes
 17 %      should terminate;
 18 loop(S, NextP) ->
 19     receive
 20         % the message number is above zero => forward the message to the next
 21         % process in the ring
 22         R when R > 0 -> NextP ! R,
 23             io:format(": Proc: ~8w, Seq#: ~w, Msg#: ~w ..~n", [self(), S, R]),
 24             loop(S, NextP);
 25         % the message number is zero => forward message and terminate
 26         R when R =:= 0 -> NextP ! R,
 27             io:format("* Proc: ~8w, Seq#: ~w, Msg#: terminate!~n", [self(), S]);
 28         % error: the message number is below zero => raise exception
 29         R when R < 0 ->
 30             erlang:error({"internal error", "invalid message number"})
 31     end.

The loop() function is executed by the first N-1 processes in the ring. These know their position in the ring (S) as well as their immediate neighbour (NextP) to which they forward all messages received.

The messages (R) are just integers. Values above zero keep the ring going; a zero value (the last message) results in the termination of the ring.

 32 % receive loop executed by the last (Nth) process;
 33 % it won't forward any messages
 34 mloop(S) ->
 35     receive
 36         R when R > 0 ->
 37             io:format("> Proc: ~8w, Seq#: ~w, Msg#: ~w ..~n", [self(), S, R]),
 38             mloop(S);
 39         0 ->
 40             io:format("@ Proc: ~8w, Seq#: ~w, ring terminated.~n", [self(), S])
 41     end.

The mloop() function acts as the receive loop of process number N. It “terminates” the messages that go around the ring. Its main purpose is to produce diagnostic output.

 42 % 'main' function allowing the invocation from the shell as well as the
 43 % passing of command line arguments
 44 main(A) ->
 45     Args = [ list_to_integer(Litem) ||
 46              Litem <- [ atom_to_list(Atom) || Atom <- A ]],
 47     [N, M] = Args,
 48     run_benchmark(N, M).

What follows is an example invocation of the oringb module (a ring of 4 processes with 3 messages going around):

  1 bbox33:ring $ erl -noshell -s oringb main 4 3 -s init stop
  2 >> Erlang R11B-5 here (N=4, M=3)!
  3 
  4 : Proc: <0.29.0>, Seq#: 1, Msg#: 2 ..
  5 : Proc: <0.28.0>, Seq#: 2, Msg#: 2 ..
  6 : Proc: <0.27.0>, Seq#: 3, Msg#: 2 ..
  7 : Proc: <0.29.0>, Seq#: 1, Msg#: 1 ..
  8 : Proc: <0.28.0>, Seq#: 2, Msg#: 1 ..
  9 > Proc:  <0.1.0>, Seq#: 4, Msg#: 2 ..
 10 : Proc: <0.27.0>, Seq#: 3, Msg#: 1 ..
 11 * Proc: <0.29.0>, Seq#: 1, Msg#: terminate!
 12 * Proc: <0.28.0>, Seq#: 2, Msg#: terminate!
 13 > Proc:  <0.1.0>, Seq#: 4, Msg#: 1 ..
 14 * Proc: <0.27.0>, Seq#: 3, Msg#: terminate!
 15 @ Proc:  <0.1.0>, Seq#: 4, ring terminated.

Lines 4-6 and 9 are generated by the first message; lines 7, 8, 10 and 13 show the second message going around the ring; lines 11, 12, 14 and 15 are the result of the zero-valued (terminate ring!) message.

The Stackless Python program

I have made an attempt to keep the Python code as similar as possible to its Erlang counterpart. The run_benchmark() function below builds the ring of N tasklets (lines 9-18) and sends the M messages around it (lines 19-21).

One minor difference to note: the main process/tasklet here is not the last process in the ring; instead it sends the messages (after spawning the last tasklet on lines 16-18).

  1 #!/Library/Frameworks/Python.framework/Versions/2.5/bin/python
  2 # encoding: utf-8
  3 import sys
  4 import stackless as SL
  5 
  6 def run_benchmark(n, m):
  7     print(">> Python 2.5.1, stackless 3.1b3 here (N=%d, M=%d)!\n" % (n, m))
  8     firstP = cin = SL.channel()
  9     for s in xrange(1, n):
 10         seqn = s
 11         cout = SL.channel()
 12         # print("*> s = %d" % (seqn, ))
 13         t = SL.tasklet(loop)(seqn, cin, cout)
 14         cin = cout
 15     else:
 16         seqn = s+1
 17         # print("$> s = %d" % (seqn, ))
 18         t = SL.tasklet(mloop)(seqn, cin)
 19     for r in xrange(m-1, -1, -1):
 20         # print("+ sending Msg#  %d" % r)
 21         firstP.send(r)
 22     SL.schedule()

The loop() function is executed by the first N-1 processes in the ring as in the Erlang code. Apart from relaying the messages received, it mainly produces diagnostic output.

Each tasklet knows its position in the ring (S) and is given an input channel from which to read its messages (cin) as well as an output channel to which to write all the messages received (cout).

 23 def loop(s, cin, cout):
 24     while True:
 25         r = cin.receive()
 26         cout.send(r)
 27         if r > 0:
 28             print(": Proc: <%s>, Seq#: %s, Msg#: %s .." % (pid(), s, r))
 29         else:
 30             print("* Proc: <%s>, Seq#: %s, Msg#: terminate!" % (pid(), s))
 31             break

The last tasklet runs the mloop() function which mainly produces diagnostic output.

 32 def mloop(s, cin):
 33     while True:
 34         r = cin.receive()
 35         if r > 0:
 36             print("> Proc: <%s>, Seq#: %s, Msg#: %s .." % (pid(), s, r))
 37         else:
 38             print("@ Proc: <%s>, Seq#: %s, ring terminated." % (pid(), s))
 39             break
 40 
 41 def pid(): return repr(SL.getcurrent()).split()[-1][2:-1]
 42 
 43 if __name__ == '__main__':
 44     run_benchmark(int(sys.argv[1]), int(sys.argv[2]))

The following is an example invocation of the Python code (a chain of 4 processes with 3 messages going around, as above):

  1 bbox33:ring $ ./oringb.py 4 3
  2 >> Python 2.5.1, stackless 3.1b3 here (N=4, M=3)!
  3 
  4 > Proc: <6b870>, Seq#: 4, Msg#: 2 ..
  5 : Proc: <6b730>, Seq#: 1, Msg#: 2 ..
  6 : Proc: <6b770>, Seq#: 2, Msg#: 2 ..
  7 : Proc: <6b7f0>, Seq#: 3, Msg#: 2 ..
  8 > Proc: <6b870>, Seq#: 4, Msg#: 1 ..
  9 : Proc: <6b730>, Seq#: 1, Msg#: 1 ..
 10 : Proc: <6b770>, Seq#: 2, Msg#: 1 ..
 11 : Proc: <6b7f0>, Seq#: 3, Msg#: 1 ..
 12 @ Proc: <6b870>, Seq#: 4, ring terminated.
 13 * Proc: <6b730>, Seq#: 1, Msg#: terminate!
 14 * Proc: <6b770>, Seq#: 2, Msg#: terminate!
 15 * Proc: <6b7f0>, Seq#: 3, Msg#: terminate!

Lines 4-7 show the first message going around, lines 8-11 stem from the second message, and lines 12-15 make the last (terminating) message visible (I suspect the funny “order of execution” above arises from the fact that the Stackless Python scheduler by default prefers the message-receiving tasklets).

The first benchmarks

Now for the actual benchmarks, all run on a machine with the following specs:

  Model Name:	MacBook Pro 15"
  Model Identifier:	MacBookPro2,2
  Processor Name:	Intel Core 2 Duo
  Processor Speed:	2.33 GHz
  Number Of Processors:	1
  Total Number Of Cores:	2
  L2 Cache (per processor):	4 MB
  Memory:	2 GB
  Bus Speed:	667 MHz

I ran two kinds of benchmarks:

  1. fixed number of processes (ring size N = 100) and varying numbers of messages (10000 <= M <= 90000 stepped up in 10K increments)
  2. fixed number of messages (M = 100) and varying numbers of processes (10000 <= N <= 90000 stepped up in 10K increments)

In both cases the resulting total number of messages sent was between 1 and 9 million.
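A quick sanity check of those totals in plain Python (just the arithmetic, not the benchmark itself):

```python
# Benchmark 1: fixed ring size N=100, M from 10000 to 90000 in 10K steps.
# Benchmark 2: fixed message count M=100, N from 10000 to 90000 in 10K steps.
# In both setups the total number of messages sent is N * M.
fixed_ring = [(100, m) for m in range(10000, 100000, 10000)]
fixed_msgs = [(n, 100) for n in range(10000, 100000, 10000)]

totals = sorted(set(n * m for n, m in fixed_ring + fixed_msgs))
print(totals[0], totals[-1])  # 1000000 9000000
```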

I expected that Erlang would clearly outperform Stackless python because this was the former’s “sweet spot” scenario after all.

I was pleasantly surprised to see Python not only rise to the challenge but win it!

As can be seen from the diagrams depicting the results of the two benchmarks, Stackless Python performed much better than Erlang!

100 processes varying message numbers
100 messages varying number of processes

The second round

Eventually I came to suspect that Erlang’s I/O library is very slow and that this might be the reason for its sub-optimal performance in the benchmarks above.

I hence modified both sources (Erlang here, Stackless Python here) and commented out all of the output.

Following that, I reran the benchmarks with the following results.

100 processes varying message numbers (no output)
100 messages varying number of processes (no output)

Now Erlang performed much more like a system built with message passing concurrency in mind, but with Stackless Python still close on its heels.

In conclusion

Even as a Python aficionado, I am sometimes amazed by the gems that can be found in the Python treasure trove.

The Python community should stop worrying about the language falling out of fashion and/or being overtaken by RoR/Ruby or some other contender of the day.

Python’s potential is so huge, all it takes is attracting notice to it e.g. to the fact that Python is ready for Our Manycore Future.

Last but not least, I find it quite a shame that Stackless Python seems to be treated like an “unloved stepchild” by the community.

Highly concurrent systems are the future of software, and Stackless facilitates the construction of such systems today.

Isn’t such a capability worthy of being maintained, improved and expanded upon as opposed to building yet another web framework or whatever ..?


54 thoughts on “Erlang vs. Stackless Python: a first benchmark”

  1. perfect illustration of the problem of microbenchmarks. one might glean that your benches imply stackless python might produce similar results for a massively parallel messaging framework. at the end of the spectrum of your bench, erlang is nearly an order of magnitude more efficient at process scaling. but this isn’t even taking into account a modern computer architecture with multiple cores.

    great blog article though, we are all interested in erlang and how it stacks up against other languages in both efficiency and as a medium to “craft” code in. did you find the erlang environment pleasant to work in?

  2. @anonymous: Having played with Haskell and Ocaml and written production code in Python, Erlang does not feel alien to me at all.
    What I find quite sobering, though, is a look at Erlang’s standard library: I did not spot any HTML/XML parsing libraries, and it appears there’s generally not a lot of support for dealing with the web etc.

  3. @teki321: you are definitely right. Running a benchmark on a multi-CPU machine or even a cluster would certainly produce more authoritative results.

    Unfortunately, I do not have access to such systems at present ..

  4. Nice comparison, but I’m not certain that the comparison is as straightforward as you’ve presented. I’m not familiar with stackless, so please correct me if I’m wrong.

    While the end effect of the programs is the same in this example, the concurrency models are not really identical. In Erlang, you pass messages to a known PID. In Stackless, you read and write from predefined channels. Also, Erlang PIDs are not necessarily local.
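    The channel-style rendezvous the comment describes can be sketched with plain threads and a queue (an illustration only, not Stackless itself; note that a real stackless.channel is fully synchronous, whereas this stand-in buffers one message):

```python
import queue
import threading

class Channel:
    """Tiny stand-in for stackless.channel: sender and receiver share a
    channel object instead of addressing each other by PID."""
    def __init__(self):
        self._q = queue.Queue(maxsize=1)
    def send(self, value):
        self._q.put(value)      # blocks once the one-slot buffer is full
    def receive(self):
        return self._q.get()    # blocks until a message arrives

received = []
ch = Channel()
t = threading.Thread(target=lambda: received.append(ch.receive()))
t.start()
ch.send(42)
t.join()
print(received)  # [42]
```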

  5. Muharem,

    Take a look at the packages in CEAN & Jungerl. An XML processing library (xmerl) is already part of the standard release, and there are various early versions of web tools, templating systems, and even a rough version of something like Rails available via the previously mentioned repositories.

    One of the things I am looking forward to from the people reading through Programming Erlang is a few ports of useful modules/libraries from Python and Ruby over to Erlang.

  6. @Robert: you are definitely right and Erlang as a system and a language is certainly deserving of a large and strong community that will create the rich libraries it needs to be widely adopted.

  7. @Robert/2: regarding the comparison being straightforward etc.:

    I used channels in the stackless code because it was the equivalent paradigm. I did not find a way to address tasklets and send messages to them just by using their IDs.

    You are right regarding the distribution capabilities of Erlang being more advanced than Python’s, i.e. the Erlang processes not being necessarily local etc.; point taken.

  8. Good article. I’ve used stackless and I’ve been meaning to try erlang.

    I don’t understand one part of your conclusion though. You say “Python is ready for Our Manycore Future.” But as I understand it, stackless python is the same as python in terms of the GIL, and stackless “threads”, or “microthreads”, or whatever you call multiple lines of execution, all happen in a single system thread such that they don’t take advantage of multiple processors or cores.

  9. @Michael: Thanks for your comments: my understanding was that you can have several threads, each with their own set of tasklets, and inter-tasklet communication across the threads.

    It would seem that the GIL is not a black or white affair: depending on what you do inside a thread it may get released e.g. for network I/O.

    See e.g.: http://osdir.com/ml/python.stackless/2006-04/msg00034.html

    ============================================

    On second thought you may actually be right (and I may have to tone down my enthusiasm for python somewhat): the GIL seems to make it impossible for python to take advantage of a Manycore deployment environment.

    What’s worse: there are no plans for doing away with the GIL. See e.g. the thread starting with:
    http://mail.python.org/pipermail/python-dev/2005-September/056609.html

    And also: http://mail.python.org/pipermail/python-list/2002-May/145950.html

    So, maybe the fast advent of Manycore systems combined with the inability of the current programming languages to take advantage of such systems is all that is needed to catapult Erlang from relative obscurity to widespread adoption :-)

  10. Why is everyone saying “try it on a multicore system!”? You DID run it on a multicore system, your macbook pro is not a single core system.

    Pay attention people…

    Very interesting article though, I’d like to see the graphs in their full size so they’re easier to read, though.

  11. @ryan:

    100 processes, varying numbers of messages (with output):
    http://muharem.files.wordpress.com/2007/07/100p.png

    100 messages, varying numbers of processes (with output):
    http://muharem.files.wordpress.com/2007/07/100m.png

    100 processes, varying numbers of messages (no output):
    http://muharem.files.wordpress.com/2007/07/100p_no_io.png

    100 messages, varying numbers of processes (no output):
    http://muharem.files.wordpress.com/2007/07/100m_no_io.png

  12. @Bob: I tried “-smp” but my system does not support it..

    bbox33:ring $ erl -smp -noshell -s oringb main 4 3 -s init stop
    Argument ‘-smp’ not supported.
    Usage: erl [-version] [-sname NAME | -name NAME] [-noshell] [-noinput] [-env VAR VALUE] [-compile file ...] [-smp auto|disable] [-hybrid] [-make] [-man [manopts] MANPAGE] [-x] [-emu_args] [+A THREADS] [+B[c|d|i]] [+c] [+h HEAP_SIZE] [+K BOOLEAN] [+l] [+M ] [+P MAX_PROCS] [+R COMPAT_REL] [+r] [+S NO_OF_SCHEDULERS] [+T LEVEL] [+V] [+v] [+W] [args ...]

  13. ” BUT with Stackless python still close on its heels.”

    Maybe I misinterpreted the last two graphs, but the slope is quite different. I wouldn’t call that close at all, for the same reason I wouldn’t call Stackless and Erlang close when using IO.

    A small difference in time at a fixed scale is uninteresting when asking how well they scale, just as when considering the time complexity of heapsort versus bubblesort.

  14. Weird that you don’t have SMP support. Works fine with the R11B-5 we built here:

    ahi:~ bob$ uname -a
    Darwin ahi.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
    ahi:~ bob$ erl
    Erlang (BEAM) emulator version 5.5.5 [smp:2] [async-threads:4] [kernel-poll:true]

  15. Good article, but you seem to have missed one important point of the exercise: the task was to send a single message around the ring M times rather than sending M messages around the ring a single time. The difference is that you are doing M tasks in parallel in your example rather than waiting for a single message to go around the loop M times.

    Try it the other way and let us know your results!

  16. Hello Jason: I’m not quite sure what you mean by “The difference is that you are doing M tasks in parallel in your example..”

    Yes, in my example there are M messages going around the ring once as opposed to a single message going around the ring M times.

    However, the M messages are sent in sequential manner and not in parallel.

  17. @Bob: OK, will rebuild and see how things work out..

    bbox33:upgrade $ port variants erlang
    erlang has the variants:
    universal
    smp
    leopard
    i386
    bbox33:upgrade $ sudo port install erlang +smp +i386
    ---> Extracting erlang
    ---> Applying patches to erlang
    ---> Configuring erlang
    ---> Building erlang with target all

  18. OK, building erlang with smp support worked and I’ll probably post updated results in a few days:

    bbox33:upgrade $ sudo port install erlang +smp +i386
    ---> Extracting erlang
    ---> Applying patches to erlang
    ---> Configuring erlang
    ---> Building erlang with target all
    ---> Staging erlang into destroot
    ---> Installing erlang R11B-5_0+smp
    ---> Activating erlang R11B-5_0+smp
    ---> Cleaning erlang
    bbox33:upgrade $ erl -smp enable
    Erlang (BEAM) emulator version 5.5.5 [smp:2] [async-threads:0] [hipe] [kernel-poll:false]

    Eshell V5.5.5 (abort with ^G)
    1>

  19. For some reason the benchmark takes longer to run with ‘-smp enable’

    == ring of 100 processes with 10000 messages going around w/o ‘-smp enable’ ==
    bbox33:ring $ time erl -noshell -s oringb main 100 10000 -s init stop 1>/dev/null

    real 0m20.124s
    user 0m16.175s
    sys 0m2.900s

    == ring of 100 processes with 10000 messages going around with ‘-smp enable’ ==
    bbox33:ring $ time erl -smp enable -noshell -s oringb main 100 10000 -s init stop 1>/dev/null

    real 0m27.766s
    user 0m27.360s
    sys 0m8.854s

  20. Same benchmark but without output:

    bbox33:ring $ time erl -noshell -s ring main 100 10000 -s init stop 1>/dev/null
    real 0m1.458s
    user 0m0.285s
    sys 0m0.133s

    bbox33:ring $ time erl -smp enable -noshell -s ring main 100 10000 -s init stop 1>/dev/null

    real 0m7.809s
    user 0m5.949s
    sys 0m6.725s

  21. Hmm… Maybe I’m wrong, but there is no parallelism here. I mean, you have a bunch of “nodes” waiting for a SINGLE message to pass around. Basically, only one of those nodes is ever active at any time.

    So I don’t see why having multiple cores would make this go any faster (I would actually expect it to go a bit slower).

  22. @Luc: you are right, this particular program does not lend itself to parallelisation. However, the overhead is quite significant and — surprisingly — appears to be constant (6-7 seconds in both cases).

  23. @muharem: the difference between your code and the stated requirements is that to start the chain you send a single message to the start of your ring; then, when that message gets to the end, the tail node needs to decrement the number and send it back to the head node again until M = 0.

    You can see this is not the case in your initial output, where some messages are getting printed out of order, for example.
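    The variant described above (a single token doing M laps, decremented by the tail and re-injected at the head) can be sketched sequentially in plain Python; this is an illustration of the message flow only, not a benchmark:

```python
def token_ring(n, m):
    """One token circulates the ring of n nodes; the tail decrements it
    and sends it back to the head until it reaches zero. At most one
    message is ever in flight."""
    token, hops = m, 0
    while token > 0:
        hops += n      # one full lap visits all n nodes
        token -= 1     # tail decrements, re-injects at the head
    return hops

print(token_ring(4, 3))  # 12 hops in total, strictly sequential
```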

  24. The point of a ring benchmark is to measure communication overhead. So it will run slower on SMP, but if one language runs 2x slower on SMP and another runs 10x slower, you know which has less overhead. Of course, if one language doesn’t support SMP at all, then you really can’t compare.

  25. For stuff like this, compiling with HiPE will probably make a big difference too; I think you can get it from erlc with +native. I don’t need to use HiPE in practice though, real-world (I/O-bound) Erlang code tends to run plenty fast on BEAM… and it wipes the floor with the Python/Twisted stuff we used to have. Some TCP/HTTP benchmarks would be more relevant to people actually deciding between Python and Erlang.

  26. More objections from RSDN.ru, a Russian site:

    The first comes from Gaperton here: http://rsdn.ru/forum/message/2608128.1.aspx

    In the Erlang code, Erlang is doing more checks than Python does, so the correct version would be:

    loop(S, NextP) ->
        receive
            % the message number is zero => forward message and terminate
            0 -> NextP ! 0,
                io:format("* Proc: ~8w, Seq#: ~w, Msg#: terminate!~n", [self(), S]);
            % any other message number => forward the message to the next
            % process in the ring
            R -> NextP ! R,
                io:format(": Proc: ~8w, Seq#: ~w, Msg#: ~w ..~n", [self(), S, R]),
                loop(S, NextP)
        end.

    (see the link above for a properly formatted version of the code)

    The same guy also writes here: http://rsdn.ru/forum/message/2608161.1.aspx

    It’s not right to measure the startup time. It is impossible for the curves to intersect the way they do in the graphs. Indeed, if we subtract the startup time, the test shows that Erlang is roughly four times faster when sending messages through long chains, all thanks to its more elaborate scheduler, which sends the messages as a wave. Erlang is also roughly two times faster when sending messages through a short chain, thanks to an optimal message sending algorithm. If this is called “close on Erlang’s heels”, then OK :) At least twofold isn’t tenfold.

    The difference will get even more noticeable when we start using pattern matching on mailboxes, that is, when we get to implementing a more complex protocol than just sending messages in a circle. This is where the real fun starts.

    Additionally, it would be interesting to compare this code when it runs across different machines. Oops: Stackless Python supports neither that nor SMP. And what happens when a Python process crashes? So how is Python concurrent and comparable to Erlang? :))

  27. First of all, thanks to Muharem for this nice blog.
    It is interesting, although I have no idea about Erlang.

    A few comments:

    Stackless Python has its concurrency support built in C, but Python
    is not a compiled language. I don’t know if the Erlang code was compiled?

    Stackless development for CPython has not gone much further for the last
    three years because I was involved in the PyPy project, which is going to
    be the superior Python at some point. Stackless has been implemented
    on top of PyPy, with support for multiple concurrency models at
    the same time, supporting coroutines directly in the core.
    At some point, I will write a similar benchmark that compiles
    to native code.

    This site might be interesting: http://codespeak.net/pypy/

    Stackless’ concurrency model is modelled a little bit after the Limbo
    language from Plan9. My intention at that point was to have a simple
    model with anonymous rendezvous points. Meanwhile, we have
    found a scalable way to handle multiple concurrency models
    in a composable way, with or without explicit naming of jump targets,
    and the old Stackless model is now just a special case.
    Stay tuned, there is a lot more to come.

    About the GIL:
    Yes, it is always a problem with CPython, and although there once was
    a huge patch to replace it by extensive object locking, it will probably
    not vanish from CPython in any near future. PyPy has the flexibility to
    try alternatives to a GIL; there was just not enough time during the
    EU-funded project, but it will come for sure. We will be looking into
    ways to do better separation of objects, to avoid too much locking expense.

    On real threads:
    Stackless has limited support for controlling real threads with a tasklet.
    You can have tasklets running in different threads. This helps
    with blocking OS calls, but not with the GIL problem. So again,
    if there is to be a solution for real concurrency and multiple CPUs,
    it will first appear in PyPy.

    On number of processes:
    I’d like to point out that Stackless’ tasklets are very tiny objects.
    I would be interested to know how things scale in Erlang.
    Try your example with 100000 tasks. This is a cakewalk with
    Stackless, since a tasklet is only a few hundred bytes.

    Thanks again for this nice blog, I will follow this.

    cheers – chris

  28. Dear all,

    thank you very much for your comments drawing attention to a number of interesting aspects I had not covered in the article. I have learned quite a bit by exchanging views with some of you which is one of the benefits of publishing in the blogosphere :-)

    In retrospect I think it is fair to state that running the benchmark on my single-CPU system did not really do justice to Erlang which appears to have advanced capabilities when it comes to distributing processes on multi-CPU machines or even clusters.

    Furthermore, a proper benchmark would have to tackle problems that are more amenable to parallelisation in order to allow the languages/systems involved to show off their true potential.

    Anyway, it was an interesting experience, and I for one will continue to experiment with both Erlang and Python.

    Best regards — Muharem

  29. Pingback: Erlang ring problem » SDLC Blog

  30. Pingback: Articles » Blog Archive » Concurrent Programming

  31. Pingback: Daniel C. Wang's Blog : Lightweight threading using C# Iterators

  32. Pingback: jessenoller.com - Stackless: You got your coroutines in my subroutines.

  33. I tried it with SMP on my Athlon 3800 X2.

    $ time erl -smp disable -noshell -s oringb main 100 90000 -s init stop
    >> Erlang R11B-5 here (N=100, M=90000)!

    real 0m5.398s
    user 0m3.920s
    sys 0m0.452s
    $ time erl -smp enable -noshell -s oringb main 100 90000 -s init stop
    >> Erlang R11B-5 here (N=100, M=90000)!

    real 0m10.070s
    user 0m13.733s
    sys 0m2.188s

  34. I’m a bit worried about your post because I think there is nothing concurrent in this example. It’s interesting, though.

  35. Pingback: stackless python

  36. Pingback: Debunking the Erlang and Haskell hype for servers – Codexon

  37. Hello,

    I wanted to share an implementation I built with C# and the Reactive Extensions for .NET. I’d say it’s pretty elegant. It prints only the very last message received by the last node.

    var lastMessage =
        Enumerable
            .Range(0, 100)
            .Select(_ => new Subject())
            .Memoize(nodes =>
            {
                nodes.Buffer(2, 1).Where(o => o.Count == 2).ForEach(o => o[0].Subscribe(o[1]));
                Observable.Range(0, 90000).Delay(TimeSpan.FromMilliseconds(100)).Subscribe(nodes.First());

                return nodes;
            }).Last().Last();

    Console.WriteLine(lastMessage);

    Here’s the benchmark for 100 nodes at various number of messages:

    https://lh3.googleusercontent.com/-rUEQtrp5bGo/Thmy1NUlMOI/AAAAAAAAAHE/DZkQwki3VDY/ErlangRingWithRx.png
