
Multithreading in the MRI Ruby Interpreter

Multithreading is still not widespread in the Ruby community, even though concurrency brings substantial benefits to certain kinds of programs, especially in server environments. For instance, a multithreaded HTTP server can increase a server’s throughput and, compared to running additional worker processes, reduce its memory footprint.

Now, there’s a lot of confusion surrounding Ruby’s multithreading capabilities, especially with regard to the changes between 1.8 and 1.9. So, let’s be absolutely clear: MRI Ruby does in fact support native threads. Prior to 1.9, only green threads were supported. Since Ruby 1.9, this is no longer true: Ruby now supports real OS-level threads.

Native vs. Green Threads

The difference between the two is that the kernel knows about native threads, but it doesn’t know about green threads. In other words, if your program uses only green threads, it is still a single-threaded program from the kernel’s perspective. All thread creation, destruction, and scheduling takes place within your process (i.e., in user space) and is therefore hidden from the kernel.
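To make user-space scheduling more concrete, here is a minimal sketch that imitates green threads with Ruby Fibers. GreenScheduler and its spawn/run methods are hypothetical names made up for this illustration; Ruby 1.8 implemented its green threads in C, not like this. The point is only that creating, switching, and discarding these “threads” is ordinary Ruby code running inside a single OS thread, so the kernel never sees them.

```ruby
require "fiber" # Fiber#alive? needs this on older Rubies; harmless on newer ones

# Hypothetical green-thread scheduler: all "threads" share one OS thread.
class GreenScheduler
  def initialize
    @ready = [] # run queue of Fibers, managed by us, not by the kernel
  end

  # "Thread creation": wrap the block in a Fiber and enqueue it.
  def spawn(&block)
    @ready << Fiber.new(&block)
  end

  # Round-robin scheduling until every green thread has finished.
  def run
    until @ready.empty?
      fiber = @ready.shift
      fiber.resume                    # switch to the green thread
      @ready << fiber if fiber.alive? # finished fibers are simply dropped
    end
  end
end

log = []
scheduler = GreenScheduler.new
%w[t1 t2].each do |name|
  scheduler.spawn do
    3.times do |i|
      log << "#{name}:#{i}"
      Fiber.yield # voluntarily give control back to the scheduler
    end
  end
end
scheduler.run

puts log.join(" ") # t1:0 t2:0 t1:1 t2:1 t1:2 t2:2
```

Because the scheduler switches only when a fiber yields, the interleaving is fully deterministic, which is typical of user-space scheduling.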

To verify the difference between 1.8 and 1.9, we can use a test program like the following (two-threads.rb):

t1 = Thread.new { while true ; end }
t2 = Thread.new { while true ; end }
t1.join # wait for thread 1 to finish
t2.join # wait for thread 2 to finish

Now, to perform our little experiment, we start the test program and, while it is running, ask the kernel for the list of threads it knows about. Fortunately, there’s no lack of utilities to accomplish exactly that. One of these utilities is the good old ps (short for process status), whose -m option lists the threads of each process below it.

$ ruby -v
ruby 1.8.7 (2014-01-28 patchlevel 376) [x86_64-linux]
$ ruby two-threads.rb &
[2] 31239
$ ps -m -p $! # $! is the PID of the ruby process
  PID TTY          TIME CMD
31239 pts/2    00:00:02 ruby
    - -        00:00:02 -
$ kill $!

As you can see, ps tells us that the process is single-threaded, which is exactly what we expected for 1.8. If the process were using native threads, on the other hand, we would expect to see exactly three threads: the main thread in addition to the two that we created ourselves. Let’s perform the same experiment again with 1.9 to see whether our assumptions are correct:

$ ruby -v
ruby 1.9.3p547 (2014-05-14 revision 45962) [x86_64-linux]
$ ruby two-threads.rb &
[1] 31369
$ ps -m -p $! # $! is the PID of the ruby process
  PID TTY          TIME CMD
31369 pts/2    00:00:03 ruby
    - -        00:00:00 -
    - -        00:00:00 -
    - -        00:00:01 -
    - -        00:00:01 -
$ kill $!

Surprisingly, we are seeing four threads instead of three. We’ll get back to that extra thread a little later. The important takeaway here is that our two threads, as well as the main thread, are visible to the kernel. Therefore, we can conclude that Ruby >= 1.9 does indeed use native threads.
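On Linux, a similar check can be run from inside Ruby instead of with ps. The sketch below assumes the Linux /proc filesystem, where /proc/self/task contains one entry per kernel thread of the current process, so it will not work on other operating systems. The helper name kernel_thread_count is hypothetical, and the two threads block on a Queue instead of spinning, so the check does not burn CPU.

```ruby
# Hypothetical helper: number of kernel threads in this process.
# Assumes Linux, where /proc/self/task has one entry per kernel thread.
def kernel_thread_count
  Dir.children("/proc/self/task").size
end

release = Queue.new
before = kernel_thread_count

# Two Ruby threads that simply block until we push something onto the queue.
workers = Array.new(2) { Thread.new { release.pop } }
sleep 0.1 # give both threads time to start and block

after = kernel_thread_count
2.times { release << :go } # unblock both workers
workers.each(&:join)

puts "kernel threads before: #{before}, after: #{after}"
```

With native threads, after should exceed before by at least two; the exact numbers, including whether a separate timer thread shows up, depend on the Ruby version.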

Native threads are superior because they are the only way to achieve parallelism. Scheduling happens in the kernel (i.e., in kernel space), and only the kernel can schedule your threads on multiple processors simultaneously (in this context, a processor with multiple cores is effectively the same as multiple processors).

You have to remember that concurrency does not equal parallelism. Parallelism implies concurrency, but the reverse is not true. Green threads are sufficient for concurrency, but parallelism requires native threads.

Obviously, the distinction doesn’t matter if your program is running on a single-processor machine. Nonetheless, concurrency still makes sense, because you can utilize your processor for other threads if your current thread is blocked for some reason (just think about blocking IO).
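The following sketch illustrates concurrency without parallelism. It uses sleep as a stand-in for blocking IO (an assumption to keep the example self-contained; no files or sockets are involved). While one thread is blocked, the other can proceed, so the two waits overlap instead of adding up, and no second processor is needed for that.

```ruby
# Measures the wall-clock time of the given block with a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Two "requests" that each spend 0.2s blocked, handled one after the other...
sequential = elapsed { 2.times { sleep 0.2 } }

# ...and the same two requests handled by two threads.
threaded = elapsed do
  workers = Array.new(2) { Thread.new { sleep 0.2 } }
  workers.each(&:join)
end

puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
```

Expect the sequential version to take roughly 0.4s and the threaded version roughly 0.2s; exact numbers vary by machine.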

Wait a second! What about the “Global Interpreter Lock” or “GIL”?

While all of this sounds great in theory, parallelism isn’t possible in practice due to the Global Interpreter Lock. The GIL is a mechanism built directly into the interpreter to ensure that two threads belonging to the same Ruby process can never execute Ruby code in parallel.
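CPU-bound Ruby code, unlike blocking IO, gains nothing from extra threads under the GIL. The sketch below runs a hypothetical busy_sum workload twice, first sequentially and then in two threads. Since only the thread holding the GIL executes Ruby code, you should expect the threaded timing to be roughly the same as the sequential one rather than half of it. The workload size is arbitrary, and timings vary between machines.

```ruby
# Hypothetical CPU-bound workload: sums 0...n in plain Ruby.
def busy_sum(n)
  sum = 0
  n.times { |i| sum += i }
  sum
end

# Returns [block result, elapsed wall-clock seconds].
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

N = 1_000_000

# Both workloads on the main thread, one after the other.
sequential, t_seq = elapsed { [busy_sum(N), busy_sum(N)] }

# Both workloads in separate threads; Thread#value joins and returns the result.
threaded, t_thr = elapsed do
  Array.new(2) { Thread.new { busy_sum(N) } }.map(&:value)
end

puts format("sequential: %.2fs, threaded: %.2fs", t_seq, t_thr)
```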

So, basically, we are back to square one: it seems like we just lost all the advantages provided by native threads in 1.9.

What’s the story behind the extra thread that Ruby >= 1.9 creates?

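Ruby’s threading model is cooperative, even though it behaves as if it were preemptive from the developer’s point of view. As it turns out, a truly preemptive model wouldn’t be possible, which is yet another consequence of the GIL: even if the kernel preempts a thread, that thread still holds the GIL, so no other Ruby thread can run until the GIL is released.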

Whenever the interpreter decides to context-switch between threads, the following three steps must all happen (in the given order):

  1. The current thread has to release the GIL.
  2. The scheduler has to select the next thread.
  3. The new thread has to acquire the GIL.
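These three steps can be modeled in a few lines of Ruby. This is a toy sketch, not MRI’s implementation: ToyGIL is a hypothetical class built from a Mutex and a ConditionVariable, and its “scheduler” is a simple round-robin over thread ids. Each worker must acquire the toy GIL before doing its work and releases it voluntarily afterwards.

```ruby
# Toy GIL (hypothetical, not MRI's C implementation).
class ToyGIL
  def initialize(ids)
    @ids = ids
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @holder = nil       # id of the thread currently holding the toy GIL
    @chosen = ids.first # id the scheduler picked to run next
  end

  # Step 3: block until this thread has been chosen and the GIL is free.
  def acquire(id)
    @mutex.synchronize do
      @cond.wait(@mutex) until @holder.nil? && @chosen == id
      @holder = id
    end
  end

  # Steps 1 and 2: release voluntarily, then select the next thread.
  def release(id)
    @mutex.synchronize do
      @holder = nil                                    # step 1
      @chosen = @ids[(@ids.index(id) + 1) % @ids.size] # step 2 (round-robin)
      @cond.broadcast                                  # wake all waiters
    end
  end
end

ids = %i[a b]
gil = ToyGIL.new(ids)
log = []

workers = ids.map do |id|
  Thread.new do
    2.times do |i|
      gil.acquire(id)
      log << "#{id}#{i}" # "run Ruby code" while holding the toy GIL
      gil.release(id)
    end
  end
end
workers.each(&:join)

puts log.join(" ") # a0 b0 a1 b1
```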

All three of these steps happen behind the scenes and cannot be influenced by your Ruby code. The interesting thing here is that the GIL has to be released voluntarily by the current thread (which is what makes the threading model cooperative). This implies that the context-switch has to be initiated by the current thread. But how does the current thread decide when a context-switch should happen?

While the interpreter is executing your Ruby code, it continuously checks a boolean flag indicating whether a context-switch should happen. If the flag is true, the interpreter initiates the context-switch and resets the flag to false.

Setting the flag to true is the responsibility of that extra thread – known as the timer thread. Its implementation is actually really simple and can be summarized as follows: (1) wait for a fixed period of time, (2) set the flag to true, (3) start over with (1).
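The interplay between the flag and the timer thread can be sketched as a toy model. It is hypothetical, not MRI’s code: two “Ruby threads” are represented as Fibers driven by a simple round-robin loop, and a real Thread plays the timer thread, doing nothing but sleeping for an assumed interval and setting a boolean flag. Between “instructions”, a task only checks the flag; when it is set, the task resets it and yields. Because the toy timer is itself an ordinary Ruby thread, each “instruction” here is a Thread.pass so that MRI’s real scheduler lets the timer run. The 10 ms interval is an arbitrary choice, not MRI’s actual value.

```ruby
require "fiber" # Fiber#alive? needs this on older Rubies; harmless on newer ones

# Toy model of the timer-thread design (hypothetical, not MRI's C code).
QUANTUM = 0.01      # assumed timer interval in seconds
SLICES_PER_TASK = 2 # each task runs for two time slices, then finishes

switch_flag = false # set by the timer thread, cleared by the "interpreter"
trace = []          # which task ran in which slice, in execution order

timer = Thread.new do
  loop do
    sleep QUANTUM      # (1) wait for a fixed period of time
    switch_flag = true # (2) set the flag, then (3) start over
  end
end

tasks = %w[A B].map do |name|
  Fiber.new do
    SLICES_PER_TASK.times do |slice|
      trace << "#{name}#{slice}"
      # "Execute instructions" until a switch is requested. The decision is
      # only a boolean check; Thread.pass stands in for one instruction and
      # lets MRI's real scheduler run our toy timer thread.
      Thread.pass until switch_flag
      switch_flag = false # reset the flag ...
      Fiber.yield         # ... and hand control back to the scheduler
    end
  end
end

# Round-robin scheduler: resume every unfinished task in turn.
until tasks.none?(&:alive?)
  tasks.each { |task| task.resume if task.alive? }
end
timer.kill

puts trace.join(" ") # A0 B0 A1 B1
```

The design choice this mirrors is that the expensive part (deciding when a time slice is over) lives in a separate thread, while the hot path only reads a boolean.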

The entire purpose of the timer thread is to make context-switch decisions as efficient as possible. As pointed out already, the interpreter only has to check a boolean flag to make its decision (as opposed to running a more complex algorithm). This is a significant win if you consider that the interpreter has to make that decision over and over again.

Conclusion

Ruby 1.9 replaced green threads with native threads. However, the GIL still prevents parallelism. That being said, concurrency has been improved through better scheduling. The new scheduler makes context-switch decisions more efficient by essentially moving them to a separate native thread, known as the timer thread.


Disclaimer
This article is about MRI Ruby only! If you are using another interpreter, such as JRuby, the situation is quite different.
Notes
Software: MRI Ruby (versions 1.8.7-p376, 1.9.3-p547), Ubuntu 14.04.1 LTS (Kernel 3.13.0-32, 64-bit), and ps.