CodeSport (January 2011)

Concept of transactional memory

Welcome to this month’s CodeSport, in which we discuss the concept of transactional memory, with a focus on software transactional memory.

In last month’s column, we featured a bunch of questions on operating systems, computer architecture and algorithms. A thank-you to those who sent in their replies to the questions. Congratulations to R Sridhar and K Srividya for getting  correct solutions to 10 questions. Since I did not get solutions to all the problems that were posed, I am going to keep the unsolved problems open for this month as well. Please do send in your solutions, quickly.

I had a request from one of our student readers, R Balamurugan, who wanted me to discuss the idea of transactional memory in our column — which is what I will do this month. I will cover why it is needed, what are the types of transactional memory, etc., with a focus on software transactional memory. Here’s a request to our readers: if you would like me to discuss a specific topic in the column, please do drop me a mail, and I will try to address that topic within a couple of months of receiving your request.

Transactional memory — what is the need?

With the advent of multi-core processors, parallel programming is going mainstream. In addition to the programming complexities encountered in sequential programming, in parallel programming there is also the added burden of reasoning about data sharing and communication among concurrently executing tasks. Concurrently executing tasks are typically implemented on top of a threading layer such as POSIX Pthreads, or the WinThreads API. Shared data structures in a multi-threaded application need to be protected against concurrent accesses by threads. This is typically achieved by using synchronisation in the form of explicit locks, which are used to protect the shared data. However lock-based synchronisation has proved to be complicated and cumbersome, due to many reasons.

Problems with lock-based synchronisation

Locks can be employed at different granularities. For example, consider a hash table implemented as an array of buckets, with the array index decided by the hash key. Each bucket is a singly linked list. Now the programmer can choose to employ locks of different granularities to protect this hash table against concurrent accesses by multiple threads. The simplest and coarsest-grained locking would be to associate a single lock with the entire hash table. However, such coarse-grained locks inhibit scalability and concurrency.

Consider two threads, each of which are trying to insert elements e1 and e2 into the hash table. Even if their hash values are different, for them to hash to different buckets with our coarse-grained locking scheme, they cannot proceed in parallel, thereby reducing concurrency. All the threads’ access to the hash table is serialised, which also impacts the application’s scalability.

At the other extreme of locking granularity is extremely fine-grained locking, where we could associate a single lock with each node of each bucket. While this will improve concurrency, the locking overheads can become high (due to the large number of locks associated with the data structure). Also, reasoning about a large number of locks becomes difficult for the programmer.

For example, each of us learnt how to implement a queue, which supports enqueue/dequeue operations for sequential programs, as a programming assignment in our first course on programming. On the other hand, implementing a concurrent queue that supports parallel enqueue/dequeue operations can be a publishable result in software conferences, because of the difficulties in implementing it efficiently and correctly. Even a well-known sequential algorithm such as queue enqueue/dequeue becomes complicated because of programmers’ difficulties in reasoning about concurrent threads that could be operating on the queue simultaneously. The more the number of locks, the greater is the difficulty for software maintainers to reason about what data is protected by what locks, and what is the right lock-acquisition order to be followed.

Locks guarantee isolation only when a program consistently follows a locking discipline. Any violation of the locking discipline leads to concurrency bugs such as race conditions, atomicity violations and deadlocks. Furthermore, lock-based synchronisation mechanisms lack composability, which often precludes the modular design of concurrent components. For a detailed discussion on the perils and pitfalls associated with lock-based synchronisation, refer to Edward Lee’s article titled, “The problems with threads” [PDF].

Atomic sections

Therefore, of late, there has been considerable research work to come up with scalable and programmer-friendly alternatives to lock-based synchronisation. Atomic sections have been proposed as a scalable, performance- and programmer-friendly alternative. Atomic sections allow the programmer to express synchronisation at a higher level of abstraction than locks.

Programmers can specify what code has to execute atomically by simply enclosing the desired block of code with the keyword atomic. Atomic sections are an interesting alternative to locks, as they allow local reasoning, and are composable.

Consider a shared global counter incremented by multiple threads. In traditional lock-based code, we would associate a mutual exclusion lock to protect access to it from concurrent threads. For instance, the counter increment code would look as follows:

pthread_mutex_lock(&counter_lock);
counter++;
pthread_mutex_unlock(&counter_lock);

If we rewrite it using atomic sections, the code appears as shown below:

atomic {
    counter++;
}

The main difference between the two code snippets is that when using atomic sections, there is no need for the programmer to explicitly specify the locks. The underlying transactional memory implementation ensures that the critical section is executed atomically, without the programmer having to reason what locks need to be acquired to protect this critical section. This greatly simplifies writing concurrent code. At the same time, it moves the complexity to the underlying transactional memory implementation.

Transactional memory

Two types of transactional memory implementations are possible — one in hardware, and the other in software. Typically, hardware transactional memory is built on top of the cache-coherence protocol employed in multiprocessor systems. An excellent writeup of hardware transactional memory can be found in the technical report, “Experiences with a commercial hardware transactional memory implementation,” by David Dice et al. [PDF]. Note that this paper describes the only commercial hardware implementation to date — namely, in Sun’s Rock processor.

There has been considerable work on software transactional memory implementations. While there have been a number of academic implementations, there are also commercial implementations available. For instance, the Intel C/C++ compiler supports atomic sections using software transactional memory.

By now, you may be wondering about the origin of the term “transactional memory”. Well, just as database transactions support the “ACID” properties of (a) Atomicity (b) Consistency (c) Isolation and (d) Durability, transactional memory ensures atomicity, consistency and isolation for transactions in memory (since the transactions are performed on memory, the durability property is not needed). Hence the term “software transactional memory” is used to refer to the implementations that support atomic sections through software, without requiring any hardware support.

Software Transactional Memory (STM)

In simple terms, STM is implemented by associating a lock with every word of the memory. When a code enclosed in an atomic section is executed by the thread, the underlying STM acquires the locks associated with the shared data inside the atomic section. Note that it is possible that another transaction may have acquired some of the locks needed by the current transaction.

In such a case, only one of the two conflicting transactions can proceed, and the other needs to be aborted. The aborting transaction needs to roll back the effects of the transactional code it has executed so far.

Roll-back — UNDO or REDO

In order to support rollback, all transactional accesses are recorded. Transactions support rollback through either ‘UNDO’ logs or ‘REDO’ logs. In STMs employing ‘UNDO’ logs, transactions directly update the locations in memory, while recording the old values present in the updated memory locations into an ‘UNDO’ log.

In the event of an abort, the memory locations are rolled back to the old values stored in the UNDO log. In the case of ‘REDO’ logs, transactions do not directly update the memory locations. Instead, all updates are made to a ‘REDO’ log, which records the new values written by the transaction.

Once the transaction determines that it has acquired all the locks it needs, and that there are no conflicting transactions, it is ready to commit its changes to the memory. During the commit, the ‘REDO’ log is used to update the shared memory locations.

Other considerations

There is a lot more to STM than what I have described here. For instance, an atomic section can contain code that performs irrevocable actions, such as writing to a file on disk, or printing to an external device. How can such transactions be rolled back? What would happen if a transactional memory location is accessed outside an atomic section, while other transactions accessing that particular memory location are executing concurrently? Some of these issues need to be dealt with efficiently, if STM needs to be adopted in mainstream software.

There are also various other design considerations for STMs such as (a) What should be the locking granularity? (b) whether the STM should use visible reads or invisible reads, and (c) whether it should acquire locks eagerly or lazily. But do note that all these complexities are only faced by designers of the STM implementations. The programmers who will use STM only need to specify the atomic sections, and do not have to worry about any of these design issues. An excellent introductory article on parallel programming using software transactional memory, by Ulrich Drepper, Red Hat, is available here.

The fact that we need to record all transactional accesses in order to support rollback, leads to considerable book-keeping overheads, which impacts STM performance. There is an interesting paper titled “Software Transactional Memory — Is it only a research toy?” [link], which discusses some of the major performance bottlenecks encountered in STM. We will continue our discussion on STM in next month’s column.

My ‘must-read book’ for this month

This month’s “must-read book” is a long-time favourite of mine: Computer Architecture — a Quantitative Approach by Hennessy & Patterson, Third Edition, published by Morgan Kaufmann. Many of you may wonder why a book on computer architecture is a must-read for programmers. The reason is that programmers need to write code that works correctly and efficiently on the underlying hardware, and this would not be possible unless they have a good understanding of what happens ‘under the hood’ in hardware.

For instance, if you are writing concurrent code, you need to be aware of various architectural issues such as memory consistency, cache coherence, and false sharing — so a knowledge of computer architecture internals is a must for every programmer. I like this book especially because it provides an excellent introduction to the basics of computer architecture, with practical examples and exercises.

If you have a favourite programming book that you think is a must-read for every programmer, please do send me a note with the book’s name, and a short description of why you think it is useful. I will  mention it in this column. This would help many readers who want to improve their coding skills.

I look forward to your answers to this month’s questions. If you have any favourite programming puzzles that you would like to discuss on this forum, please send them to me, along with feedback, at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, happy programming and here’s wishing you the very best!

Feature image courtesy: equinoxefr. Reused under the terms of CC-BY-SA 2.0 License.
  • ktkaushik

    this is so informative, thanks.

All published articles are released under Creative Commons Attribution-NonCommercial 3.0 Unported License, unless otherwise noted.
Open Source For You is powered by WordPress, which gladly sits on top of a CentOS-based LEMP stack.

Creative Commons License.