SPDK Spinlocks Introduction

Jan 4, 2023 • Mike Gerdts

The recently introduced spdk_spinlock offers safety and debugging improvements over pthread_mutex_t.

Introduction

While SPDK’s concurrency is primarily based on lock-free message passing, there are many cases where locks are used. Historically, these locks have tended to be POSIX mutexes, pthread_mutex_t. POSIX mutexes have proven to be functional but they offer room for improvement in the areas of compatibility with SPDK’s concurrency model and ability to be debugged. This post introduces SPDK spinlocks as an improvement over POSIX mutexes or POSIX spinlocks.

While the most natural path would have been to introduce SPDK mutexes, a spinlock is a better fit for use in an application that usually polls and strives to avoid system calls.

Quick Start

Later sections of this post go into a lot of details. If you are just trying to use an spdk_spinlock, this is for you.

Example

This example should look quite straight-forward if you have used pthread_mutex_t or pthread_spinlock_t. The key differences are that spdk_spin_init() only takes one argument and spdk_spin_held() exists.

#include "spdk/thread.h"

struct foo {
	/* ... */
	struct spdk_spinlock lock;
};

struct foo *
foo_alloc_and_init(void)
{
	struct foo *foo;

	foo = calloc(1, sizeof(*foo));
	spdk_spin_init(&foo->lock);
	return foo;
}

void
foo_destroy(struct foo *foo)
{
	spdk_spin_destroy(&foo->lock);
	free(foo);
}

void
do_something_locked(struct foo *foo)
{
	assert(spdk_spin_held(&foo->lock));
	/* ... */
}

void
do_something(struct foo *foo)
{
	spdk_spin_lock(&foo->lock);
	do_something_locked(foo);
	spdk_spin_unlock(&foo->lock);
}

The rules

Follow these rules for using SPDK spinlocks and you will be in good shape.

Initialize the lock before first use.
Only lock when running from an SPDK thread.
Unlock before returning from a poller or message function. This is most easily achieved by ensuring all locks taken in a function are released before returning from the function.
Do not call spdk_thread_exec_msg() while holding a lock. Use spdk_thread_send_msg() instead.
Destroy a lock when you are done with it.

Failure to follow these rules are programming errors and are likely to result in the SPDK thread library calling abort().

Friendly advice

Use assert(spdk_lock_held()) at the top of every function that assumes a lock is held. Do this instead of adding a comment.
When you suspect a locking related error and the code is currently using a normal or default pthread_mutex_t, convert it to an spdk_spinlock and sprinkle in assert(spdk_lock_held()).
Keep an eye out for future improvements in this area that make debugging with SPDK spinlocks even more of a joy.

Motivation

In the past, I did a lot of C development on Solaris and its derivative, illumos. On these platforms, mutexes are error check by default and have a handy function called mutex_owned() which is often invoked as MUTEX_HELD(). Most commonly, it is ASSERT(MUTEX_HELD(...)).

Assertions serve two important purposes:

They perform runtime checks to be sure you didn’t mess up.
With a simple glance it is easy to understand state, freeing the programmer’s mind to worry about other things that they are likely to mess up.

In bdev.c, concurrency was handled by a mixture of message passing and locking with mutexes. How concurrency was handled was sometimes obvious, sometimes documented in comments, and sometimes the comments and reality were not in agreement.

My first step was to implement a form of mutex_held() based on 17 year old advice from my former colleague, Casper Dik. As I started to consider the role of POSIX locks in SPDK, I came to realize that there were other problems to solve in this area. The following subsections describe those problems.

Safety of POSIX locks on SPDK threads

A pthread_mutex_t is designed to work with a pthread_t. Generally code that is running on a thread has that thread to itself until the thread terminates. In typical SPDK apps, each pthread_t will be running several pollers and is likely to have messages sent to it.

Suppose this happens on a pthread_t:

poller_a calls pthread_mutex_lock(&g_lock), then returns.
poller_b calls pthread_mutex_lock(&g_lock).

What happens when pthread_b tries to take that lock is undefined behavior (UB) in most cases in SPDK. It could be any of the following:

If g_lock is initialized as a normal mutex, pthread_mutex_lock() never returns: this is deadlock.
If g_lock is initialized as an errorcheck mutex, an error is returned.
If g_lock is initialized as a recursive mutex, the lock succeeds.

Most mutexes in SPDK are initialized with default attributes. Default is what leads to undefined behavior. In practice, today default means it is a normal mutex. So, in most cases today’s SPDK code would deadlock in this scenario. Surely that’s not very helpful.

While a small minority of the pthread_mutex_lock() calls check their return value, none of them are initialized with PTHREAD_MUTEX_ERRORCHECK. Thus, checking the return value is mostly pointless. Note that errorcheck mutexes can be very helpful and are useful for building functions like mutex_held().

There are some mutexes that are initialized with PTHREAD_MUTEX_RECURSIVE. In the flow described above poller_a and poller_b could both hold the lock, but only if they are scheduled on the same pthread_t. If allowed to happen, this could defeat the purpose of having a mutex. The risk can be avoided by ensuring that the lock is not held when switching pollers or message functions.

While this covers pthread_mutex_t, it is largely the same for pthread_spinlock_t, except pthread spinlocks do not have an errorcheck or recursive attribute.

Spinlocks are a better fit

In the best case, a mutex is not contended and can be locked with a simple atomic operation. If there is contention on the mutex, this leads to a system call that puts the pthread_t to sleep until the lock is available. This becomes somewhat inefficient in the typical SPDK application because a failure to obtain the lock has the overhead of a couple system calls: the one for the waiter and one for the thread releasing the lock. Since SPDK typically consumes 100% of each CPU thread that has runnable SPDK pollers, putting the pthread_t to sleep does not free up resources that will be used by other threads or processes. Even if it does allow something else to occupy that CPU thread, this may be undesirable because it may introduce scheduling latency when the lock becomes available for the waiting thread.

Spinlocks are obtained with atomic operations without the assistance of the kernel. Thus, when the first attempt to take a lock fails, the next try is just a few instructions later.

Some pthread implementations support adaptive mutexes. An adaptive mutex will act like a spinlock for a brief period, but then act like a mutex if a brief period of spinning cannot obtain the lock. This may be a better middle ground, but is not widely available.

SPDK spinlock implementation

The spdk_spinlock implementation is a wrapper around POSIX spinlocks. The core aspects of SPDK spinlocks are:

Each spdk_spinlock structure contains a link to the spdk_thread that holds the lock. This implies that a spdk_spinlock can only be locked from an SPDK thread. This value is updated only while the associated pthread_spinlock_t is held.
Include error checks and abort() if any error fails. All errors are programming errors, not runtime failures. Once a programming error involving locking has been identified, it is not safe to continue.
Provide spdk_spin_held() so that it may be used liberally as assert(spdk_spin_held()) as a way to check expectations on debug builds and as a form of documentation that cannot get out of sync with the implementation.

An early success story

Shortly after implementing SPDK spinlocks, the problem described in #2831 was reported. That is, there was a failed assertion while the blobstore was allocating metadata pages:

bs_claim_md_page: Assertion `spdk_bit_array_get(bs->used_md_pages, page) == false' failed.

This code had no comments or logic indicating how it should be called, so I first added a check to see if it was only called on the blobstore metadata thead.

diff --git a/lib/blob/blobstore.c b/lib/blob/blobstore.c
index 635eb99ef..32de508da 100644
--- a/lib/blob/blobstore.c
+++ b/lib/blob/blobstore.c
@@ -73,6 +73,7 @@ bs_get_snapshot_entry(struct spdk_blob_store *bs, spdk_blob_id blobid)
 static void
 bs_claim_md_page(struct spdk_blob_store *bs, uint32_t page)
 {
+       assert(spdk_get_thread() == bs->md_thread);
        assert(page < spdk_bit_array_capacity(bs->used_md_pages));
        assert(spdk_bit_array_get(bs->used_md_pages, page) == false);

This quickly showed that it was called from at least one other thread. I then took a few minutes to convert the lock to an SPDK spinlock.

Both used_md_pages and used_clusters tend to be updated near each other, and may be updated while holding bs->used_clusters_mutex. I did just a little more work than a global search and replace to convert this mutex to an SPDK spinlock. That was:

In blobstore.c, s/pthread_mutex/pthread_spin/g
In blobstore.c, remove the second argument (NULL) from spdk_spin_init() (was pthread_mutex_init() prior to step 1).
In blobstore.c, s/used_clusters_lock/used_lock/g
In the declaration of struct spdk_blob_store in blobstore.h, change pthread_mutex_t used_clusters_mutex to spdk_spin_lock used_lock.

I then updated the assertion I added earlier to:

diff --git a/lib/blob/blobstore.c b/lib/blob/blobstore.c
index 635eb99ef..496dbe4c4 100644
--- a/lib/blob/blobstore.c
+++ b/lib/blob/blobstore.c
@@ -73,6 +73,7 @@ bs_get_snapshot_entry(struct spdk_blob_store *bs, spdk_blob_id blobid)
 static void
 bs_claim_md_page(struct spdk_blob_store *bs, uint32_t page)
 {
+       assert(spdk_mutex_held(&bs->used_lock));
        assert(page < spdk_bit_array_capacity(bs->used_md_pages));
        assert(spdk_bit_array_get(bs->used_md_pages, page) == false);

I added the same assertion in bs_release_md_page(), bs_claim_cluster(), and bs_release_cluster().

I used the fio-based test described in #2831 to find a bunch of places that pthread_spin_lock() was missing, then submitted the change to gerrit to allow SPDK’s CI to find a few more.

Once I decided to convert the pthread_mutex_t to an spdk_spinlock, it took less than 15 minutes to have my first backtrace that pointed to a missing spdk_spin_lock().