SPDK Spinlocks Introduction
The recently introduced spdk_spinlock
offers safety and debugging improvements
over pthread_mutex_t
.
Introduction
While SPDK’s concurrency is primarily based on
lock-free message passing, there are many cases where locks are used.
Historically, these locks have tended to be POSIX mutexes, pthread_mutex_t
.
POSIX mutexes have proven to be functional but they offer room for improvement
in the areas of compatibility with SPDK’s concurrency model and ability to be
debugged. This post introduces SPDK spinlocks as an improvement over POSIX
mutexes or POSIX spinlocks.
While the most natural path would have been to introduce SPDK mutexes, a spinlock is a better fit for use in an application that usually polls and strives to avoid system calls.
Quick Start
Later sections of this post go into a lot of details. If you are just trying to
use an spdk_spinlock
, this is for you.
Example
This example should look quite straight-forward if you have used
pthread_mutex_t
or pthread_spinlock_t
. The key differences are that
spdk_spin_init()
only takes one argument and spdk_spin_held()
exists.
#include "spdk/thread.h"
struct foo {
/* ... */
struct spdk_spinlock lock;
};
struct foo *
foo_alloc_and_init(void)
{
struct foo *foo;
foo = calloc(1, sizeof(*foo));
spdk_spin_init(&foo->lock);
return foo;
}
void
foo_destroy(struct foo *foo)
{
spdk_spin_destroy(&foo->lock);
free(foo);
}
void
do_something_locked(struct foo *foo)
{
assert(spdk_spin_held(&foo->lock));
/* ... */
}
void
do_something(struct foo *foo)
{
spdk_spin_lock(&foo->lock);
do_something_locked(foo);
spdk_spin_unlock(&foo->lock);
}
The rules
Follow these rules for using SPDK spinlocks and you will be in good shape.
- Initialize the lock before first use.
- Only lock when running from an SPDK thread.
- Unlock before returning from a poller or message function. This is most easily achieved by ensuring all locks taken in a function are released before returning from the function.
- Do not call
spdk_thread_exec_msg()
while holding a lock. Usespdk_thread_send_msg()
instead. - Destroy a lock when you are done with it.
Failure to follow these rules are programming errors and are likely to result in
the SPDK thread library calling abort()
.
Friendly advice
- Use
assert(spdk_lock_held())
at the top of every function that assumes a lock is held. Do this instead of adding a comment. - When you suspect a locking related error and the code is currently using a
normal or default
pthread_mutex_t
, convert it to anspdk_spinlock
and sprinkle inassert(spdk_lock_held())
. - Keep an eye out for future improvements in this area that make debugging with SPDK spinlocks even more of a joy.
Motivation
In the past, I did a lot of C development on Solaris and its derivative,
illumos. On these platforms, mutexes are error check
by default and have a handy function called mutex_owned()
which is often
invoked as
MUTEX_HELD()
. Most commonly, it is
ASSERT(MUTEX_HELD(...))
.
Assertions serve two important purposes:
- They perform runtime checks to be sure you didn’t mess up.
- With a simple glance it is easy to understand state, freeing the programmer’s mind to worry about other things that they are likely to mess up.
In bdev.c
, concurrency was handled by a mixture of message passing and locking
with mutexes. How concurrency was handled was sometimes obvious, sometimes
documented in comments, and sometimes the comments and reality were not in
agreement.
My first step was to implement a form of mutex_held()
based on 17 year old
advice
from my former colleague, Casper Dik. As I started to consider the role of POSIX
locks in SPDK, I came to realize that there were other problems to solve in this
area. The following subsections describe those problems.
Safety of POSIX locks on SPDK threads
A pthread_mutex_t
is designed to work with a pthread_t
. Generally code that
is running on a thread has that thread to itself until the thread terminates. In
typical SPDK apps, each pthread_t
will be running several pollers and is
likely to have messages sent to it.
Suppose this happens on a pthread_t
:
poller_a
callspthread_mutex_lock(&g_lock)
, then returns.poller_b
callspthread_mutex_lock(&g_lock)
.
What happens when pthread_b
tries to take that lock is undefined behavior
(UB) in most cases in SPDK. It could be any of the following:
- If
g_lock
is initialized as a normal mutex,pthread_mutex_lock()
never returns: this is deadlock. - If
g_lock
is initialized as an errorcheck mutex, an error is returned. - If
g_lock
is initialized as a recursive mutex, the lock succeeds.
Most mutexes in SPDK are initialized with default attributes. Default is what leads to undefined behavior. In practice, today default means it is a normal mutex. So, in most cases today’s SPDK code would deadlock in this scenario. Surely that’s not very helpful.
While a small minority of the pthread_mutex_lock()
calls check their return
value, none of them are initialized with PTHREAD_MUTEX_ERRORCHECK
. Thus,
checking the return value is mostly pointless. Note that errorcheck mutexes can
be very helpful and are useful for building functions like mutex_held()
.
There are some mutexes that are initialized with PTHREAD_MUTEX_RECURSIVE
. In
the flow described above poller_a
and poller_b
could both hold the lock, but
only if they are scheduled on the same pthread_t
. If allowed to happen, this
could defeat the purpose of having a mutex. The risk can be avoided by ensuring
that the lock is not held when switching pollers or message functions.
While this covers pthread_mutex_t
, it is largely the same for
pthread_spinlock_t
, except pthread spinlocks do not have an errorcheck or
recursive attribute.
Spinlocks are a better fit
In the best case, a mutex is not contended and can be locked with a simple
atomic operation. If there is contention on the mutex, this leads to a system
call that puts the pthread_t
to sleep until the lock is available. This
becomes somewhat inefficient in the typical SPDK application because a failure
to obtain the lock has the overhead of a couple system calls: the one for the
waiter and one for the thread releasing the lock. Since SPDK typically consumes
100% of each CPU thread that has runnable SPDK pollers, putting the pthread_t
to sleep does not free up resources that will be used by other threads or
processes. Even if it does allow something else to occupy that CPU thread, this
may be undesirable because it may introduce scheduling latency when the lock
becomes available for the waiting thread.
Spinlocks are obtained with atomic operations without the assistance of the kernel. Thus, when the first attempt to take a lock fails, the next try is just a few instructions later.
Some pthread implementations support adaptive mutexes. An adaptive mutex will act like a spinlock for a brief period, but then act like a mutex if a brief period of spinning cannot obtain the lock. This may be a better middle ground, but is not widely available.
SPDK spinlock implementation
The spdk_spinlock
implementation is a wrapper around POSIX spinlocks. The
core aspects of SPDK spinlocks are:
- Each
spdk_spinlock
structure contains a link to thespdk_thread
that holds the lock. This implies that aspdk_spinlock
can only be locked from an SPDK thread. This value is updated only while the associatedpthread_spinlock_t
is held. - Include error checks and
abort()
if any error fails. All errors are programming errors, not runtime failures. Once a programming error involving locking has been identified, it is not safe to continue. - Provide
spdk_spin_held()
so that it may be used liberally asassert(spdk_spin_held())
as a way to check expectations on debug builds and as a form of documentation that cannot get out of sync with the implementation.
An early success story
Shortly after implementing SPDK spinlocks, the problem described in #2831 was reported. That is, there was a failed assertion while the blobstore was allocating metadata pages:
bs_claim_md_page: Assertion `spdk_bit_array_get(bs->used_md_pages, page) == false' failed.
This code had no comments or logic indicating how it should be called, so I first added a check to see if it was only called on the blobstore metadata thead.
diff --git a/lib/blob/blobstore.c b/lib/blob/blobstore.c
index 635eb99ef..32de508da 100644
--- a/lib/blob/blobstore.c
+++ b/lib/blob/blobstore.c
@@ -73,6 +73,7 @@ bs_get_snapshot_entry(struct spdk_blob_store *bs, spdk_blob_id blobid)
static void
bs_claim_md_page(struct spdk_blob_store *bs, uint32_t page)
{
+ assert(spdk_get_thread() == bs->md_thread);
assert(page < spdk_bit_array_capacity(bs->used_md_pages));
assert(spdk_bit_array_get(bs->used_md_pages, page) == false);
This quickly showed that it was called from at least one other thread. I then took a few minutes to convert the lock to an SPDK spinlock.
Both used_md_pages
and used_clusters
tend to be updated near each other, and
may be updated while holding bs->used_clusters_mutex
. I did just a little more
work than a global search and replace to convert this mutex to an SPDK spinlock.
That was:
- In
blobstore.c
,s/pthread_mutex/pthread_spin/g
- In
blobstore.c
, remove the second argument (NULL
) fromspdk_spin_init()
(waspthread_mutex_init()
prior to step 1). - In
blobstore.c
,s/used_clusters_lock/used_lock/g
- In the declaration of
struct spdk_blob_store
inblobstore.h
, changepthread_mutex_t used_clusters_mutex
tospdk_spin_lock used_lock
.
I then updated the assertion I added earlier to:
diff --git a/lib/blob/blobstore.c b/lib/blob/blobstore.c
index 635eb99ef..496dbe4c4 100644
--- a/lib/blob/blobstore.c
+++ b/lib/blob/blobstore.c
@@ -73,6 +73,7 @@ bs_get_snapshot_entry(struct spdk_blob_store *bs, spdk_blob_id blobid)
static void
bs_claim_md_page(struct spdk_blob_store *bs, uint32_t page)
{
+ assert(spdk_mutex_held(&bs->used_lock));
assert(page < spdk_bit_array_capacity(bs->used_md_pages));
assert(spdk_bit_array_get(bs->used_md_pages, page) == false);
I added the same assertion in bs_release_md_page()
, bs_claim_cluster()
, and
bs_release_cluster()
.
I used the fio-based test described in
#2831 to find a bunch of places that
pthread_spin_lock()
was missing, then submitted the change to gerrit to allow
SPDK’s CI to find a few more.
Once I decided to convert the pthread_mutex_t
to an spdk_spinlock
, it took
less than 15 minutes to have my first backtrace that pointed to a missing
spdk_spin_lock()
.