This programming guide is intended for developers authoring their own block device modules to integrate with SPDK's bdev layer. For a guide on how to use the bdev layer, see Block Device Layer Programming Guide.
A block device module is SPDK's equivalent of a device driver in a traditional operating system. The module provides a set of function pointers that are called to service block device I/O requests. SPDK provides a number of block device modules including NVMe, RAM-disk, and Ceph RBD. However, some users will want to write their own to interact with either custom hardware or to an existing storage software stack. This guide is intended to demonstrate exactly how to write a module.
Block device modules are located in subdirectories under lib/bdev today. It is not currently possible to place the code for a bdev module elsewhere, but updates to the build system could be made to enable this in the future. To create a module, add a new directory with a single C file and a Makefile. A great starting point is to copy the existing 'null' bdev module.
The primary interface that bdev modules will interact with is in include/spdk/bdev_module.h. In that header a macro is defined that registers a new bdev module - SPDK_BDEV_MODULE_REGISTER. This macro take as argument a pointer spdk_bdev_module structure that is used to register new bdev module.
The spdk_bdev_module structure describes the module properties like initialization (
module_init) and teardown (
module_fini) functions, the function that returns context size (
get_ctx_size) - scratch space that will be allocated in each I/O request for use by this module, and a callback that will be called each time a new bdev is registered by another module (
examine_disk). Please check the documentation of struct spdk_bdev_module for more details.
New bdevs are created within the module by calling spdk_bdev_register(). The module must allocate a struct spdk_bdev, fill it out appropriately, and pass it to the register call. The most important field to fill out is
fn_table, which points at this data structure:
The bdev module must implement these function callbacks.
destruct function is called to tear down the device when the system no longer needs it. What
destruct does is up to the module - it may just be freeing memory or it may be shutting down a piece of hardware.
io_type_supported function returns whether a particular I/O type is supported. The available I/O types are:
For the simplest bdev modules, only
SPDK_BDEV_IO_TYPE_WRITE are necessary.
SPDK_BDEV_IO_TYPE_UNMAP is often referred to as "trim" or "deallocate", and is a request to mark a set of blocks as no longer containing valid data.
SPDK_BDEV_IO_TYPE_FLUSH is a request to make all previously completed writes durable. Many devices do not require flushes.
SPDK_BDEV_IO_TYPE_WRITE_ZEROES is just like a regular write, but does not provide a data buffer (it would have just contained all 0's). If it isn't supported, the generic bdev code is capable of emulating it by sending regular write requests.
SPDK_BDEV_IO_TYPE_RESET is a request to abort all I/O and return the underlying device to its initial state. Do not complete the reset request until all I/O has been completed in some way.
SPDK_BDEV_IO_TYPE_NVME_IO_MD are all mechanisms for passing raw NVMe commands through the SPDK bdev layer. They're strictly optional, and it probably only makes sense to implement those if the backing storage device is capable of handling NVMe commands.
get_io_channel function should return an I/O channel. For a detailed explanation of I/O channels, see Message Passing and Concurrency. The generic bdev layer will call
get_io_channel one time per thread, cache the result, and pass that result to
submit_request. It will use the corresponding channel for the thread it calls
submit_request function is called to actually submit I/O requests to the block device. Once the I/O request is completed, the module must call spdk_bdev_io_complete(). The I/O does not have to finish within the calling context of
Block devices are considered virtual if they handle I/O requests by routing the I/O to other block devices. The canonical example would be a bdev module that implements RAID. Virtual bdevs are created in the same way as regular bdevs, but take one additional step. The module can look up the underlying bdevs it wishes to route I/O to using spdk_bdev_get_by_name(), where the string name is provided by the user in a configuration file or via an RPC. The module then may proceed is normal by opening the bdev to obtain a descriptor, and creating I/O channels for the bdev (probably in response to the
get_io_channel callback). The final step is to have the module use its open descriptor to call spdk_bdev_module_claim_bdev(), indicating that it is consuming the underlying bdev. This prevents other users from opening descriptors with write permissions. This effectively 'promotes' the descriptor to write-exclusive and is an operation only available to bdev modules.