The NVMe over Fabrics target is a user space application that presents block devices over the network using RDMA. It requires an RDMA-capable NIC with its corresponding OFED software package installed to run. The target should work on all flavors of RDMA, but it is currently tested against Mellanox NICs (RoCEv2) and Chelsio NICs (iWARP).
The NVMe over Fabrics specification defines subsystems that can be exported over the network. SPDK has chosen to call the software that exports these subsystems a "target", which is the term used for iSCSI. The specification refers to the "client" that connects to the target as a "host". Many people will also refer to the host as an "initiator", which is the equivalent thing in iSCSI parlance. SPDK will try to stick to the terms "target" and "host" to match the specification.
The Linux kernel also implements an NVMe-oF target and host, and SPDK is tested for interoperability with the Linux kernel implementations.
If you want to terminate the application with a signal, use SIGTERM so that the application releases all of its shared memory resources before exiting. SIGKILL gives the application no chance to release its shared memory resources, so you may need to release them manually.
This guide starts by assuming that you can already build the standard SPDK distribution on your platform. By default, the NVMe over Fabrics target is not built. To build nvmf_tgt there are some additional dependencies.
Then build SPDK with RDMA enabled:
Once built, the binary will be in
Before starting our NVMe-oF target we must load the InfiniBand and RDMA modules that allow userspace processes to use InfiniBand/RDMA verbs directly.
Before starting our NVMe-oF target we must detect RDMA NICs and assign them IP addresses.
An NVMe over Fabrics target can be configured using JSON RPCs. The basic RPCs needed to configure the NVMe-oF subsystem are detailed below. More information about working with NVMe over Fabrics specific RPCs can be found on the NVMe-oF Target RPC page.
Using .ini style configuration files to configure the NVMe-oF target is deprecated; use JSON-based RPCs instead. .ini style configuration files can be converted to JSON format by way of the new script
Start the nvmf_tgt application with elevated privileges. Once the target is started, the nvmf_create_transport RPC can be used to initialize a given transport. Below is an example where the target is started and the RDMA transport is configured with an I/O unit size of 8192 bytes, a maximum of 4 qpairs per controller, and an in-capsule data size of 0 bytes.
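As a rough illustration of the settings described above, the following Python sketch builds a JSON-RPC 2.0 request body for nvmf_create_transport. It only constructs and prints the request; it does not contact a running target. The parameter names (`trtype`, `io_unit_size`, `max_qpairs_per_ctrlr`, `in_capsule_data_size`) and the helper name `build_create_transport_request` are assumptions made for this sketch and should be checked against the NVMe-oF Target RPC page.

```python
import json


def build_create_transport_request(request_id=1):
    """Build a JSON-RPC 2.0 request dict for nvmf_create_transport.

    The parameter names below are assumptions for illustration;
    verify them against the NVMe-oF Target RPC documentation.
    """
    params = {
        "trtype": "RDMA",              # transport type
        "io_unit_size": 8192,          # I/O unit size in bytes
        "max_qpairs_per_ctrlr": 4,     # max qpairs per controller
        "in_capsule_data_size": 0,     # in-capsule data size in bytes
    }
    return {
        "jsonrpc": "2.0",
        "method": "nvmf_create_transport",
        "id": request_id,
        "params": params,
    }


request = build_create_transport_request()
print(json.dumps(request, indent=2))
```

Keeping the request as a plain dictionary makes it easy to inspect or adjust the values before handing it to whatever RPC client you use.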
Below is an example of creating a malloc bdev and assigning it to a subsystem. Adjust the bdevs, NQN, serial number, and IP address to your own circumstances.
NVMe qualified names or NQNs are defined in section 7.9 of the NVMe specification. SPDK has attempted to formalize that definition using Extended Backus-Naur form. SPDK modules use this formal definition (provided below) when validating NQNs.
Please note that the following types from the definition above are defined elsewhere:
While not stated in the formal definition, SPDK enforces the requirement from the spec that the "maximum name is 223 bytes in length". SPDK does not count the null terminating character when measuring the length of an NQN, and will accept an NQN containing up to 223 valid bytes followed by a null terminator. To be precise, SPDK follows the same convention as the C standard library function strlen().
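The length rule above can be sketched in Python as follows. This is not SPDK's validation code; it only illustrates the 223-byte limit, counted the way strlen() counts (bytes, excluding the terminator). The function name `nqn_length_ok` and the sample NQN are hypothetical, and the check covers length only, not the rest of the formal definition.

```python
NQN_MAX_LEN = 223  # maximum NQN length in bytes, excluding the null terminator


def nqn_length_ok(nqn: str) -> bool:
    """Return True if the NQN fits within 223 bytes.

    The length is measured in encoded bytes, not characters, and the
    null terminator is not counted (the strlen() convention).
    """
    return len(nqn.encode("utf-8")) <= NQN_MAX_LEN


# Hypothetical sample NQN used only for illustration.
sample = "nqn.2016-06.io.spdk:cnode1"
print(nqn_length_ok(sample))
print(nqn_length_ok("a" * 223))
print(nqn_length_ok("a" * 224))
```

Note that the limit applies to bytes: a name containing multi-byte UTF-8 characters reaches the limit with fewer than 223 characters.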
SPDK compares NQNs byte for byte, without case folding or Unicode normalization. This has specific implications for UUID-based NQNs. The following pair of NQNs, for example, would not match when compared in the SPDK NVMe-oF Target:
To ensure the consistency of UUID-based NQNs while using SPDK, users should use lowercase when representing alphabetic hex digits in their NQNs.
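A minimal Python sketch of this behaviour, assuming the `nqn.2014-08.org.nvmexpress:uuid:` form for UUID-based NQNs. The helper names `nqns_match` and `lowercase_uuid_nqn` and the UUID value are hypothetical; the comparison mimics a byte-for-byte check rather than reproducing SPDK's code.

```python
def nqns_match(a: str, b: str) -> bool:
    """Byte-for-byte comparison: no case folding, no Unicode normalization."""
    return a.encode("utf-8") == b.encode("utf-8")


def lowercase_uuid_nqn(nqn: str) -> str:
    """Lowercase only the UUID portion of a UUID-based NQN.

    NQNs without a ":uuid:" marker are returned unchanged.
    """
    prefix, marker, uuid_part = nqn.rpartition(":uuid:")
    if not marker:
        return nqn
    return prefix + marker + uuid_part.lower()


# Hypothetical UUID, shown in upper and lower case.
upper = "nqn.2014-08.org.nvmexpress:uuid:11111111-AAAA-BBBB-CCCC-123456789ABC"
lower = "nqn.2014-08.org.nvmexpress:uuid:11111111-aaaa-bbbb-cccc-123456789abc"

print(nqns_match(upper, lower))                      # False
print(nqns_match(lowercase_uuid_nqn(upper), lower))  # True
```

Lowercasing the UUID once, at the point where the NQN is created or entered, avoids mismatches later.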
SPDK uses the DPDK Environment Abstraction Layer to gain access to hardware resources such as huge memory pages and CPU core(s). DPDK EAL provides functions to assign threads to specific cores. To ensure the SPDK NVMe-oF target has the best performance, configure the NICs and NVMe devices to be located on the same NUMA node.
The -m core mask option specifies a bit mask of the CPU cores that SPDK is allowed to execute work items on. For example, to allow SPDK to use cores 24, 25, 26 and 27:
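The mask value for a given set of cores can be worked out by setting one bit per core number. The following Python sketch does this for cores 24 to 27; the function name `core_mask` is hypothetical and the snippet only computes the value, it does not start the target.

```python
def core_mask(cores):
    """Return a bit mask with one bit set for each CPU core number."""
    mask = 0
    for core in cores:
        mask |= 1 << core  # bit N corresponds to core N
    return mask


mask = core_mask([24, 25, 26, 27])
print(hex(mask))  # 0xf000000
```

The resulting hexadecimal value is what would be passed to the -m option.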
Both the Linux kernel and SPDK implement an NVMe over Fabrics host. The Linux kernel NVMe-oF RDMA host support is provided by the
The nvme-cli tool may be used to interface with the Linux kernel NVMe over Fabrics host.
SPDK has a tracing framework for capturing low-level event information at runtime. NVMe-oF Target Tracepoints enable analysis of both performance and application crashes.
Because RDMA NICs limit the number of memory regions that can be registered, the SPDK NVMe-oF target application may eventually start failing to allocate more DMA-able memory. This is an imperfection of the DPDK dynamic memory management and is most likely to occur when too many 2MB hugepages are reserved at runtime. Some of our NICs report as many as 2048 for the maximum number of memory regions, meaning that at most that many pages can be allocated. With 2MB hugepages, this gives us a 4GB memory limit. The limit can be overcome by using 1GB hugepages or by pre-reserving memory at application startup with the -s option. All pre-reserved memory will be registered as a single region, but it won't be returned to the system until the SPDK application is terminated.
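The arithmetic behind the 4GB figure can be checked with a short Python sketch. It assumes one registered memory region per hugepage, as described above; the function name `max_registerable_bytes` is hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def max_registerable_bytes(max_regions: int, hugepage_size: int) -> int:
    """Upper bound on registered memory when each hugepage is one region."""
    return max_regions * hugepage_size


limit_2mb = max_registerable_bytes(2048, 2 * MIB)
limit_1gb = max_registerable_bytes(2048, 1 * GIB)

print(limit_2mb // GIB, "GiB with 2MB hugepages")   # 4 GiB
print(limit_1gb // GIB, "GiB with 1GB hugepages")   # 2048 GiB
```

With the same region limit, 1GB hugepages raise the ceiling by a factor of 512, which is why switching page size is one of the suggested workarounds.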