fi_verbs - The Verbs Fabric Provider
Limitations
Memory Regions
Only FI_MR_BASIC mode is supported. Adding regions via a scatter/gather (s/g) list is supported only for
an s/g list size of 1. Binding memory regions to a counter is not supported.
Wait objects
Only the FI_WAIT_FD wait object is supported, and only for the FI_EP_MSG endpoint type. Wait sets are not
supported.
Resource Management
The application must ensure that CQs are not overrun, as the provider cannot detect this.
Unsupported Features
The following features are not supported in the verbs provider:
Unsupported Capabilities
FI_NAMED_RX_CTX, FI_DIRECTED_RECV, FI_TRIGGER, FI_RMA_EVENT
Other unsupported features
Scalable endpoints, FABRIC_DIRECT
Unsupported features specific to MSG endpoints
• Counters, FI_SOURCE, FI_TAGGED, FI_PEEK, FI_CLAIM, fi_cancel, fi_ep_alias, shared TX context,
cq_readfrom operations.
• Completion flags are not reported if a request posted to an endpoint completes in error.
Fork
The support for fork in the provider has the following limitations:
• Fabric resources like endpoint, CQ, EQ, etc. should not be used in the forked process.
• The memory registered using fi_mr_reg has to be page aligned since ibv_reg_mr marks the entire page
that a memory region belongs to as not to be re-mapped when the process is forked (MADV_DONTFORK).
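A minimal sketch of one way to obtain a buffer that satisfies the page-alignment requirement above. The helper names round_up_to_page and alloc_page_aligned are hypothetical and not part of libfabric; the sketch assumes a POSIX system and leaves out the fi_mr_reg call itself.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <unistd.h>

/* Round len up to a whole number of pages of size page_size. */
static size_t round_up_to_page(size_t len, size_t page_size)
{
    return (len + page_size - 1) / page_size * page_size;
}

/*
 * Hypothetical helper: allocate a buffer that starts on a page boundary and
 * spans whole pages, so that no unrelated data shares a page with it.
 * On success, stores the rounded-up length in *alloc_len and returns the
 * buffer; returns NULL on failure. Release the buffer with free().
 */
static void *alloc_page_aligned(size_t len, size_t *alloc_len)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;

    if (page <= 0 || len == 0)
        return NULL;

    *alloc_len = round_up_to_page(len, (size_t)page);
    if (posix_memalign(&buf, (size_t)page, *alloc_len) != 0)
        return NULL;
    return buf;
}
```

The returned pointer and the rounded-up length would then be passed to fi_mr_reg. Rounding the length up as well as aligning the start keeps other heap data off the registered pages.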
XRC Transport
The XRC transport is intended to be used when layered with the RXM provider and requires the use of
shared receive contexts. See fi_rxm(7). To enable XRC, the following environment variables must usually
be set: FI_VERBS_PREFER_XRC and FI_OFI_RXM_USE_SRX.
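The two variables can also be set from inside the application, as sketched below. The helper name request_xrc_transport is hypothetical. The sketch assumes that "1" is accepted as the enabling value for both variables and that libfabric reads them during initialization, so the helper would need to run before the first libfabric call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>

/*
 * Hypothetical helper: request XRC by setting the two variables named above.
 * The last argument of setenv() is 0, so a value the user has already
 * exported is left untouched.
 * Assumption: "1" is accepted as the "enabled" value for both variables.
 * Returns 0 on success, -1 if setenv() fails.
 */
static int request_xrc_transport(void)
{
    if (setenv("FI_VERBS_PREFER_XRC", "1", 0) != 0)
        return -1;
    if (setenv("FI_OFI_RXM_USE_SRX", "1", 0) != 0)
        return -1;
    return 0;
}
```

Not overwriting existing values lets a user still disable XRC from the shell without rebuilding the application.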
Name
fi_verbs - The Verbs Fabric Provider
Overview
The verbs provider enables applications using OFI to be run over any verbs hardware (InfiniBand, iWARP,
etc). It uses the Linux Verbs API for network transport and provides a translation of OFI calls to
appropriate verbs API calls. It uses librdmacm for communication management and libibverbs for other
control and data transfer operations.
Requirements
To build and install the verbs provider as part of libfabric, the following packages are required:
• libibverbs
• libibverbs-devel
• librdmacm
• librdmacm-devel
You may also want to consult any OS-specific instructions for enabling RDMA. For example, the RHEL
documentation has instructions for enabling RDMA.
The IPoIB interface should be configured with a valid IP address. This is a requirement from librdmacm.
Runtime Parameters
The verbs provider checks for the following environment variables.
Common variables:
FI_VERBS_TX_SIZE
Default maximum tx context size (default: 384)
FI_VERBS_RX_SIZE
Default maximum rx context size (default: 384)
FI_VERBS_TX_IOV_LIMIT
Default maximum tx iov_limit (default: 4). Note: RDM (internal - deprecated) EP type supports only 1
FI_VERBS_RX_IOV_LIMIT
Default maximum rx iov_limit (default: 4). Note: RDM (internal - deprecated) EP type supports only 1
FI_VERBS_INLINE_SIZE
Maximum inline size for the verbs device. Actual inline size returned may be different depending
on device capability. This value will be returned by fi_info as the inject size for the
application to use. Set to 0 for the maximum device inline size to be used. (default: 256).
FI_VERBS_MIN_RNR_TIMER
Set min_rnr_timer QP attribute (0 - 31) (default: 12)
FI_VERBS_CQREAD_BUNCH_SIZE
The number of entries to be read from the verbs completion queue at a time (default: 8).
FI_VERBS_PREFER_XRC
Prioritize XRC transport fi_info before RC transport fi_info (default: 0, RC fi_info will be
before XRC fi_info)
FI_VERBS_GID_IDX
The GID index to use (default: 0)
FI_VERBS_DEVICE_NAME
Specify a specific verbs device to use by name
FI_VERBS_USE_DMABUF
If supported, try to use ibv_reg_dmabuf_mr first to register dmabuf-based buffers. Set it to “no”
to always use ibv_reg_mr which can be helpful for testing the functionality of the dmabuf_peer_mem
hooking provider and the corresponding kernel driver. (default: yes)
Variables specific to MSG endpoints
FI_VERBS_IFACE
The prefix or the full name of the network interface associated with the verbs device (default:
ib)
Variables specific to DGRAM endpoints
FI_VERBS_DGRAM_USE_NAME_SERVER
Enables or disables the OFI Name Server (NS) thread. The NS thread is used to resolve IP addresses
to provider-specific addresses (default: 1, if the “OMPI_COMM_WORLD_RANK” and “PMI_RANK”
environment variables aren’t defined)
FI_VERBS_NAME_SERVER_PORT
The port on which the Name Server thread listens for incoming connections and requests (default: 5678)
Environment variables notes
The fi_info utility gives up-to-date information on environment variables: fi_info -p verbs -e
See Also
fabric(7), fi_provider(7)
Supported Features
The verbs provider supports a subset of OFI features.
Endpoint types
FI_EP_MSG, FI_EP_DGRAM (beta), FI_EP_RDM.
FI_EP_RDM is supported via the OFI RxM and RxD utility providers, which are layered on top of verbs. To
the app, the provider name string would appear as “verbs;ofi_rxm” or “verbs;ofi_rxd”. Please refer to the
man pages for RxM (fi_rxm.7) and RxD (fi_rxd.7) to learn about the capabilities and limitations of the
FI_EP_RDM endpoint.
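An application that wants to tell whether it ended up on verbs, directly or through a utility provider, can inspect the provider name string. The sketch below assumes the name comes from the prov_name field of the fabric attributes returned by fi_getinfo and follows the “core;utility” form shown above. The helper names is_verbs_based and utility_layer are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true when prov_name names the verbs provider,
 * either alone ("verbs") or as the core of a layered provider
 * ("verbs;ofi_rxm", "verbs;ofi_rxd").
 */
static bool is_verbs_based(const char *prov_name)
{
    static const char core[] = "verbs";
    size_t n = sizeof(core) - 1;

    if (prov_name == NULL || strncmp(prov_name, core, n) != 0)
        return false;
    /* The core name must end here or be followed by the ';' separator. */
    return prov_name[n] == '\0' || prov_name[n] == ';';
}

/* Return the utility-provider part after ';', or NULL if not layered. */
static const char *utility_layer(const char *prov_name)
{
    const char *sep = prov_name ? strchr(prov_name, ';') : NULL;

    return sep ? sep + 1 : NULL;
}
```

Checking for the separator rather than a bare prefix match avoids treating an unrelated provider whose name merely starts with "verbs" as a match.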
Endpoint capabilities and features
MSG endpoints
FI_MSG, FI_RMA, FI_ATOMIC and shared receive contexts.
DGRAM endpoints
FI_MSG
Modes
The verbs provider requires applications to support the following modes:
FI_EP_MSG endpoint type
• FI_MR_LOCAL mr mode.
• FI_RX_CQ_DATA for applications that want to use RMA. Applications are responsible for posting
receives for any incoming CQ data.
Addressing Formats
Supported addressing formats include:
• MSG and RDM (internal - deprecated) EPs support: FI_SOCKADDR, FI_SOCKADDR_IN, FI_SOCKADDR_IN6,
FI_SOCKADDR_IB
• DGRAM supports: FI_ADDR_IB_UD
Progress
Verbs provider supports FI_PROGRESS_AUTO: Asynchronous operations make forward progress automatically.
Operation flags
Verbs provider supports FI_INJECT, FI_COMPLETION, FI_REMOTE_CQ_DATA, FI_TRANSMIT_COMPLETE.
Msg Ordering
The verbs provider supports the following message ordering:
• Read after Read
• Read after Write
• Read after Send
• Write after Write
• Write after Send
• Send after Write
• Send after Send
Fork
Verbs provider does not provide fork safety by default. Fork safety can be requested by setting
IBV_FORK_SAFE, or RDMAV_FORK_SAFE. If the system configuration supports the use of huge pages, it is
recommended to set RDMAV_HUGEPAGES_SAFE. See ibv_fork_init(3) for additional details.
Memory Registration Cache
The verbs provider uses the common memory registration cache functionality that’s part of libfabric
utility code. This speeds up memory registration calls from applications by caching registrations of
frequently used memory regions. Please refer to fi_mr(3): Memory Registration Cache section for more
details.
Troubleshooting / Known Issues
fi_getinfo returns -FI_ENODATA
• Set FI_LOG_LEVEL=info or FI_LOG_LEVEL=debug (if a debug build of libfabric is available) and check
whether there are any errors caused by incorrect input parameters to fi_getinfo.
• Check if “fi_info -p verbs” is successful. If that fails, the following checklist may help in ensuring
that the RDMA verbs stack is functional:
  • If libfabric was compiled, check if verbs provider was built. Building verbs provider would be
  skipped if its dependencies (listed in requirements) aren’t available on the system.
  • Verify verbs device is functional:
    • Does ibv_rc_pingpong (available in libibverbs) test work?
    • Does ibv_devinfo (available in libibverbs) show the device with PORT_ACTIVE status?
    • Check if Subnet Manager (SM) is running on the switch or on one of the nodes in the cluster.
    • Is the cable connected?
  • Verify librdmacm is functional:
    • Does ucmatose test (available in librdmacm) work?
    • Is the IPoIB interface (e.g. ib0) up and configured with a valid IP address?
Other issues
When running an app over the verbs provider with Valgrind, there may be reports of memory leaks in
functions from dependent libraries (e.g. libibverbs, librdmacm). These leaks are safe to ignore.
The provider protects against CQ overruns that may happen when more TX operations are posted to endpoints
than the CQ size. On the receive side, the CQ is not expected to overrun. If it does, the application
developer should take care not to post excess receives without draining the CQ. CQ overruns can make the
MSG endpoints unusable.
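One way for an application to avoid posting excess receives is to keep a simple credit count per CQ, as sketched below. The struct cq_credits type and its functions are hypothetical application-side bookkeeping, not part of libfabric or the verbs provider. The sketch assumes every posted operation produces exactly one completion and that a single thread does the accounting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one CQ; not part of the libfabric API. */
struct cq_credits {
    size_t cq_size;      /* number of entries the CQ was opened with */
    size_t outstanding;  /* operations posted but not yet reaped */
};

static void cq_credits_init(struct cq_credits *c, size_t cq_size)
{
    c->cq_size = cq_size;
    c->outstanding = 0;
}

/* Call before posting: returns false if one more post could overrun the CQ. */
static bool cq_credits_take(struct cq_credits *c)
{
    if (c->outstanding >= c->cq_size)
        return false;
    c->outstanding++;
    return true;
}

/* Call after reading n completions from the CQ. */
static void cq_credits_return(struct cq_credits *c, size_t n)
{
    c->outstanding -= (n > c->outstanding) ? c->outstanding : n;
}
```

An application would call cq_credits_take before each fi_recv and, when it returns false, drain the CQ with fi_cq_read and pass the number of entries read to cq_credits_return before posting again.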
