futhark-cuda - compile Futhark to CUDA
Copyright
2013-2020, DIKU, University of Copenhagen
0.25.27 Mar 02, 2025 FUTHARK-CUDA(1)
Description
futhark cuda translates a Futhark program to C code invoking CUDA kernels, and either compiles that C
code with a C compiler to an executable binary program, or produces a .h and .c file that can be linked
with other code. The standard Futhark optimisation pipeline is used.
futhark cuda uses -lcuda -lcudart -lnvrtc to link. If using --library, you will need to pass the same
flags when linking the final binary.
The generated CUDA code can be called from multiple CPU threads, as it brackets every API operation with
cuCtxPushCurrent() and cuCtxPopCurrent().
Environment
If run without --library, futhark cuda will invoke a C compiler to compile the generated C program into a
binary. This only works if the C compiler can find the necessary CUDA libraries. On most systems, CUDA
is installed in /usr/local/cuda, which is usually not part of the default compiler search path. You may
need to set the following environment variables before running futhark cuda:
LIBRARY_PATH=/usr/local/cuda/lib64
LD_LIBRARY_PATH=/usr/local/cuda/lib64/
CPATH=/usr/local/cuda/include
At runtime the generated program must be able to find the CUDA installation directory, which is normally
located at /usr/local/cuda. If you have CUDA installed elsewhere, set any of the CUDA_HOME, CUDA_ROOT,
or CUDA_PATH environment variables to the proper directory.
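The lookup described above can be sketched in C. This is a minimal illustration, not the generated runtime's code: the helper names find_cuda_home and cuda_include_option are hypothetical, and the order in which the three variables are checked (CUDA_HOME, then CUDA_ROOT, then CUDA_PATH), treating empty values as unset, and turning the directory into an NVRTC include option are all assumptions.

```c
/* Expose POSIX setenv()/unsetenv() in <stdlib.h> for callers that want to
 * exercise this helper by changing the environment. */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: locate the CUDA installation directory.
 * ASSUMPTIONS: the variables are checked in the order CUDA_HOME, CUDA_ROOT,
 * CUDA_PATH, and an empty value counts as unset. */
const char *find_cuda_home(void) {
    static const char *const vars[] = {"CUDA_HOME", "CUDA_ROOT", "CUDA_PATH"};
    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *dir = getenv(vars[i]);
        if (dir != NULL && strlen(dir) > 0) {
            return dir;
        }
    }
    /* None of the variables is set: use the conventional location. */
    return "/usr/local/cuda";
}

/* Hypothetical helper: format "-I<cuda_home>/include" into buf, as one might
 * pass to NVRTC so that kernels can find the CUDA headers (ASSUMPTION: the
 * real runtime may use the directory differently). Returns 0 on success and
 * -1 if the option did not fit in buf. */
int cuda_include_option(char *buf, size_t size) {
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size, "-I%s/include", find_cuda_home());
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With none of the variables set, find_cuda_home() falls back to /usr/local/cuda; setting only CUDA_PATH=/opt/cuda makes cuda_include_option() produce -I/opt/cuda/include.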
Environment Variables
CC
The C compiler used to compile the program. Defaults to cc if unset.
CFLAGS
Space-separated list of options passed to the C compiler. Defaults to -O -std=c99 if unset.
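How CC and CFLAGS combine with the link flags listed in the Description can be sketched as follows. The defaults cc and -O -std=c99 are the documented ones; the helper build_compile_command is hypothetical, and the exact argument order of the real compiler invocation is an assumption.

```c
/* Expose POSIX setenv()/unsetenv() in <stdlib.h> for callers that want to
 * exercise this helper by changing the environment. */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of environment variable name, or def if it is unset. */
static const char *env_or_default(const char *name, const char *def) {
    const char *val = getenv(name);
    return val != NULL ? val : def;
}

/* Hypothetical: build the command used to compile generated_c into binary.
 * ASSUMPTION: the argument order and the placement of the CUDA link flags
 * are illustrative, not the exact command futhark cuda runs.
 * Returns 0 on success and -1 if the command did not fit in buf. */
int build_compile_command(char *buf, size_t size,
                          const char *generated_c, const char *binary) {
    assert(buf != NULL && size > 0);
    const char *cc = env_or_default("CC", "cc");
    const char *cflags = env_or_default("CFLAGS", "-O -std=c99");
    int n = snprintf(buf, size, "%s %s %s -o %s -lcuda -lcudart -lnvrtc",
                     cc, cflags, generated_c, binary);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With CC and CFLAGS unset this yields cc -O -std=c99 prog.c -o prog -lcuda -lcudart -lnvrtc; setting CC=clang and CFLAGS=-O3 replaces only the first two parts.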
Executable Options
Generated executables accept the same options as those generated by futhark-c. The -t option behaves as
it does for futhark-opencl.
The following additional options are accepted.
-h, --help
Print help text to standard output and exit.
--default-thread-block-size=INT
The default size of thread blocks that are launched. Capped to the hardware limit if necessary.
--default-num-thread-blocks=INT
The default number of thread blocks that are launched.
--default-threshold=INT
The default parallelism threshold used for comparisons when selecting between code versions
generated by incremental flattening. Intuitively, the amount of parallelism needed to saturate
the GPU.
--default-tile-size=INT
The default tile size used when performing two-dimensional tiling (the thread block size will be the
square of the tile size).
--dump-cuda=FILE
Don’t run the program, but instead dump the embedded CUDA kernels to the indicated file. Useful
if you want to see what is actually being executed.
--dump-ptx=FILE
Don’t run the program, but instead dump the PTX-compiled version of the embedded kernels to the
indicated file.
--load-cuda=FILE
Instead of using the embedded CUDA kernels, load them from the indicated file.
--load-ptx=FILE
Instead of compiling the embedded CUDA kernels, load precompiled PTX code from the indicated file.
--nvrtc-option=OPT
Add an additional build option to the string passed to NVRTC. Refer to the CUDA documentation for
which options are supported. Be careful: some options can easily lead to incorrect results.
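The arithmetic behind --default-thread-block-size, --default-threshold, and --default-tile-size can be sketched in plain C. The function names are hypothetical and do not correspond to the generated runtime; in particular, treating "parallelism >= threshold" as the selection test is an assumption about how incremental flattening compares against the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* --default-thread-block-size: the requested size is capped to the device
 * limit. hw_max would come from the device (for example via
 * cuDeviceGetAttribute with CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK);
 * here it is simply a parameter. */
int64_t cap_thread_block_size(int64_t requested, int64_t hw_max) {
    assert(requested > 0 && hw_max > 0);
    return requested < hw_max ? requested : hw_max;
}

/* --default-threshold: incremental flattening generates several versions
 * of a parallel construct and picks one at run time. ASSUMPTION: the
 * version exploiting only outer parallelism is chosen when that
 * parallelism reaches the threshold, i.e. is enough to saturate the GPU. */
bool use_outer_parallel_version(int64_t outer_parallelism, int64_t threshold) {
    return outer_parallelism >= threshold;
}

/* --default-tile-size: two-dimensional tiling uses square thread blocks,
 * so each block holds tile_size * tile_size threads. */
int64_t tiled_block_size(int64_t tile_size) {
    assert(tile_size > 0);
    return tile_size * tile_size;
}
```

For example, a requested block size of 2048 on a device limited to 1024 threads per block is capped to 1024, and a tile size of 16 gives 256-thread blocks.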
Name
futhark-cuda - compile Futhark to CUDA
Options
Accepts the same options as futhark-c.
See Also
futhark-opencl
Synopsis
futhark cuda [options…] <program.fut>
