lmbench - benchmarking toolbox
Description
Creating benchmarks using the lmbench timing harness is easy. Since it is so easy to measure performance
using lmbench, it is possible to quickly answer questions that arise during system design, development,
or tuning. For example, image processing
There are two attributes that are critical for performance, latency and bandwidth, and lmbench's timing
harness makes it easy to measure and report results for both. Latency is usually important for
frequently executed operations, and bandwidth is usually important when moving large chunks of data.
There are a number of factors to consider when building benchmarks.
The timing harness requires that the benchmarked operation be idempotent so that it can be repeated
indefinitely.
The timing subsystem, benchmp, is passed up to three function pointers. Some benchmarks may need as few
as one function pointer (for benchmark).
void benchmp(initialize, benchmark, cleanup, enough, parallel, warmup, repetitions, cookie)
measures the performance of benchmark repeatedly and reports the median result. benchmp creates
parallel sub-processes which run benchmark in parallel. This allows lmbench to measure the
system's ability to scale as the number of client processes increases. Each sub-process executes
initialize with iterations set to 0 before starting the benchmarking cycle. It then calls
initialize, benchmark, and cleanup, with iterations set to the number of iterations in the timing
loop, several times in order to collect repetitions results. The calls to benchmark are surrounded
by calls to start and stop, which time how long it takes to do the benchmarked operation
iterations times. After all the benchmark results have been collected, cleanup is called with
iterations set to 0 to clean up any resources which may have been allocated by initialize or
benchmark. cookie is a void pointer to a hunk of memory that can be used to store any parameters
or state that is needed by the benchmark.
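The call sequence described above can be sketched in plain C. This is a simplified, single-process illustration and not lmbench's implementation: sketch_benchmp, sketch_fn, sketch_iter_t, and count_calls are hypothetical names, the sketch ignores enough, parallel, and warmup, and it times each interval with the standard clock() function (CPU time) rather than lmbench's own timers.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-ins for lmbench's iter_t and benchmp_f. */
typedef unsigned long sketch_iter_t;
typedef void (*sketch_fn)(sketch_iter_t iterations, void *cookie);

/* qsort comparator for doubles, ascending. */
static int
compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;

    return (x > y) - (x < y);
}

/*
 * Run `benchmark` for `repetitions` timed intervals of `iterations`
 * iterations each and return the median interval length in seconds.
 * Returns -1.0 if the result buffer cannot be allocated.
 */
double
sketch_benchmp(sketch_fn initialize, sketch_fn benchmark, sketch_fn cleanup,
               sketch_iter_t iterations, int repetitions, void *cookie)
{
    double *results, median;
    int i;

    assert(benchmark != NULL && repetitions > 0);
    results = malloc((size_t)repetitions * sizeof(*results));
    if (results == NULL)
        return -1.0;

    if (initialize)
        initialize(0, cookie);              /* first call: iterations == 0 */
    for (i = 0; i < repetitions; i++) {
        clock_t start, stop;

        if (initialize)
            initialize(iterations, cookie);
        start = clock();
        benchmark(iterations, cookie);      /* only this call is timed */
        stop = clock();
        if (cleanup)
            cleanup(iterations, cookie);
        results[i] = (double)(stop - start) / CLOCKS_PER_SEC;
    }
    if (cleanup)
        cleanup(0, cookie);                 /* last call: iterations == 0 */

    /* Report the median of the collected results. */
    qsort(results, (size_t)repetitions, sizeof(*results), compare_doubles);
    median = (repetitions % 2)
        ? results[repetitions / 2]
        : (results[repetitions / 2 - 1] + results[repetitions / 2]) / 2.0;
    free(results);
    return median;
}

/* Example benchmark body: counts executed iterations in the cookie. */
void
count_calls(sketch_iter_t iterations, void *cookie)
{
    unsigned long *calls = cookie;

    while (iterations-- > 0)
        (*calls)++;
}
```

With an unsigned long counter initialized to 0 and passed as the cookie, sketch_benchmp(NULL, count_calls, NULL, 1000, 5, &counter) leaves the counter at 5000, because the body runs 1000 iterations in each of 5 intervals.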
void *benchmp_getstate()
returns a void pointer to the lmbench-internal state used during benchmarking. The state is not
to be used or accessed directly by clients, but rather should be passed into benchmp_interval.
iter_t benchmp_interval(void *state)
returns the number of times the benchmark should execute its benchmark loop during this timing
interval. This is used only for weird benchmarks which cannot implement the benchmark body in a
function which can return, such as the page fault handler. Please see lat_sig.c for sample usage.
uint64 get_n()
returns the number of times loop_body was executed during the timing interval.
void milli(char *s, uint64 n)
prints the time per operation in milliseconds. n is the number of operations during the
timing interval, which is passed as a parameter because each loop_body can contain several
operations.
void micro(char *s, uint64 n)
prints the time per operation in microseconds.
void nano(char *s, uint64 n)
prints the time per operation in nanoseconds.
void mb(uint64 bytes)
prints the bandwidth in megabytes per second.
void kb(uint64 bytes)
prints the bandwidth in kilobytes per second.
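The reporting functions differ only in the unit they scale to. The arithmetic can be sketched as follows, assuming the elapsed time of the interval is available in seconds. All function names here are hypothetical, the sketch returns values instead of printing them, and the 1024-based kilobyte and megabyte are an assumption that mirrors the MB macro in the bcopy example rather than anything confirmed about lmbench's internals.

```c
#include <assert.h>

/* Time per operation, scaled to a unit given as units per second. */
static double
time_per_op(double elapsed_seconds, unsigned long long n,
            double units_per_second)
{
    assert(n > 0);
    return elapsed_seconds * units_per_second / (double)n;
}

/* Milliseconds per operation. */
double
sketch_milli(double elapsed_seconds, unsigned long long n)
{
    return time_per_op(elapsed_seconds, n, 1e3);
}

/* Microseconds per operation. */
double
sketch_micro(double elapsed_seconds, unsigned long long n)
{
    return time_per_op(elapsed_seconds, n, 1e6);
}

/* Nanoseconds per operation. */
double
sketch_nano(double elapsed_seconds, unsigned long long n)
{
    return time_per_op(elapsed_seconds, n, 1e9);
}

/* Bandwidth in (1024-based) kilobytes per second. */
double
sketch_kb(double elapsed_seconds, unsigned long long bytes)
{
    assert(elapsed_seconds > 0.0);
    return (double)bytes / 1024.0 / elapsed_seconds;
}

/* Bandwidth in (1024 * 1024-based) megabytes per second. */
double
sketch_mb(double elapsed_seconds, unsigned long long bytes)
{
    assert(elapsed_seconds > 0.0);
    return (double)bytes / (1024.0 * 1024.0) / elapsed_seconds;
}
```

If the loop body performs several operations per iteration, n should be the iteration count multiplied by the operations per iteration, so that the reported figure is per operation rather than per loop pass.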
Futures
Development of lmbench is continuing.
Name
lmbench - benchmarking toolbox
See Also
lmbench(8), timing(3), reporting(3), results(3).
Synopsis
#include "lmbench.h"
typedef u_long iter_t
typedef void (*benchmp_f)(iter_t iterations, void *cookie)
void benchmp(benchmp_f initialize, benchmp_f benchmark, benchmp_f cleanup, int enough, int parallel, int warmup, int repetitions, void *cookie)
uint64 get_n()
void milli(char *s, uint64 n)
void micro(char *s, uint64 n)
void nano(char *s, uint64 n)
void mb(uint64 bytes)
void kb(uint64 bytes)
Using Lmbench
Here is an example of a simple benchmark that measures the latency of the random number generator
lrand48():
#include "lmbench.h"

void
benchmark_lrand48(iter_t iterations, void *cookie)
{
    while (iterations-- > 0)
        lrand48();
}

int
main(int argc, char *argv[])
{
    benchmp(NULL, benchmark_lrand48, NULL, 0, 1, 0, TRIES, NULL);
    micro("lrand48()", get_n());
    exit(0);
}
Here is a simple benchmark that measures and reports the bandwidth of bcopy:
#include "lmbench.h"

#define MB (1024 * 1024)
#define SIZE (8 * MB)

struct _state {
    int size;
    char *a;
    char *b;
};

void
initialize_bcopy(iter_t iterations, void *cookie)
{
    struct _state *state = (struct _state *)cookie;

    if (!iterations) return;
    state->a = malloc(state->size);
    state->b = malloc(state->size);
    if (state->a == NULL || state->b == NULL)
        exit(1);
}

void
benchmark_bcopy(iter_t iterations, void *cookie)
{
    struct _state *state = (struct _state *)cookie;

    while (iterations-- > 0)
        bcopy(state->a, state->b, state->size);
}

void
cleanup_bcopy(iter_t iterations, void *cookie)
{
    struct _state *state = (struct _state *)cookie;

    if (!iterations) return;
    free(state->a);
    free(state->b);
}

int
main(int argc, char *argv[])
{
    struct _state state;

    state.size = SIZE;
    benchmp(initialize_bcopy, benchmark_bcopy, cleanup_bcopy, 0, 1, 0, TRIES, &state);
    mb(get_n() * state.size);
    exit(0);
}
A slightly more complex version of the bcopy benchmark might measure bandwidth as a function of memory
size and parallelism. The main procedure in this case might look something like this:
int
main(int argc, char *argv[])
{
    int size, par;
    struct _state state;

    for (size = 64; size <= SIZE; size <<= 1) {
        for (par = 1; par < 32; par <<= 1) {
            state.size = size;
            benchmp(initialize_bcopy, benchmark_bcopy, cleanup_bcopy, 0, par, 0, TRIES, &state);
            fprintf(stderr, "%d\t%d\t", size, par);
            mb(par * get_n() * state.size);
        }
    }
    exit(0);
}
Variables
There are three environment variables that can be used to modify the lmbench timing subsystem: ENOUGH,
TIMING_O, and LOOP_O.
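A program can read environment variables such as these with the standard getenv function. The sketch below shows one way to parse an integer-valued variable with a fallback. The helper names parse_long_or and env_long_or are hypothetical, and treating a variable such as ENOUGH as a plain decimal integer is an assumption made for illustration, since the meaning and format of each variable are not described here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse `text` as a decimal long. Return `fallback` if the text is
 * missing, empty, out of range, or followed by trailing characters.
 */
long
parse_long_or(const char *text, long fallback)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return fallback;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return fallback;
    return value;
}

/* Look up the environment variable `name` and parse it as a long. */
long
env_long_or(const char *name, long fallback)
{
    assert(name != NULL);
    return parse_long_or(getenv(name), fallback);
}
```

A caller might write env_long_or("ENOUGH", 0) and treat 0 as "not set", leaving the harness default in place.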
