lmbench - benchmarking toolbox

Author

       Carl Staelin and Larry McVoy

       Comments, suggestions, and bug reports are always welcome.

(c)1998-2000 Larry McVoy and Carl Staelin            $Date:$                                          LMBENCH(3)

Description

       Creating benchmarks using the lmbench timing harness is easy.  Since it is so easy to measure performance
       using lmbench, it is possible to quickly answer questions that arise during system design,  development,
       or tuning.  For example, image processing

       There  are  two attributes that are critical for performance, latency and bandwidth, and lmbench's timing
       harness makes it easy to measure  and  report  results  for  both.   Latency  is  usually  important  for
       frequently executed operations, and bandwidth is usually important when moving large chunks of data.
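
       As a rough illustration of the two metrics, the sketch below converts a measured interval into a
       per-operation latency and a bandwidth figure.  The helper names usec_per_op and mb_per_sec are
       hypothetical and are not part of lmbench, and treating a megabyte as 1024 * 1024 bytes is an
       assumption made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of lmbench. */

/* Latency: elapsed microseconds divided by the number of operations. */
double usec_per_op(uint64_t elapsed_usec, uint64_t ops)
{
        assert(ops > 0);
        return (double)elapsed_usec / (double)ops;
}

/*
 * Bandwidth: bytes moved per second, expressed in megabytes.
 * Assumption for this sketch: 1 MB = 1024 * 1024 bytes.
 */
double mb_per_sec(uint64_t bytes, uint64_t elapsed_usec)
{
        assert(elapsed_usec > 0);
        double seconds = (double)elapsed_usec / 1e6;
        return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}
```

       For instance, 500 operations completed in 1000 microseconds give 2 microseconds per operation, and
       8 MB moved in one second give 8 MB per second.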

       There are a number of factors to consider when building benchmarks.

       The  timing  harness  requires  that  the  benchmarked operation be idempotent so that it can be repeated
       indefinitely.

       The timing subsystem, benchmp, is passed up to three function pointers.  Some benchmarks may need as  few
       as one function pointer (for benchmark).

       void benchmp(initialize, benchmark, cleanup, enough, parallel, warmup, repetitions, cookie)
              measures the performance of benchmark repeatedly and reports the median result.  benchmp creates
              parallel sub-processes which run benchmark in parallel.  This allows lmbench to measure the
              system's ability to scale as the number of client processes increases.  Each sub-process calls
              initialize with iterations set to 0 before starting the benchmarking cycle.  It then calls
              initialize, benchmark, and cleanup, with iterations set to the number of iterations in the
              timing loop, several times in order to collect repetitions results.  Each call to benchmark is
              surrounded by start and stop calls that time how long it takes to do the benchmarked operation
              iterations times.  After all the benchmark results have been collected, cleanup is called with
              iterations set to 0 to clean up any resources which may have been allocated by initialize or
              benchmark.  cookie is a void pointer to a hunk of memory that can be used to store any
              parameters or state needed by the benchmark.

       void* benchmp_getstate()
              returns a void pointer to the lmbench-internal state used during benchmarking.  The state is not
              to be used or accessed directly by clients, but rather would be passed into benchmp_interval.

       iter_t benchmp_interval(void* state)
              returns the number of times the benchmark should execute its benchmark loop during this timing
              interval.  This is used only for unusual benchmarks which cannot implement the benchmark body in
              a function which can return, such as the page fault handler.  Please see lat_sig.c for sample
              usage.

       uint64 get_n()
              returns the number of times loop_body was executed during the timing interval.

       void milli(char *s, uint64 n)
              print out the time per operation in milli-seconds.  n is  the  number  of  operations  during  the
              timing  interval,  which  is  passed  as  a  parameter  because each loop_body can contain several
              operations.

       void micro(char *s, uint64 n)
              print the time per operation in micro-seconds.

       void nano(char *s, uint64 n)
              print the time per operation in nano-seconds.

       void mb(uint64 bytes)
              print the bandwidth in megabytes per second.

       void kb(uint64 bytes)
              print the bandwidth in kilobytes per second.
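
       The calling sequence described for benchmp can be sketched with a small stand-in harness.  This is
       not the lmbench implementation: toy_benchmp, bench_f, median, and count_iterations are hypothetical
       names, and the sketch assumes a single process, a caller-supplied iteration count, no warmup, and a
       wall-clock timer from timespec_get.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

typedef unsigned long iter_t;
typedef void (*bench_f)(iter_t iterations, void *cookie);

/* Hypothetical stand-in for illustration; not the lmbench implementation. */

static int cmp_double(const void *a, const void *b)
{
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
}

/* Median of n values; sorts the array in place. */
double median(double *v, int n)
{
        assert(n > 0);
        qsort(v, (size_t)n, sizeof(v[0]), cmp_double);
        return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/* Wall-clock time in microseconds (C11 timespec_get). */
static double now_usec(void)
{
        struct timespec ts;
        timespec_get(&ts, TIME_UTC);
        return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/* Returns the median elapsed microseconds over all repetitions. */
double toy_benchmp(bench_f initialize, bench_f benchmark, bench_f cleanup,
                   iter_t iterations, int repetitions, void *cookie)
{
        double *results, med;
        int r;

        assert(benchmark != NULL && repetitions > 0);
        results = malloc(sizeof(double) * (size_t)repetitions);
        if (results == NULL)
                exit(1);

        if (initialize)
                initialize(0, cookie);          /* one-time setup */
        for (r = 0; r < repetitions; r++) {
                double start, stop;

                if (initialize)
                        initialize(iterations, cookie);
                start = now_usec();
                benchmark(iterations, cookie);  /* the timed region */
                stop = now_usec();
                results[r] = stop - start;
                if (cleanup)
                        cleanup(iterations, cookie);
        }
        if (cleanup)
                cleanup(0, cookie);             /* final teardown */

        med = median(results, repetitions);
        free(results);
        return med;
}

/* Example benchmark body: counts every iteration in *cookie. */
void count_iterations(iter_t iterations, void *cookie)
{
        unsigned long *total = cookie;

        while (iterations-- > 0)
                (*total)++;
}
```

       Calling toy_benchmp(NULL, count_iterations, NULL, 1000, 5, &total) with total initially 0 runs the
       benchmark body five times with 1000 iterations each, mirroring the case where only the benchmark
       function pointer is supplied.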

Futures

       Development of lmbench is continuing.

Name

       lmbench - benchmarking toolbox

Synopsis

       #include "lmbench.h"

       typedef u_long iter_t

       typedef void (*benchmp_f)(iter_t iterations, void* cookie)

       void benchmp(benchmp_f initialize, benchmp_f benchmark, benchmp_f cleanup, int enough, int parallel,
                    int warmup, int repetitions, void* cookie)

       uint64 get_n()

       void milli(char *s, uint64 n)

       void micro(char *s, uint64 n)

       void nano(char *s, uint64 n)

       void mb(uint64 bytes)

       void kb(uint64 bytes)

Using Lmbench

       Here is an example of a simple benchmark that  measures  the  latency  of  the  random  number  generator
       lrand48():

              #include "lmbench.h"

              void
              benchmark_lrand48(iter_t iterations, void* cookie)
              {
                      while (iterations-- > 0)
                              lrand48();
              }

              int
              main(int argc, char *argv[])
              {
                      benchmp(NULL, benchmark_lrand48, NULL, 0, 1, 0, TRIES, NULL);
                      micro("lrand48()", get_n());
                      exit(0);
              }

       Here is a simple benchmark that measures and reports the bandwidth of bcopy:

              #include "lmbench.h"

              #define MB (1024 * 1024)
              #define SIZE (8 * MB)

              struct _state {
                      int size;
                      char* a;
                      char* b;
              };

              void
              initialize_bcopy(iter_t iterations, void* cookie)
              {
                      struct _state* state = (struct _state*)cookie;

                      if (!iterations) return;
                      state->a = malloc(state->size);
                      state->b = malloc(state->size);
                      if (state->a == NULL || state->b == NULL)
                              exit(1);
              }

              void
              benchmark_bcopy(iter_t iterations, void* cookie)
              {
                      struct _state* state = (struct _state*)cookie;

                      while (iterations-- > 0)
                              bcopy(state->a, state->b, state->size);
              }

              void
              cleanup_bcopy(iter_t iterations, void* cookie)
              {
                      struct _state* state = (struct _state*)cookie;

                      if (!iterations) return;
                      free(state->a);
                      free(state->b);
              }

              int
              main(int argc, char *argv[])
              {
                      struct _state state;

                      state.size = SIZE;
                      benchmp(initialize_bcopy, benchmark_bcopy, cleanup_bcopy, 0, 1, 0, TRIES, &state);
                      mb(get_n() * state.size);
                      exit(0);
              }

       A  slightly  more  complex version of the bcopy benchmark might measure bandwidth as a function of memory
       size and parallelism.  The main procedure in this case might look something like this:

              int
              main(int argc, char *argv[])
              {
                      int size, par;
                      struct _state state;

                      for (size = 64; size <= SIZE; size <<= 1) {
                              for (par = 1; par < 32; par <<= 1) {
                                      state.size = size;
                                      benchmp(initialize_bcopy, benchmark_bcopy, cleanup_bcopy, 0, par, 0, TRIES, &state);
                                      fprintf(stderr, "%d\t%d\t", size, par);
                                      mb(par * get_n() * state.size);
                              }
                      }
                      exit(0);
              }

Variables

       There are three environment variables that can be used to modify the lmbench  timing  subsystem:  ENOUGH,
       TIMING_O, and LOOP_O.
