The dispatch_benchmark() function executes the given block count times and then returns the average
number of nanoseconds per execution. This function is intended for debugging and performance analysis
work. For the best results, pass a high count value to dispatch_benchmark(). When
benchmarking concurrent code, please compare the serial version of the code against the concurrent
version, and compare the concurrent version on different classes of hardware. Please look for inflection
points with various data sets and keep the following facts in mind:
• Code bound by computational bandwidth may be inferred from proportional changes in performance as
concurrency is increased.
• Code bound by memory bandwidth may be inferred from negligible changes in performance as
concurrency is increased.
• Code bound by critical sections may be inferred from retrograde changes in performance as
concurrency is increased.
    ◦ Intentional: locks, mutexes, and condition variables.
    ◦ Accidental: unrelated and frequently modified data on the same cache-line.
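
The measurement and the "accidental" case above can be sketched in portable C. This is an illustration only, not the libdispatch implementation: bench_avg_ns() is a hypothetical stand-in that mimics the documented behavior of dispatch_benchmark() using a function pointer instead of a block, and the struct names, the INCREMENTS workload, and the 64-byte CACHE_LINE value are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict -std=c11 */
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define CACHE_LINE 64      /* assumed cache-line size; check the target hardware */
#define INCREMENTS 100000  /* hypothetical per-thread workload */

/* Hypothetical stand-in for dispatch_benchmark(): calls fn(ctx) count times
 * and returns the average elapsed nanoseconds per call. */
static uint64_t bench_avg_ns(size_t count, void (*fn)(void *), void *ctx)
{
    struct timespec t0, t1;

    if (count == 0)
        return 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < count; i++)
        fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    int64_t ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
               + (int64_t)(t1.tv_nsec - t0.tv_nsec);
    return (uint64_t)ns / count;
}

/* Two unrelated counters that are likely to land on the same cache line. */
struct shared_line {
    volatile uint64_t a;
    volatile uint64_t b;
};

/* The same counters, each aligned to its own (assumed) cache line. */
struct padded {
    alignas(CACHE_LINE) volatile uint64_t a;
    alignas(CACHE_LINE) volatile uint64_t b;
};

/* Thread body: repeatedly modifies one counter that no other thread touches. */
static void *bump(void *arg)
{
    volatile uint64_t *counter = arg;

    for (int i = 0; i < INCREMENTS; i++)
        (*counter)++;
    return NULL;
}

/* Runs two threads concurrently, one per counter, and waits for both. */
static void run_pair(volatile uint64_t *a, volatile uint64_t *b)
{
    pthread_t ta, tb;
    int rc;

    rc = pthread_create(&ta, NULL, bump, (void *)a);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, bump, (void *)b);
    assert(rc == 0);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
}

/* Workloads in the shape expected by bench_avg_ns(). */
static void bench_shared(void *ctx)
{
    struct shared_line *s = ctx;
    run_pair(&s->a, &s->b);
}

static void bench_padded(void *ctx)
{
    struct padded *p = ctx;
    run_pair(&p->a, &p->b);
}
```

Calling bench_avg_ns(count, bench_shared, &s) and bench_avg_ns(count, bench_padded, &p) with the same count gives two averages to compare. On hardware where the two unpadded counters do share a cache line, the padded layout would be expected to run faster, but the size of the gap depends on the machine and should be measured rather than assumed. With libdispatch, the same workloads would instead be wrapped in a block and passed to dispatch_benchmark().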