        VUCAKO Bench
        ============

Bench is a simple benchmark program for computers that are at least
32-bit. It consists of several tasks designed to measure the
performance of the CPU, FPU, main memory, memory cache and file-system.
Bench is an "Open Source" program; see "copyright.txt" for details.

Individual results are identified by a "cpu-description" string
and a "clock" value - the effective number of CPU MHz (additive on
parallel systems).
  The actual result file is "results.txt"; new results are appended
to this file. If the test suite is run more than once, the result file
holds only the best times. It is recommended to run the program 2 to
4 times to reduce the effect of various fluctuations (especially
on Win32 systems, where system-time measurement is poor).
  The "clock" and "cpu-description" records are stored in the "cpu.txt"
file. The program reads only the first record: the 1st item holds
an e-mail address (and is ignored), the 2nd item is the "clock",
and the rest of the line is interpreted as the "cpu-description".
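
The record layout described above can be read with a few lines of C.
This is only a sketch: the function and structure names are
hypothetical, and it assumes that the items are separated by blanks
and that the clock may be written as a decimal number.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one "cpu.txt" record. */
typedef struct {
    double clock_mhz;        /* 2nd item: effective CPU MHz */
    char   description[256]; /* rest of the line */
} CpuRecord;

/* Parses one line of the assumed form "e-mail clock description...".
   Returns 1 on success, 0 if the clock item is missing. */
int parse_cpu_record(const char *line, CpuRecord *rec)
{
    int used = 0;
    size_t len;

    /* %*s skips the e-mail item, %lf reads the clock,
       %n stores the number of characters consumed so far. */
    if (sscanf(line, "%*s %lf%n", &rec->clock_mhz, &used) != 1)
        return 0;

    /* Skip the blanks in front of the description. */
    line += used;
    while (*line == ' ' || *line == '\t')
        line++;

    /* Copy the rest of the line without the line terminator. */
    len = strcspn(line, "\r\n");
    if (len >= sizeof rec->description)
        len = sizeof rec->description - 1;
    memcpy(rec->description, line, len);
    rec->description[len] = '\0';
    return 1;
}
```

Only the first line of the file would be passed to this function,
since the program reads only the first record.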

REMARK: on parallel systems (with a parallelizing compiler!) the
*cumulative* number of MHz should be used. However, I am not
sure whether any of the test routines are parallelizable!

Results of test runs are printed in the form: "elapsed time: TIME,
(act RATIO %), ref: REF". TIME is the total CPU time in seconds,
RATIO is the ratio of the actual time to that of another result
record, and REF is a reference number which factors out the nominal
processor clock speed (in MHz). Architectures with different clock
speeds can be compared using this REF number (smaller values are
better).
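
This file does not give the formulas behind RATIO and REF. The sketch
below shows one plausible reading, purely as an assumption: REF as
TIME multiplied by "clock" (i.e. millions of clock cycles spent), and
RATIO as the actual time in percent of the other record's time. Both
function names are hypothetical.

```c
/* ASSUMPTION, not confirmed by the program: REF = TIME * clock.
   With this reading, a machine that needs the same number of clock
   cycles gets the same REF regardless of its clock speed. */
double ref_number(double time_s, double clock_mhz)
{
    return time_s * clock_mhz;
}

/* ASSUMPTION: RATIO = actual time as a percentage of another time. */
double ratio_percent(double time_s, double other_time_s)
{
    return 100.0 * time_s / other_time_s;
}
```

Under this reading, 10 seconds at 200 MHz and 5 seconds at 400 MHz
would give the same REF value.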

If you get results on some interesting or new architectures,
please send me the "results.txt" and "cpu.txt" files:

mailto:Josef.Pelikan@mff.cuni.cz
http://sun3.ms.mff.cuni.cz/~pepca/bench/


List of tasks in version 1.003:
-------------------------------

 1 (mask 0x0001): Eratosthenes
                  ------------
   Finds the 5555555-th prime number.
   Performance: CPU, memory system
   Memory: 12.5 MB
   Disk: no
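
   A minimal sketch of a bit-packed sieve that finds the n-th prime.
   It is not the benchmark's own code; the function name and the
   one-bit-per-number layout are assumptions.

```c
#include <stdlib.h>
#include <stdint.h>

/* Returns the n-th prime (counting from 1) that does not exceed
   `limit`, or 0 if there is no such prime or memory is exhausted. */
uint32_t nth_prime(uint32_t n, uint32_t limit)
{
    /* One bit per number; a set bit marks a composite number. */
    uint8_t *sieve = calloc((size_t)limit / 8 + 1, 1);
    uint32_t count = 0, result = 0;

    if (sieve == NULL)
        return 0;

    for (uint64_t i = 2; i <= limit; i++) {
        if (sieve[i >> 3] & (1u << (i & 7)))
            continue;                       /* i is composite */
        if (++count == n) {
            result = (uint32_t)i;
            break;
        }
        /* Cross out the multiples of i, starting at i * i. */
        for (uint64_t j = i * i; j <= limit; j += i)
            sieve[j >> 3] |= (uint8_t)(1u << (j & 7));
    }
    free(sieve);
    return result;
}
```

   There are 5761455 primes below 100000000, so that limit is enough
   for the 5555555-th prime. At one bit per number it needs 12.5
   million bytes, which would agree with the memory figure above.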

 2 (mask 0x0002): Transposition of 16MB image
                  ---------------------------
   Transposes a 4096x4096 byte-image in main memory (96 times).
   Performance: CPU, memory system
   Memory: 32 MB
   Disk: no
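
   The core operation can be sketched as follows. The 32 MB memory
   figure for a 16 MB image suggests separate source and destination
   buffers, but that is an assumption, as is the function name.

```c
#include <stddef.h>

/* Out-of-place transpose of an n x n byte image stored row by row:
   the pixel at row y, column x moves to row x, column y. */
void transpose_bytes(const unsigned char *src, unsigned char *dst,
                     size_t n)
{
    for (size_t y = 0; y < n; y++)
        for (size_t x = 0; x < n; x++)
            dst[x * n + y] = src[y * n + x];
}
```

   Reads are sequential here while writes jump by n bytes, which is
   why such a loop stresses the memory system rather than the ALU.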

 3 (mask 0x0004): Transposition of 64MB image
                  ---------------------------
   Transposes an 8192x8192 byte-image in main memory (24 times). A very
     time-consuming task on systems with <=128MB RAM!
   Performance: CPU, memory system, virtual memory system
   Memory: 128 MB !
   Disk: in case of memory swapping only

 4 (mask 0x0008): Arbitrage sequence lookup
                  -------------------------
   Finds the best arbitrage-sequence in a set of 12 currencies.
     The exchange rates are defined by a 12x12 double matrix; the
     maximum length of an arbitrage sequence is 12.
   Performance: CPU, FPU
   Memory: <3 KB
   Disk: no
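
   One way to picture the search is an exhaustive depth-first walk
   over closed exchange cycles. The sketch below is an assumption
   about the task, not the benchmark's code: "best" is taken to mean
   the largest product of rates around a cycle, each currency is used
   at most once, and NCUR is reduced to 4 to keep the example small.

```c
#define NCUR 4   /* the benchmark uses 12 currencies */

static double best_gain;

/* Extends a path start -> ... -> cur; `visited` is a bit set of the
   currencies already on the path, `product` the rate product. */
static void search(double rate[NCUR][NCUR], int start, int cur,
                   unsigned visited, double product)
{
    if (cur != start) {
        /* Close the cycle by converting back to the start currency. */
        double gain = product * rate[cur][start];
        if (gain > best_gain)
            best_gain = gain;
    }
    for (int next = 0; next < NCUR; next++) {
        if (visited & (1u << next))
            continue;
        search(rate, start, next, visited | (1u << next),
               product * rate[cur][next]);
    }
}

/* Returns the largest product of rates over all closed cycles;
   a value above 1.0 means an arbitrage opportunity exists. */
double best_arbitrage(double rate[NCUR][NCUR])
{
    best_gain = 0.0;
    for (int s = 0; s < NCUR; s++)
        search(rate, s, s, 1u << s, 1.0);
    return best_gain;
}
```

   The work is dominated by double multiplications and comparisons on
   a tiny data set, which fits the "CPU, FPU" profile above.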

 5 (mask 0x0010): Needle-throwing simulation
                  --------------------------
   Runs a Monte-Carlo simulation of the "needle-throw" experiment: a
     needle of length A falls onto a regular infinite pattern of parallel
     lines spaced A apart. The goal is to determine the probability of
     a needle-line intersection.
   Performance: CPU, FPU
   Memory: <1 KB
   Disk: no
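
   A minimal sketch of such a simulation follows. The random number
   generator (a small xorshift) and the function names are assumptions
   chosen to keep the example reproducible; the benchmark may do this
   differently. For needle length equal to line distance, the exact
   probability is 2/pi (about 0.6366).

```c
#include <stdint.h>

/* xorshift32 step; returns a value in (0, 1). */
static double uniform01(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x / 4294967296.0;
}

/* Estimates the probability that a needle of length A crosses one of
   the parallel lines spaced A apart. */
double needle_hit_probability(long throws, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;  /* state must be non-zero */
    long hits = 0;

    for (long i = 0; i < throws; i++) {
        /* d = 2 * (distance of needle centre to nearest line) / A */
        double d = uniform01(&state);
        double u, v, r2;

        /* Random direction: a point in the unit quarter disc has a
           uniformly distributed angle, so no sin() or pi is needed. */
        do {
            u = uniform01(&state);
            v = uniform01(&state);
            r2 = u * u + v * v;
        } while (r2 > 1.0);

        /* Hit when d <= sin(angle) = v / sqrt(r2); compare squares. */
        if (d * d * r2 <= v * v)
            hits++;
    }
    return (double)hits / (double)throws;
}
```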

 6 (mask 0x0020): Fast memory-copy
                  ----------------
   The fastest memory-copy operation (the "memcpy()" routine in libc)
     is performed on large arrays (16MB each).
   Performance: CPU, memory system
   Memory: 32 MB
   Disk: no
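
   The measurement can be sketched as below. The function name, the
   fill pattern and the use of clock() are assumptions; the repeat
   count used by the benchmark is not stated in this file.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copies `size` bytes `repeats` times with memcpy() and returns the
   CPU time used in seconds, or -1.0 on failure. */
double time_memcpy(size_t size, int repeats)
{
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    clock_t t0, t1;
    int ok;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    /* Touch both buffers so page allocation is not timed. */
    memset(src, 0xA5, size);
    memset(dst, 0, size);

    t0 = clock();
    for (int i = 0; i < repeats; i++)
        memcpy(dst, src, size);
    t1 = clock();

    /* Using the result keeps the copy from being optimized away
       entirely and verifies it at the same time. */
    ok = memcmp(src, dst, size) == 0;
    free(src);
    free(dst);
    return ok ? (double)(t1 - t0) / CLOCKS_PER_SEC : -1.0;
}
```

   A call such as time_memcpy(16u << 20, 10) would copy 16 MB arrays
   as in the task description.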

 7 (mask 0x0040): Monte-Carlo form-factor
                  -----------------------
   Monte-Carlo computation of the form-factor between parallel,
     equal-sized rectangles. 50 million rays are shot from one rectangle
     to the other.
   Performance: CPU, FPU
   Memory: <1 KB
   Disk: no

 8 (mask 0x0080): Merge-sort on disk
                  ------------------
   Merge-sort of a large disk file (a double[8M] array of random
     numbers). Segments of double[1024] are pre-sorted in memory; a
     "Merge-and-split" routine is used, for which 4 disk files of 32MB
     each are allocated.
   Performance: CPU, file-system
   Memory: 8 KB
   Disk: 128 MB in 4 files

 9 (mask 0x0100): Wavelet transform in memory
                  ---------------------------
   A 1D T-S lifting transform is performed on int32[] arrays of various
     sizes in memory. An equal number of arithmetic operations is used in
     every stage (for every array size), so the partial times should
     reflect the efficiency of the memory cache system.
   Performance: CPU, memory system, memory cache
   Memory: 16 MB max. (16MB, 8MB, 4MB, ... 8KB)
   Disk: no
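
   To illustrate what an integer lifting stage looks like, the sketch
   below implements the plain S transform (integer Haar lifting). It is
   a simpler stand-in, not the T-S transform used by the benchmark, and
   all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* floor(d / 2), also for negative d (C division truncates to zero). */
static int32_t floor_half(int32_t d)
{
    return d / 2 - (d % 2 < 0);
}

/* One forward lifting stage on x[0..n-1], n even.
   Output: s[0..n/2-1] (averages), d[0..n/2-1] (differences). */
void s_forward(const int32_t *x, int32_t *s, int32_t *d, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        d[i] = x[2 * i + 1] - x[2 * i];        /* predict step */
        s[i] = x[2 * i] + floor_half(d[i]);    /* update step  */
    }
}

/* Inverse stage: undoes the lifting steps in reverse order, so the
   original integers are recovered exactly. */
void s_inverse(const int32_t *s, const int32_t *d, int32_t *x, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        x[2 * i]     = s[i] - floor_half(d[i]);
        x[2 * i + 1] = x[2 * i] + d[i];
    }
}
```

   Each stage is a single sequential pass over the array, so the time
   per element depends mainly on which cache level the array fits in.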

10 (mask 0x0200): Wavelet transform on disk
                  -------------------------
   A 1D T-S lifting transform is performed on large disk files (double[]
     type). Lifting and unlifting use only sequential data access. An
     equal number of arithmetic operations is used in every stage (for
     every array size), so the partial times should reflect the
     efficiency of the disk cache system.
   Performance: CPU, file-system, disk cache
   Memory: <1 KB
   Disk: 128 MB max. (128MB, 64MB, ... 1MB) in 3 files

11 (mask 0x0400): Adaptive K-D tree on disk
                  -------------------------
   A 2D adaptive K-D tree is used for storing point data objects. Each
     object occupies 512 bytes of disk space; the bucket (leaf node) size
     is 4KB (i.e. 8 objects). 150000 operations (55% insertions, 45%
     searches) are performed. The disk file is cached in 8MB of main
     memory.
   Performance: CPU, file-system, disk cache
   Memory: 8 MB
   Disk: 57.3 MB max.
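
The masks in the task list above follow a simple pattern: task number
N has the single bit N-1 set. The helpers below sketch this; their
names are hypothetical, and the idea that several tasks are selected
by OR-ing their masks together is an assumption, since this file does
not describe how a mask is passed to the program.

```c
/* Bit mask of task number `task` (1-based), as in the list above. */
unsigned task_mask(int task)
{
    return 1u << (task - 1);
}

/* Non-zero if `task` is selected in the combined mask `mask`. */
int task_selected(unsigned mask, int task)
{
    return (mask & task_mask(task)) != 0;
}
```

Under that assumption, tasks 1, 4 and 5 together would correspond to
0x0001 | 0x0008 | 0x0010 = 0x0019.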
