Raid Performance Test User Manual Release 1.0 - 21 June 1995

This software was developed using the Sun Solaris 2.4 x86 operating system on a Dell Dimension XPS Pentium 90 computer. It was built with SunPro's ProWorks/Teamware 1.00, ProWorks 2.01, ProCompiler-C 2.01, and ProCompiler-C++ 3.01 under OpenLook 3.4 OpenWin XWindows, SunOS 5.4. Rpt is posix1 compliant except for the curses screen library and the time facilities. This system has very limited posix time facilities (sleep/nanosleep with a minimum delay of 1s and times with a minimum measurement resolution of 10ms), so the functions select and gethrtime are used instead. Posix1/posix4 time facilities are used when compiled without the __solaris__ flag.

The following assumes the disk array is mounted under /data and uses the data file raidtest.dat. If the array is mounted elsewhere, create a symbolic link to it: su; cd /; ln -s mountpoint data. You can create the data file with rpt by specifying the data file size with the -B bytes argument. Check the space available under the mountpoint with df -k.

The Objective

The rpt software processes an input workload test and parameter file and applies io load to a disk array by launching "transact" child processes or control threads. These transact threads report back messages to rpt concerning response times and throughput. Rpt collects these results, creating and destroying threads to measure response time and throughput vs. load and compute rabmarks. It sequences through the workload tests, producing run, log, and report files.

The software tests io speeds. Local disks are typically limited by media access speeds rather than channel communication speeds, while the opposite is true for remote disks. Indeed, the limited timing facilities of this system require the parameters to be adjusted so that this is the case, as the channel communication speeds are too fast to be measured accurately. For example, a local disk with a 12ms average seek time can only support up to 83 iops in the randomized limit.

The io response time is defined to be the time from initiation of a seek to the completion of the io inclusive. It is averaged over a number of ios and a number of threads. It does not include any time to determine the occurrence, location, or type (read/write) of the next io or any delay until the next io is initiated. As the device driver performs seek and io together, seek alone cannot be measured.

The load is measured by the main process as the number of io completions reported divided by the duration over which they are reported, assuming steady state operation. Throughput is measured as the total bytes of io divided by the duration over which they are reported, again assuming steady state operation. Some ios reported during the collection interval will have completed before it began, and some ios completed during the interval will not yet have been reported. Steady state operation assumes that these two numbers are equal.

Consider a single thread. The response time is the active io duration, while the load is the number of io completions per unit time. The reciprocal of the load is the interarrival time, that is, the response time plus the delay time between ios, where the delay time includes calculation and sleep time. In this case, response <= 1/load, where equality would occur only during saturation and if the calculation time is insignificant.

When many threads are active, many ios may be outstanding at any time. The average number of ios outstanding may be computed as the response time divided by the reciprocal of the load, that is, the product of response time and load.
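
For example (illustrative numbers rather than a measurement), a response time of 20ms at a load of 100 iops implies 0.020 s/io x 100 io/s = 2 ios outstanding on average.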

The Software

Rpt accepts the following switches and arguments. These values can be and are overridden by the input test and parameter files.

	Rpt switch argument	Notes (default)
	-B datafilesize		Data file size to create, bytes 
				(existing file size)
	-f datafile		Data file name (/data/raidtest.dat)
	-m mode			Mode  0 - Normal 1 - Continuous 2 - Reduced 
				3 - Rebuild 4 - Exchange (0)
	-n nruns		Number of runs in normal mode (1)
	-r rsmin		Minimum, low load, response time ms  (0)
	-R rsmax		Maximum, max load, response time ms (50)
	-t testfile		Test file (rpt.rpt)
	-v verbose		Verbosity 0,1,2 (0)

The data file size must be specified to create it, but once created it will use the existing size if unspecified.

The mode is user specified; the software cannot determine this. Nruns cycles of workload tests are run in normal mode; the other modes run continuously. The only other effect of specifying the mode is to time stamp the log with the specified mode. The operator may change the mode at any time.

The response time levels used are bounded between rsmin and rsmax. Rsmax limits the maximum load applied.

The test file may be specified, or left as rpt.rpt. Rpt.rpt and rpt.rpp are text files which define the io files and workload parameters for the standard RABmark test workloads. These files are discussed below.

A verbose level of 1 (summary) will display message response times, iops, and rabmarks. A verbose level of 2 (detail) will display raw messages with times in clock ticks and non-central response time variance. Rpt will print out the workload definitions when run with a verbose setting on the command line.

Test files

The rpt.rpt test file defines the standard test run and the rpt.rpp parameter file defines the standard RABmark test workloads. These text files may be edited by the user to change the test or workloads. The syntax consists of

	1			Variables - alphanumeric
	2			Operators ;, =, [ ], and " "
	3			; Comments beginning with ;
	4			Variable=value definitions 
	5			[Object definitions]
	6			Object invocations

Variables may be any alphanumeric string and are case sensitive. Leading and trailing but not embedded whitespace is ignored. Comments begin with a semicolon and terminate with the line. They may appear on any line. No declaration is necessary before definition and redefinitions are allowed. Definition values may be terminal strings, with or without quotes, or nonterminal variables previously defined. Object definitions consist of a name in brackets or quotes followed by a number of symbolic definitions up to a blank line. The objects known to the parser are files, flags, workloads, and threads. An object invocation is the object name alone. IMPORTANT: Users should double check any changes they make as typos are accepted without objection.

The test file begins by defining the names of the test file itself, the hardware configuration file, the input parameter file, and the output report file. The test file name should be changed if the test file itself changes, as it is self-referential. Files are defined with a variable=value definition and invoked by name. The test file, rpt.rpt, defines the parameter file, ParmFile=rpt.rpp, and invokes it, ParmFile. The parameter file is opened, read by the parser, and closed, returning to the test file. Only one input depth level is provided, so all file invocations should be restricted to the test file and only parameter files should be invoked. Nothing is done with the configuration file; it is a template for the tester to record test details. The parameters in the test file are known to and retrieved by the software, so do not change the names of the variables in the test file. In the parameter file only the values are retrieved, so its variable names can be anything.

The parameter file begins by defining unchangeable global constants. These values are used throughout the software so should not be changed. The threadflags object defines the valid transact thread flags. Only one flags section is supported, shared between all threads. The changeable global constants follow. These include test run defaults Mode and Verbose, MaxPossibleThreads, MaxActualThreads, the data file and its size, and standard test parameters.

The maximum possible control threads determines the division of the data file, while the maximum actual control threads determines how many threads are actually created to use these divisions, the latter being limited by system resources. Controlthreads greater than the maximum possible would simulate contention; none of the workloads use contention. Rpt currently uses one data file for all threads and all workload tests, but this can be specified for each thread. DataSize may be added to the thread definitions to make sure the data file is created if it does not exist. The number of ios per thread message affects both the duration of the test and the reliability of the results. A transaction thread with a load of 20.0 iops has a mean interarrival time of 0.05s, so averaging 100 ios will generate a message every 5s. The controlthread flag must be present, although its value is replaced by the thread count of the thread being generated.

The following workloads are defined:

	Workload	NThreads	LThreads	Test name
	1	0	1	Transaction
	2	0	3	Multiuser
	3	0	1	Database query
	4	5	1	Batch
	5	5	2	Video multimedia
	6	1	1	Archive save
	7	2	2	Scientific

NThreads is the number of threads in this workload (batch only). Batch cases are run with a fixed number of threads; refer to the load discussion below for more on this. A value of zero means 0<NThreads<MaxActualThreads. LThreads is the number of threads of this type. The first lthreads created will be of this type before the next type is created; if no more types are defined, thread creation returns to the first type. Finally, a set of definitions follows, evaluated as thread flag value pairs, the first pair being the thread and process name, " " transact. Only the flags and values listed are created. The transact process actually simulates all batch and transaction processes. The lefthand variable must be defined in the flags section. The flags, workload, and thread section names should end with their section type, that is, ThreadFlags, TransactionWorkload, and TransactionThread.

The input parameters may be validated by running the performance test with verbose specified on the command line (-v 1) and exiting. All workload threads will be printed.

Running the Software

This assumes you have successfully loaded and compiled the Raid Performance Test software on your system and are running it under XWindows. It also assumes that /usr/openwin/bin/xterm and . are in your search path and that it is being run from the rpt directory. Alternatively, it can be run from the command line without XWindows.

No command line parameters are necessary unless the data file is being created, in which case the data file size must be specified. Specifying verbose on the command line has one difference from specifying it in the test file; it will print a list of the workloads which can be used to validate the parameter file input.

The program can be run in its own window with the display disappearing when finished, or in its own window with the display left intact after finishing. The latter is preferable for examining the results. A Bourne shell script, rpt.sh, is provided that can do either.

	
	Command			Result
	% rpt.sh rpt		Run and exit when finished, or 
	% rpt.sh    		Start window, in which enter
	% rpt			Run and stay when finished, or
	% rpt			Without XWindows

The user interface appears below.

+-Status------------------------------+  +-Configuration----------Release 1.00-+
|Run:             1  Mode:      Normal|  |Test File:                    rpt.rpt|
|Current Workload:         Transaction|  |Config File:                  rpt.rpc|
|# Threads                           0|  |Param File:                   rpt.rpp|
|Test Time:                    0:00:00|  |Report File:                  rpt.rpr|
+-------------------------------------+  +-------------------------------------+
+-Summary Results--------------------------------------------------------------+
| Normal             WL[1]   IOPS   WL[2]   IOPS   WL[3]   IOPS   WL[4]   IOPS |
| Max Response:                                                                |
| 2/3 Response:                                                                |
| 1/3 Response:                                                                |
| N/L Response:                                                                |
| RABmark(94)[n]:                                                              |
| RABmark(94)[s]:                                                              |
+------------------------------------------------------------------------------+
+-Verbose Window---------------------------------------------------------------+
|                      Cnt   Thrd    Nios    Nkb    R ms   Iops    R[n]   R[s] |
| Curr Message                                                                 |
| Curr Thread                                                                  |
| Last Thread                                                                  |
| Last Response                                                                |
+------------------------------------------------------------------------------+
+-Operator-------------------------------------------------++-State------------+
|Mode, Quit, Restart, Start/Stop, Test, Verbose, eXit>     ||Stopped           |
+----------------------------------------------------------++------------------+

The interface screen status subwindow displays the current run, mode, workload, number of threads and elapsed time. A run is a set of workloads, seven in the standard set. The initial workload is the transaction test with zero threads running, that is, it is in a stopped state. Entering an "s" command will start execution. When run non-interactively, the program will start itself. The following case insensitive commands are available:

	Key			Command
	M			Mode
	Q			Quit
	R			Restart
	S			Start/Stop
	T			Test
	V			Verbose
	X			Exit

The modes are

				Mode
	1			Normal
	2			Continuous
	3			Reduced
	4			Repair
	5			Rebuild
	6			Exchange

Mode counts through the mode levels, returning to normal mode, time stamping the results with the mode. The normal mode runs through the set of workloads nruns times. Modes other than normal run continuously.

Test changes the workload test, cycling back to the initial test after the last. Restart restarts the current test with the initial number of threads. Start and stop start and stop execution of the current test with the current number of threads. Verbose counts through the verbose levels, none, summary, detail, returning to no messages. Quit and exit both exit the program. The status subwindow displays the run, mode, workload name, number of active threads and elapsed time. It is updated with a change in the number of threads or test. The configuration subwindow displays the file names read from the test file.

The summary results subwindow displays the response time vs. iops and rabmarks for the first four workloads (three transaction and one batch) and just the rabmarks[s] (throughput (Mb/s)) for the last three workloads (all batch).

The verbose subwindow displays the current (latest) message, the current (latest) thread results, the last (previous) thread results, and the last interpolated response. The current message is updated with the arrival of each message. The current and last thread results response time is useful for detecting how far the test is from updating the next response level and for displaying fluctuations in response time with load.

The operator subwindow displays user selectable options, denoted by upper case letters although it is not case sensitive, and the state window displays the current state. The states are

	1			Stopped
	2			Execute
	3			Finished

Rpt creates a run file that records the launch parameters of each thread, a log file that records the creation and destruction of each thread, each message received, each measurement accumulated, and each result computed, and a report file summarizing the results. The run and log file names are created by replacing the last letter of the report file, ReptFile nominally rpt.rpr, with e and l for rpt.rpe and rpt.rpl. The log file should be checked for any warnings and error messages after a run.

Users should start a testing sequence by running a quick test of the workloads. This will not converge ordinarily but will indicate the number of threads and messages necessary for convergence. This can be accomplished by running rpt -t rptq.rpt, and examining the warning messages in rptq.rpl. It takes approximately 1/2 to 1 hr. The maximum number of transaction threads, MaxPossibleThreads and MaxActualThreads, and the maximum number of messages collected, Dnmax, may be adjusted. Rptq.rpt has Dnmax set to 32 which is not enough to ordinarily converge to the suggested 3%, Dlmax=0.03, accuracy, but enough to estimate how much it will require. Both the number of threads and number of messages needed increase with narrower random walks (larger alpha). The running time will increase with both of these. For example, the number of threads and messages suggested for the transaction workload and various spatial access distributions and scales for this system are listed below. A run with an alpha of 0.5 takes ~6 hours. Increasing the accuracy to 1% would take 3^2 = 9 times as long. Alphas in the range of 2.0-2.5 would take orders of magnitude longer.

	Workload 1 Transaction (3%)
	Threads	Messages	Spatial access distribution and scale
	2	128		Uniform
	3	256		Hyperbolic, alpha=0.5
	10	4096		Hyperbolic, alpha=2.0
	14	8192		Hyperbolic, alpha=2.5

Raid Performance Test Technical Overview

Workload Definition

A workload can be composed of any number of threads of different types. Threads can be created one at a time sequencing through each type (transaction), or a fixed number of threads of these types can be created (batch).

A workload is defined as a linked list of control thread structures:

	Thread struct		Notes	
	int lthreads		Number of threads of this type
	int mthreads		Number of threads remaining in list (computed)
	int nthreads		Number of threads in this workload 
				0 - transaction  n > 0 - batch 
	char**params	 	controlthread name (), 
				controlprocess name (transact), 
				switches and arguments of the controlprocess 
				(transact switches and arguments)
	struct thread* nthread	Pointer to next thread in list

The transact thread parameters in turn are:

	Transact switch arg	Notes (default)
	-a  accessmode		0-contiguous 1-interleaved 2-contentious (0)
	-b  blocksize		Bytes > 0 (1)
	-B  datafilesize	Bytes > 0 (data file size)
	-c  controlthread	Control thread number (0)
	-C  ncontrolthreads	Maximum possible control threads > 0 (0)
	-i  iooffset		Io offset from beginning of block (-1)
				Ios are packed within a block if < 0
	-I  iosize		Io size, 1 < I < blocksize, 0 is blocksize (0)
	-n  nios		Number of ios / thread message (100)
	-r  nreads		Relative number of reads (0)
	-w  nwrites		Relative number of writes (0)
	-s  spatialscale	Spatial access scale, mean step (seq,exp) (1)
	   			1 for sequential, 
				2-2.5 for hyperbolic (alpha), 
				B/3 for exponential,
	-S  spatialdistr	Spatial access distribution (0)
				0 - sequential  
				1 - uniform  
				2 - hyperbolic  
				3 - exponential
	-t  timescale		Time arrival scale, mean iops (5.0)
	-T  timedistr		Time arrival distribution (0)
				0 - constant (no waiting)   
				1 - uniform  
				2 - poisson (exponential)  
				3 - exponential
	-f  datafile		Data file name (/data/raidtest.dat)
	-v  verbose  		Verbosity (0)
				3 - fine  
				4 - very fine detail (exits after n ios)

Address space partitioning

Begin defining a workload by defining the path, name, and size of a data file or files for the workload. The existing file size will be used if unspecified. While the file has no internal structure, all threads accessing the file must do so in a manner consistent with common accessmode, blocksize, and maximum controlthreads. While these define the file divisions, the actual io area can be less than a block and the actual number of threads created can be less than the maximum number of threads.

For ios less than a block, one can either use 1) (iooffset>=0) one io per block of size iosize, offset from the beginning of the block by iooffset, or 2) (iooffset<0) blocksize/iosize ios per block, each of size iosize, packed within the block. There should be an integral number of ios in a block in the latter case. When the actual number of threads is less than the maximum, the space reserved for the unactualized threads is never accessed, so to provide as much disk coverage as possible the maximum and maximum actual should be equal. The resources of this system are limited to 50-75 maximum actual threads.

For example, an 800k2 (k2=1024^2 bytes) file divided among 50 threads using 64k blocks results in 64k bytes/block x 256 blocks x 50 threads. Using contiguous access, thread 0 accesses the range 0-256x64k, thread 1 accesses the range 256x64k-512x64k, and thread 2 accesses 512x64k-768x64k. If thread 0 has an iosize of 4k and an iooffset of -1 (packed), it will access the entire thread 0 range 0-256x64k in 4k ios, while with an offset of 0 it will access the ranges 0-4k, 64k-68k, 128k-132k, ..., and with an offset of 4k it will access the ranges 4k-8k, 68k-72k, 132k-136k.... Using interleaved access, thread 0 accesses the ranges 0-64k, 50x64k-51x64k, 100x64k-101x64k, ..., while thread 1 accesses the ranges 64k-128k, 51x64k-52x64k, 101x64k-102x64k, ..., and thread 2 accesses 128k-192k, 52x64k-53x64k, 102x64k-103x64k, .... If thread 0 has an iosize of 4k and an iooffset of -1 (packed), it will access its entire range of 0-64k, 50x64k-51x64k, 100x64k-101x64k, ... in 4k ios, while with an offset of 0 it will access the ranges 0-4k, 50x64k-50x64k+4k, 100x64k-100x64k+4k, ..., and with an offset of 4k it will access the ranges 4k-8k, 50x64k+4k-50x64k+8k, 100x64k+4k-100x64k+8k, .... Any portion of the file beyond int[datafilesize / (maxthreads*blocksize)] * (maxthreads*blocksize) will be ignored. This is summarized below.

	Contiguous and interleaved address ranges for 64k x 256 x 50 = 800k2
	Contiguous
	Thread\Block	0		1		2
	0		0-64k		64k-128k	128k-192k
	1		256x64k-257x64k	257x64k-258x64k	258x64k-259x64k
	2		512x64k-513x64k 513x64k-514x64k	514x64k-515x64k
	IoOffset\Io	0		1		2
	-1		0-4k		4k-8k		8k-12k
	0		0-4k		64k-68k		128k-132k
	4k		4k-8k		68k-72k		132k-136k

	Interleaved
	Thread\Block	0		1		2
	0		0-64k		50x64k-51x64k	100x64k-101x64k
	1		64k-128k	51x64k-52x64k	101x64k-102x64k
	2		128k-192k	52x64k-53x64k	102x64k-103x64k
	IoOffset\Io	0		1		2
	-1		0-4k		4k-8k		8k-12k
	0		0-4k		50x64k-		100x64k-
	4k		4k-8k		50x64k+4k-	100x64k+4k-
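
The address arithmetic above can be written out explicitly. The following C fragment is a minimal illustrative sketch; the function and variable names are ours, not those of the rpt or transact sources, and it simply reproduces the contiguous and interleaved ranges tabulated above for the 800k2 example.

	#include <stdio.h>

	typedef long long off64;

	/* Start of block b of a given thread for the two access modes.
	 * Illustrative sketch only; not code from rpt or transact.             */
	static off64 block_offset(off64 datafilesize, long blocksize,
	                          int maxthreads, int interleaved,
	                          int thread, long b)
	{
	    if (interleaved)
	        /* a thread's blocks are maxthreads blocks apart                 */
	        return ((off64)b * maxthreads + thread) * blocksize;
	    else {
	        /* contiguous: each thread owns nblocks consecutive blocks       */
	        long nblocks = datafilesize / ((off64)maxthreads * blocksize);
	        return ((off64)thread * nblocks + b) * blocksize;
	    }
	}

	/* Start of io i within a block: packed (iooffset < 0) or fixed offset.  */
	static off64 io_start(off64 blockstart, long blocksize, long iosize,
	                      long iooffset, long i)
	{
	    if (iooffset < 0)          /* packed: blocksize/iosize ios per block */
	        return blockstart + (i % (blocksize / iosize)) * iosize;
	    return blockstart + iooffset;  /* one io per block at iooffset       */
	}

	int main(void)
	{
	    off64 size = 800LL * 1024 * 1024;     /* 800k2 file                  */
	    long  bs   = 64 * 1024;               /* 64k blocks, 50 threads      */

	    printf("contiguous  thread 1, block 0 starts at %lld (256x64k)\n",
	           block_offset(size, bs, 50, 0, 1, 0));
	    printf("interleaved thread 0, block 1 starts at %lld (50x64k)\n",
	           block_offset(size, bs, 50, 1, 0, 1));
	    printf("packed 4k io 2 of that block starts at %lld\n",
	           io_start(block_offset(size, bs, 50, 1, 0, 1), bs, 4096, -1, 2));
	    return 0;
	}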

Reads and/or writes should be specified, but will default to readonly (1:0) otherwise. The following constraints should be observed:

	Constraints

	The data file is accessed consistently
	for threads i and j, datafile(i) = datafile(j)
	accessmode(i) = accessmode(j), 
	blocksize(i) = blocksize(j), 
	ncontrolthreads(i) = ncontrolthreads(j)

	There is data for io
	blocksize > 0, ncontrolthreads > 0

	The io fits within a block
	ioOffset < blocksize - iosize, iosize <= blocksize

	There are an integral number of ios in a block if packed (iooffset<0)
	iosize = 0 || blocksize%iosize = 0

	The file is created (if necessary)
	datafilesize >> blocksize*ncontrolthreads

	There is no unrequested contention between threads
	controlthread < ncontrolthreads

	There are some ios to measure
	nios > 0

Data file size

Be careful of using too small data files. The number of distinct ios a thread ranges over, datafilesize / (blocksize * ncontrolthreads) or datafilesize / (iosize * ncontrolthreads) for packed io, should be calculated for each case. For a 200k file with 4k size blocks and 50 threads, this amounts to only 1 block. For a 12.5 Mbyte (1024^2) file with 4k size blocks and 50 threads, this amounts to 64 blocks, or 256 kbytes, within the range of many hard drive caches. Once the blocks are cached the disk will not need to be accessed again. Even for a 1 Gbyte (1024^3) file with 4k size blocks and 50 threads, this amounts to only 5242. It takes very large file sizes to reach even moderate ranges.

The following table shows the number of independent samples for various data file sizes with 50 max threads and total percentage cached assuming a 256k cache. Note that files less than several hundred megabytes (100k2) have a large degree of caching and are simply too small for workloads 5 and 7.

	Workld	Block	Data file size
			50k2	100k2	500k2	1000k2
		Cached	25%	12.5%	2.5%	1.25%
	1	4k	256	512	2560	5120
	2	64k	16	32	160	320
	3	4k	256	512	2560	5120
	4	64k	16	32	160	320
	5	1024k	1	2	10	20
	6	32k	32	64	320	640
	7	1024k	1	2	10	20
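
As a check, the table entries can be reproduced from the two expressions above. This small C program is our own illustration, not part of rpt, and assumes the 256k cache used in the table:

	#include <stdio.h>

	int main(void)
	{
	    double k2        = 1024.0 * 1024.0;     /* k2 = 1024^2 bytes         */
	    double cache     = 256.0 * 1024.0;      /* assumed drive cache       */
	    double filesize  = 50.0 * k2;           /* the 50k2 column           */
	    double blocksize = 4.0 * 1024.0;        /* workload 1 block size     */
	    int    maxthr    = 50;

	    /* independent samples per thread, and percentage of a thread's
	       region that fits in the cache                                     */
	    double samples = filesize / (blocksize * maxthr);       /* 256       */
	    double cached  = 100.0 * cache / (filesize / maxthr);   /* 25%       */

	    printf("samples/thread %.0f, cached %.2f%%\n", samples, cached);
	    return 0;
	}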

Files of ~800MB have been used for testing. Transaction cases use a maximum of 50 threads and batch cases a maximum of 5 threads. The actual and maximum number of threads created during a run and the file coverages for the workloads for this system are

	Workld	NThr	MThr	Coverage
	1	3	50	6%	(alpha 0.5)
	2	3	50	6%
	3	4	50	8%

	1	10	50	20%	(alpha 2.0)
	2	7	50	14%
	3	27	50	54%

	4	5	5	100%
	5	5	5	100%
	6	1	5	20%
	7	2	5	40%

Access modes and distributions

Various spatial access and time arrival distributions can be specified; the most common choices depend on the accessmode. An exponential time arrival distribution results in sums following a gamma distribution and in the number of arrivals within a given period following a discrete Poisson distribution. As it is the time arrival distribution that is controlled, the Poisson distribution has been dropped from the following discussion in favor of the exponential distribution.

Interleaved access modes are used in the transaction tests, which use hyperbolic spatial accesses with an alpha coefficient of 2.0 and exponential time queuing with a mean io rate of 20.0 iops. This represents a design change: the original io rate of 1/m = 1/400ms = 2.5 iops was too low to achieve results in a reasonable amount of time, nor could it assure saturation within 50 maximum threads.

Contiguous access modes are used in the batch tests, which use sequential spatial accesses with a step size of one and constant time queuing (iops unused). Please note however that this is not a constraint: an interleaved access mode can be used with sequential spatial access distributions, and a contiguous access mode can be used with random spatial access distributions.

Contentious access allows all threads to access all blocks as one contiguous area, essentially by setting the maximum possible control threads to one. This does not prevent there from being more threads, only from reserving space for them. Contention is not used by any of the standard workloads.

Each thread begins at a uniformly random location within its allotted space. A sequential spatial access distribution will increment the location by the step size on each access, wrapping to the beginning at the end. A random spatial access distribution will increment or decrement the location by a random step size drawn from that distribution, wrapping at the beginning and end. Sequential spatial access will traverse the space in one direction while random spatial access will traverse the space in both directions. A constant queuing time arrival distribution will issue subsequent io requests as soon as the previous one is complete. A random queuing time arrival distribution will increment a program clock by a random step size drawn from that distribution and compare it to the system clock. If the program clock is later, the program will wait; otherwise the next io request is issued immediately.
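
The queuing logic amounts to maintaining a program clock per thread. The sketch below is our own illustration of that loop, not rpt code; draw_interarrival() stands in for a draw from the configured time distribution, and usleep is used here for brevity even though, as noted earlier, rpt itself uses select for sub-second delays on this system.

	#include <sys/time.h>
	#include <unistd.h>

	/* Stand-in for a draw from the configured time arrival distribution;
	 * a fixed 1/20 s here corresponds to the 20.0 iops transaction rate.    */
	static double draw_interarrival(void) { return 1.0 / 20.0; }

	static double program_clock;   /* set to now_seconds() at thread start   */

	static double now_seconds(void)
	{
	    struct timeval tv;
	    gettimeofday(&tv, 0);
	    return tv.tv_sec + tv.tv_usec / 1e6;
	}

	static void wait_for_next_io(void)
	{
	    double now = now_seconds();

	    program_clock += draw_interarrival();
	    if (program_clock > now)       /* program clock is later: wait       */
	        usleep((unsigned)((program_clock - now) * 1e6));
	    /* otherwise the next io request is issued immediately               */
	}

	int main(void)
	{
	    int i;

	    program_clock = now_seconds();
	    for (i = 0; i < 3; i++)
	        wait_for_next_io();        /* paces three ios at the mean rate   */
	    return 0;
	}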

Be careful of random walks with small steps for the transaction cases. Once a neighborhood has been cached in the hard drive, the drive will not need to be accessed for a long time, leading to unmeasurable response times and unbounded rabmarks. This leads to extremely long integration times to achieve reliable results. The median Hyperbolic step size is s0*(1-p)^(-1/alpha), s0=1, p=0.5, or

	Hyperbolic Distribution
	Alpha	Median		Ps(>32)	Ns(>32)	Nn(>32)
	2.5	1.3195		0.017	17621	588
	2.0	1.4142		0.098	3056	512
	1.5	1.5874		0.552	542	406
	1.0	2.0000		3.13	95	256
	0.5	4.0000		17.7	16	64
	0.0<-	Infinite

A cache of 256k can cache 64 4k blocks. The values of n necessary to assure some values would not be cached are calculated. Assuming a neighborhood +/- 32 has been cached, the probability of a single step, Ps, beyond this neighborhood (D>32), and the number of steps, Ns, necessary for this probability to exceed a 95% confidence level, (ln(1-0.95)/ln(1-Ps)), are calculated. Assuming a symmetric random walk with a constant step size of the median (invalid but useful), the deviation after n steps would be on the order of median*sqrt(n). The number of steps, Nn, for this to occur, (32/median)^2, are calculated. The number of ios integrated should be greater than the minimum of these two. An alpha of 0.5 was chosen for reproducibility and speed.
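
The table entries follow from the step law implied by the median formula, step = s0*(1-u)^(-1/alpha) with u uniform on [0,1). The sketch below is our own check of the arithmetic, not rpt code; Ps is printed as a probability where the table expresses it as a percentage, and the results agree with the table to within rounding. It also shows how a step would be drawn by inverse-transform sampling.

	#include <math.h>
	#include <stdio.h>
	#include <stdlib.h>

	int main(void)
	{
	    double alpha = 0.5;              /* value used for the standard tests   */
	    double s0 = 1.0, nbhd = 32.0;    /* cached neighborhood of +/- 32        */

	    double median = s0 * pow(0.5, -1.0 / alpha);     /* 4.0 for alpha 0.5    */
	    double ps = pow(nbhd, -alpha);                   /* P(single step > 32)  */
	    double ns = log(1.0 - 0.95) / log(1.0 - ps);     /* steps for 95% escape */
	    double nn = pow(nbhd / median, 2.0);             /* median*sqrt(n) = 32  */

	    /* drawing one step by inverse-transform sampling                        */
	    double step = s0 * pow(1.0 - drand48(), -1.0 / alpha);

	    printf("median %.4f  Ps %.4f  Ns %.0f  Nn %.0f  sample step %.2f\n",
	           median, ps, ns, nn, step);
	    return 0;
	}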

Load Control

Load can be controlled by controlling the number of threads or the load applied per thread. The two are not equivalent as each thread carries a memory and a disk context and adding threads results in more context switching. As a result, tests run with different load applied per thread are not directly comparable. The tests as written assume load is controlled by adding threads rather than increasing the load per thread, the equivalent of adding users rather than assuming the users work harder. This seems the most appropriate for a real model and is used in these tests. Once the system becomes saturated though, adding threads increases response time while decreasing load as the system must spend more time seeking, causing the response curve to bend back over itself.

Setting the load applied per thread determines the minimum load and the load resolution. Together with a maximum number of threads it also determines the maximum load measurable. A load per thread of 20 iops and 50 threads results in measurements from 20 to 1000 iops. A higher alpha coefficient would likely need higher load per thread to achieve reliable results in a reasonable time.

Batch loads cannot be controlled, as constant queuing by any thread will saturate the io channel. The addition of threads decreases throughput more than proportionally due to context switching, sequential accesses becoming randomized with multiple threads. The number of threads can only be set and the throughput measured at that number of threads. For the batch tests the number of threads is set to Batch 5, Video 5, Archive 1, Scientific 2. The number of ios each thread does is scaled as well to provide a constant number of megabytes per message, approximately equalizing the time statistics gathered. The batch threads report results every 5 megabytes. This is summarized below.

	Nios / message		Test
	100			Transaction
	100			Multiuser
	100			Database Query
	80			Batch
	5			Video Multimedia
	160			Archive Save
	5			Scientific
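
For the batch workloads this works out to roughly 5 megabytes per message: 80 ios x 64k = 5120k for Batch, 5 ios x 1024k = 5120k for Video Multimedia and Scientific, and 160 ios x 32k = 5120k for Archive Save.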

In general, one should adjust the number of ios per message to provide a regular number of messages at low load but not excessive number of messages at high load and let the procedure sum as many messages as necessary to provide convergence.

Workload composition

A workload can be composed of many threads, each with its own weight. For example, the multiuser case has 8 threads of 3 types: three quarters (6) are transaction threads divided equally (3-3) between thread types that perform 4k and 8k ios, and one quarter (2) are batch threads of a single type. The threads are created, one by one, in this order, returning to the beginning if more threads are needed.

Constant queuing by any thread will lead to complete saturation. As a result, transaction threads are not really compatible with batch threads, as in the multiuser case (workload 2), and the batch workloads 4-7 are always saturated. The relevant measurement parameter, response time or throughput, becomes open to question in the multiuser case. The batch workloads can only be measured at a specified load.

The workloads need careful selection for compatibility. Workloads may saturate quickly or not at all. They may lead to unmeasurable response times and irreproducible results. Nor are tests with different parameters directly comparable.

Measurements

The measured quantities are response time, rs (ms), load (iops), and throughput (Mb/s). Rabmarks are computed from these, averaged over the different workloads:

	Test			Rabmarks
	Random transaction 	Rabmark[r] = load (iops) / response time (ms)
	Sequential batch	Rabmark[s] = throughput (Mb/s)

Response time is defined to be the time from initial seek to io completion inclusive. For transaction tests, the response times are collected for a number of control threads until the results converge and then another thread is added. When the response time exceeds a given level, results are linearly interpolated between the current and last number of control threads. These response time levels, rs[i], are the noload case rs[nl] (a minimum of rsmin), rs[nl] + (rsmax - rs[nl])/3, rs[nl] + (rsmax - rs[nl])*2/3, and rsmax ms. Rsmax is 50ms. For batch tests, the response time and throughput at a fixed number of threads are measured.
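
For example, with a measured noload response rs[nl] of 19ms and an rsmax of 50ms, the levels work out to 19, 29, 40, and 50ms, as in the report file example later in this manual.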

The detailed transaction convergence procedure follows. An initial thread is created. Nthreads plus one messages are discarded to allow the system to reach equilibrium. The results of dnmin messages are collected. If the mean response time is within a desired factor dlmax (3%) of its true value (3 sigma), the results are accepted as converged. If not, the number of messages is doubled and the results compared until they have converged or the number of messages exceeds dnmax (1024). A thread is then added, nthreads plus one messages are discarded, dnmin messages are collected and the response time measured. This is repeated until the response time exceeds the next response level. Two threads are then shed, convergent results are determined, and threads are added with convergent results determined again. When the response level has been bounded, results are linearly interpolated in response time at the response level and the next response level computed. This continues until the maximum response level has been determined and all threads are shed in preparation for the next workload.

The detailed batch convergence procedure is as follows. The set of threads is created. Nthreads plus one messages are discarded to allow the system to reach equilibrium. The results of dnmin messages are collected. If the mean total time is within a desired factor dlmax (3%) of its true value (3 sigma), the results are accepted as converged. If not, the number of messages is doubled and the results compared until they have converged or the number of messages exceeds dnmax (1024).

Assuming normality, measurement of the mean response time, m, with n samples having an individual sample standard deviation s to within a factor f of its true value requires 3s/sqrt(n) = f m, or n = (3s/fm)^2. For f = 3%, n = (100 s/m)^2. Consider the following reported sum with the hyperbolic distribution parameter alpha of 0.5 and times in seconds. This states 2:50 into the test, a sum from one thread of 32 messages transferred 12800 kbytes in 3200 ios. Seek, seek+transfer, and total times in seconds, and their squared sums are reported.

	         0:02:50 s0:  1 32  12800 3200  0.421868 59.3275 163.3  
			0.00170609 3.40486 879.616  0.0767826

The mean response time is 59.3275 s * 1000 ms/s / 3200 ios, or 18.54 ms. The standard deviation, s, is sqrt( 3200 ios * 3.40486s^2 - (59.3275s)^2) * 1000 ms/s / 3200, or 26.8383 ms and s/m is 1.4476. The number of samples should be greater than 20955, or at 100 ios / message and 20 iops per thread, greater than 209 messages and 17 min for a single thread. A set of workloads takes around 6 hr with these parameters.
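
The arithmetic of the previous paragraph can be checked with a few lines of C. This is our own sketch working from the printed sums, not code taken from rpt; the convergence value it produces matches the convergence measure logged with the sum.

	#include <math.h>
	#include <stdio.h>

	int main(void)
	{
	    double nios  = 3200.0;       /* ios summed                            */
	    double sum   = 59.3275;      /* summed seek+transfer times, s         */
	    double sumsq = 3.40486;      /* summed squares, s^2                   */
	    double f     = 0.03;         /* desired accuracy, 3%                  */

	    double m = 1000.0 * sum / nios;                  /* mean, ms: 18.54   */
	    double s = 1000.0 * sqrt(nios * sumsq - sum * sum) / nios;
	                                                     /* std dev: 26.84 ms */
	    double conv = 3.0 * s / (m * sqrt(nios));        /* ~0.0768, as logged */
	    double n = pow(3.0 * s / (f * m), 2.0);          /* ios needed: ~20956 */

	    printf("m %.2f ms  s %.2f ms  conv %.4f  n %.0f ios (%.0f messages)\n",
	           m, s, conv, n, n / 100.0);
	    return 0;
	}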

The standard workloads

The defined workload tests and the parameters for them are as follows.

	Workload		Test name
	1			Transaction
	2			Multiuser
	3			Database query
	4			Batch
	5			Video multimedia
	6			Archive save
	7			Scientific
Test Parameters
Workload 1 Transaction Test Scenario
Parameter			Value
Number of Threads		Variable <= 50
Maximum possible threads	50
Access mode			Interleaved
Block size			4k
Thread types			1 Transaction
				Transaction Threads 1
Io size				4k
Reads:Writes			2:1
Spatial access distribution	Hyperbolic
Spatial access scale		Alpha
Time access distribution	Exponential
Time access scale		20.0 iops


Workload 2 Multiuser Test Scenario
Parameter			Value
Number of Threads		Variable <= 50
Maximum possible threads	50
Access mode			Contiguous
Block size			64k
Thread types			6 Transaction 2 Batch
				Transaction Threads, 3@4k, 3@8k
Io size				4k, 8k
Reads:Writes			2:3
Spatial access distribution	Hyperbolic
Spatial access scale		Alpha
Time access distribution	Exponential
Time access scale		20.0 iops
				Batch Threads, 2@64k
Io size				64k
Reads:Writes			1:1
Spatial access distribution	Sequential
Spatial access scale		1.0
Time access distribution	Constant
Time access scale		NA


Workload 3 Database Query Test Scenario
Parameter			Value
Number of Threads		Variable <= 50
Maximum possible threads	50
Access mode			Interleaved
Block size			4k
Thread types			1 Transaction
				Transaction Threads 1
Io size				4k
Reads:Writes			100:1
Spatial access distribution	Hyperbolic
Spatial access scale		Alpha
Time access distribution	Exponential
Time access scale		20.0 iops


Workload 4 Batch Large Transfers Test Scenario
Parameter			Value
Number of Threads		5
Maximum possible threads	5
Access mode			Contiguous
Block size			64k
Thread types			1 Batch
				Batch Threads 1
Io size				64k
Reads:Writes			4:1
Spatial access distribution	Sequential
Spatial access scale		1.0
Time access distribution	Constant
Time access scale		NA


Workload 5 Video Multimedia Test Scenario
Parameter			Value
Number of Threads		5
Maximum possible threads	5
Access mode			Contiguous
Block size			1024k
Thread types			5 Batch
				Batch Threads 4@1:0, 1@0:1
Io size				1024k
Reads:Writes			1:0, 0:1
Spatial access distribution	Sequential
Spatial access scale		1.0
Time access distribution	Constant
Time access scale		NA


Workload 6 Archiving Save Test Scenario
Parameter			Value
Number of Threads		1
Maximum possible threads	5
Access mode			Contiguous
Block size			32k
Thread types			1 Batch
				Batch Threads 1
Io size				32k
Reads:Writes			1:0
Spatial access distribution	Sequential
Spatial access scale		1.0
Time access distribution	Constant
Time access scale		NA


Workload 7 Scientific Test Scenario
Parameter			Value
Number of Threads		2	
Maximum possible threads	5
Access mode			Contiguous
Block size			1024k
Thread types			2 Batch
				Batch Threads 1@1:0 & 1@0:1
Io size				1024k
Reads:Writes			1:0, 0:1
Spatial access distribution	Sequential
Spatial access scale		1.0
Time access distribution	Constant
Time access scale		NA

All transaction tests use a maximum possible number of threads of 50 and all batch tests use a maximum number of threads of 5. All transaction tests (1-3) except multiuser (2) use interleaved access mode. Multiuser (2) and all batch tests (4-7) use contiguous access mode. The transaction tests all use a hyperbolic spatial access distribution with a scale (alpha) of 0.5 and an exponential time access distribution of 20.0 iops. The batch tests all use a sequential spatial access distribution with a scale (step) of 1.0 and constant time access.

The workload definitions may be verified by launching rpt in the verbose mode, rpt -v 1, and exiting. Refer to the Appendices for example input files and printouts.

Output files

Three files are created by the program, a run file recording the details of every thread launched, a log file summarizing the results of every thread applied, and a report file summarizing the end results.

The run file

The run file records the details of every thread launched during the run. It can be used to verify the workloads ran. Here are typical lines from the run file for transaction and batch tests:

	File:	rpt.rpe

	Thread
	|	 Data file
	|	 |		       Interleaved access mode
	|	 |		       |    Blocksize
	|	 |		       |    |       Controlthread 0
	|	 |		       |    |	    |	 Maxcontrolthreads
	|	 |		       |    |	    |	 |
	transact -f /data/raidtest.dat -a 1 -b 4096 -c 0 -C 50
	 -n 100 -r 2 -w 1 -S 2 -s 0.5 -T 2 -t 20.000000
	 |	|    |	  |    |      |    |
	 |	|    |	  |    |      |    Exponential coefficient (load iops)
	 |	|    |	  |    |      Exponential time distribution
	 |	|    |	  |    Hyperbolic alpha coefficient
	 |	|    |	  Hyperbolic spatial distribution
	 |	|    Writes
	 |	Reads
	 Number of ios per message

	Thread
	|	 Data file
	|	 |		       Interleaved access mode
	|	 |		       |    Blocksize
	|	 |		       |    |        Controlthread 0
	|	 |		       |    |	     |	  Maxcontrolthreads
	|	 |		       |    |	     |	  |
	transact -f /data/raidtest.dat -a 0 -b 65536 -c 0 -C 5 
	 -n 80 -r 4 -w 1 -S 0 -s 1 -T 0 -t 20.000000
	 |     |    |	  |    |    |    |
	 |     |    |	  |    |    |    N/A
	 |     |    |	  |    |    Constant time distribution
	 |     |    |	  |    Sequential stepsize
	 |     |    |	  Sequential spatial distribution
	 |     |    Writes
	 |     Reads
	 Number of ios per message

The transaction test uses the data file /data/raidtest.dat as interleaved 4k blocks, thread 0 of a possible 50 with 4k io transfers coterminous with a block. One hundred ios in a ratio of 2 reads to 1 write are summed. A hyperbolic spatial step size distribution with an alpha of 0.5 and an exponential time interarrival distribution with a mean of 20.0 iops are used. The batch test uses the data file /data/raidtest.dat as contiguous 64k blocks, thread 0 of a possible 5 with 64k io transfers coterminous with a block. Eighty ios in a ratio of 4 reads to 1 write are summed. A sequential spatial access with a step size of 1 and constant time queuing (iops unused) is used. Note that these threads are not consistent with each other due to different blocksizes and maxcontrolthreads and should not be used in the same workload.

Transact can be run alone in verbose mode to examine an actual run and any error messages that may be produced. Refer to the code for the output.

The log file

The log file summarizes the results of every thread applied. It is useful for detailed study of the operation and convergence of a run. A system timestamp appears followed by the program and its arguments and the name of the control thread program. Results appear in the first column, control thread messages (#n:) in the second when run in detailed verbose mode, and timestamped actions, summations (sn:), calculations (cn:), and interpolations (in:) in the third column. The times are from the beginning of the run in hr:mn:sc.

Individual control thread messages are displayed only when run with a detailed verbose level of two. The individual messages include the reporting thread, kbytes transferred, number of ios, the seek, seek+transfer, and total (sleep+seek+transfer) times (ticks), and the sum squared of the seek, seek+transfer, and total (sleep+seek+transfer+report) times. As the device driver combines seek and transfer, seek is nominally zero, so seek+transfer is used for response time. The totals are scaled by the number of threads as the threads are run concurrently. This is replaced by the time measured by the main process to avoid bias in the thread estimates.

The sums display the number of threads, the number of messages summed, the kbytes transferred, the number of ios, the summed seek, seek+transfer, and total times, the summed squared seek, seek+transfer, and total times, half the total elapsed time, and the convergence measure, the factor to within which the measured mean agrees with the true mean at 3 sigma.

		 Time	 
		 |	 Sum 1
		 |	 |    NThreads
		 |	 |    | NMessages
		 |	 |    | |   Kbytes
		 |	 |    | |   |	  Nios
		 |       |    | |   |	  |	Sum Seek 
		 |       |    | |   |	  |	|	 Sum Seek+transfer 
		 |       |    | |   |	  |	|	 |	 Sum Total 
		 |       |    | |   |	  |	| 	 |	 | 
	         0:02:50 s0:  1 32  12800 3200  0.421868 59.3275 163.3  
			0.00170609 3.40486 879.616  0.0767826
			|	   |	   |        |
			|	   |	   |        Convergence factor
			|	   |	   Sum total squared
			|	   Sum Seek+transfer squared
			Sum Seek squared

The calculations display the response time (ms), load (iops), random and sequential rabmarks r and n, that is, load/response (iops/ms) and throughput (Mb/s).

		 Time
		 |	 Calculation 1
		 |	 |    Response time (ms)
		 |	 |    |	      Load (iops)
		 |	 |    |	      |        Rabmark[r] 
		 |	 |    |	      |        |       Rabmark[n] 
		 |	 |    |	      |	       |       |		
	         0:02:50 c1:  18.5398 19.5958  1.05696 0.0765462

		Response (ms) = 1000 * Sum Seek+transfer / Nios
		Load (iops) = Nios / Totals
		Rabmark[r] =  Load (iops) / Response time (ms)
		Rabmark[n] =  Kbytes / ( 1024 * Totals )
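
Checking these against the example lines above: 1000 * 59.3275 / 3200 = 18.54 ms, 3200 / 163.3 = 19.6 iops, 19.5958 / 18.5398 = 1.057, and 12800 / (1024 * 163.3) = 0.0765 Mb/s.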

The interpolations display the sums interpolated at the response level and the relative distance of the interpolated values between the preceding and the current loads, 0 meaning the preceding load and 1 meaning the current load.

		 Time
		 |	 Interpolation 1
		 | 	 |    NThreads
		 | 	 |    | NMessages
		 | 	 |    | |   Kbytes
		 |	 |    | |   |     Nios
		 |  	 |    | |   |     |     Sum Seek 
		 |	 |    | |   |     |     |	 Sum Seek+transfer
		 |	 |    | |   |     |     |	 |       Sum Total
		 |	 |    | |   |     |     |	 |       | 
	         0:02:50 i1:  1 32  12800 3200  0.421868 59.3275 163.3  
			0.00170609 3.40486 879.616  1
			|	   |	   |        |     
			|	   |	   |        Interpolation factor
			|	   |	   Total squared
			|	   Sum Seek+transfer squared
			Sum Seek squared

The results display the workload number, name, mode, time, response level, response time (ms), load (iops), and sum of random and sequential rabmarks thus far in the workload. On transitions between transaction and batch workloads, the average of the preceding workload rabmarks are displayed as well.

Run #  Mode  mode		  Time
Workload #  WorkloadName
Level Response  Load
0     ms	iops
Rabmarks [r] [s]

Run 1  Mode   Normal           0:02:50
Workload 0  TransactionWorkload
n  Rs[r]   IOPS
0   19ms     20
Rabmarks 1.056959 0.076546


	Fri Jun 23 12:36:08 1995

	rpt -t rptq.rpt 
		" "
	initialized at 1 clock ticks/s
	#thr: kbytes nios seek xfer total seeksq xfersq totalsq
		time snum: nthr count kbytes nios seek xfer total seeksq xfersq totalsq
		time inum: nthr count kbytes nios seek xfer total seeksq xfersq totalsq

	         0:00:01 #00:  active, workload 0
	         0:00:54 s0:  1 8  3200 800  0.084905 15.6658 46.88  
			0.000220158 0.711372 301.058  0.121885
	         0:00:54 c1:  19.5823 17.0648  0.871443 0.0666596
	         0:01:32 s0:  1 16  6400 1600  0.171505 30.9565 85.21  
			0.000443511 1.72418 489.408  0.102832
	         0:01:32 c1:  19.3478 18.7771  0.970504 0.0733482
	         0:02:50 s0:  1 32  12800 3200  0.421868 59.3275 163.3 
			0.00170609 3.40486 879.616  0.0767826
	         0:02:50 c1:  18.5398 19.5958  1.05696 0.0765462
Run 1 Workload 0 Threads 1 Warning:  Convergence failure - 32 of 209.62 
			messages necessary - continuing anyway
	         0:02:50 c1:  18.5398 19.5958  1.05696 0.0765462
	         0:02:50 i1:  1 32  12800 3200  0.421868 59.3275 163.3  
			0.00170609 3.40486 879.616  1
Run 1  Mode   Normal           0:02:50
Workload 0  TransactionWorkload
n  Rs[r]   IOPS
0   19ms     20
Rabmarks 1.056959 0.076546

	         0:02:50 #01:  active, workload 0
...

The report file

The report file has only a timestamp and the results. It summarizes the final results of the run. The results identify the workload number and name, the mode of operation, a timestamp since the beginning of the test, the response levels, response times (ms), loads (iops), and the rabmarks (both load/rs (iops/ms) and throughput (Mb/s)). The average rabmarks are summarized at transitions between transaction and batch cases. Therefore all transaction tests must be run together and all batch tests must be run together for overall averages. IMPORTANT: Only those tests run as a group contribute to the average rabmarks. Users running tests separately or not as a group will need to compute their own averages. Refer to the Appendices for more examples.

	File:	rpt.rpr
	Fri Jun 23 12:36:08 1995

Run 1  Mode   Normal           0:02:50
Workload 0  TransactionWorkload
n  Rs[r]   IOPS
0   19ms     20
1   29ms     32
2   40ms     39
3   50ms     40
Rabmarks 3.959126 0.155583

Summary

Avoid small files. Determine the number of independent samples for each workload (datafilesize)/(maxthreads*iosize).

Avoid excessive caching. Determine the amount of caching possible (cachesize/(datafilesize/maxthreads) and cachesize/iosize).

Avoid mixed transaction/batch loads. The relevant measurement becomes uncertain. As batch loads are always fully saturated, only throughput at a predetermined number of threads can be measured.

Avoid constrained random walks. Determine the minimum number of ios that should be summed for transaction cases (((cachesize/iosize) / (2*stepsize(alpha)))^2). Higher alphas require much greater integration and higher load per thread to maintain speed, but speed cannot be maintained for reliable measurements.
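
For the standard transaction parameters this works out to ((256k/4k) / (2*4))^2 = (64/8)^2 = 64 ios for an alpha of 0.5, in agreement with Nn in the hyperbolic distribution table above.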

Avoid batch cases of greatly differing duration. Determine the number of ios that should be summed to maintain a constant duration (5MB/blocksize ios were used for these tests).

Avoid transaction cases having very different number of threads to reach saturation. Avoid batch cases accessing very limited portions of the disk (threads/maxthreads). The load per thread and the maximum number of threads determine the range and resolution of the load. The product of load per thread and maximum number of threads determines the maximum measurable load. Unless the maximum load changes, these should be scaled together.

Run a quick test, rpt -t rptq.rpt, to see the number of threads created and number of messages necessary for statistical reliability.