RHEALSTONE: A REAL-TIME BENCHMARKING PROPOSAL

An independently verifiable metric for complex multitaskers

Rabindra P. Kar and Kent Porter

Rabindra P. Kar is a senior engineer with the Intel Systems Group in Hillsboro, Oregon, Kent Porter is senior technical editor for DDJ. Kent can be reached through CompuServe at 76704.51 or through MCI: KPORTER


Despite the growing importance of real-time systems, the industry lacks any meaningful, objective way to measure real-time performance. We use Whetstones and Dhrystones to benchmark the code generated by compilers and or the throughput of hardware platforms, but to date there has been no equivalent objective measure for real-time systems. In an effort to plug the gap, this article proposes a standard methodology for objectively measuring real-time performance and summarizing the components of performance in a figure of merit called Rhealstones.

The Rhealstone metric chiefly helps developers select real-time computer systems appropriate for their applications. A "real-time computer system" is any marriage of hardware and systems software -- operating system, kernel, executive, or these elements of systems software in combination -- that forms a platform for a real-time application. A "real-time application" is a limited-purpose computerized system that responds to defined circumstances as they arise in time to influence subsequent events. An example is an aircraft autopilot, which detects and corrects deviations from the plane's intended flight profile.

A complete real-time solution consists of the computer system plus the application software plus external devices. Rhealstones won't measure how good the complete solution is, and thus they may not be an appropriate measure for end users of real-time systems. Instead, Rhealstones are an engineering measurement targeted specifically toward true multitasking solutions.

A multitasking system is one in which more than two tasks having different priority levels run concurrently. By this definition, then, Rhealstones do not apply to synchronous polling solutions, in which all tasks have equal priority and are granted equal time and access to resources if active. Rhealstones apply chiefly to complex systems running five to thirty concurrent processes.

The derivation of the name "Rhealstone" is obvious and -- to those concerned with performance measurement -- so is the application of the result. The similarities to Whetstone and Dhrystone end there.

The Whetstone and Dhrystone benchmarks are synthetic programs that try to achieve a statistically balanced set of operations reflecting an "average" workload: Whetstone for floating-point applications and Dhrystone for systems programs such as compilers and operating system utilities. The benchmark programs loop some number of times, and this number divided by the elapsed time in seconds yields the figure of merit: the bigger, the better. The result serves as a predictor of a given platform's performance given the "average" job. Though useful, the Whetstone and Dhrystone metrics reveal nothing about the relative performances of the various operations that contribute to the single-figure result --and they have no applicability to real-time systems.

The Rhealstone measure, on the other hand, proceeds from the observation that every real-time application is unique. One system may be highly interrupt-driven, another rely heavily on message-passing among tasks, and still another deal with contention for resources, and so on. Real-time systems are almost always multitasking, with one task or another capturing the attention of the processor based upon conditions within the specific application's limited universe. The processor relies on some sort of real-time systems software to help it do its job. Moreover, real-time systems react to circumstances and by definition are endless programs that stop only when the plug is pulled. All this makes the notion of statistical balance irrelevant.

The Rhealstone figure is consequently a sum obtained from six categories of activity most crucial to the performance of real-time systems, irrespective of the actual application. The categories, or components, of the Rhealstone sum are:

Using coefficients we'll discuss later, the system engineer can assign each Rhealstone component a weight that reflects its relative importance in the target application. But first, let's describe the components.

What Makes up a Rhealstone Number?

The Rhealstone metric consists of quantitative measurements of the six components that most influence the performance of real-time systems:

Task switching time is the average time the system takes to switch between two independent and active (that is, not suspended or sleeping) tasks of equal priority, as Figure 1 illustrates. Task switching is synchronous and nonpreemptive, as when, for example, the real-time control software implements a time-slice algorithm for multiplexing equal-priority tasks.

Task switching time is a fundamental efficiency measure of any multitasking system. Measurement seeks to assess the compactness of task control data structures and the efficiency with which the executive manipulates the data structures in saving and restoring contexts. Task switching time is also influenced by the host CPU's architecture, instruction set, and features (provided the executive uses them).

Additionally, task switching time is a measure of the executive's list management capabilities because an executive typically organizes its data structures into ordered lists and shuffles the nodes according to circumstances.

Preemption time is the average time it takes a higher-priority task to wrest control of the system from a running task of lower priority. Preemption usually occurs when the higher-priority task moves from an idle to a ready state in response to some external event. For example, when an attached device generates an interrupt, the interrupt service routine attempts to wake up the task to service the request. Preemption time is the average time the executive takes to recognize an external event and switch control of the system from a running task of lower priority to an idle task of higher priority.

Though conceptually similar to task switching (Figure 1), preemption usually takes longer. This is because the executive must first recognize the wake-up action and assess the relative priorities of the running and requested tasks, and only then switch tasks if appropriate.

Virtually all multiuser/multitasking executives assign task priorities, and many let the application designer change priorities dynamically. For this reason preemption, along with interrupt latency, is the most significant real-time performance parameter.

Interrupt latency time, illustrated in Figure 2, is the time between the CPU's receipt of an interrupt request and the execution of the first instruction in the interrupt service routine. Interrupt latency time reflects only the delay introduced by the executive and the processor and does not include delays occurring on the bus or interfaces to external devices.

Semaphore shuffling time is the delay between a task's release of a semaphore (usually by calling the executive's "relinquish semaphore" primitive) and the activation of another task blocked on the "wait semaphore" primitive. No other tasks should be scheduled in between, although at least three tasks with different priorities should be active.

The focus on semaphore shuffling time is to measure the overhead associated with mutual exclusion. In most real-time systems, multiple tasks compete for the same resources. Semaphore-based mutual exclusion is a convenient way to ensure that a nonshareable resource serves only one master at a time.

Figure 3 illustrates semaphore shuffling. Here Task 1 runs for a time and then takes control of a resource by requesting its semaphore. Eventually Task 1 is suspended and Task 2 starts. After a time, Task 2 requests the semaphore presently owned by Task 1. Unable to continue, Task 2 stops, and Task 1 again awakens. At the end of its period, Task 1 relinquishes the semaphore. The executive recognizes that the semaphore is now free/available for suspended Task 2. Semaphore shuffling time is the period between Task 1 releasing the semaphore and the resumption of Task 2.

Advanced real-time executives recognize relative priorities in a "wait semaphore" situation; that is, when multiple tasks are waiting for a semaphore, the executive schedules them so that the highest-priority task goes to the head of the queue. The Rhealstone measure doesn't give an executive extra credit for prioritized semaphore queues, but such a capability may be important to the software designer.

Deadlock breaking occurs when a higher-priority task preempts a lower-priority task that holds a resource needed by the higher-priority task. The deadlock breaking metric measures the average time it takes the executive to resolve the conflict.

Deadlocks are a common multitasking problem, yet not all executives handle deadlocks effectively, if at all. A common executive solution is to temporarily raise the priority of the running task above that of the interrupting task until the lower-priority task releases the needed resource. At that point, the temporary priority is lowered and the new task can run.

Figure 4 illustrates deadlock breaking along a time line. Here, low-priority Task 1 takes ownership of a resource and is then preempted by Task 2, which has medium priority. This task runs until the highest-priority task preempts it. Task 3 presently requests the critical resource, which is still held by the suspended Task 1. The first phase of deadlock breaking time then occurs as the executive decides what to do about the contention. Eventually the executive raises the priority of Task 1 so that it can run. When Task 1 releases the critical resource, the executive suspends it and enters the second phase of deadlock breaking, which entails reinstating Task 3 and giving it control of the resource.

Deadlock breaking is thus the sum of times required to resolve an ownership dispute between a low-priority task holding a resource and a higher-priority task that needs it.

Datagram throughput time is the number of kilobytes per second one task can send to another via calls to the primitives of the real-time executive --without using a pre-defined common message buffer in the system's address space or passing a pointer. The sending task must receive an acknowledgement, as Figure 5 depicts. Executives typically provide pipes, message queues, and/or stream files for this purpose.

The goal of measurement in the datagram throughput category is to average intertask communications speed. Datagram throughput is of primary importance in applications where one task collects data from the outside world and sends it to another task for processing. An acknowledgement of receipt is essential to assure the sending task that the data is safely in the hands of the receiver before the sender overwrites its buffers with new data.

Computing the Rhealstone Number

Measurement of these six aspects of real-time performance yields a set of time values in the microsecond to millisecond range. In order to combine the six results into a single meaningful figure, express all times in seconds and invert them arithmetically. For example, if activity N takes 200 microseconds, the time in seconds is 0.0002 and the Rhealstone component is 1/0.0002 = 5000. The frequency of N is 5000 per second. (Note: Datagram throughput time is already expressed in kilobytes per second, so no further conversion is necessary.)

More Details.

There are two related reasons for expressing Rhealstone components in terms of frequency per second. Performance becomes directly proportional to value --the bigger the number, the better the performance. This leads to the second reason --the Rhealstone metric is then consistent with other industry benchmarks such as Whetstones and Dhrystones.

An objective Rhealstone number can now be calculated as

  r1 + r2 + r3 + r4 + r5 + r6 =   objective Rhealstones/second

where r1 is the task switching time component, r2 is preemption time, and so on.

The objective Rhealstone number sum is useful for expressing the overall performance of a real-time platform: for example, the "claim" number used by the vendor of real-time executive A running on microprocessor X. The objective Rhealstone sum is based on the assumption that all Rhealstone components are equally influential in determining system performance. This may be true in general, but it is probably not true of any specific real-time application.

Weighting the Rhealstone Components

Evaluators can tailor the Rhealstone figures to their applications by using weighting coefficients. Herein lies one of the strongest features of the Rheastone measurement.

Say a designer estimates that interrupts will occur five times as often as task switches; semaphores and intertask messaging will not be used at all. In this case, then, the weighting coefficients for semaphores and datagrams are 0 and the weighting coefficient for interrupts is numerically five times as great as the weight for task switching, say 10 and 2, respectively. The other two components (preemption and deadlock breaking), having unknown weights, receive "default" coefficients of 1.

Such information leads us to an application-specific Rhealstone equation of

  n1*r1 + n2*r2 + n3*r3 + n4*r4 + n5*r5 + n6*r6 = application Rhealstones/second

where the n factors are weight coefficients. Note that a coefficient must be either zero (when the component is irrelevant to the specific application's performance) or a positive value that gives the Rhealstone component its relative importance in the application's performance.

An application designer typically has several alternatives among microprocessors and real-time systems software. Using a matrix in which the coefficients are uniformly applied to each alternative, the designer can easily select the best platform for the application: The platform with the largest application-specific Rhealstone value "wins."

The beauty of the Rhealstone metric lies in its ability to express real-time performance objectively, while allowing designers to tailor and measurements to specific applications. The Rhealstone number allows vendors to state --in terms universally understood --the relative performances of products and application designers to extrapolate these performances to a given real-time system.

Data Acquisition

Real-time systems are, in general, "black boxes" operating within closed, well-defined universes of possibilities that change dynamically. A real-time system runs forever, dealing with circumstances as they arise. Unlike a compiler or some other utility program, a real-time system has no defined beginning and ending. This makes it difficult to measure real-time performance.

A number of tools let us look into the black box and find out where it spends its time. One is a hardware probe or in-circuit emulator, which feeds software that reports how many times each machine instruction executes. Another is a statistical profiler, a software monitor awakened by an event such as a clock tick, which collects and reports information about what the observed system is doing each time it looks. There are other tools as well, such as software simulators driven by scripts.

The Rhealstone methodology specifies which aspects of performance must be measured and how they are to be treated in order to arrive at a figure of merit; the methodology does not describe how data for each component should be obtained. The only stipulation is that someone else, using the same configuration and measurement techniques, must be able to arrive independently at the same performance components. The verifiability of both the individual component figures and the Rhealstone sum depends, of course, on compliance with a uniform method for reporting Rhealstones.

Reporting Rhealstone Results

A Rhealstone report must include the following:

The last is the most stringent requirement of a Rhealstone report. Inclusion of the individual Rhealstone components allows others to apply performance figures to their circumstances. In the absence of by-category results, an objective Rhealstone number has little meaning.

The overall intent is reproducibility; anyone should be able to take the same configuration, apply the same measurement techniques, and come up with the same results with a reasonable margin for error, regardless of the actual application. Independent verification is the key to reliable Rhealstone measurements.

Giving Credit Where Due

The Rhealstone concept was originally devised by Rabindra P. Kar of the Intel Systems Group, Hillsboro, Ore., in response to the need to measure enhancements to iRMX and Intel's other real-time executives. Recognizing that this idea could lead to an industry standard for measuring real-time performance, Frank Vaughan, the division's PR manager, approached DDJ's editor-in-chief Jon Erickson. Jon enthusiastically endorsed the concept, and senior technical editor Kent Porter is coordinating it.

From its inception, the objective of this project has been to remove any vendor preference from the Rhealstone metric. This paper has already been circulated to a number of vendors, users, and academics for comment. The Rhealstone measurement was also the subject of a panel discussion at the Real-Time Programming Conference held in Anaheim, Calif., last November. Now DDJ invites interested readers to submit written suggestions and criticisms, (see sidebar) so that we can forge this proposal into an industrywide, vendor-independent measure of real-time performance.

Request for Suggestions

The Rhealstone proposal published here is a draft. DDJ invites interested readers to submit suggestions for improvements. To the fullest extent possible, we will incorporate reader feedback into the Rhealstone standard. Current plans call for the final version to be published (along with a model set of benchmark tasks written in C) in the June 1989 issue of DDJ.

Please send suggestions in writing to Kent Porter at DDJ, 501 Galveston Dr., Redwood City, CA 94063. Alternatively, address them to Kent at CompuServe 76704,51 or MCI KPORTER. Include your name, title, company, and a brief summary of your real-time experience.

We cannot accept suggestions by telephone, so please DO NOT CALL.

Deadline for suggestions is March 1, 1989. A list of all contributors to this standard will be published with the final specification.

--K.P.


Copyright © 1989, Dr. Dobb's Journal