Performance Evaluation and Benchmarking

Kurt J. Engemann (Hagan School of Business, Iona College, New Rochelle, New York, USA)

Benchmarking: An International Journal

ISSN: 1463-5771

Article publication date: 1 September 2006

Citation: Engemann, K.J. (2006), "Performance Evaluation and Benchmarking", Benchmarking: An International Journal, Vol. 13 No. 5, pp. 629-631. https://doi.org/10.1108/14635770610690456

Publisher: Emerald Group Publishing Limited

Copyright © 2006, Emerald Group Publishing Limited


Performance Evaluation and Benchmarking is a well-conceived text that collects important recent advances in the field into a research book. The editors wrote several chapters themselves and invited leading experts in the field to contribute chapters on their recent research efforts. The book covers the important areas of computer performance evaluation and benchmarking, and the entire text is consistently of high quality. It deals with a large variety of state-of-the-art performance evaluation and benchmarking techniques that are at the heart of computer architecture research and development, and it would be useful to computer architects and designers as well as graduate students in computer architecture.

Performance Evaluation and Benchmarking evaluates the strengths and weaknesses of a variety of evaluation methods and benchmark suites. Designing and evaluating microprocessors is challenging, especially considering that one second of program execution on these processors involves several billion instructions. Performance evaluation is an overwhelming task because of the large number of potential designs and the constantly evolving nature of workloads. Design decisions are made on the basis of performance models before any prototyping is done. Because building hardware prototypes of state-of-the-art microprocessors is expensive and time consuming, design analysis is usually accomplished with simulation models. Performance measurements on a prototype are more accurate, but a prototype must exist before the performance of the actual system under various real-world workloads can be understood.

The book begins with an overview of modern performance evaluation techniques, describing various simulation methods and hardware performance-monitoring techniques. Performance evaluation can be classified into performance modeling and performance measurement. Performance measurement can be done only if the actual system or a prototype exists. Performance modeling is typically used in the early stages of the design process, when actual systems are not available for measurement. Performance modeling may further be divided into simulation-based modeling and analytical modeling. Trace-driven simulation, execution-driven simulation, complete system simulation, event-driven simulation, and program profilers are discussed. The book also covers analytical modeling, including probabilistic methods, queueing theory, Markov models, and Petri nets.
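To make the idea of trace-driven simulation concrete, the following minimal sketch (my illustration, not the book's) replays a recorded address trace through a direct-mapped cache model; the trace and cache parameters are invented.

```python
# Minimal trace-driven simulation sketch: a direct-mapped cache fed by a
# recorded memory-address trace. All parameters here are illustrative.

def simulate_direct_mapped_cache(trace, num_sets=1024, block_size=64):
    """Replay a memory-address trace through a direct-mapped cache model."""
    tags = [None] * num_sets          # one tag per set (direct-mapped)
    hits = misses = 0
    for addr in trace:
        block = addr // block_size    # strip the block offset
        index = block % num_sets      # which set the block maps to
        tag = block // num_sets       # remaining high-order bits
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag         # fill the block on a miss
    return hits, misses

# Hypothetical trace: a working set that fits in the cache, so the second pass hits.
trace = [i * 64 for i in range(1024)] * 2
hits, misses = simulate_direct_mapped_cache(trace)
print(f"hit rate = {hits / (hits + misses):.2%}")
```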

Performance Evaluation and Benchmarking describes most of the recent state-of-the-art benchmark suites. These suites reflect different types of workload behavior, including general-purpose workloads, media workloads, and embedded workloads. Benchmarks used for performance evaluation of computers should be representative of the applications that run on actual systems. Several popular benchmarks for different classes of workloads are discussed, including current industry-standard CPU benchmarks, embedded and media benchmarks, Java benchmarks, transaction processing benchmarks, and web server benchmarks.

A major issue in performance evaluation is how to report performance with a single number. The use of multiple benchmarks for performance analysis makes it necessary to use some kind of average. The text presents the appropriate summarizing methods for the common metrics used when designing and evaluating microprocessors. Performance can be summarized over a benchmark suite by using arithmetic or harmonic means with appropriate weights.
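As a worked illustration (not taken from the book), suppose a suite contains n benchmarks with weights w_i; the appropriate mean depends on whether the metric is time-based or rate-based:

```latex
% Weighted summary statistics over a benchmark suite (illustrative notation).
% For time-based metrics t_i (e.g., seconds), a weighted arithmetic mean is appropriate:
\bar{t} = \sum_{i=1}^{n} w_i\, t_i, \qquad \sum_{i=1}^{n} w_i = 1 .
% For rate-based metrics r_i (e.g., instructions per cycle), the weighted
% harmonic mean preserves the underlying total execution time:
\bar{r} = \left( \sum_{i=1}^{n} \frac{w_i}{r_i} \right)^{-1} .
```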

Computer performance measurement experiments come in one of two forms: measurements of real systems or simulation-based studies. Measurement experiments are subject to errors due both to noise in the system being measured and to noise in the measurement tools themselves. One difficulty with a large simulation study is the large amount of data produced by varying the simulation inputs. The book addresses how a statistical design-of-experiments approach can be used to sort through a large number of simulation results and aggregate the data into meaningful conclusions. The text explains how confidence intervals can be used to extract quantitative information from noisy measurements. Fully understanding the complex interaction of a computer program's execution with the underlying microprocessor requires a huge number of simulations. Statistics can help simulation-based design studies cut down the number of simulations that need to be done without compromising the end result.
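A small sketch, not from the book, shows how a confidence interval turns noisy repeated measurements into a quantitative statement; the runtimes below are invented.

```python
# Sketch: a 95% confidence interval for the mean of noisy repeated
# runtime measurements (hypothetical values, in seconds).
import math
from statistics import mean, stdev
from scipy.stats import t

runtimes = [12.3, 12.9, 12.1, 13.0, 12.6, 12.4, 12.8, 12.2]  # invented data
n = len(runtimes)
m, s = mean(runtimes), stdev(runtimes)       # sample mean and standard deviation
t_crit = t.ppf(0.975, n - 1)                 # two-sided 95% critical value, n-1 dof
half_width = t_crit * s / math.sqrt(n)
print(f"mean runtime = {m:.2f} s +/- {half_width:.2f} s (95% confidence)")
```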

Detailed processor simulations using real-life benchmarks are the standard for early-stage performance analysis. A disadvantage of this approach is that it is prohibitively time consuming; researchers have therefore proposed several techniques for speeding up these simulations, and these approaches are discussed in the text. Sampling is one such approach for reducing total simulation time: simulation results can be obtained far more efficiently while still attaining highly accurate performance estimates.
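The following sketch (my illustration, with invented per-interval data) conveys the basic sampling idea: simulate only every k-th interval in detail and extrapolate the result to the whole run.

```python
# Sketch of sampled simulation: instead of simulating every interval of a
# program in detail, simulate every k-th interval and extrapolate.
# The per-interval CPI values below stand in for detailed simulation results.
import random

random.seed(0)
full_run_cpi = [1.0 + 0.3 * random.random() for _ in range(100_000)]  # hypothetical

def sampled_estimate(per_interval_cpi, period=100):
    """Systematic sampling: detailed-simulate one interval out of every `period`."""
    sample = per_interval_cpi[::period]
    return sum(sample) / len(sample)

true_cpi = sum(full_run_cpi) / len(full_run_cpi)
est_cpi = sampled_estimate(full_run_cpi)
print(f"true CPI {true_cpi:.4f}, sampled estimate {est_cpi:.4f} "
      f"from {len(full_run_cpi) // 100} of {len(full_run_cpi)} intervals")
```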

Understanding the cycle-level behavior of a processor during the execution of an application is crucial to computer architecture research, so researchers need techniques that can reduce the time required to estimate the impact of an architectural modification. The authors discuss SimPoint to satisfy this need. SimPoint is an intelligent sampling approach that selects samples, called simulation points, based on a program's phase behavior. SimPoint automates the process of picking simulation points using an offline phase classification algorithm, which significantly reduces the amount of simulation time required.
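The actual SimPoint tool is more involved; the rough sketch below only illustrates the underlying idea, clustering per-interval basic-block vectors and keeping the interval closest to each cluster centroid as a simulation point. The data, interval counts, and cluster count are invented.

```python
# Rough sketch of the idea behind SimPoint-style phase classification:
# cluster per-interval basic-block vectors (BBVs) and keep one representative
# interval per cluster as a "simulation point". Data and parameters are invented.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical BBVs: 300 intervals over 50 basic blocks, drawn from 3 synthetic phases.
phases = [rng.random(50) for _ in range(3)]
bbvs = np.array([phases[i % 3] + 0.05 * rng.random(50) for i in range(300)])
bbvs /= bbvs.sum(axis=1, keepdims=True)       # normalize each interval's BBV

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(bbvs)
sim_points = []
for c in range(3):
    members = np.flatnonzero(kmeans.labels_ == c)
    centroid = kmeans.cluster_centers_[c]
    # Pick the member interval closest to the centroid as the simulation point.
    best = members[np.argmin(np.linalg.norm(bbvs[members] - centroid, axis=1))]
    sim_points.append(int(best))
print("simulation points (interval indices):", sorted(sim_points))
```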

Performance Evaluation and Benchmarking discusses statistical simulation as a viable tool for efficient early design stage exploration. The idea of statistical simulation is to collect a number of important program execution characteristics and to generate a synthetic trace from them. Because of the statistical nature of this technique, simulation of the synthetic trace quickly converges to a steady-state value, so a very short synthetic trace suffices to attain a performance estimate. The statistically generated synthetic trace is several orders of magnitude smaller than the original program execution; hence the simulation finishes very quickly.
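As a simplified illustration of the flavor of this technique (not the book's actual method, which uses far richer statistics), one can sample a short synthetic instruction stream from a measured instruction-mix distribution; the mix below is invented.

```python
# Sketch of statistical simulation: collect a program's instruction-mix
# statistics, then generate a much shorter synthetic trace with the same
# statistical profile. The mix below is an invented example.
import random

random.seed(1)
# Hypothetical instruction mix measured from a (much longer) real execution.
instruction_mix = {"int_alu": 0.45, "load": 0.25, "store": 0.10,
                   "branch": 0.15, "fp": 0.05}

def synthetic_trace(mix, length):
    """Draw a synthetic instruction stream matching the measured mix."""
    kinds, probs = zip(*mix.items())
    return random.choices(kinds, weights=probs, k=length)

trace = synthetic_trace(instruction_mix, length=10_000)  # orders of magnitude shorter
observed = {k: trace.count(k) / len(trace) for k in instruction_mix}
print(observed)   # converges toward the measured mix as the trace grows
```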

In contemporary research and development, multiple benchmarks with multiple input data sets, drawn from multiple benchmark suites, are simulated. However, significant redundancy exists across inputs and across programs. Performance Evaluation and Benchmarking describes methods to identify such redundancy in benchmarks so that only relevant and distinct benchmarks need to be simulated. General issues related to measuring benchmark similarity are discussed, and a workload analysis methodology that can be used to measure benchmark similarity in a reliable way is detailed. Three important applications are discussed: program behavior analysis, workload design, and validation of reduced input sets.
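A minimal sketch of the similarity idea, with invented benchmark names and characteristics: describe each benchmark by a few program characteristics, normalize them, and treat small pairwise distances as a sign of redundancy.

```python
# Sketch of measuring benchmark similarity: characterize each benchmark by a
# few (hypothetical) program characteristics, normalize them, and compare
# pairwise distances. Small distances suggest largely redundant benchmarks.
import numpy as np

benchmarks = ["bench_a", "bench_b", "bench_c", "bench_d"]   # invented names
# Invented characteristics: branch misprediction rate, cache miss rate, ILP.
features = np.array([[0.04, 0.021, 2.1],
                     [0.05, 0.019, 2.0],
                     [0.01, 0.080, 1.2],
                     [0.02, 0.075, 1.3]])

z = (features - features.mean(axis=0)) / features.std(axis=0)   # z-normalize
dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)   # pairwise distances

for i in range(len(benchmarks)):
    for j in range(i + 1, len(benchmarks)):
        print(f"{benchmarks[i]} vs {benchmarks[j]}: distance {dist[i, j]:.2f}")
# bench_a/bench_b and bench_c/bench_d come out close, i.e. largely redundant.
```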

Measurement is the most credible approach to accurately evaluating the performance of a system, but it is very costly. Simulation is an effective technique for predicting the performance of an existing or new system, but a simulation program, usually written in a high-level programming language, still requires a tremendous amount of time and computing resources to model a relatively complex system precisely. Analytical models, on the other hand, capture the behavior of a computer system quite effectively and can provide quick answers to many questions; however, the model can become intractable as system complexity increases. The book reviews mathematical techniques that can be used for performance analysis of computer systems, covering two widely used approaches: queueing theory and Petri net models. Queueing theory is the study of waiting in line and is used to predict, for example, how long a job stays in a queue and what the system throughput is. A Petri net is a graphical and mathematical modeling tool that is particularly useful for capturing concurrency and synchronization behavior.
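As a standard textbook illustration (not specific to this book), the M/M/1 queue with arrival rate λ and service rate μ already answers such questions in closed form:

```latex
% M/M/1 queue: arrival rate \lambda, service rate \mu (standard results).
\rho = \frac{\lambda}{\mu} \quad (\text{utilization, requires } \rho < 1), \qquad
L = \frac{\rho}{1-\rho} \quad (\text{mean number of jobs in the system}), \qquad
W = \frac{L}{\lambda} = \frac{1}{\mu - \lambda} \quad (\text{mean time in system, by Little's law } L = \lambda W).
```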

Performance-monitoring hardware (typically referred to as EMON, or event-monitoring hardware) provides a low-overhead mechanism for collecting processor performance data. Once enabled, the EMON hardware can detect and non-intrusively count any of a set of performance events while the processor is running applications and the operating system. The text describes the performance-monitoring facilities of three state-of-the-art microprocessors.
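The book discusses vendor-specific counter facilities; as a loose modern analogue only, the sketch below drives hardware counters through the Linux perf tool (assumed to be installed) to compute instructions per cycle for a command.

```python
# Rough sketch of reading hardware performance counters. This is not the
# book's EMON interface; it uses the Linux `perf` tool (assumed installed)
# as a loose analogue, counting cycles and retired instructions for a command.
import subprocess

cmd = ["perf", "stat", "-x", ",", "-e", "cycles,instructions", "--", "sleep", "1"]
result = subprocess.run(cmd, capture_output=True, text=True)

counts = {}
for line in result.stderr.splitlines():      # perf stat reports on stderr
    fields = line.split(",")
    if len(fields) > 2 and fields[0].strip().isdigit():
        counts[fields[2]] = int(fields[0])   # CSV: value, unit, event name, ...

if {"cycles", "instructions"} <= counts.keys():
    print(f"IPC = {counts['instructions'] / counts['cycles']:.2f}")
else:
    print("counters unavailable:", result.stderr.strip())
```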

Performance Evaluation and Benchmarking covers the most important aspects of the field and does so in a readable fashion, despite the fact that the area is quite technical. Even though the chapters are contributed by several authors, they have a consistent quality and similarity of style, so the book reads more like a unified text than a collection of disparate papers. The work should be quite useful to students and professionals working in the field of computer performance evaluation and benchmarking.
