The Cache Visualisation Tool

Chapter 6
Conclusions
Because memory and network latencies are high, cache misses are very costly. Because cache phenomena
such as cache interferences are complex, it is often difficult for programmers and hardware designers
to understand precisely the causes and origins of poor cache behavior. Most analytical tools simply
provide bottom lines, such as the hit ratio obtained after executing a code segment. To improve
software and hardware performance, better analytical tools are therefore needed. The CVT is
designed to fill this gap.
In this report we first presented some basic cache theory, which led to the implementation
of the CVT. We then gave a complete description of the functionality of the CVT, which includes a cache
simulator (others can easily be plugged in), an input program and trace emulator, a display environment
based on Motif, and a toolbox for setting breakpoints, displaying statistics, specifying methods for coloring
cache lines, etc.
Chapter 4 gave an overview of current cache issues and of how software optimizations can address them,
by describing current methods and techniques. Another, more important, goal of that chapter
was to give a potential user an idea of the benefits the CVT can bring in understanding the exact cache
behavior of codes restructured by software optimizations.
Chapter 4 starts off by describing how cache interferences are effectively displayed by the CVT,
looking at a difficult-to-understand loop nest and spotting bottlenecks in the code. Next, one of the most
well-known software optimizations, blocking, is described by presenting a model, and it is shown that the
CVT is an effective tool for finding the optimal blocking size.
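The search for a good blocking size can be illustrated with a minimal sketch. It is not the CVT or
the model from chapter 4: the LRUCache class, the blocked matrix transpose used as the kernel, and all
sizes (32 x 32 matrices of 8-byte elements, a fully associative cache of 16 lines of 32 bytes) are
assumptions chosen only to keep the example small. The sketch replays the address trace of the blocked
loop nest through the cache model and counts the misses for each candidate block size.

```python
class LRUCache:
    """Set-associative cache model with LRU replacement (hypothetical helper)."""

    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One list of tags per set, ordered from least to most recently used.
        self.sets = [[] for _ in range(num_sets)]
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_size
        tags = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in tags:
            tags.remove(tag)      # hit: re-appended below as most recently used
        else:
            self.misses += 1
            if len(tags) == self.ways:
                tags.pop(0)       # set is full: evict the least recently used line
        tags.append(tag)


def transpose_misses(n, block, cache, elem_size=8):
    """Replay the address trace of a blocked transpose B[j][i] = A[i][j]."""
    base_a, base_b = 0, n * n * elem_size   # row-major storage, B placed right after A
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    cache.access(base_a + (i * n + j) * elem_size)  # read A[i][j]
                    cache.access(base_b + (j * n + i) * elem_size)  # write B[j][i]
    return cache.misses


def sweep_block_sizes(n=32, sizes=(1, 2, 4, 8, 16, 32)):
    """Return {block size: simulated misses} for a 16-line fully associative cache."""
    return {b: transpose_misses(n, b, LRUCache(num_sets=1, ways=16, line_size=32))
            for b in sizes}


if __name__ == "__main__":
    for b, m in sweep_block_sizes().items():
        print(f"block size {b:2d}: {m} misses")
```

Constructing the model with num_sets greater than 1 and fewer ways turns it into a direct-mapped or
set-associative cache, so the same sweep can also expose interference misses.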
Nonsingular loop transformations, a more elaborate class of software optimizations, are presented
by describing some theory and models that try to optimize data locality through these transformations. It
is important to note that code restructured by these transformations is very difficult to read;
some leads are presented to show the user how the CVT can help in understanding the phenomena arising from
these transformations.
Software prefetching is included in this report for two reasons: it is one of the few software techniques
for reducing compulsory misses, and it shows how a different simulator can easily be plugged in. A
software-prefetched matrix-matrix multiply is analyzed with the CVT and the results are discussed.
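Why prefetching can remove compulsory misses is easiest to see on a sequential scan. The sketch below
is an idealized illustration, not the prefetching simulator used with the CVT: the function
scan_misses and its parameters are hypothetical, the cache is assumed large enough that nothing is
evicted, and every prefetch is assumed to complete before the data is needed (latency is ignored).

```python
def scan_misses(n_elems, elem_size=8, line_size=32, prefetch_distance=0):
    """Count demand misses of a sequential scan over n_elems array elements.

    Idealized model (assumption): unbounded cache, prefetches always arrive in time.
    """
    cached = set()   # line numbers currently in the cache
    misses = 0
    for i in range(n_elems):
        line = (i * elem_size) // line_size
        if line not in cached:
            misses += 1          # demand miss: first touch of a line not prefetched
            cached.add(line)
        if prefetch_distance:
            j = i + prefetch_distance
            if j < n_elems:
                # Software prefetch of element j brings its line in ahead of use.
                cached.add((j * elem_size) // line_size)
    return misses


if __name__ == "__main__":
    print("no prefetch:  ", scan_misses(64))
    print("with prefetch:", scan_misses(64, prefetch_distance=4))
```

In this model only the very first line is missed on demand once prefetching is enabled; a real
machine would additionally need the prefetch distance to cover the memory latency.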
The last part of this chapter covers sparse codes, a special class of numerical codes that
usually have more difficult reference patterns because at least one array is addressed indirectly. This part
is important because it shows how traces can be used effectively to gain insight into the behavior of sparse codes.
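The indirect addressing mentioned above can be made concrete with a small trace generator. The sketch
assumes a sparse matrix-vector product in compressed sparse row (CSR) storage, which is a common
layout but not necessarily the one used in the report; the function spmv_trace and the array names
val, col, x and y are hypothetical. It emits the sequence of array references that such a kernel
would make, showing that the references to x follow the contents of the column index array rather
than the loop counters.

```python
def spmv_trace(row_ptr, col_idx):
    """Yield (array name, index) for each reference made by y = A * x in CSR form."""
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            yield ("val", k)            # nonzero value, accessed sequentially
            yield ("col", k)            # column index, accessed sequentially
            yield ("x", col_idx[k])     # indirect reference: depends on the matrix data
        yield ("y", i)                  # one result element per row


if __name__ == "__main__":
    # Illustrative 4 x 4 matrix with 6 nonzeros (made-up sparsity pattern).
    row_ptr = [0, 2, 3, 5, 6]
    col_idx = [0, 3, 1, 0, 2, 3]
    for name, index in spmv_trace(row_ptr, col_idx):
        print(name, index)
```

Feeding such a trace to a cache simulator, instead of printing it, is one way to study the locality
of the irregular references to x separately from the regular ones.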
The last chapter discussed hardware organizations and their impact on performance. Analyzing even a very
simple nested DO-loop in that chapter already unveiled unpredictable cache behavior. Increasing the
set associativity is expected to avoid certain conflicts in the cache, but this hardware improvement
does not always yield the expected reduction in conflicts. The advantage of a cache with higher set
associativity is that data can be placed in any of several cache lines. The additional hardware should
decrease the conflict misses, but it introduces another phenomenon: the address range gets smaller
when the set associativity is increased, which might cause extra conflicts. Performance is so unpredictable
because it depends on numerous parameters: hardware organization, software techniques, and the size, order,
and access patterns of the data structures used, etc. You might say that relatively small caches suffer more