84 Version 1.0 What Every Programmer Should Know About Memory

…unless the large memory chunk is exhausted, which is, depending on the requested allocation sizes, pretty rare. Obstacks are not a complete replacement for a memory allocator.

… processed sequentially (to take maximum advantage of prefetching), the processor would read all the header and padding words into the cache, even though they are never supposed to be read from or written to by the application itself. Only the runtime uses the header words, and the runtime only comes into play when the block is freed.

One could at this point argue that the implementation should be changed to put the administrative data somewhere else. This is indeed done in some implementations, and it might prove to be a good idea. There are many aspects to be kept in mind, though, security not being the least of them. Regardless of whether we might see a change in the future, the padding issue will never go away (amounting to 16% of the data in the example, when ignoring the headers). Only if the programmer directly takes control of allocations can this be avoided. When alignment requirements come into play there can still be holes, but this is also something under control of the programmer.

7.4 Improving Branch Prediction

In section 6.2.2, two methods to improve L1i use through branch prediction and block reordering were mentioned: static prediction through __builtin_expect and profile guided optimization (PGO). Correct branch prediction has performance impacts, but here we are interested in the memory usage improvements.

The use of __builtin_expect (or better the likely and unlikely macros) is simple. The definitions are placed in a central header and the compiler takes care of the rest. There is a little problem, though: it is easy enough for a programmer to use likely when really unlikely was meant and vice versa. Even if somebody uses a tool like oprofile to measure incorrect branch predictions and L1i misses, these problems are hard to detect.

There is one easy method, though. The code in section A.2 shows an alternative definition of the likely and unlikely macros which measure actively, at runtime, whether the static predictions are correct or not. The results can then be examined by the programmer or tester and adjustments can be made. The measurements do not actually take the performance of the program into account; they simply test the static assumptions made by the programmer. More details can be found, along with the code, in the section referenced above.

PGO is quite easy to use with gcc these days. It is a three-step process, though, and certain requirements must be fulfilled. First, all source files must be compiled with the additional -fprofile-generate option. This option must be passed to all compiler runs and to the command which links the program. Mixing object files compiled with and without this option is possible, but PGO will not do any good for those that do not have it enabled.

The compiler generates a binary which behaves normally except that it is significantly larger and slower because it records (and stores) information about whether branches are taken or not. The compiler also emits a file with the extension .gcno for each input file. This file contains information related to the branches in the code. It must be preserved for later.

Once the program binary is available, it should be used to run a representative set of workloads. Whatever workload is used, the final binary will be optimized to do this task well. Consecutive runs of the program are possible and, in general, necessary; all the runs will contribute to the same output file. Before the program terminates, the data collected during the program run is written out into files with the extension .gcda. These files are created in the directory which contains the source file. The program can be executed from any directory, and the binary can be copied, but the directory with the sources must be available and writable. Again, one output file is created for each input source file. If the program is run multiple times, it is important that the .gcda files of the previous run are found in the source directories since otherwise the data of the runs cannot be accumulated in one file.

When a representative set of tests has been run, it is time to recompile the application. The compiler has to be able to find the .gcda files in the same directory which holds the source files. The files cannot be moved since the compiler would not find them and the embedded checksum for the files would not match anymore. For the recompilation, replace the -fprofile-generate parameter with -fprofile-use. It is essential that the sources do not change in any way that would change the generated code. That means: it is OK to change white space and edit comments, but adding more branches or basic blocks invalidates the collected data and the compilation will fail.

This is all the programmer has to do; it is a fairly simple process. The most important thing to get right is the selection of representative tests to perform the measurements. If the test workload does not match the way the program is actually used, the performed optimizations might actually do more harm than good. For this reason, it is often hard to use PGO for libraries. Libraries can be used in many, sometimes widely different, scenarios. Unless the use cases are indeed similar, it is usually better to rely exclusively on static branch prediction using __builtin_expect.

A few words on the .gcno and .gcda files. These are binary files which are not immediately usable for inspection. It is possible, though, to use the gcov tool, which is also part of the gcc package, to examine them. This tool is mainly used for coverage analysis (hence the name) but the file format used is the same as for PGO. The gcov tool generates output files with the extension .gcov for each source file with executed code (this might include system headers). The files are source listings which are annotated, according to the parameters given to gcov, with branch counters, probabilities, etc.
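The likely and unlikely macros discussed in this section are conventionally thin wrappers around the gcc built-in __builtin_expect, which clang also supports. The following is a minimal sketch of such a central-header definition. The functions parse_digit and sum_digits are hypothetical and exist only to show where the annotations go.

```c
/* Conventional definitions, normally placed in one central header.
   The !! turns any scalar condition into 0 or 1, the value that is
   compared with the second argument of __builtin_expect. */
#define likely(expr)   __builtin_expect(!!(expr), 1)
#define unlikely(expr) __builtin_expect(!!(expr), 0)

/* Hypothetical example: convert one decimal digit and treat any other
   character as the rare case. The compiler may move the cold
   "return -1" path out of the hot fall-through sequence, keeping the
   frequently executed instructions dense in L1i. */
static int parse_digit(char c)
{
    if (unlikely(c < '0' || c > '9'))
        return -1;
    return c - '0';
}

/* Hypothetical caller: sums the digits of a string, skipping junk. */
static int sum_digits(const char *s)
{
    int sum = 0;
    for (; *s != '\0'; ++s) {
        int d = parse_digit(*s);
        if (likely(d >= 0))
            sum += d;
    }
    return sum;
}
```

For example, sum_digits("4096x") adds 4, 0, 9 and 6. The stray x takes the path annotated as unlikely. The annotation changes only the expected code layout, never the result.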
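The instrumented macros from section A.2 are not reproduced here. The following is a much simpler, hypothetical sketch of the same idea, measuring at runtime whether a static prediction holds. Each annotated condition is given a caller-supplied counter object. All names in it (struct prediction_stats, record_prediction, checked_likely, checked_unlikely, report_prediction, count_spaces) are invented for this illustration and differ from the A.2 code. As in the text, it checks the programmer's assumptions, not the program's performance.

```c
#include <stdio.h>

/* Hypothetical counters for one annotated condition. */
struct prediction_stats {
    const char *name;         /* label used when reporting */
    unsigned long correct;    /* condition matched the annotation */
    unsigned long incorrect;  /* condition contradicted the annotation */
};

/* Count whether the normalized condition matched the expected value
   and return it unchanged so that it still drives the branch. */
static inline long record_prediction(struct prediction_stats *s,
                                     long value, long expected)
{
    if (value == expected)
        s->correct++;
    else
        s->incorrect++;
    return value;
}

/* Measuring variants of likely/unlikely (hypothetical names). */
#define checked_likely(expr, stats) \
    __builtin_expect(record_prediction(&(stats), !!(expr), 1), 1)
#define checked_unlikely(expr, stats) \
    __builtin_expect(record_prediction(&(stats), !!(expr), 0), 0)

/* Print the counts and flag annotations that were wrong most of the time. */
static void report_prediction(const struct prediction_stats *s)
{
    unsigned long total = s->correct + s->incorrect;
    printf("%s: %lu of %lu predictions correct\n",
           s->name, s->correct, total);
    if (total > 0 && s->incorrect * 2 > total)
        printf("%s: annotation looks inverted\n", s->name);
}

/* Hypothetical call site: the annotation claims spaces are rare. */
static struct prediction_stats space_check = { "space_check", 0, 0 };

static unsigned count_spaces(const char *text)
{
    unsigned n = 0;
    for (const char *p = text; *p != '\0'; ++p)
        if (checked_unlikely(*p == ' ', space_check))
            ++n;
    return n;
}
```

After count_spaces("a b c"), space_check records 3 correct and 2 incorrect predictions. A tester could run a representative workload and then call report_prediction to decide whether an annotation should be flipped.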
Ulrich Drepper Version 1.0 85
0 0x3000000000 C 0 0x3000000B50: (within /lib64/ld-2.5.so)
1 0x 7FF000000 D 3320 0x3000000B53: (within /lib64/ld-2.5.so)
2 0x3000001000 C 58270 0x3000001080: _dl_start (in /lib64/ld-2.5.so)
3 0x3000219000 D 128020 0x30000010AE: _dl_start (in /lib64/ld-2.5.so)
4 0x300021A000 D 132170 0x30000010B5: _dl_start (in /lib64/ld-2.5.so)
5 0x3000008000 C 10489930 0x3000008B20: _dl_setup_hash (in /lib64/ld-2.5.so)