Optimized libraries for Linux on Power

Introduction

There are several techniques for squeezing maximum performance from your library. This article discusses ways to improve performance during the compilation, linking, and runtime phases of library development. Many of these methods rely on system-specific data, so we briefly discuss the auxiliary vector (AUXV) and how to use it to make runtime optimization choices. We also discuss GNU Indirect Function (IFUNC) support for making target-specific optimizations at run time. Employing one or a combination of these techniques should improve a library’s performance.

The auxiliary vector

During system boot, the Linux kernel discovers the system’s processor and platform information from firmware. This information, along with other details about the platform, is passed on to every process through the AUXV. The AUXV is a hidden parameter to the application’s main() function. It follows the application’s envp[] parameter and is an array of the following structure (64-bit definition):

typedef struct
{
   uint64_t a_type;              /* Entry type */
   union
     {
       uint64_t a_val;         /* Integer value */
       /* We use to have pointer elements added here.  We cannot do that,
          though, since it does not work when using 32-bit definitions
          on 64-bit platforms and vice versa.  */
     } a_un;
} Elf64_auxv_t;

The AUX vector contains information about the system’s platform and hardware capabilities. The following example shows the information provided by the AUX vector:

On an IBM POWER8 processor-based system:

~/sandbox/> LD_SHOW_AUXV=1 /bin/true
AT_DCACHEBSIZE:  0x80
AT_ICACHEBSIZE:  0x80
AT_UCACHEBSIZE:  0x0
AT_SYSINFO_EHDR: 0x7fffabd50000
AT_HWCAP:        true_le archpmu vsx arch_2_06 dfp ic_snoop smt mmu fpu altivec ppc64 ppc32
AT_PAGESZ:       65536
AT_CLKTCK:       100
AT_PHDR:         0x1074f0040
AT_PHENT:        56
AT_PHNUM:        9
AT_BASE:         0x7fffabd70000
AT_FLAGS:        0x0
AT_ENTRY:        0x1074f1d10
AT_UID:          1000
AT_EUID:         1000
AT_GID:          100
AT_EGID:         100
AT_SECURE:       0
AT_RANDOM:       0x7ffffda40df2
AT_HWCAP2:       htm-nosc vcrypto tar isel ebb dscr htm arch_2_07
AT_EXECFN:       /bin/true
AT_PLATFORM:     power8
AT_BASE_PLATFORM:power8

On an IBM POWER9™ processor-based system:

~/sandbox/> LD_SHOW_AUXV=1 /bin/true
AT_DCACHEBSIZE:  0x80
AT_ICACHEBSIZE:  0x80
AT_UCACHEBSIZE:  0x0
AT_SYSINFO_EHDR: 0x74dbafe20000
AT_HWCAP:        true_le archpmu vsx arch_2_06 dfp ic_snoop smt mmu fpu altivec ppc64 ppc32
AT_PAGESZ:       65536
AT_CLKTCK:       100
AT_PHDR:         0x74dbafe40040
AT_PHENT:        56
AT_PHNUM:        7
AT_BASE:         0x0
AT_FLAGS:        0x0
AT_ENTRY:        0x74dbafe41400
AT_UID:          1006
AT_EUID:         1006
AT_GID:          1006
AT_EGID:         1006
AT_SECURE:       0
AT_RANDOM:       0x7fffdb503172
AT_HWCAP2:       darn ieee128 arch_3_00 vcrypto tar isel ebb dscr arch_2_07
AT_EXECFN:       /bin/true
AT_PLATFORM:     power9
AT_BASE_PLATFORM:power9

The following fields are important to the dynamic linker, as demonstrated later in this article:

  • AT_BASE_PLATFORM – The name of the actual hardware platform.
  • AT_PLATFORM – The name of the compatibility platform for the IBM PowerVM partition. It is often the same as AT_BASE_PLATFORM.
  • AT_HWCAP – A bit vector of architecture versions (for example, arch_2_06) and optional feature categories (such as vsx, dfp, altivec, and so on).
  • AT_HWCAP2 – An additional bit vector of architecture versions (for example, arch_2_07) and optional feature categories (such as htm, dscr, ebb, and so on).

Accessing the auxiliary vector

There are several ways to access the AUX vector.

  • Read the /proc/<pid>/auxv file and manually parse the information to confirm if the kernel is new enough to support a specific feature.
  • Locate the AUX vector following the application argument envp[] pointer and manually parse the information found there, as in the following example.
Elf64_auxv_t *auxv;
   . . .
   while(*envp++ != NULL);
   auxv = (Elf64_auxv_t *)envp;

Note: Shared objects that need to access the AUXV information this way are at the mercy of the application developer. This method might not work if the application changes the environment, for example by calling setenv(), because the environment pointer may have been moved.

  • Use getauxval() to query the AUX vector in a structured manner, as shown in the getauxval() example later in this article.
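
The first method in the list above, reading the /proc/<pid>/auxv file, can be sketched as follows. This is a minimal illustration rather than a definitive implementation: it assumes a Linux system with /proc mounted, reads the calling process's own vector through /proc/self/auxv, and treats each record as a pair of native machine words. The function name read_auxv_entry is hypothetical.

```c
#include <elf.h>     /* AT_NULL, AT_PAGESZ, ... */
#include <stdio.h>

/* Look up one entry of the calling process's AUX vector by reading
   /proc/self/auxv.  Each record is a (type, value) pair of native words.
   Returns 1 and stores the value on success, or 0 if the entry is
   absent or the file cannot be read.  */
int read_auxv_entry(unsigned long type, unsigned long *value)
{
    unsigned long pair[2];
    int found = 0;
    FILE *fp = fopen("/proc/self/auxv", "rb");

    if (fp == NULL)
        return 0;

    while (fread(pair, sizeof pair, 1, fp) == 1) {
        if (pair[0] == AT_NULL)      /* end of the vector */
            break;
        if (pair[0] == type) {
            *value = pair[1];
            found = 1;
            break;
        }
    }

    fclose(fp);
    return found;
}
```

A caller might pass AT_HWCAP or AT_PAGESZ as the type. Because the file is read fresh from the kernel, this approach does not depend on where the environment pointer currently lives.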

Dynamic linker search path

When a program is run, the dynamic linker (loader) will use the information in the auxiliary vector to construct its library search path. For each directory in the library search path, the dynamic linker will search for libraries in subdirectories of that directory. The dynamic linker derives the names of the subdirectories using the AT_PLATFORM and AT_HWCAP information from the auxiliary vector.

To demonstrate this, let’s set LD_LIBRARY_PATH="/somedir:$LD_LIBRARY_PATH". We can see which directories are searched, and in which order, when we run a program, as demonstrated in the following example.

~/sandbox/> LD_LIBRARY_PATH="/somedir:$LD_LIBRARY_PATH" LD_DEBUG=libs ./auxv
29320:     find library=libc.so.6 [0]; searching
29320:      search
path=somedir/tls/power7/altivec/dfp:somedir/tls/power7/altivec:somedir/tls/power7
/dfp:somedir/tls/power7:somedir/tls/altivec/dfp:somedir/tls/altivec:somedir
/tls/dfp:somedir/tls:somedir/power7/altivec/dfp:somedir/power7/altivec:somedir/power7
/dfp:somedir/power7:somedir/altivec/dfp:somedir/altivec:somedir/dfp:somedir (LD_LIBRARY_PATH)
29320:       trying file=somedir/tls/power7/altivec/dfp/libc.so.6
29320:       trying file=somedir/tls/power7/altivec/libc.so.6
29320:       trying file=somedir/tls/power7/dfp/libc.so.6
29320:       trying file=somedir/tls/power7/libc.so.6
29320:       trying file=somedir/tls/altivec/dfp/libc.so.6
29320:       trying file=somedir/tls/altivec/libc.so.6
29320:       trying file=somedir/tls/dfp/libc.so.6
29320:       trying file=somedir/tls/libc.so.6
29320:       trying file=somedir/power7/altivec/dfp/libc.so.6
29320:       trying file=somedir/power7/altivec/libc.so.6
29320:       trying file=somedir/power7/dfp/libc.so.6
29320:       trying file=somedir/power7/libc.so.6
29320:       trying file=somedir/altivec/dfp/libc.so.6
29320:       trying file=somedir/altivec/libc.so.6
29320:       trying file=somedir/dfp/libc.so.6
29320:       trying file=somedir/libc.so.6
29320:      search cache=/etc/ld.so.cache
29320:       trying file=/lib64/power7/libc.so.6
29320:
29320:
29320:     calling init: /lib64/power7/libc.so.6
29320:
29320:
29320:     initialize program: auxv
29320:
29320:
29320:     transferring control: auxv
29320: AT_UID is: 1014
29320:
29320:     calling fini: auxv [0]
29320:
29320:
29320:     calling fini: /lib64/power7/libc.so.6 [0]
29320:

We can see from this example that the search path is extended using the information from the AUX vector (for example, ../power7/altivec/dfp). The example also shows that if the library is not found in somedir, the dynamic linker continues to search the rest of the library search path. The dynamic linker recognizes an LD_HWCAP_MASK environment variable, which may be used to enable or disable the additional directory searches based on the AT_HWCAP bits. For example, LD_HWCAP_MASK=0 will disable all AT_HWCAP-based searches, leaving only the AT_PLATFORM-based search. This may improve application startup for large applications.
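
The order of the directories in the trace above can be reproduced by treating tls, the platform name, and each capability name as optional path components and counting a bit mask down from "all components present" to "none". The following is a minimal sketch that imitates the observed order only; it is not the dynamic linker's actual implementation, and the names format_search_dir and print_search_dirs are hypothetical.

```c
#include <stdio.h>

/* Build one candidate directory: `base` followed by every component whose
   bit is set in `mask`.  Bit (n - 1) selects comps[0] and bit 0 selects
   comps[n - 1], so counting the mask down from 2^n - 1 to 0 yields the
   most specific directory first and the bare base directory last.  */
void format_search_dir(const char *base, const char *const *comps,
                       int n, unsigned mask, char *buf, size_t len)
{
    size_t used = (size_t) snprintf(buf, len, "%s", base);

    for (int i = 0; i < n && used < len; i++) {
        if (mask & (1u << (n - 1 - i)))
            used += (size_t) snprintf(buf + used, len - used,
                                      "/%s", comps[i]);
    }
}

/* Print every candidate directory, most specific first.  */
void print_search_dirs(const char *base, const char *const *comps, int n)
{
    char dir[256];

    for (unsigned mask = (1u << n); mask-- > 0; ) {
        format_search_dir(base, comps, n, mask, dir, sizeof dir);
        puts(dir);
    }
}
```

Calling print_search_dirs("somedir", comps, 4) with comps set to { "tls", "power7", "altivec", "dfp" } prints sixteen directories, starting with somedir/tls/power7/altivec/dfp and ending with somedir, in the same order as the trace.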

Platform-specific optimization techniques

There are a few methods that a developer may use to optimize a library or application to run on a particular platform:

  • Dynamic Code Path Selection Based on AUXV Information – This method checks the AUX vector for the availability of certain features and makes code branch decisions based on what is found.
  • Processor Tuned Libraries – This method directs the compiler to generate the code that is tuned to a particular processor. Developing a processor-tuned library is perfect for those libraries that contain expensive function calls, for example, C library functions such as memcpy, memset, and math routines (libm). An advantage of using processor tuned libraries is that only one installation media is needed to support multiple systems. This mechanism is currently being used in RHEL5 and SLES10 and later releases. This mechanism is portable only for shared libraries but may be used for applications which aren’t intended to be portable to different platforms.
  • Target Specific Optimization With The GNU Indirect Function Mechanism – This mechanism allows multiple optimized versions of a routine to exist within the same library whereby the selection of the routine to satisfy a function call is deferred until the platform information is queried at run time.

Dynamic code path selection based on AUXV information

The information from the auxiliary vector may be used to dynamically determine which code to run based on the hardware capabilities of the system. The auxv/hwcap.h system header file defines platform-specific hardware capabilities that map to bits in the auxiliary vector’s HWCAP field. Employing one of the methods mentioned above to gain access to the AUXV, and using the auxv/hwcap.h header file, one could dynamically select the best function to use for a given system.

Example:

/*  Author(s):  David Flaherty  <[email protected]>
 *
 *  Copyright (c) 2011, IBM Corporation
 *
 *  Redistribution and use in source and binary forms, with or without
 *  modification, are permitted provided that the following conditions are met:
 *      * Redistributions of source code must retain the above copyright
 *        notice, this list of conditions and the following disclaimer.
 *      * Redistributions in binary form must reproduce the above copyright
 *        notice, this list of conditions and the following disclaimer in the
 *        documentation and/or other materials provided with the distribution.
 *      * Neither the name of the IBM Corporation nor the names of its
 *        contributors may be used to endorse or promote products derived from
 *        this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL IBM CORPORATION BE LIABLE FOR ANY
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
 * DAMAGE.
 */

#include <stdio.h>
#include <elf.h>
#include <auxv/hwcap.h>

int main(int argc, char *argv[], char *envp[])
{
   Elf64_auxv_t *auxv;
   unsigned long int hwcap_mask;

   /* Skip the environment; the AUX vector starts after its NULL terminator. */
   while (*envp++ != NULL);

   /* Walk the vector until the AT_NULL terminator. */
   for (auxv = (Elf64_auxv_t *)envp; auxv->a_type != AT_NULL; auxv++) {
      if (auxv->a_type == AT_HWCAP) {
         hwcap_mask = (unsigned long int) auxv->a_un.a_val;
         if (hwcap_mask & PPC_FEATURE_HAS_ALTIVEC) {
            printf("\n\tThis system has SIMD/Vector Unit support\n");
            printf("\tWe could call Altivec specific code here...\n");
         }
      }
   }
   return 0;
}

Here is another example of dynamic code path selection using getauxval():

/*  Author(s):  David Flaherty  <[email protected]>
 *              Tulio Magno Quites Machado Filho  <[email protected]>
 *
 *  Copyright (c) 2018, IBM Corporation
 *
 *  Redistribution and use in source and binary forms, with or without
 *  modification, are permitted provided that the following conditions are met:
 *      * Redistributions of source code must retain the above copyright
 *        notice, this list of conditions and the following disclaimer.
 *      * Redistributions in binary form must reproduce the above copyright
 *        notice, this list of conditions and the following disclaimer in the
 *        documentation and/or other materials provided with the distribution.
 *      * Neither the name of the IBM Corporation nor the names of its
 *        contributors may be used to endorse or promote products derived from
 *        this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL IBM CORPORATION BE LIABLE FOR ANY
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
 * DAMAGE.
 */

#include <assert.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/auxv.h>

int main()
{
  unsigned long hwcap;
  char * platform = NULL;

  hwcap = getauxval (AT_HWCAP2);

  printf("AT_HWCAP2=0x%0*lx\n",2 * (int) sizeof(unsigned long), hwcap);

  #ifdef __powerpc__
    if (hwcap & PPC_FEATURE2_HAS_VEC_CRYPTO)
        printf("  Vector crypto support\n");

    if (hwcap & PPC_FEATURE2_ARCH_3_00)
        printf("  Power ISA 3.0 support\n");
  #endif

  platform = (char*) getauxval (AT_PLATFORM);

  if (platform)
    printf("PLATFORM=%s\n", platform);
  else
    printf("AT_PLATFORM not supported\n");

  assert (getauxval (AT_PAGESZ) == getpagesize ());

  return 0;
}

Output:

AT_HWCAP2=0x00000000bee00000
Vector crypto support
Power ISA 3.0 support
PLATFORM=power9
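
The two examples above only print what they find. In a library, the same check would typically choose an implementation once and then call it through a function pointer. The following is a minimal sketch of that pattern; sum_generic, sum_tuned, and select_sum are hypothetical names, and sum_tuned is only a stand-in for code that would really use vector instructions. The capability word and the feature bit are passed in as parameters so that the sketch does not depend on any particular header.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Portable baseline implementation.  */
long sum_generic(const int *v, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Stand-in for a tuned variant (hypothetical).  A real library would use
   vector instructions here; it must return the same result.  */
long sum_tuned(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
        total += (long) v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        total += v[i];
    return total;
}

/* Pick an implementation from a hardware-capability word.  */
sum_fn select_sum(unsigned long hwcap, unsigned long feature_bit)
{
    return (hwcap & feature_bit) ? sum_tuned : sum_generic;
}
```

On a Power system, a caller might pass getauxval(AT_HWCAP) and PPC_FEATURE_HAS_ALTIVEC during library initialization and cache the returned pointer, so that the capability test is paid for only once.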

Processor tuned libraries

Processor tuned libraries are libraries that are optimized for the specific hardware capabilities of the platform the libraries are running on. This is accomplished by a combination of optimized (tuned) code generated by the compiler, the link editor (linker), post link optimization, optimized library selection by the dynamic linker (loader), and platform identification information from the kernel.

In order to create a processor tuned library, the library source code must be compiled with the -mcpu=cpu_type switch to optimize the library for the hardware on which it will be run. This switch sets the architecture type, register usage, choice of mnemonics, and instruction scheduling parameters for the machine’s cpu_type. The library should be compiled multiple times with the -mcpu switch, once for each processor where unique optimization paths are possible. A default library is also built to support any processor on which the library may run (that is, -mcpu=powerpc64le).

The processor tuned libraries must be installed in a library directory structure recognized by the dynamic linker, which is a hardware/platform specific subdirectory under the canonical system /lib[64]/ directory. The subdirectory name is the name of the processor, or the name of the hardware capability as described by the auxiliary vector AT_PLATFORM field. The numeric chip_type names (970, 7450, …) have subdirectory names that are prefixed with ppc (for example, ppc970, ppc7450). If the cpu_type has aliases (for example, G5 is an alias for 970) the more generic name is used for the directory name, that is, ppc970 not G5 and ppc7450 not G4. The processor-tuned libraries can also be installed in locations other than /lib[64], for example /usr/lib[64] or /usr/local/lib[64]. After the library is installed in the appropriate platform-qualified directory (for example, /usr/local/lib64/power7), all you need to do is include the unqualified directory (for example, /usr/local/lib64) in LD_LIBRARY_PATH.

When creating a final executable file or library, the Link Editor links against the default library shared object files, not the optimized libraries. Optimized library selection is made at run time by the dynamic linker based on the platform that the application is running on. The default library shared object file is installed in the canonical system /lib[64]/ directory.

Source code file selection

If the library has source code that requires specific optimizations that need to be selected at library build time then the library configure and make framework need to be enabled. GLIBC uses a sysdeps/ directory structure mechanism to accomplish this.

Note: Most projects will not have this elaborate source selection mechanism.

The configure and make scripts for GLIBC have a search path mechanism whereby it searches through the directories in the project directory tree in a prescribed sequence for each platform attempting to discover files (by precedence) which satisfy the cpu_type with which the library was configured.

${glibc_source}/sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/${cpu_type}/fpu
${glibc_source}/sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/${cpu_type}
${glibc_source}/sysdeps/powerpc/powerpc[32|64]/${cpu_type}/fpu
${glibc_source}/sysdeps/powerpc/powerpc[32|64]/${cpu_type}

For example, a hand-optimized implementation of memcpy for the IBM POWER7 chip will have memcpy.S placed in ${glibc_source}/sysdeps/powerpc/powerpc64/power7/memcpy.S. This version will be selected before the default version, ${glibc_source}/string/memcpy.c, when GLIBC is configured using --with-cpu=power7.

Target-specific optimization with the GNU Indirect Function mechanism

Target-specific optimization allows for optimizing the critical path functions in an application or library without having to have multiple copies of a library on the system. Effectively, it embeds a number of platform-optimized routines within a library where most of the generated code is common to all the processors of a platform. The appropriate optimized branch is resolved at run time by the dynamic linker the first time the routine is hit.

This is accomplished using the GNU Indirect Function mechanism (the STT_GNU_IFUNC symbol type). A symbol of this type does not resolve directly to the address of a function, but rather to the address of a resolver function that the dynamic linker calls. The resolver returns a pointer to the correctly optimized function.

Simple example:

/*  Author(s):  Tulio Magno Quites Machado Filho  <[email protected]>
 *
 *  Copyright (c) 2018, IBM Corporation
 *
 *  Redistribution and use in source and binary forms, with or without
 *  modification, are permitted provided that the following conditions are met:
 *      * Redistributions of source code must retain the above copyright
 *        notice, this list of conditions and the following disclaimer.
 *      * Redistributions in binary form must reproduce the above copyright
 *        notice, this list of conditions and the following disclaimer in the
 *        documentation and/or other materials provided with the distribution.
 *      * Neither the name of the IBM Corporation nor the names of its
 *        contributors may be used to endorse or promote products derived from
 *        this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL IBM CORPORATION BE LIABLE FOR ANY
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
 * DAMAGE.
 */

#include <stdlib.h>
#include <time.h>

void foo1(void);
void foo2(void);
void foo3(void);
void foo_default(void);

void foo (void) __attribute__ ((ifunc ("foo_resolver")));

static void (*foo_resolver (void)) (void)
{
  int rnum;
  srand((unsigned int) time(NULL));
  rnum = rand() % 4;

  switch (rnum)
  {
    case 1:
      return foo1;
    case 2:
      return foo2;
    case 3:
      return foo3;
    default:
      return foo_default;
  }
}

In the previous example, foo() is declared as an indirect function. When an application uses the foo function, the dynamic linker calls foo_resolver() to determine which foo function should be called. In the example above, the decision is based on a random number. A more appropriate method would be to use the information in the AUX vector to make the correct determination.

Here’s an example using the AUX vector through compiler built-ins:

/*  Author(s):  Tulio Magno Quites Machado Filho  <[email protected]>
 *
 *  Copyright (c) 2018, IBM Corporation
 *
 *  Redistribution and use in source and binary forms, with or without
 *  modification, are permitted provided that the following conditions are met:
 *      * Redistributions of source code must retain the above copyright
 *        notice, this list of conditions and the following disclaimer.
 *      * Redistributions in binary form must reproduce the above copyright
 *        notice, this list of conditions and the following disclaimer in the
 *        documentation and/or other materials provided with the distribution.
 *      * Neither the name of the IBM Corporation nor the names of its
 *        contributors may be used to endorse or promote products derived from
 *        this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL IBM CORPORATION BE LIABLE FOR ANY
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
 * DAMAGE.
 */

#include <stdlib.h>
#include <time.h>

void foo_power9(void);
void foo_power8(void);
void foo_power7(void);
void foo_default(void);

void foo (void) __attribute__ ((ifunc ("foo_resolver")));

static void (*foo_resolver (void)) (void)
{
  if (__builtin_cpu_is ("power9"))
    return foo_power9;
  else if (__builtin_cpu_is ("power8"))
    return foo_power8;
  else if (__builtin_cpu_is ("power7"))
    return foo_power7;
  else
    return foo_default;
}

Each of the foo functions would be optimized for a specific platform or processor. The remaining library functions will be optimized for a common platform. This gives the user a single library binary that contains optimization for specific platforms.
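
A resolver can also key on an individual hardware feature instead of a processor name. The following is a minimal sketch, assuming GCC on Linux with a toolchain and C library that support both the ifunc attribute and, on Power, __builtin_cpu_supports(). The names checksum, checksum_generic, checksum_vsx, and resolve_checksum are hypothetical, and checksum_vsx is only a placeholder for a routine that would really use VSX instructions. When the code is not built for Power, the resolver simply returns the generic routine.

```c
#include <stddef.h>

/* Baseline implementation, safe on every processor.  */
static unsigned int checksum_generic(const unsigned char *p, size_t n)
{
    unsigned int sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

#if defined(__powerpc__)
/* Placeholder for a VSX-tuned variant (hypothetical).  It forwards to the
   generic code so that the sketch stays short and still gives the same
   result.  */
static unsigned int checksum_vsx(const unsigned char *p, size_t n)
{
    return checksum_generic(p, n);
}
#endif

/* Resolver: called by the dynamic linker when checksum() is bound.  */
static unsigned int (*resolve_checksum(void))(const unsigned char *, size_t)
{
#if defined(__powerpc__)
    if (__builtin_cpu_supports("vsx"))
        return checksum_vsx;
#endif
    return checksum_generic;
}

/* Callers see one symbol; the resolver decides what it points to.  */
unsigned int checksum(const unsigned char *p, size_t n)
    __attribute__((ifunc("resolve_checksum")));
```

Testing a feature bit rather than a processor name means that a later processor with the same capability picks up the tuned routine without any change to the resolver.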

Compile time code path selection based on C preprocessor (cpp) macros

Code path selection can also be decided at compile time using C preprocessor (cpp) macros. These macros can be used with the #ifdef or #if defined() directives to make a decision as to which code path to take. There are many cpp macros available. To see a list of cpp macros (and their values) that are available with specific compile options, use the cpp -dM command.

Example: The resulting list from a cpp -dM command can be extensive, so we will use grep to show a subset.

~/sandbox/> cpp -dM auxv3.c | grep '_ARCH_PWR'
#define _ARCH_PWR4 1
~/sandbox/> cpp -dM -mcpu=power7 auxv3.c | grep '_ARCH_PWR'
#define _ARCH_PWR4 1
#define _ARCH_PWR5 1
#define _ARCH_PWR6 1
#define _ARCH_PWR7 1
#define _ARCH_PWR5X 1

Using the cpp macros, we can make compile-time decisions as to which code path to take:

#ifdef _ARCH_PWR7
   ... some Power7 specific code
#else
   ... generic code
#endif
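
The same pattern extends to a chain of tests that picks the newest level the compiler was asked to target. The following is a minimal sketch; the function name compiled_isa_level is hypothetical, and it assumes that the compiler defines _ARCH_PWR8 and _ARCH_PWR9 for -mcpu=power8 and -mcpu=power9 in the same way that the listing above shows _ARCH_PWR7 for -mcpu=power7.

```c
/* Report the newest Power level this file was compiled for.  The macros
   are cumulative (a power7 build also defines _ARCH_PWR6 and so on),
   so the newest level must be tested first.  */
const char *compiled_isa_level(void)
{
#if defined(_ARCH_PWR9)
    return "power9";
#elif defined(_ARCH_PWR8)
    return "power8";
#elif defined(_ARCH_PWR7)
    return "power7";
#else
    return "generic";
#endif
}
```

Unlike the runtime checks shown earlier, this decision is fixed when the file is compiled, so the resulting binary must only be run on processors that support the selected level.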

Comparing processor-tuned libraries with target-specific optimization

Processor tuned

  • Good – The entire library has the opportunity to be optimized for the processor, with additional optimization possible for specific functions.
  • Bad – Multiple copies of the same library on the system.
  • Test effort – Full library on multiple platforms.

Target-specific optimization

  • Good – Single binary with optimized code for various functions.
  • Bad – Most of the library is compiled for a common platform; only some functions have special optimization.
  • Test effort – The generic paths tested for the common platform, but the optimized paths need to be tested for each supported platform.