PGI Compiler
The Portland Group (PGI) compiler suite is the default compiler suite on Rosa, so the associated programming environment module file, PrgEnv-pgi, is loaded at login time. The PGI compiler suite includes Fortran 77, Fortran 90/95, C and C++ compilers.
Versions
The current default version of the compiler is the one loaded at login; issue module list to see it. Older or newer versions may also be installed: to list them, issue module avail pgi. To use a different version of the PGI compiler, issue module switch pgi pgi/<new_version>.
Invocation
To compile a Fortran 90 MPI code on Rosa invoke the Cray compiler wrapper:
> ftn [compiler options] example.f90 -o example.x
Likewise for C and C++ codes:
> cc [compiler options] example.c -o example.x
> CC [compiler options] example.C -o example.x
The man pages (man pgf95; man pgcc) provide information on all available compiler options. Note that if two compiler options conflict, the last one on the command line takes precedence.
Source files suffix rules
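The Portland Group Fortran compiler supports the following Fortran file extensions:
.f, .F, .for, .FOR, .fpp, .f90, .F90, .f95, and .F95
By default, the compiler expects fixed form Fortran source for the suffixes .f, .F, .for, .FOR and .fpp, and free form for the suffixes .f90, .F90, .f95 and .F95. An upper-case suffix (and .fpp) also causes the file to be preprocessed before compilation. Assembler, object and archive files are accepted as well, as summarized in the following table: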
Suffix | Processing to be done |
.f | fixed form Fortran source; compile |
.F | fixed form Fortran source; preprocess, compile |
.f90 | free form Fortran source; compile |
.F90 | free form Fortran source; preprocess, compile |
.f95 | free form Fortran source; compile |
.F95 | free form Fortran source; preprocess, compile |
.for | fixed form Fortran source; compile |
.FOR | fixed form Fortran source; preprocess, compile |
.fpp | fixed form Fortran source; preprocess, compile |
.s | assembler source; assemble |
.S | assembler source; preprocess, assemble |
.o | object file; passed to linker |
.a | library archive file; passed to linker |
To override the default suffix rules use the following compiler flags:
- -Mextend: Allow 132-column source lines
- -Mfixed: Indicates source code uses fixed form specifications
- -Mfreeform: Indicates source code uses free form specifications
Optimization
Using appropriate compiler optimization flags is essential for reasonable application performance on the XT5 platform. As a starting point, we recommend the following optimization flags:
- -fastsse -tp shanghai-64
The -fastsse flag is equivalent to -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline -Mvect=sse -Mscalarsse -Mcache_align -Mflushz, where:
- -O2 specifies general optimization level 2
- -Mnoframe prevents the generation of code to set up a stack frame
- -Munroll=c:n completely unrolls loops with loop count of n or less
- -Mlre enables loop-carried redundancy elimination
- -Mautoinline enables automatic function inlining in C/C++
- -Mvect=sse generates SSE and SSE2 instructions for the Opteron
- -Mscalarsse generates scalar SSE code with xmm registers
- -Mcache_align aligns long objects on cache-line boundaries
- -Mflushz flushes SSE denormal numbers to zero
and the -tp shanghai-64 flag optimizes code specifically for the AMD Opteron quad-core Shanghai architecture.
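Of the flags listed above, -Mflushz can change numerical results, because very small (denormal) values are replaced by zero. The following C sketch shows one way to check whether a computation produces denormal values at all. The helper names is_denormal and halve are illustrative assumptions and are not part of any PGI interface.

```c
#include <float.h>

/* Returns 1 if x is nonzero but smaller in magnitude than the smallest
 * normalized double (i.e. a denormal), and 0 otherwise.
 * Illustrative helper, not part of any compiler library. */
int is_denormal(double x)
{
    double a = (x < 0.0) ? -x : x;
    return a > 0.0 && a < DBL_MIN;
}

/* Halves x `steps` times. The volatile qualifier keeps the compiler
 * from evaluating the arithmetic at compile time. */
double halve(double x, int steps)
{
    volatile double v = x;
    int i;
    for (i = 0; i < steps; ++i)
        v = v * 0.5;
    return v;
}
```

Under default IEEE arithmetic, is_denormal(halve(DBL_MIN, 1)) is true. With flush-to-zero in effect the halved value would be expected to be zero instead, in which case the check reports false. If a code relies on such small values, it is worth comparing results with and without -Mflushz.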
More aggressive optimization can be obtained by adding -O3, i.e.:
- -O3 -fastsse
At the -O3 level, all level 2 optimizations are performed, together with more aggressive code hoisting and scalar replacement. These optimizations may speed up your code but can also slow it down, so always benchmark your code with a variety of options enabled and disabled. It may be worth experimenting with the -Munroll, -Minline, -Mmovnt and -Mconcur options in particular. Use -help to list the available compiler options or to see details of a given option, e.g. pgf95 -Munroll -help.
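To compare flag combinations, it helps to time a representative kernel in a repeatable way. The following is a minimal C sketch of such a timing harness. The functions kernel and time_kernel are hypothetical stand-ins for your own code, and clock() measures the CPU time of the process rather than wall-clock time.

```c
#include <time.h>

/* Hypothetical stand-in for the routine being tuned. */
static double kernel(int n)
{
    double s = 0.0;
    int i;
    for (i = 0; i < n; ++i)
        s += (double)i * 0.5;
    return s;
}

/* Runs the kernel `repeats` times, stores the accumulated result in
 * *result (so the work cannot be optimized away), and returns the
 * elapsed CPU time in seconds. */
double time_kernel(int n, int repeats, double *result)
{
    double r = 0.0;
    clock_t start, end;
    int k;

    start = clock();
    for (k = 0; k < repeats; ++k)
        r += kernel(n);
    end = clock();

    *result = r;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Build the same source once per flag set (for example with -fastsse and with -O3 -fastsse) and compare the returned times, checking that the computed result stays the same.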
SSE vector instructions
As mentioned above, the -fastsse flag is used to enable SSE vectorization and is key to getting good performance from the AMD Opteron processor. Information regarding the optimizations achieved by the compiler can be written to standard error with the -Minfo flag. The following are potential barriers to SSE vectorization:
- Apparent dependencies and C pointers: give the compiler information on what can be vectorized by using the -Msafeptr flag, via pragmas, or by employing the restrict type qualifier
- Function calls: try to inline the functions by using the -Minline or -Mipa=inline flags
- Type conversions: manually convert constants or use compiler flags
- Large number of statements: try the -Mvect=nosizelimit flag
- Too few iterations: unroll the loops
- Genuine dependencies: try to restructure the loop manually
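As an illustration of the first barrier, the following minimal C sketch uses the restrict type qualifier to tell the compiler that the two arrays do not overlap, which removes the apparent dependency between loads from x and stores to y. The function name saxpy is illustrative. Note that restrict is a C99 feature, so a C99 mode may be required depending on the compiler version.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * The restrict qualifiers promise that x and y never alias, so each
 * iteration is independent and the loop is a candidate for SSE
 * vectorization. Calling this with overlapping arrays is an error. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    size_t i;
    for (i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Compiling with -fastsse -Minfo should indicate whether the loop was vectorized. The exact wording of the message depends on the compiler version.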
Interprocedural analysis
In addition to -fastsse, the -Mipa option for interprocedural analysis and optimization (IPA) can in some cases improve performance by 5-10%. We suggest using the following IPA options:
- -Mipa=fast,inline
Note that the interprocedural analysis flag must be used at both compile and link time.
OpenMP
For the PGI compiler use the -mp=nonuma option to enable OpenMP 2.5 support.
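A minimal C sketch of an OpenMP loop that stays within OpenMP 2.5 features is shown below. The function name sum_array is illustrative. The loop index is a signed int because OpenMP 2.5 requires a signed integer loop variable.

```c
/* Sums the first n elements of a. With OpenMP enabled, the iterations
 * are divided among threads and the per-thread partial sums are
 * combined by the reduction clause. */
double sum_array(const double *a, int n)
{
    double sum = 0.0;
    int i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; ++i)
        sum += a[i];

    return sum;
}
```

If the file is compiled without the OpenMP option, the pragma is ignored and the loop runs serially, which is a convenient way to check the parallel result against a serial one.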
Debugging
The following compiler flags may be useful for helping debug your code:
Flag | Meaning |
-g | Generate symbolic debugging information (useful at -O0) |
-gopt | Generate symbolic debugging information in the presence of optimization |
-Mbounds | Adds array bounds checking |
-v | Give verbose output |
-Mlist | Generate a listing file |
-Minfo | Provide information on the optimizations performed by the compiler |
Unsupported options
The PGI Cluster Development Kit (CDK) options -Mprof=mpi, -Mmpi, and -Mscalapack are not supported on the Cray XT5.
Further Information
See the man pages (man pgcc, man pgf95) for detailed information on the compilers and compiler flags.
Refer to the online documentation from the Portland Group.


