Performance Profiling of MPI Code with MPE on Glacier

Introduction

MPE (Multi-Processing Environment) is a software package for performance analysis of MPI programs. An example of using MPE on the WestGrid Glacier cluster is given below. A separate overview page describes MPE in general and links to examples on other WestGrid systems.

Running MPE-instrumented code produces log files that can be post-processed to get a graphical representation of the communication pattern in an MPI program. By looking at the timeline of the MPI calls made by the different processes, it is possible to spot problems with synchronization or load balancing. No changes to the source code are required, but the program must be recompiled and linked with additional libraries, as shown in the following example.

Compiling

The source code for the Fortran 77 program used in this example is example1.f. To compile the code on the WestGrid Glacier system and link it with the MPE libraries, type:

mpif77 -o example1 example1.f -mpilog

For a C program with the source code in example1.c, compile with:

mpicc -o example1 example1.c -mpilog

Running

Before running the program, an environment variable needs to be set to choose a particular log file format. For example, to select the ALOG format, bash shell users can type:

export MPE_LOG_FORMAT=ALOG

The program can then be run on Glacier using the following line in a batch script (such as the one described in the Glacier parallel programming documentation):

mpirun -np 4 ./example1
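To show how the two steps above fit together in a batch script, here is a minimal sketch that prints one. It assumes a TORQUE/PBS-style scheduler; the resource request (nodes=2:ppn=2), the walltime, and the helper name write_mpe_job are illustrative assumptions, not taken from the Glacier documentation, so check them against the site's own batch script examples.

```shell
#!/bin/bash
# Hypothetical generator: prints a minimal batch script to stdout.
# Assumptions (not from the text above): a TORQUE/PBS-style scheduler,
# the resource request "nodes=2:ppn=2", and the 10-minute walltime.
write_mpe_job() {
    local program=$1 nprocs=$2
    cat <<EOF
#!/bin/bash
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:10:00

# Run from the directory the job was submitted from.
cd "\$PBS_O_WORKDIR"

# Select the log format before the MPE-instrumented program starts.
export MPE_LOG_FORMAT=ALOG

mpirun -np ${nprocs} ./${program}
EOF
}

# Print a script for the example program on 4 processes.
write_mpe_job example1 4
```

Redirecting the output to a file gives a script that could be submitted in the usual way once the scheduler directives have been adapted. Setting MPE_LOG_FORMAT inside the script keeps the choice of log format with the job rather than relying on the login shell's environment.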

This will produce a file called "example1.alog". In general, the log file will be called <program>.alog, where "<program>" is replaced by the program name.
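A small check after the run can confirm that the log file was actually written. The sketch below is illustrative only: the helper names mpe_log_name and mpe_check_log are hypothetical, and it assumes the log file extension is the lower-cased value of MPE_LOG_FORMAT, which the text above confirms only for ALOG.

```shell
#!/bin/bash
# Hypothetical helper: print the log file name the run is expected to produce.
# Assumption: the extension is the lower-cased value of MPE_LOG_FORMAT
# (confirmed above only for ALOG -> .alog).
mpe_log_name() {
    local program ext
    if [ -z "${MPE_LOG_FORMAT:-}" ]; then
        echo "MPE_LOG_FORMAT is not set" >&2
        return 1
    fi
    program=$(basename "$1")
    ext=$(printf '%s' "$MPE_LOG_FORMAT" | tr '[:upper:]' '[:lower:]')
    printf '%s.%s\n' "$program" "$ext"
}

# Hypothetical helper: succeed only if that log file exists in the
# current directory and is non-empty.
mpe_check_log() {
    local log
    log=$(mpe_log_name "$1") || return 1
    if [ -s "$log" ]; then
        echo "found $log"
    else
        echo "missing or empty: $log" >&2
        return 1
    fi
}

# Typical use in a batch script, after the mpirun line:
#   export MPE_LOG_FORMAT=ALOG
#   mpirun -np 4 ./example1
#   mpe_check_log ./example1
```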

Viewing and interpreting the results

For a graphical view of the log file contents (assuming that you have an X display server running and have logged into Glacier with X Window tunneling enabled), use logviewer:

logviewer example1.alog

The logviewer is a wrapper script that selects the appropriate viewer program (upshot, jumpshot 2 or jumpshot 3) depending on the log file format used. In the example below, the main upshot window is shown. Click on "Setup" to continue.

Glacier logviewer example 1

You will then get two new windows. One of them is an error message, such as the one shown below. Just click "OK" to continue.

Glacier logviewer error window

The other window shows the progress of the program as a function of time for each process, together with the communication between the processes. In this example, process 0 uses MPI_Send to send messages to the other processes at the beginning of the program and receives messages with MPI_Recv at the end. Time intervals drawn as just a thin line are those in which the process is doing non-MPI work (that is, computing). The timeline of an efficient parallel program should consist mainly of such lines: over the whole run, the MPI calls should account for only a small fraction of the total time.
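To put a rough number on this, the time a process spends inside MPI calls can be read off the timeline and compared with the total run time. The helper below is a hypothetical sketch (mpi_time_percent is not part of MPE), and the figures passed to it are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: percentage of the run spent inside MPI calls.
# Arguments: time in MPI calls, total run time (same units), both
# estimated by hand from the viewer's timeline.
mpi_time_percent() {
    awk -v mpi="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { exit 1 }          # reject a meaningless total
        printf "%.1f\n", 100 * mpi / total
    }'
}

# Made-up example: 0.4 s in MPI calls out of a 10 s run.
mpi_time_percent 0.4 10.0   # prints 4.0
```

A process that spends only a few percent of its time in MPI calls is mostly computing; a much larger share suggests that it is waiting on communication or on slower processes.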

Glacier Logviewer example 2

Updated 2008-01-31.