Hands-On Exercises for ScaLAPACK

Exercise 4: ScaLAPACK - Example Program 1

Information: This is a very simple program that solves a linear system by calling the ScaLAPACK routine PSGESV.  More complete details of this example program can be found in Chapter 2 of the ScaLAPACK Users' Guide. This example program demonstrates the basic requirements to call a ScaLAPACK routine: initializing the process grid, assigning the matrix to the processes, calling the ScaLAPACK routine, and releasing the process grid.

This example program solves the 9-by-9 system of linear equations given by:

/  19   3   1   12   1   16   1   3  11 \   / x1 \   / 0 \
| -19   3   1   12   1   16   1   3  11 |   | x2 |   | 0 |
| -19  -3   1   12   1   16   1   3  11 |   | x3 |   | 1 |
| -19  -3  -1   12   1   16   1   3  11 |   | x4 |   | 0 |
| -19  -3  -1  -12   1   16   1   3  11 | * | x5 | = | 0 |
| -19  -3  -1  -12  -1   16   1   3  11 |   | x6 |   | 0 |
| -19  -3  -1  -12  -1  -16   1   3  11 |   | x7 |   | 0 |
| -19   3  -1  -12  -1  -16  -1   3  11 |   | x8 |   | 0 |
\ -19  -3  -1  -12  -1  -16  -1  -3  11 /   \ x9 /   \ 0 /
using the ScaLAPACK driver routine PSGESV.  PSGESV solves a system of linear equations A*X = B, where the coefficient matrix A and the right-hand-side matrix B are real, general distributed matrices.  The coefficient matrix A is distributed as depicted below and, for simplicity, we solve the system for one right-hand side (NRHS=1); that is, the matrix B is a vector. The third element of B is equal to 1, and all other elements are equal to 0.  After solving this system of equations, the solution vector X is given by
/ x1 \   /   0  \
| x2 |   | -1/6 |
| x3 |   |  1/2 |
| x4 |   |   0  |
| x5 | = |   0  |
| x6 |   |   0  |
| x7 |   | -1/2 |
| x8 |   |  1/6 |
\ x9 /   \   0  /
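
The stated solution can be checked by hand or with a few lines of code. The following is a minimal sketch in Python (not part of the exercise files) that rebuilds the matrix above with exact rational arithmetic and confirms that A*x reproduces the right-hand side. The names MAGNITUDES, build_a, and matvec are illustrative only.

```python
from fractions import Fraction

# Magnitudes of the nine columns, read off the first row of A.
MAGNITUDES = [19, 3, 1, 12, 1, 16, 1, 3, 11]
N = len(MAGNITUDES)


def build_a():
    """Build the 9-by-9 example matrix (0-based indices internally)."""
    # Entries are negative strictly below the diagonal and positive elsewhere.
    a = [[-m if j < i else m for j, m in enumerate(MAGNITUDES)] for i in range(N)]
    # The one exception in the example: entry (8,2) is +3, not -3.
    a[7][1] = 3
    return a


def matvec(a, x):
    """Return the exact matrix-vector product a*x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


A = build_a()
X = [Fraction(v) for v in ("0", "-1/6", "1/2", "0", "0", "0", "-1/2", "1/6", "0")]
B = [0, 0, 1, 0, 0, 0, 0, 0, 0]

# Exact comparison: no rounding is involved when using Fraction.
print(matvec(A, X) == B)
```

Note that the nonzero values x7 = -1/2 and x8 = 1/6 depend on entry (8,2) being +3; with -3 in that position they would both be zero.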
Let us assume that the matrix A is partitioned and distributed such that we have chosen the row and column block sizes as MB=NB=2, and the matrix is distributed on a 2-by-3 process grid (P_r=2, P_c=3).  The partitioning and distribution of our example matrix A is represented in the two figures below, where, to aid visualization, we use the notation s=19, c=3, a=1, l=12, p=16, and k=11.

Partitioning of global matrix A:

  s   c |  a   l |  a   p |  a   c |  k
 -s   c |  a   l |  a   p |  a   c |  k
--------+--------+--------+--------+----
 -s  -c |  a   l |  a   p |  a   c |  k
 -s  -c | -a   l |  a   p |  a   c |  k
--------+--------+--------+--------+----
 -s  -c | -a  -l |  a   p |  a   c |  k
 -s  -c | -a  -l | -a   p |  a   c |  k
--------+--------+--------+--------+----
 -s  -c | -a  -l | -a  -p |  a   c |  k
 -s   c | -a  -l | -a  -p | -a   c |  k
--------+--------+--------+--------+----
 -s  -c | -a  -l | -a  -p | -a  -c |  k
Mapping of matrix A onto process grid (P_r=2, P_c=3). Note, for example, that process (0,0) contains a local array of size A(5,4).

        0               1           2

  s   c |  a   c |  a   l |  k |  a   p
 -s   c |  a   c |  a   l |  k |  a   p
--------+--------+--------+----+--------
 -s  -c |  a   c | -a  -l |  k |  a   p   0
 -s  -c |  a   c | -a  -l |  k | -a   p
--------+--------+--------+----+--------
 -s  -c | -a  -c | -a  -l |  k | -a  -p
-----------------+-------------+--------
 -s  -c |  a   c |  a   l |  k |  a   p
 -s  -c |  a   c | -a   l |  k |  a   p
--------+--------+--------+----+--------  1
 -s  -c |  a   c | -a  -l |  k | -a  -p
 -s   c | -a   c | -a  -l |  k | -a  -p
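
The block-cyclic mapping used for A can be reproduced programmatically. The sketch below, in Python, assumes the first block is assigned to process (0,0) (RSRC=CSRC=0) and uses hypothetical helper names (owner_and_local, local_indices, local_array) that are not ScaLAPACK routines. It lists which global rows and columns each process coordinate owns and prints the local array of process (0,0) in the symbolic notation of the figures.

```python
MB = NB = 2        # row and column block sizes
P_R, P_C = 2, 3    # process grid shape
N = 9              # order of the global matrix
SYMBOLS = ["s", "c", "a", "l", "a", "p", "a", "c", "k"]


def owner_and_local(g, block, nprocs):
    """Map a 0-based global index to (process coordinate, 0-based local index)."""
    blk = g // block                       # global block number
    return blk % nprocs, (blk // nprocs) * block + g % block


def local_indices(coord, n, block, nprocs):
    """Return the 1-based global indices owned by process coordinate coord."""
    return [g + 1 for g in range(n) if owner_and_local(g, block, nprocs)[0] == coord]


def entry(i, j):
    """Symbolic entry of A at 1-based global position (i, j)."""
    # Negative strictly below the diagonal, except entry (8,2).
    negative = j < i and (i, j) != (8, 2)
    return ("-" if negative else " ") + SYMBOLS[j - 1]


def local_array(prow, pcol):
    """Local piece of A held by process (prow, pcol), as symbolic strings."""
    rows = local_indices(prow, N, MB, P_R)
    cols = local_indices(pcol, N, NB, P_C)
    return [[entry(i, j) for j in cols] for i in rows]


print("rows on process row 0:", local_indices(0, N, MB, P_R))
print("cols on process col 0:", local_indices(0, N, NB, P_C))
for row in local_array(0, 0):
    print("  ".join(row))
```

Process (0,0) ends up with global rows 1, 2, 5, 6, 9 and global columns 1, 2, 7, 8, which is the 5-by-4 local array mentioned above.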
The partitioning and distribution of our example matrix B are demonstrated in the figure below. Note that the matrix B is distributed only in column 0 of the process grid. All other columns in the process grid possess an empty local portion of the matrix B.

Mapping of matrix B onto process grid (P_r=2, P_c=3):

   0     1    2

  b1  |    |
  b2  |    |
------|    |
  b5  |    |        0
  b6  |    |
------|    |
  b9  |    |
------+----------
  b3  |    |
  b4  |    |
------|    |        1
  b7  |    |
  b8  |    |
On exit from PSGESV, the solution X overwrites B. Process (0,0) holds (in the global view) the entries x1, x2, x5, x6, and x9 of the global vector X and (in the local view) the local array B given by
/ x1 \     / b1 \   /   0  \
| x2 |     | b2 |   | -1/6 |
| x5 |     | b5 | = |   0  |
| x6 |     | b6 |   |   0  |
\ x9 /     \ b9 /   \   0  /
and process (1,0) holds (in the global view) the entries x3, x4, x7, and x8 of the global vector X and (in the local view) the local array B given by
/ x3 \     / b3 \   /  1/2 \
| x4 |     | b4 |   |   0  |
| x7 |     | b7 | = | -1/2 |
\ x8 /     \ b8 /   \  1/6 /
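
The split of X between the two processes in column 0 of the grid follows directly from the row block size and the number of process rows. As a rough illustration in Python (the helpers rows_on, scatter, and gather are hypothetical and assume RSRC=0), the pieces can be extracted from the global solution and reassembled as follows.

```python
from fractions import Fraction

MB = 2     # row block size
P_R = 2    # number of process rows
N = 9      # length of the vector
X = [Fraction(v) for v in ("0", "-1/6", "1/2", "0", "0", "0", "-1/2", "1/6", "0")]


def rows_on(prow):
    """1-based global row indices stored on process row prow."""
    return [g + 1 for g in range(N) if (g // MB) % P_R == prow]


def scatter(x):
    """Split a global vector into the local pieces held by each process row."""
    return {prow: [x[i - 1] for i in rows_on(prow)] for prow in range(P_R)}


def gather(pieces):
    """Reassemble the global vector from the local pieces."""
    x = [None] * N
    for prow, piece in pieces.items():
        for i, value in zip(rows_on(prow), piece):
            x[i - 1] = value
    return x


pieces = scatter(X)
for prow, piece in pieces.items():
    labels = ", ".join(f"x{i}" for i in rows_on(prow))
    values = ", ".join(str(v) for v in piece)
    print(f"process ({prow},0): {labels} = {values}")
```

The local pieces match the two arrays shown above, and gather(scatter(X)) returns the original global ordering.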
The normalized residual check
          || A*x - b ||
-------------------------------
( || x || * || A || * eps * N )
where eps is the machine precision and N is the dimension of the matrix, is performed on the solution to verify the accuracy of the results.
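
To make the residual check concrete, here is a small Python sketch of the same ratio. It rests on several assumptions that are not stated in this exercise: the infinity norm is used throughout, eps is taken as 2^-24 for single precision, the "computed" solution is simulated by rounding the exact answer to single precision rather than by running PSGESV, and the acceptance threshold of 10 is chosen only for illustration.

```python
import struct

MAGNITUDES = [19, 3, 1, 12, 1, 16, 1, 3, 11]
N = len(MAGNITUDES)
EPS = 2.0 ** -24  # assumed single-precision machine epsilon


def to_single(value):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def inf_norm_matrix(a):
    """Infinity norm of a matrix: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)


def inf_norm_vector(v):
    """Infinity norm of a vector: the largest absolute entry."""
    return max(abs(e) for e in v)


# Example matrix: negative strictly below the diagonal, except entry (8,2).
A = [[-m if j < i else m for j, m in enumerate(MAGNITUDES)] for i in range(N)]
A[7][1] = 3
B = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Stand-in for a computed solution: the exact answer rounded to single precision.
X = [to_single(v) for v in (0.0, -1 / 6, 0.5, 0.0, 0.0, 0.0, -0.5, 1 / 6, 0.0)]

# Residual A*x - b, evaluated in double precision.
residual = [sum(a * x for a, x in zip(row, X)) - b for row, b in zip(A, B)]
ratio = inf_norm_vector(residual) / (
    inf_norm_vector(X) * inf_norm_matrix(A) * EPS * N
)

print(f"normalized residual = {ratio:.3e}")
print("passes" if ratio < 10.0 else "fails")  # threshold of 10 is an assumption
```

Dividing by eps and N scales the residual so that a value of order one or smaller indicates an error consistent with rounding alone.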

Simplifying Assumptions Used in Example Program. Several simplifying assumptions and/or restrictions have been made in this example program in order to present the most basic example for the user:

  1. We have chosen a small block size, MB=NB=2; however, this should not be regarded as a typical choice of block size in a user's application.  For best performance, a choice of MB=NB=32 or MB=NB=64 (or possibly larger) is more suitable. Refer to Chapter 5 of the ScaLAPACK Users' Guide for further details.
  2. A simplistic subroutine MATINIT is used to assign matrices A and B to the process grid.  Note that this subroutine hardcodes the local arrays on each process and does not perform communication. It is not a ScaLAPACK routine and it is provided only for the purposes of this example program.
  3. We assume RSRC=CSRC=0, and thus both matrices A and B are distributed across the process grid starting with process (0,0).  In general, however, any process in the current process grid can be assigned to receive the first element of the distributed matrix.
  4. We have set the local leading dimension of local array A and the local leading dimension of local array B to be the same over all process rows in the process grid. The variable MXLLDA is equal to the maximum local leading dimension for array A (denoted LLD_A) over all process rows.  Likewise, the variable MXLLDB is the maximum local leading dimension for array B (denoted LLD_B) over all process rows.  In general, however, the local leading dimension of the local array can differ from process to process in the process grid.
  5. The system is solved by using the entire matrix A, as opposed to a submatrix of A, so the global indices, denoted by IA, JA, IB, and JB, into the matrix are equal to 1.
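
The local dimensions behind assumptions 3 and 4 can be computed with the logic of the ScaLAPACK tool routine NUMROC. The following is a Python transcription of that logic, offered as a sketch rather than as the library code; the variable names mxllda and mxlldb mirror the quantities described above, and the values are derived here for this 9-by-9 example only.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows or columns of a distributed matrix owned by one process.

    n        : global number of rows (or columns)
    nb       : block size
    iproc    : coordinate of the process being queried
    isrcproc : coordinate of the process that owns the first block
    nprocs   : number of processes along this grid dimension
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs   # distance from the source
    nblocks = n // nb                               # number of whole blocks
    count = (nblocks // nprocs) * nb                # blocks every process gets
    extrablks = nblocks % nprocs                    # leftover whole blocks
    if mydist < extrablks:
        count += nb                                 # one extra whole block
    elif mydist == extrablks:
        count += n % nb                             # the final partial block
    return count


N, NRHS = 9, 1
MB = NB = 2
P_R, P_C = 2, 3
RSRC = CSRC = 0

local_rows = [numroc(N, MB, p, RSRC, P_R) for p in range(P_R)]
local_cols_a = [numroc(N, NB, p, CSRC, P_C) for p in range(P_C)]
local_cols_b = [numroc(NRHS, NB, p, CSRC, P_C) for p in range(P_C)]

# A and B share the same row distribution, so both maxima coincide here.
mxllda = max(1, max(local_rows))
mxlldb = max(1, max(local_rows))

print("local rows per process row:      ", local_rows)
print("local cols of A per process col: ", local_cols_a)
print("local cols of B per process col: ", local_cols_b)
print("MXLLDA =", mxllda, " MXLLDB =", mxlldb)

# With a different source process (assumption 3 relaxed), the counts shift.
print("local rows with RSRC=1:", [numroc(N, MB, p, 1, P_R) for p in range(P_R)])
```

The column counts for B show that only process column 0 holds any part of B, and the row counts show why a single leading dimension of 5 is large enough for every process in this example.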
Exercises: For this exercise, use the file psgesvdriver.f. Compile it with make (you may also need to make minor modifications to the file make.inc). Questions: