Slide 1

Introduction to MPI
Saravanan Vijayakumaran
IIT Bombay

Slide 2

What is MPI?
● Message Passing Interface is a standard for communication among processes
● Defines a set of library routines for writing portable message-passing programs
● Several open source and commercial implementations of MPI exist
● MPI programs can be written in C, C++, Fortran, Java, Python
● MPI programs target distributed memory systems

Slide 3

Distributed Memory Systems
[Figure: four nodes, each with its own CPU and memory, connected by an interconnect]

Slide 4

Shared Memory Systems
[Figure: multiple CPUs accessing a single shared memory through an interconnect]

Slide 5

Hello World in MPI

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int comm_size;
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);

    // Get the rank of the process
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    // Print a hello world message
    printf("Hello from rank %d out of %d\n", my_rank, comm_size);

    // Finalize the MPI environment
    MPI_Finalize();
    return 0;
}

Slide 6

Compilation and Execution
(Hello World program from Slide 5)
● mpicc -o mpi_hello mpi_hello.c
● mpirun -n <number of processes> ./mpi_hello
● mpirun -n 1 ./mpi_hello
  Hello from rank 0 out of 1
● mpirun -n 3 ./mpi_hello
  Hello from rank 0 out of 3
  Hello from rank 1 out of 3
  Hello from rank 2 out of 3
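
The same binary can also be launched across several machines. A minimal sketch, assuming an Open MPI style mpirun and a hypothetical file named hosts listing one machine name per line (other implementations differ, e.g. MPICH uses -f instead of --hostfile):

    mpicc -o mpi_hello mpi_hello.c
    mpirun --hostfile hosts -n 8 ./mpi_hello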

Slide 7

Hello World in MPI
(Hello World program from Slide 5)
● mpi.h contains all the definitions and declarations needed to compile an MPI program
● All identifiers defined by MPI begin with MPI_

Slide 8

Hello World in MPI
(Hello World program from Slide 5)
● MPI_Init initializes the MPI system
● No other MPI functions should be called before MPI_Init
● Syntax

    int MPI_Init(
        int*    argc_p,  // in/out
        char*** argv_p   // in/out
    );

● Returns an integer error code
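
As a small sketch of these last two points, the command-line arguments can be forwarded to MPI_Init and the returned error code checked (the explicit check is illustrative; by default MPI aborts on errors rather than returning them):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char** argv) {
        // Pass argc/argv so the MPI system can strip its own arguments
        int err = MPI_Init(&argc, &argv);
        if (err != MPI_SUCCESS) {
            fprintf(stderr, "MPI_Init failed with error code %d\n", err);
            return 1;
        }
        MPI_Finalize();
        return 0;
    }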

Slide 9

Hello World in MPI
(Hello World program from Slide 5)
● MPI_Finalize shuts down the MPI system and frees any resources allocated by MPI

Slide 10

Hello World in MPI
(Hello World program from Slide 5)
● In MPI, a communicator is a collection of processes which can send messages to each other
● MPI_COMM_WORLD is the default communicator
● MPI_Comm_size returns the number of processes in the communicator in its second argument
● If "mpirun -n 3 ./mpi_hello" is run, comm_size is 3

Slide 11

Hello World in MPI
(Hello World program from Slide 5)
● Multiple copies of the program are executed in independent processes
● Each MPI process is assigned a rank, which is an integer from 0 to comm_size - 1
● MPI_Comm_rank returns the rank of the calling process in its second argument

Slide 12

Ranks of Processes
● mpirun -n 4 ./mpi_hello
[Figure: four CPU-plus-memory nodes on an interconnect, labelled Rank 0 through Rank 3]

Slide 13

Hello World in MPI
(Hello World program from Slide 5)
● mpirun -n 4 ./mpi_hello
  Hello from rank 0 out of 4
  Hello from rank 1 out of 4
  Hello from rank 2 out of 4
  Hello from rank 3 out of 4
● Four processes are launched, each running a copy of the program
● Each copy outputs only one of the above lines

Slide 14

Single Program Multiple Data
● mpirun -n 4 ./mpi_hello
[Figure: four ranks, each with its own CPU and memory, each printing its own "Hello from rank r out of 4" line]

Slide 15

Single Program Multiple Data
● MPI programs follow the SPMD paradigm
● The same program is run by all the processes in the distributed memory system
● Each process has a unique rank
● The program behaves differently based on its rank, as the sketch below illustrates
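
A minimal sketch of such rank-based branching (the printed messages are illustrative):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(NULL, NULL);

        int my_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        // Every process runs this same code but takes a different branch
        if (my_rank == 0) {
            printf("Rank 0: I could coordinate the other processes\n");
        } else {
            printf("Rank %d: I could work on my share of the data\n", my_rank);
        }

        MPI_Finalize();
        return 0;
    }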

Slide 16

Example: Finding Primes
● Suppose we want to find all prime numbers less than 10,000,000
● Suppose we have a function int isPrime(int N) which returns 1 if N is prime and returns 0 otherwise
● How can we use MPI to achieve this task in parallel?

Slide 17

Example: Finding Primes

// Initialize MPI and assume comm_size is 5
// Get the rank of the process
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

for (int i = rank*2000000; i < (rank+1)*2000000; i++) {
    if (isPrime(i) == 1) {
        // Store i in a file
    }
}

rank    rank*2,000,000    (rank+1)*2,000,000
 0               0             2,000,000
 1       2,000,000             4,000,000
 2       4,000,000             6,000,000
 3       6,000,000             8,000,000
 4       8,000,000            10,000,000
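
The slides take isPrime as given. One possible implementation, a simple trial-division sketch that is not part of the original slides:

    // Returns 1 if N is prime and 0 otherwise, by trial division
    int isPrime(int N) {
        if (N < 2) return 0;
        for (int d = 2; (long)d * d <= N; d++) {
            if (N % d == 0) return 0;
        }
        return 1;
    }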

Slide 18

Example: Finding Primes
(Code and table from Slide 17)
[Figure: five ranks, each with its own CPU and memory; rank 0 checks 0 to 2M-1, rank 1 checks 2M to 4M-1, rank 2 checks 4M to 6M-1, rank 3 checks 6M to 8M-1, rank 4 checks 8M to 10M-1]

Slide 19

Process to Process Communication
● Finding primes did not require any communication between processes
● More complex applications will require inter-process communication
● A message transfer involves a matching pair of MPI_Send and MPI_Recv function calls
● Suppose process 0 wants to send a message to process 1
[Figure: rank 0 calls MPI_Send, rank 1 calls MPI_Recv]

Slide 20

MPI_Send

int MPI_Send(
    void*        msg_buffer_p,   // Pointer to message buffer
    int          msg_size,       // Size of message
    MPI_Datatype msg_type,       // Message type
    int          dest,           // Rank of destination process
    int          tag,            // Message tag
    MPI_Comm     communicator    // Communicator id
);

Slide 21

MPI_Send
(Signature from Slide 20)
●   int a = 5;
    MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
● Message type can be MPI_CHAR, MPI_LONG, MPI_FLOAT, etc.
● Message size is the number of elements of msg_type in the message
● Message tag is used to differentiate messages of the same type sent to the same destination

Slide 22

MPI_Recv

int MPI_Recv(
    void*        msg_buffer_p,   // Pointer to message buffer
    int          msg_size,       // Size of message
    MPI_Datatype msg_type,       // Message type
    int          source,         // Rank of source process
    int          tag,            // Message tag
    MPI_Comm     communicator,   // Communicator id
    MPI_Status*  status_p        // Message status
);

Slide 23

MPI_Recv
(Signature from Slide 22)
●   int b;
    MPI_Status status;
    MPI_Recv(&b, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
● The received integer will be written to the variable b
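
Putting the two calls together, a minimal complete sketch in which rank 0 sends an integer to rank 1, assuming the program is launched with at least 2 processes:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(NULL, NULL);

        int my_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        if (my_rank == 0) {
            int a = 5;
            // Send one int to rank 1 with tag 0
            MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (my_rank == 1) {
            int b;
            MPI_Status status;
            // Receive one int from rank 0 with tag 0
            MPI_Recv(&b, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("Rank 1 received %d from rank 0\n", b);
        }

        MPI_Finalize();
        return 0;
    }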

Slide 24

Example: Area Calculation
● Suppose we want to calculate the area under a function f(x) in the interval [a,b]
● We could split the interval into parts and assign each part to a process
● The total area can be calculated by adding the areas of the parts
● A single process needs to receive all the part areas
[Figure: the area under f(x) on [a,b], and the same area split into vertical strips]

Slide 25

Example: Area Calculation

// Initialize MPI and assume comm_size is 4
int rank, comm_size;
float part_area, total_area;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_size);

part_area = calculate_part_area(rank);
if (rank != 0) {
    MPI_Send(&part_area, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
} else {
    total_area = part_area;
    for (int source = 1; source < comm_size; source++) {
        MPI_Recv(&part_area, 1, MPI_FLOAT, source, 0, MPI_COMM_WORLD, &status);
        total_area = total_area + part_area;
    }
}

[Figure: the interval [a,b] split into four parts handled by ranks 0 to 3]
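
The slides leave calculate_part_area unspecified. One possible sketch using the midpoint rule; the integrand f, the interval endpoints, and the strip count below are illustrative assumptions:

    #define A 0.0f        // left endpoint a (assumed)
    #define B 1.0f        // right endpoint b (assumed)
    #define NUM_PARTS 4   // matches comm_size in the slide

    float f(float x) { return x * x; }   // example integrand

    // Approximate the area under f on the sub-interval owned by this rank
    float calculate_part_area(int rank) {
        float part_width = (B - A) / NUM_PARTS;
        float left = A + rank * part_width;
        int n = 1000;                     // strips per part (assumed)
        float h = part_width / n;
        float area = 0.0f;
        for (int i = 0; i < n; i++) {
            float mid = left + (i + 0.5f) * h;   // midpoint of strip i
            area = area + f(mid) * h;            // strip area
        }
        return area;
    }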

Slide 26

Blocking Behaviour of MPI_Recv/MPI_Send

if (rank != 0) {
    MPI_Send(&part_area, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
} else {
    total_area = part_area;
    for (int source = 1; source < comm_size; source++) {
        MPI_Recv(&part_area, 1, MPI_FLOAT, source, 0, MPI_COMM_WORLD, &status);
        total_area = total_area + part_area;
    }
}

● An MPI_Recv call blocks until a message with matching parameters is received
● Even if the message from rank 2 arrives first, it will not be processed until the message from rank 1 has been received
● An MPI_Send call may or may not block
● MPI_Ssend can be used for a synchronous send, as the sketch below shows
[Figure: ranks 1, 2 and 3 sending their part areas to rank 0]
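
One consequence of blocking worth spelling out: a synchronous send does not complete until the matching receive has started. In the fragment below (an illustrative sketch assuming exactly 2 processes), both ranks call MPI_Ssend before posting their receives, so neither send can complete and the program deadlocks:

    int x = rank, y;
    int other = 1 - rank;   // the other rank, assuming exactly 2 processes
    MPI_Status status;
    // Both ranks block here waiting for the other side's MPI_Recv: deadlock
    MPI_Ssend(&x, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
    MPI_Recv(&y, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &status);

Reversing the order of the two calls on one of the ranks removes the deadlock.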

Slide 27

Example: Finding Primes Revisited
● Want to find prime numbers < 10,000,000
● Previously we divided the numbers equally among the processes
● But different blocks of numbers may take different amounts of time
● We can assign a block of 5,000 numbers to a process initially
● When a process finishes its block, it can request a new block
(Equal-division table from Slide 17)

Slide 28

Example: Finding Primes Revisited

int block_start, next_block_start, done = 1;
if (rank != 0) {
    while (1) {
        MPI_Send(&done, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        MPI_Recv(&block_start, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        find_primes_in_block(block_start);
    }
} else {
    next_block_start = 1;
    while (next_block_start < 10000000) {
        MPI_Recv(&done, 1, MPI_INT, source, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&next_block_start, 1, MPI_INT, source, 0, MPI_COMM_WORLD);
        next_block_start = next_block_start + 5000;
    }
}

● Assume the rank 0 process does the block distribution
● The other processes request a new block after finding the primes in their current block
[Figure: ranks 1, 2 and 3 exchange done/next_block_start messages with rank 0]

Slide 29

Example: Finding Primes Revisited
(Code from Slide 28)
● But the rank 0 process does not know which process will request the next block
● An MPI_Recv on the wrong source rank will block
● The source rank is needed to send back next_block_start
[Figure: ranks 1, 2 and 3 exchange done/next_block_start messages with rank 0]

Slide 30

Receiving Message from Unknown Source
● Call MPI_Recv with MPI_ANY_SOURCE in place of the source rank

    MPI_Status status;
    MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);

● The actual source rank will be in status.MPI_SOURCE
● MPI_Status also contains the message tag, and the message length can be recovered from it
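
A short sketch of reading these fields; MPI_Get_count is the standard way to recover the message length from a status:

    MPI_Status status;
    int done, count;
    MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);
    // Number of MPI_INT elements in the received message
    MPI_Get_count(&status, MPI_INT, &count);
    printf("Got %d int(s) from rank %d with tag %d\n",
           count, status.MPI_SOURCE, status.MPI_TAG);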

Slide 31

Example: Finding Primes Revisited

int block_start, next_block_start, done = 1;
if (rank != 0) {
    while (1) {
        MPI_Send(&done, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        MPI_Recv(&block_start, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        find_primes_in_block(block_start);
    }
} else {
    next_block_start = 1;
    while (next_block_start < 10000000) {
        MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&next_block_start, 1, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
        next_block_start = next_block_start + 5000;
    }
}

● MPI_Recv is called with MPI_ANY_SOURCE as the source rank
● status.MPI_SOURCE can be used to send the reply to the requesting process
[Figure: ranks 1, 2 and 3 exchange done/next_block_start messages with rank 0]
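
As written, the worker loop never terminates. One way to finish cleanly, an assumption not shown in the slides, is for rank 0 to answer each worker's final request with a sentinel value once the range is exhausted, and for workers to exit when they see it:

    // Worker side: treat a negative block start as "no more work"
    MPI_Recv(&block_start, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    if (block_start < 0) break;
    find_primes_in_block(block_start);

    // Rank 0, after its while loop: send the sentinel to every worker
    int sentinel = -1;
    for (int r = 1; r < comm_size; r++) {
        MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&sentinel, 1, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
    }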

Slide 32

Collective Communications
● Broadcasting information to all processes using MPI_Bcast

    int N;
    if (rank == 0) {
        printf("Enter the value of N\n");
        scanf("%d", &N);
    }
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);

● Synchronizing processes with MPI_Barrier

    MPI_Barrier(MPI_COMM_WORLD);
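
A complete sketch combining the two calls; N is hard-coded here instead of read from stdin so the example runs non-interactively:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(NULL, NULL);

        int rank, N = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) N = 42;   // stands in for the scanf on rank 0

        // After MPI_Bcast, every rank's copy of N holds rank 0's value
        MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("Rank %d sees N = %d\n", rank, N);

        // No rank proceeds past the barrier until all ranks have reached it
        MPI_Barrier(MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }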

Slide 33

MPI Use Cases
● Ideal for compute-intensive applications with a high degree of parallelism
● Processes should not need to communicate often
● Not suitable for disk I/O intensive applications

Slide 34

Learning Resources
● MPI Tutorial by Blaise Barney, https://computing.llnl.gov/tutorials/mpi/
● An Introduction to Parallel Programming, Peter S. Pacheco, Morgan Kaufmann Publishers, 2011
● Parallel Programming in C with MPI and OpenMP, Michael J. Quinn, Tata McGraw-Hill, 2004

Slide 35

Thanks for your attention!