
Introduction to MPI

sarva
April 07, 2016

MPI tutorial


1. What is MPI?
   • Message Passing Interface (MPI) is a standard for communication among processes
   • Defines a set of library routines for writing portable message-passing programs
   • Several open-source and commercial implementations of MPI exist
   • MPI programs can be written in C, C++, Fortran, Java, or Python
   • MPI programs target distributed-memory systems
2. Hello World in MPI

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int comm_size;
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);

    // Get the rank of the process
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    // Print a hello world message
    printf("Hello from rank %d out of %d\n", my_rank, comm_size);

    // Finalize the MPI environment
    MPI_Finalize();
    return 0;
}
```
3. Compilation and Execution
   • Compile: mpicc -o mpi_hello mpi_hello.c
   • Run: mpirun -n <number of processes> ./mpi_hello
   • mpirun -n 1 ./mpi_hello
     Hello from rank 0 out of 1
   • mpirun -n 3 ./mpi_hello
     Hello from rank 0 out of 3
     Hello from rank 1 out of 3
     Hello from rank 2 out of 3
4. Hello World in MPI
   • mpi.h contains all definitions and declarations needed to compile an MPI program
   • All identifiers defined by MPI begin with MPI_
5. Hello World in MPI
   • MPI_Init initializes the MPI system
   • No other MPI functions should be called before MPI_Init
   • Syntax:

     int MPI_Init(
         int*    argc_p,   // Pointer to argc, or NULL
         char*** argv_p    // Pointer to argv, or NULL
     );

   • Returns an integer error code
6. Hello World in MPI
   • MPI_Finalize shuts down the MPI system and frees any resources allocated by MPI
7. Hello World in MPI
   • In MPI, a communicator is a collection of processes that can send messages to each other
   • MPI_COMM_WORLD is the default communicator
   • MPI_Comm_size returns the number of processes in the communicator in its second argument
   • If "mpirun -n 3 ./mpi_hello" is run, comm_size is 3
8. Hello World in MPI
   • Multiple copies of the program are executed in independent processes
   • Each process in MPI is assigned a rank, an integer from 0 to comm_size - 1
   • MPI_Comm_rank returns the rank of the calling process in its second argument
9. Ranks of Processes
   • mpirun -n 4 ./mpi_hello
   [Diagram: four nodes (Rank 0 through Rank 3), each with its own CPU and Memory, connected by an Interconnect]
10. Hello World in MPI
   • mpirun -n 4 ./mpi_hello
     Hello from rank 0 out of 4
     Hello from rank 1 out of 4
     Hello from rank 2 out of 4
     Hello from rank 3 out of 4
   • Four processes are launched, each running a copy of the program
   • Each copy outputs only one of the above lines
11. Single Program Multiple Data
   • mpirun -n 4 ./mpi_hello
   [Diagram: four Rank/CPU/Memory nodes, each producing one output line, "Hello from rank 0 out of 4" through "Hello from rank 3 out of 4"]
12. Single Program Multiple Data
   • MPI programs follow the SPMD paradigm
   • The same program is run by all the processes in the distributed-memory system
   • Each process has a unique rank
   • The program behaves differently based on its rank
13. Example: Finding Primes
   • Suppose we want to find all prime numbers less than 10,000,000
   • Suppose we have a function int isPrime(int N) which returns 1 if N is prime and 0 otherwise
   • How can we use MPI to achieve this task in parallel?
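The slides assume the helper int isPrime(int N) exists. A straightforward trial-division sketch is shown below; the signature comes from the slide, but the implementation itself is illustrative, not from the deck:

```c
#include <assert.h>

// Trial-division primality test: returns 1 if N is prime, 0 otherwise.
// The int isPrime(int N) signature matches the one assumed on the slide;
// this particular implementation is an illustrative choice.
int isPrime(int N) {
    if (N < 2) return 0;          // 0, 1, and negatives are not prime
    if (N < 4) return 1;          // 2 and 3 are prime
    if (N % 2 == 0) return 0;     // even numbers > 2 are composite
    for (int d = 3; (long)d * d <= N; d += 2) {
        if (N % d == 0) return 0; // found an odd divisor
    }
    return 1;
}
```

Trial division is adequate for a per-number check up to 10,000,000; a sieve would be faster overall but does not fit the per-rank loop structure used on the next slide.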
14. Example: Finding Primes

```c
// Initialize MPI and assume comm_size is 5
<snip>
// Get the rank of the process
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

for (int i = rank * 2000000; i < (rank + 1) * 2000000; i++) {
    if (isPrime(i) == 1) {
        // Store i in a file
        <snip>
    }
}
```

   rank | rank*2,000,000 | (rank+1)*2,000,000
   -----|----------------|-------------------
    0   |         0      |  2,000,000
    1   | 2,000,000      |  4,000,000
    2   | 4,000,000      |  6,000,000
    3   | 6,000,000      |  8,000,000
    4   | 8,000,000      | 10,000,000
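The 2,000,000-wide ranges above hard-code comm_size = 5. A sketch of a general split of [0, N) among comm_size ranks follows; the helper names block_start and block_end are hypothetical, not from the deck:

```c
#include <assert.h>

// Divide [0, N) into comm_size contiguous blocks, giving the first
// N % comm_size ranks one extra element so the split is as even as possible.
int block_start(int rank, int comm_size, int N) {
    int base  = N / comm_size;
    int extra = N % comm_size;
    return rank * base + (rank < extra ? rank : extra);
}

// Exclusive upper bound of this rank's block.
int block_end(int rank, int comm_size, int N) {
    return block_start(rank + 1, comm_size, N);
}
```

With N = 10000000 and comm_size = 5 this reproduces the table above, e.g. rank 2 gets [4,000,000, 6,000,000), and it still works when comm_size does not divide N evenly.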
15. Example: Finding Primes
   [Diagram: five Rank/CPU/Memory nodes connected by an interconnect; rank 0 checks 0 to 2M - 1, rank 1 checks 2M to 4M - 1, rank 2 checks 4M to 6M - 1, rank 3 checks 6M to 8M - 1, rank 4 checks 8M to 10M - 1]
16. Process to Process Communication
   • Finding primes did not require any communication between processes
   • More complex applications will require inter-process communication
   • A message transfer involves a matching pair of MPI_Send and MPI_Recv calls
   • Suppose process 0 wants to send a message to process 1
   [Diagram: Rank 0 (CPU, Memory) calls MPI_Send; Rank 1 (CPU, Memory) calls MPI_Recv]
17. MPI_Send

```c
int MPI_Send(
    void*        msg_buffer_p,  // Pointer to message buffer
    int          msg_size,      // Size of message
    MPI_Datatype msg_type,      // Message type
    int          dest,          // Rank of destination process
    int          tag,           // Message tag
    MPI_Comm     communicator   // Communicator id
);
```
18. MPI_Send
   • Example: int a = 5; MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
   • The message type can be MPI_CHAR, MPI_LONG, MPI_FLOAT, etc.
   • The message size is the number of elements of msg_type in the message
   • The message tag is used to differentiate messages of the same type sent to the same destination
19. MPI_Recv

```c
int MPI_Recv(
    void*        msg_buffer_p,  // Pointer to message buffer
    int          msg_size,      // Size of message
    MPI_Datatype msg_type,      // Message type
    int          source,        // Rank of source process
    int          tag,           // Message tag
    MPI_Comm     communicator,  // Communicator id
    MPI_Status*  status_p       // Message status
);
```
20. MPI_Recv
   • Example: int b; MPI_Status status; MPI_Recv(&b, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
   • The received integer is written to the variable b
21. Example: Area Calculation
   [Figure: the curve f(x) over the interval [a, b], shown whole and split into parts]
   • Suppose we want to calculate the area under a function in the interval [a, b]
   • We could split the interval into parts and assign each part to a process
   • The total area can be calculated by adding the areas of the parts
   • A single process needs to receive all the part areas
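The later slides assume a helper calculate_part_area(rank). One way to sketch it is a midpoint-rule sum over each rank's sub-interval; the integrand f(x) = x*x, the extra parameters, and the step count are illustrative assumptions for self-containment, not from the deck (the slide version takes only the rank):

```c
#include <assert.h>

// Example integrand; the deck leaves f unspecified.
double f(double x) { return x * x; }

// Midpoint-rule area of f over this rank's share of [a, b], where
// [a, b] is split into comm_size equal sub-intervals.
double calculate_part_area(int rank, int comm_size, double a, double b) {
    double part_width = (b - a) / comm_size;
    double lo = a + rank * part_width;
    int steps = 100000;                 // samples per sub-interval
    double dx = part_width / steps;
    double area = 0.0;
    for (int i = 0; i < steps; i++) {
        area += f(lo + (i + 0.5) * dx) * dx;  // midpoint of each strip
    }
    return area;
}
```

Summing the four part areas over [0, 1] approximates the exact integral of x^2, which is 1/3, mirroring the reduction that rank 0 performs on the next slide.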
22. Example: Area Calculation

```c
// Initialize MPI and assume comm_size is 4
int rank;
float part_area, total_area;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
part_area = calculate_part_area(rank);

if (rank != 0) {
    MPI_Send(&part_area, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
} else {
    total_area = part_area;
    for (int source = 1; source < comm_size; source++) {
        MPI_Recv(&part_area, 1, MPI_FLOAT, source, 0, MPI_COMM_WORLD, &status);
        total_area = total_area + part_area;
    }
}
```

   [Figure: [a, b] split into four sub-intervals, handled by ranks 0 through 3]
23. Blocking Behaviour of MPI_Recv/MPI_Send

```c
if (rank != 0) {
    MPI_Send(&part_area, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
} else {
    total_area = part_area;
    for (int source = 1; source < comm_size; source++) {
        MPI_Recv(&part_area, 1, MPI_FLOAT, source, 0, MPI_COMM_WORLD, &status);
        total_area = total_area + part_area;
    }
}
```

   • An MPI_Recv call blocks until a message with matching parameters is received
   • Even if the message from rank 2 arrives first, it will not be processed before the message from rank 1
   • An MPI_Send call may or may not block
   • MPI_Ssend can be used for a synchronous send
24. Example: Finding Primes Revisited
   • We want to find the prime numbers < 10,000,000
   • Previously we divided the numbers equally among the processes
   • But different blocks of numbers may take different times to check
   • Instead, we can assign a block of 5,000 numbers to a process initially
   • When a process finishes its block, it can request a new block
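The handout bookkeeping in the scheme above can be sketched without MPI, to check that the 5,000-wide blocks exactly cover [1, 10,000,000). This is a conceptual simulation, not the deck's code; the round-robin worker choice stands in for whichever rank happens to ask next:

```c
#include <assert.h>

// Simulate the rank-0 scheduler: hand out block_size-wide blocks starting
// at 1 until limit is reached, cycling through num_workers simulated
// workers. Returns the total number of blocks distributed.
int distribute_blocks(int limit, int block_size, int num_workers,
                      int blocks_per_worker[]) {
    int next_block_start = 1;
    int handed_out = 0;
    while (next_block_start < limit) {
        int worker = handed_out % num_workers;  // stand-in for MPI_ANY_SOURCE
        blocks_per_worker[worker]++;
        next_block_start += block_size;
        handed_out++;
    }
    return handed_out;
}
```

With limit 10,000,000 and block size 5,000 the scheduler hands out 2,000 blocks (starts 1, 5001, ..., 9995001); in the real MPI version the per-worker counts would depend on how fast each rank finishes.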
25. Example: Finding Primes Revisited

```c
int block_start, next_block_start, done = 1;

if (rank != 0) {
    while (1) {
        MPI_Send(&done, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        MPI_Recv(&block_start, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        find_primes_in_block(block_start);
    }
} else {
    next_block_start = 1;
    while (next_block_start < 10000000) {
        MPI_Recv(&done, 1, MPI_INT, source, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&next_block_start, 1, MPI_INT, source, 0, MPI_COMM_WORLD);
        next_block_start = next_block_start + 5000;
    }
}
```

   • Assume the rank 0 process does the block distribution
   • The other processes request a new block after finding the primes in their current block
   [Diagram: ranks 1-3 each send "done" to rank 0; rank 0 replies with next_block_start]
26. Example: Finding Primes Revisited
   • But the rank 0 process does not know which process will request the next block
   • An MPI_Recv on the wrong source rank will block
   • The source rank is needed to send next_block_start back
27. Receiving Message from Unknown Source
   • Call MPI_Recv with MPI_ANY_SOURCE in place of the source rank:

     MPI_Status status;
     MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);

   • The actual source rank will be in status.MPI_SOURCE
   • MPI_Status also contains the message length and message tag
28. Example: Finding Primes Revisited

```c
int block_start, next_block_start, done = 1;

if (rank != 0) {
    while (1) {
        MPI_Send(&done, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        MPI_Recv(&block_start, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        find_primes_in_block(block_start);
    }
} else {
    next_block_start = 1;
    while (next_block_start < 10000000) {
        MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&next_block_start, 1, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
        next_block_start = next_block_start + 5000;
    }
}
```

   • MPI_Recv is called with MPI_ANY_SOURCE as the source rank
   • status.MPI_SOURCE can be used to send the reply to the requesting process
29. Collective Communications
   • Broadcasting information to all processes using MPI_Bcast:

     int N;
     if (rank == 0) {
         printf("Enter the value of N\n");
         scanf("%d", &N);
     }
     MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);

   • Synchronizing processes with MPI_Barrier:

     MPI_Barrier(MPI_COMM_WORLD);
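A broadcast could be done with comm_size - 1 point-to-point sends from the root, but MPI libraries commonly use tree-shaped algorithms that finish in about log2(comm_size) rounds. The sketch below simulates that idea in plain C among simulated ranks; it illustrates the communication pattern only, not how any particular MPI implementation is written:

```c
#include <assert.h>

// Simulate a binomial-tree broadcast from rank 0 among n simulated ranks:
// in the round with offset 'step', every rank r < step that already holds
// the value sends it to rank r + step. Returns the number of rounds taken.
int tree_broadcast(int value, int n, int data[]) {
    for (int r = 1; r < n; r++) data[r] = -1;  // -1 means "not received yet"
    data[0] = value;                            // root starts with the value
    int rounds = 0;
    for (int step = 1; step < n; step *= 2) {
        for (int r = 0; r < step && r + step < n; r++) {
            if (data[r] != -1) data[r + step] = data[r];  // "send" r -> r+step
        }
        rounds++;
    }
    return rounds;
}
```

For 8 ranks the value reaches everyone in 3 rounds instead of the 7 sends a naive loop at the root would need; this is why MPI_Bcast is preferred over hand-written send loops.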
30. MPI Use Cases
   • Ideal for compute-intensive applications with a high degree of parallelism
   • Processes should not need to communicate often
   • Not suitable for disk-I/O-intensive applications
31. Learning Resources
   • MPI Tutorial by Blaise Barney, https://computing.llnl.gov/tutorials/mpi/
   • An Introduction to Parallel Programming, Peter S. Pacheco, Morgan Kaufmann Publishers, 2011
   • Parallel Programming in C with MPI and OpenMP, Michael J. Quinn, Tata McGraw-Hill, 2004