Introduction to MPI

MPI tutorial

sarva

April 07, 2016

Transcript

  1. What is MPI?
     • Message Passing Interface is a standard for communication among processes
     • Defines a set of library routines for writing portable message-passing programs
     • Several open source and commercial implementations of MPI exist
     • MPI programs can be written in C, C++, Fortran, Java, Python
     • MPI programs target distributed memory systems

  2. Hello World in MPI

     #include <mpi.h>
     #include <stdio.h>

     int main(int argc, char** argv) {
         // Initialize the MPI environment
         MPI_Init(NULL, NULL);

         // Get the number of processes
         int comm_size;
         MPI_Comm_size(MPI_COMM_WORLD, &comm_size);

         // Get the rank of the process
         int my_rank;
         MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

         // Print a hello world message
         printf("Hello from rank %d out of %d\n", my_rank, comm_size);

         // Finalize the MPI environment
         MPI_Finalize();

         return 0;
     }

  3. Compilation and Execution

     (Hello World code as on slide 2)

     • mpicc -o mpi_hello mpi_hello.c
     • mpirun -n <number of processes> ./mpi_hello (a multi-machine launch sketch follows below)
     • mpirun -n 1 ./mpi_hello
       Hello from rank 0 out of 1
     • mpirun -n 3 ./mpi_hello
       Hello from rank 0 out of 3
       Hello from rank 1 out of 3
       Hello from rank 2 out of 3

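     To run the same program across more than one machine, the launcher can be given a host
     list. A sketch assuming Open MPI, with hypothetical machine names in a plain-text file
     called hosts (other MPI implementations use different flags):

        # hosts (one machine name per line; hypothetical names)
        node01
        node02

        # launch 8 processes spread across the machines listed in hosts
        mpirun --hostfile hosts -n 8 ./mpi_hello
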
  4. Hello World in MPI

     (Hello World code as on slide 2)

     • mpi.h contains all definitions and declarations needed to compile an MPI program
     • All identifiers defined by MPI begin with MPI_

  5. Hello World in MPI

     (Hello World code as on slide 2)

     • MPI_Init initializes the MPI system
     • No other MPI functions should be called before MPI_Init
     • Syntax:

       int MPI_Init(
           int*    argc_p,  // pointer to argc
           char*** argv_p   // pointer to argv
       );

     • Returns an integer error code (see the sketch below)

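     The Hello World code passes NULL for both MPI_Init arguments. A short sketch of the
     alternative: forwarding the real argc/argv and checking the returned error code against
     MPI_SUCCESS (the explicit check is only for illustration; by default MPI aborts on error):

        int err = MPI_Init(&argc, &argv);  // forward the command-line arguments to MPI
        if (err != MPI_SUCCESS) {
            fprintf(stderr, "MPI_Init failed with error code %d\n", err);
            return 1;
        }
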
  6. Hello World in MPI

     (Hello World code as on slide 2)

     • MPI_Finalize shuts down the MPI system and frees any allocated MPI resources

  7. Hello World in MPI

     (Hello World code as on slide 2)

     • In MPI, a communicator is a collection of processes which can send messages to each other
     • MPI_COMM_WORLD is the default communicator
     • MPI_Comm_size returns the number of processes in the communicator in its second argument
     • If "mpirun -n 3 ./mpi_hello" is run, comm_size is 3

  8. Hello World in MPI

     (Hello World code as on slide 2)

     • Multiple copies of the program are executed in independent processes
     • Each MPI process is assigned a rank, which is an integer from 0 to comm_size - 1
     • MPI_Comm_rank returns the rank of the calling process in its second argument

  9. Ranks of Processes
     • mpirun -n 4 ./mpi_hello
     [Diagram: four processes (ranks 0-3), each with its own CPU and memory, connected by an interconnect]

  10. Hello World in MPI

      (Hello World code as on slide 2)

      • mpirun -n 4 ./mpi_hello
        Hello from rank 0 out of 4
        Hello from rank 1 out of 4
        Hello from rank 2 out of 4
        Hello from rank 3 out of 4
      • Four processes are launched, each running a copy of the program
      • Each copy will output only one of the above lines

  11. Single Program Multiple Data
      • mpirun -n 4 ./mpi_hello
      [Diagram: four processes (ranks 0-3), each with its own CPU and memory, each printing one line of the output]
        Hello from rank 0 out of 4
        Hello from rank 1 out of 4
        Hello from rank 2 out of 4
        Hello from rank 3 out of 4

  12. Single Program Multiple Data
      • MPI programs follow the SPMD paradigm
      • The same program will be run by all the processes in the distributed memory system
      • Each process has a unique rank
      • The program will behave differently based on its rank, as in the sketch below

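      A minimal sketch of SPMD-style rank-based branching (the printed messages are
      illustrative, not from the slides):

         #include <mpi.h>
         #include <stdio.h>

         int main(int argc, char** argv) {
             MPI_Init(NULL, NULL);

             int my_rank;
             MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

             // Same executable everywhere, different behaviour per rank
             if (my_rank == 0) {
                 printf("Rank 0: coordinating work\n");
             } else {
                 printf("Rank %d: doing a share of the work\n", my_rank);
             }

             MPI_Finalize();
             return 0;
         }
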
  13. Example: Finding Primes
      • Suppose we want to find all prime numbers less than 10,000,000
      • Suppose we have a function int isPrime(int N) which returns 1 if N is prime and returns 0 otherwise (a sketch of isPrime follows below)
      • How can we use MPI to achieve this task in parallel?

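      The slides do not define isPrime; a minimal trial-division sketch of such a function:

         // Returns 1 if N is prime and 0 otherwise (trial division up to sqrt(N))
         int isPrime(int N) {
             if (N < 2) return 0;
             for (long d = 2; d * d <= N; d++) {
                 if (N % d == 0) return 0;
             }
             return 1;
         }
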
  14. Example: Finding Primes

      // Initialize MPI and assume comm_size is 5
      <snip>

      // Get the rank of the process
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      for (int i = rank*2000000; i < (rank+1)*2000000; i++) {
          if (isPrime(i) == 1) {
              // Store i in a file
              <snip>
          }
      }

      rank | rank*2,000,000 | (rank+1)*2,000,000
        0  |             0  |          2,000,000
        1  |     2,000,000  |          4,000,000
        2  |     4,000,000  |          6,000,000
        3  |     6,000,000  |          8,000,000
        4  |     8,000,000  |         10,000,000

  15. Example: Finding Primes

      (Same code and table as on slide 14)

      [Diagram: five processes (ranks 0-4), each with its own CPU and memory, handling 0 to 2M-1, 2M to 4M-1, 4M to 6M-1, 6M to 8M-1, and 8M to 10M-1 respectively]

  16. Process to Process Communication
      • Finding primes did not require any communication between processes
      • More complex applications will require inter-process communication
      • Message transfer involves a matching pair of MPI_Send and MPI_Recv function calls
      • Suppose process 0 wants to send a message to process 1
      [Diagram: rank 0 (CPU, memory) calls MPI_Send; rank 1 (CPU, memory) calls MPI_Recv]

  17. MPI_Send

      int MPI_Send(
          void*        msg_buffer_p,  // Pointer to message buffer
          int          msg_size,      // Size of message
          MPI_Datatype msg_type,      // Message type
          int          dest,          // Rank of destination process
          int          tag,           // Message tag
          MPI_Comm     communicator   // Communicator id
      );

  18. MPI_Send

      (MPI_Send prototype as on slide 17)

      • int a = 5;
        MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      • Message type can be MPI_CHAR, MPI_LONG, MPI_FLOAT, etc.
      • Message size is the number of elements of msg_type in the message (see the array example below)
      • Message tag is used to differentiate messages of the same type to the same destination

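      The message buffer does not have to be a single variable. A hedged sketch of sending an
      array of 100 floats with tag 42 (buffer name and tag value are illustrative):

         float data[100];
         // ... fill data ...
         // msg_size counts elements of msg_type, not bytes: 100 floats here
         MPI_Send(data, 100, MPI_FLOAT, 1, 42, MPI_COMM_WORLD);
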
  19. MPI_Recv

      int MPI_Recv(
          void*        msg_buffer_p,  // Pointer to message buffer
          int          msg_size,      // Size of message
          MPI_Datatype msg_type,      // Message type
          int          source,        // Rank of source process
          int          tag,           // Message tag
          MPI_Comm     communicator,  // Communicator id
          MPI_Status*  status_p       // Message status
      );

  20. MPI_Recv

      (MPI_Recv prototype as on slide 19)

      • int b;
        MPI_Status status;
        MPI_Recv(&b, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
      • The received integer will be written to variable b; a complete send/receive sketch follows below

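      Putting the two calls together, a minimal sketch of rank 0 sending one integer to rank 1
      (assumes the program is run with at least two processes):

         #include <mpi.h>
         #include <stdio.h>

         int main(int argc, char** argv) {
             MPI_Init(NULL, NULL);

             int my_rank;
             MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

             if (my_rank == 0) {
                 int a = 5;
                 // Send one int to rank 1 with tag 0
                 MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
             } else if (my_rank == 1) {
                 int b;
                 MPI_Status status;
                 // Receive one int from rank 0 with tag 0
                 MPI_Recv(&b, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
                 printf("Rank 1 received %d\n", b);
             }

             MPI_Finalize();
             return 0;
         }
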
  21. Example: Area Calculation
      [Diagram: the area under f(x) on the interval [a,b], split into vertical strips]
      • Suppose we want to calculate the area under a function in the interval [a,b]
      • We could split the interval into parts and assign each part to a process
      • The total area can be calculated by adding the areas of the parts
      • A single process needs to receive all the part areas

  22. Example: Area Calculation

      // Initialize MPI and assume comm_size is 4
      int rank;
      float part_area, total_area;
      MPI_Status status;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      part_area = calculate_part_area(rank);  // a sketch of this function follows the slide

      if (rank != 0) {
          MPI_Send(&part_area, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
      } else {
          total_area = part_area;
          for (int source = 1; source < comm_size; source++) {
              MPI_Recv(&part_area, 1, MPI_FLOAT, source, 0, MPI_COMM_WORLD, &status);
              total_area = total_area + part_area;
          }
      }

      [Diagram: the interval [a,b] under f(x) split into four parts, handled by ranks 0-3]

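      The slides leave calculate_part_area undefined. A hedged sketch using the midpoint rule,
      where the integrand f, the interval [a,b], the number of parts, and the resolution are
      all chosen purely for illustration:

         // Illustrative integrand; the actual f(x) is not given in the slides
         float f(float x) { return x * x; }

         float calculate_part_area(int rank) {
             const float a = 0.0f, b = 1.0f;  // assumed interval
             const int num_parts = 4;         // assumed comm_size
             const int steps = 1000;          // subintervals per part

             float part_width = (b - a) / num_parts;
             float start = a + rank * part_width;
             float h = part_width / steps;

             float area = 0.0f;
             for (int i = 0; i < steps; i++) {
                 float x_mid = start + (i + 0.5f) * h;  // midpoint of subinterval i
                 area += f(x_mid) * h;                  // midpoint rule
             }
             return area;
         }
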
  23. Blocking Behaviour of MPI_Recv/MPI_Send

      (Send/receive code as on slide 22)

      • An MPI_Recv call blocks until a message with matching parameters is received
      • Even if the message from rank 2 arrives first, it will not be processed until the loop reaches source = 2
      • An MPI_Send call may or may not block
      • MPI_Ssend can be used for a synchronous send (see the sketch below)

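      MPI_Ssend takes the same arguments as MPI_Send but does not complete until the matching
      receive has started. A one-line sketch replacing the send in the code above:

         // Synchronous send: completes only after the matching MPI_Recv has been posted
         MPI_Ssend(&part_area, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
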
  24. Example: Finding Primes Revisited
      • Want to find prime numbers < 10,000,000
      • Previously we divided the numbers equally among the processes
      • But different blocks of numbers may take different times
      • We can instead assign a block of 5,000 numbers to a process initially
      • When a process finishes its block, it can request a new block

      (Block table as on slide 14)

  25. Example: Finding Primes Revisited

      int block_start, next_block_start, done = 1;

      if (rank != 0) {
          while (1) {
              MPI_Send(&done, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
              MPI_Recv(&block_start, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
              find_primes_in_block(block_start);
          }
      } else {
          next_block_start = 1;
          while (next_block_start < 10000000) {
              MPI_Recv(&done, 1, MPI_INT, source, 0, MPI_COMM_WORLD, &status);
              MPI_Send(&next_block_start, 1, MPI_INT, source, 0, MPI_COMM_WORLD);
              next_block_start = next_block_start + 5000;
          }
      }

      • Assume the rank 0 process does the block distribution
      • The other processes request a new block after finding primes in their current block

      [Diagram: ranks 1-3 send "done" to rank 0; rank 0 replies with next_block_start]

  26. Example: Finding Primes Revisited

      (Same code as on slide 25)

      • But the rank 0 process does not know which process will request the next block
      • MPI_Recv on the wrong source rank will block
      • The source rank is needed to send next_block_start

      [Diagram: ranks 1-3 send "done" to rank 0; rank 0 replies with next_block_start]

  27. Receiving a Message from an Unknown Source
      • Call MPI_Recv with MPI_ANY_SOURCE in place of the source rank

        MPI_Status status;
        MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);

      • The actual source rank will be in status.MPI_SOURCE
      • MPI_Status also contains the message tag; the message length can be obtained from it with MPI_Get_count (see the sketch below)

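      A short sketch of reading the source rank, tag, and element count out of the status
      (a hypothetical receive of up to 100 ints from any source and any tag):

         int buf[100];
         MPI_Status status;
         MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

         int count;
         MPI_Get_count(&status, MPI_INT, &count);  // number of MPI_INT elements actually received
         printf("Got %d ints from rank %d with tag %d\n",
                count, status.MPI_SOURCE, status.MPI_TAG);
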
  28. Example: Finding Primes Revisited

      int block_start, next_block_start, done = 1;

      if (rank != 0) {
          while (1) {
              MPI_Send(&done, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
              MPI_Recv(&block_start, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
              find_primes_in_block(block_start);
          }
      } else {
          next_block_start = 1;
          while (next_block_start < 10000000) {
              MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
              MPI_Send(&next_block_start, 1, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
              next_block_start = next_block_start + 5000;
          }
      }

      • MPI_Recv is called with MPI_ANY_SOURCE as the source rank
      • status.MPI_SOURCE can be used to send the reply to the requesting process

      [Diagram: ranks 1-3 send "done" to rank 0; rank 0 replies with next_block_start]

  29. Collective Communications
      • Broadcasting information to all processes using MPI_Bcast (a complete sketch follows below)

        int N;
        if (rank == 0) {
            printf("Enter the value of N\n");
            scanf("%d", &N);
        }
        MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);

      • Synchronizing processes with MPI_Barrier

        MPI_Barrier(MPI_COMM_WORLD);

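      A minimal, self-contained sketch of the broadcast and barrier above (the per-rank
      printout is only there to show that every rank received N):

         #include <mpi.h>
         #include <stdio.h>

         int main(int argc, char** argv) {
             MPI_Init(NULL, NULL);

             int rank;
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);

             int N = 0;
             if (rank == 0) {
                 printf("Enter the value of N\n");
                 scanf("%d", &N);
             }

             // Rank 0 is the root: its value of N is copied to every other process
             MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
             printf("Rank %d has N = %d\n", rank, N);

             // No process continues past this point until all have reached it
             MPI_Barrier(MPI_COMM_WORLD);

             MPI_Finalize();
             return 0;
         }
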
  30. MPI Use Cases
      • Ideal for compute-intensive applications with a high degree of parallelism
      • Processes should not need to communicate often
      • Not suitable for disk I/O intensive applications

  31. Learning resources
      • MPI Tutorial by Blaise Barney, https://computing.llnl.gov/tutorials/mpi/
      • An Introduction to Parallel Programming, Peter S. Pacheco, Morgan Kaufmann Publishers, 2011
      • Parallel Programming in C with MPI and OpenMP, Michael J. Quinn, Tata McGraw-Hill, 2004