Introduction to MPI

sarva

April 07, 2016

Transcript

  1. Introduction to MPI
    Saravanan Vijayakumaran
    IIT Bombay


  2. What is MPI?
    ● Message Passing Interface is a standard for communication among
    processes
    ● Defines a set of library routines for writing portable message-passing
    programs
    ● Several open source and commercial implementations of MPI exist
    ● MPI programs can be written in C, C++ and Fortran; bindings are also available for Java and Python
    ● MPI programs target distributed memory systems


  3. Distributed Memory Systems
    [Diagram: four nodes, each with its own CPU and private memory, connected by an interconnect]


  4. Shared Memory Systems
    [Diagram: several CPUs connected to a single shared memory through an interconnect]


  5. Hello World in MPI
    #include <mpi.h>
    #include <stdio.h>
    int main(int argc, char** argv) {
        // Initialize the MPI environment
        MPI_Init(NULL, NULL);
        // Get the number of processes
        int comm_size;
        MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
        // Get the rank of the process
        int my_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        // Print a hello world message
        printf("Hello from rank %d out of %d\n",
               my_rank, comm_size);
        // Finalize the MPI environment
        MPI_Finalize();
        return 0;
    }


  6. Compilation and Execution
    [Hello World code from Slide 5]
    ● mpicc -o mpi_hello mpi_hello.c
    ● mpirun -n <number of processes> ./mpi_hello
    ● mpirun -n 1 ./mpi_hello
    Hello from rank 0 out of 1
    ● mpirun -n 3 ./mpi_hello
    Hello from rank 0 out of 3
    Hello from rank 1 out of 3
    Hello from rank 2 out of 3


  7. Hello World in MPI
    [Hello World code from Slide 5]
    ● mpi.h contains all definitions and declarations needed to
    compile an MPI program
    ● All identifiers defined by MPI begin with MPI_


  8. Hello World in MPI
    [Hello World code from Slide 5]
    ● MPI_Init initializes the MPI system
    ● No other MPI functions should be called before MPI_Init
    ● Syntax
    int MPI_Init(
        int* argc_p,
        char*** argv_p
    )
    ● Returns an integer error code
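
    For instance, a minimal sketch (not from the slides) that passes the command-line arguments to MPI_Init and checks the returned error code:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char** argv) {
        // Passing &argc and &argv lets MPI strip its own command-line
        // options; passing NULL, NULL as on the earlier slides also works
        int err = MPI_Init(&argc, &argv);
        if (err != MPI_SUCCESS) {
            fprintf(stderr, "MPI_Init failed with error code %d\n", err);
            return 1;
        }
        MPI_Finalize();
        return 0;
    }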


  9. Hello World in MPI
    [Hello World code from Slide 5]
    ● MPI_Finalize shuts down the MPI system and frees any
    allocated MPI resources


  10. Hello World in MPI
    [Hello World code from Slide 5]
    ● In MPI, a communicator is a collection of processes
    which can send messages to each other
    ● MPI_COMM_WORLD is the default communicator
    ● MPI_Comm_size returns the number of processes in the
    communicator in the second argument
    ● If "mpirun -n 3 ./mpi_hello" is run, comm_size is 3


  11. Hello World in MPI
    [Hello World code from Slide 5]
    ● Multiple copies of the program are executed in
    independent processes
    ● Each process in MPI is assigned a rank which is an
    integer from 0 to comm_size-1
    ● MPI_Comm_rank returns the rank of the process in the
    second argument


  12. Ranks of Processes
    ● mpirun -n 4 ./mpi_hello
    [Diagram: four processes with ranks 0-3, each with its own CPU and memory, connected by an interconnect]


  13. Hello World in MPI
    [Hello World code from Slide 5]
    ● mpirun -n 4 ./mpi_hello
    Hello from rank 0 out of 4
    Hello from rank 1 out of 4
    Hello from rank 2 out of 4
    Hello from rank 3 out of 4
    ● Four processes are launched each running a copy of the
    program
    ● Each copy will output only one of the above lines


  14. Single Program Multiple Data
    ● mpirun -n 4 ./mpi_hello
    [Diagram: four processes with ranks 0-3; each prints its own "Hello from rank i out of 4" line]


  15. Single Program Multiple Data
    ● MPI programs follow the SPMD paradigm
    ● The same program will be run by all the processes in the distributed
    memory system
    ● Each process has a unique rank
    ● The program will behave differently based on its rank
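
    A minimal sketch of the SPMD idea (illustrative, not from the slides): every process runs the same source file, but branches on its rank:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char** argv) {
        int rank;
        MPI_Init(NULL, NULL);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            // Only the copy with rank 0 takes this branch
            printf("I am the coordinator\n");
        } else {
            // Every other copy takes this branch
            printf("I am worker %d\n", rank);
        }
        MPI_Finalize();
        return 0;
    }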


  16. Example: Finding Primes
    ● Suppose we want to find all prime numbers less than 10,000,000
    ● Suppose we have a function int isPrime(int N) which returns 1 if N is
    prime and returns 0 otherwise
    ● How can we use MPI to achieve this task in parallel?
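
    The slides leave isPrime unspecified; a simple trial-division version (an assumed implementation, adequate for this range) could be:

    // Returns 1 if N is prime and 0 otherwise, by trial division up to sqrt(N)
    int isPrime(int N)
    {
        if (N < 2) return 0;
        for (int d = 2; (long)d * d <= N; d++) {
            if (N % d == 0) return 0;
        }
        return 1;
    }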


  17. Example: Finding Primes
    // Initialize MPI and assume comm_size is 5

    // Get the rank of the process
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = rank*2000000; i < (rank+1)*2000000; i++)
    {
    if(isPrime(i) == 1)
    {
    // Store i in a file

    }
    }
    rank    rank*2,000,000    (rank+1)*2,000,000
      0              0             2,000,000
      1      2,000,000             4,000,000
      2      4,000,000             6,000,000
      3      6,000,000             8,000,000
      4      8,000,000            10,000,000
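
    The "// Store i in a file" step is also left open; one sketch (an assumption, not in the slides) has each rank write to its own file, avoiding concurrent writes to a shared file:

    // Each rank writes its primes to a per-rank file such as primes_rank0.txt
    char filename[32];
    snprintf(filename, sizeof(filename), "primes_rank%d.txt", rank);
    FILE* fp = fopen(filename, "w");
    for (int i = rank*2000000; i < (rank+1)*2000000; i++) {
        if (isPrime(i) == 1)
            fprintf(fp, "%d\n", i);
    }
    fclose(fp);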


  18. Example: Finding Primes
    [Code and table from Slide 17]
    [Diagram: five processes with ranks 0-4 handling 0 to 2M-1, 2M to 4M-1, 4M to 6M-1, 6M to 8M-1 and 8M to 10M-1 respectively]


  19. Process to Process Communication
    ● Finding primes did not require any communication between processes
    ● More complex applications will require inter-process communication
    ● Message transfer involves a matching pair of MPI_Send and MPI_Recv
    function calls
    ● Suppose process 0 wants to send a message to process 1
    [Diagram: rank 0 calls MPI_Send; rank 1 calls MPI_Recv]


  20. MPI_Send
    int MPI_Send(
        void* msg_buffer_p, // Pointer to message buffer
        int msg_size, // Size of message
        MPI_Datatype msg_type, // Message type
        int dest, // Rank of destination process
        int tag, // Message tag
        MPI_Comm communicator // Communicator id
    )


  21. MPI_Send
    [MPI_Send prototype from Slide 20]
    ● int a = 5;
    MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    ● Message type can be MPI_CHAR, MPI_LONG, MPI_FLOAT, etc.
    ● Message size is the number of elements of msg_type in the
    message
    ● Message tag is used to differentiate messages of the same
    type sent to the same destination
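
    For example (a sketch, not from the slides), sending an array of 10 floats to rank 1 sets msg_size to 10, the element count rather than the byte count:

    float data[10] = {0}; // assume this gets filled with real values
    MPI_Send(data, 10, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);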


  22. MPI_Recv
    int MPI_Recv(
        void* msg_buffer_p, // Pointer to message buffer
        int msg_size, // Size of message
        MPI_Datatype msg_type, // Message type
        int source, // Rank of source process
        int tag, // Message tag
        MPI_Comm communicator, // Communicator id
        MPI_Status* status_p // Message status
    )


  23. MPI_Recv
    [MPI_Recv prototype from Slide 22]
    ● int b;
    MPI_Status status;
    MPI_Recv(&b, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    ● The received integer will be written to the variable b
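
    Putting the two calls together, a minimal self-contained sketch (assuming the program is launched with at least two processes) in which rank 0 sends an integer to rank 1:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char** argv) {
        int rank;
        MPI_Init(NULL, NULL);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            int a = 5;
            // Matching parameters: destination 1, tag 0, MPI_COMM_WORLD
            MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int b;
            MPI_Status status;
            // Matching parameters: source 0, tag 0, MPI_COMM_WORLD
            MPI_Recv(&b, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("Rank 1 received %d from rank 0\n", b);
        }
        MPI_Finalize();
        return 0;
    }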


  24. Example: Area Calculation
    [Figure: a function f(x) over an interval [a,b], with the area under the curve divided into vertical strips]
    ● Suppose we want to calculate the area under a function in the interval [a,b]
    ● We could split the interval into parts and assign each part to a process
    ● The total area can be calculated by adding the areas of the parts
    ● A single process needs to receive all the part areas


  25. Example: Area Calculation
    // Initialize MPI and assume comm_size is 4
    int rank;
    float part_area, total_area;
    MPI_Status status;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    part_area = calculate_part_area(rank);
    if (rank != 0)
    {
        MPI_Send(&part_area, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
    }
    else
    {
        total_area = part_area;
        for (int source = 1; source < comm_size; source++)
        {
            MPI_Recv(&part_area, 1, MPI_FLOAT, source, 0, MPI_COMM_WORLD,
                     &status);
            total_area = total_area + part_area;
        }
    }
    [Figure: the interval [a,b] split into four parts, assigned to ranks 0, 1, 2, 3]
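
    The slides leave calculate_part_area undefined; one possible sketch (everything here, f, [a,b] and the step count, is assumed for illustration) approximates this rank's part with the midpoint rule:

    // Midpoint-rule estimate of the area under f over this rank's part of [a,b]
    float calculate_part_area(int rank)
    {
        const float a = 0.0f, b = 1.0f; // whole interval, assumed
        const int num_parts = 4;        // comm_size in the slide's example
        const int steps = 1000;         // subintervals per part
        float part_width = (b - a) / num_parts;
        float start = a + rank * part_width;
        float h = part_width / steps;
        float area = 0.0f;
        for (int i = 0; i < steps; i++) {
            float x = start + (i + 0.5f) * h;
            area += (x * x) * h;        // example f(x) = x*x, assumed
        }
        return area;
    }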


  26. Blocking Behaviour of MPI_Recv/MPI_Send
    if (rank != 0)
    {
    MPI_Send(&part_area, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
    }
    else
    {
    total_area = part_area;
    for(int source = 1; source < comm_size; source++)
    {
    MPI_Recv(&part_area, 1, MPI_FLOAT, source, 0,
    MPI_COMM_WORLD, &status);
    total_area = total_area + part_area;
    }
    }
    ● An MPI_Recv call blocks until a message
    with matching parameters is received
    ● Even if the message from rank 2 arrives first,
    it will not be processed until the message from
    rank 1 has been received, since the loop receives
    from sources in order
    ● An MPI_Send call may or may not block
    ● MPI_Ssend can be used for a synchronous
    send
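
    For example (a sketch), the send above becomes synchronous simply by swapping in MPI_Ssend, which takes the same arguments as MPI_Send and returns only once the matching receive has started:

    MPI_Ssend(&part_area, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);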


  27. Example: Finding Primes Revisited
    ● Want to find prime numbers < 10,000,000
    ● Previously we divided the numbers equally
    among the processes
    ● But different blocks of numbers may take
    different amounts of time to process
    ● We can initially assign a block of 5,000
    numbers to each process
    ● When a process finishes its block, it can
    request a new block
    rank    rank*2,000,000    (rank+1)*2,000,000
      0              0             2,000,000
      1      2,000,000             4,000,000
      2      4,000,000             6,000,000
      3      6,000,000             8,000,000
      4      8,000,000            10,000,000


  28. Example: Finding Primes Revisited
    int block_start, next_block_start, done = 1;
    if (rank != 0)
    {
    while(1)
    {
    MPI_Send(&done, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    MPI_Recv(&block_start, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    find_primes_in_block(block_start);
    }
    }
    else
    {
    next_block_start = 1;
    while(next_block_start < 10000000)
    {
    MPI_Recv(&done, 1, MPI_INT, source, 0, MPI_COMM_WORLD, &status);
    MPI_Send(&next_block_start, 1, MPI_INT, source, 0, MPI_COMM_WORLD);
    next_block_start = next_block_start + 5000;
    }
    }
    ● Assume the rank 0 process does the
    block distribution
    ● The other processes request a
    new block after finding primes in
    the current block
    [Diagram: ranks 1-3 send "done" to rank 0; rank 0 replies with next_block_start]


  29. Example: Finding Primes Revisited
    [Code from Slide 28]
    ● But the rank 0 process does not
    know which process will request the
    next block
    ● Calling MPI_Recv with the wrong source
    rank will block
    ● The source rank is needed to send
    next_block_start back
    [Diagram from Slide 28]


  30. Receiving Message from Unknown Source
    ● Call MPI_Recv with MPI_ANY_SOURCE in place of source rank
    MPI_Status status;
    MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
    ● The actual source rank will be in status.MPI_SOURCE
    ● MPI_Status also contains the message tag in status.MPI_TAG; the message length can be obtained from the status using MPI_Get_count
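
    For instance (a sketch continuing the snippet above), MPI_Get_count recovers the number of received elements from the status:

    int count;
    // Number of MPI_INT elements actually received
    MPI_Get_count(&status, MPI_INT, &count);
    printf("Got %d int(s) from rank %d with tag %d\n",
           count, status.MPI_SOURCE, status.MPI_TAG);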


  31. Example: Finding Primes Revisited
    int block_start, next_block_start, done = 1;
    if (rank != 0)
    {
    while(1)
    {
    MPI_Send(&done, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    MPI_Recv(&block_start, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    find_primes_in_block(block_start);
    }
    }
    else
    {
    next_block_start = 1;
    while(next_block_start < 10000000)
    {
    MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
    MPI_Send(&next_block_start, 1, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
    next_block_start = next_block_start + 5000;
    }
    }
    ● MPI_Recv is called with
    MPI_ANY_SOURCE as the source rank
    ● status.MPI_SOURCE can be used
    to send the message to the
    requesting process
    [Diagram from Slide 28]
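
    One detail the code glosses over: the worker loop never exits. A conventional fix (an assumption, not in the slides) is for rank 0 to answer each worker's final request with a sentinel block value such as -1:

    // Worker side: stop when the sentinel -1 arrives
    while (1)
    {
        MPI_Send(&done, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        MPI_Recv(&block_start, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        if (block_start == -1)
            break;
        find_primes_in_block(block_start);
    }

    // Rank 0 side, after the distribution loop: one sentinel per worker
    int sentinel = -1;
    for (int w = 1; w < comm_size; w++)
    {
        MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&sentinel, 1, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
    }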


  32. Collective Communications
    ● Broadcasting information to all processes using MPI_Bcast
    int N;
    if (rank == 0)
    {
        printf("Enter the value of N\n");
        scanf("%d", &N);
    }
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
    ● Synchronizing processes with MPI_Barrier
    MPI_Barrier(MPI_COMM_WORLD);
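
    A self-contained sketch of the broadcast (assumed details: N is read on rank 0 and printed by every process):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char** argv) {
        int rank, N = 0;
        MPI_Init(NULL, NULL);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            printf("Enter the value of N\n");
            scanf("%d", &N);
        }
        // Every process calls MPI_Bcast; rank 0 is the root that sends,
        // all other ranks receive the value into N
        MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("Rank %d has N = %d\n", rank, N);
        // No process continues past this point until all have reached it
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }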


  33. MPI Use Cases
    ● Ideal for compute-intensive applications with a high degree of parallelism
    ● Processes should not need to communicate often
    ● Not suitable for disk-I/O-intensive applications


  34. Learning resources
    ● MPI Tutorial by Blaise Barney,
    https://computing.llnl.gov/tutorials/mpi/
    ● An Introduction to Parallel Programming, Peter S. Pacheco,
    Morgan Kaufmann Publishers, 2011
    ● Parallel Programming in C with MPI and OpenMP, Michael J. Quinn,
    Tata McGraw-Hill, 2004


  35. Thanks for your attention!
