Slide 1

Red Cloud and the MATLAB Distributed Computing Server (MDCS)
www.cac.cornell.edu
Steve Lantz, Senior Research Associate
Center for Advanced Computing (CAC)
[email protected]
Cornell Scientific Software Club, May 8, 2017

Slide 2

What Is Red Cloud?
• Infrastructure-as-a-Service (IaaS) for research computing
• Extra compute cycles and memory via on-demand VMs
• Subscription model for buying computer time
• Easy way to clone a software environment across servers
• Local source of computing power on Cornell networks
• Complementary resource to CAC's disk storage offerings
• Where to go if your MATLAB runs outgrow your laptop!

Slide 3

What Is MDCS?
Quoting from The MathWorks:*
• MATLAB Distributed Computing Server™ lets you run computationally intensive MATLAB® programs and Simulink models on computer clusters, clouds, and grids.
• You develop your program or model on a multicore desktop computer using Parallel Computing Toolbox™ and then scale up to many computers by running it on MDCS.
• The server supports batch jobs, parallel computations, and distributed large data. The server includes a built-in cluster job scheduler [...]
*https://www.mathworks.com/products/distriben.html

Slide 4

Parallel Resources: Local & Remote
[Diagram: local parallel resources on the desktop are driven by PCT; remote cluster/cloud resources are reached through MDCS]

Slide 5

Red Cloud and MDCS Demo
The main steps from the CAC wiki doc will be shown:
• Working with the Eucalyptus User Console (website)
  – Logging in to Red Cloud
  – Launching an MDCS instance
  – Setting up the security group
• Connecting to MDCS from MATLAB R2016a
  – Running the built-in validation test
  – Initiating a parpool
  – Trying a couple of quick checks using spmd

Slide 6

PCT Opens Up Parallel Possibilities
• MATLAB has multithreading built into core libraries
  – Mostly aids big array operations; not within user control
• PCT enables user-directed, interactive parallelism
  – Parallel for-loops: parfor
  – Single program, multiple data: spmd, pmode
  – Array partitioning for big-data parallelism: (co)distributed
• PCT also enables batch-style parallelism
  – Multiple independent runs of a serial function: createJob
  – Single run of parallelized code: createCommunicatingJob
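For concreteness, a minimal sketch of the interactive style (not from the slide; the pool size and the series being summed are arbitrary choices):

    % Sum the Leibniz series for pi/4 in parallel
    pool = parpool(4);              % start a pool of 4 workers ("labs")
    n = 1e7;
    s = 0;
    parfor k = 0:n                  % iterations are split across the workers
        s = s + (-1)^k / (2*k + 1); % s is a reduction variable
    end
    piEstimate = 4 * s
    delete(pool);                   % shut the pool down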

Slide 7

Two Ways to Use PCT
• Interactive: the MATLAB client starts a local or remote parpool, then runs PCT commands (scripted) directly on the MATLAB workers.
• Batch-style: the MATLAB client submits jobs and task functions to a local or remote parcluster; a scheduler (with file transfer) hands the work to the MATLAB workers (maybe via the Distributed Computing Server).

Slide 8

Interactive PCT: Major Concepts
• parpool: pool of distinct MATLAB processes = "labs"
  – Differs from multithreading! No shared address space
  – Ultimately allows same concepts to work on MDCS clusters
• parfor: parallel for-loop, iterations are independent
  – Labs (workers) split up iterations; load balancing is built in
• spmd: single program, multiple data
  – All labs execute every command; labs can communicate
• (co)distributed: array is partitioned among workers
  – "Multiple data" for spmd; one array to MATLAB functions
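A short sketch of spmd and a distributed array, assuming a parpool is already open; the array size is arbitrary:

    spmd
        fprintf('Lab %d of %d\n', labindex, numlabs);  % every lab runs this
    end
    A = distributed.rand(4000);   % 4000x4000 array partitioned across labs
    s = sum(A(:));                % overloaded sum works on the whole array
    total = gather(s)             % collect the scalar result on the client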

Slide 9

Batch-Style PCT: Jobs and Tasks
• parcluster creates a cluster object, which allows you to create Jobs. In PCT, Jobs are containers for Tasks, which are where the actual work is defined.
[Diagram: cluster object "clust" holds two jobs. Jobs(24), from j=createJob(clust), contains Tasks(1) someFunction(x) and Tasks(2) otherFunction(y); Jobs(25), from j=createCommunicatingJob(clust), contains the single Tasks(1) myFunction(z). Each task is added with createTask(j,…)]
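A runnable version of the independent-job pattern in the diagram, assuming a configured cluster profile (the default one is used here); the built-in rand stands in for the placeholder functions:

    clust = parcluster;                 % cluster object from default profile
    j = createJob(clust);               % independent job: container for tasks
    createTask(j, @rand, 1, {3,3});     % task 1: one output, a 3x3 matrix
    createTask(j, @rand, 1, {5,5});     % task 2: one output, a 5x5 matrix
    submit(j);
    wait(j);
    out = fetchOutputs(j);              % cell array with one entry per task
    delete(j);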

Slide 10

Batch-Style PCT: Types of Jobs
• PCT has 3 types of jobs: independent, SPMD, and pool
• Independent: createJob()
  – Can contain many tasks; workers run the tasks one by one
• SPMD: createCommunicatingJob(...,'Type','SPMD',...)
  – Has ONE task to be run by ALL workers, like an spmd block
• Pool: createCommunicatingJob(...,'Type','Pool',...)
  – Has ONE task which is run by ONE worker
  – Other workers run spmd blocks or parfor loops in the task
  – Mimics the interactive mode of using PCT
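A hedged sketch of the Pool type; myPoolTask is a hypothetical function whose body contains parfor or spmd code:

    clust = parcluster;
    j = createCommunicatingJob(clust, 'Type', 'Pool');
    j.NumWorkersRange = [4 4];          % 1 worker runs the task, 3 form its pool
    createTask(j, @myPoolTask, 1, {});  % a Pool job gets exactly ONE task
    submit(j);
    wait(j);
    result = fetchOutputs(j);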

Slide 11

More on SPMD Jobs and spmd Blocks
• The SPMD task function, like an spmd block, is responsible for implementing parallelism using "labindex" logic
• The lab* functions allow workers (labs) to communicate; they act just like MPI message-passing methods
  – labSend(data,dest,[tag]); % point-to-point
  – labReceive(source,tag); % datatype, size are implicit
  – labReceive(); % take any source
  – labBroadcast(source); labBarrier; gop(f,x); % collectives
• (Co)distributed arrays are sliced across workers so huge matrices can be operated on; collect slices with gather
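To illustrate the labindex logic, a minimal sketch that could run inside an spmd block or as the single task function of an SPMD job (assumes at least 2 labs):

    spmd
        if labindex == 1
            labSend(magic(4), 2);            % point-to-point: lab 1 -> lab 2
        elseif labindex == 2
            m = labReceive(1);               % blocking receive from lab 1
        end
        if labindex == 1
            shared = labBroadcast(1, rand);  % the source supplies the value
        else
            shared = labBroadcast(1);        % every other lab receives it
        end
        labBarrier;                          % wait until all labs reach here
    end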

Slide 12

When Is File Transfer Needed?
• If your workers do not share a disk with your client, and they will require custom functions or datafiles
• Example:

    j = createJob(sched);
    createTask(j,@rand,1,{3,3});
    createTask(j,@myfunction,1,{3,3});
    submit(j);

  – The rand function is no problem at all; it's built in
  – But myfunction.m does not exist on the remote computer
  – We'll want to transfer this file and get it added to the path

Slide 13

MATLAB Can Copy Files… Or You Can
• Setting the AutoAttachFiles property tells MATLAB to copy files containing your function definitions
• Use AttachedFiles to copy any data files or directories the task will need; directory structures are preserved
  – Not very efficient, though: file transfer occurs separately for each worker running a task for that particular job
  – OK for small projects with a couple of files
• A better-scaling alternative is to copy your files to disk(s) on the remote server(s) in advance
  – Use AdditionalPaths to make the files available at run time
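A sketch of the three properties in job form, continuing the previous slide's example; myfunction.m, the mydata directory, and the remote path are hypothetical:

    clust = parcluster;
    j = createJob(clust);
    j.AutoAttachFiles = true;        % ship files defining the task functions
    j.AttachedFiles = {'mydata'};    % also copy this directory to each worker
    % If the files are already on the remote disks, skip the copying:
    % j.AdditionalPaths = {'/path/on/server/to/myfiles'};  % hypothetical path
    createTask(j, @myfunction, 1, {3,3});
    submit(j);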

Slide 14

Distributing Work with parfeval, batch
• createJob() isn't the only way to run independent tasks...
• parfeval() requests that the given function be executed on one worker in a parpool, asynchronously
• batch() does the same on one worker NOT in a parpool
  – It creates a one-task job and submits it to a parcluster
  – It can also be a one-line method for initiating a pool job
  – It works with either a function or a script
• Either can easily be called in a loop over a list of tasks
  – Use fetchNext() to collect results as they become available
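A minimal sketch of the parfeval/fetchNext pattern; the pool size and the stand-in task function (sin) are arbitrary choices:

    pool = parpool(4);
    N = 10;
    f(1:N) = parallel.FevalFuture;           % preallocate the array of futures
    for k = 1:N
        f(k) = parfeval(pool, @sin, 1, k);   % request sin(k), 1 output, async
    end
    results = zeros(1, N);
    for k = 1:N
        [idx, value] = fetchNext(f);         % next future to finish, any order
        results(idx) = value;
    end
    delete(pool);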

Slide 15

Distributing Work Without PCT, MDCS
• Create a MATLAB .m file that takes one or more input parameters (such as the name of an input file)
• Apply the MATLAB C/C++ compiler (mcc), which converts the script to C, then to a standalone executable
• Run N copies of the executable on an N-core batch node or a cluster, each with a different input parameter
  – mpirun can launch non-MPI processes, too
• MATLAB runtimes (free!) must be available on all nodes
• For process control, write a master script in Python, say
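A hedged sketch of the compile step, run at the MATLAB prompt and assuming MATLAB Compiler is installed; myscript.m is a hypothetical file taking one input argument:

    mcc -m myscript.m    % produces a standalone executable named myscript
    % On Linux, mcc also emits a wrapper script (run_myscript.sh) that
    % locates the MATLAB runtime; each copy can then be launched with a
    % different input parameter from the batch system or mpirun.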

Slide 16

GPGPU in MATLAB PCT: Fast and Easy
• Many functions are overloaded to call CUDA code automatically if objects are declared with gpuArray type
• Benchmarking with large 1D and 2D FFTs shows excellent acceleration on NVIDIA GPUs
• MATLAB code changes are trivial
  – Move data to GPU by declaring a gpuArray
  – Call method in the usual way:

    g = gpuArray(r);
    f = fft2(g);
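The slide's two lines, filled out into a complete round trip (assumes a CUDA-capable GPU; the array size is arbitrary):

    r = rand(4096);        % ordinary array in host memory
    g = gpuArray(r);       % copy it to the GPU
    f = fft2(g);           % overloaded fft2 executes on the GPU
    result = gather(f);    % bring the result back to the client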

Slide 17

Are GPUs Really That Simple?
• No. Your application must meet four important criteria:
  1. Nearly all required operations must be implemented natively for type gpuArray.
  2. The computation must be arranged so the data seldom have to leave the GPU.
  3. The overall working dataset must be large enough to exploit 100s of thread processors.
  4. On the other hand, the overall working dataset must be small enough that it does not exceed GPU memory.

Slide 18

PCT and MDCS: The Bottom Line
• PCT can greatly speed up large-scale computations and the analysis of large datasets
  – GPU functionality is a nice addition to the arsenal
• MDCS allows parallel workers to run on cluster and cloud resources beyond one's laptop, e.g., Red Cloud
• Yes, a learning curve must be climbed…
  – General knowledge of how to restructure code so that parallelism is exposed
  – Specific knowledge of PCT functions
• But speed often matters!