DIET - A Scalable Platform for Clusters, Grids and Clouds

Eddy Caron, Frédéric Desprez (INRIA, LIP ENS Lyon, Avalon Research Team)
Benjamin Depardon (SysFera)
Joint work with A. Muresan, J. Rouzaud-Cornabas (LIP ENS Lyon), H. Guemard, O. Mornard (SysFera)

Introduction
•  Transparency and simplicity represent the holy grail for Grids and Clouds (maybe even before performance)!
  • Utility computing
  • (Almost) everything as a service
•  Scheduling tunability to take into account the characteristics of specific application classes
•  A large range of applications is ready (and not only number-crunching ones!)
  • Heterogeneity of jobs executed and data managed
•  Many incarnations of distributed platforms (peer-to-peer systems, clusters, clusters of clusters, grids, Clouds, …)
  • Many research projects around the world
  • Significant technology base
  • Many platforms available
•  Do not forget good old-time research on scheduling and distributed systems!
  - Most scheduling problems are very difficult to solve even in their simplest form …
  - … but simple solutions often lead to better performance results in real life
F. Desprez - ISC Cloud 25/09/2012

Avalon and SysFera
Avalon
•  Research team from Inria (with CNRS, ENS Lyon, University of Lyon)
•  Led by Christian Perez, located at LIP/ENS Lyon
•  Goals: contribute to the design of programming models supporting various kinds of architectures, implement them by mastering the algorithmic issues involved, and study their impact on application-level algorithms
•  Research areas: distributed algorithms, programming models, service deployment, service discovery, service composition and orchestration, large-scale data management, etc.
•  Validation using applications from different fields of science such as bioinformatics, physics, and cosmology, over Grid'5000
SysFera
•  Company co-founded by D. Loureiro, E. Caron, F. Desprez, and IT Translation
•  Spin-off of Inria, led by David Loureiro
•  Goals:
  •  Industrialize the DIET software and expand upon it to provide advanced features
  •  Use the accumulated experience and expertise to help software vendors move to a SaaS/Cloud business model
•  Areas of expertise: distributed computing, clusters, grids, HPC, Clouds, SaaS, PaaS, Big Data
•  Validation in production environments for large companies and SMBs in various domains: energy, linguistics, video games, bioinformatics, cosmology, meteorology, and so on

DIET’s Goals
DIET: Distributed Interactive Engineering Toolbox
Our goals
-  To develop a toolbox for the deployment of environments using the Software as a Service (SaaS) paradigm with different applications
-  Over clusters, clusters of clusters, grids and Clouds
-  To use public domain and standard software as much as possible
-  To obtain a high-performance and scalable environment
-  To implement and validate our more theoretical results
•  Several applications in different fields
  -  simulation, bioinformatics, robotics, …
•  Release 2.8.1 available on the web since July 2012
•  ACI Grid ASP, RNTL GASP, ANR LEGO CIGC-05-11, ANR Gwendia, Celtic-plus Project SEED4C
http://graal.ens-lyon.fr/DIET/
[Figure: DIET layers. Client layer: DIET client. Scheduling layer: DIET agents (Master Agents, Local Agents). Service layer: DIET server, the Server Daemon (SeD)]

DIET Architecture overview
•  Context: development of a toolbox for deploying applications over various platforms, with a hierarchical architecture for improved scalability
  •  Distributed scheduler (MA, LA)
  •  Servers (SeD)
•  Validation: large-scale validation over Grid’5000
•  DIET production use cases
  •  The Decrypthon project
  •  DIET was selected by IBM
•  Transferred to a start-up company, SysFera (created in March 2010)
•  Main research issues: scheduling, heterogeneity, automatic deployment, interoperability, high-performance data transfer and management, monitoring, fault tolerance, genericity of solutions for various applications, static and dynamic analysis of performance, …
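The hierarchical scheduler (MA at the root, LAs below, SeDs as leaves) can be pictured with the following minimal Python sketch. It assumes each SeD returns a single estimated completion time and that agents simply keep the k best candidates while passing results back up the tree; the class names, fields and estimation formula are hypothetical and are not DIET's actual API. In DIET the client then contacts the chosen SeD directly for the call itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple, Union


@dataclass
class SeD:
    """Server daemon (hypothetical model): offers services and estimates completion time."""
    name: str
    services: Set[str]
    queued_work: float  # seconds of work already waiting on this server (hypothetical metric)
    speed: float        # relative processing speed (hypothetical metric)

    def estimate(self, service: str, work: float) -> Optional[float]:
        # None means "this server cannot handle the request"
        if service not in self.services:
            return None
        return (self.queued_work + work) / self.speed


@dataclass
class Agent:
    """Master Agent (root) or Local Agent: forwards the request down the tree
    and keeps only the k best (estimate, server) pairs on the way back up."""
    name: str
    children: List[Union["Agent", SeD]] = field(default_factory=list)

    def submit(self, service: str, work: float, k: int = 1) -> List[Tuple[float, str]]:
        candidates: List[Tuple[float, str]] = []
        for child in self.children:
            if isinstance(child, SeD):
                est = child.estimate(service, work)
                if est is not None:
                    candidates.append((est, child.name))
            else:
                # a Local Agent aggregates its own subtree first
                candidates.extend(child.submit(service, work, k))
        candidates.sort()  # lowest estimated completion time first
        return candidates[:k]
```

Because each agent only forwards its k best candidates, the amount of data flowing up the tree stays bounded regardless of how many SeDs sit below it, which is the scalability argument behind the hierarchy.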

Data Management: DAGDA (Data Arrangement for Grid and Distributed Applications)
•  Joining task scheduling and data management
•  Standardized through the GridRPC OGF WG
•  Explicit data replication
  - Using the API
•  Implicit data replication
•  Data replacement algorithms
  - LRU, LFU and FIFO
•  Transfer optimization by selecting the most convenient source
•  Storage resource usage management
•  Data status backup/restoration
Joint work with Gaël Le Mahec (UPJV/MIS)
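The three replacement policies listed for DAGDA differ only in which stored item is evicted when a node runs out of space. The Python sketch below illustrates that difference on a single node with a capacity expressed in abstract size units; the `DataStore` class and its methods are hypothetical and do not reproduce DAGDA's C++ implementation.

```python
from collections import OrderedDict
from typing import Dict, List


class DataStore:
    """Bounded data storage on one node, with a pluggable replacement policy (hypothetical)."""

    def __init__(self, capacity: int, policy: str = "LRU"):
        if policy not in ("LRU", "LFU", "FIFO"):
            raise ValueError(f"unknown policy: {policy}")
        self.capacity = capacity
        self.policy = policy
        self.items = OrderedDict()      # data id -> size, kept in insertion (or recency) order
        self.hits: Dict[str, int] = {}  # data id -> number of accesses (used by LFU)
        self.used = 0

    def get(self, data_id: str) -> bool:
        """Record an access; returns False if the data is not stored here."""
        if data_id not in self.items:
            return False
        self.hits[data_id] += 1
        if self.policy == "LRU":
            self.items.move_to_end(data_id)  # most recently used moves to the end
        return True

    def _victim(self) -> str:
        if self.policy == "LFU":
            # least accessed item; ties go to the oldest insertion (iteration order)
            return min(self.items, key=lambda d: self.hits[d])
        # LRU: the front is the least recently used; FIFO: the front is the oldest insertion
        return next(iter(self.items))

    def put(self, data_id: str, size: int) -> List[str]:
        """Store a data item, evicting others if needed; returns the evicted ids."""
        if size > self.capacity:
            raise ValueError("data item larger than the storage capacity")
        if data_id in self.items:
            self.get(data_id)  # already present: counts as an access
            return []
        evicted = []
        while self.used + size > self.capacity:
            victim = self._victim()
            self.used -= self.items.pop(victim)
            del self.hits[victim]
            evicted.append(victim)
        self.items[data_id] = size
        self.hits[data_id] = 0
        self.used += size
        return evicted
```

With items `a`, `b`, `c` stored and `a` read once, inserting `d` evicts `b` under LRU but `a` under FIFO, since FIFO ignores accesses entirely.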

Parallel and batch submissions
•  Parallel & sequential jobs
  - transparent for the user
  - system-dependent submission
•  SeDBatch
  - Many batch systems
  - Batch schedulers' behaviour
  - Internal scheduling process
•  Monitoring & performance prediction
•  Simulation
[Figure: MA and LA agents above SeDs; a SeDBatch submits to batch systems (OAR, SGE, LSF, PBS, LoadLeveler, SLURM), and a parallel SeD (SeD//) runs over NFS]
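A system-dependent submission hidden behind a common interface can be sketched as one abstract class with one subclass per batch scheduler. The Python below only builds the submission command line for two of the listed systems, SLURM (`sbatch`) and OAR (`oarsub`); the class and function names are hypothetical, the real SeDBatch is C++ code, and only basic options are shown.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


def walltime(seconds: int) -> str:
    """Format a duration as HH:MM:SS, as accepted by both schedulers below."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


class BatchSystem(ABC):
    """Hypothetical abstraction: one subclass per batch scheduler."""

    @abstractmethod
    def submit_command(self, script: str, nodes: int, seconds: int) -> List[str]:
        ...


class Slurm(BatchSystem):
    def submit_command(self, script: str, nodes: int, seconds: int) -> List[str]:
        return ["sbatch", f"--nodes={nodes}", f"--time={walltime(seconds)}", script]


class Oar(BatchSystem):
    def submit_command(self, script: str, nodes: int, seconds: int) -> List[str]:
        return ["oarsub", "-l", f"nodes={nodes},walltime={walltime(seconds)}", script]


# The SeD only knows which backend is configured; the user never sees the difference.
BACKENDS: Dict[str, BatchSystem] = {"SLURM": Slurm(), "OAR": Oar()}


def build_submission(system: str, script: str, nodes: int, seconds: int) -> List[str]:
    return BACKENDS[system].submit_command(script, nodes, seconds)
```

Supporting another scheduler (SGE, LSF, PBS, LoadLeveler) would mean adding one subclass, which keeps the submission transparent for the user.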

Workflow representation
•  Directed Acyclic Graph (DAG)
  -  Each vertex is a task
  -  Each directed edge represents communication between tasks
Goals
•  Build and execute workflows
•  Use different heuristics to solve scheduling problems
•  Extensibility to address multi-workflow submission and large grid platforms
•  Manage heterogeneity and variability of the environment
ANR Gwendia
•  Language definition (MOTEUR & MADAG)
•  Comparison on Grid’5000 vs EGI

Platform           Idle time   Data transfer   Execution time
EGI (gLite)        32.857 s    132.143 s       274.643 s
Grid’5000 (DIET)   0.214 s     3.371 s         540.614 s
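Since a workflow is a DAG whose edges carry data from one task to the next, a scheduler first needs to know which tasks may run at the same time. The Python sketch below uses Kahn's algorithm to group tasks into levels, where every task only depends on tasks from earlier levels; it is a generic illustration, not MADAG's actual scheduling heuristic, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def schedule_levels(tasks: List[str], edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Group DAG tasks into levels with Kahn's algorithm; tasks in the same
    level have no dependency on each other and may run concurrently.
    Raises ValueError if the graph contains a cycle."""
    indegree: Dict[str, int] = {t: 0 for t in tasks}
    successors: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:  # edge src -> dst: dst consumes the output of src
        successors[src].append(dst)
        indegree[dst] += 1

    levels: List[List[str]] = []
    level = sorted(t for t in tasks if indegree[t] == 0)
    seen = 0
    while level:
        levels.append(level)
        seen += len(level)
        nxt = []
        for t in level:
            for s in successors[t]:
                indegree[s] -= 1
                if indegree[s] == 0:  # all inputs of s are now available
                    nxt.append(s)
        level = sorted(nxt)

    if seen != len(tasks):
        raise ValueError("workflow graph contains a cycle")
    return levels
```

For a diamond-shaped workflow (`a` feeds `b` and `c`, which both feed `d`), the levels are `[["a"], ["b", "c"], ["d"]]`, so `b` and `c` can be mapped to different SeDs.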

DIET Cloud: First Prototype
Inside the Cloud
•  The DIET platform is virtualized inside the cloud (as a Xen image, for example)
•  Very flexible and scalable, since DIET nodes can be launched as needed
•  Scheduling is more complex
DIET as a Cloud manager
•  Eucalyptus interface
•  Eucalyptus is treated as a new batch system
•  Provide a new implementation of the BatchSystem abstract class

DIET Cloud: Architecture Next Generation
•  Many prerequisites already available in DIET
  •  service calls
  •  scalable scheduling
  •  data management
•  Multi-cloud infrastructure manager
  •  The SeD Cloud deals with a large number of Cloud IaaS through APIs such as δ-Cloud, OCCI, OpenStack, etc.
•  DIET as a virtual machine manager using the IaaS capabilities
  •  The DIET SeD Cloud bootstraps a Cloud instance (VM launching)
•  Application deployment
  •  External tools
  •  Puppet, Chef, etc.
•  Elastic architecture
  •  Allows each DIET SeD Cloud to expand or reduce its number of compute resources
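The elastic behaviour of a SeD Cloud boils down to periodically comparing the current load with the number of running VMs. The Python sketch below shows one simple rule, assuming each VM can run a fixed number of jobs concurrently and that the VM count is clamped between a minimum and a maximum; the rule, parameter names and default bounds are hypothetical, not DIET's actual elasticity policy.

```python
import math


def target_vm_count(pending_jobs: int, running_jobs: int, jobs_per_vm: int,
                    min_vms: int = 1, max_vms: int = 16) -> int:
    """Hypothetical elasticity rule: provision enough VMs for the current load
    (jobs_per_vm concurrent jobs per VM), clamped to [min_vms, max_vms]."""
    if jobs_per_vm <= 0:
        raise ValueError("jobs_per_vm must be positive")
    needed = math.ceil((pending_jobs + running_jobs) / jobs_per_vm)
    return max(min_vms, min(max_vms, needed))


def scaling_action(current_vms: int, target: int) -> str:
    """Translate the target into an action on the IaaS (launch or terminate VMs)."""
    if target > current_vms:
        return f"launch {target - current_vms} VM(s)"
    if target < current_vms:
        return f"terminate {current_vms - target} VM(s)"
    return "no change"
```

For example, 10 pending and 2 running jobs with 4 jobs per VM give a target of 3 VMs; the upper bound keeps a burst of requests from launching an unbounded number of instances.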

DIET Cloud: Workflow on Cloud using Nimbus
•  Nimbus: low-level IaaS provider
  •  open-source IaaS provider
  •  provides low-level resources (VMs)
  •  compatible with Amazon EC2
  •  a FutureGrid installation was used
•  Phantom: high-level resource provider
  •  auto-scaling and high-availability provider
  •  subset of the Amazon Auto Scaling service
  •  part of the Nimbus platform
•  DIET MADag: the workflow engine
  •  one service implementation per task
  •  each service launches its associated task
  •  supports DAG, PTG and functional workflows
•  Client submitting a workflow
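The "one service implementation per task" model can be illustrated by binding each workflow task to a service name and invoking the services in dependency order, feeding each task the outputs of its predecessors. The Python sketch below uses the standard `graphlib.TopologicalSorter` (Python 3.9+) and runs tasks sequentially, whereas a real engine such as MADag would dispatch ready tasks concurrently to remote SeDs; `run_workflow` and the example services are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def run_workflow(tasks: Dict[str, str], deps: Dict[str, List[str]],
                 services: Dict[str, Callable[..., object]]) -> Dict[str, object]:
    """Hypothetical MADag-style executor. tasks[name] is the service bound to a
    task; deps[name] lists its predecessors. Each task runs after all its
    predecessors, receiving their outputs as positional arguments in the order
    given in deps."""
    results: Dict[str, object] = {}
    graph = {t: deps.get(t, []) for t in tasks}  # node -> predecessors
    for task in TopologicalSorter(graph).static_order():
        inputs = [results[p] for p in deps.get(task, [])]
        results[task] = services[tasks[task]](*inputs)
    return results
```

A client would submit the workflow as the `tasks`/`deps` description only; the mapping from service names to implementations stays on the server side.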

SysFera-DS overview
[Figure: end users (tasks, workflows, data, info) connected through CLI, C/C++, Python and web service (WS) interfaces to clusters or supercomputers, workstations, data center servers, and private/public Clouds]

SysFera-DS: Functionalities
[Figure: functional blocks. Workflow engine; task management with a hierarchy of schedulers (FIFO, Green, Cloud, Appli1, Appli2, ...); monitoring; LRMS (SLURM, SGE, LoadLeveler, OAR, ...) for computing resources and a Cloud interface for Cloud resources; data management (file management, persistence, implicit/explicit replication) over external, local and Cloud storage; user management and security; user and admin interfaces (C, C++, Python, web interface, web services, CLI)]
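The scheduler hierarchy lists pluggable policies such as FIFO and Green. The Python sketch below shows how such policies can be swapped behind one selection function. It assumes "Green" means picking the server with the lowest estimated energy (power draw times execution time); that interpretation, the `Server` fields and the function names are hypothetical and may not match SysFera-DS's actual schedulers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Server:
    name: str
    est_time: float  # estimated execution time of the request, in seconds (hypothetical)
    power: float     # estimated power draw while running, in watts (hypothetical)


def fifo(servers: List[Server]) -> Server:
    """Pick the first server in registration order."""
    return servers[0]


def green(servers: List[Server]) -> Server:
    """Pick the server with the lowest estimated energy (power * time)."""
    return min(servers, key=lambda s: s.power * s.est_time)


POLICIES: Dict[str, Callable[[List[Server]], Server]] = {"FIFO": fifo, "Green": green}


def choose(policy: str, servers: List[Server]) -> str:
    if not servers:
        raise ValueError("no server can run this request")
    return POLICIES[policy](servers).name
```

Keeping each policy as a plain function makes it straightforward to register application-specific schedulers (the "Appli1", "Appli2" entries) alongside the generic ones.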

Seed4C: Secure embedded element and data protection
•  Seed4C goal: guarantee end-to-end security of services
•  Can we get a seed to build trusted Clouds?
  • Up to 80% of problems can be solved with protected execution and proper policy enforcement
  • A TCB (Trusted Control Plane) within the network: the seed
•  Smart deployment of SEEDs
  •  SEED load balancing
  • Pre-provisioning of security credentials
  • Dynamic association with applications/services
•  SEED form factors and management
  • Hardware / Software / dedicated VMs / OS component?
http://projects.celtic-initiative.org/seed4c/
© Alcatel Lucent / INRIA / MPY

One Seed4C Use Case: HPC
•  Added value of a Network of Secure Elements (NoSE)
  • Generation and protection of secrets (keys) in network protocols
    •  OSPF, SMTP, S-BGP (Secure BGP)
  • Execution of sensitive code
    •  Policy verification
    •  Bootstrap
    •  Isolation
  • Assurance
    •  Validation of host characteristics
    •  Certification of host characteristics
      •  MAC address
      •  Location
•  VM bootstrap on the server side
•  Design of new elements to interface NoSE and Cloud software
  • SPS: Secure Provisioning and Scheduling
http://projects.celtic-initiative.org/seed4c/
[Figure: DIET and two Cloud management platforms (OpenNebula, Nimbus) over their resources, each paired with an SPS. CMP: Cloud Management Platform; SPS: Secure Provisioning and Scheduling]
© Alcatel Lucent / INRIA / MPY
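Certification and validation of host characteristics can be illustrated with a keyed message authentication code: the secure element, which holds the secret key, binds the host's MAC address and location to a tag, and a verifier holding the same key checks the tag before applying its placement policy. The Python sketch below is a software stand-in using the standard `hmac` module with SHA-256; Seed4C targets hardware secure elements, and every function name and the location-based policy here are hypothetical.

```python
import hashlib
import hmac
import json
import secrets


def new_key() -> bytes:
    """Secret that, in Seed4C, would stay inside the secure element."""
    return secrets.token_bytes(32)


def _tag(key: bytes, claims: dict) -> str:
    payload = json.dumps(claims, sort_keys=True).encode()  # canonical encoding
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def certify_host(key: bytes, mac_address: str, location: str) -> dict:
    """Bind the host characteristics (MAC address, location) to an HMAC tag."""
    claims = {"mac": mac_address.lower(), "location": location}
    return {"claims": claims, "tag": _tag(key, claims)}


def validate_host(key: bytes, cert: dict, expected_location: str) -> bool:
    """Check the tag first (integrity), then the policy (here: expected location)."""
    if not hmac.compare_digest(_tag(key, cert["claims"]), cert["tag"]):
        return False
    return cert["claims"]["location"] == expected_location
```

Using `hmac.compare_digest` rather than `==` avoids leaking timing information when comparing tags; a host that rewrites its claimed location invalidates the tag.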

Techlimed – Linguistic as a Service (LaaS)
•  Arabic language processing
•  Web indexing
•  Text mining
•  Sentiment analysis
•  Search engine
•  Automatic analysis
  •  Lots of computations
  •  3 processes: analyzer, segmenter and partofspeech
[Figure: Arabic examples of agglutination, i.e. elements added to a word ("a picture", "the picture", "and the picture", "his picture", "is it a picture"), and of vowel addition/deletion between how a word is written and how it is read]

Techlimed – OVH/Amazon EC2 solution
•  Workflow distributed between OVH servers and Amazon EC2
[Figure: OmniNames, MA and MADAG, with SeDs running Arabic language analysis; fixed OVH resources host a Debian repository, management scripts and a monitoring system; additional SeDs run on on-demand resources]
Startup
1. Init from a standard AMI
2. Install software from the Debian repository on OVH
3. Configure software & environment
4. Connect the SeD to the MA
5. Connect the Ganglia daemon
Termination
1. Disconnect the SeD
2. Wait for job termination or timeout
3. Terminate the instance
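The termination sequence above (disconnect, drain or time out, terminate) can be sketched as a small polling loop. In the Python below, the three actions are passed in as callables because the real management scripts are not shown; the function name and parameters are hypothetical, and the clock and sleep functions are injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable


def terminate_instance(disconnect_sed: Callable[[], None],
                       running_jobs: Callable[[], int],
                       terminate: Callable[[], None],
                       timeout: float, poll: float = 5.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Termination of an on-demand instance (hypothetical helper):
    1) disconnect the SeD so the MA stops sending it requests,
    2) wait until running jobs finish or the timeout expires,
    3) terminate the VM.
    Returns True if all jobs finished before the timeout."""
    disconnect_sed()
    deadline = clock() + timeout
    drained = running_jobs() == 0
    while not drained and clock() < deadline:
        sleep(poll)
        drained = running_jobs() == 0
    terminate()  # the instance is released either way
    return drained
```

Disconnecting the SeD first is what makes the drain finite: no new requests reach the instance while the remaining jobs complete.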

SOP: Think global Services for personal computer   SOP project
(ANR-11-INFR-001) -  Network connection + machine + software package -  No maintenance for users (latest functionalities, no virus) -  Always best performance -  Transparent reconfiguration -  Not limited by users' hardware -  Shared resources -  Cloud resources -  Data centers -  Try to save energy while providing good QoS Provide a distributed infrastructure to provide end-users with a seamless access to computational resources - 18 F. Desprez - ISC Cloud F. Desprez - ISC Cloud 25/09/2012

Grid’5000
•  One original vision: being able to perform experiments at every level of a grid or cloud software stack, with the possibility
  •  to reproduce the experimental conditions
  •  to isolate experiments from each other
  •  to get good flexibility
  •  to understand what is going on inside the platform
  •  to inject experimental conditions (faults, external load)
•  An instrument for Computer Science
•  10 sites in France connected through Renater with more than 70000 cores
•  Sites in Luxembourg and Brazil
•  An example for other testbeds (FutureGrid in the USA)
•  One of the first IaaS Clouds
[Figure: software stack: Grid application, Grid middleware, OS, (…), Grid BIOS]

Conclusion and Future Work
•  Going from a research prototype to production software
  •  Development between the Avalon research team and SysFera
•  Flexible approach for Clusters, Grids, and Clouds
•  Research issues currently addressed
  • Semi-static scheduling of dynamic workflows over Clouds
  • Elastic scheduling of resources
  • Multi-criteria scheduling (cost and performance)
  • Joint scheduling of requests and data management services
  • SLA
  • MapReduce-like application scheduling
  • Schedulers for different classes of applications
  • Automatic deployment of new applications
  • Elastic management of the middleware itself
•  Still validated over Grid’5000 with various applications

•  F. Desprez, E. Caron, Avalon Team, LIP ENS Lyon (Frederic.Desprez,Eddy.Caron)@inria.fr
•  F. Veillet, B. Depardon, SysFera
•  http://graal.ens-lyon.fr/DIET
