
DIET - A Scalable Platform for Clusters, Grids and Clouds

Frédéric Desprez (Inria Research Director) was at ISC Cloud'12 to talk about DIET, the grid and Cloud middleware.
He described how DIET started and the features it now boasts, including workflow management and multi-Cloud interfacing, developed jointly by Inria and SysFera.

SysFera

September 25, 2012

Transcript

  1. Eddy Caron, Frédéric Desprez (Inria, LIP, ENS Lyon, Avalon Research Team)
     DIET: A Scalable Platform for Clusters, Grids and Clouds
     Benjamin Depardon (SysFera)
     Joint work with A. Muresan, J. Rouzaud-Cornabas (LIP, ENS Lyon), H. Guemard, O. Mornard (SysFera)
  2. Introduction
     • Transparency and simplicity represent the holy grail for Grids and Clouds (maybe even before performance)!
       - Utility computing
       - (Almost) everything as a service
     • Scheduling tunability to take into account the characteristics of specific application classes
     • Large scope of applications ready (and not only number-crunching ones!)
       - Heterogeneity of jobs executed and data managed
     • Many incarnations of distributed platforms (peer-to-peer systems, clusters, clusters of clusters, grids, Clouds, …)
       - Many research projects around the world
       - Significant technology base
       - Many platforms available
     • Do not forget good ol' research on scheduling and distributed systems!
       - Most scheduling problems are very difficult to solve even in their simplest form…
       - …but simple solutions often lead to better performance results in real life
  3. Avalon and SysFera
     Avalon
     • Research team from Inria (with CNRS, ENS Lyon, University of Lyon).
     • Led by Christian Perez, located at LIP/ENS Lyon.
     • Goals: contribute to the design of programming models supporting various kinds of architectures, implement them by mastering the algorithmic issues involved, and study their impact on application-level algorithms.
     • Research areas: distributed algorithms, programming models, service deployment, service discovery, service composition and orchestration, large-scale data management, etc.
     • Validation using applications from different fields of science (bioinformatics, physics, cosmology, etc.) over Grid'5000.
     SysFera
     • Company co-founded by D. Loureiro, E. Caron, F. Desprez, and IT Translation.
     • Spin-off of Inria, led by David Loureiro.
     • Goals:
       - Industrialize the DIET software and expand upon it to provide advanced features.
       - Use the accumulated experience and expertise to help software vendors move to a SaaS/Cloud business model.
     • Areas of expertise: distributed computing, clusters, grids, HPC, Clouds, SaaS, PaaS, Big Data.
     • Validation in production environments for large companies and SMBs in various domains: energy, linguistics, video games, bioinformatics, cosmology, meteorology, and so on.
  4. DIET's Goals
     Distributed Interactive Engineering Toolbox
     Our goals
     • Develop a toolbox for the deployment of environments using the Software as a Service (SaaS) paradigm with different applications
       - over clusters, clusters of clusters, grids and Clouds
       - using public-domain and standard software as much as possible
     • Obtain a high-performance and scalable environment
     • Implement and validate our more theoretical results
     • Several applications in different fields: simulation, bioinformatics, robotics, …
     • Release 2.8.1 available on the web since July 2012
     • Supporting projects: ACI Grid ASP, RNTL GASP, ANR LEGO CIGC-05-11, ANR Gwendia, Celtic-plus project SEED4C
     http://graal.ens-lyon.fr/DIET/
     [Architecture figure: DIET client (client layer); DIET agents (scheduling layer: Master Agents, Local Agents); DIET server (service layer: Server Daemon, SeD)]
  5. DIET Architecture Overview
     • Context: development of a toolbox for deploying applications over various platforms, with a hierarchical architecture for improved scalability (see the sketch below)
       - Distributed scheduler (MA, LA)
       - Servers (SeD)
     • Validation: large-scale validation over Grid'5000
     • DIET production use case
       - The Decrypthon project
       - DIET was selected by IBM
     • Transferred to a start-up company, SysFera (created in March 2010)
     • Main research issues: scheduling, heterogeneity, automatic deployment, interoperability, high-performance data transfer and management, monitoring, fault tolerance, genericity of solutions for various applications, static and dynamic analysis of performance, …
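
A rough sketch of the hierarchical scheduling idea mentioned above (illustrative only; the class and service names are hypothetical, this is not the DIET API): agents forward a client request down the hierarchy and merge the server estimates on the way back up.

# Illustrative sketch of hierarchical request scheduling (hypothetical names).
class SeD:
    """Leaf server daemon: returns an estimate for a given service."""
    def __init__(self, name, services):
        self.name = name
        self.services = services  # service name -> estimated completion time (s)

    def estimate(self, service):
        if service in self.services:
            return [(self.services[service], self.name)]
        return []

class Agent:
    """Master/Local Agent: forwards the request to its children and merges results."""
    def __init__(self, name, children):
        self.name = name
        self.children = children  # SeDs or other Agents

    def estimate(self, service):
        merged = []
        for child in self.children:
            merged.extend(child.estimate(service))
        return sorted(merged)  # best estimate first

# Example hierarchy: one MA, two LAs, three SeDs
sed1 = SeD("sed1", {"blast": 12.0})
sed2 = SeD("sed2", {"blast": 8.5, "render": 30.0})
sed3 = SeD("sed3", {"render": 25.0})
ma = Agent("MA", [Agent("LA1", [sed1, sed2]), Agent("LA2", [sed3])])
print(ma.estimate("blast"))  # [(8.5, 'sed2'), (12.0, 'sed1')]
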
  6. Data Management
     DAGDA (Data Arrangement for Grid and Distributed Applications)
     • Joins task scheduling and data management
     • Standardized through the OGF GridRPC working group
     • Explicit data replication, using the API
     • Implicit data replication
     • Data replacement algorithms: LRU, LFU and FIFO (see the sketch below)
     • Transfer optimization by selecting the most convenient source
     • Management of storage resource usage
     • Data status backup/restoration
     Joint work with Gaël LeMahec (UPJV/MIS)
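
For intuition, here is a minimal sketch of an LRU-style replacement policy of the kind listed above, applied when a node's storage fills up (illustrative only; the class, capacity and data names are hypothetical, not DAGDA code).

# Minimal LRU replacement sketch (illustrative; not DAGDA's implementation).
from collections import OrderedDict

class LRUStore:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()  # data id -> size, ordered by last access

    def access(self, data_id):
        # Touching a data item makes it the most recently used one.
        if data_id in self.items:
            self.items.move_to_end(data_id)

    def add(self, data_id, size):
        # Evict least recently used items until the new data fits.
        while self.used + size > self.capacity and self.items:
            evicted_id, evicted_size = self.items.popitem(last=False)
            self.used -= evicted_size
            print("evicting", evicted_id)
        self.items[data_id] = size
        self.used += size

store = LRUStore(capacity_bytes=100)
store.add("matrix_A", 60)
store.add("matrix_B", 30)
store.access("matrix_A")   # matrix_A becomes most recently used
store.add("matrix_C", 40)  # evicts matrix_B, the least recently used item
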
  7. Parallel and Batch Submissions
     • Parallel & sequential jobs
       - transparent for the user
       - system-dependent submission (see the sketch below)
     • SeDBatch
       - many batch systems supported
       - batch schedulers' behaviour
       - internal scheduling process
     • Monitoring & performance prediction
     • Simulation
     [Figure: hierarchy MA, LA, SeD, SeDBatch, SeD// sharing an NFS; supported batch systems include OAR, SGE, LSF, PBS, LoadLeveler and SLURM]
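
As a rough illustration of what system-dependent submission amounts to, the sketch below translates a generic request into a scheduler-specific script (SLURM here) and submits it with sbatch (illustrative only; the script template, options and paths are hypothetical, not DIET code).

# Sketch of wrapping a batch scheduler behind a generic submission call.
import subprocess
import tempfile
import textwrap

def submit_slurm(command, nodes=1, walltime="01:00:00"):
    """Generate a SLURM script for `command` and submit it with sbatch."""
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #SBATCH --nodes={nodes}
        #SBATCH --time={walltime}
        #SBATCH --job-name=diet-service
        srun {command}
        """)
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
        script_path = f.name
    # sbatch prints "Submitted batch job <id>" on success.
    result = subprocess.run(["sbatch", script_path],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip().split()[-1]  # the job id

# Example (requires a SLURM installation):
# job_id = submit_slurm("./simulation --input data.in", nodes=4, walltime="02:00:00")
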
  8. Workflow Representation
     • Directed Acyclic Graph (DAG) (see the sketch below)
       - Each vertex is a task
       - Each directed edge represents a communication between tasks
     Goals
     • Build and execute workflows
     • Use different heuristics to solve scheduling problems
     • Extensibility to address multi-workflow submission and large grid platforms
     • Manage the heterogeneity and variability of the environment
     ANR Gwendia
     • Language definition (MOTEUR & MADAG)
     • Comparison on Grid'5000 vs EGI:

                           Idle time    Data transfer    Execution time
       EGI (gLite)          32.857 s      132.143 s         274.643 s
       Grid'5000 (DIET)      0.214 s        3.371 s         540.614 s
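
To make the DAG representation concrete, here is a tiny sketch of a workflow stored as a dependency map and ordered with Kahn's topological sort (illustrative only; the task names and dictionary layout are made up, this is not MADAG code).

# Sketch of a DAG workflow and a dependency-respecting execution order.
from collections import deque

# Each task lists the tasks it depends on (edges = communications between tasks).
deps = {
    "fetch":   [],
    "analyze": ["fetch"],
    "segment": ["fetch"],
    "merge":   ["analyze", "segment"],
}

def topological_order(deps):
    """Kahn's algorithm: return the tasks in an order that respects all edges."""
    indegree = {task: len(parents) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)
    ready = deque(task for task, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order

print(topological_order(deps))  # ['fetch', 'analyze', 'segment', 'merge']
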
  9. DIET Cloud: First Prototype
     Inside the Cloud
     • The DIET platform is virtualized inside the cloud (as a Xen image, for example)
     • Very flexible and scalable, as DIET nodes can be launched on demand
     • Scheduling is more complex
     DIET as a Cloud manager
     • Eucalyptus interface
     • Eucalyptus is treated as a new batch system
     • Provides a new implementation of the BatchSystem abstract class (see the sketch below)
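
The "treat an IaaS as a batch system" idea can be sketched as follows (the class and method names are hypothetical, not DIET's actual BatchSystem interface): the scheduling layer only sees a generic back-end, and the Cloud back-end turns a job submission into a VM provisioning request.

# Sketch of plugging a Cloud provider behind a generic batch-system interface.
from abc import ABC, abstractmethod
import itertools

class BatchSystem(ABC):
    """Generic interface the scheduling layer talks to."""
    @abstractmethod
    def submit(self, command): ...
    @abstractmethod
    def status(self, job_id): ...

class ClusterBatch(BatchSystem):
    """Classical back-end: hand the command to a local resource manager (stubbed here)."""
    def __init__(self):
        self.jobs = {}
        self._ids = itertools.count(1)
    def submit(self, command):
        job_id = next(self._ids)
        self.jobs[job_id] = ("QUEUED", command)
        return job_id
    def status(self, job_id):
        return self.jobs[job_id][0]

class CloudBatch(BatchSystem):
    """Cloud back-end: 'submitting' means provisioning a VM that runs the command."""
    def __init__(self, provision_vm):
        self.provision_vm = provision_vm   # callable: command -> instance id
        self.instances = {}
    def submit(self, command):
        instance_id = self.provision_vm(command)
        self.instances[instance_id] = "BOOTING"
        return instance_id
    def status(self, job_id):
        return self.instances[job_id]

# The scheduler manipulates both back-ends through the same interface:
backend = CloudBatch(provision_vm=lambda cmd: "i-0001")
print(backend.submit("./solver input.dat"), backend.status("i-0001"))
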
  10. DIET Cloud: Next-Generation Architecture
     • Many prerequisites already available in DIET
       - service calls
       - scalable scheduling
       - data management
     • Multi-cloud infrastructure manager
       - the SeD Cloud deals with a large number of Cloud IaaS providers through APIs such as Deltacloud, OCCI, OpenStack, etc.
     • DIET as a virtual machine manager built on the IaaS capabilities
       - a DIET SeD Cloud bootstraps a Cloud instance (VM launching)
     • Application deployment
       - external tools: Puppet, Chef, etc.
     • Elastic architecture (see the sketch below)
       - allows each DIET SeD Cloud to expand or reduce its number of compute resources
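
A minimal sketch of such an elasticity loop (illustrative only; the jobs-per-VM ratio, bounds and provisioning callbacks are hypothetical): the pool of VMs is grown or shrunk to match the pending workload.

# Sketch of an elastic resource pool driven by the pending workload.
def elastic_step(pending_jobs, running_vms, start_vm, stop_vm,
                 jobs_per_vm=4, min_vms=1, max_vms=16):
    """Grow or shrink the VM pool so it matches the pending workload."""
    wanted = max(min_vms, min(max_vms, -(-pending_jobs // jobs_per_vm)))  # ceiling division
    while len(running_vms) < wanted:
        running_vms.append(start_vm())          # scale out
    while len(running_vms) > wanted:
        stop_vm(running_vms.pop())              # scale in
    return running_vms

# Toy usage with stubbed IaaS calls:
ids = iter(range(1000))
vms = elastic_step(10, [], start_vm=lambda: "vm-%d" % next(ids),
                   stop_vm=lambda vm: None)
print(vms)  # 3 VMs for 10 pending jobs at 4 jobs per VM
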
  11. DIET Cloud: Workflows on the Cloud Using Nimbus
     • Nimbus: low-level IaaS provider
       - open-source IaaS provider
       - provides low-level resources (VMs)
       - compatible with Amazon EC2
       - uses a FutureGrid installation
     • Phantom: high-level resource provider
       - auto-scaling and high-availability provider
       - subset of the Amazon auto-scaling service
       - part of the Nimbus platform
     • DIET MADag: the workflow engine
       - one service implementation per task
       - each service launches its associated task
       - supports DAG, PTG and functional workflows
     • Client submitting a workflow
     [Figure: DIET workflow engine and SeD Cloud services provisioning VMs through Phantom on Nimbus resources]
  12. SysFera-DS Overview
     [Overview figure: end-users submit tasks, workflows, data and information requests through CLI, C/C++, Python and Web service interfaces, onto clusters or supercomputers, workstations, data centers, servers, and private/public Clouds]
  13. SysFera-DS: Functionalities
     [Functional overview figure]
     • Workflow engine, task management, hierarchy of schedulers (FIFO, Green, Cloud, per-application: Appli1, Appli2, …), monitoring
     • LRMS (local resource management system) support: SLURM, SGE, LoadLeveler, OAR, …
     • Computing resources and Cloud resources (Cloud interface)
     • Data management: file management, persistence, implicit/explicit replication; storage: external, local and Cloud storage
     • User and admin interfaces: C, C++, Python, Web interface, Web services, CLI; user management; security
  14. Seed4C: Secure Embedded Element and Data Protection
     • Seed4C goal: guarantee end-to-end security of services
     • Can we get a seed to build trusted Clouds?
       - Up to 80% of problems can be solved with protected execution and proper policy enforcement
       - A TCB (Trusted Control Plane) within the network: the seed
     • Smart deployment of SEEDs
       - SEED load balancing
       - Pre-provisioning of security credentials
       - Dynamic association with applications/services
     • SEED form factors and management
       - Hardware / software / dedicated VMs / OS component?
     http://projects.celtic-initiative.org/seed4c/ © Alcatel Lucent / INRIA / MPY
  15. One Seed4C Use Case: HPC
     • Added value of a Network of Secure Elements (NoSE)
       - Generation and protection of secrets (keys) in network protocols: OSPF, SMTP, S-BGP (Secure BGP)
       - Execution of sensitive code: policy verification, bootstrap, isolation
       - Assurance: validation and certification of host characteristics (MAC address, location), VM bootstrap on the server side
     • Design of new elements to interface NoSE and Cloud software
       - SPS: Secure Provisioning and Scheduling
     [Figure: OpenNebula and Nimbus resources, each behind a CMP (Cloud Management Platform) with an SPS, interfaced with DIET through its own SPS]
     http://projects.celtic-initiative.org/seed4c/ © Alcatel Lucent / INRIA / MPY
  16. Techlimed – Linguistic as a Service (LaaS)
     • Arabic language processing: Web indexing, text mining, sentiment analysis, search engine
     • Automatic analysis
       - lots of computations
       - 3 processes: analyzer, segmenter and part-of-speech tagger
     [Figure: Arabic agglutination example, elements added to a single written word ("a picture", "the picture", "and the picture", "is it a picture", "his picture", "is it his picture"); how it is written vs. how it is read (vowel addition/deletion)]
  17. Techlimed – OVH/Amazon EC2 Solution
     • Workflow distributed between OVH servers and Amazon EC2 (see the sketch below)
     [Figure: fixed OVH resources (OmniNames, MA, MADAG, SeD, Debian repository, management scripts, monitoring system) plus on-demand Amazon EC2 resources running the Arabic language analysis SeDs]
     • Startup
       1. Init from a standard AMI
       2. Install software from the Debian repository hosted on OVH
       3. Configure software & environment
       4. Connect the SeD to the MA
       5. Connect the Ganglia daemon
     • Termination
       1. Disconnect the SeD
       2. Wait for job termination or timeout
       3. Terminate the instance
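
The on-demand side of such a hybrid setup can be sketched with an EC2 API client: start an instance whose user-data script installs the analysis software and connects it to the master (illustrative only; the AMI id, region, package names, hostnames and the diet-sed service are hypothetical, not Techlimed's actual scripts).

# Sketch of starting an on-demand EC2 worker with a startup script (boto 2 API).
import boto.ec2

USER_DATA = """#!/bin/bash
# 2. install the analysis software from the project's Debian repository
echo "deb http://repo.example.org/debian stable main" > /etc/apt/sources.list.d/laas.list
apt-get update && apt-get install -y arabic-analysis diet-sed ganglia-monitor
# 3-5. point the SeD at the Master Agent, then start the SeD and Ganglia
sed -i "s/@MA_HOST@/ma.example.org/" /etc/diet/SeD.cfg
service diet-sed start
service ganglia-monitor start
"""

conn = boto.ec2.connect_to_region("us-east-1")
reservation = conn.run_instances(
    "ami-12345678",              # 1. init from a standard AMI (hypothetical id)
    instance_type="m1.large",
    user_data=USER_DATA,
)
print("started", reservation.instances[0].id)
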
  18. SOP: Think Global Services for the Personal Computer
     SOP project (ANR-11-INFR-001)
     • Network connection + machine + software package
     • No maintenance for users (latest functionalities, no viruses)
     • Always the best performance
     • Transparent reconfiguration
     • Not limited by the users' hardware
     • Shared resources
       - Cloud resources
       - data centers
     • Try to save energy while providing good QoS
     Goal: provide a distributed infrastructure that gives end-users seamless access to computational resources
  19. Grid'5000
     One original vision: being able to perform experiments at every level of a grid or cloud software stack, with the possibility
     • of reproducing the experimental conditions
     • of isolating experiments from each other
     • of getting good flexibility
     • of understanding what is going on inside the platform
     • of injecting experimental conditions (faults, external load)
     An instrument for Computer Science
     • 10 sites in France connected through RENATER, with more than 7,000 cores
     • Sites in Luxembourg and Brazil
     • A similar example: FutureGrid in the USA
     • One of the first IaaS Clouds
     [Figure: experimentation stack, from the grid application and grid middleware down to the OS (…) and BIOS]
  20. Conclusion and Future Work
     • Going from a research prototype to production software
     • Development shared between the Avalon research team and SysFera
     • Flexible approach for Clusters, Grids, and Clouds
     • Research issues currently addressed
       - Semi-static scheduling of dynamic workflows over Clouds
       - Elastic scheduling of resources
       - Multi-criteria scheduling (cost and performance)
       - Joint scheduling of requests and data management services
       - SLAs
       - Scheduling of MapReduce-like applications
       - Schedulers for different classes of applications
       - Automatic deployment of new applications
       - Elastic management of the middleware itself
     • Still validated over Grid'5000 with various applications
  21. • F. Desprez, E. Caron (Avalon Team, LIP ENS Lyon): (Frederic.Desprez, Eddy.Caron)@inria.fr
     • F. Veillet, B. Depardon (SysFera)
     • http://graal.ens-lyon.fr/DIET