A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing Applications


Cloud computing has been seen as an option to execute high performance computing (HPC) applications. While traditional HPC platforms such as grids and supercomputers offer a stable environment in terms of failures, performance, and number of resources, cloud computing offers on-demand resources, generally with unpredictable performance, at low financial cost. Furthermore, in a cloud environment, failures are part of normal operation. To overcome the limits of a single cloud, clouds can be combined into a cloud federation, often at minimal additional cost for the users. A cloud federation can help both cloud providers and cloud users achieve goals such as reducing execution time, minimizing cost, increasing availability, and reducing power consumption. Hence, cloud federation can be an elegant solution to avoid over-provisioning, reducing operational costs in an average-load situation and switching off resources that would otherwise remain idle and wasting power. However, cloud federation widens the range of resources available to the users. As a result, cloud or system administration skills may be demanded from the users, as well as considerable time to learn about the available options. In this context, some questions arise: (a) which cloud resource is appropriate for a given application? (b) how can users execute their HPC applications with acceptable performance and financial cost, without needing to re-engineer the applications to fit the clouds’ constraints? (c) how can non-cloud specialists make the most of the clouds’ features without being tied to a cloud provider? and (d) how can cloud providers use the federation to reduce the power consumption of the clouds, while still giving service-level agreement (SLA) guarantees to the users? Motivated by these questions, this thesis presents an SLA-aware application consolidation solution for cloud federation.
Using a multi-agent system (MAS) to negotiate virtual machine (VM) migrations between the clouds, simulation results show that our approach could reduce power consumption by up to 46% while trying to meet performance requirements. Using the federation, we developed and evaluated an approach to execute a large bioinformatics application at zero cost. Moreover, we could decrease the execution time by 22.55% over the best single-cloud execution. In addition, this thesis presents a cloud architecture called Excalibur to auto-scale cloud-unaware applications. Executing a genomics workflow, Excalibur could seamlessly scale the applications up to 11 virtual machines, reducing the execution time by 63% and the cost by 84% when compared to a user’s configuration. Finally, this thesis presents a product line engineering (PLE) method to handle the variability of infrastructure-as-a-service (IaaS) clouds, and an autonomic multi-cloud architecture that uses this method to configure resources and to deal with failures autonomously. The PLE method uses an extended feature model (EFM) with attributes to describe the resources and to select them based on the users’ objectives. Experiments performed with two different cloud providers show that, using the proposed model, users could execute their applications in a cloud federation environment without needing to know the variability and constraints of the clouds.

Alessandro Leite (PhD)

December 02, 2014

Transcript

  1. A User-Centered and Autonomic Multi-Cloud Architecture for High Performance Computing

    Applications Alessandro Ferreira Leite Supervised by Christine EISENBEIS (INRIA, LRI, UPSud XI) Alba MELO (Université de Brasília) Claude TADONKI (Mines ParisTech) December 2, 2014
  3. Cloud federation Entry point Cloud federation Entry point Cloud federation

    Entry point Entry point Multi-cloud Multi-cloud “In today’s world cloud computing is the new time sharing, more or less” (Vinton G. Cerf, in ACM and the Professional Programmer. Queue 12, 7, July 2014)
  4. Why is cloud computing different from previous computing models

    (e.g., grid computing)? Capacity Capacity demand Alessandro Ferreira Leite December 2, 2014 4 / 74
  6. Why is cloud computing different from the previous paradigms (e.g.,

    grid computing)? different services: clouds offer a wide range of services (infrastructure, software, storage, networking) broad network access: cloud services are available over the Internet and can be accessed through standard network protocols resource pooling: cloud users only need to be aware of some information elasticity: resources can be dynamically provisioned or released on demand and in any quantity; the cost of 1000 nodes for 1 hour is the same as 1 node for 1000 hours
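The elasticity claim above (1000 nodes for 1 hour costs the same as 1 node for 1000 hours) follows from pay-per-use pricing; a one-line sketch, where the hourly rate is a made-up figure:

```python
# Pay-per-use pricing: total cost = nodes x hours x hourly rate per node.
def cost(nodes: int, hours: int, rate_per_node_hour: float) -> float:
    return nodes * hours * rate_per_node_hour

RATE = 0.05  # hypothetical price per node-hour (USD)

# 1000 nodes for 1 hour costs the same as 1 node for 1000 hours,
# but an embarrassingly parallel job finishes ~1000x sooner on 1000 nodes.
assert cost(1000, 1, RATE) == cost(1, 1000, RATE)
```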
  10. But, there are some drawbacks 1 performance stability 2 data

    lock-in and standardization 3 security
  13. Clouds are organized in layers PaaS IaaS SaaS sysadmins developers

    end users Google Apps NetSuite Salesforce Azure AppEngine Heroku AWS EC2 Rackspace GCE
  14. The main difference between the cloud service models lies in

    the kind of control that the users have over the clouds’ infrastructure Applications Data Runtime Middleware O.S Virtualization Servers Storage Networking Traditional scenario The users manage
  15. The main difference between the cloud service models lies in

    the kind of control that the users have over the clouds’ infrastructure Infrastructure as a Service Applications Data Runtime Middleware O.S Virtualization Servers Storage Networking The providers manage The users manage
  16. The main difference between the cloud service models lies in

    the kind of control that the users have over the clouds’ infrastructure Platform as a Service Applications Data Runtime Middleware O.S Virtualization Servers Storage Networking The providers manage The users manage
  17. The main difference between the cloud service models lies in

    the kind of control that the users have over the clouds’ infrastructure Software as a Service Applications Data Runtime Middleware O.S Virtualization Servers Storage Networking The providers manage
  18. Trade-off: control vs level of abstraction Applications Data Runtime Middleware

    O.S Virtualization Servers Storage Networking Packaged Software Infrastructure (as a Service) Platform (as a Service) Software (as a Service) Applications Data Runtime Middleware O.S Virtualization Servers Storage Networking Applications Data Runtime Middleware O.S Virtualization Servers Storage Networking Applications Data Runtime Middleware O.S Virtualization Servers Storage Networking The users manage The providers manage The users manage The providers manage The providers manage The users manage
  19. Outline 1 Introduction and Motivation 2 Objectives 3 Cloud provider:

    using federated clouds to meet power consumption constraints 4 Software developer: using federated clouds to execute native cloud applications 5 Unskilled cloud users: using the cloud to execute cloud-unaware applications 6 Multiple users’ profiles: using federated clouds with an adequate level of abstraction 7 Conclusion and Perspectives
  20. Following a goal-oriented strategy, this thesis aims to investigate the

    usage of federated clouds considering different viewpoints 1 cloud provider: to reduce power consumption of data centers without significant performance loss 2 experienced software developers: to execute real cloud applications at reduced cost, without being locked to a cloud provider 3 ordinary users: to execute cloud-unaware applications trying to reduce the financial cost and the execution time, requiring minimal user intervention 4 multiple users’ profiles: to allow the usage of federated clouds with a level of abstraction that is suitable for experienced and inexperienced users
  24. Outline 1 Introduction and Motivation 2 Objectives 3 Cloud provider:

    using federated clouds to meet power consumption constraints 4 Software developer: using federated clouds to execute native cloud applications 5 Unskilled cloud users: using the cloud to execute cloud-unaware applications 6 Multiple users’ profiles: using federated clouds with an adequate level of abstraction 7 Conclusion and Perspectives
  25. Data centers’ power consumption is greater than the whole electricity

    consumed by some countries Source: “How dirty is your data?”, Greenpeace report. Available at: bit.ly/howdirtyisyourdata The demand for electricity will be greater than the combined total demands of France, Germany, Canada, and Brazil (≈ 1,973 billion kWh).
  26. The high data centers’ electricity demand is mostly due to

    the large number of servers most of the time the servers are idle (avg usage of 6%) 56% of data center facilities at peak performance Source: SMART 2020: Enabling the low carbon economy in the information age. Available at: bit.ly/aip3pV
  28. A way to reduce data centers’ power consumption is using

    virtualization and server consolidation Host 1 OS VM Monitor VM1 Host 2 OS VM Monitor VM2 Host 3 OS VM Monitor VM4 VM3
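The consolidation idea in the figure (packing VMs onto fewer hosts so idle hosts can be switched off) can be sketched with a simple first-fit-decreasing packing; this is an illustrative sketch with hypothetical capacities, not the thesis's algorithm:

```python
def consolidate(vm_loads, host_capacity):
    """First-fit decreasing: pack VM CPU loads onto as few hosts as
    possible; hosts left empty can then be powered off to save energy."""
    hosts = []  # each host is a list of the VM loads placed on it
    for load in sorted(vm_loads, reverse=True):
        for host in hosts:
            if sum(host) + load <= host_capacity:
                host.append(load)  # fits on an already-powered host
                break
        else:
            hosts.append([load])   # no room anywhere: power on a new host
    return hosts

# Four VMs spread over three hosts (as in the figure) fit on two
# hosts of normalized capacity 1.0 after consolidation.
placement = consolidate([0.5, 0.4, 0.3, 0.6], host_capacity=1.0)
```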
  30. First objective of this thesis We aim to reduce power

    consumption of data centers without significant performance loss
  31. Our question Can we ... 1 increase data centers’ energy

    efficiency through server consolidation? 2 use federated clouds to try to meet performance guarantees (i.e., service-level agreement (SLA))?
  32. Our proposal: a multi-agent strategy to negotiate resource allocation Private

    Cloud Electricity manager AC Cloud coordinator SLA Manager Carbon footprint calculator regulates Cloud user Public Cloud Cloud coordinator CERA Electricity provider
  33. Each cloud has one coordinator responsible for monitoring the metrics

    and for negotiating with other agents VM1 VM2 VMn ... Node Energy sensor SLA sensor Coordinator VM1 VM2 VMn ... Noden Workload sensor Energy sensor SLA sensor Workload sensor
  34. Proposed algorithm (slides 34–54 show an animated build of the same

    diagram) Cloud coordinator 1 (Cloud 1) and Cloud coordinator 2 (Cloud 2) place the incoming VMs (VM1–VM7) on their hosts; when a cloud reaches its maximum power consumption, its coordinator asks its peer “Can I increase my power threshold?”; if the answer is no, it negotiates the migration of a VM (e.g., VM4) to the other cloud
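The negotiation shown in the slides might look like the following; this is a deliberately simplified model (the thresholds, per-VM power figure, and accept rule are assumptions for illustration, not the thesis's actual protocol):

```python
class CloudCoordinator:
    """Toy coordinator: hosts VMs under a power threshold and
    negotiates with a peer coordinator when the threshold is reached."""

    def __init__(self, name, power_threshold, watts_per_vm=100):
        self.name = name
        self.threshold = power_threshold  # max power consumption (W)
        self.watts_per_vm = watts_per_vm  # assumed uniform VM power draw
        self.vms = []

    def power(self):
        return len(self.vms) * self.watts_per_vm

    def can_host(self):
        return self.power() + self.watts_per_vm <= self.threshold

    def place(self, vm, peer=None):
        if self.can_host():
            self.vms.append(vm)
            return self.name
        # At maximum power consumption: negotiate a migration with the peer.
        if peer is not None and peer.can_host():
            peer.vms.append(vm)   # peer accepted the VM
            return peer.name
        return None               # request rejected by both clouds

c1 = CloudCoordinator("cloud1", power_threshold=300)
c2 = CloudCoordinator("cloud2", power_threshold=300)
for vm in ["VM1", "VM2", "VM3", "VM4"]:
    c1.place(vm, peer=c2)
# cloud1 fills up at 3 VMs (300 W); VM4 is negotiated over to cloud2
```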
  55. Environment: we evaluated our power management strategy with a modified

    version of the CloudSim simulator 1 the simulation environment included 2 clouds (each with one data center of 100 hosts) 2 the workload model included provisioning for 400 virtual machines, each requesting one CPU core, 256 MB of RAM, and 1 GB of storage 3 a global energy consumption threshold of 3 kWh per data center 4 the service-level agreement (SLA) was defined in terms of response time (10 minutes)
  56. Results: scenario 1 - only one cloud (DC1) is overloaded

    [bar chart: power consumption in kWh for DC1, DC2, and the no-VM-migration baseline, at CPU utilization thresholds of 20%, 40%, 60%, 80%, and 100%]
  57. Results: scenario 2 - two clouds (DC1 & DC2) are overloaded

    [bar chart: power consumption in kWh for DC1, DC2, and the no-VM-migration baseline, at CPU utilization thresholds of 20%, 40%, 60%, 80%, and 100%]
  58. Unlike other works, in our approach workload migration requires

    negotiation between the data centers
    Paper      Target   Federated  Multi-agent  Migration  Negotiation  SLA
    [9]        Cluster  No         No           No         No           No
    [8]        Cluster  No         No           Same DC    No           No
    [6]        Cloud    Yes        No           Among DCs  No           Yes
    [18]       Cluster  No         No           Same DC    No           No
    [3]        Cloud    No         Yes          No         No           No
    [5]        Cluster  No         Yes          No         No           No
    [11]       Cloud    No         No           No         No           Yes
    [1]        Cloud    No         No           Same DC    No           Yes
    [21]       Cloud    No         No           No         No           Yes
    [2]        Cloud    No         No           Same DC    No           Yes
    This work  Cloud    Yes        Yes          Among DCs  Yes          Yes
  59. Statement: workload consolidation helped data centers to reduce power consumption

    federated clouds helped minimize power consumption; however, in a large-scale scenario (i.e., with dozens or even hundreds of clouds) it may be difficult to manage resource allocation across the clouds’ data centers; one immediate issue may be network latency and resource heterogeneity; moreover, other node components should also be considered, as the CPU no longer dominates a node’s power consumption [Minas and Ellison, 2009, Capra et al., 2012]
  63. Outline 1 Introduction and Motivation 2 Objectives 3 Cloud provider:

    using federated clouds to meet power consumption constraints 4 Software developer: using federated clouds to execute native cloud applications 5 Unskilled cloud users: using the cloud to execute cloud-unaware applications 6 Multiple users’ profiles: using federated clouds with an adequate level of abstraction 7 Conclusion and Perspectives
  64. Second objective of this thesis We aim to execute a

    real HPC application at reduced cost, without being locked to a cloud provider
  65. Considering the developers’ viewpoint, there are some risks associated with

    the use of clouds 1 technical: portability and interoperability issues 2 economic: the cost of changing the applications to meet the clouds’ characteristics 3 legal: data protection required by law
  66. Aware of these risks, different cloud providers offer resources at

    zero cost for developers to test the cloud
  67. Our question Can we ... 1 use the resources available

    for free on the clouds to execute HPC applications?
  68. Example of executing a MapReduce application using the free clouds

    Cloud 2 Cloud 3 Cloud 4 Cloud 5 Cloud 6 bag-of-tasks Cloud 1
  72. We designed a hierarchical architecture Communication driver Cloud storage Cloud

    coordinator Job controller MapReduce Module Communication driver Cloud master1 Task module OUT IN Task executor1 Task executorx Communication driver Cloud master2 Task module OUT IN Task executor1 Task executory Tasks Results Results Tasks Query sequences and genomics database Scores GUI 2 3 4 4 5 6 Interaction between the components 1
  73. Proposed Smith-Waterman (SW) algorithm execution Input Queue Output Queue Smith-Waterman

    Algorithm Slave instance (1) dequeue one task to execute (3) enqueue the result (score) (2) execute the task Map Reduce
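The slave's dequeue-execute-enqueue loop above can be sketched as follows; the in-process queues stand in for the cloud queues, and the scoring parameters are arbitrary, not the values used in the thesis:

```python
import queue

def sw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score with a linear gap penalty,
    computed row by row to keep memory linear in len(b)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def slave(tasks: queue.Queue, results: queue.Queue):
    """One slave instance: drain the input queue, score each pair,
    and push the score to the output queue."""
    while True:
        try:
            query, subject = tasks.get_nowait()   # (1) dequeue one task
        except queue.Empty:
            break
        score = sw_score(query, subject)          # (2) execute the task
        results.put((query, subject, score))      # (3) enqueue the score

tasks, results = queue.Queue(), queue.Queue()
tasks.put(("ACGT", "ACGT"))
tasks.put(("ACGT", "TTTT"))
slave(tasks, results)
```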
  74. Environment: we executed our application on five federated clouds

    Cloud                        Configuration
    Elastic Compute Cloud (EC2)  24 applications in micro instances (Intel Xeon 2.0 GHz, 1 core, 613 MB RAM)
    Google App Engine (GAE)      10 applications with 2 instances per application
    Heroku                       1 application deployed in the Free Cedar stack configuration
    OpenShift                    1 application deployed in the Express Configuration
    PiCloud                      1 application deployed on one Intel Xeon 2.66 GHz, 8 GB RAM
  75. We compared up to 24 query sequences with the database

    UniProtKB/Swiss-Prot the database comprised 572,794 sequences the sequence sizes ranged from 144 to 5,478 amino acids more than 12 million genomic sequence comparisons
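A quick sanity check of the comparison count, assuming each of the 24 queries is compared against every database sequence:

```python
queries = 24
database_sequences = 572_794  # UniProtKB/Swiss-Prot release used in the thesis

comparisons = queries * database_sequences
assert comparisons == 13_747_056     # indeed "more than 12 million"
```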
  76. Result: the federated cloud scenario outperformed the execution time of

    the best single cloud. Execution time in seconds per cloud provider: Amazon 4800, GAE 13500, Heroku 14400, PiCloud 12800, OpenShift 9600, Federated 3720
  78. Our result compared with those obtained on Cell/BEs, clusters, GPUs, and clouds

    Platform         Year  Comparison     Output        # Proc. Elements    GCUPs
    Cell/BE          2008  query x dbase  score         6 SPEs              8.00
    cluster Cell/BE  2008  query x dbase  score,align.  84 SPEs             0.42
    cluster          2004  seq x seq      score,align.  60 procs            0.25
    cluster          2009  query x dbase  score         24 cores            4.38
    GPU              2009  query x dbase  score         GTX295              29.7
    GPU              2013  seq x seq      score,align.  GTX560              58.21
    cloud            2009  query x dbase  score         768 cores           —
    cloud            2012  query x dbase  score         20 EC2 Units        —
    this work        2012  query x dbase  score         5 federated clouds  1.35
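GCUPs (billions of cell updates per second) in the table comes from the size of the dynamic-programming matrix divided by the wall-clock time; the numbers below are illustrative, not measurements from the thesis:

```python
def gcups(query_residues: int, database_residues: int, seconds: float) -> float:
    """Cell updates = matrix cells = |query| x |database| residues;
    GCUPs = cell updates / (time x 10^9)."""
    return (query_residues * database_residues) / (seconds * 1e9)

# e.g., a 1,000-residue query against a 2x10^8-residue database in 150 s
rate = gcups(1_000, 200_000_000, 150.0)
```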
  79. Statement: free-quota resources are an interesting option for HPC testbeds

    on the clouds their use demands maturity from the developers the developers have to deal with different configuration tasks and constraints
  81. Outline 1 Introduction and Motivation 2 Objectives 3 Cloud provider:

    using federated clouds to meet power consumption constraints 4 Software developer: using federated clouds to execute native cloud applications 5 Unskilled cloud users: using the cloud to execute cloud-unaware applications 6 Multiple users’ profiles: using federated clouds with an adequate level of abstraction 7 Conclusion and Perspectives
  82. Third objective of this thesis We aim to execute

    cloud-unaware applications considering the point of view of ordinary users
  83. Executing an application on the cloud is a complex task

    1 select an instance type to set up the environment 2 install and configure software and libraries 3 transfer data from their local machine to the cloud 4 create a virtual machine image (VMI) of the environment 5 execute their applications on the cloud 6 transfer data from the cloud to their local machine
  89. Choosing the best platform for an application involves many trade-offs

    for example, Amazon EC2 offers 34 instance types organized into six families
    it is difficult to predict the instances' performance
    learning the cloud concepts can take days or even weeks
    most of the available applications were not designed to execute in a cloud environment
  90. Our question

    Can we ...
    1 use autonomic computing to help unskilled users instantiate the cloud?
  91. What is autonomic computing

    Autonomic computing
    Computing systems that manage themselves, adjusting to environment changes in order to meet the users' objectives [1]. Such systems are essentially self-managing systems that aim to free the users from the details of the systems' operations. These systems achieve this goal by implementing eight characteristics: self-configuration, self-healing, self-optimization, self-protection, self-knowledge, context-awareness, openness, and self-adaptation.

    [1] Paul Horn. Autonomic computing: IBM's perspective on the state of information technology. 2001
  93. Although some applications were designed to run sequentially, they comprise independent tasks that can be executed in parallel

    [figure: example of a workflow with tasks T1–T4 and subtasks s1–s3]

    we can group these subtasks into partitions and assign them to different nodes
  94. A subtask can be identified based on the format of the applications' data

    Some domains, like bioinformatics, use well-known formats for representing their data
    we can take advantage of these formats to read, parse, and filter the data employing different strategies

    >ERR135910.3 2405:1:1101:1234:1973:Y/1
    NAAGGGTTTGAGTAAGAGCATAGCTGTTGGGACCCGAAAGATGGTGAACT
    >ERR135910.5 2405:1:1101:1170:1994:Y/1
    NTCAACGAGGAATTCCTAGTAAGCGNAAGTCATCANCTTGCGTTGAATAC
    >ERR135910.6 2405:1:1101:1272:1972:Y/1
    NTAGTACTATGGTTGGAGACAACATGGGAATCCGGGGTGCTGTAGGCTTG
  95. The data are persisted onto a distributed database in the JSON format

    {
      "sequences": [
        { "name": "ERR135910.3",
          "description": "2405:1:1101:1234:1973:Y/1",
          "value": "NAAGGGTTTGAGTAAGAGCATAGCTGTTGGGACCCGAAAGATGGTGAACT" },
        { "name": "ERR135910.5",
          "description": "2405:1:1101:1170:1994:Y/1",
          "value": "NTCAACGAGGAATTCCTAGTAAGCGNAAGTCATCANCTTGCGTTGAATAC" },
        { "name": "ERR135910.6",
          "description": "2405:1:1101:1272:1972:Y/1",
          "value": "NTAGTACTATGGTTGGAGACAACATGGGAATCCGGGGTGCTGTAGGCTTG" }
      ]
    }

    this file is automatically generated by the system
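The conversion described by the two slides above can be sketched as follows. This is an illustrative assumption, not the system's actual parser: `parse_fasta` and its field handling are hypothetical, but each FASTA entry becomes an independent record mirroring the JSON document shown.

```python
import json

def parse_fasta(text):
    """Split FASTA text into one record per sequence; each record is
    independent, so the records can be processed in parallel."""
    records = []
    for chunk in text.strip().split(">"):
        if not chunk.strip():
            continue  # skip the empty fragment before the first '>'
        header, _, sequence = chunk.partition("\n")
        name, _, description = header.partition(" ")
        records.append({
            "name": name,
            "description": description,
            "value": sequence.replace("\n", ""),
        })
    return records

fasta = (">ERR135910.3 2405:1:1101:1234:1973:Y/1\n"
         "NAAGGGTTTGAGTAAGAGCATAGCTGTTGGGACCCGAAAGATGGTGAACT\n"
         ">ERR135910.5 2405:1:1101:1170:1994:Y/1\n"
         "NTCAACGAGGAATTCCTAGTAAGCGNAAGTCATCANCTTGCGTTGAATAC\n")

document = {"sequences": parse_fasta(fasta)}
print(json.dumps(document, indent=2))
```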
  96. We can employ the MapReduce model to distribute and to execute the applications

    A = load '$input' using FastaStorage as (id:chararray, d:int, seq:bytearray, header:chararray);
    B = foreach A generate flatten(KmerGenerator(seq, 20)) as (kmer:bytearray);
    C = group B by kmer parallel $p;
    D = foreach C generate group, COUNT(B);
    E = group D by $1 parallel $p;
    F = foreach E generate group, COUNT(D);
    store F into '$output';

    example of a description using the Pig model
  97. Our architecture employed the same strategy to execute non-MapReduce applications (i.e., a genomics workflow)

    1 the Infernal application maps the sequences onto a nucleic acid sequence database (e.g., Rfam)
    2 the sequences with no hit or with a low-score hit are processed by the segemehl application
    3 SAMtools sorts the alignments and converts them to the SAM/BAM format
    4 RNAfold calculates the minimum free energy of the RNA molecules

    we provide a domain-specific language (DSL) to allow the users to describe the applications and their dependencies
  102. In our case, the users use our DSL to describe and to execute the workflow

    ...
    application:
      - command: "segemehl.x -i ${idx} -d ${genome} -q ${segemehl_database} > ${segemehl_output}"
        order: 2
    data-def:
      ...
    data:
      - id: "infernal_hits"
        path: "$HOME/infernal/infernal_hits.txt"
      - id: "spombe_reads"
        path: "$HOME/Spombe/reads_spombe2.fa"
      - id: "segemehl_database"
        query: "SELECT sequence FROM spombe_reads WHERE sequence NOT IN (SELECT sequence FROM infernal_hits WHERE score >= 34)"

    example of defining the execution of non-MapReduce applications using our DSL
  103. Our architecture's style is based on microservices

    What is a microservice?
    A lightweight and independent service that performs a single function and collaborates with other services using a well-defined interface to achieve some objective. In that case, the services can scale in/out independently.
  104. Environment: a system prototype was deployed on Amazon EC2

    Instance type  CPU                            RAM     Cost (USD/hour)
    PC             Intel Core 2 Quad CPU 2.40GHz  4 GB    N/A
    hs1.8xlarge    Intel Xeon 2.0GHz, 16 cores    171 GB  4.60
    m1.xlarge      Intel Xeon 2.0GHz, 4 cores     15 GB   0.48
    c1.xlarge      Intel Xeon 2.0GHz, 8 cores     7 GB    0.58
    t1.micro       Intel Xeon 2.0GHz, 1 core      613 MB  0.02
  105. Result: the users may see the cloud as an unattractive option

    [chart: execution time in seconds per instance type; c1.xlarge: 61462, hs1.8xlarge: 31295, m1.xlarge: 65888, PC: 50113]
    [chart: cost in USD per instance type; c1.xlarge: 12, hs1.8xlarge: 78, m1.xlarge: 27]
  108. Result: with the auto-scaling, the cost and the execution time were reduced

    [chart: execution time in seconds with auto-scaling; c1.xlarge: 5408, m1.xlarge: 8160]
  109. Result: with the auto-scaling, the cost and the execution time were reduced

    [chart: number of running instances (0 to 10) over time (0 to 10000 seconds) during the auto-scaled execution]
  110. Result: with the auto-scaling, the cost and the execution time were reduced

    [chart: execution time in seconds (lower is better); auto-scaling (11 VMs): 10830, hs1.8xlarge (user's selection): 31295]
    [chart: cost in USD (lower is better); auto-scaling: 14, hs1.8xlarge (user's selection, without the auto-scaling): 78]
  111. Other works have tried to scale applications' execution on the cloud

    Work       Workflow  Auto-scaling  User support  Application
    CloVR      Yes       Yes           Script        BLAST
    Resilin    No        Yes           Hadoop        MapReduce applications
    BioPig     No        Yes           Pig           MapReduce applications
    SeqPig     No        Yes           Pig           MapReduce applications
    This work  Yes       Yes           DSL + Pig     more general
  112. Statement: the users could use a cloud to execute ordinary applications

    self-configuration helped the users to use a cloud
    historical data helped us to automatically reduce both the execution time and the monetary cost
    a DSL allowed the users to describe the dependencies between the applications
  113. Statement: we identified some concerns related to our system

    1 we assumed that the clouds belonging to a cloud provider were homogeneous with regard to resource types and constraints
    2 we assumed that some configuration data (e.g., access keys, VMI) were automatically shared between the clouds of a provider
    3 we considered just the viewpoint of unskilled users

    we need a method that can allow us to handle cloud heterogeneity taking into account different users' profiles
  117. Outline

    1 Introduction and Motivation
    2 Objectives
    3 Cloud provider: using federated clouds to meet power consumption constraints
    4 Software developer: using federated clouds to execute native cloud applications
    5 Unskilled cloud users: using the cloud to execute cloud-unaware applications
    6 Multiple users' profiles: using federated clouds with an adequate level of abstraction
    7 Conclusion and Perspectives
  118. Fourth objective of this thesis

    We aim to allow the usage of federated clouds with a level of abstraction that is suitable for both experienced and inexperienced users
  119. There is no easy way to deploy and to execute an application on federated clouds taking into account temporal and functional dependencies between the resources

    1 in order to deploy an application on multiple clouds, the users have to configure each cloud individually
    2 in some cases, we can create a virtual machine image (VMI) with the applications
      the VMI can only handle software package descriptions
      the usage of a VMI among multiple clouds is normally expensive
      the VMI cannot handle a cloud outage
    3 the users have to read extensive documentation in order to ensure that their desired resources are available in the cloud and also to understand the constraints of each resource
  122. Some computing environment configurations are difficult even for skilled system administrators

    enhanced networking
    networking cost
  124. The clouds employ different terms to describe their resources and/or services

    Amazon EC2 uses the Elastic Computing Unit (ECU) as a metric to express the CPU capacity of a virtual machine
    Google Compute Engine (GCE) uses Google Compute Engine Units (GCEUs)
    it is very difficult to convert from one such metric (e.g., ECU) of a provider to a similar one of another provider (e.g., GCEUs)
    additionally, the providers utilize high-level terms to describe the performance of their resources, which limits a decision based only on the resources' descriptions
  128. Although cloud providers use different terms to describe their services, we can identify many commonalities in these services

    A computing service:
    1 has a hardware capacity (i.e., number of CPU cores, GFlops, RAM memory size)
    2 uses a disk technology (e.g., SSD, EBS)
    3 depends on an operating system (i.e., VMI)
    4 runs on a geographic region and on an availability zone (i.e., data center)
    5 has a billing type (e.g., on-demand, spot, reserved)
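These commonalities can be captured in a single vocabulary shared by all providers. The sketch below is an illustrative model, not the thesis' actual implementation: the field names are assumptions, and the attribute values are taken from the feature-model slides (c3.2xlarge and n1-standard-2), with the regions and zones chosen only as examples.

```python
from dataclasses import dataclass

@dataclass
class ComputeService:
    """Commonalities shared by the providers' computing services
    (illustrative model; field names are assumptions)."""
    vcpus: int
    gflops: float
    ram_gb: float
    disk_technology: str    # e.g. "SSD", "EBS"
    operating_system: str   # fixed by the VMI
    region: str
    availability_zone: str  # the data center
    billing_type: str       # "on-demand", "spot", or "reserved"
    cost_per_hour: float

# Two offers from different providers described with the same vocabulary
c3_2xlarge = ComputeService(8, 123.2, 15.0, "SSD", "linux",
                            "us-east-1", "us-east-1a", "on-demand", 0.53)
n1_standard_2 = ComputeService(2, 26.4, 7.5, "SSD", "linux",
                               "us-central1", "us-central1-a", "on-demand", 0.14)
```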
  133. Our question

    Can we ...
    1 employ a software product line engineering method to configure cloud environments?
  134. What is a software product line

    Software product line (SPL)
    A method to build software systems that share a set of features that satisfy the needs of a particular domain or mission, and that are developed from a common set of assets in a prescribed way [2]. A feature means a user requirement or a visible functionality of a product.

    we capture the knowledge of creating and configuring computing environments on the clouds in the form of reusable assets
    SPL leads to customizable cloud environments at lower financial costs

    [2] Paul Clements and Linda Northrop. Software Product Lines: Practices and Patterns. Addison-Wesley, 2001
  135. Our SPL method

    [diagram: the method has two phases. Phase 1, Domain Engineering: cloud commonality and variability modeling (domain analysis), domain design (SPL architecture definition, the CSP model), configuration knowledge definition, measurement, and system implementation (domain implementation), driven by domain knowledge from domain and system engineers; it produces an abstract feature model, a concrete feature model with the constraints and quantitative attributes of the clouds, and system and configuration scripts. Phase 2, Product Engineering: requirements analysis of the users' needs and objectives and deployment (product configuration); it returns the optimal product configurations in the Pareto front and a reference for the deployed resources (products)]
  136. We use an extended feature model to describe the clouds

    [diagram: abstract feature model for an instance type, with mandatory, optional, alternative, abstract, and concrete features: hardware configuration (compute unit with gflops, frequency, #vcpu; bus size b32/b64; RAM memory with sizeGB), network (throughputGbps, ingressCostGB, egressCostGB), family type (compute, general, memory, storage), instance disk [0..n] (diskSizeGB, iops, cost), storage (shared), accelerator (GPU, Intel Phi, FPGA), and attributes such as cost, lag, est, maxMntDisks, maxMntDisksSizeGB, and maxInstances]
  137. We use an extended feature model to describe the clouds

    [diagram: concrete feature model for the c3.2xlarge instance type: Intel Xeon E5-2680 (b64, 8 vCPUs, 2.8 GHz, 123.20 GFlops), two 80 GB ephemeral SSD instance disks (24 IOPS, no extra cost), network throughput 1.5 Gbps, ingress cost 0.0 and egress cost 0.12 USD/GB, cost 0.53 USD, lag 0.15, maxInstances 20, maxMntDisksSizeGB 20,480]
  138. We use an extended feature model to describe the clouds

    [diagram: concrete feature model for the n1-standard-2 instance type: Intel Sandy Bridge Xeon (2 vCPUs, 2.6 GHz, 26.4 GFlops), network throughput 0.56 Gbps, ingress cost 0.0 and egress cost 0.11 USD/GB, cost 0.14 USD, lag 0.10, maxInstances 12, maxMntDisks 16, maxMntDiskSizeGB 10,240]
  139. Our configuration knowledge maps the cloud configurations into the assets

    assets are scripts with some variation points:

    ...
    export NODE_REGION_NAME=${[location.region.name]}
    export NODE_REGION_ENDPOINT=${[location.region.endpoint]}
    export NODE_ZONE_NAME=${[location.name]}
    ...

    cloud functions into code:

    Provider  Class
    amazon    org.excalibur.driver.aws.ec2.EC2
    google    org.excalibur.driver.google.compute.GoogleCompute

    clouds' data such as price, availability zones, instance types, VMIs
  142. Our method uses benchmarks to obtain qualitative and quantitative data from the clouds

    LINPACK [3] to measure the sustainable performance of the instance types
    iperf [4] to measure network throughput
    UnixBench [5] to compare the performance variations of the instance types of a cloud provider

    [3] netlib.org/linpack
    [4] software.es.net/iperf
    [5] code.google.com/p/byte-unixbench
  145. We implemented the architecture and the feature model, enabling the users to instantiate the clouds based on different requirements

    [diagram: domain and system engineers (1) create the feature models; the users (2) submit their requirements; the system (3) loads the models, (4) asks the solver for the valid solutions, and (5) receives the solutions (all, best, Pareto front); the system (6) shows the solutions to the users, who (7) request the deploy; the system (8) stores the selected solutions and (9) deploys the resources through the cloud API]
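Steps 4 and 5 of the flow above can be illustrated with a minimal sketch: candidate configurations annotated with the attributes from the feature models are filtered by the user's requirements, and the Pareto front over cost and performance is returned. This is an assumption-laden illustration, not the actual solver (which solves a CSP over the feature models); the attribute values come from the feature-model slides, and the vCPU requirement is a made-up example.

```python
def pareto_front(candidates):
    """Keep configurations not dominated on (cost, gflops):
    lower cost and higher GFlops are both better."""
    front = []
    for c in candidates:
        dominated = any(
            o["cost"] <= c["cost"] and o["gflops"] >= c["gflops"]
            and (o["cost"] < c["cost"] or o["gflops"] > c["gflops"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

# Attribute values taken from the feature-model slides
candidates = [
    {"name": "c3.2xlarge", "cost": 0.53, "gflops": 123.2, "vcpus": 8},
    {"name": "n1-standard-2", "cost": 0.14, "gflops": 26.4, "vcpus": 2},
    {"name": "t1.micro", "cost": 0.02, "gflops": 1.0, "vcpus": 1},
]

# The user's requirement (illustrative): at least 2 vCPUs
feasible = [c for c in candidates if c["vcpus"] >= 2]
best = pareto_front(feasible)
print([c["name"] for c in best])  # → ['c3.2xlarge', 'n1-standard-2']
```

Neither remaining configuration dominates the other (one is cheaper, the other faster), so both are returned and the choice between them depends on the user's objective.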
  146. In our architecture, the nodes are organized through a peer-to-peer (P2P) overlay

    [diagram: nodes N2–N4 in Cloud 1 and M1–M4 in Cloud 2, each cloud with an internal P2P overlay (Chord ring), connected by an external P2P overlay over an XMPP connection]
  147. Using our method to execute a biological sequence comparison application on the cloud

    ...
    requirements:
      cpu: 2
      memory: 6
      platform: "LINUX"
      cost: 0.2
      number-of-instances-per-cloud: 10
    applications:
      application:
        name: "ssearch36"
        command-line: "ssearch36 -d 0 ${query} ${database} >> ${score_table}"
        file:
          - name: "query"
            path: "$HOME/sequences/O60341.fasta"
          - name: "database"
            path: "$HOME/uniprot_sprot.fasta"
          - name: "score_table"
            path: "$HOME/scores/O60341_scores.txt"
            generated: "Y"
    ...
    on-finished: "TERMINATE"
  148. After a few minutes, the users could have their requested computing environment automatically configured on the clouds

    [chart: configuration time in seconds for 5 and 10 instances of m3.large (EC2), n1-standard-2 (GCE), c3.xlarge (EC2), and n1-standard-4 (GCE), ranging from 414 to 549 seconds]
  149. And they could also have the results of their application's execution

    [chart: application execution time in seconds for 24 sequence comparisons using SSEARCH and the UniProtKB/Swiss-Prot database, with 5 and 10 instances of m3.large (EC2), n1-standard-2 (GCE), c3.xlarge (EC2), and n1-standard-4 (GCE)]
  150. The literature has focused on the providers and software developers

    Work       Target                                   Cloud model  Multi-cloud
    SCORCH     VMI configuration                        IaaS         No
    [17, 16]   VMI configuration                        IaaS         No
    [20]       VMI configuration                        IaaS         No
    [10]       VMI deployment                           IaaS         No
    [4]        VMI deployment                           IaaS         No
    [7]        Instance selection                       IaaS         No
    HW-CSPL    Refactoring of an application            PaaS         Yes
    SALOON     Deployment of native cloud applications  PaaS         Yes
    [12]       CMMTA                                    SaaS         No
    [13, 14]   CMMTA                                    SaaS         No
    [15]       CMMTA                                    SaaS         No
    [19]       CMMTA                                    SaaS         No
    This work  IS & IC                                  IaaS         Yes

    Configuration management of multi-tenant applications (CMMTA)
    Instance selection (IS) and infrastructure configuration (IC)
  151. Statement: it is possible to match flexibility and control on

    the cloud. An SPL-based method is an interesting solution: it can meet the objectives of different users, it promotes reuse at low cost, and it enables declarative strategies.
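    The declarative strategies mentioned above can be illustrated with a small sketch: the user states a goal and the system resolves it against a catalog of instance types. The catalog values and the selection rule below are invented for illustration; this is not the thesis's actual SPL-based engine:

    ```python
    # Hypothetical sketch of a declarative strategy: the user declares a
    # goal and the system resolves it to a concrete instance type. Prices
    # and runtime estimates are made up for illustration.

    CATALOG = {
        "m3.large":      {"cost_per_hour": 0.14, "est_runtime_s": 900},
        "n1-standard-2": {"cost_per_hour": 0.10, "est_runtime_s": 1200},
        "c3.xlarge":     {"cost_per_hour": 0.21, "est_runtime_s": 600},
        "n1-standard-4": {"cost_per_hour": 0.20, "est_runtime_s": 620},
    }

    def select_instance(goal, catalog=CATALOG):
        """Resolve a declarative goal ('minimize-cost' or 'minimize-time')
        to the instance type that best satisfies it."""
        criteria = {"minimize-cost": "cost_per_hour",
                    "minimize-time": "est_runtime_s"}
        attr = criteria[goal]
        return min(catalog, key=lambda name: catalog[name][attr])
    ```

    The point of the declarative form is that users state *what* they want (a goal) rather than *how* to obtain it (a provider, a region, an instance type), which is what lets non-specialists use the clouds without being tied to one provider.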
  155. Outline

    1 Introduction and Motivation
    2 Objectives
    3 Cloud provider: using federated clouds to meet power consumption constraints
    4 Software developer: using federated clouds to execute native cloud applications
    5 Unskilled cloud users: using the cloud to execute cloud-unaware applications
    6 Multiple users' profiles: using federated clouds with an adequate level of abstraction
    7 Conclusion and Perspectives
  156. Conclusion: summary of the contributions

    1 a server consolidation strategy to reduce the power consumption of cloud data centers, taking SLAs into account (objective 1)
    2 an architecture to execute a native cloud application on a vertical federated cloud at zero cost (objective 2)
    3 an autonomic architecture to execute and scale a cloud-unaware application in an infrastructure-as-a-service (IaaS) cloud (objective 3)
    4 a software product line (SPL) engineering method to handle multi-cloud commonality and variability (objective 4)
    5 an autonomic architecture that uses our SPL engineering method to execute applications on federated clouds (objective 4)
    Software prototype available at dohko.io
  161. Cloud computing has evolved into a model for providing HPC

    as a utility: a metacomputer.
  162. Perspectives

    1 Energy management models for multiple clouds
    2 Cloud computing management and opportunistic computing
    3 Support for other autonomic properties
    4 Improved mechanisms to support developers in writing self-adaptive applications
  166. Scenario 1: execution time of the tasks

    CPU utilization threshold   FAP (DC1 + DC2)   Without VM migration
    20%                         1654 s            1360 s
    40%                         1644 s            1320 s
    60%                         1659 s            1296 s
    80%                         1644 s            1298 s
    100%                        1644 s            1290 s
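    The CPU utilization threshold varied in the scenarios above can be sketched as a simple policy: a host that exceeds the threshold offloads a VM to the other data center. The host/VM model below is a toy illustration under assumed utilization values; it is not the thesis's actual FAP consolidation algorithm:

    ```python
    # Toy sketch of a CPU-utilization-threshold migration policy, in the
    # spirit of the thresholds (20%..100%) varied in the scenarios. The
    # data model is hypothetical, not the actual FAP algorithm.

    def migration_candidates(hosts, threshold):
        """Return (host, vm) pairs to migrate away from overloaded hosts."""
        candidates = []
        for host, vms in hosts.items():
            if sum(vms.values()) > threshold:
                vm = max(vms, key=vms.get)   # offload the most loaded VM
                candidates.append((host, vm))
        return candidates

    dc1 = {
        "host-a": {"vm1": 0.50, "vm2": 0.40},  # 90% utilized
        "host-b": {"vm3": 0.30},               # 30% utilized
    }
    plan = migration_candidates(dc1, 0.60)     # 60% threshold
    ```

    A higher threshold triggers fewer migrations, which is consistent with the decreasing migration counts shown for scenario 1.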
  167. Scenario 1: number of virtual machines (VMs) migrated from DC1

    to DC2

    CPU utilization threshold   VM migrations
    20%                         186
    40%                         172
    60%                         168
    80%                         152
    100%                        148
  168. Scenario 2: execution time of the tasks

    CPU utilization threshold   FAP (DC1 + DC2)   Without VM migration
    20%                         1716 s            1360 s
    40%                         1664 s            1320 s
    60%                         1654 s            1296 s
    80%                         1672 s            1298 s
    100%                        1679 s            1290 s
  169. Scenario 2: power consumption of the data centers

    With VM migration, DC1 consumed 2.5–2.6 kWh and DC2 2.9 kWh across all CPU utilization thresholds (20%–100%), against 9.0–9.2 kWh without VM migration.
  170. Scenario 2: average SLA violation (%)

    CPU utilization threshold   FAP (DC1 + DC2)   Without VM migration
    20%                         31.6%             51.4%
    40%                         31.5%             49.0%
    60%                         31.6%             44.5%
    80%                         31.4%             43.9%
    100%                        31.4%             41.2%
  171. References

    [1] Rodrigo N. Calheiros, Rajiv Ranjan, Anton Beloglazov, César A. F. De Rose, and Rajkumar Buyya. CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Software: Practice and Experience, 41(1):23–50, 2011.
    [2] Zhibo Cao and Shoubin Dong. An energy-aware heuristic framework for virtual machine consolidation in cloud computing. The Journal of Supercomputing, 69(1):429–451, 2014.
    [3] Yee Ming Chen and Hsin-Mei Yeh. An implementation of the multiagent system for market-based cloud resource allocation. Journal of Computing, 2(11):27–33, 2010.
    [4] Trieu C. Chieu, Ajay Mohindra, Alexei Karve, and Alla Segal. Solution-based deployment of complex application services on a cloud. In IEEE International Conference on Service Operations and Logistics and Informatics, pages 282–287, 2010.
    [5] Rajarshi Das, Jeffrey O. Kephart, Charles Lefurgy, Gerald Tesauro, David W. Levine, and Hoi Chan. Autonomic multi-agent management of power and performance in data centers. In AAMAS, pages 107–114, 2008.
    [6] Corentin Dupont, Thomas Schulze, Giovanni Giuliani, Andrey Somov, and Fabien Hermenier. An energy aware framework for virtual machine placement in cloud federated data centres. In 3rd International Conference on Future Energy Systems: Where Energy, Computing and Communication Meet, pages 4:1–4:10, 2012.
    [7] Jesús García-Galán, Omer Rana, Pablo Trinidad, and Antonio Ruiz-Cortés. Migrating to the cloud: a software product line based analysis. In 3rd International Conference on Cloud Computing and Services Science, pages 416–426, 2013.
    [8] Fabien Hermenier, Xavier Lorca, Jean-Marc Menaud, Gilles Muller, and Julia Lawall. Entropy: a consolidation manager for clusters. In ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 41–50, 2009.
    [9] Kyong Hoon Kim, Wan Yeon Lee, Jong Kim, and Rajkumar Buyya. SLA-based scheduling of bag-of-tasks applications on power-aware cluster systems. IEICE Transactions on Information and Systems, E93-D(12):3194–3201, 2010.
    [10] Alexander V. Konstantinou, Tamar Eilam, Michael Kalantar, Alexander A. Totok, William Arnold, and Edward Snible. An architecture for virtual solution composition and deployment in infrastructure clouds. In 3rd International Workshop on Virtualization Technologies in Distributed Computing, pages 9–18, 2009.
    [11] Young Choon Lee and Albert Y. Zomaya. Energy efficient utilization of resources in cloud computing systems. The Journal of Supercomputing, pages 1–13, 2010.
    [12] R. Mietzner, A. Metzger, F. Leymann, and K. Pohl. Variability modeling to support customization and deployment of multi-tenant-aware software as a service applications. In Workshop on Principles of Engineering Service Oriented Systems, pages 18–25, 2009.
  172. References (continued)

    [13] Stefan T. Ruehl and Urs Andelfinger. Applying software product lines to create customizable software-as-a-service applications. In 15th International Software Product Line Conference, pages 16:1–16:4, 2011.
    [14] Stefan T. Ruehl, Urs Andelfinger, Andreas Rausch, and Stephan A. W. Verclas. Toward realization of deployment variability for software-as-a-service applications. In IEEE 5th International Conference on Cloud Computing, pages 622–629, 2012.
    [15] Julia Schroeter, Peter Mucha, Marcel Muth, Kay Jugel, and Malte Lochau. Dynamic configuration management of cloud-based applications. In 16th International Software Product Line Conference, pages 171–178, 2012.
    [16] Le Nhan Tam, Gerson Sunyé, and Jean-Marc Jézéquel. A model-based approach for optimizing power consumption of IaaS. In 2nd Symposium on Network Cloud Computing and Applications, pages 31–39, 2012.
    [17] Le Nhan Tam, Gerson Sunyé, and Jean-Marc Jézéquel. A model-driven approach for virtual machine image provisioning in cloud computing. In 1st European Conference on Service-Oriented and Cloud Computing, pages 107–121, 2012.
    [18] Akshat Verma, Puneet Ahuja, and Anindya Neogi. pMapper: power and migration cost aware application placement in virtualized systems. In ACM/IFIP/USENIX International Conference on Middleware, pages 243–264, 2008.
    [19] Stefan Walraven, Dimitri Van Landuyt, Eddy Truyen, Koen Handekyn, and Wouter Joosen. Efficient customization of multi-tenant software-as-a-service applications with service lines. Journal of Systems and Software, 91:48–62, 2014.
    [20] Tianle Zhang, Zhihui Du, Yinong Chen, Xiang Ji, and Xiaoying Wang. Typical virtual appliances: an optimized mechanism for virtual appliances provisioning and management. Journal of Systems and Software, 84(3):377–387, 2011.
    [21] Liang Zhou, Baoyu Zheng, Jingwu Cui, and Sulan Tang. Toward green service in cloud: from the perspective of scheduling. In International Conference on Computing, Networking and Communications, pages 939–943, 2012.