Slide 1

Slide 1 text

Rick McGeer, Chief Scientist, US Ignite
December 9, 2013

Slide 2

Slide 2 text

Distributed Clouds and Software Defined Networking: Complementary Technologies for the Next-Generation Internet

Slide 3

Slide 3 text

Or, A Post-Hoc Justification for the Last 10 Years of My Life

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

The Future is Distributed Clouds integrated with Software-Defined Networks!

Slide 6

Slide 6 text

SDN is a set of abstractions over the networking control plane.
Proxies are an essential element of the Internet architecture.
Shouldn't there be an abstraction architecture for proxies?

Slide 7

Slide 7 text

Network Challenges
• Original concept of the network: dumb pipe between smart endpoints
  – Content-agnostic routing
  – Rates controlled by endpoints
  – Content- and user-agnostic forwarding
• Clean separation of concerns
  – Routing and forwarding by network elements
  – Rate control, admission control, security at endpoints

Slide 8

Slide 8 text

Clean separation of concerns doesn't work very well
• Need application-aware stateful forwarding (e.g., multicast)
• Need QoS guarantees and network-aware endpoints
  – For high-QoS applications
  – For lousy links
• Need in-network security and admission control
  – Endpoint security easily overwhelmed…

Slide 9

Slide 9 text

Some Examples
• Load-balanced end-system multicast
• Adaptive/DPI-based intrusion detection
• In-network transcoding to multiple devices
• Web and file content distribution networks
• Link-sensitive store-and-forward connection-splitting TCP proxies
• Email proxies (e.g., MailShadow)
• In-network compression engines (Riverbed)
• Adaptive firewalls
• In-situ computation for data reduction from high-bandwidth sensors (e.g., high-resolution cameras)

Slide 10

Slide 10 text

Common Feature
• All of these examples require some combination of in-network and endpoint services
  – Information from the network
  – Diversion to a proxy
  – Line-rate packet filtering
• All require endpoint processing
  – Stateful processing
  – Connection-splitting
  – Filesystem access
• Three central use cases
  – Optimization of network resources, especially bandwidth
  – Proximity to the user for real-time response
  – In-situ sensor processing

Slide 11

Slide 11 text

Historic Solution: Middleboxes
• Dedicated network appliances that perform a specific function
• Gets the job done, but…
  – Appliances proliferate (one or more per task)
  – Opaque
  – Interact unpredictably…
• Don't do everything
  – E.g., no generalized in-situ processing engine for data reduction
• APST, 2005: "The ability to support…multiple coexisting overlays [of proxies]…becomes the crucial universal piece of the [network] architecture."

Slide 12

Slide 12 text

OpenFlow and SDN
• L2/L3 technology permitting software-defined control of network forwarding and routing
• What it's not:
  – On-the-fly software decisions about routing and forwarding
  – In-network connection-splitting store-and-forward
  – In-network on-the-fly admission control
  – In-network content distribution
  – Magic…
• What it is:
  – Table-driven routing and forwarding decisions (including drop and multicast)
  – A callback protocol from a switch to a controller when an entry is not in the table ("what do I do now?")
  – A protocol that permits the controller to update the switch (see the controller sketch below)
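To make the table/callback split concrete, here is a minimal controller sketch using the Ryu OpenFlow framework (Ryu is not mentioned in the talk; it simply stands in for "a controller"). On a table miss the switch sends a packet_in callback; the controller answers by installing a flow rule and pushing the packet back out. The flood-everything policy is purely illustrative.

```python
# Minimal sketch of the packet_in callback / flow-table update loop.
# Assumes the Ryu framework (pip install ryu) and an OpenFlow 1.3 switch;
# the flood policy below is illustrative, not a recommendation.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class FloodOnMiss(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in(self, ev):
        # The switch asks "what do I do now?" for a packet with no table entry.
        msg = ev.msg
        dp = msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser
        actions = [parser.OFPActionOutput(ofp.OFPP_FLOOD)]

        # Update the switch: install a catch-all rule so later packets are
        # handled at line rate without another callback.
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=0,
                                      match=parser.OFPMatch(), instructions=inst))

        # Send the triggering packet on its way.
        data = msg.data if msg.buffer_id == ofp.OFP_NO_BUFFER else None
        dp.send_msg(parser.OFPPacketOut(datapath=dp, buffer_id=msg.buffer_id,
                                        in_port=msg.match['in_port'],
                                        actions=actions, data=data))
```

Any real policy (multicast, drop, diversion to a proxy) is just a different set of match/action entries installed through the same FlowMod mechanism.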

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

In-Network Processing
• L4/L7 services provided by nodes in the network
  – TCP/application-layer proxies (see the sketch after this list)
  – Stateful/DPI-based intrusion detection
  – Application-layer admission control
  – Application-layer load balancing
  – …
• Key features
  – Stateful processing
  – Transport/application-layer information required
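The slides contain no code, but a sketch helps show why these services need stateful endpoint processing rather than per-packet forwarding. Below is a bare-bones connection-splitting TCP proxy: it terminates the client's connection and opens an independent one toward the origin, so each leg keeps its own TCP state and the relay loop is the natural hook for application-layer inspection. The addresses are placeholders, not anything from the talk.

```python
# Bare-bones connection-splitting TCP proxy (illustrative; addresses are fake).
import socket
import threading

LISTEN_ADDR = ("0.0.0.0", 8080)            # front side, facing clients
ORIGIN_ADDR = ("origin.example.com", 80)   # back side, facing the origin server

def pump(src, dst):
    """Relay bytes in one direction; DPI, transcoding, or filtering would go here."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

def handle(client):
    # Split the connection: a second, independent TCP connection to the origin.
    upstream = socket.create_connection(ORIGIN_ADDR)
    threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
    pump(upstream, client)

def serve():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(LISTEN_ADDR)
    srv.listen(128)
    while True:
        conn, _addr = srv.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

if __name__ == "__main__":
    serve()
```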

Slide 17

Slide 17 text

Middleboxes and the Network
• Classic view: proxies and middleboxes are a necessary evil that breaks the "end-to-end principle" (the network should be a dumb pipe between endpoints)
• Modern view (Peterson): "Proxies play a fundamental role in the Internet architecture: They bridge discontinuities between different regions of the Internet. To be effective, however, proxies need to coordinate and communicate with each other."
• Generalized modern view (this talk): proxies and middleboxes are special cases of a general need: endpoint processing in the network. We need to merge the Cloud and the Network.

Slide 18

Slide 18 text

Going From Today to Tomorrow
• Today: middleboxes
• Tomorrow: in-network general-purpose processors fronted by OpenFlow switches
• Advantages of middleboxes
  – Specialized processing at line rate
• Disadvantages of middleboxes
  – Nonexistent programming environment
  – Opaque configuration
  – Vendor-specific updates
  – Only common functions get done
  – Interact unpredictably…

Slide 19

Slide 19 text

Anatomy of a Middlebox

Slide 20

Slide 20 text

Generalized Architecture

Slide 21

Slide 21 text

The Future

Slide 22

Slide 22 text

Advantages of Generalizing and Factoring the Middlebox
• Transparent
• Open programming environment: Linux + OpenFlow
• Much broader range of features and functions
• Interactions between middleboxes mediated by OpenFlow rules
  – Verifiable
  – Predictable
• Updates are software uploads

Slide 23

Slide 23 text

OpenFlow + In-Network Processing
+ Line-rate processing
+ Largely implementable on COTS switches
+ Packet handling on a per-flow basis
+ Rapid rule update
+ Unified view of the network
+ L2-L7 services

Slide 24

Slide 24 text

But I Need Proxies Everywhere…
• Proxies are needed wherever I need endpoint processing
  – In-situ data reduction
  – Next to users
  – Where I need filtering
• Can't always predict these in advance for every service!
• So I need a small cloud everywhere, so I can instantiate a middlebox anywhere
• Solution = distributed "EC2" + OpenFlow network
• "Slice": a virtual network of virtual machines (see the sketch after this list)
• OpenFlow creates the virtual network
• "EC2" lets me instantiate VMs everywhere
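As a rough illustration of what a "slice" request could contain, here is a toy data structure: a set of VMs pinned to sites plus the virtual links the OpenFlow network is asked to realize between them. All field names, the image name, and the example sites are invented for illustration; this is not the GENI or EC2 API.

```python
# Toy "slice" description: VMs placed at sites + virtual links between them.
from dataclasses import dataclass, field
from typing import List

@dataclass
class VMRequest:
    name: str
    site: str                    # where the endpoint processing is needed
    image: str = "ubuntu-12.04"  # placeholder image name
    cores: int = 1

@dataclass
class VirtualLink:
    src: str                     # VM names; realized as an OpenFlow-provisioned path
    dst: str
    bandwidth_mbps: int = 100

@dataclass
class SliceSpec:
    name: str
    vms: List[VMRequest] = field(default_factory=list)
    links: List[VirtualLink] = field(default_factory=list)

# Example: a transcoding proxy near users plus an origin server elsewhere.
demo = SliceSpec(
    name="transcode-demo",
    vms=[VMRequest("proxy", site="seattle"), VMRequest("origin", site="utah")],
    links=[VirtualLink("proxy", "origin", bandwidth_mbps=500)],
)
print(demo)
```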

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Shenker's SDN Architecture
• Specification of a virtual network, with explicit forwarding instructions
• Translation onto OpenFlow rules on the physical network
• Effectuation on the physical network

Slide 27

Slide 27 text

Perfect for L1-L3

Slide 28

Slide 28 text

Key Function we want: Add Processing Anywhere in the Virtual Network

Slide 29

Slide 29 text

Going from Virtual Network to Virtual Distributed System
• Specification of a virtual distributed cloud, with explicit forwarding instructions BETWEEN specified VMs
• Translation onto OpenFlow rules on the physical network AND instantiation on physical machines at appropriate sites (see the sketch after this list)
• Effectuation on the physical network AND physical clouds
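A hedged sketch of the middle ("translation") stage described above: the virtual distributed-cloud spec is split into per-site VM instantiation requests and OpenFlow-style match/action rules for the links between them. The dictionary layout, VLAN tag, and rule format are all invented for illustration.

```python
# Illustrative translation of a virtual-distributed-system spec into
# (a) per-site instantiation requests and (b) OpenFlow-style forwarding rules.
spec = {
    "vms": [{"name": "proxy", "site": "seattle"},
            {"name": "origin", "site": "utah"}],
    "flows": [{"src": "proxy", "dst": "origin", "vlan": 1001}],
}

def translate(spec):
    # Which cloud manager must instantiate which VMs.
    per_site = {}
    for vm in spec["vms"]:
        per_site.setdefault(vm["site"], []).append(vm["name"])

    site_of = {vm["name"]: vm["site"] for vm in spec["vms"]}

    # One rule per direction of each virtual link, matching on the slice's
    # VLAN tag and forwarding toward the peer VM's site.
    rules = []
    for f in spec["flows"]:
        for a, b in ((f["src"], f["dst"]), (f["dst"], f["src"])):
            rules.append({"match": {"vlan": f["vlan"], "dst_vm": b},
                          "action": "forward %s -> %s" % (site_of[a], site_of[b])})
    return per_site, rules

vm_requests, flow_rules = translate(spec)
print(vm_requests)   # {'seattle': ['proxy'], 'utah': ['origin']}
print(flow_rules)
```

Effectuation would then hand the per-site VM requests to each site's cloud manager and the rules to the network operating system.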

Slide 30

Slide 30 text

Key Points
• Federated clouds can be somewhat heterogeneous
  – Must support a common API
  – Can have some variants (switch variants still present a common interface through OpenFlow)
• A DSOS is simply a mixture of three known components:
  – Network operating system
  – Cloud managers (e.g., ProtoGENI, Eucalyptus, OpenStack)
  – Tools to interface with the network OS and cloud managers (nascent tools under development)

Slide 31

Slide 31 text

Implications for OpenFlow/SDN
• Southbound API (i.e., OpenFlow): minimal and anticipated in 1.5
  – "Support for L4/L7 services," a.k.a. seamless redirection
• Northbound API (a hypothetical request is sketched below)
  – Joint allocation of virtual machines and networks
  – Location-aware allocation of virtual machines
  – WAN-aware allocation of networks
  – QoS controls between sites
• Build on/extend successful architectures
  – "Neutron for the WAN"
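No such northbound endpoint exists today in Neutron or GENI; the payload below is a purely hypothetical illustration of how the requirements listed above (joint VM and network allocation, location awareness, inter-site QoS) could be bundled into a single "Neutron for the WAN"-style request.

```python
# Hypothetical northbound request combining compute, placement, and WAN QoS.
# None of these field names correspond to a real Neutron or GENI API.
import json

request = {
    "slice": "video-demo",
    "vms": [
        {"name": "cache", "placement": {"near": "users:pacific-northwest"}},
        {"name": "origin", "placement": {"site": "utah"}},
    ],
    "networks": [
        {"endpoints": ["cache", "origin"],
         "qos": {"min_bandwidth_mbps": 500, "max_latency_ms": 40}},
    ],
}

print(json.dumps(request, indent=2))
# A real orchestrator would accept something like this on its northbound API,
# e.g. requests.post("https://orchestrator.example.net/v1/slices", json=request)
```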

Slide 32

Slide 32 text

Implications for Cloud Architectures
• Key problem we've rarely considered: how do we easily instantiate and stitch together services at multiple sites/multiple providers?
• Multiple sites is easy; multiple providers is not
• Need an easy way to instantiate from multiple providers
  – Common AUP/conventions? Probably
  – Common form of identity/multiple IDs? Multiple, or bottom-up (e.g., Facebook)
  – Common API? Absolutely
• Need to understand what's important and what isn't
  – E.g., very few web services charge for bandwidth

Slide 33

Slide 33 text

Initial Attempts
• Ignite Technical Architecture/GENI Racks
• GENI Mesoscale
• SAVI
• JGN-X
• …

Slide 34

Slide 34 text

With Credit To…

Slide 35

Slide 35 text

GENI Mesoscale
• Nationwide network of small local clouds
• Each cloud
  – 80-150 worker cores
  – Several TB of disk
  – OpenFlow-native local switching
• Interconnected over an OpenFlow-based L2 network
• Local "aggregate manager" (a.k.a. controller)
• Two main designs with a common API
  – InstaGENI (ProtoGENI-based)
  – ExoGENI (ORCA/OpenStack-based)
• Global allocation through federated aggregate managers
• User allocation of networks and slices through tools (GENI Portal, Flack)

Slide 36

Slide 36 text

GENI and the Distributed Cloud Stack
• Physical resources
  – GENI Racks, Emulab, GENI backbone
• Cloud OS
  – ProtoGENI, ExoGENI…
• Orchestration layer
  – GENI Portal, Flack…

Slide 37

Slide 37 text

InstaGENI rack topology (diagram)

Slide 38

Slide 38 text

U.S. Ignite City Technical Architecture (diagram; labels include: existing ISP connections, Layer 2 Ignite Connect (1 GE or 10 GE), Layer 3 GENI control plane, Layer 2 connection to subscribers, existing head-end, new GENI/Ignite rack pair, OpenFlow switch(es), FlowVisor, remote management, instrumentation, aggregate manager, measurement, programmable servers, storage, optional video switch, home; most equipment not shown)

Slide 39

Slide 39 text

GENI Mesoscale Deployment

Slide 40

Slide 40 text

Distributed Clouds and NSFNet: Back to the Future
• GENI today is NSFNet circa 1985
• GENI and the SFA: a set of standards (e.g., TCP/IP)
• Mesoscale: equivalent to the NSF backbone
• GENI Racks: hardware/software instantiation of the standards that sites can deploy instantly
  – Equivalent to a VAX 11 running Berkeley Unix
  – InstaGENI cluster running ProtoGENI and OpenFlow
• Other interoperable instantiations
  – VNode (Aki Nakao, University of Tokyo and NICT)
  – Tomato (Dennis Schwerdel, TU-Kaiserslautern)

Slide 41

Slide 41 text

JGN-X (Japan)

Slide 42

Slide 42 text

SAVI (Canada)

Slide 43

Slide 43 text

Ofelia (EU)

Slide 44

Slide 44 text

"Testbeds" vs. "Clouds"
• JGN-X, GENI, SAVI, Ofelia, GLab, and OneLab are all described as "testbeds"
  – But they are really clouds
  – Tests require realistic services
• History of testbeds:
  – Academic research → academic/research services → commercial services
  – Expect a similar evolution here (but commercial will come faster)

Slide 45

Slide 45 text

Programming Environment for Distributed Clouds
• Problem: allocating and configuring distributed clouds is a pain
  – Allocate a network of VMs
  – Build VMs and deploy images
  – Deploy and run software
• But most slices are mostly the same
• Automate commonly-used actions and pre-allocate typical slices
• 5-minute rule: build, deploy, and execute "Hello, World" in five minutes (see the sketch after this list)
• Decide what to build: start with a sample application
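To make the "5-minute rule" concrete, here is a hedged sketch of the tail end of that workflow once a typical slice has been pre-allocated: the slice reduces to a list of reachable nodes, and "Hello, World" is just running a command on each of them. The hostnames are placeholders, and SSH keys are assumed to have been installed when the slice was created; the allocation step itself is not shown.

```python
# Sketch of "Hello, World" across a pre-allocated slice; hostnames are fake.
import subprocess

SLICE_NODES = ["node1.instageni.example.net", "node2.instageni.example.net"]
HELLO = 'python -c "print(\'Hello, World from \' + __import__(\'socket\').gethostname())"'

def run_everywhere(nodes, command):
    for node in nodes:
        # BatchMode avoids interactive prompts; keys are assumed in place.
        result = subprocess.run(["ssh", "-o", "BatchMode=yes", node, command],
                                capture_output=True, text=True)
        print(node, "->", result.stdout.strip() or result.stderr.strip())

if __name__ == "__main__":
    run_everywhere(SLICE_NODES, HELLO)
```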

Slide 46

Slide 46 text

TransGeo: A Model TransCloud Application
• Scalable, ubiquitous geographic information system
• Open and public
  – Anyone can contribute layers
  – Anyone can host computation
• Why GIS?
  – Large and active community
  – Characterized by large data sets (mostly satellite images)
  – Much open-source, easily deployable software; standard data formats
  – Computation naturally partitions and is loosely coupled
  – Collaborations across geographic regions and continents
  – Very pretty…

Slide 47

Slide 47 text

TransGeo Architecture

Slide 48

Slide 48 text

TransGeo Sites (May 2013)

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

Opening up TransGEO: The GENI Experiment Engine
• Key idea: genericize and make available the infrastructure behind the TransGEO demo
  – Open to every GENI, FIRE, JGN-X, Ofelia, SAVI… experimenter who wants to use it
• TransGEO is a trivial application on a generic infrastructure
  – Perhaps 1,000 lines of Python code (see the tile-store sketch after this list) on top of:
    • Key-value store
    • Layer 2 network
    • Sandboxed Python programming environment
    • Messaging service
    • Deployment service
    • GIS libraries
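Purely to illustrate how thin the application layer can be over a generic key-value store, here is a toy GIS tile store: tiles are keyed by (layer, zoom, x, y), and a plain dictionary stands in for the distributed store the GENI Experiment Engine would actually provide. None of this code is from TransGEO.

```python
# Toy tile store over a key-value interface; an in-memory dict stands in for
# the distributed store (e.g., Swift) on the real infrastructure.
tile_store = {}

def tile_key(layer, zoom, x, y):
    return "%s/%d/%d/%d" % (layer, zoom, x, y)

def put_tile(layer, zoom, x, y, png_bytes):
    tile_store[tile_key(layer, zoom, x, y)] = png_bytes

def get_tile(layer, zoom, x, y):
    return tile_store.get(tile_key(layer, zoom, x, y))

put_tile("landsat", 4, 3, 7, b"\x89PNG...fake tile bytes")
print(get_tile("landsat", 4, 3, 7) is not None)   # -> True
```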

Slide 52

Slide 52 text

GENI Experiment Engine
• Permanent, long-running, distributed file system
• Permanent, long-running, GENI-wide message service
• Permanent, long-running, distributed Python environment
• Permanent, worldwide Layer 2 VLANs on high-performance networks
• All offered in slices
• All shared by many experimenters
• Model: Google App Engine
• Advantage for GENI: efficient use of resources
• Advantage for experimenters: up and running in no time

Slide 53

Slide 53 text

GENI Experiment Engine Architecture

Slide 54

Slide 54 text

Staged Rollout
• Permanent Layer-2 network: Spring 2014
• Shared file system (based on Swift): Spring 2014
• Python environment: Summer 2014

Slide 55

Slide 55 text

Thanks and Credits
• Joe Mambretti, Fei Yeh, Jim Chen (Northwestern/iCAIR)
• Andy Bavier, Marco Yuen, Larry Peterson, Jude Nelson, Tony Mack (PlanetWorks/Princeton)
• Chris Benninger, Chris Matthews, Chris Pearson, Andi Bergen, Paul Demchuk, Yanyan Zhuang, Ron Desmarais, Stephen Tredger, Yvonne Coady, Hausi Muller (University of Victoria)
• Heidi Dempsey, Marshall Brinn, Vic Thomas, Niky Riga, Mark Berman, Chip Elliott (BBN/GPO)
• Rob Ricci, Leigh Stoller, Gary Wong (University of Utah)
• Glenn Ricart, William Wallace, Joe Konstan (US Ignite)
• Paul Muller, Dennis Schwerdel (TU-Kaiserslautern)
• Amin Vahdat, Alvin AuYoung, Alex Snoeren, Tom DeFanti (UCSD)

Slide 56

Slide 56 text

Thanks and Credits
• Nick Bastin (Barnstormer Softworks)
• Shannon Champion (Matrix Integration)
• Jessica Blaine, Jack Brassil, Kevin Lai, Narayan Krishnan, Dejan Milojicic, Norm Jouppi, Patrick Scaglia, Nicki Watts, Michaela Mezo, Bill Burns, Larry Singer, Rob Courtney, Randy Anderson, Sujata Banerjee, Charles Clark (HP)
• Aki Nakao (University of Tokyo)

Slide 57

Slide 57 text

Conclusions
• Distributed clouds are nothing new…
  – Akamai was basically the first distributed cloud
  – Single application, now generalizing
• But this is OK…
  – The Web simply wrapped existing services
• Now in vogue with telcos ("Network Function Virtualization")
• What's new/different in GENI/JGN-X/SAVI/Ofelia…
  – Support from programmable networks
  – The "last frontier" for software in systems
• Open problems
  – Siting VMs!
  – Complex network/compute/storage optimization problems
• Needs
  – "http"-like standardization of APIs at the IaaS and PaaS layers

Slide 58

Slide 58 text

Links
• http://www.youtube.com/watch?v=eXsCQdshMr4
• http://pages.cs.wisc.edu/~akella/CS838/F09/838-Papers/APST05.pdf
• http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.123&rep=rep1&type=pdf

Slide 59

Slide 59 text

No content