• Alan Turing, 1936 – what can be done: “On Computable Numbers”
• Vannevar Bush, 1945 – what should be done: “As We May Think”
• John von Neumann, 1945 – how to build it: “First Draft of a Report to the EDVAC”
Then:
• Drives on the I/O Bus
• Fast DRAM
• SMP systems
• Software is cheap
• Memory is cheap

Now:
• Sequential Hard Drives
• Swap to DRAM
• Network
• Fast Cache
• SMP chips
• Hardware is cheap
• Memory is least cheap
In the Datacenter:
• Blades & Clusters should be made for each other
• Choose the best price/performance processors and replicate
• But too much profit is at stake in servers – exotic processors are where the $$s are
• Performance, availability, or both? Homogeneous, Local Area
• Web Services – XML everywhere
• Grid Computing – how many supercomputers can one scientist use? Heterogeneous, Wide Area
• Utility Computing – pay by the play
Recovery-Oriented Computing
Chung, Patty Enriquez, Susan Housand, Archana Ganapathi, Dan Patterson, Jon Kuroda, Mike Howard, Matthew Merzbacher, Dave Patterson, and Kathy Yelick
University of California at Berkeley
In cooperation with George Candea, James Cutler, and Armando Fox
Stanford University
Goals:
• Goal #1: Improve performance
• Goal #2: Improve performance
• Goal #3: Improve cost-performance

Assumptions:
• Humans are perfect (they don’t make mistakes during installation, wiring, upgrade, maintenance, or repair)
• Software will eventually be bug-free (good programmers write bug-free code; debugging works)
• Hardware MTBF is already very large (~100 years between failures) and will continue to increase
Availability is now the vital metric for servers:
• Near-100% availability is becoming mandatory for e-commerce, enterprise apps, online services, ISPs

But service outages are frequent:
• 65% of IT managers report that their websites were unavailable to customers over a 6-month period
• 25% reported 3 or more outages

Outage costs are high:
• Social effects: negative press, loss of customers who “click over” to a competitor
• $500,000 to $5,000,000 per hour in lost revenues
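To put the per-hour figures in annual terms, here is a small sketch converting an availability level into expected yearly downtime and lost revenue; the $500K–$5M/hour range is from the slide above, while the availability levels themselves are illustrative:

```python
# Annual downtime and lost revenue at a given availability level,
# using the $500K-$5M/hour lost-revenue range quoted above.
HOURS_PER_YEAR = 24 * 365

for availability in (0.99, 0.999, 0.9999):  # illustrative levels
    downtime_h = (1 - availability) * HOURS_PER_YEAR
    low, high = 500_000 * downtime_h, 5_000_000 * downtime_h
    print(f"{availability:.2%} uptime -> {downtime_h:6.2f} h/yr down, "
          f"${low:,.0f} to ${high:,.0f} lost")
```

Even at 99.9% uptime (about 8.8 hours of downtime a year), the slide’s range implies $4.4M to $44M in annual losses.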
Traditional fault-tolerance doesn’t solve the problems:
• Change – in back-end system tiers, software upgrades are difficult, failure-prone, or simply ignored; for application services over the WWW, change is daily
• Maintainability – human operator error is the single largest failure source, and system maintenance environments are unforgiving
• Evolutionary growth – 1U-PC cluster front-ends scale and evolve well, but back-end scalability is still limited
Failures are a fact, and recovery/repair is how we cope with them.
• Improving recovery/repair improves availability
– UnAvailability ≈ MTTR / MTTF (when MTTF >> MTTR)
– 1/10th MTTR is just as valuable as 10X MTBF (checked numerically below)
• If the major sys admin job is recovery after failure, ROC also helps with sys admin
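A minimal numeric check of that equivalence, assuming the steady-state formula Unavailability = MTTR / (MTTF + MTTR); the function name and the baseline numbers are illustrative, not from the talk:

```python
# Compare a 10x MTBF/MTTF improvement against a 1/10th MTTR improvement.
# For MTTF >> MTTR, Unavailability ~= MTTR / MTTF, so the two are equivalent.

def unavailability(mttf_h: float, mttr_h: float) -> float:
    """Exact steady-state unavailability: MTTR / (MTTF + MTTR)."""
    return mttr_h / (mttf_h + mttr_h)

baseline   = unavailability(mttf_h=1_000,  mttr_h=1.0)   # ~9.99e-04
ten_x_mtbf = unavailability(mttf_h=10_000, mttr_h=1.0)   # ~1.00e-04
tenth_mttr = unavailability(mttf_h=1_000,  mttr_h=0.1)   # ~1.00e-04

print(f"baseline:    {baseline:.2e}")
print(f"10x MTBF:    {ten_x_mtbf:.2e}")
print(f"1/10th MTTR: {tenth_mttr:.2e}")
```

Both improvements cut unavailability by a factor of ten; the ROC bet is that MTTR is the cheaper of the two to improve.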
Software cannot be specified precisely:
• Too many ways to write buggy code
• Interfaces (APIs, ABIs, protocols, …) are not stable
• Co-resident software modules interact in weird ways – combinatorial explosion in testing (a toy calculation follows this list)
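A toy calculation of that explosion, with made-up module and version counts: the number of configurations to test grows as the product of the per-module version counts, not as the number of modules.

```python
# Configurations to test when each co-resident module ships in several
# versions: the count is the product of the per-module version counts.
from math import prod

versions_per_module = [3, 4, 2, 5, 3]  # hypothetical 5-module stack
print(f"{len(versions_per_module)} modules, "
      f"{prod(versions_per_module)} configurations to test")  # 5 modules, 360
```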
Middleware, application:
• If this is done by the product vendor, you have an appliance
• Appliance hardware can be configured for the best value for the application
• 136 candidates on the ballot: RedHat, SuSE, Debian, Mandrake, MontaVista…
• “Write once, run anywhere” is truer with Linux than with Java, but it applies at the source level
• The leaning tower of APIs can be “patched”