2003: Machines That Think! (for Brocade Comm.)

Tom Lyon
September 24, 2003

Vision/State of the World talk about Computers/Servers for Brocade Communications

Transcript

  1. What’s new in computing?
     • “The thing that hath been, it is that which shall be; and that which is done is that which shall be done: and there is no new thing under the sun.” – Ecclesiastes 1:9
  2. Nothing New…
     • Alan Turing, 1936 – what can be done: “On Computable Numbers”
     • Vannevar Bush, 1945 – what should be done: “As We May Think”
     • John von Neumann, 1945 – how to build it: “First Draft of a Report on the EDVAC”
  3. Then & Now
     • Sequential tape → sequential hard drives
     • Swap to drives → swap to DRAM
     • I/O bus → network
     • Fast DRAM → fast cache
     • SMP systems → SMP chips
     • Software is cheap → hardware is cheap
     • Memory is cheap → memory is least cheap
  4. The Last Hardware Problem: Latency
     • Speed of light ~ 1 ft/ns
     • Speed of DRAM ~ 70 ns
     • Hard drive rotational latency – 2–16 ms
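The gap between these levels can be made concrete from the slide's own numbers; a minimal sketch in Python (the disk figures take both ends of the 2–16 ms rotational-latency range):

```python
# Latency figures from the slide, all in nanoseconds.
LIGHT_PER_FOOT_NS = 1        # speed of light ~ 1 ft/ns
DRAM_NS = 70                 # DRAM access ~ 70 ns
DISK_MIN_NS = 2_000_000      # rotational latency, low end (2 ms)
DISK_MAX_NS = 16_000_000     # rotational latency, high end (16 ms)

# Each level is orders of magnitude slower than the one above it.
print(f"A DRAM access costs {DRAM_NS // LIGHT_PER_FOOT_NS} feet of light travel")
print(f"A disk access costs {DISK_MIN_NS // DRAM_NS}-{DISK_MAX_NS // DRAM_NS} DRAM accesses")
```

Tens of thousands of DRAM accesses fit inside one rotational delay, which is why latency, not bandwidth, is called the last hardware problem here.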
  5. Processors
     • Xeon DP (no L3) = 55M transistors
     • Xeon MP (2MB L3) = 108M transistors
     • Intel 1B-transistor process – 20nm, 80 atoms
  6. Power
     • Desktop & server – ignore power, go for performance: Pentium, Itanium, Opteron, PPC 970
     • Embedded – balanced power & performance: MIPS
     • Handheld – power is paramount: ARM (XScale, OMAP, …), AMD Alchemy, PPC 405LP
  7. 64 bit
     • AMD Opteron – x86-64
     • Intel Itanium
     • IBM/Apple PowerPC 970
     • UltraSPARC
     • MIPS64 – Broadcom, PMC-Sierra
     • SuperH – SH-5
  8. Deep vs Wide
     • ILP – instruction-level parallelism: deep pipelining, speculative execution – more ops/clock
     • CLP (VLIW) – compiler-level parallelism: more ops/inst – Itanium, Transmeta
     • TLP – thread-level parallelism: Intel Hyper-Threading, Sun Niagara
     • OLP – OS-level parallelism: Vanderpool, VMware
     • NLP – network-level parallelism: clusters
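Of these levels, TLP is the one ordinary application code can reach directly; a minimal sketch in Python, using OS threads to stand in for the hardware threads of Hyper-Threading or Niagara (the workload and thread count are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # One independent instruction stream, as one hardware thread would run.
    return sum(chunk)

data = list(range(1_000_000))
chunks = [data[i::4] for i in range(4)]   # split the work four ways

# Thread-level parallelism: four streams make progress concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))

print(total == sum(data))  # True
```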
  9. Beyond Price/Performance
     • What matters when hardware is free?
     • Density, cooling, cabling
     • Network computing
     • Virtualization
     • Consolidation
     • Autonomic computing
     • Recovery-oriented computing
 10. Blade Servers
     • Address density, power, & cabling problems in the datacenter
     • Blades & clusters should be made for each other
     • Choose the best price/performance processors and replicate
     • But too much profit is at stake in servers – exotic processors are where the $$s are
 11. Network Computing
     • Clusters – Oracle, Top500, J2EE: performance, availability, or both? Homogeneous, local area
     • Web services – XML everywhere
     • Grid computing – how many supercomputers can one scientist use? Heterogeneous, wide area
     • Utility computing – pay by the play
 12. Virtualization
     • “Any problem in Computer Science can be solved with another level of indirection” – Butler Lampson
 13. Virtualization: Seeing What Is Not There
     • Virtual drives – RAID, LUN mapping, etc.
     • Networks – VLANs, VPNs, proxies, etc.
     • Processors – VMware, Hyper-Threading, Vanderpool
     • Software emulations – VirtualPC, etc.
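The common thread in all of these is Lampson's indirection: the client names a virtual resource and a map supplies the real one. A toy sketch of LUN-style block mapping (the drive names and block layout are invented for illustration):

```python
# Hypothetical virtual-to-physical block map: the host addresses LUN 0
# contiguously, while the real blocks live on two different drives.
lun_map = {
    ("lun0", 0): ("driveA", 512),   # virtual block 0 -> driveA block 512
    ("lun0", 1): ("driveB", 7),     # virtual block 1 -> driveB block 7
}

def read_block(lun, vblock):
    # One level of indirection: translate the virtual address, then read.
    drive, pblock = lun_map[(lun, vblock)]
    return f"data from {drive}:{pblock}"

print(read_block("lun0", 1))  # the host never sees driveB
```

Swapping entries in the map (for a rebuild, a migration, a snapshot) changes what the host sees without the host changing at all; the same shape underlies VLANs and VM monitors.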
 14. The Management Challenge
     • Virtualization creates new management problems
     • At least doubles the number of managed objects
     • Creates new security risks, …
 15. Consolidation
     • Cost control rules today
     • Systems have greater capacity
     • IT always wants fewer things to manage
     • Easily managed systems never seem to appear
     • Better networking enables central services
 16. Recovery-Oriented Computing
     Aaron Brown, Dan Hettenna, David Oppenheimer, Noah Treuhaft, Leonard Chung, Patty Enriquez, Susan Housand, Archana Ganapathi, Dan Patterson, Jon Kuroda, Mike Howard, Matthew Mertzbacher, Dave Patterson, and Kathy Yelick – University of California at Berkeley
     In cooperation with George Candea, James Cutler, and Armando Fox – Stanford University
 17. ROC: Goals and Assumptions of the Last 15 Years
     • Goal #1: Improve performance
     • Goal #2: Improve performance
     • Goal #3: Improve cost-performance
     • Assumptions:
       – Humans are perfect (they don’t make mistakes during installation, wiring, upgrade, maintenance, or repair)
       – Software will eventually be bug-free (good programmers write bug-free code; debugging works)
       – Hardware MTBF is already very large (~100 years between failures) and will continue to increase
 18. Today, After 15 Years of Improving Performance
     • Availability is now the vital metric for servers – near-100% availability is becoming mandatory for e-commerce, enterprise apps, online services, ISPs
     • But service outages are frequent – 65% of IT managers report that their websites were unavailable to customers over a 6-month period; 25% report 3 or more outages
     • Outage costs are high – social effects (negative press, loss of customers who “click over” to a competitor); $500,000 to $5,000,000 per hour in lost revenues
 19. New Goals: ACME
     • Availability – failures are common; traditional fault tolerance doesn’t solve the problems
     • Change – in back-end system tiers, software upgrades are difficult, failure-prone, or ignored; for application service over the WWW, daily change
     • Maintainability – human operator error is the single largest failure source? System maintenance environments are unforgiving
     • Evolutionary growth – 1U-PC cluster front-ends scale and evolve well; back-end scalability is still limited
 20. Recovery-Oriented Computing Philosophy
     • Failures are a fact, and recovery/repair is how we cope with them
     • Improving recovery/repair improves availability:
       – UnAvailability = MTTR / MTTF
       – 1/10th the MTTR is just as valuable as 10X the MTBF
     • If a major sysadmin job is recovery after failure, ROC also helps with system administration
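The arithmetic behind the second bullet is worth spelling out; a small sketch with invented repair and failure times:

```python
def unavailability(mttr_hours, mttf_hours):
    # UnAvailability = MTTR / MTTF, per the slide.
    return mttr_hours / mttf_hours

base       = unavailability(10, 10_000)   # 10 h to repair, failure every ~14 months
tenx_mttf  = unavailability(10, 100_000)  # 10X the time between failures
tenth_mttr = unavailability(1, 10_000)    # 1/10th the repair time

# The two improvements land at exactly the same unavailability.
print(tenx_mttf == tenth_mttr)  # True
```

Cutting repair time is often far cheaper than stretching time-between-failures, which is the ROC argument for investing in recovery.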
 21. Software Complexity
     • It’s in software because it couldn’t be specified precisely
     • Too many ways to write buggy code
     • Interfaces (APIs, ABIs, protocols, …) are not stable
     • Co-resident software modules interact in weird ways – combinatorial explosion in testing
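The combinatorial explosion in the last bullet is easy to quantify; a sketch with an invented five-module stack:

```python
from math import prod

# Hypothetical co-resident modules and how many versions of each circulate.
module_versions = {"kernel": 4, "libc": 3, "jvm": 5, "app_server": 6, "database": 4}

# Every combination of versions is a distinct configuration to test.
configs = prod(module_versions.values())
print(configs)  # 4 * 3 * 5 * 6 * 4 = 1440
```

Five modules with a handful of versions each already yield over a thousand configurations; real stacks have far more of both, so exhaustive testing is hopeless.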
 22. Leaning Tower of APIs
     • APIs are rarely stable or upward compatible
     • Any app uses lots of APIs
     • APIs capture developers by keeping them busy!
     • APIs are meaningless to anyone except developers
 23. Dedicated Servers
     • Each major application dictates the exact versions & patches of the underlying middleware & OS
     • Therefore, each app requires its own OS image
     • Each OS requires its own server
 24. Appliances
     • Traditionally, the end user or VAR integrates hardware, middleware, and application
     • If this is done by the product vendor, you have an appliance
     • Appliance hardware can be configured for the best value for the application
 25. Linux Rising
     • Linux == Recall Windows! Unfortunately, 136 candidates on the ballot: Red Hat, SuSE, Debian, Mandrake, MontaVista, …
     • “Write once, run anywhere” is truer with Linux than with Java, but applies at the source level
     • The leaning tower of APIs can be “patched”