
SC03 Sun Microsystems Keynote

Adrian Cockcroft
November 19, 2022

I wrote and presented this deck at Supercomputing 2003 in Phoenix AZ when I was the Chief Architect for the High Performance Technical Computing team at Sun. Since then some things have changed, but also a lot has stayed the same...

Transcript

  1. Adrian Cockcroft, HPTC Chief Architect, Sun Microsystems Inc., 11/24/03

    High Performance Technical Computing Solutions: “Grid Everywhere” (www.sun.com/hptc)
  2. Challenges

    • Accelerate innovation
    • Accelerate competitive advantage
      – For Sun
      – For customers
    • Accelerate research to solution
    • Accelerate solution to mainstream
      – Reproducible results
      – Lower delivered cost
    Not just technology challenges... Organizational challenges...
  3. HPTC Go To Market Program: “Grid Everywhere” Campaign

    Sun-Wide Alignment and Focus; Demand Creation; Marketing Program; Solutions Development
    Stay tuned for “Grid Everywhere”, rolling out during FY04...
  4. HPTC Technical Strategy

    Make it easier to sell and use HPTC solutions
    • Hardware: open, commodity, flexible, scalable
    • Software: Linux nodes, Solaris infrastructure
    • Grid: evolve from Enterprise Grid to Global Grid
    • Interconnect: map out many alternatives
    • Developer: added focus on Java Web/Grid Services
    • Early Adopter: streamlined solutions development
    • DARPA HPCS: internal mindshare and funding
    Leverage Sun and Partner Products into Solutions
  5. HPTC As an Early Adopter Market

    Willingness to engage to solve problems
    Characteristics
    • Technically advanced end users and developers
    • Deep understanding of technology
    • Ability to figure out solutions to problems
    • Ability to optimize applications to the system
    • Sales and support interactions need very experienced Sun staff
    • Requirement to partner and co-develop
  6. Early Adopter Example: Interval Math

    Coordinate research, partner and take it to market
    Interval Components
    • Sun Fortran has Interval Datatype
    • Solver libraries are under development
    • Interval algorithms exist
    The “Interval Problem”
    • The textbooks say you can't solve nonlinear optimization problems – true for point solutions only!
    • Hard to convince/explain what this means...
  7. Hill Climb Optimization Example

    Start by looking at the conventional optimization method for an over-simplified one-dimensional example.
    [Figures: “Hill” and “Sample the Hill”, x axis 0 to 11]
  8. Hill Climb Optimization Example

    Finds a high point on the hill, but not always the highest, using a naive algorithm.
    [Figures: “Narrow to required resolution” and “Evaluate either side and walk to top of hill”, x axis 0 to 11]
  9. Hill Climb Optimization Example

    Try the same algorithm on a more difficult dataset, with very narrow and specific optimal solutions. It's no longer a robust solution.
    [Figures: “Telegraph poles on the prairie” and “Sample Prairie”, x axis 0 to 0.9]
  10. Hill Climb Optimization Example

    Zoom in on a local high point, but it doesn't find a good solution.
    [Figures: “Narrow to required resolution” and “Evaluate either side and walk to top of hill”, x axis 0 to 0.9]
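The failure described on slides 7 to 10 can be reproduced with a minimal sketch. Everything specific here is an assumption made for illustration: the `prairie` function, the pole positions, heights and width, and the `hill_climb` helper are hypothetical and not taken from the deck. The sketch samples the function coarsely, starts from the best sample, and walks uphill by evaluating either side.

```python
# Hypothetical "telegraph poles on the prairie": narrow peaks at assumed
# positions and heights (centre, height). None of these values are from the slides.
POLES = [(0.137, 0.6), (0.305, 0.3), (0.652, 1.0)]
WIDTH = 0.001  # assumed half-width of each pole


def prairie(x):
    """Sum of narrow peaks; the tallest (height 1.0) sits at x = 0.652."""
    return sum(h / (1.0 + ((x - c) / WIDTH) ** 2) for c, h in POLES)


def hill_climb(f, lo, hi, samples=10, step=0.001, max_steps=100_000):
    """Coarsely sample f on [lo, hi], then walk uphill from the best sample."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    x = max(xs, key=f)                       # best coarse sample
    for _ in range(max_steps):
        left, right = max(lo, x - step), min(hi, x + step)
        best = max((left, x, right), key=f)  # evaluate either side
        if f(best) <= f(x):                  # no neighbour is higher: stop
            break
        x = best
    return x, f(x)


if __name__ == "__main__":
    x, fx = hill_climb(prairie, 0.0, 0.9)
    print(f"hill climb stops at x={x:.4f}, f={fx:.3f}")
    print(f"tallest pole: f(0.652)={prairie(0.652):.3f}")
```

With these assumed poles, the coarse sample nearest a pole happens to sit next to the shortest one, so the walk settles on a local high point and never sees the tallest pole.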
  11. Interval Optimization Example

    Interval solver tells you the range of the possible solution over an interval. Here we split the data into four intervals and evaluate.
    [Figures: “Telegraph poles on the prairie” and “Prairie Interval ranges”, x axis 0 to 0.9]
  12. Interval Optimization Example

    Keep zooming in until the result is a small enough interval, and you end up with a deterministic and correctly bounded solution.
    [Figures: “Zoom in on Maximum” and “Interval Driven True Solution”, annotated “The Answer!”, x axis 0 to 0.9]
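The interval approach of slides 11 and 12 can be sketched in the same spirit. This is not Sun's interval solver or the Fortran interval datatype: the pole data, `pole_range`, `prairie_range` and `interval_maximize` are hypothetical, and plain floats are used without the outward rounding a real interval library applies, so the bounds are illustrative rather than rigorous. The sketch splits the domain into four intervals, bounds the function over each, and keeps bisecting the interval with the highest upper bound.

```python
import heapq

POLES = [(0.137, 0.6), (0.305, 0.3), (0.652, 1.0)]  # assumed (centre, height)
WIDTH = 0.001                                       # assumed pole half-width


def pole_range(a, b, centre, height):
    """Range of height / (1 + ((x - centre) / WIDTH)**2) over x in [a, b]."""
    far = max(abs(a - centre), abs(b - centre))
    near = 0.0 if a <= centre <= b else min(abs(a - centre), abs(b - centre))
    return (height / (1.0 + (far / WIDTH) ** 2),
            height / (1.0 + (near / WIDTH) ** 2))


def prairie_range(a, b):
    """Enclosure of the summed poles on [a, b]: add the per-pole ranges."""
    ranges = [pole_range(a, b, c, h) for c, h in POLES]
    return sum(r[0] for r in ranges), sum(r[1] for r in ranges)


def interval_maximize(lo, hi, pieces=4, tol=1e-6):
    """Best-first interval search; returns (box, (value_low, value_high))."""
    best_low = float("-inf")   # highest guaranteed value seen so far
    heap = []                  # max-heap on the upper bound (stored negated)

    def push(a, b):
        nonlocal best_low
        low, high = prairie_range(a, b)
        best_low = max(best_low, low)
        heapq.heappush(heap, (-high, a, b))

    width = (hi - lo) / pieces
    for i in range(pieces):
        push(lo + i * width, lo + (i + 1) * width)

    while True:
        neg_high, a, b = heapq.heappop(heap)
        if b - a <= tol:       # most promising box is small enough
            return (a, b), (best_low, -neg_high)
        mid = (a + b) / 2.0
        push(a, mid)
        push(mid, b)


if __name__ == "__main__":
    box, bounds = interval_maximize(0.0, 0.9)
    print("candidate box:", box)
    print("maximum value lies within:", bounds)
```

Because every box carries an upper bound on the function, no narrow pole can hide between sample points, which is the property the slides contrast with hill climbing. A production version would prune boxes whose upper bound falls below `best_low` and would use properly rounded interval arithmetic.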
  13. Architecture

    It's not just the technology... It's how you put it together...
  14. Workload Characteristics

    Workloads are vastly different, which drives different solutions
    High Performance for Technical
    • Streaming, low variance, high utilization
    • Measure peak CPU Gflops, SPECfp, Gbytes/sec
    • Efficient, runs at the peak capacity of the system
    High Performance for Commercial
    • More transactions, faster sub-second response
    • Bursty, queueing effects, high variance, low utilization
    • CPU bound benchmarks (e.g. TPC) not relevant to real world
    • Mostly I/O latency bound on disk or network
    • Very inefficient use of system capacity
    [Figure: Technical vs. Commercial, scale to 100%]
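One way to see why queueing effects push commercial systems toward low utilization is the textbook M/M/1 queue. The slide does not name a queueing model, so the formula below and the `response_time` helper are assumptions chosen for illustration: mean response time equals service time divided by (1 - utilization).

```python
def response_time(service_time, utilization):
    """Mean response time of an M/M/1 queue (an assumed, simplified model)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    for u in (0.1, 0.5, 0.9, 0.99):
        print(f"utilization {u:4.0%}: response {response_time(0.1, u):.2f} s")
```

Under this model a 0.1 s request takes twice as long at 50% utilization and ten times as long at 90%, so a sub-second response target forces spare capacity. A streaming technical job has no such constraint and can run near peak.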
  15. Capability and Capacity Computing

    Scale Vertically (Capability): cache-coherent shared-memory multi-processors (SMP), Single OS Instance
    • Tightly-coupled: highest bandwidth, lowest latency
    • Large workloads: ad-hoc transaction processing, data warehousing
    • Shared pool to 100 processors
    • Single Terabyte scale memory
    Scale Horizontally (Capacity): cluster multi-processor, Multiple OS Instances, Cluster Mgmt.
    • Loosely coupled
    • Standard H/W & S/W
    • Highly parallel (web, some HPTC)
    [Diagrams: Proc, Mem and I/O blocks joined by a Memory Switch and by a Network Switch]
  16. Vertical vs. Horizontal Workloads

    Scale Vertically
    • Commercial Workloads
      – Large databases
      – Transactional databases
      – Data warehouses
    • HPTC Workloads
      – Climate modeling
      – Data mining
      – Signal Processing
      – Cryptanalysis
      – Nuclear simulation
      – Some structural analysis
      – EDA full assembly simulation
    Scale Horizontally
    • Commercial Workloads
      – Web servers, Firewalls
      – Proxy servers, Directories
      – SSL, VPN
      – Media streaming
      – XML processing
    • HPTC Workloads
      – Seismic analysis
      – Genomics
      – Computational Fluid Dynamics
      – EDA sub-assembly simulation
      – Some Structural Analysis
      – Crash Testing
  17. Workload Performance Factors

    • Processor speed, capacity and throughput
    • Memory capacity
    • System interconnect latency & bandwidth
    • Network and storage I/O
    • Operating system scalability
    • Visualization performance and quality
    • Optimized applications
    • Network service availability
    #1 issue for real world cluster performance and scaling
  18. Grid Solutions Overview

    [Diagram labels: Visualization Storage Integration; Global Grid Services & Portal; Compute Grid; Data Grid; Visual Grid; Applications & Users]
  19. [Diagram labels, in extraction order]

    Sun Grid Engine Portal & Sun ONE Portal Server; Storage Systems; Desktops and Information Appliances; Solaris/Linux Operating Environment; N1; Sun Mgmnt Center; Sun Control Station; Sun Grid Engine; Sun ONE Developer Studio; Sun HPC Cluster Tools; Sun Grid Services Environment; Web User Interface; Throughput and HPC Clusters, Enterprise Servers; Global Grid Layer; SysAdmin Tools; Distributed Workload Management; Development Tools; Sun ONE Web Services; Globus/Avaki OGSA
  20. Data Grid

    [Diagram labels: Sun StorEdge™ Performance Suite; Sun™ Cluster; Heterogeneous Client; Sun StorEdge™ Utilization Suite; Sun StorEdge™ 3900 Series; Sun StorEdge QFS Shared File Systems; Solaris Linux IRIX AIX Solaris Win2K, NT HP-UX Future; Achieved 3 GB/Sec!; HPC SAN; Professional Services]
  21. Graphics Grid: Access for More Users to Visualization Services at Required Visual Quality and Performance Levels

    [Diagram labels: Storage; Compute; Display; Clients; Visualization; SAN/NAS; Graphics InterConnect; Digital Video Delivery; Compute Cluster; Visualization Services Over LAN/WAN]
  22. Interconnect Components

    Scale Vertically (Capability) or Scale Horizontally (Capacity)?
    Interconnects
    • Sun Fire Link: 4.8 GB/s, 4 µs latency
    • GBE: 100 MB/s, 100 µs latency
    • Myrinet: 400 MB/s, 4 µs latency
    • Infiniband: 800 MB/s, 8 µs latency
    Scale Vertically
    • Parallel applications: OpenMP
    • Large Shared Memory
    • Top Performance
    • Higher acquisition cost
    • Lower development and management complexity
    Scale Horizontally
    • Serial and parallel applications: MPI
    • Throughput
    • Lower acquisition cost
    • Higher development and management complexity
    [Diagram labels: V480 V210 V60X SF4800 V1280 V880 V480 SF15K SF12K SF6800; Interdependent Threads; Cluster Performance]
    The Deciding Factor: What do the workloads require?
  23. Interconnect Components: Mapping Out Bandwidth and Latency

    Interconnect     Bandwidth         Latency          Cost
    Ethernet         0.1 GB/s          100 - 10 µs      $xxx
    Myrinet/IB/FL    0.4 - 4.8 GB/s    10 - 1 µs        $x,xxx
    Memory           9.6 - 57 GB/s     1 - 0.1 µs       $xx,xxx
    CMT On-Chip      100 - x00 GB/s    0.1 - 0.01 µs    $xxx?
    [Chart: Bandwidth in Gigabytes/sec on a logarithmic scale (0.1 to 1000) against Latency on an inverted log scale (100us to 10ns); labels: Proximity, System Call, Library Call, Load/Store Instruction]
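To make the bandwidth and latency figures concrete, here is a small sketch using the numbers quoted on slide 22. The cost model is an assumption, not something the deck states: transfer time is taken as latency plus message size divided by bandwidth, with 1 MB/s read as 10^6 bytes/s. The function names are hypothetical.

```python
# Bandwidth (bytes/s) and latency (s) as quoted on slide 22.
INTERCONNECTS = {
    "GBE": (100e6, 100e-6),
    "Myrinet": (400e6, 4e-6),
    "Infiniband": (800e6, 8e-6),
    "Sun Fire Link": (4.8e9, 4e-6),
}


def transfer_time(name, message_bytes):
    """Assumed model: time = latency + message size / bandwidth."""
    bandwidth, latency = INTERCONNECTS[name]
    return latency + message_bytes / bandwidth


def break_even_size(name):
    """Message size at which latency equals pure transmission time."""
    bandwidth, latency = INTERCONNECTS[name]
    return bandwidth * latency


if __name__ == "__main__":
    for name in INTERCONNECTS:
        small = transfer_time(name, 1_000) * 1e6
        large = transfer_time(name, 1_000_000) * 1e6
        print(f"{name:14s} 1 KB: {small:8.1f} us   1 MB: {large:9.1f} us   "
              f"break-even: {break_even_size(name):8.0f} bytes")
```

Under this model, messages smaller than the break-even size are dominated by latency and larger ones by bandwidth, which is one way to read the slide's question of what the workloads require.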
  24. A Complete Compute Platform Solution

    Proven and Repeatable Reference Architectures
    [Diagram labels: Servers; Workstations; Control Network (Gigabit Ethernet); Data Network (Gigabit Ethernet); Sun StorEdge storage solutions (Direct-attached, NAS, HA-NFS, HPTC SAN); Sun ONE Grid Engine; Sun Compute Grid rack systems; Sun Cluster Grid Manager]
  25. Infrastructure Partners

    • Grid partners
      – Altair Engineering
      – Avaki
      – Engineous
      – Globus
      – GridIron
      – GridXpert
      – Meiosys
      – Platform
    • Interconnect partners
      – Force10
      – Infinicon
      – Mellanox
      – Myrinet
      – Topspin
    • Data grid partners
      – Instrumental
      – Precision I/O
      – Qlogic
  26. ISV Application Partners

    • Energy
      – CGG
      – Landmark
      – Paradigm Geotechnology
      – Schlumberger/Geoquest
    • Life Sciences
      – Accelrys
      – Gene Logic
      – MDL
      – Oracle
      – Spotfire
    • Manufacturing
      – Ansys
      – Computational Dynamics
      – ESI
      – Fluent
      – LSTC
      – MSC.Software
    • Visualization / Analysis
      – AVS
      – CEI
      – EDS
      – ICEM
      – Multigen/Paradigm
      – VNI
  27. A Complete Solution

    • Focused on key business and technical challenges
    • Based on proven and repeatable reference architectures
      – Optimized for specific industries and applications
    • Adoption and deployment assistance at all levels of Grid Computing
      – Cluster and enterprise grids up to global grids
    Hardware, Software, and Services
    • Services: architecture, assessment, implementation, and training
    • Grid Computing Reference Architectures: proven and repeatable methodologies
    • Hardware: workstations, servers, complete rack systems, and storage
    • Software: Sun ONE Grid Engine Family, Sun Control Station, Grid Engine Portal
    • Sun-Certified Grid Computing Partners
  28. Execution Strategy

    • Clear ambitious vision
      – Identify and push back on obstacles
    • Efficient engagement across Sun
      – Use SunShot, SunCAP, SunSigma toolsets
    • Clear scope and ownership
      – HPTC "owns" technical markets, Grid, and early adopter commercial markets for Sun
  29. Call to Action

    • Take advantage of Sun's new focus on Grid to build partnerships between early adopters and HPTC
    • Learn to speak the new languages of Grid, e.g. Java based Open Grid Services Architecture (OGSA) and Globus, Avaki and other Web Services
    • Feedback: Tell Sun's HPTC group what works, what doesn't, where the opportunities lie...