I wrote and presented this deck at Supercomputing 2003 in Phoenix, AZ, when I was the Chief Architect for the High Performance Technical Computing team at Sun. Some things have changed since then, but a lot has stayed the same...
HPTC Go To Market Program: “Grid Everywhere” Campaign Sun-Wide Alignment and Focus Demand Creation Marketing Program Solutions Development Stay tuned for “Grid Everywhere” – rolling out during FY04.....
HPTC Technical Strategy Make it easier to sell and use HPTC solutions Hardware – Open, commodity, flexible, scalable Software – Linux nodes, Solaris infrastructure Grid - Evolve from Enterprise Grid to Global Grid Interconnect – Map out many alternatives Developer – added focus on Java Web/Grid Services Early Adopter - streamlined solutions development DARPA HPCS - internal mindshare and funding Leverage Sun and Partner Products into Solutions
HPTC As an Early Adopter Market Willingness to engage to solve problems Characteristics Technically advanced end users and developers Deep understanding of technology Ability to figure out solutions to problems Ability to optimize applications to the system Sales and support interactions need very experienced Sun staff Requirement to partner and co-develop
Early Adopter Example: Interval Math Coordinate research, partner and take it to market Interval Components Sun Fortran has Interval Datatype Solver libraries are under development Interval algorithms exist The “Interval Problem” The textbooks say you can't solve nonlinear optimization problems – true for point solutions only!! Hard to convince/explain what this means....
Hill Climb Optimization Example Start by looking at the conventional optimization method for an over-simplified one-dimensional example [Figures: "Hill"; "Sample the Hill"; x-axis 0 to 11]
Hill Climb Optimization Example A naive algorithm finds a high point on the hill, but not always the highest [Figures: "Narrow to required resolution"; "Evaluate either side and walk to top of hill"; x-axis 0 to 11]
Hill Climb Optimization Example Try the same algorithm on a more difficult dataset, with very narrow and specific optimal solutions – it's no longer a robust solution [Figures: "Telegraph poles on the prairie"; "Sample Prairie"; x-axis 0 to 0.9]
Hill Climb Optimization Example Zoom in on a local high point, but it doesn't find a good solution [Figures: "Narrow to required resolution"; "Evaluate either side and walk to top of hill"; x-axis 0 to 0.9]
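The sample-then-climb method from the hill climb slides can be sketched roughly as follows. This is a minimal illustration, not the code behind the original charts: the `hill` and `prairie` functions, the pole positions and heights, and the name `sample_and_climb` are all hypothetical stand-ins chosen to mimic the two datasets.

```python
import math

def hill(x):
    """Smooth single hill on [0, 11], peak at x = 6 (hypothetical data)."""
    return math.exp(-((x - 6.0) / 3.0) ** 2)

def prairie(x):
    """Gently sloping prairie on [0, 0.9] with three narrow poles (hypothetical data)."""
    poles = [(0.213, 0.6), (0.637, 1.0), (0.763, 0.8)]  # (position, height)
    width = 0.002
    return 0.01 * x + sum(h * math.exp(-((x - p) / width) ** 2) for p, h in poles)

def sample_and_climb(f, lo, hi, samples, resolution=1e-4):
    """Sample f on a coarse grid, then walk uphill from the best sample."""
    step = (hi - lo) / (samples - 1)
    # Sample the function and start from the highest sample point.
    x = max((lo + i * step for i in range(samples)), key=f)
    while step > resolution:
        # Evaluate either side; ties keep the current point.
        left, right = max(lo, x - step), min(hi, x + step)
        best = max((x, left, right), key=f)
        if best == x:
            step /= 2  # narrow to the required resolution
        else:
            x = best   # walk toward the top of the hill
    return x, f(x)

hill_x, hill_y = sample_and_climb(hill, 0.0, 11.0, samples=12)
prairie_x, prairie_y = sample_and_climb(prairie, 0.0, 0.9, samples=10)
print("hill:", hill_x, hill_y)
print("prairie:", prairie_x, prairie_y)
print("tallest pole:", prairie(0.637))
```

With these made-up datasets the climb lands on the peak of the smooth hill, but on the prairie every coarse sample falls between the poles, so the walk settles on the gentle slope and never sees the tallest pole.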
Interval Optimization Example An interval solver tells you the range of the possible solution over an interval. Here we split the data into four intervals and evaluate [Figures: "Telegraph poles on the prairie"; "Prairie Interval ranges"; x-axis 0 to 0.9]
Interval Optimization Example Keep zooming in until the result is a small enough interval, and you end up with a deterministic and correctly bounded solution [Figures: "Zoom in on Maximum"; "Interval Driven True Solution", callout "The Answer!"; x-axis 0 to 0.9]
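The interval approach can be sketched as a small branch-and-bound loop: bound the function over each interval, discard intervals that cannot contain the maximum, split the rest, and repeat. The sketch below is an illustration under stated assumptions, not Sun's interval solver: the prairie function and its pole data are hypothetical, the bounds in `prairie_range` are derived by hand for this one function, and ordinary floating point is used, so the outward rounding that a real interval datatype provides is ignored.

```python
import math

POLES = [(0.213, 0.6), (0.637, 1.0), (0.763, 0.8)]  # hypothetical (position, height)
WIDTH = 0.002   # hypothetical pole width
SLOPE = 0.01    # hypothetical gentle slope of the prairie

def prairie_range(a, b):
    """Return (low, high) bounds on the prairie function over [a, b]."""
    low, high = SLOPE * a, SLOPE * b  # the linear term is increasing
    for p, h in POLES:
        nearest = min(max(p, a), b)                    # point of [a, b] closest to the pole
        farthest = a if abs(a - p) > abs(b - p) else b  # endpoint farthest from the pole
        high += h * math.exp(-((nearest - p) / WIDTH) ** 2)
        low += h * math.exp(-((farthest - p) / WIDTH) ** 2)
    return low, high

def interval_maximize(lo, hi, pieces=4, tol=1e-6):
    """Shrink [lo, hi] to a narrow enclosure of the global maximum."""
    step = (hi - lo) / pieces
    boxes = [(lo + i * step, lo + (i + 1) * step) for i in range(pieces)]
    while True:
        ranges = [prairie_range(a, b) for a, b in boxes]
        # The maximum is at least the largest lower bound seen so far.
        best_low = max(r[0] for r in ranges)
        # Discard intervals whose upper bound cannot reach that value.
        keep = [(box, r) for box, r in zip(boxes, ranges) if r[1] >= best_low]
        if all(b - a <= tol for (a, b), _ in keep):
            x_bounds = (min(a for (a, _), _ in keep), max(b for (_, b), _ in keep))
            value_bounds = (best_low, max(r[1] for _, r in keep))
            return x_bounds, value_bounds
        # Zoom in: split every surviving interval in half.
        boxes = [half for (a, b), _ in keep
                 for half in ((a, (a + b) / 2), ((a + b) / 2, b))]

(x_lo, x_hi), (v_lo, v_hi) = interval_maximize(0.0, 0.9)
print("maximum lies in x range:", x_lo, x_hi)
print("maximum value lies in:", v_lo, v_hi)
```

Because every discarded interval is provably below a value the function is known to reach, the narrow poles cannot be missed the way point sampling can miss them.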
Architecture It's not just the technology.... It's how you put it together....
Workload Characteristics Workloads are vastly different, which drives different solutions High Performance for Technical Streaming, low variance, high utilization Measure peak CPU Gflops, SPECfp, Gbytes/sec Efficient, runs at the peak capacity of the system High Performance for Commercial More transactions, faster sub-second response Bursty, queueing effects, high variance, low utilization CPU bound benchmarks (e.g. TPC) not relevant to real world Mostly I/O latency bound on disk or network Very inefficient use of system capacity [Charts: "Technical" and "Commercial", each on a scale to 100%]
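One way to see why queueing effects push commercial systems toward low utilization is the textbook single-server M/M/1 queue, where mean response time is R = S / (1 - U) for service time S and utilization U. The slide does not name a queueing model, so M/M/1 is an assumption here, and the function name and sample numbers are purely illustrative.

```python
def mm1_response_time(service_time, utilization):
    """Mean response time of an M/M/1 queue: R = S / (1 - U)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)

# Illustrative numbers only: a 0.1 second service time at rising utilization.
for u in (0.1, 0.5, 0.9, 0.99):
    print(f"utilization {u:.0%}: response {mm1_response_time(0.1, u):.2f} s")
```

Under this model, response time doubles at 50% utilization and grows tenfold at 90%, so a system that must stay sub-second cannot be run near its peak capacity the way a streaming technical workload can.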
Workload Performance Factors ● Processor speed, capacity and throughput ● Memory capacity ● System interconnect latency & bandwidth ● Network and storage I/O ● Operating system scalability ● Visualization performance and quality ● Optimized applications ● Network service availability #1 issue for real world cluster performance and scaling
Sun Grid Engine Portal & Sun ONE Portal Server Storage Systems Desktops and Information Appliances Solaris/Linux Operating Environment N1 Sun Mgmnt Center Sun Control Station Sun Grid Engine Sun ONE Developer Studio Sun HPC Cluster Tools Sun Grid Services Environment Web User Interface Throughput and HPC Clusters, Enterprise Servers Global Grid Layer SysAdmin Tools Distributed Workload Management Development Tools Sun ONE Web Services Globus/Avaki OGSA
Data Grid Sun StorEdge™ Performance Suite Sun™ Cluster Heterogeneous Client Sun StorEdge™ Utilization Suite Sun StorEdge™ 3900 Series Sun StorEdge QFS Shared File Systems Solaris Linux IRIX AIX Solaris Win2K, NT HP-UX Future Achieved 3 GB/Sec! HPC SAN Professional Services
Graphics Grid: Access for More Users to Visualization Services at Required Visual Quality and Performance Levels Storage Compute Display Clients Visualization SAN/NAS Graphics InterConnect Digital Video Delivery Compute Cluster Visualization Services Over LAN/WAN
A Complete Solution ● Focused on key business and technical challenges ● Based on proven and repeatable reference architectures – Optimized for specific industries and applications ● Adoption and deployment assistance at all levels of Grid Computing – Cluster and enterprise grids up to global grids Hardware, Software, and Services Services: Architecture, assessment, implementation, and training Grid Computing Reference Architectures: Proven and repeatable methodologies Hardware: Workstations, servers, complete rack systems, and storage Software: Sun ONE Grid Engine Family, Sun Control Station, Grid Engine Portal Sun-Certified Grid Computing Partners
Execution Strategy Clear ambitious vision Identify and push back on obstacles Efficient engagement across Sun Use SunShot, SunCAP, SunSigma toolsets Clear scope and ownership HPTC "owns" technical markets, Grid, and early adopter commercial markets for Sun
Call to Action ● Take advantage of Sun's new focus on Grid to build partnerships between early adopters and HPTC ● Learn to speak the new languages of Grid, e.g. Java based Open Grid Services Architecture (OGSA) and Globus, Avaki and other Web Services ● Feedback: Tell Sun's HPTC group what works, what doesn't, where the opportunities lie...