
Data Warehouse Appliances

Shankar
February 15, 2011

A Point Of View on Data Warehouse Appliances


Transcript

  1. Topics
    • What is a Data Warehouse Appliance?
    • How does a DW Appliance work?
    • ‘Shared Nothing’ Architecture
    • Query Processing Requirements
    • Migration path from Oracle
    • Market Trends
    • Appliance Vendors
    • Appliance Scorecard
  2. Data Warehouse Appliance: What is it?
    • An integrated set of servers, storage, operating system(s), DBMS and software, pre-installed and pre-optimized specifically for data warehousing (DW).
    • Typically supplied on a preconfigured set of hardware as a complete system, acting as a true appliance.
    • The term can also apply to software-only systems that are purportedly easy to install on specific hardware configurations.
    Drivers
    • Performance
    • Scalability
    • Cost
    • Availability
    • Portability
  3. DW Appliance Performance: How does it do that?
    • Massively Parallel Processing (MPP)
        • Data Warehouse Appliance architectures provide high query performance and platform scalability
        • Most MPP architectures implement a "shared-nothing architecture" where each server operates self-sufficiently and controls its own memory and disk
    • Basis of MPP Scalability
        • Divide the data, workload, and system resources evenly among many parallel processing units
        • No single point of control for any operation
            • I/O, Buffers, Locking, Logging, Dictionary
        • Nothing centralized
        • Nothing in the way of linear scalability
    [Diagram: DW Appliance, with Logs, Locks, Buffers and I/O]
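The "divide the data and workload evenly" idea on this slide can be sketched in a few lines. This is a minimal illustration, assuming hash distribution on a key column; the function names, the `SALES` rows and the use of CRC32 are all hypothetical and do not describe any vendor's implementation. The nodes are simulated one after another here, whereas an appliance would run them concurrently.

```python
from collections import defaultdict
from zlib import crc32


def node_for(key: str, num_nodes: int) -> int:
    """Pick the owning node from a stable hash of the distribution key."""
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes):
    """Split rows so each node holds only its own slice (nothing shared)."""
    nodes = [[] for _ in range(num_nodes)]
    for customer, amount in rows:
        nodes[node_for(customer, num_nodes)].append((customer, amount))
    return nodes


def local_aggregate(node_rows):
    """Each node sums its own rows using only its own data."""
    totals = defaultdict(int)
    for customer, amount in node_rows:
        totals[customer] += amount
    return dict(totals)


def parallel_sum(rows, num_nodes=4):
    """Run the local step per node, then merge the partial results."""
    partials = [local_aggregate(n) for n in distribute(rows, num_nodes)]
    merged = {}
    for partial in partials:
        # Keys never overlap: every key has exactly one owning node.
        merged.update(partial)
    return merged


# Hypothetical sample data: (customer, amount)
SALES = [("acme", 10), ("zenith", 5), ("acme", 7), ("orbit", 3), ("zenith", 1)]

if __name__ == "__main__":
    print(parallel_sum(SALES, num_nodes=4))
```

Because every key has a single owner, the merge step needs no locking or shared buffers, which is the property the slide attributes to shared-nothing designs.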
  4. Advantage of Shared Nothing Architecture
    • Delivers linear scalability
        • Maximizes utilization of resources
        • To any size configuration
    • Allows flexible configurations
        • Incremental upgrades
    [Diagram: 16 boxes labeled VPROCs. Chart: Performance vs. # Nodes, linear with a slope of 1 at any size]
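"Linear with a slope of 1" can be read as: multiplying the node count by k multiplies throughput by k. The sketch below assumes an idealized model with no coordination overhead; the 100 TB workload and the 1 TB per node-hour rate are made-up numbers chosen only to keep the arithmetic easy to follow.

```python
def scan_time_hours(data_tb: float, nodes: int, tb_per_node_hour: float) -> float:
    """Idealized shared-nothing scan time: work split evenly over all nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return data_tb / (nodes * tb_per_node_hour)


def speedup(nodes: int, baseline_nodes: int = 1,
            data_tb: float = 100.0, tb_per_node_hour: float = 1.0) -> float:
    """Speedup relative to a baseline configuration (hypothetical workload)."""
    baseline = scan_time_hours(data_tb, baseline_nodes, tb_per_node_hour)
    return baseline / scan_time_hours(data_tb, nodes, tb_per_node_hour)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        # Under this model the speedup always equals the node ratio.
        print(n, speedup(n))
```

Real systems fall below this line once skewed data or cross-node traffic appears; the model only shows what the slope-of-1 claim means.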
  5. Query Processing Requirements
    • ANSI compliant SQL
        • All functions, not just entry level
    • High performance algorithms
        • Join, Aggregation, Sort etc.
        • Compiled expressions
    • Complex query features
        • Derived tables, Case expressions, all forms of sub-queries, Samples etc.
    • Big limits: 256 table joins, 256 nesting levels etc.
        • 1MB SQL/Views/Macros
    • The more complex the better
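To make the "complex query features" concrete, here is one small query that combines a derived table, a CASE expression and a scalar sub-query. It is only a sketch: SQLite (through Python's standard `sqlite3` module) stands in for an appliance DBMS, and the `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 100), ('east', 300), ('west', 50), ('north', 200);
""")

QUERY = """
SELECT region,
       total,
       -- CASE expression compared against a scalar sub-query
       CASE WHEN total >= (SELECT AVG(amount) FROM sales) * 2
            THEN 'high' ELSE 'low' END AS band
FROM (
    -- derived table: per-region totals
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
) AS region_totals
ORDER BY region
"""

rows = conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for row in rows:
        print(row)
```

The average sale is 162.5, so only a region whose total reaches 325 is labeled 'high'.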
  6. Migration Paths from Standard Oracle
    Approaches: Redesign, 1:1 Migration (Forklift), Evolution

    Migration Drivers                  Approach
    Performance                        1:1 or Redesign or Evolution
    Availability                       1:1 or Evolution
    Scalability                        1:1 or Evolution
    Single View of the Business        Redesign or Evolution
    Business Questions (Complexity)    Redesign or Evolution
    Cost of Ownership                  1:1

    There are several options for migrating from Oracle to a DWA. The best strategy depends on the overall goals of the project, existing capabilities and organizational constraints.
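One way to use the driver table is as a lookup: list the drivers that matter for a project and keep only the approaches that appear for all of them. The intersection rule is an assumption made here for illustration, not something the slide states, and `candidate_approaches` is a hypothetical helper.

```python
# Driver-to-approach mapping transcribed from the slide's table.
APPROACHES_BY_DRIVER = {
    "Performance": {"1:1", "Redesign", "Evolution"},
    "Availability": {"1:1", "Evolution"},
    "Scalability": {"1:1", "Evolution"},
    "Single View of the Business": {"Redesign", "Evolution"},
    "Business Questions (Complexity)": {"Redesign", "Evolution"},
    "Cost of Ownership": {"1:1"},
}


def candidate_approaches(drivers):
    """Return the approaches listed for every given driver (assumed rule)."""
    sets = [APPROACHES_BY_DRIVER[d] for d in drivers]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


if __name__ == "__main__":
    print(candidate_approaches(["Availability", "Single View of the Business"]))
    print(candidate_approaches(["Cost of Ownership", "Single View of the Business"]))
```

An empty result, as in the second call, signals that the chosen drivers pull toward different approaches and need to be prioritized.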
  7. DW Appliance Market Trends
    • Vendors have started moving toward using commodity technologies rather than proprietary assembly of commodity components.
    • Deployments are expanding from tactical implementations to strategic and enterprise data warehouse use.
    • Most analysts see DW appliances gaining market share vs. traditional DBMS solutions.
    • Vendors have begun to support 'in-database' analytic algorithms that take advantage of their MPP architectures.
  8. Appliance Vendors
    • Netezza*
    • Teradata
    • Aster Data**
    • Oracle Exadata
    • Greenplum***
    * IBM acquired Netezza 2010-Q3
    ** Teradata acquired Aster Data 2011-Q1
    *** EMC acquired Greenplum 2010-Q3
  9. Netezza TwinFin-12
    • 8 Disk Enclosures
        • 96 1TB SAS Drives (4 hot spares)
        • RAID 1 Mirroring
    • 12 Netezza S-Blades:
        • 96 Cores (Intel Quad-Core 2.5 GHz)
        • 96 FPGAs (125 MHz)
        • 192 GB DDR2 RAM (> ½ TB compressed)
        • Linux 64-bit Kernel
    • 2 Hosts (Active-Passive):
        • 24 Cores (Quad-Core Intel 2.6 GHz)
        • 96 GB Memory
        • 4x146 GB SAS Drives
        • Red Hat Linux 5 64-bit
    • 10G Internal Network
    • User Data Capacity: 128 TB
    • Data Scan Speed: 145 TB/Hour
    • Load Speed (per system): 2.0 TB/Hour
    • Power/Rack: 7,400 Watts
    • Cooling/Rack: 25,500 BTU/Hour
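A few derived figures help put the spec sheet in perspective. The arithmetic below uses only the numbers quoted on the slide and takes them at face value; the variable names are illustrative, and the "full scan" and "full load" times are simple divisions, not vendor-published benchmarks.

```python
# Figures as quoted on the slide.
S_BLADES = 12
CORES = 96
FPGAS = 96
RAM_GB = 192
DRIVES = 96
HOT_SPARES = 4
USER_DATA_TB = 128
SCAN_TB_PER_HOUR = 145
LOAD_TB_PER_HOUR = 2.0

# Per-blade resources: the totals divide evenly across the 12 S-Blades.
cores_per_blade = CORES // S_BLADES
fpgas_per_blade = FPGAS // S_BLADES
ram_per_blade_gb = RAM_GB // S_BLADES

# Drives left for data once the hot spares are set aside.
active_drives = DRIVES - HOT_SPARES

# Time to scan or load the full quoted user capacity at the quoted rates.
full_scan_minutes = USER_DATA_TB / SCAN_TB_PER_HOUR * 60
full_load_hours = USER_DATA_TB / LOAD_TB_PER_HOUR

if __name__ == "__main__":
    print(f"per blade: {cores_per_blade} cores, {fpgas_per_blade} FPGAs, "
          f"{ram_per_blade_gb} GB RAM")
    print(f"active drives: {active_drives}")
    print(f"full scan: about {full_scan_minutes:.0f} minutes")
    print(f"full load: {full_load_hours:.0f} hours")
```

The one-to-one ratio of cores to FPGAs per blade falls straight out of the quoted totals.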
  10. Teradata 2650
    • Teradata Database 13.x, Software Bundle, and SUSE Linux Operating System
    • Fully-integrated cabinet design
    • Intel Six Core Xeon Westmere Processors @ 2.93GHz
    • Nine servers per cabinet
    • (216) 300GB or 600GB enterprise-class drives per fully populated cabinet
    • (108) 2TB SAS drives per fully populated cabinet
    • 16.4TB customer data per fully populated cabinet with 300GB drives
    • 32.2TB customer data per fully populated cabinet with 600GB drives
    • 54.9TB customer data per fully populated cabinet with 2TB drives
    • Scales up to 6 cabinets, ~ 275TB customer data with 2TB drives
    • Teradata Managed Server Options
    • RAID1 disk mirroring, Automatic Node Failover
    • 864GB memory per cabinet
    • Teradata BYNET® over Gigabit Ethernet
  11. Aster Data
    The Aster Data Solution
    The Aster Data Analytic Platform, Aster Data nCluster, delivers a massively parallel software solution that embeds MapReduce analytic processing with data stores for big data analytics that incorporate new data sources and types to deliver new analytic capabilities with breakthrough performance and scalability. Its unique Applications-Within® architecture runs analytic application logic inside the system, leveraging its massively-parallel processing architecture and patent-pending SQL-MapReduce® to fully parallelize processing for deep and ultra-fast analysis of massive data sets.
    Appliance Highlights
    • “Always-Parallel” Pervasive Parallelism
    • Embedded MapReduce
    • Unlimited Scalability
    • “Always-On” Fault Tolerance
    • Hybrid Row and Column Stores
    • Dynamic Mixed Workload Management
    • Extensibility Framework
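For readers unfamiliar with the MapReduce pattern mentioned on this slide, the generic shape is map, shuffle, reduce. The sketch below is plain Python over hypothetical click data; it does not use or imitate Aster Data's SQL-MapReduce interface, and all names are invented for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (page, 1) pair per click record."""
    _user, page = record
    yield (page, 1)


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run(partitions):
    """Map every record in every partition, then shuffle and reduce."""
    pairs = [kv for part in partitions for rec in part for kv in map_phase(rec)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Hypothetical input: two data partitions of (user, page) clicks.
CLICKS = [
    [("u1", "home"), ("u2", "cart")],
    [("u1", "cart"), ("u3", "home"), ("u3", "cart")],
]

if __name__ == "__main__":
    print(run(CLICKS))
```

In an MPP system the map step would run on each partition's own node and only the grouped pairs would cross the network.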
  12. Greenplum
    Regardless of your data computing requirements, th[...] DCA family to meet your needs. EMC Greenplum Database is at the core [...] switching between them can be effortless as requirements change [...]

    DATA COMPUTING APPLIANCE FAMILY
    The EMC Greenplum Data Computing Appliance (DCA) is available in two models: DCA and High Capacity DCA.

    DCA
    The DCA is a purpose-built, highly scalable next generation data warehousing appliance that architecturally integrates database, compute, storage, and network into an enterprise-class, easy-to-implement system. It is the industry leader in price and performance.
    • Supports linear scalability

    HIGH CAPACITY DCA
    The High Capacity DCA is designed to host multiple petabytes of data without taking up additional space, surging power consumption, or increasing costs. For businesses that require detailed analysis of extremely large amounts of data or those looking for a longer term archive, this model offers the lowest cost-per-unit data warehouse.
    • Ability to host multiple petabytes of data without taking up additional space, surging power consumption, or increasing costs
    • Best price per unit data warehouse appliance

    DCA FAMILY SPECIFICATIONS OVERVIEW
                                       DCA GP1000     High Capacity DCA GP1000C
    Master Servers                     2              2
    Segment Servers                    16             16
    Total CPU cores                    192            192
    Total Memory                       768 GB         768 GB
    Segment HDDs / SSDs                192            192
    Usable Capacity (uncompressed)     36 TB          124 TB
    Usable Capacity (compressed)       144 TB         496 TB
    Maximum Expansion                  6 racks        6 racks
    Scan Rate                          24 GB/sec      14 GB/sec
    Data Load Rate                     [...] TB/Hour  [...] TB/Hour
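Two quick calculations on the specification table: the compression ratio implied by the two capacity rows, and the time to scan the full uncompressed capacity at the quoted scan rate. This is a sketch using only the slide's figures; the conversion of 1 TB to 1000 GB is an assumption, and the dictionary layout and function names are illustrative.

```python
# Figures transcribed from the specifications table on the slide.
MODELS = {
    "DCA GP1000": {
        "raw_tb": 36, "compressed_tb": 144, "scan_gb_per_sec": 24,
    },
    "High Capacity DCA GP1000C": {
        "raw_tb": 124, "compressed_tb": 496, "scan_gb_per_sec": 14,
    },
}

GB_PER_TB = 1000  # assumption: decimal units


def compression_ratio(model: dict) -> float:
    """Compressed capacity divided by uncompressed capacity."""
    return model["compressed_tb"] / model["raw_tb"]


def full_scan_minutes(model: dict) -> float:
    """Minutes to scan the whole uncompressed capacity at the quoted rate."""
    seconds = model["raw_tb"] * GB_PER_TB / model["scan_gb_per_sec"]
    return seconds / 60


if __name__ == "__main__":
    for name, spec in MODELS.items():
        print(f"{name}: ratio {compression_ratio(spec):.1f}x, "
              f"full scan about {full_scan_minutes(spec):.0f} min")
```

Both models imply the same compression ratio, so the quoted compressed capacities appear to come from one fixed multiplier rather than from measured data.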