Upgrade to Pro — share decks privately, control downloads, hide ads and more …

HYRISE

grundprinzip
January 28, 2012

 HYRISE

HYRISE is a research prototype for an in-memory hybrid database. In this talk I present the general architecture and the ideas behind our layout algorithm.

grundprinzip

January 28, 2012
Tweet

More Decks by grundprinzip

Other Decks in Science

Transcript

  1. HYRISE – In-Memory Storage Engine Martin Grund1, Jens Krueger1, Philippe

    Cudre-Mauroux3, Samuel Madden2 Alexander Zeier1, Hasso Plattner1 1Hasso-Plattner-Institute, Germany 2MIT CSAIL, USA 3University of Fribourg, Switzerland
  2. Motivation ▪  Enterprise applications have evolved: not just OLAP vs.

    OLTP □  Demand for real-time analytics on transactional data □  High throughput analytics è completely in memory –  Massive RAMs (>1TB/node) enable this for many apps Example: ▪  Available-To-Promise Check – Perform real-time ATP check directly on transactional data during order entry, without materialized aggregates of available stocks. ▪  Dunning – Search for open invoices interactively instead of scheduled batch runs. ▪  Operational Analytics – Instant customer sales analytics with always up-to-date data. HYRISE | Martin Grund | VLDB 2011 2
  3. Our System: HYRISE ▪  High throughput on structured enterprise data

    □  Completely in main memory ▪  Efficiently executed both OLTP and OLAP requests □  Key idea: Vertically partition tables ▪  New algorithms to find the best partitioning for all tables □  Based on a workload profile □  Using a cache-miss based cost model □  Scalable to huge number of tables, wide relations –  E.g., Many SAP apps have 10K+ tables w/ 100+ columns HYRISE | Martin Grund | VLDB 2011 3
  4. Memory Hierarchy - Recap ▪  Memory hierarchy does not stop

    with main memory ▪  Motivation for disk-based column stores, remains valid for main memory; Avoid loading data that is not accessed. HYRISE | Martin Grund | VLDB 2011 4 ▪  Accessing memory with different strides introduces different latencies Sequential accesses 10x-100x faster 1 10 100 1000 8 64 512 4K 32K 256K 2M CPU Cycles per Value Stride in Bytes CPU Registers Main Memory Flash Hard Disk Higher Performance Lower Price / Higher Latency CPU Caches
  5. Enterprise Application Characteristics ▪  Identify: Why are enterprise applications so

    complex? ▪  Detailed customer data analysis from SAP installations of 12 companies (~32 billion event records analyzed) ▪  Enterprise applications have □  Extremely wide schemas – up to 300 attributes on heavily used tables □  Thousands of tables – every ERP installation ~ 70k □  Changing workload HYRISE | Martin Grund | VLDB 2011 6
  6. 0 % 10 % 20 % 30 % 40 %

    50 % 60 % 70 % 80 % 90 % 100% Typical Tranasactional Customer Database TPC-C Workload Lookup / Read Table Scan Range Select Insert Modification Delete Write: Read: Example Enterprise Workload HYRISE | Martin Grund | VLDB 2011 7 ▪  Range selects occur often ▪  Real world is more complicated than single tuple access ▪  With new applications the “read”-gap will even increase
  7. Summary ▪  Traditional View □  OLTP Systems for transactional scenarios

    □  OLAP Systems for analytical scenarios ▪  Our View: Single System □  Main Memory □  Vertically partitioned □  Single copy of data (no redundancy) –  To reduce maintenance and overhead of multiple copies Key challenge: How to perform vertical partitioning to optimize performance on a given hybrid workload HYRISE | Martin Grund | VLDB 2011 8
  8. HYRISE Architecture ▪  Our focus is on three key aspects

    □  In-Memory Data Storage –  Predicting access costs □  Layout Decisions □  Optimizing Query Execution ▪  Layout Engine integrates cost model and workload data HYRISE | Martin Grund | VLDB 2011 10 Task Based Executor Query Engine Layout Engine Main Memory Storage Layer Cost Model Workload Data
  9. HYRISE Partitioning Problem ▪  Each table split into a set

    of non-overlapping containers (partitions) □  Each container consists of one or more attributes ▪  Uses workload as input to find best partitioning ▪  The performance of each workload operator on a given layout is calculated based on cache misses ▪  Container overhead cost defines the cost of loading data that is not accessed by a query operator HYRISE | Martin Grund | VLDB 2011 11 C1 (a1) C2 (a2 .. a6) C2 (a7 .. a8) r = (a1 ... a8)
  10. Cost Model – Projections ▪  Goal is to predict cost

    of basic accesses to a container □  Based on access to multiple attributes over all rows (projection) and access to all attributes of a container to a selection of rows (selectivity) ▪  Cache misses are precisely calculated, using the offset and width of the columns projected from the container □  Not enough to calculate #accessed bytes à understand how the accessed data is laid out HYRISE | Martin Grund | VLDB 2011 12 Cache Line Width
  11. Cost Model – Selection ▪  Experimental validation shows the match

    of the model and reality HYRISE | Martin Grund | VLDB 2011 13 0 0.5 1 1.5 2 2.5 0.001 0.101 0.201 0.301 0.401 0.501 0.601 0.701 0.801 0.901 Cache Misses Millions Selectivity Row Store Row Cost (Model) Column Store Column Cost (Model)
  12. Cost Model ▪  HYRISE cost model provides means to calculate

    cache misses for □  Full projections / partial projections □  Selections – capturing both independent and overlapping selections ▪  More complex operators can be composed out of the basic elements ▪  Experiments show that cache misses are a good predictor for performance of in-memory database systems. HYRISE | Martin Grund | VLDB 2011 14
  13. Cost Model – Cache Miss vs. Cycles HYRISE | Martin

    Grund | VLDB 2011 15 0 200 400 600 800 1000 1200 0 1 2 3 4 5 6 7 8 1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97 Millions Millions Column Store Row Store CPU Cycles Col CPU Cycles Row ▪  Cache misses are a good predictor for performance
  14. Layout Selection ▪  For narrow tables, finding the optimal layout

    is easy and can be done through exhaustive enumeration ▪  Enterprise applications have super-wide schemas □  Up to 300 attributes in our study ▪  è millions of possible layouts HYRISE | Martin Grund | VLDB 2011 17
  15. First Approach ▪  Exponential, but multiple pruning steps that reduce

    the number of possible layouts in practice 1.  Candidate Generation □  Determine all primary partitions (the largest partitions that will not incur any container overhead cost) 2.  Candidate Merging □  Inspect all permutations of primary partitions to generate partitions that minimize the overall cost 3.  Layout Generation □  Generate all valid layouts by exhaustively exploring all possible combinations of partitions from the second phase HYRISE | Martin Grund | VLDB 2011 18
  16. Candidate Generation ▪  Determining all primary partitions □  Primary Partition:

    Largest partition that does not incur container overhead cost ▪  Each operation on a table implicitly splits the attributes into two subsets □  The order of the operations can be ignored ▪  Recursively splitting each set of attributes of the workload into subsets for each operation HYRISE | Martin Grund | VLDB 2011 19
  17. Candidate Generation HYRISE | Martin Grund | VLDB 2011 20

    ORG PHONE COMPANY EMAIL NAME ID Table Query 1 - Select ID,NAME from Table where ORG = 9 Query 2 - Select ID,COMPANY from Table where ORG = 9 ID NAME ORG ID COMPANY ORG OP 1 OP 2 OP 3 OP 4
  18. ORG PHONE COMPANY EMAIL NAME ID ID NAME OP 1

    ID NAME ORG PHONE EMAIL COMPANY ORG OP 2 ID NAME PHONE EMAIL COMPANY ORG ID COMPANY OP 3 ORG OP 4 ID EMAIL PHONE ORG NAME COMPANY ID EMAIL PHONE ORG NAME COMPANY Candidate Generation HYRISE | Martin Grund | VLDB 2011 21
  19. Candidate Merging ▪  Generate possible permutations of primary partitions ▪ 

    Identify partitions that reduce the overall cost for the workload □  Based on the assumption that the access cost for two partitions with the same attribute set can be independently computed □  Calculation based on the cost model HYRISE | Martin Grund | VLDB 2011 22
  20. Candidate Merging HYRISE | Martin Grund | VLDB 2011 23

    Primary Permutation Subset 1 12,000 ✔11,764 Subset 2 12,000 ✔11,764 Subset 3 12,000 ✖36,764 Will be inserted into the global candidate list Only an excerpt, 5 attributes Generate 31 permutations. ID COMPANY ID NAME Primary Partitions Merged Permutation ORG ID COMPANY ID ID ORG ID NAME ID COMPANY vs ID NAME
  21. Candidate Merging HYRISE | Martin Grund | VLDB 2011 24

    Result of Phase 2 ID NAME ID NAME EMAIL PHONE ORG COMPANY ID COMPANY ID NAME COMPANY
  22. Layout Generation ▪  Generate all possible valid layouts from the

    result of phase 2 ▪  Exhaustively explore all combinations ▪  A valid layout contains all attributes exactly once HYRISE | Martin Grund | VLDB 2011 25
  23. Layout Generation HYRISE | Martin Grund | VLDB 2011 26

    ✔ EMAIL PHONE COMPANY ORG ID NAME COMPANY EMAIL PHONE ORG NAME ID ORG EMAIL PHONE NAME COMPANY ID ORG COMPANY NAME EMAIL PHONE ID 27.7 28.2 28.2 28.5 Cost in 1000
  24. Divide and Conquer Partitioning ▪  With huge numbers of attributes

    the scalability of the original algorithm degrades ▪  Proposal: approximation that clusters frequently used attributes, by generating optimal sub-layouts for each cluster of primary partitions HYRISE | Martin Grund | VLDB 2011 27
  25. Sample Workload ▪  Mixed workload that is loosely based on

    the SAP Sales and Distribution scenario □  Total benchmark size of 28 GB data ▪  13 Queries □  9 OLTP Queries with typical CRUD operations □  3 OLAP-like Queries with high selectivity □  1 Planning like query with incrementally decreasing selectivity HYRISE | Martin Grund | VLDB 2011 29
  26. HYRISE Workload Evaluation ▪  Layout Example – Input table are

    sales order headers □  3 containers: VBELN (id) is used by many different queries; (KUNNR, AEDAT) are evaluated as predicates together; third partition is accessed by “SELECT *” operators HYRISE | Martin Grund | VLDB 2011 30 ... AEDAT ... ... KUNNR ... VBELN VBELN AEDAT KUNNR ...
  27. HYRISE Workload Evaluation Total Cycles 0 50 100 150 200

    250 300 350 400 450 500 Thousands Row Column HYRISE HYRISE | Martin Grund | VLDB 2011 31 ▪  HYRISE uses 4x less cycles than the all row layout, and is about 1.6 times faster than the all column layout ▪  Depending on the query weight HYRISE’s advantage can vary
  28. HYRISE Workload Evaluation ▪  Strong tension between the layouts, since

    most of the times the hybrid layout can only be as good as one of them ▪  The mixed workload increases the benefit of a hybrid layout ▪  Hybrid layout is usually better than the comparable layout HYRISE | Martin Grund | VLDB 2011 32
  29. HYRISE Layout Tension ▪  Q6 (Insert) – HYRISE has to

    update multiple containers, row must be better ▪  Q5 (Select) – HYRISE clearly outperforms both approaches HYRISE | Martin Grund | VLDB 2011 33 0 0.2 0.4 0.6 0.8 1 1.2 Q6 Q5 Row Column HYRISE
  30. Conclusions ▪  Presented HYRISE □  Main memory hybrid database for

    mixed (OLTP + OLAP) workloads □  Novel algorithms to find optimal workload aware vertical partitioning –  Using a highly accurate cache-miss based model □  On SAP-based benchmark, 4x better than all rows and 60% better than all columns ▪  Come see HYRISE live at our Demo booth HYRISE | Martin Grund | VLDB 2011 35