
Speeding up your BCAs with Flash

Presented at Sydney vForum 2013

Nick Marshall

October 21, 2013

Transcript

  1. Speeding up your Business Critical Apps with Flash
    Nick Marshall, VMware
    Michael Webster, VMware
  2. Who are these guys?
    §  Nick Marshall
      •  VMware PSO Senior Consultant
      •  Blog: www.nickmarshall.com.au
      •  Twitter: @nickmarshall9
    §  Michael Webster
      •  VMware COE Strategic Architect
      •  Blog: www.longwhiteclouds.com
      •  Twitter: @vcdxnz001
    Book giveaway at the end of the session!
  3. Flash Storage
    §  Flash is everywhere
    §  Used extensively in smartphones, tablets, laptop computers, storage arrays, etc.
    §  Adopting flash in enterprise servers?
      •  Presents an economical alternative to having a storage array
    §  How does VMware embrace flash technology in vSphere 5.5?
      •  Native support for provisioning of flash resources
      •  Flash caching support in the ESXi storage stack
      •  VSAN leverages flash storage for high performance
    §  Today’s focus: application performance on vSphere 5.5 when leveraging flash
  4. Agenda
    §  vSphere Flash Read Cache (vFRC)
    §  Virtual SAN (VSAN)
  5. vFRC Overview

  6. vFRC – Overview
    §  vSphere 5.5 introduces the vSphere Flash Infrastructure layer
      •  Aggregates flash storage devices into a unified flash resource
      •  Supports locally connected flash devices (PCIe, SAS/SATA drives, etc.)
    §  Flash resource can be used for read caching of VM I/Os
      •  vSphere Flash Read Cache (vFRC)
    §  Write policy
      •  Write-through: write I/Os are written to persistent storage and vFRC simultaneously
      •  Large writes are filtered, which avoids cache pollution with log/streaming data
    §  Caches are configured on a per-VMDK basis
      •  Can be custom configured based on workload
  7. vFRC – Overview
    [Diagram: I/O paths for VMDKs through the VM layer, the ESX layer (Flash Read Cache) and the storage layer, showing the cache hit, cache miss and no-cache cases]
  8. vFRC Tunables

  9. Performance Tunables in vFRC
    §  What workloads can benefit from vFRC?
      •  Read-dominated I/O pattern
      •  High repeated access of data (e.g. 20% of working set accessed 80% of the time)
      •  Sufficient flash capacity to hold data that is accessed repeatedly
    §  What impacts vFRC performance?
      •  Cache size: should be big enough to hold the active working set of the workload
      •  Cache block size: should match the dominant I/O size of the workload
      •  Flash device types: PCIe flash cards vs. SSD drives
  10. a. Cache Size
    §  Cache sizes are specified manually when enabling vFRC for a VMDK
      •  Depends on the working set of the application
      •  Should be sized to hold the active working set
    §  Inadequate cache sizes lead to an increased cache miss rate
    §  An over-sized cache wastes flash resources and leads to sub-optimal performance during vMotion
      •  By default the cache is migrated during vMotion
      •  An over-sized cache increases vMotion time
    §  How to determine the right working set size?
      •  vscsiStats workload tracing
    §  Cache size can be modified at run-time if necessary
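The slide names vscsiStats workload tracing as the way to find the working set size. As a minimal sketch of the arithmetic involved, the snippet below counts the distinct blocks touched by a list of read requests. The `(offset, length)` trace format, the function name and the sample values are assumptions for illustration only; they are not the vscsiStats output format.

```python
def working_set_bytes(trace, block_size=8 * 1024):
    """Estimate the working set as the number of distinct blocks touched.

    trace: iterable of (offset_bytes, length_bytes) read requests
           (hypothetical format, not the vscsiStats trace format).
    block_size: granularity in bytes; 8 KB matches the default vFRC block size.
    """
    touched = set()
    for offset, length in trace:
        if length <= 0:
            continue  # ignore empty requests
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return len(touched) * block_size


# Made-up sample: repeated reads of the first 16 KB plus one 4 KB read at 1 MB.
sample_trace = [(0, 8192), (8192, 8192), (0, 8192), (4096, 8192), (1048576, 4096)]
print(working_set_bytes(sample_trace))  # 3 distinct 8 KB blocks -> 24576
```

Repeated reads of the same blocks do not grow the estimate, which is why a heavily re-accessed workload can be served by a cache far smaller than the VMDK.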
  11. b. Cache Block Size
    §  Basic unit of cache fill and cache eviction operations
    §  Affects effective utilization of cache capacity
    §  Bigger cache blocks lead to internal fragmentation, but consume less memory
    §  Smaller cache blocks consume more memory (up to 2% memory space overhead)
    §  Default cache block size is 8KB
    [Chart: Memory overhead wrt. vFRC block size; x-axis: cache block size (KB), 4 to 512; y-axis: memory consumed / cache size (%), 0 to 2.5]
    [Chart: Performance impact of cache block size; x-axis: cache block size, 4KB to 512KB, plus baseline (no vFRC); y-axis: latency in microseconds, 0 to 1500]
  12. b. Cache Block Size
    §  Larger cache block size (example: 512KB cache block size for a workload I/O size of 8KB)
      –  Internal fragmentation
    [Diagram: vFRC cache blocks vs. valid cached data]
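The 512KB block / 8KB I/O example can be put into numbers. The sketch below assumes the worst case, in which each cache block ends up holding only one I/O's worth of valid data; that assumption and the function names are illustrative and do not describe vFRC internals.

```python
def block_utilization(io_size_kb, block_size_kb):
    """Fraction of a cache block holding valid data, assuming (worst case)
    that each block is filled by a single I/O of io_size_kb."""
    return min(io_size_kb, block_size_kb) / block_size_kb


def effective_cache_mb(cache_size_gb, io_size_kb, block_size_kb):
    """Cache capacity (MB) that holds valid data under the same assumption."""
    return cache_size_gb * 1024 * block_utilization(io_size_kb, block_size_kb)


print(block_utilization(8, 512))      # 0.015625, i.e. about 1.6% of each block
print(effective_cache_mb(8, 8, 512))  # 128.0 MB of valid data in an 8 GB cache
print(effective_cache_mb(8, 8, 8))    # 8192.0 MB when block size matches I/O size
```

Under this assumption an 8GB cache with 512KB blocks behaves like a 128MB cache for an 8KB workload, which is the reason for matching the block size to the dominant I/O size.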
  13. c. Flash Device Type (vendor specifications)
    §  Low cost
      •  30k – 40k random read IOPS
      •  200 – 270 MB/s read bandwidth
      •  Read latency – 75 microseconds
      •  Write latency – 90 microseconds
    §  High performance
      •  Up to 750k random read IOPS
      •  Up to 3 GB/s read bandwidth
      •  Read latency – 75 microseconds
      •  Write latency – 15 microseconds
  14. vFRC Performance

  15. vFRC Performance – Applications
    §  What workloads can benefit from vFRC?
      •  Read-dominated I/O pattern
      •  High repeated access of data (e.g. 20% of working set accessed 80% of the time)
      •  Sufficient flash capacity to hold data that is accessed repeatedly
    §  Applications considered:
      •  Data warehousing (Swingbench DSS)
      •  Database transactions (DVDStore)
      •  Real-world enterprise server workloads (publicly available I/O traces)
  16. 1. Data Warehousing Application
    §  Decision Support System [TPC-H]
    §  Benchmark: Swingbench 2.4 using the ‘Sales History’ schema on an Oracle 11g R2 database
    [Diagram: Swingbench DSS benchmark on a RHEL 6.4 VM sends queries to, and receives results from, Oracle 11g R2 on a Windows 2008 Server VM with vFRC; storage is an EMC VNX 5700, 1TB LUN, RAID5 over 5 FC 15k RPM HDDs]
  17. 1. Data Warehousing Application
    §  Workload: read dominated, high re-access rate
    §  vFRC configuration: 8GB cache size and 8KB cache block size
    [Chart: Transaction count by transaction type (SRMC, SCMC, PSCR, SMA, PPSC, TSQ, SQC); y-axis: # of transactions, 0 to 12000; series: Baseline, vFRC]
  18. 1. Data Warehousing Application
    §  Workload: read dominated, high re-access rate
    §  vFRC configuration: 8GB cache size and 8KB cache block size
    §  Up to 84% improvement in average throughput
    §  Up to 2X reduction in latency
    [Chart: Transactions per minute (TPM); Baseline 61.7, vFRC 112.9]
    [Chart: Average response time (s); Baseline 20.389, vFRC 10.859]
  19. 2. Database Transaction Application
    §  Benchmark: DVDStore
      •  Simulates online e-commerce site operations
      •  Database: MS SQL Server 2008
      •  Database size: 15 GB
    §  Workload characteristics
      •  60% reads
      •  Mostly random I/Os
      •  Predominant I/O size: 8KB
    §  VM configuration
      •  8 vCPUs, 8GB memory
      •  25GB database disk, 10GB log disk
    §  Storage array
      •  VNX 5700, 1TB LUN – RAID5 over 5 FC 15k RPM disk drives
  20. 2. Database Transaction Application
    [Chart: Orders per minute; Baseline 8802, vFRC 10GB 8937, vFRC 15GB 12319]
    Up to 39% improvement in application throughput
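The orders-per-minute figures from the chart can be turned into relative improvements with a short calculation. This is only a sketch of the arithmetic; the function and variable names are illustrative.

```python
def improvement_pct(baseline, measured):
    """Relative improvement of measured over baseline, in percent."""
    return (measured - baseline) / baseline * 100.0


baseline_opm = 8802                      # orders per minute without vFRC
vfrc_opm = {"vFRC 10GB": 8937, "vFRC 15GB": 12319}

for label, opm in vfrc_opm.items():
    print(f"{label}: {improvement_pct(baseline_opm, opm):.2f}% over baseline")
```

The 10GB cache yields roughly 1.5%, while the 15GB cache yields just under 40%, in line with the slide's "up to 39%". With a 15 GB database, this fits the earlier guidance that the cache must be large enough to hold the active working set.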
  21. 3. Enterprise Server I/O Traces
    §  a. Hardware Monitoring Server Workload
      •  Trace from servers that log data from multiple hardware monitoring programs across a datacenter
      •  Collected at Microsoft Research, Cambridge*
      •  Trace replayed using IOAnalyzer
    §  95% reads
    §  vFRC size – 4GB
    §  vFRC block size – 4KB
    §  vFRC hit percentage – 85%
    [Chart: Average latency (ms); Baseline 1.23, vFRC 0.321]
    * Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron. 2008. Write off-loading: Practical power management for enterprise storage. Trans. Storage 4, 3, Article 10 (November 2008).
  22. 3. Enterprise Server I/O Traces
    §  b. Proxy Server Workload
      •  Trace from a web proxy server
      •  Collected at Microsoft Research, Cambridge*
      •  Trace replayed using IOAnalyzer
    §  67% reads
    §  vFRC size – 16GB
    §  vFRC block size – 4KB
    §  vFRC hit percentage – 83%
    [Chart: Average latency (ms); Baseline 1.357, vFRC 0.612]
    * Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron. 2008. Write off-loading: Practical power management for enterprise storage. Trans. Storage 4, 3, Article 10 (November 2008).
  23. vMotion Performance with vFRC
    §  vFRC is fully supported by vMotion and other vSphere features
    §  vMotion behavior of a vFRC-enabled VM
      •  VM caches are migrated by default
      •  Option to drop the cache during vMotion
    §  Migrating the cache preserves application performance gains
      •  Consumes more network bandwidth
      •  Increases vMotion time
    §  Dropping the cache during vMotion leads to a temporary dip in application performance gains
      •  No extra overhead in vMotion
      •  The cache re-warms at the destination
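To get a feel for how much a migrated cache adds to vMotion, a back-of-the-envelope estimate divides the cache size by the link speed. This is a rough lower bound under stated assumptions (the link is dedicated and fully usable, no protocol overhead, decimal units); it is not a model of how vMotion actually schedules the transfer, and the function name is illustrative.

```python
def cache_transfer_seconds(cache_size_gb, link_gbps):
    """Idealized time to move a cache of cache_size_gb gigabytes over a link
    of link_gbps gigabits per second (8 bits per byte, no overhead)."""
    return cache_size_gb * 8 / link_gbps


# 8 GB is the cache size used in the data warehousing test in this deck.
print(cache_transfer_seconds(8, 10))  # 6.4 seconds on 10 Gb/s
print(cache_transfer_seconds(8, 1))   # 64.0 seconds on 1 Gb/s
```

Real transfers will take longer, but the linear relationship shows why an over-sized cache directly lengthens vMotion when the cache is migrated.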
  24. vFRC Performance Best Practices

  25. vFRC Configuration Guidelines
    §  Cache size may be configured based on the working set of the workload
      •  Start with about 20% of VMDK size, and monitor vFRC stats to re-configure it
    §  Cache block size must match the dominant I/O size of the workload
      •  Workload I/O size is not equal to VM I/O size!
    §  vFRC performs better with PCIe flash devices
    §  Decide on cache migration behavior during vMotion based on:
      •  Criticality of application performance
      •  Time taken for vMotion
      •  Network bandwidth availability
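The two starting-point rules above (about 20% of VMDK size, and a block size matching the dominant I/O size) can be sketched as follows. The histogram values are made up for illustration, and the dictionary format is an assumption rather than the vscsiStats output format.

```python
def initial_cache_size_gb(vmdk_size_gb, fraction=0.20):
    """Starting cache size: about 20% of the VMDK size, to be tuned later."""
    return vmdk_size_gb * fraction


def dominant_io_size_kb(io_length_histogram):
    """Return the I/O size (KB) with the highest request count.

    io_length_histogram: dict mapping I/O size in KB to request count
    (hypothetical format; real histograms would need to be parsed first).
    """
    return max(io_length_histogram, key=io_length_histogram.get)


# Illustrative numbers only, loosely shaped like an 8 KB-dominated database.
histogram = {4: 1200, 8: 9400, 16: 800, 64: 150}

print(initial_cache_size_gb(25))       # 5.0 GB for a 25 GB database disk
print(dominant_io_size_kb(histogram))  # 8
```

Both results are only a first configuration; the guideline is to monitor vFRC stats afterwards and re-configure.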
  26. vscsiStats – DEEP Storage Diagnostics
    §  vscsiStats characterizes I/O for each virtual disk
      •  Allows us to separate out each different type of workload into its own container and observe trends
    §  Histograms are only collected if enabled; no overhead otherwise
    §  Metrics
      •  I/O size
      •  Seek distance
      •  Outstanding I/Os
      •  I/O interarrival times
      •  Latency
  27. Making sense of vscsiStats and vFRC Stats
    §  vscsiStats can be used to learn more about the workload
      •  I/O length histogram
      •  Read/write ratio
      •  I/O trace to compute working set size
    §  vFRC stats provide information about cache effectiveness
      •  numBlocks – total number of cache blocks for a VMDK
      •  numBlocksCurrentlyCached – number of cache blocks that actually contain data
      •  Evict:avgNumBlocksPerOp – average number of evictions
      •  avgCacheLatency – average device latency of the flash resource
      •  maxCacheLatency – maximum device latency of the flash resource
      •  cacheHitPercentage – percentage of read cache hits
  28. vFRC Sizing Decisions based on vFRC Stats
    §  Using vFRC stats to make sizing decisions
      •  numBlocksCurrentlyCached < numBlocks: cache size may be reduced
      •  numBlocksCurrentlyCached = numBlocks and Evict:avgNumBlocksPerOp is high: cache size may be inadequate
      •  maxCacheLatency is very high: may be because of a spike in device latency, which may mean the device has worn out
      •  cacheHitPercentage is high and Evict:avgNumBlocksPerOp is low: the cache is correctly configured
    §  For more detailed information, please refer to our performance whitepaper http://www.vmware.com/files/pdf/techpaper/vfrc-perf-vsphere55.pdf
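The sizing rules above can be expressed as a small decision helper. The stat names come from the slide; the thresholds for "high" and "very high", the latency unit (microseconds) and the dictionary input are assumptions chosen for illustration, since the slide gives no numeric cut-offs.

```python
def sizing_advice(stats, high_evictions=8.0, high_hit_pct=80.0,
                  high_latency_us=5000.0):
    """Apply the slide's sizing rules to a dict of vFRC stats.

    All three thresholds are hypothetical; tune them for your environment.
    """
    advice = []
    total = stats["numBlocks"]
    cached = stats["numBlocksCurrentlyCached"]
    evictions = stats["Evict:avgNumBlocksPerOp"]

    if cached < total:
        advice.append("cache size may be reduced")
    elif evictions >= high_evictions:
        advice.append("cache size may be inadequate")

    if stats["maxCacheLatency"] >= high_latency_us:
        advice.append("check flash device latency and wear")

    if stats["cacheHitPercentage"] >= high_hit_pct and evictions < high_evictions:
        advice.append("cache appears correctly configured")

    return advice


# Made-up sample: a full cache with heavy eviction and a modest hit rate.
example_stats = {
    "numBlocks": 1048576,
    "numBlocksCurrentlyCached": 1048576,
    "Evict:avgNumBlocksPerOp": 12.0,
    "maxCacheLatency": 900.0,
    "cacheHitPercentage": 55.0,
}
print(sizing_advice(example_stats))  # ['cache size may be inadequate']
```

The rules are evaluated independently, so more than one piece of advice can be returned for the same VMDK.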
  29. Agenda
    §  vSphere Flash Read Cache (vFRC)
    §  Virtual SAN (VSAN)
  30. Virtual SAN – Architecture
    §  Each ESXi host contributes:
      •  Flash storage to absorb IOPS
      •  Hard disk drives to provide capacity
    §  Virtual SAN aggregates these resources from multiple servers in a vSphere cluster
      •  Provides a global datastore for VMs in the cluster
    §  HA/DRS ensures that the VM restarts after a host crash
    §  Virtual SAN objects can be split into multiple components for performance and data protection
      •  Governed by storage policies
    [Diagram: VSAN cluster of three ESX hosts; a VM's virtual disk is a VSAN object stored as replica-1, replica-2 and a witness]
  31. Experiment Setup
    §  Hardware
      •  16 core 2.9 GHz Dell R720 machines
      •  2 x Intel PCIe R910 SSD – 200GB (1 PCIe slot)
      •  12 x 10K RPM Seagate SAS disks
      •  10G VSAN dedicated network, 1G for VM network
    §  VSAN configuration
      •  2 x disk groups per machine, 6 x disks per disk group
      •  hostFailuresToTolerate 1, stripeWidth 1
    §  Workload characteristics
      •  ViewPlanner 3.0 Standard benchmark with 2 sec think-time (heavy user)
      •  ViewPlanner Group A: CPU intensive apps; Group B: I/O intensive apps
      •  1900 x 1200 resolution, PCoIP
      •  Windows 7 desktops and Windows XP clients
      •  VDI workload is known to be CPU intensive but sensitive to I/O latency
  32. Virtual SAN Delivers IOPS Required by VDI
      •  Virtual SAN can meet the IOPS required by a VDI workload
  33. Virtual SAN Scale
    [Chart: Virtual SAN scale; number of heavy VDI users: 3 node 275, 5 node 460, 7 node 667; series: VSAN, with linear trend line]
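A quick way to check the scaling shown in the chart is to divide the number of heavy VDI users by the node count. The figures come from the chart; the variable and function names are illustrative.

```python
# Heavy VDI users supported at each cluster size, as read from the chart.
scale_results = {3: 275, 5: 460, 7: 667}


def users_per_node(results):
    """Return heavy VDI users per node for each cluster size."""
    return {nodes: users / nodes for nodes, users in results.items()}


for nodes, per_node in users_per_node(scale_results).items():
    print(f"{nodes} nodes: {per_node:.1f} users per node")
```

The per-node figure stays at roughly 92 to 95 users across the three cluster sizes, which is what linear scaling looks like in this data.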
  34. Group A Score Comparison
    [Chart: Average application latency, Group A; series: VSAN, SAN; y-axis 0 to 1.2]
      •  Impact on Group A application latencies is marginal
      •  Virtual SAN uses very few host CPU cycles
  35. Group B Score Comparison
    [Chart: Average application latency, Group B; series: VSAN, All-Flash-SAN; y-axis 0 to 7]
      •  Group B application latencies are close to All-Flash-SAN
      •  Virtual SAN can meet the IOPS required by a VDI workload
  36. VSAN VDI Consolidation compared to physical SAN array
    §  VSAN performs better than a typical mid-range FC storage array
      •  VSAN benefits from local flash storage that provides high performance
    §  Impact of VSAN CPU consumption on application performance is low
    §  A physical SAN array is not required to run a VDI workload
    [Chart: VDI consolidation ratio; x-axis: number of nodes (servers): 3, 5, 7; y-axis: number of VMs, 0 to 800; series: Mid-range FC array, VSAN, All-flash FC array]
  37. Q & A
    Best Question Wins:

  38. Speeding up your Business Critical Apps with Flash
    Nick Marshall, VMware
    Michael Webster, VMware