Speeding up your BCAs with Flash

Presented at Sydney vForum 2013

Nick Marshall

October 21, 2013

Transcript

  1. Who are these guys?
     §  Nick Marshall
        •  VMware PSO Senior Consultant
        •  Blog: www.nickmarshall.com.au
        •  Twitter: @nickmarshall9
     §  Michael Webster
        •  VMware COE Strategic Architect
        •  Blog: www.longwhiteclouds.com
        •  Twitter: @vcdxnz001
     Book giveaway at the end of the session!
  2. Flash Storage
     §  Flash is everywhere
     §  Used extensively in smartphones, tablets, laptop computers, storage arrays, etc.
     §  Adopting flash in enterprise servers?
        •  Presents an economical alternative to having a storage array
     §  How does VMware embrace flash technology in vSphere 5.5?
        •  Native support for provisioning of flash resources
        •  Flash caching support in the ESXi storage stack
        •  VSAN leverages flash storage for high performance
     §  Today’s focus: application performance on vSphere 5.5 when leveraging flash
  3. vFRC – Overview
     §  vSphere 5.5 introduces the vSphere Flash Infrastructure layer
        •  Aggregates flash storage devices into a unified flash resource
        •  Supports locally connected flash devices (PCIe, SAS/SATA drives, etc.)
     §  Flash resource can be used for read caching of VM I/Os
        •  vSphere Flash Read Cache (vFRC)
     §  Write policy
        •  Write-through: write I/Os are written to persistent storage and vFRC simultaneously
        •  Large writes are filtered, which avoids polluting the cache with log/streaming data
     §  Caches are configured on a per-VMDK basis
        •  Can be custom configured based on workload
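
The write policy above can be illustrated with a toy model. This is only a sketch, not vFRC's implementation: the LRU eviction policy, the `large_write_blocks` threshold, and every class and attribute name are assumptions made for illustration.

```python
from collections import OrderedDict


class WriteThroughReadCache:
    """Toy block cache: reads fill the cache, writes always reach backing storage."""

    def __init__(self, capacity_blocks, large_write_blocks=8):
        self.capacity = capacity_blocks
        self.large_write_blocks = large_write_blocks  # hypothetical filter threshold
        self.cache = OrderedDict()  # block number -> data, least recently used first
        self.backing = {}           # stands in for persistent storage
        self.hits = 0
        self.misses = 0

    def _put(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used (assumed policy)

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)
            return self.cache[block]
        self.misses += 1
        data = self.backing.get(block)
        self._put(block, data)  # cache fill on a miss
        return data

    def write(self, first_block, blocks):
        """Write consecutive blocks; `blocks` is a list of payloads."""
        is_large = len(blocks) >= self.large_write_blocks
        for offset, data in enumerate(blocks):
            block = first_block + offset
            self.backing[block] = data       # write-through: always persisted
            if is_large:
                self.cache.pop(block, None)  # filtered: skip the cache, drop any stale copy
            else:
                self._put(block, data)


cache = WriteThroughReadCache(capacity_blocks=2, large_write_blocks=4)
cache.write(0, ["a"])                  # small write: lands in storage and cache
cache.write(10, ["w", "x", "y", "z"])  # large write: storage only
print(cache.read(0), cache.read(10), cache.hits, cache.misses)
```

In this model a small write is immediately readable from the cache, while a large sequential write bypasses it and only enters the cache if it is later read.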
  4. vFRC - Overview
     [Diagram: Flash Read Cache and VMDKs across the VM layer, ESX layer, and storage layer; I/O paths labeled Cache Hit, Cache Miss, and No Cache]
  5. Performance Tunables in vFRC
     §  What workloads can benefit from vFRC?
        •  Read-dominated I/O pattern
        •  High repeated access of data (e.g. 20% of the working set accessed 80% of the time)
        •  Sufficient flash capacity to hold data that is accessed repeatedly
     §  What impacts vFRC performance?
        •  Cache size: should be big enough to hold the active working set of the workload
        •  Cache block size: should match the dominant I/O size of the workload
        •  Flash device type: PCIe flash cards vs. SSD drives
  6. a. Cache Size
     §  Cache sizes are specified manually when enabling vFRC for a VMDK
        •  Depends on the working set of the application
        •  Should be sized to hold the active working set
     §  Inadequate cache sizes lead to an increased cache miss rate
     §  An over-sized cache wastes flash resources and gives sub-optimal performance during vMotion
        •  By default the cache is migrated during vMotion
        •  An over-sized cache increases vMotion time
     §  How to determine the right working set size?
        •  vscsiStats workload tracing
     §  Cache size can be modified at run-time if necessary
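
One way to turn an I/O trace into a working set estimate is to count the distinct cache-block-sized regions it touches. The sketch below assumes a hypothetical trace format of `(byte offset, length in bytes)` tuples; it does not parse real vscsiStats output, and the function name is illustrative.

```python
def working_set_bytes(trace, block_size=8 * 1024):
    """Estimate the working set as distinct block-sized regions touched, times block size."""
    touched = set()
    for offset, length in trace:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return len(touched) * block_size


# Hypothetical trace: (byte offset, length in bytes) per I/O
trace = [(0, 8192), (4096, 8192), (0, 8192), (1048576, 4096)]
print(working_set_bytes(trace))  # 24576: three distinct 8KB regions
```

Repeated accesses to the same region count once, which is why a long trace with heavy re-access can still have a small working set.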
  7. b. Cache Block Size
     §  Basic unit of cache fill and cache eviction operations
     §  Affects effective utilization of cache capacity
     §  Bigger cache blocks lead to internal fragmentation but consume less memory
     §  Smaller cache blocks consume more memory (up to 2% memory space overhead)
     §  Default cache block size is 8KB
     [Chart: Memory overhead with respect to vFRC block size. x-axis: cache block size (KB), 4 to 512; y-axis: memory consumed / cache size (%), 0 to 2.5]
     [Chart: Performance impact of cache block size. x-axis: cache block size, 4KB to 512KB, plus baseline (no vFRC); y-axis: latency in microseconds, 0 to 1500]
  8. b. Cache Block Size
     §  Larger cache block size (example: 512KB cache block size for a workload I/O size of 8KB) leads to internal fragmentation
     [Diagram: vFRC cache blocks vs. valid cached data]
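
The fragmentation in this example can be put in numbers. The sketch below assumes the worst case, where every I/O lands in its own cache block(s) and nothing else shares them; the function name is illustrative.

```python
import math


def cache_block_utilization(io_size_kb, block_size_kb):
    """Fraction of allocated cache space that holds valid data (worst case)."""
    blocks_needed = math.ceil(io_size_kb / block_size_kb)
    return io_size_kb / (blocks_needed * block_size_kb)


print(cache_block_utilization(8, 512))  # 0.015625, so about 1.6% of each block is valid
print(cache_block_utilization(8, 8))    # 1.0, block size matches the I/O size
```

Under that assumption, a 512KB block size with 8KB I/Os leaves over 98% of the allocated cache holding no valid data.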
  9. c. Flash Device Type (vendor specifications)
                           Low cost           High performance
     Random read IOPS      30k – 40k          Up to 750k
     Read bandwidth        200 – 270 MB/s     Up to 3 GB/s
     Read latency          75 microseconds    75 microseconds
     Write latency         90 microseconds    15 microseconds
  10. vFRC Performance – Applications
      §  What workloads can benefit from vFRC?
         •  Read-dominated I/O pattern
         •  High repeated access of data (e.g. 20% of the working set accessed 80% of the time)
         •  Sufficient flash capacity to hold data that is accessed repeatedly
      §  Applications considered:
         •  Data warehousing (Swingbench DSS)
         •  Database transactions (DVDStore)
         •  Real-world enterprise server workloads (publicly available I/O traces)
  11. 1. Data Warehousing Application
      §  Decision Support System [TPC-H]
      §  Benchmark: Swingbench 2.4 using the ‘Sales History’ schema on an Oracle 11g R2 database
      [Diagram: Swingbench DSS benchmark on a RHEL 6.4 VM sends queries to, and receives results from, Oracle 11g R2 on a Windows 2008 Server VM with vFRC; storage is an EMC VNX 5700, 1TB LUN, RAID5 over 5 FC 15k RPM HDDs]
  12. 1. Data Warehousing Application
      §  Workload: read dominated, high re-access rate
      §  vFRC configuration: 8GB cache size and 8KB cache block size
      [Chart: Transaction count. x-axis: transaction type (SRMC, SCMC, PSCR, SMA, PPSC, TSQ, SQC); y-axis: # of transactions, 0 to 12000; series: Baseline, vFRC]
  13. 1. Data Warehousing Application
      §  Workload: read dominated, high re-access rate
      §  vFRC configuration: 8GB cache size and 8KB cache block size
      §  Up to 84% improvement in average throughput
      §  Up to 2X reduction in latency
      [Chart: Transactions per minute (TPM). Baseline: 61.7; vFRC: 112.9]
      [Chart: Average response time (s). Baseline: 20.389; vFRC: 10.859]
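
As a quick check, the averages plotted on this slide (61.7 vs. 112.9 TPM, 20.389 s vs. 10.859 s response time) can be converted into a percentage gain and a latency ratio. The helper names below are illustrative.

```python
def percent_gain(baseline, improved):
    """Relative improvement of a higher-is-better metric, in percent."""
    return (improved - baseline) / baseline * 100.0


def reduction_factor(baseline, improved):
    """How many times smaller a lower-is-better metric became."""
    return baseline / improved


tpm_gain = percent_gain(61.7, 112.9)
latency_factor = reduction_factor(20.389, 10.859)
print(f"{tpm_gain:.1f}% more TPM, {latency_factor:.2f}x lower response time")
```

The plotted averages work out to roughly 83% more throughput and about a 1.9x lower response time, in line with the "up to" figures quoted on the slide.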
  14. 2. Database Transaction Application
      §  Benchmark: DVDStore
         •  Simulates online e-commerce site operations
         •  Database: MS SQL Server 2008
         •  Database size: 15 GB
      §  Workload characteristics
         •  60% reads
         •  Mostly random I/Os
         •  Predominant I/O size: 8KB
      §  VM configuration
         •  8 vCPUs, 8GB memory
         •  25GB database disk, 10GB log disk
      §  Storage array
         •  VNX 5700, 1TB LUN – RAID5 over 5 FC 15k RPM disk drives
  15. 2. Database Transaction Application
      Up to 39% improvement in application throughput
      [Chart: Orders per minute. Baseline: 8802; vFRC – 10GB: 8937; vFRC – 15GB: 12319]
  16. 3. Enterprise Server I/O Traces
      §  a. Hardware Monitoring Server Workload
         •  Trace from servers that log data from multiple hardware monitoring programs across a datacenter
         •  Collected at Microsoft Research, Cambridge*
         •  Trace replayed using IOAnalyzer
      §  95% reads
      §  vFRC size: 4GB
      §  vFRC block size: 4KB
      §  vFRC hit percentage: 85%
      [Chart: Average latency (ms). Baseline: 1.23; vFRC: 0.321]
      * Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron. 2008. Write off-loading: Practical power management for enterprise storage. Trans. Storage 4, 3, Article 10 (November 2008).
  17. 3. Enterprise Server I/O Traces
      §  b. Proxy Server Workload
         •  Trace from a web proxy server
         •  Collected at Microsoft Research, Cambridge*
         •  Trace replayed using IOAnalyzer
      §  67% reads
      §  vFRC size: 16GB
      §  vFRC block size: 4KB
      §  vFRC hit percentage: 83%
      [Chart: Average latency (ms). Baseline: 1.357; vFRC: 0.612]
      * Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron. 2008. Write off-loading: Practical power management for enterprise storage. Trans. Storage 4, 3, Article 10 (November 2008).
  18. vMotion Performance with vFRC
      §  vFRC is fully supported by vMotion and other vSphere features
      §  vMotion behavior of a vFRC-enabled VM
         •  VM caches are migrated by default
         •  Option to drop the cache during vMotion
      §  Migrating the cache preserves application performance gains
         •  Consumes more network bandwidth
         •  Increases vMotion time
      §  Dropping the cache during vMotion leads to a temporary dip in application performance gains
         •  No extra overhead in vMotion
         •  The cache re-warms at the destination
  19. vFRC Configuration Guidelines
      §  Cache size may be configured based on the working set of the workload
         •  Start with about 20% of VMDK size, and monitor vFRC stats to re-configure it
      §  Cache block size must match the dominant I/O size of the workload
         •  Workload I/O size is not equal to VM I/O size!
      §  vFRC performs better with PCIe flash devices
      §  Decide on cache migration behavior during vMotion based on:
         •  Criticality of application performance
         •  Time taken for vMotion
         •  Network bandwidth availability
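
The first two guidelines can be sketched as a starting-point calculation. The histogram below is hypothetical data (I/O size in bytes mapped to an I/O count, as one might collect per virtual disk), and the function names are illustrative; the result is an initial guess to refine by monitoring vFRC stats, not a final configuration.

```python
def dominant_io_size(io_size_histogram):
    """Return the I/O size (bytes) with the highest I/O count."""
    return max(io_size_histogram, key=io_size_histogram.get)


def initial_vfrc_config(vmdk_size_gb, io_size_histogram, fraction=0.20):
    """Starting point: about 20% of VMDK size, block size = dominant I/O size."""
    return {
        "cache_size_gb": vmdk_size_gb * fraction,
        "block_size_bytes": dominant_io_size(io_size_histogram),
    }


# Hypothetical histogram: I/O size in bytes -> number of I/Os observed
histogram = {4096: 1200, 8192: 9100, 16384: 700, 65536: 150}
config = initial_vfrc_config(25, histogram)
print(config)
```

For a 25GB disk dominated by 8KB I/Os, this suggests starting at about 5GB of cache with an 8KB block size.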
  20. vscsiStats – DEEP Storage Diagnostics
      §  vscsiStats characterizes I/O for each virtual disk
         •  Allows us to separate each type of workload into its own container and observe trends
      §  Histograms are only collected if enabled; no overhead otherwise
      §  Metrics
         •  I/O size
         •  Seek distance
         •  Outstanding I/Os
         •  I/O interarrival times
         •  Latency
  21. Making sense of vscsiStats and vFRC Stats
      §  vscsiStats can be used to learn more about the workload
         •  I/O length histogram
         •  Read/write ratio
         •  I/O trace to compute working set size
      §  vFRC stats provide information about cache effectiveness
         •  numBlocks: total number of cache blocks for a VMDK
         •  numBlocksCurrentlyCached: number of cache blocks that actually contain data
         •  Evict:avgNumBlocksPerOp: average number of evictions
         •  avgCacheLatency: average device latency of the flash resource
         •  maxCacheLatency: maximum device latency of the flash resource
         •  cacheHitPercentage: percentage of read cache hits
  22. vFRC Sizing Decisions based on vFRC Stats
      §  Using vFRC stats to make sizing decisions
         •  numBlocksCurrentlyCached < numBlocks: cache size may be reduced
         •  numBlocksCurrentlyCached = numBlocks and Evict:avgNumBlocksPerOp is high: cache size may be inadequate
         •  maxCacheLatency is very high: may be caused by a spike in device latency, which may mean the device has worn out
         •  cacheHitPercentage is high and Evict:avgNumBlocksPerOp is low: the cache is correctly configured
      §  For more detailed information, please refer to our performance whitepaper: http://www.vmware.com/files/pdf/techpaper/vfrc-perf-vsphere55.pdf
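
The sizing rules above can be written down as a small checker. The stat names come from the slide; everything else is an assumption for illustration: the function name, the dictionary input format, the thresholds that define "high" and "very high", and the latency unit (microseconds).

```python
def vfrc_sizing_advice(stats, high_evictions=1.0, high_hit_pct=80.0,
                       high_latency_us=1000.0):
    """Apply the slide's rules of thumb; all thresholds are hypothetical."""
    advice = []
    cached = stats["numBlocksCurrentlyCached"]
    total = stats["numBlocks"]
    evictions = stats["Evict:avgNumBlocksPerOp"]

    if cached < total:
        advice.append("cache size may be reduced")
    elif evictions >= high_evictions:
        advice.append("cache size may be inadequate")

    if stats["maxCacheLatency"] >= high_latency_us:
        advice.append("device latency spike; the device may be worn out")

    if stats["cacheHitPercentage"] >= high_hit_pct and evictions < high_evictions:
        advice.append("cache is correctly configured")
    return advice


# Hypothetical sample: cache is full and evicting heavily, with a low hit rate
full_cache = {
    "numBlocks": 1000,
    "numBlocksCurrentlyCached": 1000,
    "Evict:avgNumBlocksPerOp": 5.0,
    "maxCacheLatency": 200.0,
    "cacheHitPercentage": 40.0,
}
print(vfrc_sizing_advice(full_cache))  # ['cache size may be inadequate']
```

The rules are evaluated independently, so a single set of stats can produce more than one piece of advice.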
  23. Virtual SAN - Architecture
      §  Each ESXi host contributes:
         •  Flash storage to absorb IOPS
         •  Hard disk drives to provide capacity
      §  Virtual SAN aggregates these resources from multiple servers in a vSphere cluster
         •  Provides a global datastore for VMs in the cluster
      §  HA/DRS ensures that the VM restarts after a host crash
      §  Virtual SAN objects can be split into multiple components for performance and data protection
         •  Governed by storage policies
      [Diagram: VSAN cluster of three ESX hosts; a VM's virtual disk is a VSAN object stored as replica-1, replica-2, and a witness]
  24. Experiment Setup
      §  Hardware
         •  16 core 2.9 GHz Dell R720 machines
         •  2 x Intel PCIe R910 SSD – 200GB (1 PCIe slot)
         •  12 x 10K RPM Seagate SAS disks
         •  10G VSAN dedicated network, 1G for VM network
      §  VSAN configuration
         •  2 x disk groups per machine, 6 x disks per disk group
         •  hostFailuresToTolerate 1, stripeWidth 1
      §  Workload characteristics
         •  ViewPlanner 3.0 Standard benchmark with 2 sec think-time (heavy user)
         •  ViewPlanner Group A: CPU-intensive apps; Group B: I/O-intensive apps
         •  1900 x 1200 resolution, PCoIP
         •  Windows 7 desktops and Windows XP clients
         •  VDI workload is known to be CPU intensive but sensitive to I/O latency
  25. Virtual SAN Delivers IOPS Required by VDI
      •  Virtual SAN can meet the IOPS required by the VDI workload
  26. Virtual SAN Scale
      [Chart: Virtual SAN scale. y-axis: number of heavy VDI users, 0 to 800; 3 node: 275; 5 node: 460; 7 node: 667; series: VSAN, Linear (VSAN)]
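
The chart's three data points (275, 460, and 667 heavy VDI users on 3, 5, and 7 nodes) can be checked for linearity with a per-node ratio and an ordinary least-squares fit. This is a sketch over the plotted values only; it is not how the chart's own "Linear (VSAN)" trend line was necessarily produced.

```python
nodes = [3, 5, 7]
users = [275, 460, 667]

# Users supported per node at each cluster size
per_node = [u / n for u, n in zip(users, nodes)]

# Ordinary least-squares line through the three points
count = len(nodes)
mean_x = sum(nodes) / count
mean_y = sum(users) / count
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(nodes, users))
         / sum((x - mean_x) ** 2 for x in nodes))
intercept = mean_y - slope * mean_x

print([round(v, 1) for v in per_node])  # [91.7, 92.0, 95.3]
print(round(slope, 1))                  # 98.0 additional users per added node
```

Users per node stays between about 92 and 95 across the three cluster sizes, which is what near-linear scaling looks like in these numbers.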
  27. Group A Score Comparison
      •  Impact on Group A application latencies is marginal
      •  Virtual SAN uses very few host CPU cycles
      [Chart: Average application latency, Group A. y-axis: 0 to 1.2; series: VSAN, SAN]
  28. Group B Score Comparison
      •  Group B application latencies are close to All-Flash-SAN
      •  Virtual SAN can meet the IOPS required by the VDI workload
      [Chart: Average application latency, Group B. y-axis: 0 to 7; series: VSAN, All-Flash-SAN]
  29. VSAN VDI Consolidation compared to physical SAN array
      §  VSAN performs better than a typical mid-range FC storage array
         •  VSAN benefits from local flash storage that provides high performance
      §  Impact of VSAN CPU consumption on application performance is low
      §  A physical SAN array is not required to run the VDI workload
      [Chart: VDI consolidation ratio. x-axis: number of nodes (servers): 3, 5, 7; y-axis: number of VMs, 0 to 800; series: Mid-range FC array, vSAN, All-flash FC array]