If you’re on Twitter, feel free to tweet about this session (use hashtag #tovmug) • I encourage you to take photos or videos of today’s session and share them online • This presentation will be made available online after the event
Let’s start by defining the term “stretched cluster” • A stretched cluster is a cluster with ESX/ESXi hosts in different physical locations, usually different geographic sites • Stretched clusters are typically built as a way to create “active/active” data centers in order to: • Provide high availability across sites • Perform dynamic workload balancing across sites
HA and DRS are not required for a stretched cluster, although they are typically deployed • Many of the pros and cons of stretched clusters stem directly from the use of HA/DRS • Stretched clusters are not a requirement for long-distance vMotion
Likewise, a stretched cluster is not a prerequisite for long-distance vMotion • However, vMotion does derive benefit from stretched clusters: • Intra-cluster vMotions are highly parallelized • Inter-cluster vMotions are serial • Using a stretched cluster could offer benefits for disaster avoidance use cases
VMs in a stretched cluster need access to their storage at both ends • Storage performance suffers otherwise • Storage vMotion is required to fix the performance hit • There are a couple of different ways of addressing this requirement; each approach has its benefits/drawbacks • Solutions are generally limited to synchronous (~100 km) distances
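For illustration only, here’s a minimal sketch of that Storage vMotion step using pyVmomi, the open-source Python bindings for the vSphere API (pyVmomi itself is not part of this presentation); the vCenter address, credentials, VM name, and datastore name are hypothetical placeholders.

    # Sketch: use Storage vMotion to move a VM's disks onto storage local to
    # the site now running the VM. Hostname, credentials, and object names
    # below are placeholders.
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com",
                      user="administrator", pwd="secret")  # hypothetical vCenter
    content = si.RetrieveContent()

    def find_by_name(vimtype, name):
        """Return the first managed object of the given type with this name."""
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vimtype], True)
        try:
            return next(obj for obj in view.view if obj.name == name)
        finally:
            view.DestroyView()

    vm = find_by_name(vim.VirtualMachine, "app01")              # hypothetical VM
    target_ds = find_by_name(vim.Datastore, "siteB-datastore")  # site-local storage

    # Relocating just the storage removes the cross-site I/O penalty after the
    # VM has moved (or been restarted) at the other site.
    task = vm.RelocateVM_Task(vim.vm.RelocateSpec(datastore=target_ds))

    Disconnect(si)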
One approach is a stretched SAN solution • Stretching the SAN fabric between locations will usually involve multiple VSANs and inter-VSAN routing • Storage is typically read/write in one location (read-only in the other) • A cross-connect topology allows hosts on both sides to access read/write storage but introduces multipathing considerations • Implementations with only a single storage controller at each location create a SPoF (single point of failure)
Another approach is distributed storage • Distributes storage in a read/write fashion across multiple sites • Uses data locality algorithms to maximize cache benefits • Typically uses multiple controllers in a scale-out fashion • Needs a clustered file system for simultaneous host access • Must address “split brain” scenarios
vSphere HA can provide automatic failover across sites • However, vSphere HA is not currently “site aware” • You can’t control the failover destination • You can’t designate or define things like: • Per-site failover capacity • Per-site failover hosts • Per-site admission control
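To make the “not site aware” point concrete, here is a rough pyVmomi sketch of how HA admission control is actually defined: one policy for the entire cluster, with no per-site settings. The function name and values are illustrative assumptions, not material from the original slides.

    # Sketch: vSphere HA is configured once per cluster; there is no per-site
    # failover capacity, per-site failover host, or per-site admission control
    # setting for a stretched cluster.
    from pyVmomi import vim

    def enable_ha(cluster, failover_level=1):
        """Enable vSphere HA with a single, cluster-wide admission control policy."""
        das_config = vim.cluster.DasConfigInfo(
            enabled=True,
            admissionControlEnabled=True,
            # One failover level for the whole cluster -- there is no way to say
            # "reserve one host of capacity at each site".
            admissionControlPolicy=vim.cluster.FailoverLevelAdmissionControlPolicy(
                failoverLevel=failover_level))
        spec = vim.cluster.ConfigSpecEx(dasConfig=das_config)
        return cluster.ReconfigureComputeResource_Task(spec, modify=True)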
Don’t deploy more than 8 hosts in HA-enabled stretched clusters or you’ll run afoul of HA primary node limitations • Can only deploy 4 hosts or fewer per site per cluster • Ensures distribution of HA primary nodes (with at most 4 hosts per site, the cluster’s 5 primary nodes can’t all end up in one site) • No supported method to increase the number of primary nodes or to specify HA primary nodes • vSphere 5 has a new HA architecture that eliminates this consideration
Workloads can be balanced across sites using vSphere DRS • However, vSphere DRS (like HA) is not “site aware” • DRS host affinity rules can mimic “site awareness” • DRS host affinity rules are not dynamic • DRS host affinity rules create administrative overhead • DRS host affinity rules are defined and managed on a per-cluster basis
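Here is a hedged sketch of what mimicking “site awareness” with DRS host affinity rules might look like through pyVmomi; the group names, host lists, and VM lists are hypothetical, and every rule has to be created and maintained per cluster.

    # Sketch: DRS host/VM groups plus a soft VM-to-host affinity rule that keeps
    # a set of VMs on site A's hosts. Names, hosts, and VMs are placeholders.
    from pyVmomi import vim

    def pin_vms_to_site(cluster, site_hosts, site_vms):
        """Keep a set of VMs on one site's hosts with a 'should run on' rule."""
        host_group = vim.cluster.GroupSpec(
            info=vim.cluster.HostGroup(name="siteA-hosts", host=site_hosts),
            operation="add")
        vm_group = vim.cluster.GroupSpec(
            info=vim.cluster.VmGroup(name="siteA-vms", vm=site_vms),
            operation="add")
        rule = vim.cluster.RuleSpec(
            operation="add",
            info=vim.cluster.VmHostRuleInfo(
                name="siteA-vms-on-siteA-hosts",
                enabled=True,
                mandatory=False,   # "should" rule: HA can still restart elsewhere
                vmGroupName="siteA-vms",
                affineHostGroupName="siteA-hosts"))
        spec = vim.cluster.ConfigSpecEx(groupSpec=[host_group, vm_group],
                                        rulesSpec=[rule])
        return cluster.ReconfigureComputeResource_Task(spec, modify=True)

Using a non-mandatory (“should run”) rule leaves HA free to restart VMs on the surviving site if an entire site fails, which is usually the behavior you want in a stretched cluster.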
Similar considerations apply to Storage DRS • Like vSphere DRS, Storage DRS is not “site aware” • Manually align datastore clusters with storage topology to avoid introducing unnecessary latency • User-defined storage capabilities and profile-driven storage could be used to help mimic site awareness • Watch for impact on storage replication/synchronization
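As a rough illustration of aligning datastore clusters with the storage topology, the pyVmomi sketch below builds one datastore cluster per site and moves only that site’s datastores into it; the folder object, names, and datastore list are placeholder assumptions.

    # Sketch: one datastore cluster per site, so Storage DRS recommendations
    # never move a VMDK to the "wrong" side of the inter-site link.
    # datastore_folder: the datacenter's datastore folder (a vim.Folder).
    def build_site_datastore_cluster(datastore_folder, pod_name, site_datastores):
        """Create a datastore cluster and move one site's datastores into it."""
        pod = datastore_folder.CreateStoragePod(name=pod_name)  # e.g. "siteA-pod"
        # A StoragePod is a folder, so site-local datastores are simply moved in.
        pod.MoveIntoFolder_Task(site_datastores)
        return pod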
Stretched clusters add networking complexity • More complex network configuration is required to provide Layer 2 adjacency (or its equivalent) • More complex networking is required to address routing issues created by VM mobility • Technologies to address these concerns are new (OTV, LISP) and require networking expertise to configure and maintain • What about the virtual network design?
VM mobility between sites (perhaps due to HA/DRS) could impact other areas: • Backups • Personnel/support • Disaster recovery/replication • Performance of multi-tier applications
HA/DRS are the “greatest strength and greatest weakness” for stretched clusters • vSphere 5 improves the situation significantly • Stretched clusters would directly benefit from improvements to HA/DRS in these areas: • “Site awareness” • More scalable/dynamic DRS host affinity rule management (policy-based placement)
Stretched clusters would also benefit from further networking developments such as: • LISP (or equivalent) to decouple network routing from network identity • OTV, EoMPLS, or other equivalents to enable Layer 2 adjacency • It’s not yet clear how VXLAN will play in this space • In the longer term, the need for Layer 2 adjacency itself needs to be addressed and resolved
Developments in storage will also help stretched clusters • Active/active read-write storage at greater distances • Better handling of “split brain” scenarios • Better/more direct integration with replication for topologies with >2 sites (Sync-Sync-Async, for example)