OpenSolaris High Availability Cluster (OHAC)

Miguel Vidal

October 27, 2009

Transcript

  1. 1.

    Open HA Cluster
    OpenSolaris High Availability Cluster
    OSDevCon, Dresden 2009
    José Castro / Miguel Vidal
    jfcastro@libresoft.es / mvidal@libresoft.es
    GSyC/Libresoft – URJC
    October 27-30, 2009
    José Castro / Miguel Vidal, OpenSolaris High Availability Cluster
  2. 2.

    (cc) 2009 JF Castro - Miguel Vidal. Some rights reserved.
    This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License, available at http://creativecommons.org/licenses/by-sa/3.0/
  3. 3.

    About OSDevCon
    OpenSolaris Developer Conference: a worldwide technical conference for developers and enthusiasts of OpenSolaris.
    The most important OSOL meeting outside the USA.
    Third edition: Berlin (2007), Prague (2008) and Dresden (2009).
    Organized by the Czech OpenSolaris User Group and the German Unix User Group (GUUG).
  4. 4.

    OSDevCon 2009
    Tutorials/Workshops (2 days):
        DTrace (4 hours)
        OpenSolaris Kernel Debugging (4 hours)
        OpenSolaris HA Cluster (8 hours)
    Presentations (2 days):
        19 talks: Fast deployment on hundreds of machines, Network virtualisation using Crossbow technology, ZFS internal structures...
    OSUG Leaders Meeting (1 day)
  5. 5.

    OpenSolaris High Availability Cluster (OHAC)
  6. 6.

    What is OHAC?
    The latest free/libre version of Solaris Cluster software is now Open High Availability Cluster 2009.06.
    The Solaris Cluster product group contains Sun Cluster software, Sun Cluster agents (data services), and Sun Cluster Geographic Edition software.
    OHAC includes the core clustering framework, a suite of HA agents for various applications, an automated test framework, and a disaster recovery extension.
  7. 7.

    Development Context
  8. 8.

    Licensing issues
    The majority of Open High Availability (Open HA) Cluster code is released under the CDDL (OSD-compliant).
    Some binary components are covered under the OpenSolaris Binary License (right to redistribute bundled with OpenSolaris), and some are covered under other open source licenses.
    The OHAC initiative could form the basis of future versions of the Solaris Cluster product group.
  9. 9.

    Typical HA Cluster hardware config
    Two or more physical machines
    Four or more network adapters on each machine
    Dedicated interconnects between nodes
    Shared disk storage (network-attached storage)
    Redundant storage paths from each node
    Quorum arbitration device
  10. 10.

    Solaris Cluster Geographic Edition
  11. 11.

    Typical HA Cluster software components
    Heartbeats
    Membership
    Distributed configuration repository
    Service management
    Cluster-private networking layer
    Shared disk file system (GFS, OCFS...)
    Network load-balancing
    etc.
  12. 12.

    Bad things
    Two types of problems arise from cluster partitions:
    Split brain: The connection between nodes is lost (possibly due to network faults) and the cluster becomes partitioned into subclusters, each of which “believes” that it is the only partition. When multiple nodes attempt to write to the disks, data corruption can occur.
    Amnesia: The cluster restarts after a shutdown with cluster configuration data older than at the time of the shutdown.
    OHAC software avoids split brain and amnesia through the “quorum device” and “membership” mechanisms.
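The reason majority voting guards against both problems can be sketched in a few lines: any two groups that each hold a strict majority of the votes must share at least one voter. Two disjoint partitions therefore cannot both run (split brain), and a restarting majority always contains a voter from the last running majority (amnesia). The voter names and helper functions below are hypothetical and only illustrate the counting argument; they are not part of the OHAC software.

```python
from itertools import combinations


def majorities(voters):
    """Return every subset of voters that holds a strict majority (one vote each)."""
    need = len(voters) // 2 + 1
    return [
        set(group)
        for size in range(need, len(voters) + 1)
        for group in combinations(voters, size)
    ]


def any_two_overlap(groups):
    """True if every pair of groups shares at least one voter."""
    return all(a & b for a, b in combinations(groups, 2))


# Hypothetical two-node cluster plus one quorum device, one vote each.
voters = ["node1", "node2", "quorum_device"]
groups = majorities(voters)

# Every pair of majority groups intersects, so two partitions can never
# both be allowed to operate at the same time.
print(any_two_overlap(groups))
```

Without a majority rule, the two isolated nodes `{"node1"}` and `{"node2"}` would be disjoint groups, which is exactly the split-brain case.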
  13. 13.

    Cluster Membership
    The Cluster Membership Monitor (CMM) is a distributed set of agents that exchange messages over the cluster interconnect to complete the following tasks:
    Enforcing a consistent membership view on all nodes (quorum)
    Driving synchronized reconfiguration in response to membership changes
    Handling cluster partitioning
    Ensuring full connectivity among all cluster members by leaving unhealthy nodes out of the cluster until they are repaired
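The "full connectivity" task can be illustrated with a toy model. The sketch below is not the CMM algorithm; it assumes a hypothetical `reachable` map that records which peers each node can exchange messages with, and repeatedly drops the node with the most missing two-way links until the remaining members are fully connected.

```python
def fully_connected_members(reachable):
    """Return the members left after dropping nodes that lack two-way links.

    reachable maps each node name to the set of peers it can talk to.
    """
    members = set(reachable)
    while True:
        # For each member, count the other members it has no two-way link with.
        missing = {
            n: sum(
                1
                for m in members
                if m != n and not (m in reachable[n] and n in reachable[m])
            )
            for n in members
        }
        # Iterate in sorted order so ties are broken deterministically.
        worst = max(sorted(members), key=lambda n: missing[n], default=None)
        if worst is None or missing[worst] == 0:
            return sorted(members)
        members.remove(worst)


# Hypothetical three-node cluster in which node "c" has lost its interconnect.
view = fully_connected_members({"a": {"b"}, "b": {"a"}, "c": set()})
print(view)
```

In a real cluster every node would have to reach the same view, and quorum would decide whether the surviving set may keep running.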
  14. 14.

    Quorum
    Each node is assigned one vote.
    A partition with the majority of votes gains quorum and is allowed to operate.
    Majority voting on node votes alone requires more than two nodes in the cluster.
    If a two-node cluster becomes partitioned, an external vote is needed for either partition to gain quorum.
    This external vote is provided by a Quorum Device (QD).
    Quorum devices can contain user data.
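The vote arithmetic for the two-node case can be checked with a small sketch. The function name `has_quorum` is hypothetical, and the quorum device is assumed to contribute a single vote.

```python
def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """A partition gains quorum only with a strict majority of all configured votes."""
    return 2 * partition_votes > total_votes


# Two nodes, no quorum device: after a partition each side holds 1 of 2 votes.
print(has_quorum(1, 2))

# Two nodes plus a quorum device (assumed 1 vote): 3 votes in total.
print(has_quorum(1 + 1, 3))  # the node that also holds the quorum device
print(has_quorum(1, 3))      # the other node, on its own
```

Without the external vote neither half of a partitioned two-node cluster reaches a majority; with it, exactly one half does.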
  15. 15.

    Quorum in Two-Node Configurations
  16. 16.

    Quorum in Greater Than Two-Node Configurations
    Figure: In this configuration, the combination of any one or more nodes and the quorum device can form a cluster.
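The figure itself is not reproduced in this transcript, so the following sketch assumes one configuration that matches the caption: three nodes with one vote each, all attached to a single quorum device that carries two votes (one fewer than the number of attached nodes). The names and vote counts are assumptions made for illustration.

```python
from itertools import combinations

# Assumed vote assignment: three nodes (1 vote each) and one quorum device (2 votes).
votes = {"nodeA": 1, "nodeB": 1, "nodeC": 1, "quorum_device": 2}
total = sum(votes.values())


def can_form_cluster(group):
    """True if the group holds a strict majority of all configured votes."""
    return 2 * sum(votes[name] for name in group) > total


nodes = ["nodeA", "nodeB", "nodeC"]

# Any single node together with the quorum device reaches 3 of 5 votes.
single_plus_qd = [can_form_cluster([n, "quorum_device"]) for n in nodes]

# Two nodes without the quorum device hold only 2 of 5 votes.
pairs_without_qd = [can_form_cluster(pair) for pair in combinations(nodes, 2)]

print(single_plus_qd, pairs_without_qd)
```

Under these assumed votes the quorum device alone (2 of 5) cannot form a cluster either, so at least one live node is always required.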
  17. 17.

    Fencing
    The Sun Cluster system uses SCSI disk reservations to implement failure fencing.
    Using SCSI reservations, failed nodes are “fenced” away from the multihost devices, preventing them from accessing those disks.
    When a cluster member detects that another node is no longer communicating over the cluster interconnect, it initiates a failure fencing procedure to prevent the other node from accessing shared disks.
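The effect of fencing can be modeled in a few lines. This is a toy simulation, not the SCSI reservation protocol or any Sun Cluster interface: the `SharedDisk` class and its methods are hypothetical, and they only mimic the idea that a disk accepts writes from registered nodes and that a surviving node can remove a failed node's registration.

```python
class SharedDisk:
    """Toy multihost disk that accepts writes only from registered node keys."""

    def __init__(self, keys):
        self.registered = set(keys)
        self.blocks = {}

    def preempt(self, by, victim):
        """A registered node removes another node's key, fencing it off."""
        if by not in self.registered:
            raise PermissionError(f"{by} is not registered and cannot preempt")
        self.registered.discard(victim)

    def write(self, key, block, data):
        """Store data if the writer is still registered, otherwise refuse."""
        if key not in self.registered:
            raise PermissionError(f"{key} is fenced off from this disk")
        self.blocks[block] = data


disk = SharedDisk({"node1", "node2"})
disk.write("node2", 0, "before failure")

# node1 stops hearing node2 on the interconnect and fences it.
disk.preempt(by="node1", victim="node2")

try:
    disk.write("node2", 0, "stale write")
    fenced = False
except PermissionError:
    fenced = True

print(fenced, disk.blocks[0])
```

The stale write from the fenced node is rejected, so the data written before the failure stays intact.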
  18. 18.

    Solaris Cluster Geographic Edition
  19. 19.

    Project Colorado: Hardware Minimization
    The goal is to provide a minimal and extensible binary distribution of Open HA Cluster: basic cluster functionality with a low barrier to entry and easy deployment for OpenSolaris.
    Using local disks as “poor man’s shared storage” with COMSTAR iSCSI and ZFS
    Using Crossbow VNICs for private cluster traffic over the public network
    “Weak membership” (preview-only feature)
    Taken together, these allow any two nodes on the same IP subnet to form a functional cluster.
  20. 20.

    Minimalist HA Conf
    High Availability with a minimal Cluster:
    Figure: Project Colorado.
  21. 21.

    Questions?