Running Splunk on AWS

Learn why and how Autodesk runs Splunk Enterprise in AWS with the goals of increasing automation, scalability and responsiveness. Includes a sample architecture, AWS CloudFormation templates and an Ansible playbook (links to GitHub provided).

Presented @ Splunk .conf 2014

Alan Williams

October 09, 2014

Transcript

  1. © 2014 Autodesk. Running Splunk on Amazon Web Services. Alan Williams, Principal Engineer (alanwill on Twitter & GitHub). Splunk .conf 2014.
  2. Splunk Disclaimer
    During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
  3. Who Am I?
    §  Engineer @ Autodesk
    §  Technology Generalist
    §  Background in Infrastructure
    §  AWS for ~4 years
    §  Splunk for ~1 year
    §  Motorcyclist
    §  Soft spot for pit bulls
  4. Who is Autodesk?
    §  Leader in 3D design, engineering and entertainment software
    §  Introduced AutoCAD in 1982
    §  Empowering the Maker movement
    §  Help our customers imagine, design and create a better world
  5. Make this better!
    §  Splunk 4.3
    §  5 year old hardware
    §  Performance issues
    §  Global
    §  Now
  6. Decisions…decisions
    Where we are          Where we wanted to be
    Splunk 4.3            Latest Splunk version (6.x)
    EOL hardware          Hardware refresh
    Fragile environment   Resiliency
    Not rocket science…can we do this NOW?
  7. Where to begin
    §  Take inventory of existing hardware
    §  Use the AWS Calculator
        §  http://calculator.s3.amazonaws.com/index.html
    §  Cost/compute analysis
  8. Cost Analysis – Account for Everything
    Hardware & Maintenance; Power & Cooling; Rack space; Storage (FC + SATA); Servers; Load Balancers; Network Data Transfer
  9. What we noticed…
    Total cost of server hardware vs total cost of AWS instances = 35% lower for AWS
    Total cost of all on-premise infra vs total cost of all AWS infra = 50% lower for AWS
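
The "X% lower for AWS" figures above are a relative difference against the on-premise total. A minimal sketch of that arithmetic follows; the function name and the input totals are hypothetical placeholders chosen to reproduce the two percentages, not Autodesk's actual costs.

```python
def percent_lower(on_premise_cost: float, aws_cost: float) -> float:
    """Return how much lower the AWS cost is, as a percentage of the on-premise cost."""
    if on_premise_cost <= 0:
        raise ValueError("on_premise_cost must be positive")
    return round((on_premise_cost - aws_cost) * 100 / on_premise_cost, 1)


# Hypothetical totals in arbitrary currency units (NOT figures from the talk).
servers_only = percent_lower(on_premise_cost=100_000, aws_cost=65_000)
all_infra = percent_lower(on_premise_cost=200_000, aws_cost=100_000)
print(servers_only, all_infra)
```

The comparison is only meaningful when both totals cover the same period and the same scope, which is the point of the "account for everything" inventory on the previous slide.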
  10. Outcome
    §  We can’t compete on price
        §  Economies of scale
    §  We can’t compete on speed
        §  Time to provision
        §  Time to deliver new features
  11. Architecture
    [Architecture diagram. Labels recoverable from the slide: Virtual Private Cloud; Availability Zone (1a) and Availability Zone (1b); Internet LB, Internal LB, Presentation and Application subnets; Auto scaling Groups; Search Head 1, Search Head 3; Indexer Peer 1 … 8; Cluster Master Node; License Master Node; Deploy Server A; Intermediate Forwarder; Bastion (Ansible Master); NAT; NFS 200 GB; ELB; IGW; Direct Connect (1 Gbps); splunk.mycompany.com. Arrow legend: load balancing traffic, Splunk inter-communication, S3 Shuttl archiving, NFS.]
  12. Automated and Dynamic
    §  AWS CloudFormation Template
        §  Infrastructure provisioning
    §  Ansible Playbook
        §  Software install and configuration
  13. What is CloudFormation?
    §  An AWS service
    §  JSON-based template framework
    §  Describe almost all AWS resources
    §  Enables infrastructure as code
    §  Version control infrastructure
    §  Infrastructure portability
  14. What is Ansible?
    §  Configuration automation tool
    §  Configure, deploy and orchestrate tasks
    §  Agentless
    §  YAML-based
    §  Fairly simple to get up and running quickly
  15. CloudFormation Template (splunk-app.json)
    §  Search Heads
    §  Peer Nodes
    §  Cluster Master
    §  License Master
    §  Deployment Server
    §  NFS Instance
    §  Elastic Load Balancer
    §  Security Groups
    §  IAM Roles
    §  EBS Volumes
    §  Auto Scaling Groups
    https://github.com/alanwill/cfn-splunk
    ~10 minutes to complete
  16. Ansible Playbook (ansible-splunk)
    §  Update latest OS packages
    §  Update hostname
    §  Download & install Splunk
    §  Configure inputs.conf
    §  Deploy custom certs
    §  Change default password
    §  Install sysstat
    §  Install & configure:
        §  License Master
        §  Cluster Master
        §  Peer Nodes
        §  Search Heads
        §  Deployment Server
    https://github.com/alanwill/ansible-splunk
    ~15-30 minutes to complete depending on instance type
  17. Scalable
    §  Easy to add/remove nodes
    §  CloudFormation + Ansible
    §  Dynamic
    §  Auto Scaling Groups for everything
    §  …even single instanced nodes (1/1/1)
    §  Splunk Search Head Pooling (NFS)
  18. Auto Scaling Groups
    §  Can be applied to all Splunk components
    §  Bootstrap Ansible playbook
    §  Could pre-bake but haven’t tried
    §  Consider dynamic portions
  19. Auto Scaling Groups
    §  Search Heads
        §  CPU based policy
    §  Peer Nodes
        §  Manual scaling, no policies
    §  Cluster/License Master, Deployment instance
        §  1/1/1 ASG (Single instance)
        §  Use EBS for persistent data
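
A "1/1/1" Auto Scaling Group can be read as minimum, maximum and desired capacity all set to 1, so a failed instance is replaced but the group never grows. That reading is an assumption, as the slide does not spell it out. The sketch below builds such a resource as a Python dict and prints it as CloudFormation JSON; the function name and the launch configuration name `ClusterMasterLaunchConfig` are hypothetical, while `ApplicationSubnetAZ1` is borrowed from the peer node example in this deck.

```python
import json


def single_instance_asg(launch_config_ref: str, subnet_ref: str, name_tag: str) -> dict:
    """Build an AWS::AutoScaling::AutoScalingGroup resource pinned to one instance."""
    return {
        "Type": "AWS::AutoScaling::AutoScalingGroup",
        "Properties": {
            "LaunchConfigurationName": {"Ref": launch_config_ref},
            "VPCZoneIdentifier": [{"Ref": subnet_ref}],
            # Assumed meaning of "1/1/1": min / max / desired capacity.
            "MinSize": "1",
            "MaxSize": "1",
            "DesiredCapacity": "1",
            "Tags": [
                {"Key": "Name", "Value": name_tag, "PropagateAtLaunch": True}
            ],
        },
    }


# Hypothetical resource and launch configuration names, for illustration only.
resource = {
    "ClusterMasterASG": single_instance_asg(
        "ClusterMasterLaunchConfig", "ApplicationSubnetAZ1", "Splunk Cluster Master"
    )
}
print(json.dumps(resource, indent=2))
```

Because a replacement instance starts from a fresh root volume, state that must survive replacement has to live elsewhere, which matches the "Use EBS for persistent data" bullet.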
  20. Search Head Provisioning Code Example
    §  Create EC2 instance with CloudFormation
    §  Run Ansible Playbook
        §  Install and configure Splunk
        §  Mount Search Head Pooling NFS volume
  21. Add new Search Head – Create EC2 instance
    "SearchHeadInstance5" : {
      "Type" : "AWS::EC2::Instance",
      "Properties" : {
        "InstanceType" : { "Ref" : "SearchHeadInstanceType" },
        "KeyName" : { "Ref" : "AppKeyName" },
        "SubnetId" : { "Ref" : "PresentationSubnetAZ1" },
        "ImageId" : ...,
        "SecurityGroupIds" : ...,
        "IamInstanceProfile" : { "Ref" : "SplunkInternalComponentsInstanceProfile" },
        "BlockDeviceMappings" : [
          { "DeviceName" : "/dev/xvda", "Ebs" : { "VolumeSize" : "10", "VolumeType" : "gp2" } },
          { "DeviceName" : "/dev/sdb", "VirtualName" : "ephemeral0" },
          { "DeviceName" : "/dev/sdc", "VirtualName" : "ephemeral1" }
        ],
        "Tags" : [
          { "Key" : "purpose", "Value" : "Search Head" },
          { "Key" : "stack", "Value" : { "Ref" : "EnvironmentName" } },
          { "Key" : "app", "Value" : { "Ref" : "AppName" } },
          { "Key" : "Name", "Value" : "Splunk Search Head" }
        ]
      }
    }
  22. Add new Search Head – Ansible Splunk build
    - name: Dynamically change hostname
      shell: "hostname `curl http://169.254.169.254/latest/meta-data/instance-id`.{{ splunk_host_domain }}"
    - name: Download Splunk server binary
      get_url: dest=/home/ec2-user url={{ splunk_binary_url }} sha256sum={{ splunk_binary_sha256sum }}
      when: splunk_installed_result|failed
    - name: Install Splunk server binary
      yum: pkg=/home/ec2-user/{{ splunk_binary_file }} state=installed
      when: splunk_installer_present.stat.exists == true
    - name: Execute config_splunk_inputs.sh script
      shell: /home/ec2-user/config_splunk_inputs.sh
      when: splunk_running|failed
    - name: Start Splunk for the first time
      command: /bin/su --shell=/bin/bash --session-command="/opt/splunk/bin/splunk start --accept-license" splunk
      when: splunk_running|failed
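
The `when: …|failed` conditions above follow a check-then-act pattern: a task runs only if an earlier check did not find the desired state. This assumes that variables such as `splunk_running` are registered by check tasks not shown on the slide. A minimal Python sketch of the same guard follows; all names are hypothetical and an in-memory flag stands in for the real host state.

```python
from typing import Callable


def run_unless(check: Callable[[], bool], action: Callable[[], None]) -> bool:
    """Run `action` only when `check` reports the desired state is missing.

    Returns True if the action ran, False if it was skipped.
    """
    if check():
        return False
    action()
    return True


# Hypothetical in-memory stand-in for "is Splunk running on this host?".
state = {"splunk_running": False}


def splunk_running() -> bool:
    return state["splunk_running"]


def start_splunk() -> None:
    state["splunk_running"] = True


first = run_unless(splunk_running, start_splunk)   # state missing, action runs
second = run_unless(splunk_running, start_splunk)  # state present, action skipped
print(first, second)
```

Running the guarded step twice changes the system only once, which is what makes it safe to re-run a playbook against an already configured node.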
  23. Peer Node Provisioning Code Example
    §  CloudFormation
        §  Create EC2 instance
        §  *Create EBS volumes and attach to instance
        §  *Mount EBS volumes
    §  Run Ansible Playbook
        §  Install and configure Splunk
        §  Mount Search Head Pooling NFS volume
  24. Add new Peer Node – Create EC2 instance
    "PeerNodeInstance8" : {
      "Type" : "AWS::EC2::Instance",
      "Properties" : {
        "InstanceType" : { "Ref" : "PeerNodeInstanceType" },
        "KeyName" : { "Ref" : "AppKeyName" },
        "SubnetId" : { "Ref" : "ApplicationSubnetAZ1" },
        "ImageId" : ...,
        "SecurityGroupIds" : ...,
        "IamInstanceProfile" : { "Ref" : "SplunkInternalComponentsInstanceProfile" },
        "EbsOptimized" : true,
        "BlockDeviceMappings" : [
          { "DeviceName" : "/dev/xvda", "Ebs" : { "VolumeSize" : "10", "VolumeType" : "gp2" } },
          { "DeviceName" : "/dev/sdb", "VirtualName" : "ephemeral0" },
          { "DeviceName" : "/dev/sdc", "VirtualName" : "ephemeral1" }
        ],
        "Tags" : [
          { "Key" : "purpose", "Value" : "Peer Node" },
          { "Key" : "stack", "Value" : { "Ref" : "EnvironmentName" } },
          { "Key" : "app", "Value" : { "Ref" : "AppName" } },
          { "Key" : "Name", "Value" : "Splunk Peer Node" }
        ]
      }
    },
  25. Add new Peer Node – Create EBS volumes
    "PeerNodeInstance8Volume1" : {
      "Type" : "AWS::EC2::Volume",
      "Properties" : {
        "Size" : { "Ref" : "PeerNodeVolumeSize" },
        "VolumeType" : "gp2",
        "AvailabilityZone" : { "Fn::GetAtt" : [ "PeerNodeInstance8", "AvailabilityZone" ] },
        "Tags" : [
          { "Key" : "purpose", "Value" : "Peer Node Instance 8 storage" },
          { "Key" : "stack", "Value" : { "Ref" : "EnvironmentName" } },
          { "Key" : "app", "Value" : { "Ref" : "AppName" } },
          { "Key" : "Name", "Value" : "Splunk Data" }
        ]
      },
      "DeletionPolicy" : "Snapshot"
    },
    "PeerNodeInstance8Volume2" : {
      "Type" : "AWS::EC2::Volume",
      "Properties" : {
        "Size" : { "Ref" : "PeerNodeVolumeSize" },
        "VolumeType" : "gp2",
        "AvailabilityZone" : { "Fn::GetAtt" : [ "PeerNodeInstance8", "AvailabilityZone" ] },
        "Tags" : [
          { "Key" : "purpose", "Value" : "Peer Node Instance 8 storage" },
          { "Key" : "stack", "Value" : { "Ref" : "EnvironmentName" } },
          { "Key" : "app", "Value" : { "Ref" : "AppName" } },
          { "Key" : "Name", "Value" : "Splunk Data" }
        ]
      },
      "DeletionPolicy" : "Snapshot"
    },
  26. Add new Peer Node – Mount EBS volumes
    "PeerNodeInstance8Mount1" : {
      "Type" : "AWS::EC2::VolumeAttachment",
      "Properties" : {
        "InstanceId" : { "Ref" : "PeerNodeInstance8" },
        "VolumeId" : { "Ref" : "PeerNodeInstance8Volume1" },
        "Device" : "/dev/sdf"
      }
    },
    "PeerNodeInstance8Mount2" : {
      "Type" : "AWS::EC2::VolumeAttachment",
      "Properties" : {
        "InstanceId" : { "Ref" : "PeerNodeInstance8" },
        "VolumeId" : { "Ref" : "PeerNodeInstance8Volume2" },
        "Device" : "/dev/sdg"
      }
    },
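
Copying these per-node resources by hand makes it easy for a resource name and the references inside it to drift apart. The sketch below generates the volume and attachment resources for one peer node, assuming the naming pattern on these slides (`PeerNodeInstanceN`, `…VolumeN`, `…MountN`). The function name is hypothetical, tags are omitted for brevity, and the device names beyond `/dev/sdf` and `/dev/sdg` are an assumption.

```python
import json


def peer_node_storage(index: int, volumes: int = 2) -> dict:
    """Return EBS volume and attachment resources for peer node number `index`."""
    instance = f"PeerNodeInstance{index}"
    # /dev/sdf and /dev/sdg come from the slides; the rest are assumed.
    devices = ["/dev/sdf", "/dev/sdg", "/dev/sdh", "/dev/sdi"]
    if not 1 <= volumes <= len(devices):
        raise ValueError(f"volumes must be between 1 and {len(devices)}")

    resources = {}
    for n in range(1, volumes + 1):
        volume = f"{instance}Volume{n}"
        resources[volume] = {
            "Type": "AWS::EC2::Volume",
            "Properties": {
                "Size": {"Ref": "PeerNodeVolumeSize"},
                "VolumeType": "gp2",
                # The volume must live in the same AZ as the instance it attaches to.
                "AvailabilityZone": {"Fn::GetAtt": [instance, "AvailabilityZone"]},
            },
            "DeletionPolicy": "Snapshot",
        }
        resources[f"{instance}Mount{n}"] = {
            "Type": "AWS::EC2::VolumeAttachment",
            "Properties": {
                "InstanceId": {"Ref": instance},
                "VolumeId": {"Ref": volume},
                "Device": devices[n - 1],
            },
        }
    return resources


print(json.dumps(peer_node_storage(8), indent=2))
```

Every reference is derived from the single `index` argument, so the instance, its volumes and their attachments stay consistent as nodes are added.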
  27. Add new Peer Node – Ansible add to Cluster
    - name: Enable Peer Nodes
      command: runuser -l splunk -c "splunk edit cluster-config -mode slave -master_uri https://{{ splunk_cluster_master_ip }}:8089 -replication_port 9887 -secret {{ replication_key }}"
      when: peer_nodes_clustering_enabled|failed
      register: peer_nodes_cluster_configure
    - name: Prewarm EBS volume1
      command: dd if=/dev/zero of=/dev/sdf bs=1M
      when: splunk_volume_exists|failed
      ignore_errors: True
    - name: Create RAID 0 device
      command: mdadm --create --verbose /dev/md0 --level=stripe --raid-devices=2 /dev/sdf /dev/sdg
      when: splunk_volume_exists|failed
    - name: Create filesystem
      filesystem: fstype=ext4 dev=/dev/md0
      when: splunk_volume_exists|failed
    - name: Mount volume
      mount: name=/opt/splunk/data src=/dev/md0 fstype=ext4 state=mounted
      when: splunk_volume_exists|failed
  28. Responsive
    §  Search Heads – CPU bound
        §  C3 instances
    §  Peer Nodes/Indexers – IO bound
        §  *C3 instances + EBS
        §  I2 instances
        §  HS1 instances
  29. Responsive
    §  Maximize IOPS with RAID 0
    §  Pre-warm volumes with dd for improved initial access times
        §  Not needed for I2 ephemeral SSD
    §  I2 instances – Terabytes of SSD
        §  35K+ read and write IOPS
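
RAID 0 stripes I/O across its member volumes, so in the ideal case the array's IOPS ceiling is roughly the sum of the members' ceilings. The sketch below is only that first-order estimate: the function name and all numbers are hypothetical, and it ignores stripe size, queue depth and workload mix. The optional `instance_limit` stands in for whatever cap the instance itself imposes on EBS traffic.

```python
from typing import Optional


def striped_iops(per_volume_iops: int, volumes: int,
                 instance_limit: Optional[int] = None) -> int:
    """Estimate the IOPS ceiling of a RAID 0 array built from identical volumes."""
    if per_volume_iops < 0 or volumes < 1:
        raise ValueError("need at least one volume and a non-negative IOPS figure")
    total = per_volume_iops * volumes
    if instance_limit is not None:
        # The array cannot exceed what the instance can push to EBS.
        total = min(total, instance_limit)
    return total


# Hypothetical figures, not measurements from the talk.
print(striped_iops(per_volume_iops=1000, volumes=2))
print(striped_iops(per_volume_iops=1000, volumes=2, instance_limit=1500))
```

RAID 0 has no redundancy: losing any member volume loses the whole array, so the design relies on Splunk's index replication and EBS snapshots for recovery.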
  30. What did we learn
    §  Project took ~4 weeks
    §  Took longer to coordinate cutover
    §  Time to delivery = biggest win
    §  Repeatable builds enable new use cases
    §  Very happy with results
  31. Future
    §  Increase the “idempotency” of Ansible playbook
    §  Make CFN more dynamic for varied sized clusters
    §  Auto Scaling Groups Lifecycle actions
        §  Termination hooks for clean removal from cluster
    §  Test on Google Compute Engine
  32. In Summary
    §  Why we chose AWS to run Splunk
    §  Cost analysis process
    §  How we did it
    §  Infrastructure Goals
    §  Code examples
    §  What we learned
    §  Still to come
  33. Contribute, PRs encouraged…
    §  CloudFormation Splunk Cluster Template
        §  https://github.com/alanwill/cfn-splunk
    §  Ansible Splunk Playbook
        §  https://github.com/alanwill/ansible-splunk
    §  Follow Me: @alanwill
    §  Email: [email protected]
  34. Autodesk is a registered trademark of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders. Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document. © 2014 Autodesk. All rights reserved.