Cloud patterns applied

Making the most of EC2 at EyeEm

Lars Fronius

November 11, 2014

Transcript

  1. Me
     • Site Reliability Engineer at EyeEm
     • How do computers even work?
     • Started as an operations guy in a scientific datacenter
     • Now mostly developing and making users and developers happy
     @LarsFronius
  2. Paul Hammond: “If you think you can prevent failure, then you aren’t developing your ability to respond.”
  3. • Have as few machines as possible containing application state
     • Test restores of stateful machines
     • …all the time.
  5. • Throw away stateless servers
     • Make sure they come back up with their expected behaviour
  6. John Allspaw, Richard Cook: “The goal of operations is to have every day be just another boring day. Achieving this boredom depends on foreseeing the future performance of the system and making adjustments accordingly.”
  8. • Single responsibility servers / services
     • Security Groups define the interface through which services are supposed to talk to one another
     • …and can be used to assign a server role.
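One way to picture the role assignment on this slide: each security group carries tags such as `role`, and an instance derives its role from the groups it belongs to. The sketch below models that with plain dictionaries instead of an AWS client. The function name `tags_for_instance` and the data layout are assumptions made for illustration; this is not the `sg_tags.py` script referenced later in the deck.

```python
# Hypothetical in-memory model: security group name -> tags on that group.
SECURITY_GROUPS = {
    "backend": {"role": "backend", "branch": "feature_x"},
    "base": {},
}


def tags_for_instance(group_names, groups=SECURITY_GROUPS):
    """Merge the tags of every security group an instance belongs to."""
    merged = {}
    for name in group_names:
        for key, value in groups.get(name, {}).items():
            # Two groups disagreeing on a tag would make the role ambiguous.
            if key in merged and merged[key] != value:
                raise ValueError(f"conflicting values for tag {key!r}")
            merged[key] = value
    return merged


if __name__ == "__main__":
    print(tags_for_instance(["backend", "base"]))
```

Raising on conflicting tags is a design choice of this sketch: with single responsibility servers, two groups claiming different roles for one machine is more likely a mistake than an intent.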
  12. Backend Security Group (role=backend)
      Database Security Group (role=database): Allow Inbound Backend 3306
      Redis Security Group (role=feeds): Allow Inbound Backend 6379
      Base Security Group
      Metrics Security Group (role=metrics): Allow Inbound Base 8125
      production: branch=master
      feature_x staging: branch=feature_x
      Backend Security Group (role=backend)
      Database Security Group (role=database): Allow Inbound Backend 3306
      Redis Security Group (role=feeds): Allow Inbound Backend 6379
      Base Security Group
  13. {! "Outputs": {! "ApiEndpoint": {! "Description": "DNS Endpoint to feature_xAPI

    staging",! "Value": {! "Ref": "apiendpoint"! }! },! "backend1PrivateDNS": {! "Description": "Private DNSName of the newly created EC2 backend1 instance",! "Value": {! "Fn::GetAtt": [! "backend1",! "PrivateDnsName"! ]! }! },! "backend1PublicDNS": {! "Description": "Public DNSName of the newly created EC2 backend1 instance",! "Value": {! "Fn::GetAtt": [! "backend1",! "PublicDnsName"! ]! }! },! "db1PrivateDNS": {! "Description": "Private DNSName of the newly created EC2 db1 instance",! "Value": {! "Fn::GetAtt": [! "db1",! "PrivateDnsName"! ]! }! },! "db1PublicDNS": {! "Description": "Public DNSName of the newly created EC2 db1 instance",! "Value": {! "Fn::GetAtt": [! "db1",! "PublicDnsName"! ]! }! },! "redis1PrivateDNS": {! "Description": "Private DNSName of the newly created EC2 redis1 instance",! "Value": {! "Fn::GetAtt": [! "redis1",! "PrivateDnsName"! ]! }! },! "redis1PublicDNS": {! "Description": "Public DNSName of the newly created EC2 redis1 instance",! "Value": {! "Fn::GetAtt": [! "redis1",! "PublicDnsName"! ]! }! }! },! "Resources": {! "apiendpoint": {! "Properties": {! "HostedZoneId": "Z3HTG0V9588TAA",! "Name": "api.feature_x.eyeem.com.",! "ResourceRecords": [! {! "Fn::GetAtt": [! "backend1",! "PublicDnsName"! ]! }! ],! "TTL": 300,! "Type": "CNAME"! },! "Type": "AWS::Route53::RecordSet"! },! "backend1": {! "Properties": {! "IamInstanceProfile": {! "Ref": "puppetprovisioningprofile"! },! "ImageId": "ami-f2191786",! "InstanceType": "c3.large",! "KeyName": "eyeem-prod-new",! "SecurityGroups": [! {! "Ref": "backendsg"! },! "puppeteers"! ],! "Tags": [! {! "Key": "background_tasks",! "Value": "false"! },! {! "Key": "branch",! "Value": "feature_x"! },! {! "Key": "jenkins_access",! "Value": ""! },! {! "Key": "puppetbranch",! "Value": "master"! },! {! "Key": "service_discovery",! "Value": "true"! }! ],! "UserData": {! "Fn::Base64": {! "Fn::Join": [! "",! [! 
"#!/bin/bash\ncurl https://s3-eu-west-1.amazonaws.com/eyeem-deb-packages/gpg-key.asc | apt-key add -\necho \"deb http://eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/sg_tags.py > sg_tags.py\nexport puppetbranch=$(python sg_tags.py puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/ provisioning-${puppetbranch}/base.user-data\" > ./base.sh\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./base.sh\nfi\nbash ./base.sh\n",! "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",! {! "Ref": "backend1puppetwaithandle"! },! "'"! ]! ]! }! }! },! "Type": "AWS::EC2::Instance"! },! "backend1puppetwaitcondition": {! "Properties": {! "Handle": {! "Ref": "backend1puppetwaithandle"! },! "Timeout": "7200"! },! "Type": "AWS::CloudFormation::WaitCondition"! },! "backend1puppetwaithandle": {! "Type": "AWS::CloudFormation::WaitConditionHandle"! },! "backendsg": {! "Properties": {! "GroupDescription": "backend",! "SecurityGroupIngress": [! {! "CidrIp": "0.0.0.0/0",! "FromPort": "22",! "IpProtocol": "tcp",! "ToPort": "22"! },! {! "CidrIp": "0.0.0.0/0",! "FromPort": "80",! "IpProtocol": "tcp",! "ToPort": "80"! }! ],! "Tags": [! {! "Key": "branch",! "Value": "feature_x"! },! {! "Key": "monitoring",! "Value": "false"! },! {! "Key": "puppetbranch",! "Value": "master"! },! {! "Key": "role",! "Value": "backend"! }! ]! },! "Type": "AWS::EC2::SecurityGroup"! },! "db1": {! "Properties": {! "IamInstanceProfile": {! "Ref": "dbprovisioningprofile"! },! 
"ImageId": "ami-25488752",! "InstanceType": "c3.large",! "KeyName": "eyeem-prod-new",! "SecurityGroups": [! {! "Ref": "dbsg"! },! "puppeteers"! ],! "Tags": [! {! "Key": "restore_from_extract",! "Value": "true"! }! ],! "UserData": {! "Fn::Base64": {! "Fn::Join": [! "",! [! "#!/bin/bash\ncurl https://s3-eu-west-1.amazonaws.com/eyeem-deb-packages/gpg-key.asc | apt-key add -\necho \"deb http://eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/sg_tags.py > sg_tags.py\nexport puppetbranch=$(python sg_tags.py puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/ provisioning-${puppetbranch}/base.user-data\" > ./base.sh\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./base.sh\nfi\nbash ./base.sh\n",! "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",! {! "Ref": "db1puppetwaithandle"! },! "'"! ]! ]! }! }! },! "Type": "AWS::EC2::Instance"! },! "db1puppetwaitcondition": {! "Properties": {! "Handle": {! "Ref": "db1puppetwaithandle"! },! "Timeout": "7200"! },! "Type": "AWS::CloudFormation::WaitCondition"! },! "db1puppetwaithandle": {! "Type": "AWS::CloudFormation::WaitConditionHandle"! },! "dbprovisioningprofile": {! "Properties": {! "Path": "/",! "Roles": [! "extract-access"! ]! },! "Type": "AWS::IAM::InstanceProfile"! },! "dbsg": {! "Properties": {! "GroupDescription": "db",! "SecurityGroupIngress": [! {! "FromPort": "3306",! "IpProtocol": "tcp",! "SourceSecurityGroupName": {! "Ref": "backendsg"! },! "ToPort": "3306"! },! {! "CidrIp": "0.0.0.0/0",! 
"FromPort": "22",! "IpProtocol": "tcp",! "ToPort": "22"! }! ],! "Tags": [! {! "Key": "branch",! "Value": "feature_x"! },! {! "Key": "monitoring",! "Value": "false"! },! {! "Key": "puppetbranch",! "Value": "master"! },! {! "Key": "role",! "Value": "db"! }! ]! },! "Type": "AWS::EC2::SecurityGroup"! },! "puppetprovisioningprofile": {! "Properties": {! "Path": "/",! "Roles": [! "puppet-provisioning"! ]! },! "Type": "AWS::IAM::InstanceProfile"! },! "redis1": {! "Properties": {! "IamInstanceProfile": {! "Ref": "puppetprovisioningprofile"! },! "ImageId": "ami-25488752",! "InstanceType": "c3.large",! "KeyName": "eyeem-prod-new",! "SecurityGroups": [! {! "Ref": "redissg"! },! "puppeteers"! ],! "Tags": [],! "UserData": {! "Fn::Base64": {! "Fn::Join": [! "",! [! "#!/bin/bash\ncurl https://s3-eu-west-1.amazonaws.com/eyeem-deb-packages/gpg-key.asc | apt-key add -\necho \"deb http://eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com $(lsb_release -cs) stable\" > /etc/apt/sources.list.d/eyeem.list\necho \"Package: *\nPin: origin eyeem-deb-packages.s3-website-eu-west-1.amazonaws.com\nPin-Priority: 550\" > /etc/apt/preferences.d/eyeem\naptitude update\naptitude install -y python-boto\nfetch_file s3://eyeem-configuration-management/provisioning/sg_tags.py > sg_tags.py\nexport puppetbranch=$(python sg_tags.py puppetbranch)\nif [ $puppetbranch != \"\" ]; then\n fetch_file \"s3://eyeem-configuration-management/ provisioning-${puppetbranch}/base.user-data\" > ./base.sh\nelse\n fetch_file \"s3://eyeem-configuration-management/provisioning/base.user-data\" > ./base.sh\nfi\nbash ./base.sh\n",! "\ncurl -X PUT -H 'Content-Type:' --data-binary '{\"Status\":\"SUCCESS\",\"Reason\":\"we made it here.\",\"UniqueId\":\"puppetwait\",\"Data\":\"Its gonna be alright.\"}' '",! {! "Ref": "redis1puppetwaithandle"! },! "'"! ]! ]! }! }! },! "Type": "AWS::EC2::Instance"! },! "redis1puppetwaitcondition": {! "Properties": {! "Handle": {! "Ref": "redis1puppetwaithandle"! },! "Timeout": "7200"! },! 
"Type": "AWS::CloudFormation::WaitCondition"! },! "redis1puppetwaithandle": {! "Type": "AWS::CloudFormation::WaitConditionHandle"! },! "redissg": {! "Properties": {! "GroupDescription": "redis",! "SecurityGroupIngress": [! {! "FromPort": "6379",! "IpProtocol": "tcp",! "SourceSecurityGroupName": {! "Ref": "backendsg"! },! "ToPort": "6379"! },! {! "CidrIp": "0.0.0.0/0",! "FromPort": "22",! "IpProtocol": "tcp",! "ToPort": "22"! }! ],! "Tags": [! {! "Key": "branch",! "Value": "feature_x"! },! {! "Key": "monitoring",! "Value": "false"! },! {! "Key": "puppetbranch",! "Value": "master"! },! {! "Key": "role",! "Value": "redis"! }! ]! },! "Type": "AWS::EC2::SecurityGroup"! }! }! }
  14. {! "Outputs": {! "ApiEndpoint": {! "Description": "DNS Endpoint to feature_xAPI

    staging",! "Value": {! "Ref": "apiendpoint"! }! },! "backend1PrivateDNS": {! "Description": "Private DNSName of the newly created EC2 backend1 instance",! "Value": {! "Fn::GetAtt": [! "backend1",! "PrivateDnsName"! ]! }! },! "backend1PublicDNS": {! "Description": "Public DNSName of the newly created EC2 backend1 instance",! "Value": {! "Fn::GetAtt": [! "backend1",! "PublicDnsName"! ]! }! },! "db1PrivateDNS": {! "Description": "Private DNSName of the newly created EC2 db1 instance",! "Value": {! "Fn::GetAtt": [! "db1",! "PrivateDnsName"! ]! }! },! "db1PublicDNS": {! "Description": "Public DNSName of the newly created EC2 db1 instance",! "Value": {! "Fn::GetAtt": [! "db1",! "PublicDnsName"! ]! }! },! "redis1PrivateDNS": {! "Description": "Private DNSName of the newly created EC2 redis1 instance",! "Value": {! "Fn::GetAtt": [! "redis1",! "PrivateDnsName"! ]! }! },! "redis1PublicDNS": {! "Description": "Public DNSName of the newly created EC2 redis1 instance",! "Value": {! "Fn::GetAtt": [! "redis1",! "PublicDnsName"! ]! }! }! },! "Resources": {! "apiendpoint": {! "Properties": {! "HostedZoneId": "Z3HTG0V9588TAA",! "Name": "api.feature_x.eyeem.com.",! "ResourceRecords": [! {! "Fn::GetAtt": [! "backend1",! "PublicDnsName"! ]! }! ],! "TTL": 300,! "Type": "CNAME"! },! "Type": "AWS::Route53::RecordSet"! },! "backend1": {! "Properties": {! "IamInstanceProfile": {! "Ref": "puppetprovisioningprofile"! },! "ImageId": "ami-f2191786",! "InstanceType": "c3.large",! "KeyName": "eyeem-prod-new",! "SecurityGroups": [! {! "Ref": "backendsg"! },! "base"! ],! "Tags": [! {! "Key": "background_tasks",! "Value": "false"! },! {! "Key": "branch",! "Value": "feature_x"! },! {! "Key": "jenkins_access",! "Value": ""! },! {! "Key": "puppetbranch",! "Value": "master"! },! {! "Key": "service_discovery",! "Value": "true"! }! ],! "UserData": {! "Fn::Base64": {! "Fn::Join": [! "",! [! {! "Ref": "backend1puppetwaithandle"! },! "'"! ]! ]! }! }! },! "Type": "AWS::EC2::Instance"! },! 
"backend1puppetwaitcondition": {! "Properties": {! "Handle": {! "Ref": "backend1puppetwaithandle"! },! "Timeout": "7200"! },! "Type": "AWS::CloudFormation::WaitCondition"! },! "backend1puppetwaithandle": {! "Type": "AWS::CloudFormation::WaitConditionHandle"! },! "backendsg": {! "Properties": {! "GroupDescription": "backend",! "SecurityGroupIngress": [! {! "CidrIp": "0.0.0.0/0",! "FromPort": "22",! "IpProtocol": "tcp",! "ToPort": "22"! },! {! "CidrIp": "0.0.0.0/0",! "FromPort": "80",! "IpProtocol": "tcp",! "ToPort": "80"! }! ],! "Tags": [! {! "Key": "branch",! "Value": "feature_x"! },! {! "Key": "monitoring",! "Value": "false"! },! {! "Key": "puppetbranch",! "Value": "master"! },! {! "Key": "role",! "Value": "backend"! }! ]! },! "Type": "AWS::EC2::SecurityGroup"! },! "db1": {! "Properties": {! "IamInstanceProfile": {! "Ref": "dbprovisioningprofile"! },! "ImageId": "ami-25488752",! "InstanceType": "c3.large",! "KeyName": "eyeem-prod-new",! "SecurityGroups": [! {! "Ref": "dbsg"! },!
  15. json.org: “JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.”
  17. eyeemstack create --machines backend db feed --restore_db extract --branch feature_x
      • Python tool on top of troposphere, a Python library for creating CloudFormation descriptions
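To make the idea of generating CloudFormation descriptions from a short command concrete, here is a minimal sketch that builds a template as a Python dictionary and serialises it with the standard `json` module. It deliberately does not use troposphere's API and is not eyeemstack's implementation. The function `build_template` is hypothetical, and the output omits properties a deployable instance needs (image, instance type, key name, user data).

```python
import json


def build_template(machines, branch):
    """Return a CloudFormation-style template dict for the given machines."""
    resources = {}
    for name in machines:
        group = f"{name}sg"
        # One security group per role, tagged so instances can look up their role.
        resources[group] = {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": name,
                "Tags": [
                    {"Key": "branch", "Value": branch},
                    {"Key": "role", "Value": name},
                ],
            },
        }
        # One instance per machine, pointing at its group via the Ref intrinsic.
        resources[f"{name}1"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {"SecurityGroups": [{"Ref": group}]},
        }
    return {"Resources": resources}


if __name__ == "__main__":
    template = build_template(["backend", "db"], "feature_x")
    print(json.dumps(template, indent=2, sort_keys=True))
```

Generating the document from a loop keeps the per-machine boilerplate in one place, which is the main appeal of a code-based generator over hand-written JSON.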
  18. class eyeem::profiles::backend::deploy {
        eyeem::deploy_codebase { "backend":
          directory => '/var/www/backend',
          bucket    => 'eyeem-web-backend',
          filename  => "backend-${::branch}.tar.gz",
          restart   => ['nginx', 'php5-fpm']
        }
      }

      define eyeem::deploy_codebase (
        $prefix = '',
        $directory,
        $bucket,
        $filename,
        $restart ) {

        if (member($::mountpoints, "${directory}/current") and $::environment == 'local') {
          notice("Looks like we are on Vagrant and you mounted the code in, skipping deploy.")
        } else {
          ( . . . )
        }
      }
  19. • ~70 cents for a single test run.
      • ~3.50 $ per workday.
      • ~17.64 $ for always on staging per day.
      • Tests disaster recovery on a sample dataset.
      • Scalable setup.
      • < 10 minutes
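The three cost figures on this slide can be related with simple arithmetic. The sketch below only divides and subtracts the slide's own numbers. Reading the workday figure as a count of test runs is an interpretation, not something the slide states.

```python
# Figures from the slide, converted to cents so the arithmetic stays exact.
TEST_RUN_CENTS = 70      # ~70 cents for a single test run
WORKDAY_CENTS = 350      # ~3.50 $ per workday
ALWAYS_ON_CENTS = 1764   # ~17.64 $ per day for an always-on staging

# How many test runs the workday figure would cover (an interpretation).
runs_per_workday = WORKDAY_CENTS // TEST_RUN_CENTS

# Difference between an always-on staging and on-demand use, per day.
saving_per_day = ALWAYS_ON_CENTS - WORKDAY_CENTS

# Hourly cost implied by the always-on figure.
always_on_per_hour = ALWAYS_ON_CENTS / 24

if __name__ == "__main__":
    print(f"runs per workday: {runs_per_workday}")
    print(f"saving per day: {saving_per_day / 100:.2f} $")
    print(f"always-on per hour: {always_on_per_hour / 100:.3f} $")
```

Under these figures the workday budget covers about five test runs, on-demand staging costs about 14.14 $ less per day than keeping one running, and the always-on figure works out to roughly 0.735 $ per hour.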
  21. Backend Security Group (role=backend)
      Base Security Group
      Metrics Security Group (role=metrics): Allow Inbound Base 8125
      production: branch=master
      feature_x staging: branch=feature_x
      Backend Security Group (role=backend)
      Base Security Group
      Inventory Service Security Group (role=inventory)
      Instance: branch=feature_x, public_dns=api.feature_x.eyeem.com
      Instance: branch=master, public_dns=api.eyeem.com
      Jobrunner Service Security Group (role=jobrunner)
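The instance annotations on this slide (`branch`, `public_dns`) suggest how an inventory lookup could work: filter known instances by role and branch, then return their endpoints. The sketch below is an assumption about the shape of such a lookup, not the Inventory Service itself. The record layout and the function `find_endpoints` are hypothetical, and the `role` field is placed on the instance record for simplicity even though the deck attaches roles to security groups.

```python
# Records modelled on the two instances shown on the slide; layout is assumed.
INSTANCES = [
    {"role": "backend", "branch": "master", "public_dns": "api.eyeem.com"},
    {"role": "backend", "branch": "feature_x", "public_dns": "api.feature_x.eyeem.com"},
]


def find_endpoints(role, branch, instances=INSTANCES):
    """Return the public DNS names of instances matching role and branch."""
    return [
        instance["public_dns"]
        for instance in instances
        if instance["role"] == role and instance["branch"] == branch
    ]


if __name__ == "__main__":
    print(find_endpoints("backend", "feature_x"))
```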
  22. • ~350 job executions last month
      • That is 350 self-service operations
      • Stagings everywhere
      • Definition of Done: Can you boot it up using EyeEmStack and Vagrant?
      • Lots of 99.999%s
  23. • “Everything fails all the time.”
      • Test your repairs, automate everything.
      • Distribute your data.
      • Applications should be able to handle state transitions of service-parts and diagnose failure.
      • Design your infrastructure to act as a service provider to your developers.