Building infrastructure on AWS with Ruby

Transcript

  1. Building infrastructure on AWS with Ruby @Bristol Cloud Native Meetup

  2. Who • Takayuki Watanabe • takanabe • @takanabe_w • Cookpad Inc. • Site Reliability Engineer

  3. What is Cookpad?

  4. (image-only slide)
  5. https://cookpad.com • Cookpad is the largest recipe-sharing service in the world
  6. https://cookpad.com Originally a recipe site for Japan

  7. Global service overview

  8. We are currently expanding into more regions

  9. 21 languages, 67 countries • https://cookpad.com/us • https://cookpad.com/id • …

  10. Of course …

  11. We have a site for the United Kingdom!! https://cookpad.com/uk

  12. Walkthrough of this talk • My experiences and opinions about organizational growth • How we manage our infrastructure with a Ruby DSL

  13. Who builds infrastructure? The organization story

  14. Increase in engineers (chart: number of engineers by year, annotated "I joined Cookpad")

  15. Increase in engineers, by stage • Stage 1: a few engineers, all in Japan; the number of users is small • Stage 2: around … engineers, most of them in Japan; users increase • Stage 3: around … engineers, … of them outside Japan; the number of teams and services increases

  16. Selection of infrastructure and development approaches

  17. Selection of infrastructure and approaches • Which platform do we use for the production environment? • PaaS, IaaS, on-premises, etc. • Which development approach (software architecture) do we take? • Monolithic architecture, SOA, microservices, etc.

  18. Platform approaches for stage 1 • A managed PaaS like Heroku • Enables developers to focus on developing the service • We don't need infrastructure engineers • Monolithic architecture • Communication is easy and comfortable

  19. Platform approaches for stage 2 • Cloud service providers like AWS, GCP and Azure • Use virtual machines and managed services for the production environment • Enables us to build more flexible infrastructure • Users gradually increase, and capacity scalability becomes important • Monolithic architecture • Communication is still easy and comfortable

  20. Platform approaches for stage 3 • Cloud service providers like AWS, GCP and Azure • Use containers and managed services for the production environment • Stage 2 is a warm-up period for moving to stage 3 • Enables us to build more flexible infrastructure • Capacity scalability is important • Microservice architecture • Communication costs are high due to the number of engineers • Separating authority and responsibility is necessary to scale out the organization

  21. Infrastructure for Cookpad

  22. AWS • Most servers for our services run on AWS

  23. Reasons for using AWS • We have used AWS since its early days • AWS resources can be controlled via APIs • AWS was the pioneering cloud provider in this respect • We use management tools for AWS written in Ruby • We declare AWS resources in a Ruby DSL

  24. Why we use Ruby

  25. ref: "The Recipe for the World's Largest Rails Monolith" (speakerdeck.com/a_matsuda)

  26. Most of our products are implemented in Ruby

  27. Site Reliability Engineers also primarily use Ruby


  28. Tools for our infrastructure

  29. Tools for our infrastructure • AWS resource management • Other server resource management • Database management tools • CDN management tools • Server configuration management (provisioning) • Deployment tools
  30. AWS resource management • We codify AWS resources in a Ruby DSL • Change history can be investigated in a VCS (git, svn) • Idempotent • The current state of AWS resources should stay in sync with our code • Manual configuration changes are not allowed, to avoid chaos • Manual changes that are not reflected in the code will be forcibly erased • Learning costs are low • Non-SRE engineers can also create PRs • Most of the tools have: • a dry-run feature to confirm changes before applying them • an export feature to dump the current AWS state into the Ruby DSL

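
The codify, diff and dry-run workflow above can be sketched in plain Ruby. This is a minimal, hypothetical sketch assuming each resource is just a name with an attribute hash; `ResourceDSL`, `resource` and `plan` are made-up names, not the API of any codenize.tools gem, and nothing here talks to AWS.

```ruby
# Hypothetical sketch of "desired state in a Ruby DSL -> diff -> dry-run".
# ResourceDSL, #resource and #plan are illustrative names, not a real gem's API.

class ResourceDSL
  attr_reader :resources

  def initialize
    @resources = {}
  end

  # Evaluate DSL source and return the declared (desired) resources.
  def self.load(source)
    dsl = new
    dsl.instance_eval(source)
    dsl.resources
  end

  # Declare a resource by name with its attributes.
  def resource(name, **attrs)
    @resources[name] = attrs
  end
end

# Compute the actions needed to make `current` match `desired`.
# Anything that exists only in `current` (e.g. a manual change) is deleted.
def plan(desired, current)
  actions = []
  desired.each do |name, attrs|
    if !current.key?(name)
      actions << [:create, name, attrs]
    elsif current[name] != attrs
      actions << [:update, name, attrs]
    end
  end
  (current.keys - desired.keys).each { |name| actions << [:delete, name] }
  actions
end

desired = ResourceDSL.load(<<~DSL)
  resource "web-sg", port: 80
  resource "ssh-sg", port: 22
DSL

# Pretend state fetched from the cloud: one drifted and one manually created resource.
current = { "web-sg" => { port: 8080 }, "manual-sg" => { port: 3306 } }

actions = plan(desired, current)
actions.each { |action| puts "[dry-run] #{action.inspect}" }
```

Calling `plan(desired, desired)` returns an empty list, which is the idempotency property: applying the same code twice changes nothing.
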
  31. AWS resource management • We rely on codenize.tools (https://codenize.tools/) • Mainly maintained by one of our SREs • Terraform was not ready when we started using AWS • Stable enough • Easy to use

  32. AWS resource management • Without change-tracking measures, the following resources tend to incur high operational costs: • Route53 • Route tables for Virtual Private Cloud • Identity and Access Management • Security Groups • Elastic IP Addresses • Elastic Load Balancer • S3 bucket policies • CloudWatch Logs & Alarms

  33. Route53 • The DNS service of AWS • Use Roadworker (https://github.com/codenize-tools/roadworker) to define the state of Route53 using a Ruby DSL

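
Roadworker declarations such as the `hosted_zone` example are ordinary Ruby blocks, and a common way to implement that kind of DSL is to evaluate each block with `instance_eval` on an object that records what it is told. A hypothetical sketch of that technique using the `example.com.` A-record declaration; the class names are made up and this is not Roadworker's implementation.

```ruby
# Hypothetical sketch of a block DSL evaluated with instance_eval.
# It mimics the slide's syntax; it is not Roadworker's implementation.

class RecordSetDSL
  attr_reader :attrs

  def initialize
    @attrs = {}
  end

  def ttl(seconds)
    @attrs[:ttl] = seconds
  end

  def resource_records(*values)
    @attrs[:resource_records] = values
  end
end

class ZoneDSL
  attr_reader :records

  def initialize
    @records = []
  end

  # Evaluate the nested block with a RecordSetDSL as self.
  def rrset(name, type, &block)
    record = RecordSetDSL.new
    record.instance_eval(&block)
    @records << { name: name, type: type }.merge(record.attrs)
  end
end

class Route53DSL
  attr_reader :zones

  def initialize
    @zones = {}
  end

  def hosted_zone(name, &block)
    zone = ZoneDSL.new
    zone.instance_eval(&block)
    @zones[name] = zone.records
  end

  # Evaluate DSL source text and return plain data: zone name => records.
  def self.evaluate(source)
    dsl = new
    dsl.instance_eval(source)
    dsl.zones
  end
end

zones = Route53DSL.evaluate(<<~DSL)
  hosted_zone "example.com." do
    rrset "example.com.", "A" do
      ttl 300
      resource_records(
        "127.0.0.1",
        "127.0.0.2"
      )
    end
  end
DSL

record = zones["example.com."].first
puts record[:ttl]                         # => 300
puts record[:resource_records].join(", ") # => 127.0.0.1, 127.0.0.2
```

Turning the DSL into plain data like this is what makes diffing against the current state (and a dry run) straightforward.
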
  34. Route53 • (e.g.) Declaration of the "example.com" A record in Route53

      hosted_zone "example.com." do
        rrset "example.com.", "A" do
          ttl 300
          resource_records(
            "127.0.0.1",
            "127.0.0.2"
          )
        end
      end

  35. VPC Route Tables • A route table is a set of rules that determine where network traffic is directed • Use Mappru (https://github.com/codenize-tools/mappru) to define the state of VPC route tables using a Ruby DSL

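
Route table rules are matched by specificity: when several routes cover a destination, the most specific one (longest prefix) wins. A small sketch of that selection with Ruby's standard `ipaddr` library (`IPAddr#prefix` needs Ruby 2.5 or later), assuming the two routes declared for `foo-rt` in the Mappru example; it ignores the implicit local route and everything else a real VPC router does.

```ruby
require "ipaddr"

# Routes of "foo-rt" from the example: destination CIDR => target.
ROUTES = {
  "0.0.0.0/0" => "igw-12345678",
  "192.168.100.101/32" => "eni-12345678"
}.freeze

# Return the target of the longest-prefix route that contains the destination,
# or nil when no route matches.
def route_for(destination, routes = ROUTES)
  address = IPAddr.new(destination)
  matches = routes.select { |cidr, _target| IPAddr.new(cidr).include?(address) }
  best = matches.max_by { |cidr, _target| IPAddr.new(cidr).prefix }
  best&.last
end

puts route_for("192.168.100.101") # => eni-12345678 (the /32 beats 0.0.0.0/0)
puts route_for("8.8.8.8")         # => igw-12345678 (only the default route matches)
```
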
  36. VPC Route Tables • (e.g.) Declaration of route tables for vpc-12345678

      vpc "vpc-12345678" do
        route_table "foo-rt" do
          subnets "subnet-12345678"
          route destination_cidr_block: "0.0.0.0/0", gateway_id: "igw-12345678"
          route destination_cidr_block: "192.168.100.101/32", network_interface_id: "eni-12345678"
        end

        route_table "bar-rt" do
          subnets "subnet-87654321"
          route destination_cidr_block: "192.168.100.102/32", network_interface_id: "eni-87654321"
        end

        # Undefined route tables will be ignored
      end

  37. Security Groups • A security group acts as a virtual firewall that controls the traffic for one or more instances • Use Piculet (https://github.com/codenize-tools/piculet) to define the state of security groups using a Ruby DSL

  38. Security Groups • (e.g.) Declaration of security groups for vpc-XXXXXXXX

      ec2 "vpc-XXXXXXXX" do
        security_group "default" do
          description "default VPC security group"

          tags(
            "key1" => "value1",
            "key2" => "value2"
          )

          ingress do
            permission :tcp, 22..22 do
              ip_ranges("0.0.0.0/0")
            end
            permission :tcp, 80..80 do
              ip_ranges("0.0.0.0/0")
            end
            permission :udp, 60000..61000 do
              ip_ranges("0.0.0.0/0")
            end
            # ESP (IP Protocol number: 50)
            permission :"50" do
              ip_ranges("0.0.0.0/0")
            end
            permission :any do
              groups(
                "any_other_group",
                "default"
              )
            end
          end

          egress do
            permission :any do
              ip_ranges("0.0.0.0/0")
            end
          end
        end

        security_group "any_other_group" do
          description "any_other_group"

          tags(
            "key1" => "value1",
            "key2" => "value2"
          )

          egress do
            permission :any do
              ip_ranges("0.0.0.0/0")
            end
          end
        end
      end

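
Security groups contain only allow rules, so an inbound packet is accepted if any ingress rule matches its protocol, port and source address. This sketch models three of the `default` group's rules from the example as plain Ruby objects; `IngressRule` and `ingress_allowed?` are hypothetical names, and group references, the ESP rule, egress and connection tracking are left out.

```ruby
require "ipaddr"

# Hypothetical model of one ingress rule: protocol, port range, source CIDR.
IngressRule = Struct.new(:protocol, :ports, :cidr) do
  def allows?(protocol, port, source_ip)
    self.protocol == protocol &&
      ports.cover?(port) &&
      IPAddr.new(cidr).include?(IPAddr.new(source_ip))
  end
end

# A subset of the "default" group's ingress rules from the example.
INGRESS = [
  IngressRule.new(:tcp, 22..22, "0.0.0.0/0"),
  IngressRule.new(:tcp, 80..80, "0.0.0.0/0"),
  IngressRule.new(:udp, 60000..61000, "0.0.0.0/0")
].freeze

# Allow-only semantics: permitted if any rule matches, denied otherwise.
def ingress_allowed?(protocol, port, source_ip, rules = INGRESS)
  rules.any? { |rule| rule.allows?(protocol, port, source_ip) }
end

puts ingress_allowed?(:tcp, 80, "203.0.113.10")    # => true
puts ingress_allowed?(:tcp, 443, "203.0.113.10")   # => false
puts ingress_allowed?(:udp, 60500, "203.0.113.10") # => true
```
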
  39. Elastic IP Addresses • An Elastic IP address is a static IPv4 address designed for dynamic cloud computing • Use Eipmap (https://github.com/codenize-tools/eipmap) to define the state of Elastic IP addresses using a Ruby DSL

  40. Elastic IP Addresses • (e.g.) Declaration of Elastic IP addresses

      domain "standard" do
        ip "54.256.256.1"
        ip "54.256.256.2", :instance_id=>"i-12345678"
      end

      domain "vpc" do
        ip "54.256.256.11", :network_interface_id=>"eni-12345678", :private_ip_address=>"10.0.1.1"
        ip "54.256.256.12", :network_interface_id=>"eni-12345678", :private_ip_address=>"10.0.1.2"
        ip "54.256.256.13"
      end

  41. Identity and Access Management • AWS Identity and Access Management (IAM) is a web service that helps us securely control access to AWS resources • Use Miam (https://github.com/codenize-tools/miam) to define the state of IAM using a Ruby DSL

  42. Identity and Access Management • (e.g.) Declaration of an IAM user and group

      user "takayuki-watanabe", :path=>"/infra/" do
        login_profile :password_reset_required=>false

        groups(
          "Admin"
        )
      end

      group "Admin", :path => "/admin/" do
        policy "Admin" do
          {"Statement"=>[{"Effect"=>"Allow", "Action"=>"*", "Resource"=>"*"}]}
        end
      end

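
The `Admin` policy above allows every action on every resource. How several policy documents combine can be sketched with the basic IAM evaluation rule: an explicit `Deny` overrides any `Allow`, and anything not allowed is implicitly denied. This is a simplified, hypothetical evaluator (no conditions, `NotAction`, permission boundaries or case-insensitive action matching), not Miam code; the `deny_iam` policy is made up for illustration.

```ruby
# Hypothetical, simplified IAM-style policy evaluator (not Miam code).

# Turn an IAM-style wildcard ("*" = any characters, "?" = one character)
# into an anchored regular expression. Matching is case-sensitive here.
def wildcard_match?(pattern, value)
  body = Regexp.escape(pattern).gsub('\*', ".*").gsub('\?', ".")
  Regexp.new("\\A#{body}\\z").match?(value)
end

def statement_matches?(statement, action, resource)
  Array(statement["Action"]).any? { |pattern| wildcard_match?(pattern, action) } &&
    Array(statement["Resource"]).any? { |pattern| wildcard_match?(pattern, resource) }
end

# Explicit Deny wins, then any Allow; otherwise the request is implicitly denied.
def allowed?(policies, action, resource)
  statements = policies.flat_map { |policy| policy["Statement"] }
  matching = statements.select { |statement| statement_matches?(statement, action, resource) }
  return false if matching.any? { |statement| statement["Effect"] == "Deny" }

  matching.any? { |statement| statement["Effect"] == "Allow" }
end

admin = { "Statement" => [{ "Effect" => "Allow", "Action" => "*", "Resource" => "*" }] }
deny_iam = { "Statement" => [{ "Effect" => "Deny", "Action" => "iam:*", "Resource" => "*" }] }

puts allowed?([admin], "s3:PutObject", "arn:aws:s3:::foo-bucket/key")                   # => true
puts allowed?([admin, deny_iam], "iam:CreateUser", "arn:aws:iam::123456789012:user/bob") # => false
puts allowed?([], "s3:GetObject", "arn:aws:s3:::foo-bucket/key")                         # => false
```
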
  43. Elastic Load Balancing • Elastic Load Balancing distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, and IP addresses • Use Kelbim (https://github.com/codenize-tools/kelbim) to define the state of Elastic Load Balancing using a Ruby DSL

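
The `health_check` block in the Kelbim example sets how many consecutive results flip an instance's state. As a rough worked example, with `interval 30`, `unhealthy_threshold 2` and `healthy_threshold 10`, a failing instance is taken out after about 2 × 30 = 60 s and needs about 10 × 30 = 300 s of passing checks to come back. The sketch below models only the consecutive-count logic; `HealthState` is a hypothetical class, and real timing also depends on `timeout` and on when checks start.

```ruby
# Hypothetical model of threshold-based health checks: the state flips only
# after `threshold` consecutive results that disagree with the current state.
class HealthState
  attr_reader :healthy

  def initialize(healthy_threshold:, unhealthy_threshold:, healthy: true)
    @healthy_threshold = healthy_threshold
    @unhealthy_threshold = unhealthy_threshold
    @healthy = healthy
    @streak = 0 # consecutive results that disagree with the current state
  end

  # Record one check result (true = passed) and return the resulting state.
  def record(passed)
    if passed == @healthy
      @streak = 0
    else
      @streak += 1
      threshold = @healthy ? @unhealthy_threshold : @healthy_threshold
      if @streak >= threshold
        @healthy = passed
        @streak = 0
      end
    end
    @healthy
  end
end

# Count how many identical results are needed before the state flips.
def checks_until_flip(state, passed)
  raise ArgumentError, "state already matches the result" if state.healthy == passed

  count = 0
  loop do
    count += 1
    return count if state.record(passed) == passed
  end
end

interval = 30 # seconds, from the example's health_check block
state = HealthState.new(healthy_threshold: 10, unhealthy_threshold: 2)

down = checks_until_flip(state, false)
puts "unhealthy after #{down} failed checks (~#{down * interval} s)" # => unhealthy after 2 failed checks (~60 s)
up = checks_until_flip(state, true)
puts "healthy again after #{up} passed checks (~#{up * interval} s)" # => healthy again after 10 passed checks (~300 s)
```

The asymmetric thresholds are a design choice: removing a broken instance quickly matters more than re-adding a recovered one quickly.
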
  44. Elastic Load Balancing • (e.g.) Declaration of Elastic Load Balancing for vpc-XXXXXXXX

      ec2 "vpc-XXXXXXXXX" do
        load_balancer "my-load-balancer", :internal => true do
          instances(
            "nyar",
            "yog"
          ) # or `any_instances`

          listeners do
            listener [:tcp, 80] => [:tcp, 80]
            listener [:https, 443] => [:http, 80] do
              app_cookie_stickiness "CookieName"=>"20"
              ssl_negotiation ["Protocol-TLSv1", "Protocol-SSLv3", "AES256-SHA", ...]
              server_certificate "my-cert"
            end
          end

          health_check do
            target "TCP:80"
            timeout 5
            interval 30
            healthy_threshold 10
            unhealthy_threshold 2
          end

          attributes do
            access_log :enabled => true, :s3_bucket_name => "any_bucket", :s3_bucket_prefix => nil, :emit_interval => 60
            cross_zone_load_balancing :enabled => true
            connection_draining :enabled => false, :timeout => 300
          end

          subnets(
            "subnet-XXXXXXXX"
          )

          security_groups(
            "default"
          )
        end
      end

  45. S3 Bucket Policy • Bucket policies and user policies are two of the access policy options available for granting permissions to your Amazon S3 resources • Use Bukelatta (https://github.com/codenize-tools/bukelatta) to define the state of S3 bucket policies using a Ruby DSL

  46. S3 Bucket Policy • (e.g.) Declaration of the S3 bucket policy for foo-bucket

      bucket "foo-bucket" do
        {
          "Version"=>"2012-10-17",
          "Id"=>"AWSConsole-AccessLogs-Policy-XXX",
          "Statement"=>[
            {
              "Sid"=>"AWSConsoleStmt-XXX",
              "Effect"=>"Allow",
              "Principal"=>{"AWS"=>"arn:aws:iam::XXX:root"},
              "Action"=>"s3:PutObject",
              "Resource"=>"arn:aws:s3:::foo-bucket/AWSLogs/XXX/*"
            }
          ]
        }
      end

  47. CloudWatch Logs & Alarms • Amazon CloudWatch provides monitoring services for AWS resources • Use Meteorlog (https://github.com/codenize-tools/meteorlog) to define the state of CloudWatch Logs using a Ruby DSL • Use Radiosonde (https://github.com/codenize-tools/radiosonde) to define the state of CloudWatch Alarms using a Ruby DSL

  48. CloudWatch Logs • (e.g.) Declaration of CloudWatch Logs log groups, streams and metric filters

      log_group "/var/log/messages" do
        log_stream "my-stream"

        metric_filter "MyAppAccessCount" do
          metric :name=>"EventCount", :namespace=>"YourNamespace", :value=>"1"
        end

        metric_filter "MyAppAccessCount2" do
          filter_pattern '[ip, user, username, timestamp, request, status_code, bytes > 1000]'
          metric :name=>"EventCount2", :namespace=>"YourNamespace2", :value=>"2"
        end
      end

      log_group "/var/log/maillog" do
        log_stream "my-stream2"

        metric_filter "MyAppAccessCount" do
          filter_pattern '[..., status_code, bytes]'
          metric :name=>"EventCount3", :namespace=>"YourNamespace", :value=>"1"
        end

        metric_filter "MyAppAccessCount2" do
          filter_pattern '[ip, user, username, timestamp, request = *html*, status_code = 4*, bytes]'
          metric :name=>"EventCount4", :namespace=>"YourNamespace2", :value=>"2"
        end
      end

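
A space-delimited filter pattern such as `[ip, user, username, timestamp, request, status_code, bytes > 1000]` names each whitespace-separated field of a log line and can put conditions on some of them. The sketch below is a rough, hypothetical matcher for only the forms used in the example (`name`, `name > N`, `name = glob`), with quoted and bracketed strings kept as single fields; it does not implement `...` or the rest of the CloudWatch Logs pattern syntax, and the sample log lines are made up.

```ruby
# Hypothetical matcher for a small subset of space-delimited metric filter patterns.

# Split a log line into fields, keeping "quoted" and [bracketed] text together.
def tokenize_log(line)
  line.scan(/"[^"]*"|\[[^\]]*\]|\S+/).map { |token| token.delete_prefix('"').delete_suffix('"') }
end

# Parse "[a, b > 1000, c = 4*]" into [[name, operator, operand], ...].
def parse_pattern(pattern)
  pattern.strip.delete_prefix("[").delete_suffix("]").split(",").map do |field|
    match = field.strip.match(/\A(\w+)\s*(?:(>|=)\s*(\S+))?\z/)
    raise ArgumentError, "unsupported field: #{field.strip}" unless match

    match.captures
  end
end

# "*" in a string condition matches any run of characters.
def glob_match?(glob, value)
  Regexp.new("\\A#{Regexp.escape(glob).gsub('\*', '.*')}\\z").match?(value)
end

def field_matches?(value, operator, operand)
  case operator
  when nil then true
  when ">" then value.to_f > operand.to_f
  when "=" then glob_match?(operand, value)
  end
end

def metric_filter_match?(pattern, line)
  fields = parse_pattern(pattern)
  tokens = tokenize_log(line)
  return false unless fields.size == tokens.size

  fields.zip(tokens).all? { |(_name, operator, operand), value| field_matches?(value, operator, operand) }
end

BIG_RESPONSE = "[ip, user, username, timestamp, request, status_code, bytes > 1000]"
HTML_4XX = "[ip, user, username, timestamp, request = *html*, status_code = 4*, bytes]"

ok_line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
missing_line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /missing.html HTTP/1.0" 404 512'

puts metric_filter_match?(BIG_RESPONSE, ok_line)      # => true  (2326 > 1000)
puts metric_filter_match?(BIG_RESPONSE, missing_line) # => false (512 bytes)
puts metric_filter_match?(HTML_4XX, missing_line)     # => true  (404 on an .html request)
```
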
  49. CloudWatch Alarms • (e.g.) Declaration of CloudWatch Alarms

      alarm "alarm1" do
        namespace "AWS/EC2"
        metric_name "CPUUtilization"
        dimensions "InstanceId"=>"i-XXXXXXXX"
        period 300
        statistic :average
        threshold ">=", 50.0
        evaluation_periods 1
        actions_enabled true
        alarm_actions []
        ok_actions []
        insufficient_data_actions ["arn:aws:sns:us-east-1:123456789012:my_topic"]
      end

      alarm "alarm2" do
        ...
      end

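
`alarm1` averages `CPUUtilization` over 300-second periods and fires when the last evaluated period reaches 50 %. A simplified evaluator sketch, assuming per-minute samples given as `[unix_seconds, percent]` pairs; it ignores missing-data treatment, "M out of N" datapoint settings and how CloudWatch actually aligns periods, and `alarm_state` is a hypothetical helper.

```ruby
# Group samples into fixed periods and average each one, oldest first.
def period_averages(samples, period)
  samples.group_by { |timestamp, _value| timestamp / period }
         .sort
         .map { |_bucket, points| points.sum { |_timestamp, value| value } / points.size.to_f }
end

# ALARM when the last `evaluation_periods` averages all meet the ">=" threshold.
def alarm_state(samples, period:, threshold:, evaluation_periods:)
  averages = period_averages(samples, period)
  return "INSUFFICIENT_DATA" if averages.size < evaluation_periods

  averages.last(evaluation_periods).all? { |average| average >= threshold } ? "ALARM" : "OK"
end

# Made-up per-minute CPUUtilization samples: [unix_seconds, percent].
cpu = [[0, 20.0], [60, 30.0], [120, 25.0], [300, 70.0], [360, 40.0], [420, 60.0]]

puts alarm_state(cpu, period: 300, threshold: 50.0, evaluation_periods: 1) # => ALARM
puts alarm_state(cpu, period: 300, threshold: 50.0, evaluation_periods: 2) # => OK
```

The two periods average 25.0 and about 56.7, so one breaching period is enough for `evaluation_periods 1`, while requiring two consecutive breaches keeps the alarm quiet.
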
  50. Database management

  51. MySQL privileges • The privileges granted to a MySQL account determine which operations the account can perform • Use Gratan (https://github.com/codenize-tools/gratan) to define the state of MySQL access privileges using a Ruby DSL

  52. MySQL privileges • (e.g.) Declaration of MySQL privileges

      user "bob", "%" do
        on "*.*" do
          grant "USAGE"
        end

        on "test.*", expired: '2014/10/08', identified: "PASSWORD '*ABCDEF'" do
          grant "SELECT"
          grant "INSERT"
        end

        on /^foo\.prefix_/ do
          grant "SELECT"
          grant "INSERT"
        end
      end

      user "bob", ["localhost", "192.168.%"], expired: '2014/10/10' do
        on "*.*", with: 'GRANT OPTION' do
          grant "ALL PRIVILEGES"
        end
      end

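
The `on /^foo\.prefix_/` block grants the same privileges on every object whose `database.table` name matches the regexp. A hypothetical sketch of what such a target expands to, given a made-up list of existing tables; it only builds SQL strings and is not how Gratan itself is implemented.

```ruby
# Quote an identifier unless it is the "*" wildcard.
def quote_identifier(name)
  name == "*" ? name : "`#{name}`"
end

# Expand one `on` target (a "db.table" string or a Regexp) into GRANT statements.
def expand_grants(user, host, target, privileges, tables)
  objects = target.is_a?(Regexp) ? tables.grep(target) : [target]
  objects.map do |object|
    database, table = object.split(".", 2)
    "GRANT #{privileges.join(', ')} ON #{quote_identifier(database)}.#{quote_identifier(table)} " \
      "TO '#{user}'@'#{host}';"
  end
end

# Hypothetical tables that exist on the server.
tables = ["foo.prefix_users", "foo.prefix_posts", "foo.other", "bar.prefix_logs"]

puts expand_grants("bob", "%", /^foo\.prefix_/, ["SELECT", "INSERT"], tables)
puts expand_grants("bob", "%", "test.*", ["SELECT", "INSERT"], tables)
```

Because the regexp is resolved against the tables that exist, re-running the expansion after new `foo.prefix_*` tables are created picks them up without editing the DSL.
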
  53. Online schema migration • pt-online-schema-change (https://www.percona.com/doc/percona-toolkit/3.0/pt-online-schema-change.html) performs online, non-blocking schema changes to a table • Use Departure (https://github.com/departurerb/departure), which lets us keep using the Rails migrations DSL instead of a different DSL (under trial) • Departure uses the pt-online-schema-change command-line tool from Percona Toolkit, which runs MySQL ALTER TABLE statements without downtime

  54. CDN management

  55. Fastly • We use Fastly as our CDN • Use codily (https://github.com/sorah/codily) to define the state of Fastly using a Ruby DSL

  56. Fastly • (e.g.) Declaration of Fastly configurations

      service "foo" do
        response_object "method not allowed" do
          status "405"
          response "Method Not Allowed"
          content "405"
          content_type "text/plain"
          request_condition "request method is not GET, HEAD or FASTLYPURGE" do
            priority 10
            statement '!(req.request == "GET" || req.request == "HEAD" || req.request == "FASTLYPURGE")'
          end
        end
      end

      # is equivalent to:
      service "foo" do
        condition "request method is not GET, HEAD or FASTLYPURGE" do
          priority 10
          statement '!(req.request == "GET" || req.request == "HEAD" || req.request == "FASTLYPURGE")'
          type "REQUEST"
        end

        response_object "method not allowed" do
          status "405"
          response "Method Not Allowed"
          content "405"
          content_type "text/plain"
          request_condition "request method is not GET, HEAD or FASTLYPURGE"
        end
      end

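
The two `service "foo"` declarations are equivalent because an inline `request_condition ... do ... end` is shorthand for a service-level `REQUEST` condition that the response object then refers to by name. A hypothetical sketch of that normalization on plain hashes standing in for the DSL; it is not codily's implementation.

```ruby
STATEMENT = '!(req.request == "GET" || req.request == "HEAD" || req.request == "FASTLYPURGE")'
CONDITION_NAME = "request method is not GET, HEAD or FASTLYPURGE"

# Move inline request conditions to service level and keep only a name reference.
def hoist_conditions(service)
  conditions = service.fetch(:conditions, []).dup
  objects = service.fetch(:response_objects, []).map do |object|
    inline = object[:request_condition]
    next object unless inline.is_a?(Hash)

    conditions << inline.merge(type: "REQUEST")
    object.merge(request_condition: inline[:name])
  end
  service.merge(conditions: conditions, response_objects: objects)
end

response = { name: "method not allowed", status: "405", response: "Method Not Allowed",
             content: "405", content_type: "text/plain" }

inline_form = {
  response_objects: [response.merge(request_condition: { name: CONDITION_NAME, priority: 10, statement: STATEMENT })]
}
explicit_form = {
  conditions: [{ name: CONDITION_NAME, priority: 10, statement: STATEMENT, type: "REQUEST" }],
  response_objects: [response.merge(request_condition: CONDITION_NAME)]
}

puts hoist_conditions(inline_form) == explicit_form # => true
```

Normalizing both spellings to one canonical form is what lets a tool compare the code with the live configuration without reporting spurious differences.
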
  57. Server Configuration Management

  58. Server configurations • Around a thousand EC2 instances are running on AWS • We previously used Puppet for configuration management • We want a lightweight tool like Ansible, but we also want to use a Ruby DSL

  59. https://github.com/itamae-kitchen/itamae

  60. Itamae • A configuration management tool inspired by Chef • An itamae (板前) is a cook in a Japanese kitchen • Chef-like Ruby DSL (but not compatible with Chef) • Simpler and lighter weight than Chef • Only recipes • Applies recipes to a local machine • Applies recipes to a remote machine over SSH • Idempotent

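
Idempotency here means a resource first checks the current state and only acts when it differs from the recipe, so running the same recipe twice changes nothing the second time. A minimal sketch of that check-then-act pattern; `FakeHost` and `PackageResource` are hypothetical stand-ins (an in-memory set instead of a real package manager, and a made-up command string), not Itamae's internals.

```ruby
require "set"

# Hypothetical in-memory host: a Set stands in for installed packages.
class FakeHost
  attr_reader :packages, :commands

  def initialize
    @packages = Set.new
    @commands = [] # commands that actually changed the host
  end

  def run(command)
    @commands << command
  end
end

# Hypothetical package resource following "check current state, then act".
class PackageResource
  def initialize(host, name)
    @host = host
    @name = name
  end

  # Install only when missing; return true if the host was changed.
  def install
    return false if @host.packages.include?(@name)

    @host.run("install #{@name}")
    @host.packages << @name
    true
  end
end

host = FakeHost.new
nginx = PackageResource.new(host, "nginx")
puts nginx.install      # => true  (first run installs nginx)
puts nginx.install      # => false (second run finds it installed and does nothing)
puts host.commands.size # => 1
```
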
  61. Itamae • (e.g.) A sample recipe for nginx

      package 'nginx' do
        action :install
      end

      service 'nginx' do
        action [:enable, :start]
      end

      template "/path/to/dest" do
        action :create
        source "template.erb"
        variables(message: "World")
      end

    template.erb:

      Hello, <%= @message %>

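
The `template` resource renders `template.erb` with ERB, and the values passed to `variables` are visible in the template as instance variables such as `@message`. A small sketch of that rendering with Ruby's standard `erb` library; `TemplateContext` is a hypothetical helper, not Itamae's class.

```ruby
require "erb"

# Hypothetical helper: exposes template variables as instance variables
# and renders ERB source against this object's binding.
class TemplateContext
  def initialize(variables)
    variables.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  def render(source)
    ERB.new(source).result(binding)
  end
end

output = TemplateContext.new(message: "World").render("Hello, <%= @message %>")
puts output # => Hello, World
```
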
  62. Deployment tools

  63. Capistrano • Deploy Rails applications via Capistrano 3 • Use Capistrano::BundleRsync (https://github.com/sonots/capistrano-bundle_rsync) • A chat bot can invoke the deploy jobs via a deploy server

  64. Problems with these tools • The tools explained in the previous slides work quite well at stage 2 and for personal use, even though only a limited number of SREs can apply them to production environments • But if only SREs have the privileges to use these tools, development may not scale at stage 3

  65. Problems with these tools (examples) • Developers cannot update environment variables • SREs deploy them via Itamae • Developers cannot install software by themselves • SREs install it via Itamae • Developers cannot start using new AWS resources quickly • SREs set them up via the Codenize tools • SREs and developers cannot work productively • Frequent ops requests to SREs become a bottleneck for development • Part of the authority and responsibility should be handed over to developers
  66. What about containers?

  67. Amazon ECS

  68. Docker containers on ECS • Use Amazon ECS • ECS allows us to easily run and manage Docker-enabled applications across a cluster of EC2 instances • Use hako (https://github.com/eagletmt/hako) to deploy Docker containers onto ECS clusters • Some applications will use this container environment

  69. Deployment flow with Hako • Deploy containers to ECS clusters via hako and inject the necessary data • Docker images are stored in ECR • Credentials are stored in Vault • Container app definitions are managed in YAML

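
Injecting data into a container's environment can be pictured as placeholder expansion: an env value such as `'#{username}-san'` (used in the sample hako app definition) is filled in from a provider, for example a `KEY=VALUE` file like `hello.env`. A hypothetical sketch of that idea; the parsing rules, the `username=alice` content and both helper names are assumptions, and real hako supports other providers and rules that are not modelled here.

```ruby
# Hypothetical sketch of env injection: "#{name}" placeholders in env values
# are filled from provider data, here a KEY=VALUE file like hello.env.

# Parse KEY=VALUE lines, skipping blanks and comments.
def parse_env_file(text)
  text.each_line.with_object({}) do |line, vars|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split("=", 2)
    vars[key] = value
  end
end

# Replace every #{name} in each value; a missing name raises KeyError.
def expand_env(env, provided)
  env.transform_values do |value|
    value.gsub(/\#\{(\w+)\}/) { provided.fetch(Regexp.last_match(1)) }
  end
end

provided = parse_env_file("# contents of a hypothetical hello.env\nusername=alice\n")
env = { "PORT" => "3000", "MESSAGE" => '#{username}-san' }

expanded = expand_env(env, provided)
puts expanded["MESSAGE"] # => alice-san
```

Failing loudly on a missing value (the `KeyError`) is safer than deploying a container with an unexpanded placeholder.
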
  70. Deployment flow with Hako • (e.g.) A sample hako app definition file

      scheduler:
        type: ecs
        region: ap-northeast-1
        cluster: eagletmt
        desired_count: 2
        task_role_arn: arn:aws:iam::012345678901:role/Hello
        deployment_configuration:
          maximum_percent: 200
          minimum_healthy_percent: 50
      app:
        image: ryotarai/hello-sinatra
        memory: 128
        cpu: 256
        links:
          - redis:redis
        env:
          $providers:
            - type: file
              path: hello.env
          PORT: '3000'
          MESSAGE: '#{username}-san'
      additional_containers:
        front:
          image_tag: hako-nginx
          memory: 32
          cpu: 32
        redis:
          image_tag: redis:3.0
          cpu: 64
          memory: 512
      scripts:
        - <<: !include front.yml
          backend_port: 3000

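
With `desired_count: 2`, `maximum_percent: 200` and `minimum_healthy_percent: 50`, ECS may run up to 4 tasks and must keep at least 1 healthy task during a rolling deployment. The arithmetic below follows the rounding the ECS documentation describes for these two parameters (the maximum is rounded down, the minimum healthy count is rounded up); `deployment_bounds` is a hypothetical helper.

```ruby
# Hypothetical helper reproducing ECS's documented rounding for rolling updates:
# the upper limit is rounded down, the healthy lower limit is rounded up.
def deployment_bounds(desired_count, maximum_percent, minimum_healthy_percent)
  max_tasks = (desired_count * maximum_percent) / 100                # floor
  min_healthy = (desired_count * minimum_healthy_percent + 99) / 100 # ceiling
  [min_healthy, max_tasks]
end

min_healthy, max_tasks = deployment_bounds(2, 200, 50)
puts "keep at least #{min_healthy} healthy task(s), run at most #{max_tasks} task(s)"
# => keep at least 1 healthy task(s), run at most 4 task(s)
```

Roughly, these bounds let ECS either start two new tasks next to the two old ones or stop one old task before its replacement is healthy, which trades spare capacity against deployment speed.
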
  71. Deployment from Slack • Invoke deploy jobs defined on Rundeck via a chat bot • Use ruboty (https://github.com/r7kamura/ruboty) for ChatOps on Slack

  72. Introduction of container environments • Developers can update environment variables • The hako app YAML holds the environment variables for each application • Developers can install the software they need by themselves • Docker images include all the software for each application • Developers can start using new AWS resources quickly • Many AWS resources are ready to use once containers are deployed to our ECS clusters • SREs and developers become more productive • Authority and responsibility are handed over to developers
  73. Conclusions

  74. Humble opinions • An organization can grow suddenly • Traditional development styles may suddenly stop working and need to change • There are technologies that support us and provide more scalable environments • (e.g.) Virtual machines → Containers • (e.g.) Monolithic architecture → Microservice architecture • But engineers cannot change their traditional workflows suddenly • Investigation and research at stage 2 are really important for development scalability at stage 3 • At this moment, Kubernetes is a better choice than ECS for container orchestration • Many players in the container space have joined Kubernetes and are developing its ecosystem (standing on the shoulders of giants)

  75. Recap • Choosing infrastructure platforms and approaches that fit the stage of the organization's growth is important • Writing infrastructure in a Ruby DSL is pretty easy and works well • When an organization becomes big, traditional workflows might not work

  76. Thank you