Architecting for the Cloud

Slide 1

Slide 1 text

ARCHITECTING for the CLOUD leonidas tsementzis aka @goldstein

Slide 2

Slide 2 text

# get social # awsuggr

Slide 3

Slide 3 text

leonidas tsementzis aka @goldstein # who’s talking * software architect, engineer [all major web/mobile platforms] * devOps [enthusiast, not a real sysadmin] * entrepreneur [n00b]

Slide 4

Slide 4 text

# format * the problem * development * deployment * failure * limitations * conclusion * questions

Slide 5

Slide 5 text

# the problem * increasing/decreasing resources on the fly using auto scaling * availability * performance * multi server painless deployment

Slide 6

Slide 6 text

:development:

Slide 7

Slide 7 text

# your stack matters * the single most important aspect * cloud-ready open source libraries for major platforms * saves you a lot of development time * rapid changes * can lock you in

Slide 8

Slide 8 text

# memory * avoid application level variables/ sessions * centralized storage: ✔ fast ✔ scalable ✔ efficient Amazon DynamoDB
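A minimal sketch of the point above, assuming a hypothetical `SessionStore` interface: app servers keep no per-user state in process memory and read or write sessions through one shared store. `InMemoryStore` only stands in for a centralized backend such as DynamoDB; every name here is illustrative, not part of any AWS SDK.

```python
import json
import uuid


class InMemoryStore:
    """Stand-in for a centralized key-value backend (hypothetical).
    A real deployment would talk to a shared service instead."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


class SessionStore:
    """Keeps session data outside the app process, so any server
    behind the load balancer can serve any request."""

    def __init__(self, backend):
        self.backend = backend

    def create(self, data):
        session_id = uuid.uuid4().hex
        # Serialize so the value could be shipped to a remote store.
        self.backend.put(session_id, json.dumps(data))
        return session_id

    def load(self, session_id):
        raw = self.backend.get(session_id)
        return json.loads(raw) if raw is not None else None


# Two "servers" sharing one backend see the same session.
shared = InMemoryStore()
server_a = SessionStore(shared)
server_b = SessionStore(shared)
sid = server_a.create({"user": "alice"})
session_seen_by_b = server_b.load(sid)
```

Because neither server object holds the session itself, either one can be terminated by auto scaling without logging users out.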

Slide 9

Slide 9 text

:storage:

Slide 10

Slide 10 text

# storage - single server [diagram: a single server]

Slide 11

Slide 11 text

# storage - multi server [diagram: server farm of server 1, server 2, server 3, server 4; scripts; static files]

Slide 12

Slide 12 text

# storage - multi server - S3 [diagram: server farm of server 1, server 2, server 3, server 4; scripts; static files]

Slide 13

Slide 13 text

# storage [diagram: application -> STORAGE MIDDLEWARE -> /local/address/site | \\unc\path\site | S3 API]

Slide 14

Slide 14 text

# storage using a pluggable storage middleware, we can create storages like...: * local filesystem * network storage * Amazon S3 * Rackspace CloudFiles * database (BLOB) * GridFS (MongoDB) * FTP, SFTP * Azure
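A rough sketch of what a pluggable storage layer might look like, assuming a hypothetical three-method interface (`save`, `read`, `exists`). `DictStorage` is an in-memory stand-in for a remote backend such as S3 and makes no network calls; none of these class names come from the talk or from a real library.

```python
import os
import tempfile


class Storage:
    """Interface the application codes against (hypothetical)."""

    def save(self, name, data):
        raise NotImplementedError

    def read(self, name):
        raise NotImplementedError

    def exists(self, name):
        raise NotImplementedError


class LocalStorage(Storage):
    """Stores files under a root directory on the local filesystem."""

    def __init__(self, root):
        self.root = root

    def _path(self, name):
        return os.path.join(self.root, name)

    def save(self, name, data):
        path = self._path(name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(data)

    def read(self, name):
        with open(self._path(name), "rb") as fh:
            return fh.read()

    def exists(self, name):
        return os.path.exists(self._path(name))


class DictStorage(Storage):
    """In-memory stand-in for a remote backend (assumption, not S3 itself)."""

    def __init__(self):
        self._blobs = {}

    def save(self, name, data):
        self._blobs[name] = data

    def read(self, name):
        return self._blobs[name]

    def exists(self, name):
        return name in self._blobs


def store_upload(storage, name, data):
    """Application code: identical whichever backend is plugged in."""
    storage.save(name, data)
    return storage.read(name)


local = LocalStorage(tempfile.mkdtemp())
remote = DictStorage()
```

Swapping `local` for `remote` changes configuration, not application code, which is the property the middleware is meant to provide.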

Slide 15

Slide 15 text

# storage ...and hopefully we don’t have to:

Slide 16

Slide 16 text

# storage ...but if we have to: * avoid using HEAD/GET requests to check for existing files * store the file list in memory instead * use S3 “PRELOAD_METADATA”
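To illustrate why an in-memory file list helps, here is a small sketch that counts round trips. `CountingBackend` is a fake remote store and `PreloadedStorage` a hypothetical wrapper; this is not how any particular "PRELOAD_METADATA" setting is implemented, only the general idea of listing once and answering existence checks locally.

```python
class CountingBackend:
    """Fake remote store that counts requests (stand-in for S3)."""

    def __init__(self, keys):
        self._keys = set(keys)
        self.requests = 0

    def list_keys(self):
        self.requests += 1  # one listing call
        return set(self._keys)

    def head(self, key):
        self.requests += 1  # one HEAD-style call per check
        return key in self._keys

    def put(self, key):
        self.requests += 1
        self._keys.add(key)


class PreloadedStorage:
    """Answers exists() from a key list fetched once, on first use."""

    def __init__(self, backend):
        self.backend = backend
        self._known = None  # loaded lazily

    def exists(self, key):
        if self._known is None:
            self._known = self.backend.list_keys()
        return key in self._known

    def save(self, key):
        self.backend.put(key)
        # Keep the local list in sync with our own writes.
        if self._known is not None:
            self._known.add(key)


names = ["a.css", "b.js", "c.png"] * 10

naive = CountingBackend(["a.css", "b.js"])
naive_results = [naive.head(n) for n in names]  # 30 requests

backend = CountingBackend(["a.css", "b.js"])
storage = PreloadedStorage(backend)
cached_results = [storage.exists(n) for n in names]  # 1 request
```

The trade-off is staleness: files written by other servers after the preload are invisible until the list is refreshed, so this suits content that changes mainly at deploy time.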

Slide 17

Slide 17 text

:task queuing:

Slide 18

Slide 18 text

# task queuing use message/task queues for long running operations: * image resizes * external api calls * low priority updates * intensive calculations * big data queues * preparing hot caches * indexing updates * logging

Slide 19

Slide 19 text

# task queuing * organize tasks into different queues * organize queues into priority workers * scale workers using AWS auto scaling - send custom alerts using AWS CloudWatch API * it’s all about priorities Amazon SQS
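A toy, in-process sketch of queues drained by priority. `PriorityQueues`, `run_worker` and the two task functions are hypothetical names; a real setup would put a message service such as SQS between producers and workers, and the `backlog()` figure is only a guess at the kind of metric one might feed to scaling alarms.

```python
from collections import deque


class PriorityQueues:
    """Named queues, drained in the priority order given at creation."""

    def __init__(self, names_by_priority):
        self.order = list(names_by_priority)
        self.queues = {name: deque() for name in self.order}

    def enqueue(self, queue_name, task, *args):
        self.queues[queue_name].append((task, args))

    def next_task(self):
        # Always serve the highest-priority non-empty queue first.
        for name in self.order:
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None

    def backlog(self):
        return {name: len(q) for name, q in self.queues.items()}


def run_worker(queues):
    """Drain every queue and return a log of (queue, result) pairs."""
    log = []
    while True:
        item = queues.next_task()
        if item is None:
            return log
        name, (task, args) = item
        log.append((name, task(*args)))


def resize_image(filename):
    return "resized " + filename


def write_log(message):
    return "logged " + message


q = PriorityQueues(["high", "low"])
q.enqueue("low", write_log, "x")
q.enqueue("high", resize_image, "a.png")
q.enqueue("low", write_log, "y")
pending = q.backlog()
processed = run_worker(q)
```

Even though the image resize was enqueued second, the worker handles it first because its queue outranks the logging queue.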

Slide 20

Slide 20 text

:database:

Slide 21

Slide 21 text

# database * Amazon RDS does the trick if you’re on MySQL or Oracle * shard early: mark down table dependencies from the start and work around them while you grow
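One way to picture "shard early" is a sketch that routes every row by a single shard key, so dependent tables (here a user and that user's orders) land on the same shard. `shard_for` and `ShardedStore` are hypothetical, and the dicts stand in for separate database servers.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash.
    (Python's built-in hash() is randomized per process for strings.)"""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class ShardedStore:
    """Each dict in self.shards stands in for one database server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard(self, user_id):
        return self.shards[shard_for(user_id, len(self.shards))]

    def put(self, user_id, table, row):
        # Every table is keyed by user_id, so related rows stay together
        # and joins never have to cross shards.
        self._shard(user_id).setdefault(table, []).append(row)

    def rows(self, user_id, table):
        return self._shard(user_id).get(table, [])


store = ShardedStore(4)
store.put(42, "users", {"id": 42, "name": "alice"})
store.put(42, "orders", {"user_id": 42, "total": 10})
home_shard = store.shards[shard_for(42, 4)]
```

Plain modulo routing like this remaps most keys when the shard count changes, which is one reason to plan the key and the table dependencies before the data grows.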

Slide 22

Slide 22 text

:deployment:

Slide 23

Slide 23 text

# huh? * it’s your code * you know the dependencies * you know its breaking points * it’s your job to deal with deployment failures * continuous deployment? yes please!

Slide 24

Slide 24 text

# requirements * 50+ deployments per day from n devs * secure * fast rollbacks on failure * zero downtime * dependency handling (restart services, migrate dbs etc.)

Slide 25

Slide 25 text

# continuous deployment [diagram: devs git push/pull to the repo; servers 0.0.0.1, 0.0.0.2, 0.0.0.3, 0.0.0.4 in the server farm git pull master] $: fab production deploy

Slide 26

Slide 26 text

# where the magic happens

Slide 27

Slide 27 text

pull from master -> run test suite (abort on failure) -> deploy/compress static files on S3 -> install new dependencies -> run db migration scripts -> cleanup -> rollback if something fails -> clone previous production for backup -> backup live db -> pre-compile less etc -> restart services if required
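The flow above can be sketched as a list of steps that aborts on the first failure and undoes whatever already ran. This is plain Python rather than an actual fabfile, `run_pipeline` is a hypothetical helper, and the step order in the example is illustrative only, not a claim about the order used in the talk.

```python
def run_pipeline(steps):
    """Run (name, action, undo) steps in order.
    On failure, call the undo of each completed step in reverse order."""
    done = []
    for name, action, undo in steps:
        try:
            action()
        except Exception as exc:
            for _, finished_undo in reversed(done):
                if finished_undo is not None:
                    finished_undo()
            return False, "%s failed: %s" % (name, exc)
        done.append((name, undo))
    return True, "deployed"


log = []


def record(label):
    """Build a fake step that just records that it ran."""
    return lambda: log.append(label)


def failing_migration():
    raise RuntimeError("migration error")


steps = [
    ("pull from master", record("pull"), None),
    ("install dependencies", record("install"), record("uninstall")),
    ("run db migrations", failing_migration, record("unmigrate")),
    ("restart services", record("restart"), None),
]
success, message = run_pipeline(steps)
```

Here the migration fails, so the dependency install is undone and the restart never runs; the undo of the failed step itself is not called, since it never completed.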

Slide 28

Slide 28 text

# continuous deployment assumptions: * master is always production safe (use pull requests for large teams) * bootstrapped pre-configured AMIs * handle stale servers with care tools:

Slide 29

Slide 29 text

:failure:

Slide 30

Slide 30 text

# failure “Design for failure and nothing will fail” “Everything fails, all the time” ~ Amazon CTO

Slide 31

Slide 31 text

# failure * backup/restore strategy * bootstrapped AMIs * multi-AZ deployment

Slide 32

Slide 32 text

:limitations:

Slide 33

Slide 33 text

# limitations * disk I/O ✔ use multiple EBS volumes in a RAID configuration * database ✔ sharding ✔ multiple read-only replicas ✔ clustering * ram ✔ memcache/redis replication
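As one illustration of the read-only database option, a minimal routing sketch: writes go to the primary and reads rotate across replicas. `ReadWriteRouter` is a hypothetical helper, and the strings stand in for real database connections.

```python
import itertools


class ReadWriteRouter:
    """Sends writes to the primary and spreads reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin iterator over replicas, or None if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def for_write(self):
        return self.primary

    def for_read(self):
        if self._replicas is None:
            return self.primary  # fall back when no replica exists
        return next(self._replicas)


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
read_targets = [router.for_read() for _ in range(4)]
write_target = router.for_write()
```

Replicas usually lag slightly behind the primary, so reads that must see a just-committed write are safer going to `for_write()`.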

Slide 34

Slide 34 text

# recap * the problem * development * deployment * failure * limitations * conclusion * questions

Slide 35

Slide 35 text

:one more thing:

Slide 36

Slide 36 text

:vendor lock-in: if you’re still following, there’s no such thing on AWS

Slide 37

Slide 37 text

# vendor lock-in * S3 ✔ pluggable storages * EC2 ✔ normal unix box * DynamoDB ✔ fully compatible NoSQL * RDS ✔ fully compatible MySQL/Oracle

Slide 38

Slide 38 text

:conclusion:

Slide 39

Slide 39 text

# conclusion * use best practices and you’ll be safe * your stack matters * Cloud != high availability * Cloud != high performance * Cloud != magic (but it’s close)

Slide 40

Slide 40 text

# questions? challenges? @goldstein aka leonidas tsementzis leotsem [at] gmail.com

Slide 41

Slide 41 text

# thank you! @goldstein aka leonidas tsementzis leotsem [at] gmail.com