◦ mainly builders interested in ECS (especially Fargate) operations
◦ mainly builders interested in ECS-related OSS tools
• Takeaways:
◦ knowledge of some OSS tools for ECS
◦ know-how of operating ECS
• Story:
◦ operational problems that tend to arise as a product grows on ECS/Fargate
▪ and how to solve them
Preferred for MVP products.
◦ Container orchestration built on AWS services
▪ familiar to AWS builders
◦ Simple, with a low catch-up cost
▪ k8s is flexible and scalable
• but also complex and expensive to set up and manage
▪ doesn't eat up time on maintenance chores
• such as periodic cluster updates
💡 ECS = primitive; k8s = platform. ECS tends to be preferred by teams that aren't planning a large-scale web service or platform environment.
RDS)
◆ First project organization: a single Scrum team
◆ Growth-phase architecture: more services (batch, worker, admin app, external integration); more tasks (monitoring agent, log router, web server)
◆ Growth-phase organization: Application team / Infra team / Security team / BI team
Intro: Why use ECS
However, ECS must also scale as the product grows.
have experienced
• Deployment operational issues
◦ organizing repository and team responsibilities/ownership
◦ how to manage the growing number of task definitions
◦ how to add additional architectures such as batch processing
• Operational issues in troubleshooting
◦ developers/SREs struggling to debug with log-only information
◦ some developers want to exec into the container
▪ to check the state of directories and processes
tools that allow centralized management of ECS resources seem useful at the early stage (e.g. CDK, Terraform, Copilot).
• However, once app/infra teams are split and ownership is separated, centralized management becomes noisy.
◦ an appropriate separation of concerns becomes necessary
• Operational accidents must be avoided.
◦ e.g. redundant task-definition management:
▪ created in the IaC repo, but deployed from the app-side CI/CD
• one side forgets to update, and the definitions drift apart
◦ e.g. differing lifecycle management:
▪ network resources such as VPC, SG, and ALB change far less often than apps, so they shouldn't be managed together
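One way to draw that ownership boundary is at the repository level — a sketch, with illustrative repo and file names:

```
infra-repo/                  # infra team: Terraform for slow-changing resources
  vpc.tf                     # VPC, subnets, security groups
  alb.tf                     # load balancer
  ecs-cluster.tf             # the ECS cluster itself
app-repo/                    # app team: application code plus its own deploy config
  src/
  deploy/
    ecs-service-def.json     # ECS service definition
    ecs-task-def.json        # ECS task definition, updated only by app CI/CD
```

The task definition lives in exactly one place (the app repo), so the "updated in IaC but deployed by CI/CD" drift cannot happen.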
task definitions
• Some CLI/API tools are handy for simply swapping container images within the scope of CD (e.g. ecs-deploy, aws cli).
• However, as the number of tasks grows (sidecars, workers, ...), version control becomes more complex.
• You end up with multiple redundant task definitions with similar properties.
◦ updates to shared parameters get missed in some copies, causing deploy failures and raising catch-up and maintenance costs
ecspresso (pronounced like "espresso") is a deploy tool that manages only the ECS service and task definitions, alongside the app.
▪ enables separation of concerns!
• How to manage the increasing number of task definitions
◦ they can be managed as code
▪ the codebase principle (from the Twelve-Factor App) applies nicely
◦ definitions can be extended and integrated, so multiple tasks can be generated from one source
▪ JSON extension via Jsonnet; tfstate integration
• enables the DRY principle
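A minimal sketch of the DRY idea with Jsonnet (file names and values here are illustrative; ecspresso can render `.jsonnet` task definitions). A shared base holds the common properties, and each task overrides only what differs:

```jsonnet
// base-task-def.libsonnet — properties shared by every task
{
  family: 'myapp',
  cpu: '256',
  memory: '512',
  networkMode: 'awsvpc',
  requiresCompatibilities: ['FARGATE'],
  containerDefinitions: [{
    name: 'app',
    image: 'myapp:latest',
    essential: true,
  }],
}
```

```jsonnet
// worker-task-def.jsonnet — inherit the base, override only the differences
local base = import 'base-task-def.libsonnet';
base {
  family: 'myapp-worker',
  containerDefinitions: [
    base.containerDefinitions[0] { command: ['bundle', 'exec', 'worker'] },
  ],
}
```

Changing a shared parameter (say, `cpu`) in the base file now updates every task at once. ecspresso also offers a `tfstate` template function to pull values such as subnet or security-group IDs out of Terraform state, which is one way to bridge the infra/app repo split described earlier.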
have experienced
• Deployment operational issues
◦ ✅ organizing repository and team responsibilities/ownership
◦ ✅ how to manage the growing number of task definitions
◦ 💡 how to add additional architectures such as batch processing
• Operational issues in troubleshooting
◦ developers/SREs struggling to debug with log-only information
◦ some developers want to exec into the container
▪ to check the state of directories and processes
problems when dealing with batches in ECS
◦ Management and consistency with IaC
▪ task defs live on the app side, but EventBridge rules live on the infra side
• want to avoid version mismatches
• want to manage batches in one place
◦ CLI/manual configuration
▪ a minimal start was fine at first, but version management becomes complicated
• want changes traced and managed as code
◦ As more batches are added, task definitions multiply
▪ many task defs differing only in command and environment variables
Scheduled settings can be managed together with the task def.
◦ no more mismatches between the latest task and the schedule definition
• CLI/manual configuration
◦ code management enables PR review and version history
▪ helps prevent accidents, e.g. when temporarily disabling/enabling a schedule for maintenance
• As more batches are added, task definitions multiply
◦ prepare a common definition file and inject the unique values (cron schedule, execution command, etc.) to stay DRY
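This is the niche ecschedule fills: scheduled-task rules live in one YAML file next to the code. A sketch of such a rule file — cluster, rule, and container names are illustrative, so double-check against the ecschedule README:

```yaml
region: ap-northeast-1
cluster: myapp
rules:
  - name: hourly-report
    scheduleExpression: rate(1 hour)
    taskDefinition: myapp-batch
    containerOverrides:
      - name: batch
        command: [bundle, exec, rake, report]
  - name: nightly-cleanup
    scheduleExpression: cron(0 18 * * ? *)   # UTC
    taskDefinition: myapp-batch
    containerOverrides:
      - name: batch
        command: [bundle, exec, rake, cleanup]
```

Each rule reuses the same task definition and overrides only the command, which is the DRY state described above; applying the file from CI (e.g. `ecschedule -conf ecschedule.yaml apply -all`) keeps EventBridge in sync with what was reviewed in the PR.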
have experienced
• Deployment operational issues
◦ ✅ organizing repository and team responsibilities/ownership
◦ ✅ how to manage the growing number of task definitions
◦ ✅ how to add additional architectures such as batch processing
• Operational issues in troubleshooting
◦ developers/SREs struggling to debug with log-only information
◦ some developers want to exec into the container
▪ to check the state of directories and processes
◦ Investigating with logs alone becomes challenging.
▪ ECS Exec is available, but setting it up for each member can be time-consuming
• requires prerequisite knowledge of several AWS resources (networking, Session Manager, etc.)
◦ When the number of tasks/services increases, managing them becomes troublesome.
▪ a Makefile of helper commands goes stale eventually
◦ Want to retrieve/transfer files in a container
▪ some developers might say
◦ ECS Exec is available, but setting it up for each member can be time-consuming.
▪ ecs-exec-checker is an excellent diagnostic tool!
• a lightweight shell script; it only needs jq and the aws cli
• if exec doesn't work, it identifies the problem
◦ and tells you how to fix it
• supports many operating systems (e.g. Cygwin)
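In practice the workflow looks roughly like this (cluster/task/container names are placeholders; the script comes from the aws-containers/amazon-ecs-exec-checker repository):

```sh
# Run the checker against a cluster and a task; it prints a green/red
# checklist (IAM permissions, SSM agent, VPC endpoints, ...) and, for
# each red item, a hint on how to fix it.
./check-ecs-exec.sh <cluster-name> <task-id>

# Once everything is green, ECS Exec itself is just:
aws ecs execute-command \
  --cluster <cluster-name> \
  --task <task-id> \
  --container app \
  --interactive \
  --command "/bin/sh"
```

Running the checker once per new team member replaces a lot of ad-hoc "why can't I exec?" debugging.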
to manage them, & want to retrieve/transfer files in a container
◦ ecsk can help you!
▪ a wrapper around ECS Exec
• enables interactive ecs-exec
◦ no more need to look up task IDs, etc.
• enables retrieving/transferring files
◦ an scp-like experience (※ internally uses S3)
ecsk: https://github.com/yukiarrr/ecsk
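A rough feel for the workflow — paths are illustrative and the exact flags should be checked against the ecsk README:

```sh
# Interactively pick the cluster/service/task from a list and get a
# shell in the container — no task ID to copy-paste.
ecsk exec

# Copy files between the local machine and the container, scp-style;
# the transfer goes through S3 under the hood.
ecsk cp ./config.yml :/app/config.yml   # local -> container
ecsk cp :/var/log/app.log ./            # container -> local
```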
have experienced
• Deployment operational issues
◦ ✅ organizing repository and team responsibilities/ownership
◦ ✅ how to manage the growing number of task definitions
◦ ✅ how to add additional architectures such as batch processing
• Operational issues in troubleshooting
◦ ✅ developers/SREs struggling to debug with log-only information
◦ ✅ some developers want to exec into the container
▪ ✅ to check the state of directories and processes
operations
◦ Currently having trouble with deploys?
▪ the important thing is to choose a deploy tool that suits your use case!
▪ there are many good tools besides ecspresso/ecschedule
◦ Having trouble with debugging or troubleshooting?
▪ ecs-exec-checker/ecsk will make ECS Exec more convenient!
▪ and there are surely other good tools out there