
Efficient ECS Operations with OSS Tools

AWS Community Builders APJ Open Mic vol.3.

Masayoshi Haruta

April 14, 2023

Transcript

  1. Efficient ECS Operations with OSS Tools
 Apr 9th, 2023
 AWS Community Builders: APJ Open Mic
 Masayoshi Haruta

  2. Who am I
 Name:
 Masayoshi Haruta

 Career:
 2021: Site Reliability Engineer
 2018: Solution Architect
 2014: System Administrator, developer

 Location:
 Region: Japan
 Linkedin: masayoshi-h
 Twitter: @MahPaprika
 GitHub: masayoshi644

 My favorite AWS services:
 Lambda, ECS, Fargate

 Community:
 JAWS-UG (SRE division), AWS Community Builders

  3. Today’s talk
 • Targets:
 ◦ mainly SREs and developers
 ◦ mainly builders interested in ECS (especially Fargate) operations
 ◦ mainly builders interested in ECS-related OSS tools

 • Takeaways:
 ◦ knowledge of some OSS tools for ECS
 ◦ know-how for operating ECS

 • Story:
 ◦ Operational problems that tend to arise as products built on ECS/Fargate grow.
 ▪ How to solve them

  4. Index
 • Intro
 • Deploy
 ◦ ecspresso
 ◦ ecschedule
 • Troubleshooting
 ◦ ecs-exec-checker
 ◦ ecsk
 • Conclusion
 

  5. Intro: Why use ECS
 • Preferred for early-stage startups, PoCs, or MVP products.
 ◦ Container orchestration based on AWS services
 ▪ familiar to AWS builders
 ◦ Simple, with a low catch-up cost.
 ▪ k8s is flexible and scalable
 • but also complex and expensive to set up and manage.
 ▪ Little time is taken up by maintenance tasks
 • such as periodic cluster updates

 💡ECS=primitive; k8s=platform;
 ECS tends to be preferred by those who aren’t planning a large-scale web service or platform environment.

  6. Intro: Why use ECS
 However, ECS will also scale as the product grows

 ◆First phase architecture
 Simple three-tier architecture (like ALB -> ECS -> RDS)

 ◆First project organization
 Single scrum team

 ◆Growth phase architecture
 More services (batch, worker, admin app, external integration)
 More tasks (monitoring-agent, log-router, web-server)

 ◆Growth phase organization
 Application team
 Infra team
 Security team
 BI team

  7. Intro: Have you ever had problems with ECS?
 Operational troubles I have experienced

 • Operational issues in deployment
 ◦ Organizing repository and team responsibilities/ownership.
 ◦ How to manage the increasing number of task definitions.
 ◦ How to add additional architectures such as batch processing.

 • Operational issues in troubleshooting
 ◦ Developers/SREs struggling to debug with log-only info.
 ◦ Some developers want to exec into the container
 ▪ to check the status of directories and processes.

  9. Actually, there are many deploy tools, but...
 • IaC
 ◦ CDK, Terraform, CloudFormation, Pulumi

 • API
 ◦ AWS CLI, ecs-deploy, Copilot, amazon-ecs-deploy-task-definition

 • Docker Compose
 ◦ ECS CLI, Docker Compose ECS integration

 • GUI
 ◦ AWS Management Console

 💡No Silver Bullet! No De Facto Standard... so the use case is important (IMO)

  10. The use case is important
 Organizing repository and team responsibilities/ownership
 • A tool that allows centralized management of ECS resources may seem useful at the early stage (e.g. CDK, Terraform, Copilot).

 • However, once the app and infra teams are split and ownership is separated, centralized management becomes noisy.
 ◦ Appropriate separation of concerns may be necessary.

 • Operational accidents must be avoided.
 ◦ e.g. redundant task-def management:
 ▪ the task def is created in the IaC repo, but deployment runs in the app-side CI/CD
 • one team forgets to apply an update on its side

 ◦ e.g. different lifecycle management:
 ▪ Network resources such as VPC, SG, and ALB are not updated as frequently as apps, so they shouldn’t be managed together.

  11. The use case is important
 How to manage the increasing number of task definitions
 • Some CLI/API tools are useful for simply swapping containers from the shell within the scope of CD (e.g. ecs-deploy, AWS CLI).

 • However, as the number of tasks such as sidecars or workers increases, version control becomes more complex.

 • And you end up with multiple redundant task definitions with similar properties.
 ◦ Updates to the duplicated params get missed, leading to failed deploys and higher catch-up and maintenance costs.
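
 As a rough illustration of the "simply swap containers" style of deploy, here is a minimal Python sketch. It is not how ecs-deploy or the AWS CLI are implemented. It only shows the idea: take an existing task definition, drop the fields that RegisterTaskDefinition does not accept, and replace one container image. The task definition, container names, and image tags are hypothetical.

```python
import copy

# Fields that DescribeTaskDefinition returns but RegisterTaskDefinition does not accept.
READ_ONLY_FIELDS = (
    "taskDefinitionArn", "revision", "status", "requiresAttributes",
    "compatibilities", "registeredAt", "registeredBy", "deregisteredAt",
)


def swap_image(task_def, container_name, new_image):
    """Return a copy of task_def with one container's image replaced."""
    updated = copy.deepcopy(task_def)
    for field in READ_ONLY_FIELDS:
        updated.pop(field, None)
    for container in updated.get("containerDefinitions", []):
        if container["name"] == container_name:
            container["image"] = new_image
            return updated
    raise ValueError(f"container {container_name!r} not found in task definition")


# Hypothetical task definition with an app container and a log-router sidecar.
current = {
    "family": "web",
    "revision": 7,
    "status": "ACTIVE",
    "containerDefinitions": [
        {"name": "app", "image": "example/app:v1"},
        {"name": "log-router", "image": "example/log-router:v3"},
    ],
}

updated = swap_image(current, "app", "example/app:v2")
print([c["image"] for c in updated["containerDefinitions"]])
```

 This works well for one service. With many task definitions, every shared parameter (sidecar image, log settings, and so on) has to be changed in each of them, which is where the update omissions come from.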

  12. ecspresso: https://github.com/kayac/ecspresso
 • Organizing repository and team responsibilities/ownership
 ◦ A deploy tool that manages only the ECS service and tasks, alongside the app.
 ▪ enables separation of concerns!

 • How to manage the increasing number of task definitions
 ◦ Definitions can be managed as code.
 ▪ the codebase principle (from the Twelve-Factor App) is excellent!
 ◦ Definitions can be extended/integrated, so multiple tasks can be built from them.
 ▪ JSON extension via Jsonnet, tfstate integration.
 • Enables the DRY principle

 (pronounced the same as "espresso")
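
 To make the DRY idea concrete, here is a small Python analogy. ecspresso itself relies on Jsonnet and templates for this, not on Python, so treat the code as an illustration of the principle only. A shared base container definition is combined with per-task overrides. All names, images, and log settings below are made up.

```python
import json


def deep_merge(base, override):
    """Merge override into base: nested dicts are merged, other values are replaced."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = deep_merge(base[key], value) if key in base else value
        return merged
    return override


# Shared settings, written once (hypothetical values).
BASE_CONTAINER = {
    "image": "example/app:v1",
    "essential": True,
    "logConfiguration": {
        "logDriver": "awslogs",
        "options": {"awslogs-group": "/ecs/app", "awslogs-region": "ap-northeast-1"},
    },
}

# Each task only states what differs from the base.
web = deep_merge(BASE_CONTAINER, {
    "name": "web",
    "logConfiguration": {"options": {"awslogs-stream-prefix": "web"}},
})
worker = deep_merge(BASE_CONTAINER, {
    "name": "worker",
    "command": ["worker"],
    "logConfiguration": {"options": {"awslogs-stream-prefix": "worker"}},
})

print(json.dumps(worker, indent=2))
```

 Changing the image or the log group in BASE_CONTAINER now updates every derived definition, so there is no second copy to forget.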

  13. Intro: Have you ever had problems with ECS?
 Operational troubles I have experienced

 • Operational issues in deployment
 ◦ ✅Organizing repository and team responsibilities/ownership.
 ◦ ✅How to manage the increasing number of task definitions.
 ◦ 💡How to add additional architectures such as batch processing.

 • Operational issues in troubleshooting
 ◦ Developers/SREs struggling to debug with log-only info.
 ◦ Some developers want to exec into the container
 ▪ to check the status of directories and processes

  14. How to add additional architectures such as batch
 • Common problems when dealing with batches in ECS
 ◦ Management and consistency with IaC
 ▪ Task defs are on the app side, but EventBridge is on the infra side.
 • want to avoid version mismatches.
 • want to manage batches in one place.

 ◦ CLI/manual configuration
 ▪ A minimal start was good at first, but version management becomes complicated.
 • want to trace and manage by code.

 ◦ As more batches are added, the number of task defs increases
 ▪ many task defs that differ only in command and environment variables.
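
 The version mismatch above can be pictured with a short Python sketch. It assumes you already have, from somewhere, the latest revision per task definition family and the task definition ARN that each schedule rule targets. The rule names, account ID, and ARNs are hypothetical, and a target ARN without a revision is simply rejected here.

```python
def parse_task_def_arn(arn):
    """Split '...:task-definition/family:revision' into (family, revision)."""
    resource = arn.rsplit("/", 1)[-1]
    family, _, revision = resource.rpartition(":")
    if not family or not revision.isdigit():
        raise ValueError(f"no revision in task definition ARN: {arn}")
    return family, int(revision)


def find_stale_schedules(latest_revisions, schedule_targets):
    """Return the rule names whose target revision is older than the latest one."""
    stale = []
    for rule_name, arn in schedule_targets.items():
        family, revision = parse_task_def_arn(arn)
        if revision < latest_revisions.get(family, revision):
            stale.append(rule_name)
    return sorted(stale)


PREFIX = "arn:aws:ecs:ap-northeast-1:123456789012:task-definition/"

# The app team has deployed revision 5, but one rule still points at revision 3.
latest_revisions = {"batch": 5}
schedule_targets = {
    "nightly-report": PREFIX + "batch:5",
    "hourly-sync": PREFIX + "batch:3",
}

print(find_stale_schedules(latest_revisions, schedule_targets))
```

 A check like this only detects the drift after the fact. Managing the schedule and the task def in the same place removes the cause.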
 

  15. ecschedule: https://github.com/Songmu/ecschedule
 • Management and consistency with IaC
 ◦ Schedule settings can be managed together with the task def.
 ◦ No more mismatches between the latest task def and the scheduled one.

 • CLI/manual configuration
 ◦ Code management allows PR review and version history.
 ▪ helps prevent accidents, such as when you want to temporarily disable/enable a system for maintenance.

 • As more batches are added, the number of task defs increases
 ◦ Prepare a common definition file and inject unique values for cron schedules, execution commands, etc. to keep things DRY.
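
 The "common definition plus injected values" approach might look like the following Python sketch. ecschedule itself is driven by a YAML file, and the key names below are illustrative, not its actual schema (see its README for that). The batch names, commands, and schedules are invented. The only external fact relied on is that EventBridge cron expressions have six fields.

```python
import json

# Settings shared by every batch (hypothetical values).
COMMON = {"taskDefinition": "batch", "container": "app"}


def make_rule(name, cron, command, disabled=False):
    """Build one schedule rule from the common settings plus per-batch values."""
    if len(cron.split()) != 6:
        raise ValueError(f"EventBridge cron needs six fields, got: {cron!r}")
    return {
        "name": name,
        "scheduleExpression": f"cron({cron})",
        "taskDefinition": COMMON["taskDefinition"],
        "disabled": disabled,
        "containerOverrides": [
            {"name": COMMON["container"], "command": command},
        ],
    }


# Only the values that differ per batch are listed here.
BATCHES = [
    ("nightly-report", "0 18 * * ? *", ["report", "--daily"]),
    ("hourly-sync", "0 * * * ? *", ["sync"]),
]

rules = [make_rule(name, cron, command) for name, cron, command in BATCHES]
print(json.dumps(rules, indent=2))
```

 Every batch reuses one task definition and differs only in schedule and command, and disabling a batch for maintenance becomes a one-line change that can go through PR review.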

  16. Intro: Have you ever had problems with ECS?
 Operational troubles I have experienced

 • Operational issues in deployment
 ◦ ✅Organizing repository and team responsibilities/ownership
 ◦ ✅How to manage the increasing number of task definitions
 ◦ ✅How to add additional architectures such as batch processing

 • Operational issues in troubleshooting
 ◦ Developers/SREs struggling to debug with log-only info.
 ◦ Some developers want to exec into the container
 ▪ to check the status of directories and processes

  18. How to troubleshoot
 • Problems that eventually arise in troubleshooting
 ◦ Investigating using only logs becomes challenging.
 ▪ ecs-exec is available, but setting it up for each member can be time-consuming.
 • requires prerequisite knowledge of some AWS resources (network, Session Manager, etc.)

 ◦ When the number of tasks/services increases, it becomes troublesome to manage them.
 ▪ A Makefile will become obsolete someday.

 ◦ "We want to retrieve/transfer files in a container"
 ▪ some developers might say.

  19. ecs-exec-checker: https://github.com/aws-containers/amazon-ecs-exec-checker

 • Investigating using only logs becomes challenging.
 ◦ ecs-exec is available, but setting it up for each member can be time-consuming.
 ▪ ecs-exec-checker is, literally, an excellent check tool!
 • lightweight shell script that only needs jq and the AWS CLI.
 • if you can't exec, it identifies the problem.
 ◦ it also tells you how to solve the problem.
 • Supports many operating systems (e.g. Cygwin)
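
 To give a feel for what "identify the problem" means, here is a minimal Python sketch of two checks on a task as returned by the ECS DescribeTasks API: whether enableExecuteCommand is on, and whether each container's ExecuteCommandAgent is RUNNING. The real ecs-exec-checker is a shell script and performs more checks than these two. The sample task data is made up.

```python
def exec_problems(task):
    """Return human-readable reasons why ecs-exec may fail for this task."""
    problems = []
    if not task.get("enableExecuteCommand", False):
        problems.append("enableExecuteCommand is false for this task")
    for container in task.get("containers", []):
        agents = {
            agent.get("name"): agent.get("lastStatus")
            for agent in container.get("managedAgents", [])
        }
        status = agents.get("ExecuteCommandAgent")
        if status != "RUNNING":
            problems.append(
                f"container {container.get('name')}: "
                f"ExecuteCommandAgent is {status or 'missing'}"
            )
    return problems


# Hypothetical entries shaped like items of the 'tasks' list in DescribeTasks.
healthy_task = {
    "enableExecuteCommand": True,
    "containers": [
        {
            "name": "app",
            "managedAgents": [
                {"name": "ExecuteCommandAgent", "lastStatus": "RUNNING"},
            ],
        },
    ],
}
broken_task = {
    "enableExecuteCommand": False,
    "containers": [{"name": "app", "managedAgents": []}],
}

for label, task in (("healthy", healthy_task), ("broken", broken_task)):
    print(label, exec_problems(task))
```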
 

  20. ecsk: https://github.com/yukiarrr/ecsk
 • When the number of tasks/services increases, it becomes troublesome to manage them. & Want to retrieve/transfer files in a container.
 ◦ ecsk can help you!
 ▪ a wrapper around ecs-exec
 • enables interactive ecs-exec.
 ◦ no more need to specify the task ID, etc.
 • enables retrieving/transferring files.
 ◦ as if using scp (※ uses S3 internally)
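
 The "no more need to specify the task ID" idea can be sketched in Python as below. This is not ecsk's implementation, only the general shape of an interactive wrapper: let the user pick from a list, then assemble the `aws ecs execute-command` call. The cluster, task IDs, and container name are hypothetical, and the S3-based file transfer is not covered.

```python
import shlex


def choose(options, answer):
    """Pick an option by its 1-based number, as an interactive prompt would."""
    index = int(answer) - 1
    if not 0 <= index < len(options):
        raise ValueError(f"choose a number between 1 and {len(options)}")
    return options[index]


def build_exec_command(cluster, task_id, container, command="/bin/sh"):
    """Assemble the argv for 'aws ecs execute-command'."""
    return [
        "aws", "ecs", "execute-command",
        "--cluster", cluster,
        "--task", task_id,
        "--container", container,
        "--interactive",
        "--command", command,
    ]


# In a real wrapper these would come from listing the running tasks.
task_ids = ["0123456789abcdef0", "fedcba98765432100"]
selected = choose(task_ids, "2")  # the user typed "2" at the prompt

argv = build_exec_command("prod", selected, "app")
print(shlex.join(argv))
```

 Passing argv to subprocess.run would start the session. The point is that nobody has to look up and paste the task ID by hand.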

  21. Intro: Have you ever had problems with ECS?
 Operational troubles I have experienced

 • Operational issues in deployment
 ◦ ✅Organizing repository and team responsibilities/ownership.
 ◦ ✅How to manage the increasing number of task definitions.
 ◦ ✅How to add additional architectures such as batch processing.

 • Operational issues in troubleshooting
 ◦ ✅Developers/SREs struggling to debug with log-only info.
 ◦ ✅Some developers want to exec into the container
 ▪ ✅to check the status of directories and processes

  22. Conclusion
 • For those who have been struggling with ECS operations
 ◦ Currently having trouble with deploys?
 ▪ the important thing is to choose a deploy tool that suits your use case!
 ▪ there are many good tools besides ecspresso/ecschedule.

 ◦ Having trouble with debugging or troubleshooting?
 ▪ ecs-exec-checker/ecsk will make ecs-exec more convenient!
 ▪ there are surely other good tools out there as well.
 

  23. Conclusion
 If you know of any useful OSS, please let me know!
 Let's talk about it together

  24. Appendix
 • ecspresso
 ◦ https://github.com/kayac/ecspresso
 • ecschedule
 ◦ https://github.com/Songmu/ecschedule
 • ecs-exec-checker
 ◦ https://github.com/aws-containers/amazon-ecs-exec-checker
 • ecsk
 ◦ https://github.com/yukiarrr/ecsk