Building large-scale batch of media with AWS

muratak
August 22, 2025

Prepared for Japanglish Tech Talk 2024.6

Transcript

  1. Building a large-scale batch system is difficult. - It’s like tackling big waves: many different batches kick off and arrive in rapid succession. - We should prepare well.
  2. Self-Introduction - My name is Kaito Murata. - I work at istyle, which builds @cosme; we are now celebrating our 25th anniversary! - I love surfing. - I am a fifth-year software developer. Keywords: Next.js, Node.js, AWS
  3. Today I will talk about… How we have been building the large-scale batch: - The existing batch system is composed of 30+ different jobs, and each batch affects several tables, which in turn affect 10+ subsystems. - How we made a complex development style work with 7+ members. - How we replaced the existing on-premises batch with an AWS system.
  4. And you will learn - How to control large-scale batch systems, like surfers control waves.
  5. Batch System Requirements: - Because the system is composed of 30+ different batches, we need a powerful tool to monitor each batch’s behavior and performance. - Batches sometimes fail, so we need the quickest possible way to re-run a failed batch. We should also notice errors quickly. (Diagram: Infrastructure running batch1, batch2, ..., batchN)
  6. Infrastructure to meet the batch requirements: - EventBridge - Step Functions - ECS on Fargate ※ Be careful not to run the same batch twice (next page).
  7. Infrastructure to meet the batch requirements: - EventBridge guarantees that each job is triggered at least once, so a job can occasionally be triggered more than once. - To prevent duplicate runs, verify the unique ID that each workflow issues on every execution. (Diagram labels: Start, End, Run, Error, ID Verification)
  8. How do 7+ SWEs work simultaneously without conflicts? - We adopted the DI (Dependency Injection) pattern with TSyringe. - The repository (DB connection) parts are kept separate, and each developer’s batch use case calls the repository functions it needs. - We wrote unit tests from the start.
  9. Created a Development Standard (Criteria). The criteria include points such as: - Logs are detailed enough and include parameters. - Enough unit tests are written (which the DI pattern made easy). - Every table and page that a batch affects is summarized. - Each member knows how to re-run the batch, with documentation.
  10. Logging and Error Monitoring - All errors are captured in New Relic. - All errors and warnings are sent to a Slack channel. - Any developer can fix errors by following the manual.
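
Slide 7 describes verifying a unique ID so that a trigger delivered more than once does not run the same batch twice. A minimal sketch of that check might look like the following. The slides do not show the actual implementation, so every name here (`RunStore`, `InMemoryRunStore`, `runOnce`) is hypothetical, and the in-memory set stands in for whatever shared store the real system uses. It also assumes that both deliveries of one trigger carry the same ID.

```typescript
// Hypothetical sketch of the "ID Verification" step from slide 7.
// A production store would need an atomic "insert if absent" operation
// in a shared database; the Set below only keeps the example runnable.

interface RunStore {
  /** Resolves to true if the ID was newly claimed, false if already seen. */
  claim(runId: string): Promise<boolean>;
}

class InMemoryRunStore implements RunStore {
  private readonly seen = new Set<string>();

  async claim(runId: string): Promise<boolean> {
    if (this.seen.has(runId)) return false;
    this.seen.add(runId);
    return true;
  }
}

type RunResult = "executed" | "skipped";

/**
 * Runs the job only if this run ID has not been claimed before.
 * The ID stays claimed even when the job throws, so a manual re-run
 * is assumed to arrive with a fresh ID.
 */
async function runOnce(
  store: RunStore,
  runId: string,
  job: () => Promise<void>,
): Promise<RunResult> {
  const claimed = await store.claim(runId);
  if (!claimed) return "skipped"; // duplicate delivery of the same trigger
  await job();
  return "executed";
}
```

Calling `runOnce` twice with the same ID runs the job once and reports `"skipped"` for the second call; a different ID runs the job again.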
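
Slides 8 and 9 say that separating the repository layer and injecting it into each batch use case made unit tests easy to write. The sketch below illustrates that idea under stated assumptions: the talk used TSyringe, but to stay dependency-free this example wires dependencies by hand through constructors, which is the same pattern without the container. The domain (`Review`, `AverageRatingBatch`) and all other names are invented for illustration and are not taken from the talk.

```typescript
// Hypothetical example of constructor-based dependency injection.
// Nothing here comes from the actual codebase described in the slides.

interface Review {
  productId: string;
  rating: number;
}

// The repository hides the DB connection behind an interface.
interface ReviewRepository {
  findAll(): Promise<Review[]>;
}

interface Logger {
  info(message: string, params: Record<string, unknown>): void;
}

// A batch use case that receives its dependencies instead of creating them.
class AverageRatingBatch {
  constructor(
    private readonly reviews: ReviewRepository,
    private readonly logger: Logger,
  ) {}

  async run(): Promise<Record<string, number>> {
    const all = await this.reviews.findAll();
    const totals: Record<string, { sum: number; count: number }> = {};
    for (const review of all) {
      const entry = totals[review.productId] ?? { sum: 0, count: 0 };
      entry.sum += review.rating;
      entry.count += 1;
      totals[review.productId] = entry;
    }
    const averages: Record<string, number> = {};
    for (const productId of Object.keys(totals)) {
      averages[productId] = totals[productId].sum / totals[productId].count;
    }
    // Log with parameters, as the development standard on slide 9 asks.
    this.logger.info("average rating batch finished", {
      reviews: all.length,
      products: Object.keys(averages).length,
    });
    return averages;
  }
}

// Test doubles: no database or log service is needed in a unit test.
class FakeReviewRepository implements ReviewRepository {
  constructor(private readonly rows: Review[]) {}

  async findAll(): Promise<Review[]> {
    return this.rows;
  }
}

class RecordingLogger implements Logger {
  readonly entries: { message: string; params: Record<string, unknown> }[] = [];

  info(message: string, params: Record<string, unknown>): void {
    this.entries.push({ message, params });
  }
}
```

In a unit test, the batch is constructed with `new AverageRatingBatch(new FakeReviewRepository(rows), new RecordingLogger())`, so each developer can test a use case without touching the shared repository code or a real database.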