Amazon S3 NYJavaSIG 2024-12-12

sullis
December 12, 2024

Amazon S3
NYJavaSIG
New York, NY

Transcript

  1. AWS re:Invent 2024: automatic generation of metadata that is captured when S3 objects are added or modified, stored in fully managed Apache Iceberg tables
  2. Amazon S3: S3 is an object storage service with an HTTP REST API. https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html
  3. Amazon S3: “There is a frontend fleet with a REST API, a namespace service, a storage fleet that’s full of hard disks, and a fleet that does background operations.” https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html
  4. S3 core concepts: An Amazon S3 object represents a file or collection of data. Every object must reside within a bucket.
  5. S3 bucket names: An Amazon S3 bucket name is globally unique. The namespace is shared by all AWS accounts.
  6. S3 pricing: https://aws.amazon.com/s3/pricing/ “You pay for storing objects in your S3 buckets. The rate you’re charged depends on your objects' size, how long you stored the objects during the month, and the storage class.”
  7. S3 storage classes: https://aws.amazon.com/s3/storage-classes/ “Amazon S3 offers a range of storage classes that you can choose from based on the performance, data access, resiliency, and cost requirements of your workloads.”
  8. Creating an S3 bucket: AWS Console UI, AWS CLI, AWS SDK, Infrastructure as Code (CloudFormation, AWS CDK, Terraform, Pulumi), Other
  9. Apache Iceberg 2024: A table format is a method of structuring a dataset’s files to present them as a unified “table.”
  10. Apache Iceberg 2024: In a data lake, all your data is stored as files in some storage solution (e.g. Amazon S3).
  11. S3 performance: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html “your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned Amazon S3 prefix”
  12. S3 performance: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html “There are no limits to the number of prefixes in a bucket. You can increase your read or write performance by using parallelization”
  13. S3 performance: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html “While Amazon S3 is scaling to your new higher request rate, you may see some 503 (Slow Down) errors. These errors will dissipate when the scaling is complete.”
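
Slides 2 and 5 describe S3 as an HTTP REST API in which every bucket name is globally unique. One way to picture this is the virtual-hosted-style URL, where the bucket name becomes part of the hostname. The sketch below uses only the JDK; the bucket, region, and key are hypothetical placeholders, the key is assumed to need no URL encoding, and the request is unsigned, so it would only succeed against a publicly readable object (normal requests carry a Signature Version 4 authorization header, which the AWS SDK adds for you).

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class S3RestSketch {
    // Virtual-hosted-style address: https://<bucket>.s3.<region>.amazonaws.com/<key>
    // Assumes the key contains no characters that need URL encoding.
    static URI objectUri(String bucket, String region, String key) {
        return URI.create("https://" + bucket + ".s3." + region + ".amazonaws.com/" + key);
    }

    // Builds (but does not send) an unsigned GET request for the object.
    static HttpRequest getObjectRequest(String bucket, String region, String key) {
        return HttpRequest.newBuilder(objectUri(bucket, region, key)).GET().build();
    }

    public static void main(String[] args) {
        // Hypothetical bucket, region, and key, for illustration only.
        HttpRequest request = getObjectRequest("example-bucket", "us-east-1", "reports/2024/summary.csv");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Because the bucket name appears in a DNS hostname shared by every AWS account, two accounts cannot hold the same name.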
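
The pricing quote on slide 6 names three inputs: object size, time stored during the month, and storage class. A rough way to combine them is size times the fraction of the month stored times a per-GB-month rate. The sketch below is only an approximation under that assumption: the rate is a placeholder rather than a quoted price, and request and data transfer charges are ignored.

```java
final class S3StorageCostSketch {
    // Storage charge = size in GB x fraction of the month stored x rate per GB-month.
    // The rate depends on the storage class and region; look it up on the pricing page.
    static double monthlyStorageCost(double sizeGb, double daysStored,
                                     double daysInMonth, double ratePerGbMonth) {
        return sizeGb * (daysStored / daysInMonth) * ratePerGbMonth;
    }

    public static void main(String[] args) {
        double placeholderRate = 0.023; // hypothetical USD per GB-month, not a quoted price
        double cost = monthlyStorageCost(100.0, 15.0, 30.0, placeholderRate);
        System.out.println("Estimated storage charge in USD: " + cost);
    }
}
```

With these placeholder inputs, 100 GB kept for half a month is billed as roughly 50 GB-months at the chosen rate.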
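
Slides 11 and 12 together imply simple arithmetic: if each prefix sustains at least 5,500 GET/HEAD requests per second, then reading in parallel across N prefixes gives a floor of about N times 5,500. The sketch below estimates how many prefixes a target rate would need and shows one possible way to spread keys across them. The `shard-N/` key layout and the hash-based assignment are illustrative assumptions, not something the slides or AWS prescribe.

```java
final class S3PrefixScalingSketch {
    // Per-prefix request rates quoted on slide 11.
    static final int READS_PER_PREFIX = 5_500;   // GET/HEAD per second
    static final int WRITES_PER_PREFIX = 3_500;  // PUT/COPY/POST/DELETE per second

    // Number of prefixes needed to reach a target rate (ceiling division).
    static int prefixesNeeded(int targetRequestsPerSecond, int perPrefixRate) {
        return (targetRequestsPerSecond + perPrefixRate - 1) / perPrefixRate;
    }

    // Hypothetical key layout: hash the object name into one of prefixCount shards.
    static String shardedKey(String objectName, int prefixCount) {
        int shard = Math.floorMod(objectName.hashCode(), prefixCount);
        return "shard-" + shard + "/" + objectName;
    }

    public static void main(String[] args) {
        int prefixes = prefixesNeeded(55_000, READS_PER_PREFIX);
        System.out.println("Prefixes for 55,000 reads/s: " + prefixes);
        System.out.println(shardedKey("invoice-0001.pdf", prefixes));
    }
}
```

Since the same object name always hashes to the same shard, readers can recompute the full key without a lookup table.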
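
The 503 (Slow Down) errors on slide 13 are transient, so a client can usually ride them out by retrying with exponential backoff and jitter. The AWS SDKs ship their own retry logic, so the sketch below is only meant to illustrate the idea with plain JDK classes. `SlowDownException` and `callWithBackoff` are hypothetical names standing in for whatever your HTTP client surfaces on a 503.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

final class SlowDownRetrySketch {
    // Hypothetical stand-in for an HTTP 503 (Slow Down) response from S3.
    static final class SlowDownException extends RuntimeException {
        SlowDownException(String message) { super(message); }
    }

    // Retries the call on SlowDownException, sleeping a random time in
    // [0, baseDelayMillis * 2^(attempt-1)] between attempts ("full jitter").
    static <T> T callWithBackoff(Supplier<T> call, int maxAttempts, long baseDelayMillis) {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (SlowDownException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up after the last allowed attempt
                }
                long cap = baseDelayMillis << (attempt - 1);
                sleepQuietly(ThreadLocalRandom.current().nextLong(cap + 1));
            }
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while backing off", ie);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated request that is throttled twice before succeeding.
        String result = callWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new SlowDownException("503 Slow Down");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Randomizing the delay keeps many throttled clients from retrying at the same instant while S3 is still scaling.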