AWS re:Invent 2024
Iceberg tables as first-class
AWS resources
Slide 8
Slide 8 text
AWS re:Invent 2024
Slide 9
Slide 9 text
AWS re:Invent 2024
pom.xml
Slide 10
Slide 10 text
AWS re:Invent 2024
automatic generation of metadata that is
captured when S3 objects are added or
modified
stored in fully managed Apache Iceberg
tables
Slide 11
Slide 11 text
Amazon S3
“Simple Storage Service”
Launched on March 14, 2006
Slide 12
Slide 12 text
Amazon S3
S3 is an object storage service with an HTTP REST API
https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html
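Because S3 is addressed over HTTP, every object maps to a URL. A minimal sketch of the virtual-hosted-style address follows; the bucket name, key, and region are made-up examples, and `object_url` is a hypothetical helper, not part of any AWS SDK. A real request to a private object would also need to be signed.

```python
from urllib.parse import quote


def object_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Build a virtual-hosted-style URL for an S3 object."""
    # Percent-encode the key but keep "/" so the key's path-like structure survives.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


# Example values are illustrative only.
print(object_url("example-bucket", "reports/2024/summary v1.csv"))
```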
Slide 13
Slide 13 text
Amazon S3
“There is a frontend fleet with a REST API, a namespace
service, a storage fleet that’s full of hard disks, and a fleet that
does background operations.”
https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html
Slide 14
Slide 14 text
S3 core concepts
Buckets
Objects
Slide 15
Slide 15 text
S3 core concepts
An Amazon S3 object represents a file or collection of data
Every object must reside within a bucket
Slide 16
Slide 16 text
S3 bucket types
General purpose buckets
Directory buckets
Table buckets
Slide 17
Slide 17 text
S3 bucket names
an Amazon S3 bucket name is globally unique
the namespace is shared by all AWS accounts
Slide 18
Slide 18 text
S3 pricing
https://aws.amazon.com/s3/pricing/
“You pay for storing objects in your S3 buckets.
The rate you’re charged depends on your
objects' size, how long you stored the objects
during the month, and the storage class”
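The three factors in the quote (size, time stored, storage class) can be combined into a rough estimate. The sketch below assumes a 31-day month, and the per-GB-month rate is a placeholder rather than a real AWS price; look up current rates on the pricing page. It covers storage only.

```python
HOURS_PER_MONTH = 744  # assumption: a 31-day month


def storage_cost(size_gb: float, hours_stored: float, rate_per_gb_month: float) -> float:
    """Prorated storage charge: size x fraction of the month x storage-class rate."""
    gb_months = size_gb * hours_stored / HOURS_PER_MONTH
    return gb_months * rate_per_gb_month


# 100 GB kept for half of the month at a hypothetical 0.02 USD per GB-month.
print(round(storage_cost(100, 372, 0.02), 2))  # 1.0
```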
Slide 19
Slide 19 text
S3 storage classes
https://aws.amazon.com/s3/storage-classes/
“Amazon S3 offers a range of storage classes that you can
choose from based on the performance, data access, resiliency,
and cost requirements of your workloads.”
Slide 20
Slide 20 text
S3 storage classes
https://aws.amazon.com/s3/storage-classes/
Standard
Intelligent Tiering
Express One Zone
… and many others
Slide 21
Slide 21 text
Storage class choice matters
https://www.youtube.com/watch?v=RxgYNrXPOLw
S3 Conditional Writes
https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/
Conditional writes can ensure there is no existing object
with the same key name in your bucket
during PUT operations
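The behavior can be illustrated without calling AWS. The sketch below is an in-memory model, not an S3 client: `ToyBucket` and `PreconditionFailed` are hypothetical names, and the `if_none_match="*"` argument mirrors the `If-None-Match: *` request header, where a failed condition is answered with HTTP 412.

```python
class PreconditionFailed(Exception):
    """Stands in for the HTTP 412 response returned when the condition fails."""


class ToyBucket:
    """In-memory stand-in for a bucket; not an AWS client."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body, if_none_match=None):
        # With if_none_match="*", the write succeeds only if the key is absent.
        if if_none_match == "*" and key in self._objects:
            raise PreconditionFailed(key)
        self._objects[key] = body

    def get(self, key):
        return self._objects[key]


bucket = ToyBucket()
bucket.put("locks/job-42", b"writer-a", if_none_match="*")  # first writer wins
try:
    bucket.put("locks/job-42", b"writer-b", if_none_match="*")
except PreconditionFailed:
    print("second writer rejected")
```

Without the condition, the second PUT would silently overwrite the first, which is why this matters for writers that coordinate through a shared key.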
Slide 57
Slide 57 text
S3 bucket permissions
https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketLifecycleConfiguration.html
By default, all Amazon S3 resources are private,
including buckets, objects, and related subresources
Slide 58
Slide 58 text
S3 performance
considerations?
Slide 59
Slide 59 text
S3 performance
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
“your application can achieve at least
3,500 PUT/COPY/POST/DELETE
or 5,500 GET/HEAD requests per second
per partitioned Amazon S3 prefix”
Slide 60
Slide 60 text
S3 performance
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
“There are no limits to the number of prefixes in a bucket.
You can increase your read or write performance by using
parallelization”
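Combining the per-prefix request rates with parallel prefixes gives a rough capacity estimate. This is a back-of-the-envelope sketch: it assumes load is spread evenly and that S3 has already partitioned each prefix, and `baseline_rates` is a hypothetical helper. The documented figures are minimums ("at least"), so the result is a floor, not a cap.

```python
PUT_RATE_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE requests per second
GET_RATE_PER_PREFIX = 5_500  # GET/HEAD requests per second


def baseline_rates(prefix_count: int) -> tuple[int, int]:
    """Return (writes per second, reads per second) across evenly loaded prefixes."""
    return prefix_count * PUT_RATE_PER_PREFIX, prefix_count * GET_RATE_PER_PREFIX


print(baseline_rates(10))  # (35000, 55000)
```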
Slide 61
Slide 61 text
S3 performance
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
“While Amazon S3 is scaling to your new higher request rate,
you may see some 503 (Slow Down) errors. These errors will
dissipate when the scaling is complete.”
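A common client-side response to these transient errors is to retry with exponential backoff and jitter. The sketch below is illustrative only: `SlowDown` and `with_backoff` are hypothetical names, and the attempt count and delays are arbitrary. The AWS SDKs ship their own retry logic, so this shows the idea rather than something to hand-roll in production.

```python
import random
import time


class SlowDown(Exception):
    """Hypothetical stand-in for an HTTP 503 Slow Down response."""


def with_backoff(request, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call request(); on SlowDown, wait a jittered, exponentially growing delay and retry."""
    for attempt in range(max_attempts):
        try:
            return request()
        except SlowDown:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: sleep a random time up to the exponential cap.
            sleep(random.uniform(0, base_delay * 2 ** attempt))


# Usage: a fake request that is throttled twice, then succeeds.
responses = iter([SlowDown(), SlowDown(), "200 OK"])


def fake_request():
    result = next(responses)
    if isinstance(result, Exception):
        raise result
    return result


print(with_backoff(fake_request, sleep=lambda seconds: None))  # 200 OK
```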