Slide 1

Slide 1 text

Amazon S3 NYJavaSIG 2024-12-12 Sean Sullivan

Slide 2

Slide 2 text

AWS Developers

Slide 3

Slide 3 text

Agenda AWS re:Invent 2024 Amazon S3 AWS SDK Bonus topics

Slide 4

Slide 4 text

AWS re:Invent 2024 S3 Tables S3 Metadata

Slide 5

Slide 5 text

AWS re:Invent 2024 fully managed Iceberg tables

Slide 6

Slide 6 text

AWS re:Invent 2024 “Table buckets”

Slide 7

Slide 7 text

AWS re:Invent 2024 Iceberg tables as first-class AWS resources

Slide 8

Slide 8 text

AWS re:Invent 2024

Slide 9

Slide 9 text

AWS re:Invent 2024 pom.xml

Slide 10

Slide 10 text

AWS re:Invent 2024 automatic generation of metadata that is captured when S3 objects are added or modified, stored in fully managed Apache Iceberg tables

Slide 11

Slide 11 text

Amazon S3 “Simple Storage Service” Launched on March 14, 2006

Slide 12

Slide 12 text

Amazon S3 S3 is an object storage service with an HTTP REST API https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html
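Because S3 is exposed as an HTTP REST API, every object is addressable by URL. The JDK-only sketch below builds a virtual-hosted-style URL (`https://<bucket>.s3.<region>.amazonaws.com/<key>`) and wraps it in a `java.net.http.HttpRequest`. The class name `S3Urls` is hypothetical. An unsigned GET like this only works for publicly readable objects; real applications normally go through the AWS SDK, which signs each request with Signature Version 4.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpRequest;

// Hypothetical helper: builds virtual-hosted-style S3 URLs using only the JDK.
final class S3Urls {
    private S3Urls() {}

    static URI objectUri(String bucket, String region, String key) {
        String host = bucket + ".s3." + region + ".amazonaws.com";
        try {
            // The multi-argument URI constructor percent-encodes illegal path characters (e.g. spaces).
            return new URI("https", host, "/" + key, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid bucket, region, or key", e);
        }
    }

    // Unsigned GET: only succeeds against objects that are publicly readable.
    static HttpRequest unsignedGet(String bucket, String region, String key) {
        return HttpRequest.newBuilder(objectUri(bucket, region, key)).GET().build();
    }
}
```

Sending the request (for example with `java.net.http.HttpClient`) is left out on purpose; the point is only that an S3 object maps to an ordinary HTTP resource.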

Slide 13

Slide 13 text

Amazon S3 “There is a frontend fleet with a REST API, a namespace service, a storage fleet that’s full of hard disks, and a fleet that does background operations.” https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html

Slide 14

Slide 14 text

S3 core concepts Buckets Objects

Slide 15

Slide 15 text

S3 core concepts An Amazon S3 object represents a file or collection of data. Every object must reside within a bucket.

Slide 16

Slide 16 text

S3 bucket types General purpose buckets Directory buckets Table buckets

Slide 17

Slide 17 text

S3 bucket names An Amazon S3 bucket name is globally unique; the namespace is shared by all AWS accounts.
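Global uniqueness can only be checked by S3 itself (creating a bucket fails if the name is already taken), but the syntax rules can be checked locally. The sketch below encodes a subset of the documented naming rules for general purpose buckets: 3 to 63 characters; lowercase letters, digits, dots, and hyphens only; start and end with a letter or digit; no adjacent dots; not formatted like an IPv4 address; no reserved `xn--` prefix or `-s3alias` suffix. The class `BucketNames` is hypothetical, and the full rule list in the S3 documentation is longer.

```java
import java.util.regex.Pattern;

// Hypothetical validator covering a subset of S3 general purpose bucket naming rules.
final class BucketNames {
    private BucketNames() {}

    // 3-63 chars total; first and last char must be a lowercase letter or digit.
    private static final Pattern SHAPE = Pattern.compile("[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]");
    // Names that look like an IPv4 address (e.g. 192.168.5.4) are not allowed.
    private static final Pattern IPV4_LIKE = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    static boolean isValid(String name) {
        return SHAPE.matcher(name).matches()
                && !name.contains("..")            // no adjacent periods
                && !IPV4_LIKE.matcher(name).matches()
                && !name.startsWith("xn--")        // reserved prefix
                && !name.endsWith("-s3alias");     // reserved suffix
    }
}
```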

Slide 18

Slide 18 text

S3 pricing https://aws.amazon.com/s3/pricing/ “You pay for storing objects in your S3 buckets. The rate you’re charged depends on your objects' size, how long you stored the objects during the month, and the storage class”
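As a worked example of the "size × duration" part of that rule: storage is metered in GB-months, so 100 GB kept for 15 days of a 30-day month counts as 100 × 15/30 = 50 GB-months. The sketch below does that arithmetic. The rate is a caller-supplied placeholder, not a real S3 price, and the hypothetical `StorageCost` class ignores tiered pricing, request charges, data transfer, and per-class minimum storage durations.

```java
// Hypothetical helper: prorates storage into GB-months and applies a caller-supplied rate.
final class StorageCost {
    private StorageCost() {}

    /** GB-months for `gigabytes` stored for `hoursStored` out of `hoursInMonth`. */
    static double gbMonths(double gigabytes, double hoursStored, double hoursInMonth) {
        return gigabytes * hoursStored / hoursInMonth;
    }

    /** Cost = GB-months x rate; ratePerGbMonth is a placeholder, not a published S3 price. */
    static double cost(double gigabytes, double hoursStored, double hoursInMonth, double ratePerGbMonth) {
        return gbMonths(gigabytes, hoursStored, hoursInMonth) * ratePerGbMonth;
    }
}
```

The storage class enters only through the rate you pass in, which mirrors the quote: same size and duration, different class, different bill.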

Slide 19

Slide 19 text

S3 storage classes https://aws.amazon.com/s3/storage-classes/ “Amazon S3 offers a range of storage classes that you can choose from based on the performance, data access, resiliency, and cost requirements of your workloads.”

Slide 20

Slide 20 text

S3 storage classes https://aws.amazon.com/s3/storage-classes/ Standard Intelligent-Tiering Express One Zone … and many others

Slide 21

Slide 21 text

Storage class choice matters https://www.youtube.com/watch?v=RxgYNrXPOLw

Slide 22

Slide 22 text

S3 REST API 3314 pages

Slide 23

Slide 23 text

S3 operations Upload object List objects Download object Copy Move Delete
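To make these six operations concrete without a network, the sketch below models a single bucket as a sorted in-memory map from key to bytes. `InMemoryBucket` is a hypothetical toy, not an SDK class: there are no real directories, "list" is a prefix scan over sorted keys, and "move" is expressed as a copy followed by a delete, which is how a move is commonly performed against general purpose buckets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical toy model of one bucket: a sorted map from object key to object bytes.
final class InMemoryBucket {
    private final TreeMap<String, byte[]> objects = new TreeMap<>();

    void put(String key, byte[] body) {                       // upload
        objects.put(key, body.clone());
    }

    Optional<byte[]> get(String key) {                        // download
        byte[] body = objects.get(key);
        return body == null ? Optional.empty() : Optional.of(body.clone());
    }

    List<String> list(String prefix) {                        // list by prefix
        List<String> keys = new ArrayList<>();
        for (String key : objects.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) break;               // keys are sorted, so matches are contiguous
            keys.add(key);
        }
        return keys;
    }

    void copy(String sourceKey, String destKey) {             // copy within the bucket
        byte[] body = objects.get(sourceKey);
        if (body == null) throw new IllegalArgumentException("No such key: " + sourceKey);
        objects.put(destKey, body.clone());
    }

    void move(String sourceKey, String destKey) {             // copy, then delete the source
        if (sourceKey.equals(destKey)) return;
        copy(sourceKey, destKey);
        delete(sourceKey);
    }

    void delete(String key) {                                 // delete
        objects.remove(key);
    }
}
```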

Slide 24

Slide 24 text

using S3 in a Java application

Slide 25

Slide 25 text

AWS SDK for Java v1 AWS SDK for Java v2 AWS SDK for Kotlin

Slide 26

Slide 26 text

AWS SDK for Java v1 https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-java-v1-x-on-december-31-2025/

Slide 27

Slide 27 text

Open source SDKs

Slide 28

Slide 28 text

AWS SDK for Java v2 pom.xml

Slide 29

Slide 29 text

AWS SDK for Java v2 pom.xml

Slide 30

Slide 30 text

AWS SDK for Java v2: HTTP clients Apache Client Netty Client CRT Client

Slide 31

Slide 31 text

AWS SDK for Java v2: CRT client pom.xml

Slide 32

Slide 32 text

CRT @ AWS re:Invent 2024 https://youtu.be/2DSVjJTRsz8?t=833

Slide 33

Slide 33 text

Different flavors of S3 clients Async Sync

Slide 34

Slide 34 text

how to create an S3 bucket?

Slide 35

Slide 35 text

Creating an S3 bucket AWS Console UI AWS CLI AWS SDK CloudFormation AWS CDK Terraform Pulumi Infrastructure as Code Other

Slide 36

Slide 36 text

CloudFormation

Slide 37

Slide 37 text

Pulumi

Slide 38

Slide 38 text

s3-playground https://github.com/sullis/s3-playground

Slide 39

Slide 39 text

s3-playground https://github.com/sullis/s3-playground

Slide 40

Slide 40 text

testing S3 locally LocalStack (via Testcontainers) MinIO (via Testcontainers) Adobe S3Mock (via Testcontainers)

Slide 41

Slide 41 text

S3 with MinIO S3MinioTest.java

Slide 42

Slide 42 text

how to upload an object? PutObjectRequest

Slide 43

Slide 43 text

how to retrieve an object? GetObjectRequest

Slide 44

Slide 44 text

how to upload large objects? CreateMultipartUploadRequest
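A multipart upload splits one object into numbered parts that are uploaded separately and then combined by S3. The documented limits are part numbers 1 to 10,000 and part sizes from 5 MiB to 5 GiB, with the last part allowed to be smaller. The hypothetical `MultipartPlan` below picks a part size within those limits, growing the caller's preferred size when the object would otherwise need more than 10,000 parts.

```java
// Hypothetical planner that picks a multipart part size within documented S3 limits.
final class MultipartPlan {
    static final long MIN_PART_SIZE = 5L * 1024 * 1024;          // 5 MiB (all parts except the last)
    static final long MAX_PART_SIZE = 5L * 1024 * 1024 * 1024;   // 5 GiB
    static final int MAX_PARTS = 10_000;                         // part numbers 1..10,000

    final long partSize;
    final int partCount;

    private MultipartPlan(long partSize, int partCount) {
        this.partSize = partSize;
        this.partCount = partCount;
    }

    static MultipartPlan plan(long objectSize, long preferredPartSize) {
        if (objectSize <= 0) {
            throw new IllegalArgumentException("objectSize must be positive");
        }
        // Clamp the caller's preference into [5 MiB, 5 GiB].
        long partSize = Math.min(Math.max(preferredPartSize, MIN_PART_SIZE), MAX_PART_SIZE);
        // Grow the part size if the object would otherwise need more than 10,000 parts.
        partSize = Math.max(partSize, ceilDiv(objectSize, MAX_PARTS));
        if (partSize > MAX_PART_SIZE) {
            throw new IllegalArgumentException("object is too large for a single multipart upload");
        }
        return new MultipartPlan(partSize, (int) ceilDiv(objectSize, partSize));
    }

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }
}
```

For example, a 100 MiB object with an 8 MiB preferred part size yields 13 parts (12 full parts plus a 4 MiB final part).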

Slide 45

Slide 45 text

parallel uploads? S3TransferManager
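S3TransferManager handles splitting and parallelizing transfers for you. To show the underlying idea with only the JDK, the sketch below uploads parts concurrently on a fixed-size thread pool and collects the results in part-number order, the ascending order that CompleteMultipartUpload expects. `PartUploader` is a hypothetical stand-in for a real UploadPart call, not an SDK interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: upload parts concurrently, then collect results in part-number order.
final class ParallelParts {
    private ParallelParts() {}

    /** Hypothetical stand-in for an UploadPart call; returns the part's ETag. */
    interface PartUploader {
        String upload(int partNumber, long offset, long length) throws Exception;
    }

    static List<String> uploadAll(long objectSize, long partSize, int concurrency, PartUploader uploader) {
        int partCount = (int) ((objectSize + partSize - 1) / partSize);
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < partCount; i++) {
                int partNumber = i + 1;                                // S3 part numbers start at 1
                long offset = (long) i * partSize;
                long length = Math.min(partSize, objectSize - offset); // last part may be shorter
                futures.add(pool.submit(() -> uploader.upload(partNumber, offset, length)));
            }
            List<String> etags = new ArrayList<>(partCount);
            for (Future<String> future : futures) {
                etags.add(future.get());                               // waits; preserves part order
            }
            return etags;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while uploading parts", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A part upload failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```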

Slide 46

Slide 46 text

Big Data analytics?

Slide 47

Slide 47 text

Iceberg @ Netflix 2018 June 2018 https://www.youtube.com/watch?v=nWwQMlrjhy0 S3

Slide 48

Slide 48 text

Apache Iceberg 2024 A table format is a method of structuring a dataset’s files to present them as a unified “table.”

Slide 49

Slide 49 text

Apache Iceberg 2024 In a data lake, all your data is stored as files in some storage solution (e.g. Amazon S3)

Slide 50

Slide 50 text

AWS re:Invent 2023 Ryan Blue

Slide 51

Slide 51 text

AWS re:Invent 2023 S3

Slide 52

Slide 52 text

AWS re:Invent 2023

Slide 53

Slide 53 text

AWS re:Invent 2023 “Too many small files are a problem”
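One common remedy is compaction: rewriting many small data files into fewer, larger ones. Iceberg's data-file rewrite supports a bin-packing strategy; the sketch below is a much simpler, hypothetical next-fit grouping of file sizes into bins of at most a target size. It illustrates only the planning step and is not Iceberg's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical next-fit planner: groups consecutive file sizes into bins of at most targetBytes.
final class SmallFileCompaction {
    private SmallFileCompaction() {}

    static List<List<Long>> planBins(List<Long> fileSizes, long targetBytes) {
        List<List<Long>> bins = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current bin if adding this file would exceed the target.
            if (!current.isEmpty() && currentBytes + size > targetBytes) {
                bins.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);               // a file larger than the target gets its own bin
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            bins.add(current);
        }
        return bins;
    }
}
```

Each resulting bin would then be rewritten as one larger file, so five 40-byte files with a 100-byte target become three output files instead of five.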

Slide 54

Slide 54 text

Apache Iceberg project https://github.com/apache/iceberg

Slide 55

Slide 55 text

Iceberg pull request https://github.com/apache/iceberg/pull/11349

Slide 56

Slide 56 text

S3 Conditional Writes https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/ Conditional writes can ensure there is no existing object with the same key name in your bucket during PUT operations
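A PUT with the `If-None-Match: *` header behaves like an atomic put-if-absent: the first writer for a key succeeds, and later writers get `412 Precondition Failed` instead of silently overwriting. The sketch below models only those semantics in memory with `ConcurrentHashMap.putIfAbsent`; `ConditionalPutModel` is hypothetical and makes no network calls.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory model of "PUT with If-None-Match: *" semantics (no network calls).
final class ConditionalPutModel {
    static final int OK = 200;
    static final int PRECONDITION_FAILED = 412;

    private final ConcurrentMap<String, byte[]> objects = new ConcurrentHashMap<>();

    /** Stores the body only if no object exists for the key; otherwise reports 412. */
    int putIfNoneMatch(String key, byte[] body) {
        // putIfAbsent is atomic, so exactly one concurrent writer per key can win.
        return objects.putIfAbsent(key, body.clone()) == null ? OK : PRECONDITION_FAILED;
    }
}
```

This "only one writer wins" guarantee is what makes conditional writes useful for coordinating writers, for example when several processes race to create the same metadata file.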

Slide 57

Slide 57 text

S3 bucket permissions https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketLifecycleConfiguration.html By default, all Amazon S3 resources are private, including buckets, objects, and related subresources

Slide 58

Slide 58 text

S3 performance considerations?

Slide 59

Slide 59 text

S3 performance https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html “your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned Amazon S3 prefix”

Slide 60

Slide 60 text

S3 performance https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html “There are no limits to the number of prefixes in a bucket. You can increase your read or write performance by using parallelization”
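Using the per-prefix baseline figures quoted above as a rough guide, the number of prefixes needed for a target request rate is a ceiling division: 20,000 GET/s needs ⌈20,000 / 5,500⌉ = 4 prefixes, and 10,000 PUT/s needs ⌈10,000 / 3,500⌉ = 3. The sketch below does that arithmetic and shows one hypothetical way to spread keys across N prefixes (hashing the logical key into `p00/`, `p01/`, and so on). The key layout is an illustration, not an S3 requirement, and S3 still needs time to scale to a new request rate.

```java
// Hypothetical helpers for sizing and spreading keys across prefixes.
final class PrefixPlanner {
    static final int GET_PER_PREFIX = 5_500;  // baseline GET/HEAD per second per prefix (from the docs)
    static final int PUT_PER_PREFIX = 3_500;  // baseline PUT/COPY/POST/DELETE per second per prefix

    private PrefixPlanner() {}

    /** Ceiling division: how many prefixes a target request rate needs at the baseline rate. */
    static int prefixesNeeded(int targetRequestsPerSecond, int perPrefixRate) {
        return (targetRequestsPerSecond + perPrefixRate - 1) / perPrefixRate;
    }

    /** Hypothetical key layout: hash the logical key into one of prefixCount prefixes (p00/, p01/, ...). */
    static String prefixedKey(String logicalKey, int prefixCount) {
        int slot = Math.floorMod(logicalKey.hashCode(), prefixCount);
        return String.format("p%02d/%s", slot, logicalKey);
    }
}
```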

Slide 61

Slide 61 text

S3 performance https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html “While Amazon S3 is scaling to your new higher request rate, you may see some 503 (Slow Down) errors. These errors will dissipate when the scaling is complete.”
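503 Slow Down responses are meant to be retried. The AWS SDKs already retry throttling errors with backoff, so application code rarely needs its own loop; the JDK-only sketch below just illustrates exponential backoff with full jitter. `SlowDownRetry` and its parameters are hypothetical, and the request is modeled as a function that returns an HTTP status code.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntSupplier;

// Hypothetical retry helper: the request is modeled as a call that returns an HTTP status code.
final class SlowDownRetry {
    private SlowDownRetry() {}

    static int callWithBackoff(IntSupplier request, int maxAttempts, long baseDelayMillis, long maxDelayMillis) {
        int status = request.getAsInt();
        for (int attempt = 1; status == 503 && attempt < maxAttempts; attempt++) {
            // Exponential cap (base, 2*base, 4*base, ...) bounded by maxDelayMillis.
            long cap = Math.min(maxDelayMillis, baseDelayMillis << Math.min(attempt - 1, 20));
            // Full jitter: sleep a random duration in [0, cap].
            long delay = ThreadLocalRandom.current().nextLong(cap + 1);
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return status; // give up and report the last status
            }
            status = request.getAsInt();
        }
        return status;
    }
}
```

Randomizing the delay spreads retries from many clients over time, which gives S3 room to finish scaling instead of being hit by synchronized retry bursts.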

Slide 62

Slide 62 text

KubeCon November 2024

Slide 63

Slide 63 text

re:Invent December 2024

Slide 64

Slide 64 text

The End

Slide 65

Slide 65 text

Bonus content

Slide 66

Slide 66 text

AWS re:Invent 2024

Slide 67

Slide 67 text

AWS re:Invent 2024

Slide 68

Slide 68 text

AWS re:Invent 2024

Slide 69

Slide 69 text

AWS re:Invent 2024

Slide 70

Slide 70 text

AWS re:Invent 2024

Slide 71

Slide 71 text

AWS re:Invent 2024