Slide 1

Slide 1 text

Amazon S3 Portland Java User Group 2024-09-17 Sean Sullivan

Slide 2

Slide 2 text

Agenda Amazon S3 AWS SDK Code Bonus topics

Slide 3

Slide 3 text

Amazon S3 “Simple Storage Service” Launched on March 14, 2006

Slide 4

Slide 4 text

Amazon S3 S3 is an object storage service with an HTTP REST API https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html
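
To make the "HTTP REST API" point concrete, here is a minimal sketch that builds (but does not send) a plain HTTP GET for an object using only the JDK's `java.net.http` package. The bucket name, region, and key are hypothetical placeholders, and the request is unsigned: anything other than a publicly readable object would also need AWS Signature Version 4 headers, which the AWS SDK normally adds for you.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class S3RestSketch {

    // Virtual-hosted-style object URL: https://<bucket>.s3.<region>.amazonaws.com/<key>
    // Assumes the key holds only URL-safe characters; other keys need percent-encoding first.
    static URI objectUri(String bucket, String region, String key) {
        return URI.create("https://" + bucket + ".s3." + region + ".amazonaws.com/" + key);
    }

    // Unsigned GET request for one object (assumption: the object is publicly readable).
    static HttpRequest getObject(String bucket, String region, String key) {
        return HttpRequest.newBuilder(objectUri(bucket, region, key)).GET().build();
    }

    public static void main(String[] args) {
        // Hypothetical bucket, region, and key, used only for illustration.
        HttpRequest request = getObject("example-bucket", "us-west-2", "docs/hello.txt");
        System.out.println(request.method() + " " + request.uri());
    }
}
```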

Slide 5

Slide 5 text

Amazon S3 “There is a frontend fleet with a REST API, a namespace service, a storage fleet that’s full of hard disks, and a fleet that does background operations.” https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html

Slide 6

Slide 6 text

S3 core concepts Buckets Objects

Slide 7

Slide 7 text

S3 core concepts An Amazon S3 object represents a file or collection of data. Every object must reside within a bucket.

Slide 8

Slide 8 text

S3 pricing https://aws.amazon.com/s3/pricing/ “You pay for storing objects in your S3 buckets. The rate you’re charged depends on your objects' size, how long you stored the objects during the month, and the storage class”
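
The three factors in the quote (size, time stored, storage class) can be sketched as a simple proration. This is a simplified illustration, not the actual billing formula: it ignores request charges, tiered rates, and minimum storage durations, and the rate used in `main` is a made-up placeholder rather than a real price.

```java
final class StorageCostSketch {

    // Prorated storage charge: size x fraction of the month stored x rate for the storage class.
    static double monthlyStorageCost(double sizeGb, double hoursStored,
                                     double hoursInMonth, double ratePerGbMonth) {
        return sizeGb * (hoursStored / hoursInMonth) * ratePerGbMonth;
    }

    public static void main(String[] args) {
        // 100 GB kept for half of a 720-hour month at a placeholder rate of 0.02 per GB-month.
        System.out.println(monthlyStorageCost(100, 360, 720, 0.02));
    }
}
```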

Slide 9

Slide 9 text

S3 storage classes https://aws.amazon.com/s3/storage-classes/ “Amazon S3 offers a range of storage classes that you can choose from based on the performance, data access, resiliency, and cost requirements of your workloads.”

Slide 10

Slide 10 text

S3 storage classes https://aws.amazon.com/s3/storage-classes/ Standard Intelligent Tiering Express One Zone … and many others

Slide 11

Slide 11 text

S3 storage classes https://aws.amazon.com/s3/storage-classes/ You can configure S3 storage classes at the object level, and a single general purpose bucket can contain objects stored across all storage classes except S3 Express One Zone
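
At the REST level, "storage class at the object level" means the class travels with each upload as the `x-amz-storage-class` request header. The sketch below builds such a PUT with the JDK HTTP client; the object URL is a hypothetical placeholder and the request is unsigned, so treat it as an illustration of the header rather than a working upload.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class StorageClassPutSketch {

    // Builds a PUT whose x-amz-storage-class header selects the class for this one object.
    static HttpRequest putWithStorageClass(URI objectUri, String body, String storageClass) {
        return HttpRequest.newBuilder(objectUri)
                .header("x-amz-storage-class", storageClass)
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical object URL, used only for illustration.
        URI uri = URI.create("https://example-bucket.s3.us-west-2.amazonaws.com/logs/app.log");
        HttpRequest request = putWithStorageClass(uri, "hello", "INTELLIGENT_TIERING");
        System.out.println(request.headers().map());
    }
}
```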

Slide 12

Slide 12 text

Storage class choice matters https://www.youtube.com/watch?v=RxgYNrXPOLw

Slide 13

Slide 13 text

S3 REST API 1917 pages

Slide 14

Slide 14 text

S3 operations Upload object List objects Download object Copy Move Delete

Slide 15

Slide 15 text

using S3 in a Java application

Slide 16

Slide 16 text

AWS SDK for Java v1 AWS SDK for Java v2 AWS SDK for Kotlin

Slide 17

Slide 17 text

AWS SDK for Java v1 https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-java-v1-x-on-december-31-2025/

Slide 18

Slide 18 text

Open source SDKs

Slide 19

Slide 19 text

AWS SDK for Java v2 pom.xml

Slide 20

Slide 20 text

AWS SDK for Java v2 pom.xml

Slide 21

Slide 21 text

AWS SDK for Java v2 Apache Client Netty Client CRT Client

Slide 22

Slide 22 text

AWS SDK for Java v2: CRT client pom.xml

Slide 23

Slide 23 text

Different flavors of S3 clients Async Sync

Slide 24

Slide 24 text

how to create an S3 bucket?

Slide 25

Slide 25 text

Creating an S3 bucket AWS Console UI AWS CLI AWS SDK CloudFormation AWS CDK Terraform Pulumi Infrastructure as Code Other

Slide 26

Slide 26 text

CloudFormation

Slide 27

Slide 27 text

Pulumi

Slide 28

Slide 28 text

Code https://github.com/sullis/s3-playground

Slide 29

Slide 29 text

s3-playground https://github.com/sullis/s3-playground

Slide 30

Slide 30 text

testing S3 locally Localstack Minio Adobe S3Mock Testcontainers

Slide 31

Slide 31 text

S3 with Localstack S3LocalstackTest.java

Slide 32

Slide 32 text

how to upload an object? PutObjectRequest

Slide 33

Slide 33 text

how to retrieve an object? GetObjectRequest

Slide 34

Slide 34 text

how to upload large objects? CreateMultipartUploadRequest
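
A multipart upload splits a large object into numbered parts that are uploaded separately and then combined (CreateMultipartUpload, UploadPart, CompleteMultipartUpload). The sketch below shows only the part-planning arithmetic, using plain Java and no SDK calls. It assumes the documented S3 limits of at most 10,000 parts and a 5 MiB minimum size for every part except the last; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MultipartPlanSketch {

    static final long MIN_PART_SIZE = 5L * 1024 * 1024; // 5 MiB, required for all parts but the last
    static final int MAX_PARTS = 10_000;

    // One planned part: 1-based part number plus the byte range it covers.
    record Part(int partNumber, long offset, long length) {}

    // Splits objectSize bytes into consecutive parts of partSize bytes (the last may be smaller).
    static List<Part> plan(long objectSize, long partSize) {
        if (objectSize <= 0) {
            throw new IllegalArgumentException("objectSize must be positive");
        }
        if (partSize < MIN_PART_SIZE) {
            throw new IllegalArgumentException("partSize is below the 5 MiB minimum");
        }
        long partCount = (objectSize + partSize - 1) / partSize; // ceiling division
        if (partCount > MAX_PARTS) {
            throw new IllegalArgumentException("more than 10,000 parts; use a larger partSize");
        }
        List<Part> parts = new ArrayList<>();
        long offset = 0;
        for (int number = 1; offset < objectSize; number++) {
            long length = Math.min(partSize, objectSize - offset);
            parts.add(new Part(number, offset, length));
            offset += length;
        }
        return parts;
    }

    public static void main(String[] args) {
        // A 12 MiB object with 5 MiB parts yields two full parts and a 2 MiB tail.
        plan(12L * 1024 * 1024, MIN_PART_SIZE).forEach(System.out::println);
    }
}
```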

Slide 35

Slide 35 text

parallel uploads? S3TransferManager
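
The general idea behind parallel uploads is to send independent parts concurrently and then collect the results in part order. The sketch below illustrates that idea with a JDK thread pool. It is not the `S3TransferManager` API: the `uploadPart` callback is a hypothetical stand-in for whatever actually sends a part to S3 and returns its ETag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;

final class ParallelPartsSketch {

    // Runs the uploadPart callback for part numbers 1..partCount on a fixed-size
    // thread pool and returns the results in part-number order.
    static List<String> uploadAll(int partCount, int threads, IntFunction<String> uploadPart) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int n = 1; n <= partCount; n++) {
                final int partNumber = n;
                futures.add(CompletableFuture.supplyAsync(() -> uploadPart.apply(partNumber), pool));
            }
            List<String> etags = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                etags.add(future.join()); // join in submission order, so results stay ordered
            }
            return etags;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in uploader: a real one would send the part to S3 and return its ETag.
        System.out.println(uploadAll(4, 2, n -> "etag-" + n));
    }
}
```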

Slide 36

Slide 36 text

Bonus Topics

Slide 37

Slide 37 text

Big Data analytics?

Slide 38

Slide 38 text

Iceberg @ Netflix 2018 June 2018 https://www.youtube.com/watch?v=nWwQMlrjhy0 S3

Slide 39

Slide 39 text

Apache Iceberg 2024 A table format is a method of structuring a dataset’s files to present them as a unified “table.”

Slide 40

Slide 40 text

Apache Iceberg 2024 In a data lake, all your data is stored as files in some storage solution (e.g. Amazon S3)

Slide 41

Slide 41 text

AWS Re:Invent 2023

Slide 42

Slide 42 text

AWS Re:Invent 2023

Slide 43

Slide 43 text

AWS Re:Invent 2023 S3

Slide 44

Slide 44 text

AWS Re:Invent 2023

Slide 45

Slide 45 text

AWS Re:Invent 2023

Slide 46

Slide 46 text

AWS Re:Invent 2023 “Too many small files are a problem”

Slide 47

Slide 47 text

Apache Iceberg project https://iceberg.apache.org/

Slide 48

Slide 48 text

Apache Iceberg project https://github.com/apache/iceberg

Slide 49

Slide 49 text

Iceberg pull request https://github.com/apache/iceberg/pull/10217 work-in-progress

Slide 50

Slide 50 text

zero disk architecture?

Slide 51

Slide 51 text

Zero Disk Architecture November 2023 September 2024

Slide 52

Slide 52 text

S3 Conditional Writes https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/ Conditional writes can ensure there is no existing object with the same key name in your bucket during PUT operations
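
On the wire, a conditional write is a PUT carrying the `If-None-Match: *` header; if an object with that key already exists, S3 is documented to reject the request with 412 Precondition Failed. The sketch below builds such a request with the JDK HTTP client. The object URL is a hypothetical placeholder and the request is unsigned, so this shows the shape of the call rather than a working upload.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ConditionalPutSketch {

    // Builds a PUT that should only succeed when no object exists yet under this key.
    static HttpRequest putIfAbsent(URI objectUri, String body) {
        return HttpRequest.newBuilder(objectUri)
                .header("If-None-Match", "*")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical object URL, used only for illustration.
        URI uri = URI.create("https://example-bucket.s3.us-west-2.amazonaws.com/locks/leader");
        HttpRequest request = putIfAbsent(uri, "node-1");
        System.out.println(request.method() + " " + request.headers().map());
    }
}
```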

Slide 53

Slide 53 text

S3 Storage Lens https://aws.amazon.com/s3/storage-lens/ “S3 Storage Lens delivers organization-wide visibility into object storage usage, activity trends, and makes actionable recommendations to optimize costs”

Slide 54

Slide 54 text

S3 bucket permissions https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketLifecycleConfiguration.html By default, all Amazon S3 resources are private, including buckets, objects, and related subresources

Slide 55

Slide 55 text

S3 performance considerations?

Slide 56

Slide 56 text

S3 performance https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html “your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned Amazon S3 prefix”

Slide 57

Slide 57 text

S3 performance https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html “There are no limits to the number of prefixes in a bucket. You can increase your read or write performance by using parallelization”
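
Because the quoted request rates apply per prefix, spreading keys over more prefixes raises the aggregate rate: 10 prefixes give at least 10 x 5,500 = 55,000 reads per second. The sketch below does that arithmetic and shows one hypothetical way to spread keys across prefixes by hashing. The `shard-N/` naming scheme is an assumption for illustration, not something S3 requires.

```java
final class PrefixThroughputSketch {

    // Per-prefix baseline rates quoted in the S3 performance guide.
    static final int GETS_PER_PREFIX = 5_500;
    static final int PUTS_PER_PREFIX = 3_500;

    static long readsPerSecond(int prefixes) {
        return (long) prefixes * GETS_PER_PREFIX;
    }

    static long writesPerSecond(int prefixes) {
        return (long) prefixes * PUTS_PER_PREFIX;
    }

    // Hypothetical key layout: pick one of N prefixes from the key's hash code.
    static String shardedKey(String key, int prefixes) {
        int shard = Math.floorMod(key.hashCode(), prefixes);
        return "shard-" + shard + "/" + key;
    }

    public static void main(String[] args) {
        System.out.println(readsPerSecond(10));
        System.out.println(shardedKey("report.csv", 10));
    }
}
```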

Slide 58

Slide 58 text

S3 performance https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html “While Amazon S3 is scaling to your new higher request rate, you may see some 503 (Slow Down) errors. These errors will dissipate when the scaling is complete.”
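
A common way to ride out temporary 503 (Slow Down) responses is to retry with exponential backoff. The sketch below shows the idea in plain Java; the `request` callback is a hypothetical stand-in that performs one call and returns its HTTP status code. The AWS SDK ships its own retry handling, so hand-written logic like this is mainly relevant when calling the REST API directly, and production code would usually add random jitter to the delays.

```java
import java.util.function.IntSupplier;

final class SlowDownRetrySketch {

    // Exponential backoff: baseMillis * 2^attempt, never more than capMillis.
    static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // bound the shift to avoid overflow
        return Math.min(delay, capMillis);
    }

    // Calls request until it returns something other than 503 or maxAttempts is reached.
    // Returns the last status code seen.
    static int callWithRetry(IntSupplier request, int maxAttempts, long baseMillis) {
        int status = request.getAsInt();
        for (int attempt = 0; status == 503 && attempt + 1 < maxAttempts; attempt++) {
            try {
                Thread.sleep(backoffMillis(attempt, baseMillis, 5_000));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                return status;
            }
            status = request.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in request: fails twice with 503, then succeeds.
        int status = callWithRetry(() -> calls[0]++ < 2 ? 503 : 200, 5, 10);
        System.out.println(status + " after " + calls[0] + " calls");
    }
}
```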

Slide 59

Slide 59 text

how can I migrate from AWS SDK v1 to AWS SDK v2 ?

Slide 60

Slide 60 text

https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/migration-tool.html OpenRewrite recipe

Slide 61

Slide 61 text

https://repo1.maven.org/maven2/software/amazon/awssdk/v2-migration/ OpenRewrite recipe in Maven Central

Slide 62

Slide 62 text

https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/migration-tool.html running the OpenRewrite recipe:
gradle rewriteRun \
  --init-script init.gradle \
  -Drewrite.activeRecipes=software.amazon.awssdk.v2migration.AwsSdkJavaV1ToV2

Slide 63

Slide 63 text

Questions

Slide 64

Slide 64 text

The End