Object storage: An exploration of AWS S3

By Pierre GOUDJO

Let’s explore the different types of storage architectures

Block-level storage, or block storage, is used for structured data and is commonly deployed in Storage Area Network (SAN) systems. (Source: https://stonefly.com/resources/what-is-file-level-storage-vs-block-level-storage)

Block storage splits data into fixed-size blocks (sequences of bytes) and is used for structured workloads. (Source: https://stonefly.com/resources/what-is-file-level-storage-vs-block-level-storage)

File-level storage, or file storage, is used for unstructured data and is commonly deployed in Network Attached Storage (NAS) systems. (Source: https://stonefly.com/resources/what-is-file-level-storage-vs-block-level-storage)

Unlike block storage, file storage organises data in a hierarchy: the data and its metadata are stored as is, in the form of files and folders. (Source: https://stonefly.com/resources/what-is-file-level-storage-vs-block-level-storage)

Object storage is a data storage architecture for large stores of unstructured data. (Source: https://cloud.google.com/learn/what-is-object-storage)

It designates each piece of data as an object, keeps it in a separate storehouse, and bundles it with metadata and a unique identifier for easy access and retrieval. (Source: https://cloud.google.com/learn/what-is-object-storage)

Let’s focus on object storage

Amazon S3 concepts

Buckets
• Containers for objects stored in Amazon S3
• Play a role in access control
• Serve as the unit of aggregation for usage reporting

Objects
• Fundamental entities stored in Amazon S3
• Objects consist of object data and metadata
• The metadata is a set of name-value pairs that describe the object
• Metadata can be default (date last modified, standard HTTP metadata such as Content-Type) or custom

Keys
• Every object in a bucket has exactly one key
• The combination of a bucket, key, and version ID uniquely identifies each object
• E.g., in https://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl, doc is the name of the bucket and 2006-03-01/AmazonS3.wsdl is the key
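The following minimal Python sketch splits the example URL above into its bucket and key. It assumes the legacy global virtual-hosted form https://<bucket>.s3.amazonaws.com/<key> only. Regional hostnames and path-style URLs are deliberately rejected. The helper name split_s3_url is hypothetical.

```python
from urllib.parse import unquote, urlparse

# Assumption: only the legacy global virtual-hosted form is handled.
S3_SUFFIX = ".s3.amazonaws.com"

def split_s3_url(url: str) -> tuple[str, str]:
    """Split https://<bucket>.s3.amazonaws.com/<key> into (bucket, key)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not host.endswith(S3_SUFFIX):
        raise ValueError(f"not a global virtual-hosted S3 URL: {url}")
    bucket = host[: -len(S3_SUFFIX)]
    # The key is the URL path minus its single leading slash, percent-decoded.
    path = parsed.path[1:] if parsed.path.startswith("/") else parsed.path
    return bucket, unquote(path)

print(split_s3_url("https://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl"))
# → ('doc', '2006-03-01/AmazonS3.wsdl')
```

The "/" inside the key is only part of its name. S3 has no real directories, which is why the whole string 2006-03-01/AmazonS3.wsdl is a single key.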

Regions
• You can choose the AWS Region where S3 stores your buckets
• A Region might be chosen to optimise latency, minimise costs, or address regulatory requirements

S3 consistency model
• Strong read-after-write consistency for PUT and DELETE requests on objects
• Strongly consistent read operations on Amazon S3 Select, Amazon S3 Access Control Lists, Amazon S3 Object Tags, and object metadata
• Updates to a single key are atomic

(Diagram: atomic updates, last write wins, read-after-write consistency)
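The single-key semantics above can be modelled with a small in-memory toy. This sketch is not how S3 is implemented. The class ToyBucket and its explicit timestamp parameter are hypothetical. The timestamp stands in for the request timestamp that decides which of two concurrent PUTs to the same key wins.

```python
import threading

class ToyBucket:
    """In-memory model of S3's single-key semantics (not S3 itself)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}  # key -> (timestamp, body)

    def put(self, key, body, timestamp):
        with self._lock:
            current = self._objects.get(key)
            # Last writer wins: keep the write only if it is not older than the stored one.
            # The whole value is swapped at once, so readers never see a partial body.
            if current is None or timestamp >= current[0]:
                self._objects[key] = (timestamp, body)

    def get(self, key):
        with self._lock:
            # Read-after-write: a GET sees the latest accepted PUT.
            return self._objects[key][1]

bucket = ToyBucket()
bucket.put("report.csv", b"v1", timestamp=1)
bucket.put("report.csv", b"v2", timestamp=3)
bucket.put("report.csv", b"v0", timestamp=2)  # an older request that arrives late
print(bucket.get("report.csv"))  # → b'v2'
```

The late request stamped 2 does not overwrite the write stamped 3. The latest timestamp wins even though it did not arrive last.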

Storage classes

Data durability vs availability
• Availability refers to system uptime, i.e. the storage system is operational and can deliver data upon request
• Durability, on the other hand, refers to long-term data protection, i.e. the stored data does not suffer from bit rot, degradation, or other corruption

Storage classes
• Each object has a storage class
• You choose a class depending on your use case and your access and performance requirements
• All storage classes offer high durability

99.999999999% Durability
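Here is a back-of-the-envelope reading of eleven 9s. It assumes durability means an annual loss probability of 10^-11 per object, independently for each object. Decimal avoids the rounding error that 1 - 0.99999999999 would have in binary floats. The helper name is hypothetical.

```python
from decimal import Decimal

def expected_annual_losses(object_count: int, nines: int = 11) -> Decimal:
    """Expected number of objects lost per year at a durability of `nines` nines."""
    # 11 nines = 99.999999999%, i.e. a loss probability of 10^-11 per object per year.
    annual_loss_probability = Decimal(10) ** -nines
    return object_count * annual_loss_probability

losses = expected_annual_losses(10_000_000)
print(f"{float(losses)} objects lost per year on average")  # → 0.0001 objects lost per year on average
print(f"i.e. one object every {int(1 / losses)} years")     # → i.e. one object every 10000 years
```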

Frequently Accessed Objects
• S3 Standard: the default storage class
  • 99.99% (four 9s) availability
• Reduced Redundancy: storage class designed for non-critical, reproducible data
  • 99.99% durability
  • 99.99% (four 9s) availability
  • Not recommended, as it is less cost-effective than S3 Standard

Infrequently Accessed Objects
• S3 Standard-IA: for data that is accessed less frequently but requires rapid access when needed
  • Same low latency and high throughput performance as S3 Standard
  • 99.9% (three 9s) availability
• S3 One Zone-IA: Amazon S3 stores the object data in only one Availability Zone, which makes it less expensive than S3 Standard-IA
  • Same low latency and high throughput performance as S3 Standard
  • 99.5% (two and a half 9s) availability
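This sketch translates the availability percentages into the unavailability they imply over a year. It assumes a 365-day year and treats the percentages as design targets rather than guarantees. implied_downtime_minutes is a hypothetical helper.

```python
from decimal import Decimal

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year

def implied_downtime_minutes(availability_percent: str) -> Decimal:
    """Minutes per year during which data may be unavailable at the given availability."""
    unavailable_fraction = (Decimal(100) - Decimal(availability_percent)) / 100
    return unavailable_fraction * MINUTES_PER_YEAR

for name, pct in [("S3 Standard", "99.99"), ("S3 Standard-IA", "99.9"), ("S3 One Zone-IA", "99.5")]:
    print(f"{name}: ~{float(implied_downtime_minutes(pct)):g} min/year")
# S3 Standard: ~52.56 min/year
# S3 Standard-IA: ~525.6 min/year
# S3 One Zone-IA: ~2628 min/year
```

Each 9 removed multiplies the tolerated downtime by ten. That is the trade-off the IA classes make for their lower price.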

Infrequently Accessed Objects: Notes and Recommendations
• S3 Standard-IA and S3 One Zone-IA are suitable for objects larger than 128 KB that you plan to store for at least 30 days
• S3 Standard-IA: use for your primary or only copy of data that can't be re-created
• S3 One Zone-IA: use for data you can re-create if the Availability Zone fails, and for object replicas
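These recommendations can be condensed into a small decision helper. This is a hypothetical sketch that encodes only the rules of thumb above. The inputs (object size, planned retention, whether access is frequent, whether the data can be re-created) are judgements you supply. The return values are the storage class names used by the S3 API.

```python
def suggest_storage_class(size_bytes: int, retention_days: int,
                          accessed_frequently: bool, recreatable: bool) -> str:
    """Hypothetical helper: pick STANDARD, STANDARD_IA or ONEZONE_IA from the slide's rules."""
    min_ia_size = 128 * 1024  # IA classes suit objects larger than 128 KB
    min_ia_days = 30          # ...that you plan to keep for at least 30 days
    if accessed_frequently or size_bytes < min_ia_size or retention_days < min_ia_days:
        return "STANDARD"
    # Re-creatable data (or a replica) can tolerate the loss of a single Availability Zone.
    return "ONEZONE_IA" if recreatable else "STANDARD_IA"

print(suggest_storage_class(5_000_000, 365, accessed_frequently=False, recreatable=False))  # → STANDARD_IA
print(suggest_storage_class(5_000_000, 365, accessed_frequently=False, recreatable=True))   # → ONEZONE_IA
print(suggest_storage_class(10_000, 365, accessed_frequently=False, recreatable=False))     # → STANDARD
```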

Archiving objects
• S3 Glacier: use for archives where portions of the data might need to be retrieved in minutes
  • Low cost
  • Configurable retrieval times, from minutes to hours
• S3 Glacier Deep Archive: use for archiving data that rarely needs to be accessed
  • Lowest cost
  • Retrieval times from 12 to 48 hours

Archiving objects: Notes and Recommendations
• Minimum storage duration:
  • 90 days for S3 Glacier
  • 180 days for S3 Glacier Deep Archive
• Retrieval options, from slowest and cheapest to fastest and most expensive:
  • Bulk
  • Standard
  • Expedited
• You can also upload data directly to S3 Glacier as archives stored in vaults, using the S3 Glacier API
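The minimum storage duration matters when objects are deleted early. Deleting an archived object before the minimum is still billed as if it had been stored for the full duration. The sketch below models only that rule, using the API storage class names GLACIER and DEEP_ARCHIVE. The helper name is hypothetical.

```python
MIN_STORAGE_DAYS = {"GLACIER": 90, "DEEP_ARCHIVE": 180}

def billed_storage_days(storage_class: str, days_stored: int) -> int:
    """Days of storage billed for an archived object deleted after `days_stored` days."""
    # Early deletion is charged up to the class's minimum storage duration.
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])

print(billed_storage_days("GLACIER", 30))        # → 90
print(billed_storage_days("DEEP_ARCHIVE", 200))  # → 200
```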

S3 Intelligent-Tiering
• S3 Intelligent-Tiering optimises storage costs by automatically moving data to the most cost-effective access tier
• It works by monitoring access patterns
• Objects are moved according to the following pattern:
  • After 30 consecutive days without access: move to the Infrequent Access tier
  • After 90 days without access: move to the Archive Access tier (opt-in)
  • After 180 days without access: move to the Deep Archive Access tier (opt-in)
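The tiering rules above amount to a threshold function of the time since last access. This sketch models only the thresholds listed on the slide. It is not AWS's implementation and does not cover any other tiers. The archive_tiers_activated flag is a hypothetical stand-in for opting in to the two archive tiers.

```python
def intelligent_tiering_tier(days_without_access: int, archive_tiers_activated: bool = True) -> str:
    """Access tier for an object, using only the thresholds listed above."""
    if archive_tiers_activated and days_without_access >= 180:
        return "Deep Archive Access"
    if archive_tiers_activated and days_without_access >= 90:
        return "Archive Access"
    if days_without_access >= 30:
        return "Infrequent Access"
    return "Frequent Access"

for days in (10, 45, 120, 200):
    print(days, intelligent_tiering_tier(days))
# 10 Frequent Access
# 45 Infrequent Access
# 120 Archive Access
# 200 Deep Archive Access
```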

Storage class             Durability      Availability   Retrieval time
S3 Standard               99.999999999%   99.99%         Immediate
S3 Standard-IA            99.999999999%   99.9%          Immediate
S3 Intelligent-Tiering    99.999999999%   99.9%          Immediate
S3 One Zone-IA            99.999999999%   99.5%          Immediate
S3 Glacier                99.999999999%   99.99%         Minutes to hours
S3 Glacier Deep Archive   99.999999999%   99.99%         Hours

Storage lifecycle

S3 Lifecycle
• An S3 Lifecycle configuration is a set of rules that define actions S3 applies to a group of objects
• There are two types of actions:
  • Transition actions: define when objects transition from one storage class to another
  • Expiration actions: define when objects expire and should be deleted by S3
• Lifecycle configuration files are XML documents

Transitions between storage classes

Example of lifecycle configuration: objects under the tax/ prefix transition to S3 Glacier 365 days after creation and expire after 3,650 days (GLACIER is the API value for the S3 Glacier storage class).

<LifecycleConfiguration>
  <Rule>
    <ID>Transition and Expiration Rule</ID>
    <Filter>
      <Prefix>tax/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>365</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>3650</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
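This is one way to see what such a configuration does to a given object. The sketch parses the example configuration with the standard library and reports the action that would apply at a given object age. It is a simplified local model, not S3's evaluator. It supports only prefix filters, day-based thresholds and one Transition per rule. lifecycle_action is a hypothetical name.

```python
import xml.etree.ElementTree as ET

LIFECYCLE_XML = """
<LifecycleConfiguration>
  <Rule>
    <ID>Transition and Expiration Rule</ID>
    <Filter><Prefix>tax/</Prefix></Filter>
    <Status>Enabled</Status>
    <Transition><Days>365</Days><StorageClass>GLACIER</StorageClass></Transition>
    <Expiration><Days>3650</Days></Expiration>
  </Rule>
</LifecycleConfiguration>
"""

def lifecycle_action(config_xml: str, key: str, age_days: int) -> str:
    """Action a (simplified) rule set would take for an object of the given age."""
    root = ET.fromstring(config_xml.strip())
    for rule in root.findall("Rule"):
        if rule.findtext("Status") != "Enabled":
            continue  # disabled rules are ignored
        if not key.startswith(rule.findtext("Filter/Prefix", default="")):
            continue  # rule does not match this key
        expiration_days = rule.findtext("Expiration/Days")
        if expiration_days is not None and age_days >= int(expiration_days):
            return "expire"
        transition_days = rule.findtext("Transition/Days")
        if transition_days is not None and age_days >= int(transition_days):
            return "transition to " + rule.findtext("Transition/StorageClass", default="")
    return "keep"

print(lifecycle_action(LIFECYCLE_XML, "tax/2020.pdf", 400))   # → transition to GLACIER
print(lifecycle_action(LIFECYCLE_XML, "tax/2010.pdf", 4000))  # → expire
print(lifecycle_action(LIFECYCLE_XML, "photos/a.jpg", 4000))  # → keep
```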

Amazon S3 Versioning

Amazon S3 versioning
• Versioning in Amazon S3 keeps multiple versions of the same object in the same bucket
• If S3 receives multiple write requests for the same object, it stores every version
• When versioning is enabled:
  • A simple GET request retrieves the current version of the object. To retrieve a specific version, you have to specify its version ID
  • A simple DELETE request does not permanently remove the object: S3 inserts a delete marker, which becomes the current version. To permanently delete a version, you have to specify its version ID
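The GET and DELETE behaviour above can be illustrated with a toy in-memory bucket. This sketch is not the S3 API. The class VersionedBucket and its sequential version IDs are hypothetical, since real version IDs are opaque strings.

```python
import itertools

class VersionedBucket:
    """Toy in-memory model of S3 versioning semantics (not the S3 API)."""

    DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}             # key -> list of (version_id, body), oldest first
        self._ids = itertools.count(1)  # hypothetical sequential version IDs

    def put(self, key, body):
        version_id = str(next(self._ids))
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # Simple GET: the current (latest) version, unless it is a delete marker.
            if not versions or versions[-1][1] is self.DELETE_MARKER:
                raise KeyError(key)
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not self.DELETE_MARKER:
                return body
        raise KeyError((key, version_id))

    def delete(self, key, version_id=None):
        if version_id is None:
            # Simple DELETE: insert a delete marker; older versions are kept.
            return self.put(key, self.DELETE_MARKER)
        # DELETE with a version ID: permanently remove that version.
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return None

bucket = VersionedBucket()
v1 = bucket.put("notes.txt", "first draft")
bucket.put("notes.txt", "second draft")
marker = bucket.delete("notes.txt")   # object now looks deleted...
print(bucket.get("notes.txt", v1))    # → first draft (...but old versions survive)
bucket.delete("notes.txt", marker)    # removing the marker restores the object
print(bucket.get("notes.txt"))        # → second draft
```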

Object locking
• With S3 Object Lock, you can prevent objects from being deleted or overwritten
• Works only in versioned buckets and applies to individual object versions
• There are two types of object locking:
  • Retention period: specifies a fixed amount of time during which the object remains locked
  • Legal hold: provides the same protection as a retention period but has no expiration date. Legal holds remain in place until explicitly removed
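The two lock types combine in a simple way: a version is protected if either applies. This sketch checks whether a locked object version may be deleted at a given moment. It is a simplification that ignores details such as retention modes. can_delete_version is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional

def can_delete_version(now: datetime, retain_until: Optional[datetime], legal_hold: bool) -> bool:
    """May this object version be deleted at `now`? (simplified model)"""
    if legal_hold:
        return False  # a legal hold never expires; it must be removed explicitly
    if retain_until is not None and now < retain_until:
        return False  # still inside the retention period
    return True

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(can_delete_version(now, datetime(2025, 1, 1, tzinfo=timezone.utc), legal_hold=False))  # → False
print(can_delete_version(now, datetime(2023, 1, 1, tzinfo=timezone.utc), legal_hold=False))  # → True
print(can_delete_version(now, None, legal_hold=True))                                        # → False
```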

Security

Data encryption
• Amazon S3 uses SSL/TLS to protect data in transit
• Amazon S3 offers the following options for protecting data at rest:
  • Server-side encryption: request S3 to encrypt your objects before saving them on disks in its data centres
  • Client-side encryption: encrypt data client-side and upload the encrypted data to Amazon S3

Data encryption: Server-side encryption
• Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3): each object is encrypted with a unique AES-256 key, which is itself encrypted with a master key that S3 manages and rotates
• Server-Side Encryption with Customer Master Keys (CMKs) Stored in AWS Key Management Service (SSE-KMS): uses a customer managed master key in AWS KMS. It provides an audit trail of when the key was used and by whom
• Server-Side Encryption with Customer-Provided Keys (SSE-C): S3 encrypts the object with an AES-256 key that the customer provides with the upload request. When retrieving the object, the same key must be provided so that Amazon S3 can decrypt and return the object data

Customer master key vs Data key

Data encryption: Client-side encryption
• The client encrypts the data before sending it to S3
• It can be done by:
  • Using a customer master key (CMK) stored in AWS KMS
  • Using a master key that you store in your application
• When uploading an object, the client obtains a one-time symmetric data key, either from KMS using the CMK ID or generated locally and protected with your own master key. The plaintext data key encrypts the object data, and an encrypted copy of the data key is stored in the object metadata
• When downloading an object, the encrypted data key is retrieved from the object metadata, decrypted with the master key, and used to decrypt the object data
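The upload and download steps above follow the envelope-encryption pattern: a one-time data key encrypts the object, and only an encrypted copy of that key travels with the object. The sketch shows that structure with the standard library only. It uses a TOY stream cipher (HMAC-SHA256 in counter mode, no authentication) purely for illustration. Real clients use authenticated AES, and this code must not be used to protect data. The metadata field names and function names are hypothetical. Values are kept as bytes for simplicity, whereas real S3 metadata is text.

```python
import hashlib
import hmac
import secrets

def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY cipher: XOR with an HMAC-SHA256 counter-mode keystream (XOR is its own inverse)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def encrypt_for_upload(master_key: bytes, plaintext: bytes) -> tuple[bytes, dict]:
    data_key = secrets.token_bytes(32)  # one-time data key, never stored in plaintext
    data_nonce, key_nonce = secrets.token_bytes(16), secrets.token_bytes(16)
    ciphertext = _keystream_xor(data_key, data_nonce, plaintext)
    # Only the *encrypted* data key is stored alongside the object, in its metadata.
    metadata = {
        "encrypted-data-key": _keystream_xor(master_key, key_nonce, data_key),
        "key-nonce": key_nonce,
        "data-nonce": data_nonce,
    }
    return ciphertext, metadata

def decrypt_after_download(master_key: bytes, ciphertext: bytes, metadata: dict) -> bytes:
    # Unwrap the data key with the master key, then decrypt the object data.
    data_key = _keystream_xor(master_key, metadata["key-nonce"], metadata["encrypted-data-key"])
    return _keystream_xor(data_key, metadata["data-nonce"], ciphertext)

master_key = secrets.token_bytes(32)  # stands in for a CMK or an application-held master key
body, meta = encrypt_for_upload(master_key, b"quarterly figures")
print(decrypt_after_download(master_key, body, meta))  # → b'quarterly figures'
```

The master key never touches the object data. Rotating it only requires re-wrapping the small data keys, not re-encrypting every object.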

Identity and Access Management
• By default, all Amazon S3 resources are private: only the resource owner can access them
• You can create and configure bucket policies to grant permissions on your Amazon S3 resources. Bucket policies use a JSON-based access policy language
• You can use ACLs to grant basic read/write permissions to other AWS accounts

Bucket Policies

Blocking Public Access

Additional Features

Replication
• Replication enables automatic, asynchronous copying of objects across Amazon S3 buckets
• Objects may be replicated to a single destination bucket or to multiple destination buckets
• The original object metadata is replicated along with the objects
• Note: objects created with SSE-C encryption keys are not replicated

Amazon S3 Select
• With Amazon S3 Select, you can use SQL statements to filter the contents of an Amazon S3 object (on S3 or Glacier) and retrieve just the subset of data that you need
• Works on objects stored in CSV, JSON, or Apache Parquet format. It also works on objects compressed with GZIP or BZIP2 and on server-side encrypted objects
• The following standard clauses are supported:
  • SELECT list
  • FROM clause
  • WHERE clause
  • LIMIT clause (Amazon S3 Select only)
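The sketch below makes the idea concrete. It pairs an S3 Select-style expression with a local Python equivalent of the filtering S3 would do server-side. The object content, the query and the helper name are hypothetical. The query assumes the CSV header row is used for column names. Only the matching subset would travel over the network; here it is computed locally for illustration.

```python
import csv
import io

# Hypothetical CSV object content and S3 Select-style query.
OBJECT_BODY = "name,city,age\nAda,London,36\nLinus,Helsinki,28\nGrace,New York,45\n"
EXPRESSION = "SELECT s.name FROM S3Object s WHERE CAST(s.age AS INT) > 30"

def local_equivalent(body: str, min_age: int) -> list[str]:
    """Local stand-in for what EXPRESSION asks S3 to do server-side."""
    rows = csv.DictReader(io.StringIO(body))  # header row supplies the column names
    return [row["name"] for row in rows if int(row["age"]) > min_age]

print(local_equivalent(OBJECT_BODY, 30))  # → ['Ada', 'Grace']
```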

Batch Operations
• S3 Batch Operations performs a single operation on a list of Amazon S3 objects that you specify
• Can be used to copy objects, set tags or access control lists, or invoke an AWS Lambda function to perform custom actions on your objects
• S3 Batch Operations terminology:
  • Job: the basic unit of work for S3 Batch Operations
  • Operation: the type of API action (copying objects, invoking Lambda) that you want the job to run
  • Task: the unit of execution for the job. A task represents a single call to an Amazon S3 or AWS Lambda API
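The job/operation/task terminology maps onto a few small types. In this toy model a job applies one operation to every object listed in a manifest and records one task per object. The classes are hypothetical and do not mirror the S3 Batch Operations API. The operation callable stands in for a single S3 or Lambda API call.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    bucket: str
    key: str
    succeeded: bool = False

@dataclass
class Job:
    """Toy model: one Operation applied to every object in the manifest, one Task per object."""
    operation: Callable[[str, str], None]
    manifest: list[tuple[str, str]]
    tasks: list[Task] = field(default_factory=list)

    def run(self) -> None:
        for bucket, key in self.manifest:
            task = Task(bucket, key)
            try:
                self.operation(bucket, key)  # stands in for one S3 or Lambda API call
                task.succeeded = True
            except Exception:
                task.succeeded = False       # in this toy, a failed task is recorded and the job continues
            self.tasks.append(task)

tagged = []
job = Job(operation=lambda bucket, key: tagged.append(key),
          manifest=[("example-bucket", "a.csv"), ("example-bucket", "b.csv")])
job.run()
print(tagged, [t.succeeded for t in job.tasks])  # → ['a.csv', 'b.csv'] [True, True]
```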

Static Web Hosting
• You can use Amazon S3 to host a static website (static HTML, single-page web app)
• The website endpoint supports only HTTP; to serve HTTPS, you can use Amazon CloudFront
• For your customers to access content at the website endpoint, you must make all your content publicly readable
• You can optionally enable Amazon S3 server access logging
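A common way to make content publicly readable is a bucket policy that lets anyone call s3:GetObject on every object in the bucket. The sketch builds such a policy document as JSON. The bucket name is hypothetical. The bucket's Block Public Access settings must also allow public bucket policies for it to take effect.

```python
import json

def public_read_policy(bucket_name: str) -> str:
    """Bucket policy granting anonymous read access to every object in the bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",                               # anyone, including anonymous users
                "Action": "s3:GetObject",                       # read objects only
                "Resource": f"arn:aws:s3:::{bucket_name}/*",    # every object in the bucket
            }
        ],
    }
    return json.dumps(policy, indent=2)

print(public_read_policy("example-bucket"))  # hypothetical bucket name
```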

S3 Transfer Acceleration
• S3 Transfer Acceleration enables fast, easy, and secure transfer of files over long distances
• S3 Transfer Acceleration uses the globally distributed edge locations of Amazon CloudFront and routes data to S3 over an optimised network path
• Endpoints: bucketname.s3-accelerate.amazonaws.com (IPv4) and bucketname.s3-accelerate.dualstack.amazonaws.com (IPv4 and IPv6)
• The S3 Transfer Acceleration Speed Comparison tool can be used to benchmark accelerated vs non-accelerated S3 uploads across AWS Regions
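The two endpoint forms above can be derived from the bucket name. Transfer Acceleration requires a DNS-compliant bucket name without periods. The sketch applies a simplified version of that check (3 to 63 lowercase letters, digits or hyphens) before building the hostname. The function name is hypothetical.

```python
import re

# Simplified check: 3-63 chars, lowercase letters, digits and hyphens, no periods.
_ACCEL_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")

def accelerate_endpoint(bucket: str, dualstack: bool = False) -> str:
    """Transfer Acceleration hostname for a bucket (IPv4-only or dual-stack)."""
    if not _ACCEL_NAME.fullmatch(bucket):
        raise ValueError(f"bucket name not usable with Transfer Acceleration: {bucket!r}")
    suffix = "s3-accelerate.dualstack.amazonaws.com" if dualstack else "s3-accelerate.amazonaws.com"
    return f"{bucket}.{suffix}"

print(accelerate_endpoint("my-bucket"))                  # → my-bucket.s3-accelerate.amazonaws.com
print(accelerate_endpoint("my-bucket", dualstack=True))  # → my-bucket.s3-accelerate.dualstack.amazonaws.com
```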
