Slide 1

An Introduction to Amazon S3, RDS, and EMR
Weijia Song
11/14/2016

Slide 2

Amazon Simple Storage Service (S3)
• An easy-to-use, scalable, reliable, and secure cloud storage service
• Scalable: no limit on total storage (a single object can be up to 5 TB)
• Reliable: designed for 99.99% availability and 99.999999999% ("11 nines") durability
• Secure: SSL/TLS transfer, data encryption, and access control
• Easy to use: web console, REST API, and SDKs
• Price: about 3 cents per GB-month for standard storage; much cheaper with Standard-IA (Infrequent Access) or Glacier
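The availability and durability percentages above are easier to reason about as concrete quantities. The Python sketch below converts them, assuming that durability is a per-object annual survival probability and that storage costs the slide's approximate 3 cents per GB-month (request and data-transfer charges ignored). The helper names are hypothetical.

```python
# Back-of-envelope numbers implied by S3's design targets (assumptions noted per helper).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def max_downtime_minutes(availability: float) -> float:
    """Yearly downtime still consistent with the given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def expected_annual_object_loss(num_objects: int, durability: float) -> float:
    """Expected objects lost per year, treating durability as a per-object
    annual survival probability (an assumption for this sketch)."""
    return num_objects * (1.0 - durability)


def monthly_storage_cost(gigabytes: float, price_per_gb_month: float = 0.03) -> float:
    """Storage-only cost in dollars, using the slide's approximate 2016 price."""
    return gigabytes * price_per_gb_month
```

Under these assumptions, 99.99% availability leaves room for roughly 52.6 minutes of downtime a year, and 10 million objects at 11 nines durability imply an expected loss of about one object every 10,000 years.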

Slide 3

Use cases of Amazon S3
• File backup storage
• Sharing and collaboration (hosting static web pages, Git repositories)
• Hosting data for applications
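Hosting static web pages from S3 requires that visitors can read the site's objects, which is commonly granted with a bucket policy. The Python sketch below builds such a policy in AWS's IAM policy JSON format; the bucket name `my-example-site` is hypothetical, and the policy makes every object in the bucket publicly readable, so it suits only buckets that hold public content.

```python
import json


def public_read_policy(bucket_name: str) -> str:
    """Return a bucket policy (JSON text) allowing anyone to download any object."""
    policy = {
        "Version": "2012-10-17",  # IAM policy language version
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",          # anyone, including anonymous web visitors
                "Action": "s3:GetObject",  # read-only access: download objects
                "Resource": f"arn:aws:s3:::{bucket_name}/*",  # all objects in the bucket
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(public_read_policy("my-example-site"))
```

The resulting JSON can be pasted into the bucket policy editor in the S3 console or passed as the `Policy` argument of boto3's `put_bucket_policy`. Static website hosting itself is enabled separately in the bucket's properties, and in current accounts the S3 Block Public Access settings must also allow public policies.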

Slide 4

Amazon Relational Database Service (RDS)
• An easy-to-use relational database in the cloud
• Six engines available: Amazon Aurora, MySQL, MariaDB, Oracle, PostgreSQL, and SQL Server
• High availability: backups, Multi-AZ deployments, read replicas, snapshot transfer…
• Scalability: vertical scaling, data sharding, and clustering
• Security: SSL, data encryption
• Pricing covers instances, storage and I/O, and data transfer
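Data sharding with RDS is typically handled by the application: rows are spread across several database instances, and each query is routed to the instance that owns its key. The Python sketch below shows one common scheme, hash-then-modulo routing; the shard names are hypothetical stand-ins for RDS instance endpoints.

```python
import hashlib
from typing import List

# Hypothetical shard names; in practice each would be the endpoint of a separate RDS instance.
SHARD_ENDPOINTS: List[str] = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_index(key: str, num_shards: int) -> int:
    """Map a key to a shard number with a stable hash.

    hashlib is used instead of the built-in hash(), whose output for
    strings changes between Python processes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def endpoint_for(key: str) -> str:
    """Pick the database endpoint that stores rows for this key."""
    return SHARD_ENDPOINTS[shard_index(key, len(SHARD_ENDPOINTS))]


print(endpoint_for("user:42"))
```

Modulo placement is simple, but changing the number of shards remaps most keys; consistent hashing or a lookup directory reduces how much data must move.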

Slide 5

Amazon RDS Demo
• Creating a database service in the cloud
• Manipulating data using a DBMS client
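Once the instance is running, a client connects to its endpoint and issues ordinary SQL. So that the sketch below runs locally, Python's built-in `sqlite3` with an in-memory database stands in for the RDS connection; the `students` table and its rows are hypothetical. Against an RDS MySQL instance you would instead connect with a MySQL driver using the instance endpoint, port, user name, and password.

```python
import sqlite3


def run_demo(conn: sqlite3.Connection) -> list:
    """Create a table, then insert, update, delete, and query rows."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, major TEXT)"
    )
    cur.executemany(
        "INSERT INTO students (name, major) VALUES (?, ?)",
        [("Alice", "CS"), ("Bob", "Math"), ("Carol", "CS")],
    )
    cur.execute("UPDATE students SET major = ? WHERE name = ?", ("Physics", "Bob"))
    cur.execute("DELETE FROM students WHERE name = ?", ("Carol",))
    conn.commit()  # make the changes durable before reading them back
    return cur.execute("SELECT name, major FROM students ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for a connection to the RDS endpoint
rows = run_demo(conn)
print(rows)  # [('Alice', 'CS'), ('Bob', 'Physics')]
conn.close()
```

The SQL itself is largely portable across engines, but the parameter placeholder differs by driver: `sqlite3` uses `?`, while MySQL drivers such as PyMySQL use `%s`.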

Slide 6

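Amazon Elastic MapReduce (EMR)
• MapReduce is a programming model and framework for distributed applications.
• It processes vast amounts of data (terabytes) in parallel on large clusters.
• It is reliable and fault-tolerant.
• [Figure: MapReduce data flow. Input Data is split across three Map() tasks, which emit intermediate key/value pairs [K1,V1], [K2,V2], [K3,V3]; two Reduce() tasks combine them into Output Data.]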
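The data flow in the figure can be simulated in a single process with word count, the classic MapReduce example. The Python sketch below runs map, shuffle (grouping by key), and reduce in sequence; a real framework runs the map and reduce tasks in parallel on different nodes and partitions keys among reducers. The function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Map(): emit a (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between Map and Reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce(): combine all values collected for one key."""
    return key, sum(values)


def map_reduce(splits: List[str]) -> Dict[str, int]:
    intermediate = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


counts = map_reduce(["the cat sat", "the dog sat", "the end"])
print(counts["the"], counts["sat"])  # 3 2
```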

Slide 7

Amazon Elastic MapReduce (EMR)
• Amazon EMR
  • Easy deployment and use of Hadoop clusters
  • Hadoop-based tools: Hive, Pig, Hue, HBase, etc.
  • Spark, Mahout, etc.
• Demo
  • Create a Hadoop cluster.
  • Run a "wordcount" application.
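One way to run a word count on an EMR Hadoop cluster is Hadoop Streaming, where the mapper and reducer are ordinary programs that exchange tab-separated `key<TAB>value` text lines and Hadoop sorts the mapper output by key before the reducer sees it. The Python sketch below assumes that protocol; it is not necessarily the program used in the demo, and the pipeline at the bottom only imitates Hadoop's sort step locally.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, the text format Hadoop Streaming passes along."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word.

    groupby only merges adjacent lines, so this relies on the input being
    sorted by key, which Hadoop guarantees between the map and reduce stages.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local imitation of: input | mapper | sort | reducer
mapped = sorted(mapper(["the cat sat", "the dog sat"]))
result = list(reducer(mapped))
print(result)  # ['cat\t1', 'dog\t1', 'sat\t2', 'the\t2']
```

To submit this as a streaming step, the two functions would be wrapped in small scripts that read `sys.stdin` and print each output line, then uploaded (for example to S3) where the cluster can fetch them.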