Introduction to Redshift - AWS Meetup

This slide deck introduces the concept of Big Data and how Redshift can be an effective data warehousing service. It also introduces the high-level features that Redshift offers.

Kyle Escosia

March 19, 2021

Transcript

  1. What is Big Data?

    When your data sets become so large and diverse that you have to start innovating around how to collect, store, process, analyze, and share them.
  2. What is Dark Data?

    • A type of unstructured, untagged, and untapped data that sits in data repositories and has not been analyzed or processed
    • Similar to big data, but differs in that business and IT administrators mostly neglect its value
    • Also known as dusty data
  3. Amazon Redshift

    • Faster performance
    • Easy to set up, deploy, and manage
    • Cost-effective
    • Scale quickly to meet your needs
    • Query your data lake
    • Secure
  4. Amazon Redshift Architecture

    Massively parallel, shared-nothing columnar architecture
    • Leader node: SQL endpoint; stores metadata; coordinates parallel SQL processing
    • Compute nodes: local, columnar storage; execute queries in parallel; load, unload, backup, restore
    • Amazon Redshift Spectrum nodes: execute queries directly against Amazon Simple Storage Service (Amazon S3)
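The leader/compute split above can be illustrated with a small, self-contained Python sketch. It is not how Redshift is implemented; it only models the idea under simplifying assumptions: rows are spread round-robin across a hypothetical number of compute nodes (similar in spirit to Redshift's EVEN distribution style), each node aggregates only its own local slice (shared-nothing), and a leader step merges the partial results. The data and the names `distribute`, `compute_node_partial`, and `leader_query` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows: (region, amount)
ROWS = [
    ("north", 120), ("south", 80), ("east", 50),
    ("north", 30), ("west", 200), ("south", 20),
]


def distribute(rows, num_nodes):
    """Spread rows round-robin across compute nodes.
    Each node only ever sees its own slice; nothing is shared between nodes."""
    slices = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        slices[i % num_nodes].append(row)
    return slices


def compute_node_partial(local_rows):
    """A compute node aggregates only its local data (partial SUM per region)."""
    partial = defaultdict(int)
    for region, amount in local_rows:
        partial[region] += amount
    return dict(partial)


def leader_query(rows, num_nodes=3):
    """Leader: hand out slices, run node-level work concurrently, merge the partials."""
    slices = distribute(rows, num_nodes)
    # Threads stand in for independent compute nodes in this sketch.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(compute_node_partial, slices))
    result = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial.items():
            result[region] += subtotal
    return dict(result)


print(sorted(leader_query(ROWS).items()))
# [('east', 50), ('north', 150), ('south', 100), ('west', 200)]
```

Because a region's rows can land on several nodes, the leader's merge step is what turns per-node partial sums into the final answer.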
  5. Faster Performance

    ✓ Massively parallel
      • Delivers fast query performance on datasets ranging in size from gigabytes to exabytes
      • Uses columnar storage, data compression, and zone maps to reduce the amount of I/O needed to perform queries
    ✓ Machine learning
      • Uses machine learning to deliver high throughput, irrespective of your workloads or concurrent usage
      • Predicts incoming query run times and assigns queries to the optimal queue for the fastest processing
    ✓ Result caching
      • Uses result caching to deliver sub-second response times for repeat queries
      • Dashboard, visualization, and business intelligence tools that execute repeat queries get a significant performance boost
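To make the zone-map idea concrete, here is a minimal Python sketch, not Redshift's actual storage format: one column is split into fixed-size blocks, the minimum and maximum of each block are recorded, and a range filter reads only the blocks whose [min, max] range can overlap the predicate. The block size of 100 values and the names `Block`, `build_column_blocks`, and `scan_between` are hypothetical. The skipping works best when the column is sorted, which is why sort keys and zone maps are usually discussed together.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A fixed-size chunk of one column plus its zone map (min/max)."""
    values: list
    min_value: int
    max_value: int


def build_column_blocks(values, block_size):
    """Split one column into blocks and record the min/max of each block."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_between(blocks, low, high):
    """Return the matching values and the number of blocks actually read.
    Blocks whose [min, max] range cannot overlap [low, high] are skipped."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # zone map proves no row here can match: skip the read
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


# Hypothetical sorted column, e.g. order dates stored as day numbers 0..999
order_day = list(range(1000))
blocks = build_column_blocks(order_day, block_size=100)
matches, blocks_read = scan_between(blocks, 250, 349)
print(len(matches), blocks_read, len(blocks))  # 100 2 10
```

Here only 2 of the 10 blocks are read to find the 100 matching values; on unsorted data the min/max ranges overlap far more, so fewer blocks can be skipped.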
  6. Easy to set up, deploy, and manage

    ✓ Automated provisioning
      • Deploy a new data warehouse with just a few clicks in the AWS console, and Redshift automatically provisions the infrastructure for you
      • Focus on your data, not the administration
    ✓ Automated backups
      • Automatically and continuously backs up your data to Amazon S3
      • Redshift can asynchronously replicate your snapshots to S3 in another Region for disaster recovery
    ✓ Fault tolerant
      • Continuously monitors the health of the cluster
    ✓ Flexible querying
      • Run queries within the console, or connect the SQL client tools, libraries, or business intelligence tools you already use
  7. Cost-effective

    ✓ No upfront costs, pay as you go
      • AWS positions Amazon Redshift as the most cost-effective data warehouse, and you pay only for the resources you provision
    ✓ Predictable cost
      • Lets customers scale with minimal cost impact, as each cluster earns up to one hour of free Concurrency Scaling credits per day
    ✓ Choose your node type
      • Dense Compute (DC) nodes let you create very high-performance data warehouses using fast CPUs, large amounts of RAM, and solid-state disks (SSDs)
      • Dense Storage (DS) nodes use larger hard disk drives (HDDs) for a very low price point
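The free-credit rule under "Predictable cost" is easier to reason about with a little arithmetic. The sketch below simulates a balance that earns one hour of credit per day and spends it on Concurrency Scaling usage; any usage the balance cannot cover counts as billable. Two details are assumptions, not facts from the slide: the 30-hour cap on accumulated credits (our reading of AWS's published terms) and the simplification that each day's credit is earned before that day's usage. Check the current Amazon Redshift pricing page before relying on either. The function name is hypothetical.

```python
def concurrency_scaling_billable_hours(daily_usage_hours,
                                       credit_per_day=1.0,
                                       max_balance=30.0):
    """Return (billable hours, final credit balance) for a list of daily
    Concurrency Scaling usage.

    Assumptions (hypothetical model): one credit hour accrues per day,
    the balance is capped at max_balance, and the day's credit is
    available before that day's usage is charged against it."""
    balance = 0.0
    billable = 0.0
    for used in daily_usage_hours:
        balance = min(balance + credit_per_day, max_balance)  # accrue today's credit
        covered = min(used, balance)  # spend as much credit as possible
        balance -= covered
        billable += used - covered  # the remainder is paid usage
    return billable, balance


# Ten quiet days, then a 15-hour burst of concurrent queries
print(concurrency_scaling_billable_hours([0] * 10 + [15]))  # (4.0, 0.0)
```

With ten quiet days banked, eleven hours of the burst are covered by credits and four hours are billable; a cluster that uses two hours every day never builds a balance and pays for one hour daily.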
  8. Scale quickly to meet your needs

    ✓ Petabyte-scale data warehousing
      • Scales simply and quickly as your needs change
    ✓ Exabyte-scale data lake analytics
      • Redshift Spectrum, a feature of Redshift, lets you run queries against exabytes of data in Amazon S3 without having to load or transform any data
      • You can use S3 as a highly available, secure, and cost-effective data lake to store unlimited data in open data formats
    ✓ Limitless concurrency
      • Automatically adds transient capacity as concurrency increases
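Partitioning is one common way to keep Spectrum queries over S3 cheap, since Spectrum is billed by the amount of data scanned. The pure-Python sketch below does not call S3 or Redshift; it only illustrates partition pruning over Hive-style object keys (`name=value` path segments): when a query filters on partition columns, objects in non-matching partitions never need to be read. The bucket layout and the helpers `parse_partition` and `prune` are hypothetical.

```python
def parse_partition(key):
    """Extract Hive-style partition values (name=value path segments)
    from an object key; the final segment is the file name and is ignored."""
    parts = {}
    for segment in key.split("/")[:-1]:
        if "=" in segment:
            name, value = segment.split("=", 1)
            parts[name] = value
    return parts


def prune(keys, **predicates):
    """Keep only objects whose partition values satisfy every equality predicate."""
    return [
        key for key in keys
        if all(parse_partition(key).get(name) == value
               for name, value in predicates.items())
    ]


# Hypothetical object keys under an S3 prefix for a partitioned sales table
keys = [
    "sales/year=2021/month=02/part-0000.parquet",
    "sales/year=2021/month=03/part-0000.parquet",
    "sales/year=2021/month=03/part-0001.parquet",
    "sales/year=2020/month=03/part-0000.parquet",
]
print(prune(keys, year="2021", month="03"))
```

A filter on `year` and `month` narrows four objects to the two files under `year=2021/month=03`; a filter on a non-partition column could not prune anything this way and would have to scan every object.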
  9. Query your data lake

    ✓ Amazon S3 data lake
      • Amazon Redshift extends your queries to your Amazon S3 data lake without loading data; AWS describes it as the only data warehouse that does this
    ✓ AWS analytics ecosystem
      • AWS Glue can extract, transform, and load (ETL) data into Redshift
      • Amazon Kinesis Data Firehose captures, transforms, and loads streaming data into Redshift for near real-time analytics
      • Amazon QuickSight lets you create reports, visualizations, and dashboards
      • To accelerate your migration to Amazon Redshift, you can use the AWS Database Migration Service (DMS)
  10. Secure

    ✓ End-to-end encryption
      • With just a couple of parameter settings, you can set up Amazon Redshift to use SSL to secure data in transit and hardware-accelerated AES-256 encryption for data at rest
    ✓ Network isolation
      • Configure firewall rules to control network access to your data warehouse cluster
      • Isolate your data warehouse cluster in your own virtual network and connect it to your existing IT infrastructure using an industry-standard encrypted IPsec VPN
    ✓ Audit and compliance
      • Integrates with AWS CloudTrail so you can audit all Redshift API calls
      • Logs all SQL operations, including connection attempts, queries, and changes to your database