
From Analytics to Intelligence: Amazon Redshift

Matt Wood
January 31, 2013

An introduction to Amazon Redshift, presented at MicroStrategy World 2013.



Transcript

  1. from ANALYTICS
    to INTELLIGENCE
    a presentation at
    MICROSTRATEGY WORLD 2013
    by
    DR MATT WOOD


  2. Hello.


  3. Thank you.


  4. I. Data, data everywhere


  5. I. Data, data everywhere
    II. Collection & storage


  6. I. Data, data everywhere
    II. Collection & storage
    III. Data security


  7. I. Data, data everywhere
    II. Collection & storage
    III. Data security
    IV. Data movement


  8. 0. Amazon Web Services
    I. Data, data everywhere
    II. Collection & storage
    III. Data security
    IV. Data movement


  9. Building blocks.


  10. Compute, storage & databases.


  11. Retail
    Merchant services
    Web services


  12. Blinding flash of the obvious.


  13. Available.


  14. Low cost.


  15. Flexible.


  16. Every day, AWS adds enough server
    capacity to power amazon.com in 2003,
    when it was a $5B enterprise


  17. I. Data, data everywhere


  18. Data for competitive advantage.


  19. Customer segmentation,
    financial modeling,
    system analysis,
    line of sight,
    business intelligence...


  20. Generation
    Collection & storage
    Analytics & computation
    Collaboration & sharing


  21. Cost of data generation is falling.


  22. Kindle Fire HD, Kindle Fire, Kindle
    Paperwhite and Kindle have held the top four
    spots on the Amazon worldwide best seller
    chart since launch.
    devices


  23. Amazon Appstore selection tripled in 2012.
    apps and games


  24. Amazon customers purchased more than
    one toy per second on mobile devices.
    commerce


  25. most gifted
    kindle book


  26. Generation
    Collection & storage
    Analytics & computation
    Collaboration & sharing
    lower cost,
    increased throughput


  27. Generation
    Collection & storage
    Analytics & computation
    Collaboration & sharing
    highly
    constrained


  28. Gap.


  29. The Data Analysis Gap
    [Chart: data volume over time, 1990 to 2020. Labels: Enterprise Data, Data in Warehouse, Generated data, Available for analysis]
    Sources:
    Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
    IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares


  30. Enter AWS.


  31. Utility.


  32. Remove constraints.


  33. Generation
    Collection & storage
    Analytics & computation
    Collaboration & sharing
    highly
    constrained


  34. Generation
    Collection & storage
    Analytics & computation
    Collaboration & sharing


  35. Full value.


  36. Close the gap.


  37. Reduced time to market.


  38. Identify and meet new business
    opportunities.


  39. Lower costs.


  40. II. Collection & Storage


  41. One schema to rule them all.


  42. One schema to rule them all.


  43. Lots of data.
    Lots of users.
    Lots of uses.
    Lots of locations.


  44. Cost.


  45. Multipliers.


  46. Object storage.


  47. 99.999999999%
    durability

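As a rough worked example of what eleven nines implies, the sketch below treats 99.999999999% as a per-object, per-year durability. That period is an assumption taken from how Amazon S3 documents the figure; the slide does not state it. The store size of 10,000 objects is a hypothetical input.

```python
from fractions import Fraction

# Assumption: the durability figure applies per object, per year.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)  # 99.999999999%


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the stated durability."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    objects = 10_000  # hypothetical store size
    print(float(expected_annual_losses(objects)))  # expected losses per year
    print(int(years_per_lost_object(objects)))     # years per lost object
```

Exact fractions are used so that the tiny loss probability is not distorted by floating-point rounding.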

  48. Relational databases.


  49. NoSQL data stores.


  50. HDFS based stores.


  51. Undifferentiated heavy lifting.


  52. Lower costs. Ease of use.


  53. Lower costs. Ease of use.
    Lower costs.
    no capital investment
    pay as you go
    no subscriptions
    only pay for what you use


  54. Lower costs. Ease of use.
    Ease of use.
    programmable
    zero admin
    easy to configure
    integrate with existing tools


  55. Data warehousing.


  56. Expensive. Complicated.


  57. Enterprises average between
    3 and 4 DBAs per data
    warehouse.
    Source: Gartner. Critical factors in calculating the data warehouse TCO, July 2009


  58. Source: Oracle technology global price list 11/1/2012


  59. Expensive. Complicated.


  60. Unobtainable.


  61. Amazon Redshift.


  62. Fast. Powerful. Petabyte scale.


  63. Managed service.


  64. Automated deployment
    & configuration.


  65. SQL access and BI tool integration.


  66. Parallel execution.


  67. Leader Node


  68. Leader Node
    Compute Node
    Compute Node
    Compute Node


  69. Leader Node
    Compute Node
    Compute Node
    Compute Node

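The leader and compute node layout above suggests a scatter-gather pattern. The following is a toy sketch of that idea only, not Redshift's actual implementation: the round-robin row distribution, the thread pool standing in for compute nodes, and the function names are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_sum(rows):
    """Work done on one compute node: aggregate only the local slice."""
    return sum(rows), len(rows)


def leader_average(values, node_count=3):
    """Leader node: split the data, fan out, then merge partial results."""
    # Round-robin distribution of rows across compute nodes (an assumption).
    slices = [values[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node_sum, slices))
    # Merge step: combine partial sums and counts into the final average.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    print(leader_average(list(range(1, 101))))
```

Each worker returns a partial sum and count rather than a partial average, because averages of unequal slices cannot be merged correctly on their own.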

  70. 10gigE full bisection network.


  71. Leader Node
    Compute Node
    Compute Node
    Compute Node


  72. Common BI Tools
    JDBC/ODBC
    Leader Node
    Compute Node
    Compute Node
    Compute Node

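The JDBC/ODBC path in the diagram above can be illustrated with a connection URL. This is a minimal sketch, assuming a PostgreSQL-compatible JDBC driver and Redshift's default port of 5439. The endpoint and database names are hypothetical placeholders, and the helper function is illustrative, not part of any AWS SDK.

```python
def redshift_jdbc_url(endpoint: str, database: str, port: int = 5439) -> str:
    """Build a PostgreSQL-style JDBC URL for a cluster endpoint.

    Assumption: the BI tool connects through a PostgreSQL-compatible driver.
    """
    return f"jdbc:postgresql://{endpoint}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical endpoint and database name, for illustration only.
    print(redshift_jdbc_url("examplecluster.example.com", "analytics"))
```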

  73. Certified for use with
    MicroStrategy.


  74. Data compression.


  75. Automated backup to S3.


  76. Data encrypted in transit
    & at rest.


  77. Automated failover.


  78. Common BI Tools
    JDBC/ODBC
    Leader Node
    Compute Node
    Compute Node
    Compute Node


  79. Common BI Tools
    JDBC/ODBC
    Leader Node
    Compute Node
    Compute Node
    Compute Node


  80. Common BI Tools
    JDBC/ODBC
    Leader Node
    Compute Node
    Compute Node
    Compute Node


  81. Elastic.


  82. Common BI Tools
    JDBC/ODBC
    Leader Node
    Compute Node
    Compute Node
    Compute Node


  83. Common BI Tools
    JDBC/ODBC
    Leader Node
    Compute Node
    Compute Node
    Compute Node
    Compute Node
    Compute Node


  84. Common BI Tools
    JDBC/ODBC
    Leader Node
    Compute Node
    Compute Node
    Compute Node


  85. Data warehouse node types.


  86. High Storage Extra Large (XL)
    15GB RAM
    2TB local attached storage
    3 drives
    2 virtual cores


  87. High Storage Extra Large (XL)
    15GB RAM
    2TB local attached storage
    3 drives
    2 virtual cores
    High Storage Eight Extra Large (8XL)
    120GB RAM
    16TB local attached storage
    24 drives
    16 virtual cores

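A quick sizing sketch based on the two node types above. It assumes capacity is simply the sum of local attached storage across compute nodes, and it ignores compression, replication overhead and any per-cluster node limits. The 100 TB dataset is a hypothetical input.

```python
import math

# Local attached storage per node in TB, taken from the node type slide.
NODE_STORAGE_TB = {"XL": 2, "8XL": 16}


def nodes_needed(data_tb: float, node_type: str) -> int:
    """Smallest node count whose combined local storage holds data_tb."""
    return math.ceil(data_tb / NODE_STORAGE_TB[node_type])


if __name__ == "__main__":
    for node_type in NODE_STORAGE_TB:
        print(node_type, nodes_needed(100, node_type))  # hypothetical 100 TB
```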

  88. Pay as you go.


  89. Hourly Prices
                          2 TB nodes   16 TB nodes
    On-demand             $0.850       $6.80
    1 Year Reservation    $0.50        $4.00
    3 Year Reservation    $0.228       $1.824


  90. Hourly Prices
                          2 TB nodes   16 TB nodes
    On-demand             $0.850       $6.80
    1 Year Reservation    $0.50        $4.00
    3 Year Reservation    $0.228       $1.824


  91. $999 per TB per year

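The $999 figure appears to be the effective yearly cost per terabyte at the 3-year reserved rate. The sketch below checks that arithmetic using the hourly prices from the pricing slide. It assumes a 365-day year and treats the quoted hourly price as the all-in effective rate; both are assumptions, not statements from the deck.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming a 365-day year

# Hourly price per node in USD, keyed by (plan, node storage in TB).
HOURLY_PRICE = {
    ("on-demand", 2): 0.850, ("on-demand", 16): 6.80,
    ("1-year", 2): 0.50, ("1-year", 16): 4.00,
    ("3-year", 2): 0.228, ("3-year", 16): 1.824,
}


def annual_cost_per_tb(plan: str, node_tb: int) -> float:
    """Yearly cost of one node divided by its storage in TB."""
    return HOURLY_PRICE[(plan, node_tb)] * HOURS_PER_YEAR / node_tb


if __name__ == "__main__":
    for (plan, node_tb) in HOURLY_PRICE:
        print(plan, node_tb, round(annual_cost_per_tb(plan, node_tb), 2))
```

Under these assumptions both node sizes work out to the same per-terabyte cost on the 3-year rate, since the 16 TB price is exactly eight times the 2 TB price.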

  92. Don’t pay for the leader node.


  93. No additional storage charge for
    backups of active clusters.


  94. VPC ready.


  95. Low cost. Easy to use.


  96. Focus on analysis.


  97. Private beta today.


  98. Available early this year.


  99. aws.amazon.com/redshift


  100. 2 billion row dataset. 6 representative queries.


  101. Compared to 32 nodes. 128 CPUs. 4.2 TB RAM. 1.6 PB storage. 2 billion row data set.
    Amazon Redshift: 2 instance cluster
    12x to 150x faster


  102. 29 minutes 58 seconds
    down to
    12 seconds

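The speedup implied by those two timings can be checked directly. This sketch assumes the 29 minute 58 second query is the one at the upper end of the "12x to 150x faster" range quoted earlier; the deck does not say so explicitly.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Ratio of the original runtime to the new runtime."""
    return before_seconds / after_seconds


if __name__ == "__main__":
    before = 29 * 60 + 58  # 29 minutes 58 seconds, in seconds
    after = 12             # 12 seconds
    print(round(speedup(before, after), 1))
```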

  103. III. Data security.


  104. Security is our number one priority.


  105. Shared responsibility.



  107. Choose your region.


  108. Availability zones.


  109. ITAR
    FIPS 140-2
    MPAA
    ISO 27001
    SOC 2
    ISAE 3402
    PCI DSS
    HIPAA
    FISMA Moderate



  111. “You basically turn yourself into a
    polymorphic surface to which the attack guy
    has a much tougher time getting at. That,
    ultimately, is the real key advantage to drive
    security and make things much better for us
    across the board.”
    Gus Hunt, CTO
    Central Intelligence Agency


  112. Virtual Private Cloud.


  113. Network isolated environment.


  114. Public and private subnets.

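To make the subnet idea concrete, here is a small sketch that carves a VPC address range into equal subnets with Python's standard `ipaddress` module. The 10.0.0.0/16 range and the choice of /24 subnets are hypothetical. Whether a subnet is public or private depends on its routing, not on its address range, so the labels below are only names.

```python
import ipaddress


def split_vpc(cidr: str, new_prefix: int):
    """Split a VPC CIDR block into equal subnets of the given prefix length."""
    return list(ipaddress.ip_network(cidr).subnets(new_prefix=new_prefix))


if __name__ == "__main__":
    subnets = split_vpc("10.0.0.0/16", 24)  # hypothetical VPC range
    public_subnet, private_subnet = subnets[0], subnets[1]
    print("public: ", public_subnet)
    print("private:", private_subnet)
```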

  115. Redshift, relational databases and Hadoop
    can run inside the VPC.


  116. Extend your VPN.


  117. Identity and access federation.


  118. Identity and access management.


  119. IV. Data movement.


  120. “How do I get my data
    into the cloud?”


  121. Generated and stored
    in the AWS cloud.


  122. Inbound transfer is free.


  123. Multipart upload.

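Multipart upload works by splitting one large object into independently uploaded parts. The sketch below only plans the byte ranges and makes no network calls. The limits encoded here (5 MiB minimum part size except for the last part, at most 10,000 parts) are the documented Amazon S3 limits; the function name and the example sizes are assumptions for illustration.

```python
MIN_PART_BYTES = 5 * 1024**2  # S3 minimum part size (last part may be smaller)
MAX_PARTS = 10_000            # S3 maximum number of parts per upload


def part_ranges(total_bytes: int, part_bytes: int):
    """Return (part_number, start, end_exclusive) tuples covering the object."""
    if part_bytes < MIN_PART_BYTES:
        raise ValueError("part size is below the 5 MiB minimum")
    count = -(-total_bytes // part_bytes)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    return [
        (n + 1, n * part_bytes, min((n + 1) * part_bytes, total_bytes))
        for n in range(count)
    ]


if __name__ == "__main__":
    # Hypothetical 12 MiB object uploaded in 5 MiB parts.
    for part in part_ranges(12 * 1024**2, 5 * 1024**2):
        print(part)
```

Because each part is an independent byte range, parts can be sent in parallel and retried individually if one fails.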

  124. Aspera, iRODS.


  125. Physical media.


  126. AWS Direct Connect.


  127. 1Gbps or 10Gbps

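For a sense of scale, this sketch estimates the best-case transfer time over a dedicated link at those speeds. It assumes the full line rate is sustained with no protocol overhead, and it uses decimal terabytes (10^12 bytes). The dataset sizes are hypothetical.

```python
def transfer_hours(data_terabytes: float, link_gbps: float) -> float:
    """Best-case hours to move data_terabytes over a link_gbps link."""
    bits = data_terabytes * 1e12 * 8    # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9)  # gigabits per second to bits per second
    return seconds / 3600


if __name__ == "__main__":
    for size_tb, gbps in [(1, 1), (1, 10), (100, 10)]:
        print(size_tb, "TB at", gbps, "Gbps:",
              round(transfer_hours(size_tb, gbps), 2), "hours")
```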

  128. Built-in AZ replication.


  129. Regional replication.


  130. “How do I integrate my data?”


  131. Amazon DynamoDB
    HDFS (Amazon EMR)
    Amazon S3
    Amazon Redshift
    On-premises
    Amazon RDS


  132. AWS Data Pipeline


  133. Data-intensive orchestration
    & automation.


  134. Reliable, scheduled
    data movement and analytics.


  135. aws.amazon.com/datapipeline


  136. aws.amazon.com


  137. I. Data, data everywhere


  138. I. Data, data everywhere
    II. Collection & storage


  139. I. Data, data everywhere
    II. Collection & storage
    III. Data security


  140. I. Data, data everywhere
    II. Collection & storage
    III. Data security
    IV. Data movement


  141. Thank you.


  142. from ANALYTICS
    to INTELLIGENCE
    get in touch
    [email protected]
    or
    @MZA
    AWS.AMAZON.COM
