From Analytics to Intelligence: Amazon Redshift

Matt Wood
January 31, 2013

An introduction to Amazon Redshift, presented at MicroStrategy World 2013.

Transcript

  1. From ANALYTICS to INTELLIGENCE: a presentation at MICROSTRATEGY WORLD 2013 by DR MATT WOOD
  2. Hello.

  3. Thank you.

  4. I: Data, data everywhere

  5. I: Data, data everywhere; II: Collection & storage

  6. I: Data, data everywhere; II: Collection & storage; III: Data security

  7. I: Data, data everywhere; II: Collection & storage; III: Data security; IV: Data movement

  8. 0: Amazon Web Services; I: Data, data everywhere; II: Collection & storage; III: Data security; IV: Data movement
  9. Building blocks.

  10. Compute, storage & databases.

  11. Retail Merchant services Web services

  12. Blinding flash of the obvious.

  13. Available.

  14. Low cost.

  15. Flexible.

  16. Every day, AWS adds enough server capacity to power amazon.com in 2003, when it was a $5B enterprise

  17. I: Data, data everywhere

  18. Data for competitive advantage.

  19. Customer segmentation, financial modeling, system analysis, line of sight, business intelligence...

  20. Generation; Collection & storage; Analytics & computation; Collaboration & sharing

  21. Cost of data generation is falling.

  22. Devices: Kindle Fire HD, Kindle Fire, Kindle Paperwhite and Kindle have held the top four spots on the Amazon worldwide best seller chart since launch.

  23. Apps and games: Amazon Appstore selection tripled in 2012.

  24. Commerce: Amazon customers purchased more than one toy per second on mobile devices.

  25. Most gifted Kindle book.

  26. Generation; Collection & storage; Analytics & computation; Collaboration & sharing (lower cost, increased throughput)

  27. Generation; Collection & storage; Analytics & computation; Collaboration & sharing (highly constrained)
  28. Gap.

  29. The Data Analysis Gap (chart). X axis: 1990 to 2020. Y axis: Data volume. Series: Enterprise Data, Data in Warehouse. Labels: Generated data, Available for analysis. Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares.
  30. Enter AWS.

  31. Utility.

  32. Remove constraints.

  33. Generation; Collection & storage; Analytics & computation; Collaboration & sharing (highly constrained)

  34. Generation; Collection & storage; Analytics & computation; Collaboration & sharing

  35. Full value.

  36. Close the gap.

  37. Reduced time to market.

  38. Identify and meet new business opportunities.

  39. Lower costs.

  40. II: Collection & Storage

  41. One schema to rule them all.

  42. One schema to rule them all.

  43. Lots of data. Lots of users. Lots of uses. Lots of locations.
  44. Cost.

  45. Multipliers.

  46. Object storage.

  47. 99.999999999% durability

  48. Relational databases.

  49. NoSQL data stores.

  50. HDFS based stores.

  51. Undifferentiated heavy lifting.

  52. Lower costs. Ease of use.

  53. Lower costs. Ease of use. Lower costs: no capital investment; pay as you go; no subscriptions; only pay for what you use.

  54. Lower costs. Ease of use. Ease of use: programmable; zero admin; easy to configure; integrate with existing tools.
  55. Data warehousing.

  56. Expensive. Complicated.

  57. Enterprises average between 3 and 4 DBAs per data warehouse. Source: Gartner, Critical factors in calculating the data warehouse TCO, July 2009
  58. Source: Oracle technology global price list 11/1/2012

  59. Expensive. Complicated.

  60. Unobtainable.

  61. Amazon Redshift.

  62. Fast. Powerful. Petabyte scale.

  63. Managed service.

  64. Automated deployment & configuration.

  65. SQL access and BI tool integration.

  66. Parallel execution.

  67. Leader Node

  68. Diagram: Leader Node; Compute Node (×3)

  69. Diagram: Leader Node; Compute Node (×3)

  70. 10gigE full bisection network.

  71. Diagram: Leader Node; Compute Node (×3)

  72. Diagram: Common BI Tools; JDBC/ODBC; Leader Node; Compute Node (×3)
  73. Certified for use with MicroStrategy.

  74. Data compression.

  75. Automated backup to S3.

  76. Data encrypted in transit & at rest.

  77. Automated failover.

  78. Diagram: Common BI Tools; JDBC/ODBC; Leader Node; Compute Node (×3)

  79. Diagram: Common BI Tools; JDBC/ODBC; Leader Node; Compute Node (×3)

  80. Diagram: Common BI Tools; JDBC/ODBC; Leader Node; Compute Node (×3)
  81. Elastic.

  82. Diagram: Common BI Tools; JDBC/ODBC; Leader Node; Compute Node (×3)

  83. Diagram: Common BI Tools; JDBC/ODBC; Leader Node; Compute Node (×5)

  84. Diagram: Common BI Tools; JDBC/ODBC; Leader Node; Compute Node (×3)
  85. Data warehouse node types.

  86. High Storage Extra Large (XL): 15GB RAM; 2TB local attached storage; 3 drives; 2 virtual cores

  87. High Storage Extra Large (XL): 15GB RAM; 2TB local attached storage; 3 drives; 2 virtual cores. High Storage Eight Extra Large (8XL): 120GB RAM; 16TB local attached storage; 24 drives; 16 virtual cores
  88. Pay as you go.

  89. Hourly Prices. 2 TB nodes: On-demand $0.850; 1 Year Reservation $0.50; 3 Year Reservation $0.228. 16 TB nodes: On-demand $6.80; 1 Year Reservation $4.00; 3 Year Reservation $1.824.

  90. Hourly Prices. 2 TB nodes: On-demand $0.850; 1 Year Reservation $0.50; 3 Year Reservation $0.228. 16 TB nodes: On-demand $6.80; 1 Year Reservation $4.00; 3 Year Reservation $1.824.
  91. $999 per TB

  92. Don’t pay for the leader node.

  93. No additional storage charge for backups of active clusters.

  94. VPC ready.

  95. Low cost. Easy to use.

  96. Focus on analysis.

  97. Private beta today.

  98. Available early this year.

  99. aws.amazon.com/redshift

  100. 2 billion row dataset. 6 representative queries.

  101. Compared to 32 nodes, 128 CPUs, 4.2 TB RAM, 1.6 PB storage, on a 2 billion row data set. Amazon Redshift: 2 instance cluster, 12x to 150x faster.
  102. 29 minutes 58 seconds down to 12 seconds

  103. III: Data security.

  104. Security is our number one priority.

  105. Shared responsibility.

  106. None
  107. Choose your region.

  108. Availability zones.

  109. ITAR; FIPS 140-2; MPAA; ISO 27001; SOC 2; ISAE 3402; PCI DSS; HIPAA; FISMA Moderate
  110. None
  111. “You basically turn yourself into a polymorphic surface to which the attack guy has a much tougher time getting at. That, ultimately, is the real key advantage to drive security and make things much better for us across the board.” Gus Hunt, CTO, Central Intelligence Agency
  112. Virtual Private Cloud.

  113. Network isolated environment.

  114. Public and private subnets.

  115. Redshift, relational databases, Hadoop can run inside the VPC.

  116. Extend your VPN.

  117. Identity and access federation.

  118. Identity and access management.

  119. IV: Data movement.

  120. “How do I get my data into the cloud?”

  121. Generated and stored in the AWS cloud.

  122. Inbound transfer is free.

  123. Multipart upload.

  124. Aspera, iRODS.

  125. Physical media.

  126. AWS Direct Connect.

  127. 1Gbps or 10Gbps

  128. Built in AZ replication.

  129. Regional replication.

  130. “How do I integrate my data?”

  131. Amazon DynamoDB; HDFS (Amazon EMR); Amazon S3; Amazon Redshift; On Premise; Amazon RDS
  132. AWS Data Pipeline

  133. Data-intensive orchestration & automation.

  134. Reliable, scheduled data movement and analytics.

  135. aws.amazon.com/datapipeline

  136. aws.amazon.com

  137. I: Data, data everywhere

  138. I: Data, data everywhere; II: Collection & storage

  139. I: Data, data everywhere; II: Collection & storage; III: Data security

  140. I: Data, data everywhere; II: Collection & storage; III: Data security; IV: Data movement
  141. Thank you.

  142. From ANALYTICS to INTELLIGENCE. Get in touch: MATTHEW@AMAZON.COM or @MZA

    AWS.AMAZON.COM
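
A note on the arithmetic behind the pricing and benchmark slides (89 to 91, 101 and 102). The sketch below is only a sanity check, not anything shown in the talk. It assumes that the "$999 per TB" figure is an annual cost derived from the 3 Year Reservation hourly price and a 365-day year, which the slides do not state explicitly. The function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, node running continuously

# 3 Year Reservation hourly prices and node storage, as listed on slide 89.
THREE_YEAR_RESERVED = {
    "2 TB node": (0.228, 2),
    "16 TB node": (1.824, 16),
}


def cost_per_tb_year(hourly_price, storage_tb):
    """Annual cost in dollars per terabyte for a node billed by the hour."""
    return hourly_price * HOURS_PER_YEAR / storage_tb


def speedup(before_seconds, after_seconds):
    """How many times faster the second runtime is than the first."""
    return before_seconds / after_seconds


if __name__ == "__main__":
    for name, (price, tb) in THREE_YEAR_RESERVED.items():
        print(f"{name}: ${cost_per_tb_year(price, tb):.2f} per TB per year")

    # Slide 102: 29 minutes 58 seconds down to 12 seconds.
    print(f"speedup: {speedup(29 * 60 + 58, 12):.1f}x")
```

Under those assumptions both node sizes come out at just under $999 per TB per year, and the runtime on slide 102 corresponds to roughly the upper end of the "12x to 150x faster" range on slide 101.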