Building data-intensive applications (Construyendo aplicaciones intensivas en datos)

Julio Faerman

May 23, 2019

Transcript

  1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

    SUMMIT
    Building data-intensive applications
    Julio Faerman, AWS Technical Evangelist
    @faermanj
    https://speakerdeck.com/faermanj
  2. What are “data-intensive” applications?

    Users: 1M+
    Volume: TB–PB–EB
    Locality: Global
    Performance: Milliseconds
    Ingestion: K–M RPS
    Access: Mobile, IoT, devices
    Scale: Up-out-in
    Economics: Pay-as-you-go
    SLA: Managed
    Social media, ride hailing, media streaming, dating
  3. Examples of “data-intensive” applications

    >20M, >300M, >1 PB
  4. Hybrid data models

    Relational: Referential integrity, ACID transactions, schema-on-write. Typical uses: lift and shift, ERP, CRM, finance.
    Key-value: High throughput, low-latency reads and writes, endless scale. Typical uses: real-time bidding, shopping cart, social, product catalog, customer preferences.
    Document: Store documents and quickly query on any attribute. Typical uses: content management, personalization, mobile.
    In-memory: Query by key with microsecond latency. Typical uses: leaderboards, real-time analytics, caching.
    Graph: Quickly and easily create and navigate relationships between data. Typical uses: fraud detection, social networking, recommendation engine.
    Time-series: Collect, store, and process data sequenced by time. Typical uses: IoT applications, event tracking.
    Ledger: Complete, immutable, and verifiable history of all changes to application data. Typical uses: systems of record, supply chain, health care, registrations, financial.
  5. Purpose-built databases

    Relational: Amazon RDS (Aurora, Commercial, Community)
    Key-value: DynamoDB
    Document: DocumentDB
    In-memory: ElastiCache
    Graph: Neptune
    Time-series: Timestream
    Ledger: Quantum
  6. With Amazon Aurora
  7. “Old world” commercial databases

    • Lock-in
    • Proprietary
    • Punitive licensing
    • Very expensive
    • You’ve got mail
  8. Moving to open databases
  9. Amazon Aurora

    • Relational database built for the cloud
    • Compatible with MySQL or PostgreSQL
    • Performance and availability of enterprise databases at 1/10 of the cost

    Availability and durability: Fault-tolerant, self-healing storage; six copies of data across three AZs; continuous backup to S3.
    Fully managed: Managed by RDS: no hardware provisioning, software patching, setup, configuration, or backups.
    Highly secure: Network isolation, encryption at rest/transit.
    Performance and scalability: 5x throughput of standard MySQL and 3x of standard PostgreSQL; scale-out up to 15 read replicas.
  10. Distributed architecture

    [Diagram: master and replica instances in AZ1, AZ2, and AZ3, each with SQL, transactions, and caching layers, on top of a shared storage volume]

    • Voting-free protocol
    • Write performance
    • Read scale-out
    • AZ + 1 failure tolerance
    • Instant database redo recovery

    4/6 write quorum and local tracking
    The log is the database
  11. Write and read throughput

    Aurora MySQL is 5x faster than MySQL
    [Charts: write throughput and read throughput for MySQL 5.6, MySQL 5.7, MySQL 8.0, Aurora 5.6, and Aurora 5.7]
    Using Sysbench with 250 tables and 200,000 rows per table on R4.16XL
  12. Performance variability under load

    Amazon Aurora is >200x more consistent
    [Chart: write response time (seconds) over time (seconds), Amazon Aurora vs. MySQL 5.6 on EBS]
    SysBench OLTP (write-only) workload with 250 tables and 200,000 rows per table on R4.16XL
  13. Scaling reads

    MySQL read scaling:
    • MySQL master: 30% read, 70% write
    • MySQL replica: 30% new reads, 70% write (single-threaded binlog apply)
    • Logical replication using complete changes
    • Same write workload on the replica
    • Independent storage (one data volume per instance)

    Amazon Aurora read scaling:
    • Aurora master: 30% read, 70% write
    • Aurora replica: 100% new reads (page cache update)
    • Physical replication using delta changes
    • No writes on the replica
    • Shared Multi-AZ storage
  14. Aurora Multi-Master

    [Diagram: an orange master and a blue master, each with SQL, transactions, and caching layers, plus replicas, over a shared storage volume spanning AZ1, AZ2, and AZ3; transactions T1 and T2 conflict on one page]

    • No pessimistic locking
    • No global ordering
    • No global commit coordination
    • Decoupled system
    • Microservices architecture
    • Optimistic conflict resolution

    Cluster services: membership, heartbeat, replication, metadata
  15. “AZ+1” fault tolerance

    Why?
    • In a large fleet, there are always some failures
    • AZ failures have “shared fate”

    How?
    • 6 copies, 2 copies per AZ
    • 2/3 quorum will not work

    [Diagram: with 3 copies (2/3 read, 2/3 write) the quorum breaks on AZ failure; with 6 copies (3/6 read, 4/6 write) the quorum survives AZ failure]
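
The quorum arithmetic on this slide can be checked with a small sketch. It is only an illustration under stated assumptions: the `QuorumLayout` class and its method names are hypothetical, and "AZ+1" is modeled as losing every copy in one AZ plus one additional copy elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumLayout:
    """Hypothetical model of copies spread evenly across AZs."""
    copies_per_az: int
    az_count: int
    read_quorum: int
    write_quorum: int

    @property
    def total_copies(self) -> int:
        return self.copies_per_az * self.az_count

    def surviving_copies(self, failed_azs: int, extra_failed_copies: int = 0) -> int:
        # Every copy in a failed AZ is lost, plus any independent copy failures.
        lost = failed_azs * self.copies_per_az + extra_failed_copies
        return max(self.total_copies - lost, 0)

    def can_read(self, failed_azs: int, extra_failed_copies: int = 0) -> bool:
        return self.surviving_copies(failed_azs, extra_failed_copies) >= self.read_quorum

    def can_write(self, failed_azs: int, extra_failed_copies: int = 0) -> bool:
        return self.surviving_copies(failed_azs, extra_failed_copies) >= self.write_quorum


# 3 copies, one per AZ, with 2/3 read and 2/3 write quorums.
three_copies = QuorumLayout(copies_per_az=1, az_count=3, read_quorum=2, write_quorum=2)
# 6 copies, two per AZ, with 3/6 read and 4/6 write quorums.
six_copies = QuorumLayout(copies_per_az=2, az_count=3, read_quorum=3, write_quorum=4)

for name, layout in (("3 copies", three_copies), ("6 copies", six_copies)):
    print(name,
          "| AZ failure: write ok =", layout.can_write(1),
          "| AZ+1 failure: read ok =", layout.can_read(1, 1))
```

Under these assumptions, the six-copy layout keeps its write quorum after losing a whole AZ and keeps its read quorum after an AZ+1 failure, while the three-copy layout loses its read quorum in the AZ+1 case.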
  16. Parallel query

    • Distributed query processing
    • Processing closer to the data
    • Reduces buffer pool pollution

    [Diagram: the database node pushes down predicates to the storage nodes and aggregates the results]
  17. Database backtrack

    Brings the database to a point in time without requiring a restore from backup:
    • Recovery from unintended operations
    • Non-destructive
    • Run it multiple times to find the right point in time

    [Diagram: timeline t0 to t4; rewind to t1, then rewind to t3; the rewound segments become invisible]
  18. Global replication

    • Faster disaster recovery
    • Better data locality
  19. Performance Insights

    Dashboard showing database load
    • Easy, e.g. drag and drop
    • Powerful: drill down using zoom in

    Identifies source of bottlenecks
    • Sort by top SQL
    • Slice by host, user, wait events

    Adjustable time frame
    • Hour, day, week, month
    • Up to 2 years of data; 7 days free

    [Screenshot annotations: max vCPU, CPU bottleneck, SQL with high CPU]
  20. Aurora Serverless

    • Responds to your application load automatically
    • Scales capacity in <10 seconds
    • A new instance has a “warm” buffer pool
    • Highly available multi-tenant proxy
  21. Web Service Data API

    • Access your database from HTTP applications
    • SQL statements packaged as HTTP requests
    • Managed connection pooling behind the proxy

    [Diagram: Web Service Data API in front of Aurora Serverless]
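
As a rough sketch of "SQL statements packaged as HTTP requests", the code below only builds the request parameters for the Data API `ExecuteStatement` operation and does not contact AWS. The helper `build_execute_statement_request`, the ARNs, the database name, and the table are hypothetical placeholders.

```python
import json


def build_execute_statement_request(resource_arn, secret_arn, database, sql, params):
    """Build keyword arguments for an ExecuteStatement call (hypothetical helper)."""

    def to_field(value):
        # Map a Python value to the typed field structure the Data API expects.
        if value is None:
            return {"isNull": True}
        if isinstance(value, bool):  # check bool before int: bool subclasses int
            return {"booleanValue": value}
        if isinstance(value, int):
            return {"longValue": value}
        if isinstance(value, float):
            return {"doubleValue": value}
        return {"stringValue": str(value)}

    return {
        "resourceArn": resource_arn,
        "secretArn": secret_arn,
        "database": database,
        "sql": sql,
        "parameters": [{"name": n, "value": to_field(v)} for n, v in params.items()],
    }


# Placeholder identifiers, for illustration only.
request = build_execute_statement_request(
    resource_arn="arn:aws:rds:us-east-1:123456789012:cluster:example-cluster",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example-secret",
    database="rides",
    sql="SELECT id, status FROM trips WHERE rider_id = :rider_id AND active = :active",
    params={"rider_id": 42, "active": True},
)
print(json.dumps(request["parameters"], indent=2))

# With AWS credentials configured, the request could be sent with boto3:
#   import boto3
#   boto3.client("rds-data").execute_statement(**request)
```

Because each statement is an independent HTTPS call, the application holds no long-lived database connection; pooling happens on the service side, as the slide notes.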
  22. SQL vs NoSQL

    SQL: Optimized for storage. NoSQL: Optimized for compute.
    SQL: Normalized/relational. NoSQL: Denormalized/hierarchical.
    SQL: Ad hoc queries. NoSQL: Instantiated views.
    SQL: Scale vertically. NoSQL: Scale horizontally.
    SQL: Good for OLAP. NoSQL: Built for OLTP at scale.
  23. With Amazon DynamoDB
  24. Databases at amazon.com

    “The world’s largest e-commerce business, Amazon.com, runs on nonrelational cloud databases because of their scale, performance, and maintenance benefits.” (Werner Vogels, CTO, Amazon)
  26. Lyft: >1M rides/day, 8x traffic at peak

    CHALLENGE: Scale and manage up to 8 times more users during peak hours.
    SOLUTION: DynamoDB stores the GPS coordinates of all rides, saving on infrastructure and enabling the platform to grow for the 23M people who use Lyft worldwide.
  27. Amazon DynamoDB

    Fast and flexible database service for any scale

    Comprehensive security: Encrypts all data by default and fully integrates with AWS Identity and Access Management for robust security.
    Performance at scale: Consistent, single-digit millisecond response times at any scale; build applications with virtually unlimited throughput.
    Global database for global users and apps: Build global applications with fast access to local data by easily replicating tables across multiple AWS Regions.
    Serverless: No hardware provisioning, software patching, or upgrades; scales up or down automatically; continuously backs up your data.
  28. Provisioning modes in Amazon DynamoDB
  29. Provisioning modes

    Maximum consumption limit
    Provisioned
    On-Demand
  30. DynamoDB transactions

    Simplify your code by executing multiple atomic actions across multiple tables with a single API call.
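
A minimal sketch of what "multiple actions, multiple tables, one call" can look like with the low-level `TransactWriteItems` request shape. The `Orders` and `Inventory` tables, their attributes, and the helper `build_order_transaction` are hypothetical; the code only builds the request and does not call AWS.

```python
def build_order_transaction(order_id: str, product_id: str, quantity: int) -> list:
    """Build TransactItems that create an order and decrement stock together."""
    return [
        {
            "Put": {
                "TableName": "Orders",  # hypothetical table
                "Item": {
                    "OrderId": {"S": order_id},
                    "ProductId": {"S": product_id},
                    "Quantity": {"N": str(quantity)},
                },
                # Fail the whole transaction if the order already exists.
                "ConditionExpression": "attribute_not_exists(OrderId)",
            }
        },
        {
            "Update": {
                "TableName": "Inventory",  # hypothetical table
                "Key": {"ProductId": {"S": product_id}},
                "UpdateExpression": "SET Stock = Stock - :q",
                # Fail the whole transaction if there is not enough stock.
                "ConditionExpression": "Stock >= :q",
                "ExpressionAttributeValues": {":q": {"N": str(quantity)}},
            }
        },
    ]


items = build_order_transaction("order-1001", "sku-42", 2)
print(len(items), "actions across",
      len({next(iter(a.values()))["TableName"] for a in items}), "tables")

# With AWS credentials configured, both actions could be submitted atomically:
#   import boto3
#   boto3.client("dynamodb").transact_write_items(TransactItems=items)
```

If either condition fails, neither write is applied, which removes the need for hand-written compensation logic in the application.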
  31. With Amazon ElastiCache
  32. What are “data-intensive” applications?

    Users: 1M+
    Volume: GB–TB–PB
    Locality: Global
    Performance: Microseconds
    Ingestion: M+ RPS
    Access: Mobile, IoT, devices
    Scale: Up-out-in
    Economics: Pay-as-you-go
    SLA: Strict
    Social media, ride hailing, media streaming, dating
  33. Latency percentiles
  34. Amazon ElastiCache

    Fully managed: AWS manages all hardware and software setup, configuration, monitoring.
    Extreme performance: In-memory data store and cache for sub-millisecond response times.
    Easily scalable: Read scaling with replicas. Write and memory scaling with sharding. Non-disruptive scaling.
  35. What’s New: Redis & Memcached

    Redis 5.0 (ElastiCache)
    • Redis Streams
    • SortedSets now have LIST capabilities (POP and BLOCK)
    • HyperLogLogs has an optimized algorithm
    • Speed improvements (Jemalloc additions, etc.)
    • Active defragmentation
    • Added in-line HELP command for redis-cli
    • Native TLS integration
    • More at https://aws.amazon.com/redis/Whats_New_Redis5

    Memcached 1.5.10
    • Automated slab rebalancing
    • LRU crawler to background-reclaim memory
    • Faster hash table lookups with murmur3 algorithm
  36. Redis overview

    Fast: <1 ms latency for most commands
    Open source (5.0.4)
    Easy to learn
    Highly available: replication
    Atomic operations: supports transactions
    In-memory
    Powerful: ~200 commands, Lua scripting, geospatial, Pub/Sub
    Various data structures: strings, lists, hashes, sets, sorted sets, bitmaps, streams, and HyperLogLogs
    Backup/restore: enables snapshotting
  37. Clustering best practices

    In-memory storage
    • Memory reservation: +25% (Redis) + change buffer (optional 10%)
    • Automate eviction with eviction policies and TTLs
    • Scale with CloudWatch alarms
    • Use memory-optimized instances

    Performance
    • Benchmark operations using the Redis Benchmark tool
    • For more read IOPS: add replicas
    • For more write IOPS: add shards (scale out)
    • For more network I/O: use network-optimized instances and scale out
    • Use pipelining for bulk reads/writes
    • Consider Big O time complexity for data structure commands
  38. DynamoDB Accelerator (DAX)

    • Fully managed, highly available: Handles all software management, fault tolerant, replication across multi-AZs within a Region
    • DynamoDB API compatible: Seamlessly caches DynamoDB API calls, no application rewrites required
    • Write-through: DAX handles caching for writes
    • Flexible: Configure DAX for one table or many
    • Scalable: Scales out to any workload with up to 10 read replicas
    • Manageability: Fully integrated AWS service: Amazon CloudWatch, Tagging for DynamoDB, AWS Console
    • Security: Amazon VPC, AWS IAM, AWS CloudTrail, AWS Organizations

    [Diagram: your applications call DynamoDB Accelerator, which sits in front of DynamoDB tables #1 and #2]
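
The write-through behavior named on this slide can be illustrated with a generic sketch. This is not how DAX is implemented: the `WriteThroughCache` class is hypothetical, and a plain dict stands in for a DynamoDB table.

```python
class WriteThroughCache:
    """Generic write-through cache; a dict stands in for the backing table."""

    def __init__(self, backing_store: dict):
        self._store = backing_store
        self._cache = {}
        self.store_reads = 0  # how many reads had to go to the backing store

    def put(self, key, value):
        # Write-through: persist to the backing store first, then cache the
        # same value so later reads are served from memory.
        self._store[key] = value
        self._cache[key] = value

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss: read from the backing store and remember the result.
        self.store_reads += 1
        value = self._store.get(key)
        if value is not None:
            self._cache[key] = value
        return value


table = {"ride-1": "completed"}  # stand-in for an existing table item
cache = WriteThroughCache(table)

cache.put("ride-2", "in_progress")
print(table["ride-2"])      # the write reached the backing store
print(cache.get("ride-2"))  # served from the cache, no store read
print(cache.get("ride-1"))  # first read misses and goes to the store
print(cache.store_reads)
```

Because writes pass through the cache, an application reading its own recent writes does not pay a cache miss for them.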
  39. GE’s Predix Platform Powered by ElastiCache Redis

    Using ElastiCache Redis with Open Service Broker, Predix Platform from GE Digital allows developers to easily create Redis clusters with standard, pre-configured parameters, sizing and network security. Developers build container-based stateless applications on AWS and ElastiCache is used to manage session state for these applications. The architecture makes it easy and simple for developers to build applications.

    [Diagram: container runtime VPC with control plane and data plane; app server, broker, EC2 API, and ElastiCache VPC]

    “GE is the world’s largest digital industrial company. We use [ElastiCache Redis] to make it super easy and simple for developers to use Amazon services. Amazon ElastiCache team implemented the Redis AUTH feature in four regions in two months enabling application level security.” (Amulya Sharma, Senior Staff Software Engineer, GE Digital)
  40. Expedia’s Real-time Analytics with ElastiCache

    Expedia is a leader in the $1 trillion travel industry, with an extensive portfolio that includes some of the world’s most trusted travel brands. Expedia’s real-time analytics application collects data for its “test & learn” experiments on Expedia sites. The analytics application processes ~200 million messages daily.

    “With ElastiCache Redis as caching layer, the write throughput on DynamoDB has been set to 3500, down from 35000, reducing the cost by 6x.” (Kuldeep Chowhan, Engineering Manager, Expedia)

    [Diagram: Kinesis Firehose ingests multiple real-time streams of lodging mark data; EC2 joins/compares events; ElastiCache (Redis) stages near real-time data; reference data comes from on-premises; Redshift serves historical queries on up to 2 years of data; Aurora serves operational queries of real-time data; S3]
  41. With Amazon Timestream (preview)
  42. Amazon Timestream (Preview)

    Fast, scalable, fully managed time series database

    1,000x faster at 1/10th the cost of relational databases: Collect fast-moving time-series data from multiple sources at the rate of millions of inserts per second.
    Trillions of daily events: Capable of processing trillions of events daily; the adaptive query processing engine maintains steady, predictable performance.
    Analytics optimized for time series data: Built-in analytics for interpolation, smoothing, and approximation to identify trends, patterns, and anomalies.
    Serverless: No servers to manage; time-consuming tasks such as hardware provisioning, software patching, setup, and configuration done for you.
  45. Our approach

    • Services architected for the cloud and for data-intensive applications
    • Purpose-built services, optimized for each type of workload
    • Innovate faster through managed services
    • Migrate existing applications and databases to the cloud
  46. Quick reference (Situation: Solution)

    Existing application: Use your existing engine on RDS
    • MySQL: Amazon Aurora, RDS for MySQL
    • PostgreSQL: Amazon Aurora, RDS for PostgreSQL
    • MariaDB: Amazon Aurora, RDS for MariaDB
    • Oracle: Use SCT to determine complexity; Amazon Aurora, RDS for Oracle
    • SQL Server: Use SCT to determine complexity; Amazon Aurora, RDS for SQL Server

    New application
    • If you can avoid relational features: DynamoDB
    • If you need relational features: Amazon Aurora

    In-memory store/cache: Amazon ElastiCache
    Time series data: Amazon Timestream
    Track every application change, crypto verifiable, with a central trust authority: Amazon Quantum Ledger Database (QLDB)
    Don’t have a trusted central authority: Amazon Managed Blockchain
    Data warehouse & BI: Amazon Redshift, Amazon Redshift Spectrum, and Amazon QuickSight
    Ad hoc analysis of data in S3: Amazon Athena and Amazon QuickSight
    Apache Spark, Hadoop, HBase (needle-in-a-haystack type queries): Amazon EMR
    Log analytics, operational monitoring, & search: Amazon Elasticsearch Service and Amazon Kinesis
  47. Thank you!

    Julio Faerman, AWS Technical Evangelist
    @faermanj
    https://speakerdeck.com/faermanj