TDC Porto Alegre: O Terceiro Porquê

Julio Faerman

October 18, 2014

Transcript

  1. O Terceiro Porquê (The Third Why)
     Julio M. Faerman, @jmfaerman
     TDC Porto Alegre 2014
     http://jfaerman.com.br
  2. 16 years
     2000+ employees
     40 million users
     Amazon Web Services for 100% of streaming
     34.2% of all downstream traffic during primetime
     http://aws.amazon.com/solutions/case-studies/netflix/
     http://www.enotechconsulting.com/2013/04/aws-s3-behind-netflix-success/
     http://variety.com/2014/digital/news/netflix-youtube-bandwidth-usage-1201179643/
  3. Amazon Simple Storage Service
     • Durable, scalable and fast storage (99.999999999% durability)
     • 2+ trillion (10^12) objects
     • 1.1+ million requests per second (RPS)
     • Native HTTP/S
     • Full featured: Permissions, Static Hosting, Logging, Versioning, Archival and Expiration Lifecycle, Torrent, Tags, Redundancy, Requester Pays, Encryption, Reduced Redundancy and more
     Demo
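
The durability figure on this slide can be made concrete with a little arithmetic. The sketch below assumes the 99.999999999% is read as an average annual per-object durability; the function names are hypothetical and exist only for illustration.

```python
# Hedged arithmetic sketch: reads the slide's 99.999999999% as an average
# annual durability per object (an assumption about how to read the figure).
ANNUAL_LOSS_RATE = 1e-11  # 100% minus 99.999999999%, written directly to avoid float rounding


def expected_annual_losses(object_count: int, loss_rate: float = ANNUAL_LOSS_RATE) -> float:
    """Expected number of objects lost per year under the stated assumption."""
    return object_count * loss_rate


def years_per_single_loss(object_count: int, loss_rate: float = ANNUAL_LOSS_RATE) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_losses(object_count, loss_rate)


if __name__ == "__main__":
    # With 10,000 stored objects, print the average interval between losses.
    print(years_per_single_loss(10_000))
```

Under that reading, someone storing 10,000 objects would wait on the order of ten million years, on average, before losing one.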
  4. 1. “Low, pay-as-you-go pricing with no up-front expenses or long-term commitments.”
     2. “Instantly deploy new applications, scale up as your workload grows, and scale down based on demand.”
     http://aws.amazon.com/about-aws/
  5. “We will make electricity so cheap that only the rich will burn candles.”
     Thomas Edison
     The Big Switch: http://amzn.com/039334522X
  6. Security, Compliance, Capacity, Fault Tolerance, Cost, Complexity, Billing, Scalability, Availability, Latency, Throughput, …
  7. Amazon Kinesis
     • Real-time processing of streaming data
     • High throughput and elastic
     • Integrates with Amazon S3, Amazon Redshift, and Amazon DynamoDB
     • Locking, sharding, rollback and more with the Kinesis Client Library
     Diagram labels: Dashboard, CEP, Storage
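
The sharding mentioned on this slide can be sketched in a few lines. Kinesis hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. The even split of the hash space and the function names below are assumptions made for illustration, not the Kinesis Client Library API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the hash space into contiguous inclusive ranges (hypothetical even split)."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("hash value outside every range")


if __name__ == "__main__":
    ranges = shard_ranges(4)
    for key in ("user-1", "user-2", "user-3"):
        print(key, "->", shard_for_key(key, ranges))
```

Records with the same partition key always land on the same shard, which is what lets a consumer process one key's events in order.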
  8. Amazon Elastic MapReduce
     • Distributed processing with Apache Hadoop
     • Near-linear scalability
     • Resizable and disposable clusters
     • Apache Hadoop ecosystem: Hive, Pig, Impala, Spark, …
     • Instant automatic provisioning
     • Simplified administration
     • 5.5M+ clusters
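
As a reminder of the programming model that Hadoop distributes across a cluster, here is a single-process sketch of the map, shuffle and reduce phases for a word count. It is an illustration of the model only, not Hadoop or EMR code, and every function name is hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

Because each map call sees one line and each reduce call sees one key, both phases can run on many machines at once, which is where the near-linear scalability comes from.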
  9. Amazon Redshift
     • Petabyte-scale data warehousing
     • Massively parallel online analytical processing (OLAP)
     • Resizable without downtime
     • Managed provisioning and administration
     • Compatible with PostgreSQL
  10. Amazon Redshift Architecture
      Leader Node
      • SQL endpoint
      • Stores metadata
      • Coordinates query execution
      Compute Nodes
      • Local, columnar storage
      • Execute queries in parallel
      • Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH
      Two hardware platforms
      • Optimized for data processing
      • DW1: HDD; scale from 2TB to 1.6PB
      Diagram labels: SQL Clients/BI Tools, JDBC/ODBC, Leader Node, Compute Node (×3), 10 GigE (HPC), Ingestion, Backup, Restore, Amazon S3 / DynamoDB / SSH; each node box reads 128GB RAM, 16TB disk, 16 cores
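
The division of labor between the leader node and the compute nodes can be sketched as a fan-out and merge. This is a toy model under stated assumptions: the sample data, the column names and both functions are hypothetical, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Each hypothetical compute node stores its slice of the table column by column.
NODE_SLICES = [
    {"region": ["south", "north"], "amount": [10, 20]},
    {"region": ["south", "south"], "amount": [5, 7]},
    {"region": ["north"], "amount": [3]},
]


def partial_sum(node_slice: dict, region: str) -> int:
    """Compute node: scan only the two columns the query needs."""
    pairs = zip(node_slice["region"], node_slice["amount"])
    return sum(amount for row_region, amount in pairs if row_region == region)


def leader_sum(region: str, slices: list = NODE_SLICES) -> int:
    """Leader node: send the query to every slice in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(lambda s: partial_sum(s, region), slices))
    return sum(partials)


if __name__ == "__main__":
    print(leader_sum("south"))
```

Columnar storage matters here because an aggregate over `amount` filtered by `region` never has to read any other column of the table.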
  11. ETL from EMR/Hive to Amazon Redshift through Amazon S3
      EMR → S3 → Redshift
      Steps: Extract & Transform, then Load
      Data state at each stage: unstructured and unclean; structured and clean; columnar and compressed
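
The load step of this pipeline is typically a Redshift `COPY` from an S3 prefix. The helper below only builds the statement text; the table name, bucket, role ARN, and the choice of gzip-compressed, tab-delimited files are all assumptions for illustration, and the statement would still need to be run through a SQL client.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for gzip-compressed, tab-delimited files.

    All three arguments are hypothetical placeholders. They are interpolated
    as-is, so they must come from trusted configuration, never from user input.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "DELIMITER '\\t' GZIP;"
    )


if __name__ == "__main__":
    print(
        build_copy_statement(
            table="page_views",  # hypothetical target table
            s3_prefix="s3://example-bucket/clean/page_views/",  # hypothetical EMR output
            iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        )
    )
```

Pointing `COPY` at a prefix rather than a single file lets the compute nodes load the files under it in parallel.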
  12. Amazon Auto Scaling
      • Adjust capacity to demand
      • Automated and customizable provisioning
      • Integrated monitoring and load balancing
      • Maintain fleet size across availability zones
      • On-demand or scheduled actions
      Demo
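
The first and fourth bullets can be sketched as two small calculations: pick a fleet size from the current demand, bounded by a minimum and maximum, then spread it as evenly as possible across availability zones. This is a simplified model with hypothetical names and numbers, not the Auto Scaling API.

```python
import math


def desired_capacity(current_load: float, load_per_instance: float,
                     min_size: int, max_size: int) -> int:
    """Instances needed for the load, clamped to the group's size limits."""
    needed = math.ceil(current_load / load_per_instance)
    return max(min_size, min(max_size, needed))


def spread_across_zones(capacity: int, zones: list[str]) -> dict[str, int]:
    """Distribute instances evenly; the first zones take any remainder."""
    base, extra = divmod(capacity, len(zones))
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


if __name__ == "__main__":
    # Hypothetical numbers: 950 requests/s, 100 requests/s per instance.
    size = desired_capacity(950, 100, min_size=2, max_size=20)
    print(size, spread_across_zones(size, ["zone-a", "zone-b", "zone-c"]))
```

The minimum keeps the service available when demand is near zero, and the maximum caps cost if demand spikes.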
  13. Videos and Talks:
      https://www.youtube.com/user/AmazonWebServices
      Blogs, Forum and Community:
      http://awshub.com.br
      http://aws.amazon.com/blogs/aws/
      https://twitter.com/AWSBrasil
      https://www.facebook.com/amazonwebservices.pt
      https://www.facebook.com/groups/amazon.aws/
      Courses:
      https://aws.amazon.com/training/
      Podcast:
      http://aws.amazon.com/podcasts/aws-podcast/