A Case Study in API Cost of Running Presto in the Cloud at Scale (Hope Wang & Chunxu Tang, Alluxio) | RTA Summit 2024

The migration of data-intensive analytics applications to cloud-native environments promises enhanced scalability and flexibility but introduces complex cost models that pose new challenges to traditional optimization strategies. While on-premises setups focused on speed, cloud deployments require a more nuanced approach, factoring in cloud storage operations costs, which can escalate rapidly in real-world scenarios.

In this presentation, Hope Wang and Chunxu Tang will analyze these challenges through a case study of a large Presto deployment. They will present their findings on the unexpected cost implications of standard I/O optimizations, such as table scans, filters, and broadcast joins, when implemented in cloud environments. They will also highlight the need for a paradigm shift in optimizing data-intensive applications for the cloud and advocate for new I/O strategies that balance performance and cost while remaining tailored to the unique demands of cloud ecosystems.

By attending this session, you will:

• Understand the complexities and cost challenges of optimizing data-intensive applications in cloud-native environments.

• Gain insights from an in-depth case study on the economic implications of traditional I/O optimizations in cloud deployments.

• Learn about the need for a paradigm shift towards developing I/O strategies that are both performance-efficient and cost-effective in the cloud.

StarTree

May 16, 2024

Transcript

  1. A Case Study in API Cost of Running Presto in the Cloud at Scale
     Hope Wang, Developer Advocate @ Alluxio
     Bin Fan, VP of Technology @ Alluxio
  2. The Evolution of the Modern Data Stack
     • 15 years ago: tightly-coupled MapReduce & HDFS, on-prem HDFS, sequential access, row-based files
     • Today: compute-storage separation, cloud data lakes, columnar files in lakes
     • More elastic, cheaper, easier to manage, more scalable, more efficient
  3. The Evolution of the Modern Data Stack
     • 10+ years ago: tightly-coupled MapReduce & HDFS, on-prem HDFS, sequential access, row-based files
     • Today: compute-storage separation, cloud data lakes, columnar files in lakes
     • ⚠ Loss of data locality  ⚠ Different cost model  ⚠ Less sequential data access
  4. In the On-Prem Days… Data Requests Are “Free”
     • Organizations pay upfront for on-premises infrastructure
     • There is no additional cost per query or per data retrieval
  5. Data Request Cost of Cloud Storage
     Data transfer/egress pricing (cross-region, hybrid/multi-cloud); ingress is free for all three providers (a rough cost example follows below):
     • Amazon S3: 100 GB - 10 TB $0.09/GB; 10 - 50 TB $0.085/GB; 50 - 150 TB $0.07/GB; > 150 TB $0.05/GB; > 500 TB contact sales
     • Google Cloud Storage: 0 - 1 TB $0.12/GB; 1 - 10 TB $0.11/GB; > 10 TB $0.08/GB; customized: contact sales
     • Azure Blob Storage: 100 GB - 10 TB $0.087/GB; 10 - 50 TB $0.083/GB; 50 - 150 TB $0.07/GB; 150 - 500 TB $0.05/GB; > 500 TB contact sales
     (The slide also lists cloud storage operations pricing.)
     Notes: pricing applies to the AWS US East (Ohio) region, GCP North America regions, and Azure North America regions, as of May 1, 2024.
     Sources: https://aws.amazon.com/s3/pricing/, https://cloud.google.com/storage/pricing, https://azure.microsoft.com/en-us/pricing/details/bandwidth/, https://azure.microsoft.com/en-us/pricing/details/storage/blobs/
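For a sense of scale, here is a quick illustrative calculation against the S3 egress tiers above, assuming the tiers apply marginally; the 20 TB/month cross-region transfer volume is a made-up figure, not from the slide:

```python
# Rough cross-region egress estimate using the S3 tiers listed above.
# Assumptions: tiers apply marginally, the first 100 GB is free, and the
# 20 TB/month transfer volume is purely illustrative (not from the slide).
free_gb = 100
tier1_rate, tier1_cap_gb = 0.09, 10_000    # 100 GB - 10 TB at $0.09/GB
tier2_rate = 0.085                         # 10 - 50 TB at $0.085/GB

monthly_egress_gb = 20_000                 # assume ~20 TB leaves the region per month
tier1_gb = min(monthly_egress_gb, tier1_cap_gb) - free_gb
tier2_gb = max(0, monthly_egress_gb - tier1_cap_gb)
cost = tier1_gb * tier1_rate + tier2_gb * tier2_rate
print(f"~${cost:,.0f}/month")              # about $1,741/month under these assumptions
```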
  6. Data Request Size of a Presto Worker
     • Large volume: a Presto cluster can process 1 ~ 10 PB per day
     • Small data requests dominate the I/O
       ◦ 50% of requests < 10 KB
       ◦ 90% of requests < 1 MB
     (Cumulative distribution function over ~5 days)
     Source: Rethinking the Cloudonomics of Efficient I/O for Data-Intensive Analytics Applications
  7. A Back-of-the-Envelope Calculation
     1 PB of data access per day (a medium-sized Presto cluster) ÷ 100 KB per request on average ≈ 10 billion requests per day
     10 billion requests/day × $0.0004 per 1,000 GET requests (Amazon S3 pricing) × 365 days ≈ $1.4 million per year
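The same estimate written out as a small Python sketch; the inputs are exactly the figures from the slide:

```python
# Back-of-the-envelope estimate of annual S3 GET-request cost for a Presto cluster,
# using the figures from the slide: 1 PB accessed per day, ~100 KB per request,
# and Amazon S3 GET pricing of $0.0004 per 1,000 requests.
DAILY_BYTES = 1e15              # 1 PB of data accessed per day
AVG_REQUEST_BYTES = 100e3       # ~100 KB per request on average
GET_PRICE_PER_1000 = 0.0004     # USD per 1,000 GET requests

requests_per_day = DAILY_BYTES / AVG_REQUEST_BYTES           # ~10 billion
daily_cost = requests_per_day / 1000 * GET_PRICE_PER_1000    # ~$4,000
annual_cost = daily_cost * 365                               # ~$1.46 million

print(f"requests/day: {requests_per_day:,.0f}")
print(f"annual GET cost: ${annual_cost:,.0f}")
```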
  8. 10% of your data is hot data: add a caching layer between compute & storage
     Source: Alluxio
  9. Cost Reduction by Adding Cache
     • Without cache: Presto frequently retrieves data from AWS S3 (us-east-1, us-west-1) = high GET/PUT operations costs & data transfer costs
     • With cache: fast access with hot data cached; data is retrieved from S3 only when necessary = lower S3 costs
  10. Performance: Reduce I/O Time
     • Without cache: every task alternates I/O and compute, paying I/O time on each remote read
     • With cache: I/O happens only the first time remote data is retrieved; subsequent reads hit the cache, so total job run time is reduced
  11. Alluxio: A Caching Framework to Fit Different Needs
     • Alluxio Edge Cache (Local Cache)
       ◦ Runs as a library in the application processes (Presto, Trino, HDFS DataNode)
       ◦ Leverages local NVMe disks or memory
       ◦ Use when the hot data fits on local disks
     • Alluxio Distributed Cache
       ◦ Standalone cache service shareable across applications
       ◦ Cache capacity scales horizontally
  12. Alluxio Edge Cache in Presto
     Inside the Presto server JVM, a Presto worker issues HDFS API calls to the Alluxio Caching File System: on a cache hit it reads from local cache storage (managed by the Alluxio Cache Manager); on a cache miss it falls through to the external file system and external storage (a rough sketch of this flow follows below).
     • Battle-tested at Uber, Meta, TikTok, etc.
     • Supports Iceberg, Hudi, Delta Lake, and Hive tables
     • Supports various file formats such as Parquet, ORC, and CSV
     • Fully optimized for local NVMe storage
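As a rough, hypothetical illustration of that hit/miss flow (this is not the Alluxio API; the class and method names are made up for the sketch):

```python
import os

class ReadThroughCache:
    """Hypothetical sketch of the cache-hit / cache-miss flow described above:
    serve hits from local cache storage, fall through to the external file
    system on a miss, then populate the local cache."""

    def __init__(self, cache_dir, external_read):
        self.cache_dir = cache_dir          # local NVMe/SSD directory
        self.external_read = external_read  # callable: path -> bytes (remote GET)
        os.makedirs(cache_dir, exist_ok=True)

    def _local_path(self, key):
        # A real cache manages pages and hashes keys; this is simplified.
        return os.path.join(self.cache_dir, key.replace("/", "_"))

    def read(self, key):
        local = self._local_path(key)
        if os.path.exists(local):           # cache hit: no cloud API call
            with open(local, "rb") as f:
                return f.read()
        data = self.external_read(key)      # cache miss: one remote request
        with open(local, "wb") as f:        # populate the cache for next time
            f.write(data)
        return data
```

Every avoided fall-through is one fewer billable request against the object store, which is where the cost savings in the earlier slides come from.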
  13. Alluxio Edge Cache Data Management
     Alluxio Edge Cache provides cache eviction and admission (a minimal sketch of these policies follows below):
     • Supports LRU and FIFO cache eviction policies
     • Supports customized cache admission policies
     • Supports TTL
     • Supports data quota
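For intuition only, here is a minimal LRU cache with a TTL and an entry quota; it is a sketch of the listed policies, not Alluxio's implementation:

```python
import time
from collections import OrderedDict

class LruTtlCache:
    """Minimal sketch: LRU eviction plus a TTL and a max-entry quota."""

    def __init__(self, max_entries=1000, ttl_seconds=3600):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._data = OrderedDict()           # key -> (value, insert_time)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None                      # miss
        value, ts = item
        if time.time() - ts > self.ttl:      # TTL expired: treat as a miss
            del self._data[key]
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return value

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (value, time.time())
        while len(self._data) > self.max_entries:  # enforce the quota (evict LRU)
            self._data.popitem(last=False)
```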
  14. Challenges of Running Presto @ Uber with Alluxio
     • Real-time partition updates
     • Cluster membership changes
     • Cache size restriction
     Source: Speed Up Presto at Uber with Alluxio Local Cache
  15. Challenge #1: Real-time Partition Updates
     • Many tables/partitions are constantly changing
       ◦ Hudi tables
       ◦ Queries constantly performing upserts
     • This causes outdated partitions in the cache
       ◦ A partition may have changed on HDFS while Alluxio still caches the outdated version
     Source: Speed Up Presto at Uber with Alluxio Local Cache
  16. Challenge #1: Real-time Partition Updates
     • Solution: add the Hive partition's latest modification time to the caching key (sketched below)
       ◦ hdfs://<path> → hdfs://<path><mod time>
     • The partition with the latest modification time gets cached under a new key
       ◦ Each update creates a separate cache copy
     • Trade-off: the outdated partition stays in the cache, wasting cache space until it is evicted
     Source: Speed Up Presto at Uber with Alluxio Local Cache
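A minimal sketch of this keying scheme; the function name and the example path and timestamp are made up for illustration:

```python
def cache_key(path: str, hive_modification_time: int) -> str:
    """Append the partition's latest Hive modification time to the cache key:
    hdfs://<path> -> hdfs://<path><mod time>. An updated partition then maps
    to a new key, so the stale cached copy is simply never looked up again."""
    return f"{path}{hive_modification_time}"

# Hypothetical example: once the partition is rewritten with a newer timestamp,
# the old entry keyed by the earlier mod time just stops being referenced.
key = cache_key("hdfs://warehouse/table_bar/dt=2024-05-01/", 1714586400)
```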
  17. Challenge #2: Cluster Membership Change
     • A file is pinned to a set of worker nodes for cache efficiency
       ◦ SOFT_AFFINITY in Presto
       ◦ By default, a modulo hash function maps partitions to nodes
     • Presto worker nodes may go up/down due to operational activities
       ◦ Node crash / maintenance
       ◦ Ad-hoc node restart
     • The hash function maps keys to the wrong nodes when membership changes, e.g.
       ◦ With 3 nodes, key#4 → 4 mod 3 = node#1
       ◦ With one node down, key#4 → 4 mod 2 = node#0
         ▪ Wrong node, cache miss
     Source: Speed Up Presto at Uber with Alluxio Local Cache
  18. Challenge #2: Cluster Membership Change
     Solution: consistent hashing (a sketch follows below)
     • All nodes are placed on a virtual ring
     • The relative ordering of nodes on the ring does not change when membership changes
     • Look up the key (file) on the ring
     • Replication for better robustness
     Source: Speed Up Presto at Uber with Alluxio Local Cache
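For intuition, a minimal consistent-hashing ring with virtual nodes; this is an illustrative sketch, not the hashing scheme Presto or Alluxio actually ships:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hashing ring with virtual nodes. When one node
    leaves, only the keys that mapped to it move; keys elsewhere on the
    ring keep their assignments (unlike modulo hashing)."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []                                  # sorted (hash, node) pairs
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str, vnodes: int = 100):
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node0", "node1", "node2"])
before = ring.get_node("hdfs://warehouse/table_bar/part-004")
ring.remove_node("node1")                                # one node goes down
after = ring.get_node("hdfs://warehouse/table_bar/part-004")
# Unless node1 owned this key, before == after: most cached data stays on its node.
```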
  19. Challenge #3: Cache Size Limitation
     • PBs read by Presto queries >> disk space available on worker nodes
       ◦ ~74 PB of total data accessed daily vs. ~120 TB of disk space per cluster
     • Cache inefficiency due to:
       ◦ High eviction rate
       ◦ High cache miss rate
     Source: Speed Up Presto at Uber with Alluxio Local Cache
  20. Challenge #3: Cache Size Limitation
     • Solution: selective caching (an example admission check is sketched below)
       ◦ Cache only a selected subset of the data
         ▪ tables to cache + partitions to cache
         ▪ based on traffic pattern analysis
     • Greatly increased the cache hit rate
       ◦ from ~65% to >90%
     Example configuration:
       { "databases": [{
           "name": "database_foo",
           "tables": [{
             "name": "table_bar",
             "maxCachedPartitions": 100 }] }] }
     Source: Speed Up Presto at Uber with Alluxio Local Cache
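A small sketch of how such a configuration could drive cache admission; the config shape follows the slide, but the should_cache helper and its loading logic are hypothetical:

```python
import json

# Config shape taken from the slide; the admission helper below is hypothetical.
SELECTIVE_CACHE_CONFIG = json.loads("""
{ "databases": [{ "name": "database_foo",
                  "tables": [{ "name": "table_bar",
                               "maxCachedPartitions": 100 }] }] }
""")

def should_cache(database: str, table: str, cached_partition_count: int,
                 config: dict = SELECTIVE_CACHE_CONFIG) -> bool:
    """Admit data only for configured tables, up to maxCachedPartitions."""
    for db in config.get("databases", []):
        if db["name"] != database:
            continue
        for tbl in db.get("tables", []):
            if tbl["name"] == table:
                return cached_partition_count < tbl.get("maxCachedPartitions", 0)
    return False  # not listed: bypass the cache and read straight from remote storage

print(should_cache("database_foo", "table_bar", cached_partition_count=42))   # True
print(should_cache("database_foo", "other_table", cached_partition_count=0))  # False
```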
  21. Cloud Cost Saving
     In the dashboard shown while the TPC-DS tests were running:
     • Left side: total # of cloud API calls / total # of API calls × 100% (i.e., the API call saving)
     • Right side: total volume read from the cloud / total volume read × 100% (i.e., the read volume saving)
  22. TPC-DS Benchmark of Presto + Alluxio Edge Cache
     Comparison of query execution times for TPC-DS Query 81 through Query 99, without and with the Presto local cache.
  23. Choose the Right Cache
     1. Metastore Cache: slow planning time; slow Hive Metastore; large tables with hundreds of partitions
     2. List File Cache: overloaded HDFS NameNode; overloaded object store like S3
     3. Fragment Result Cache: duplicated queries
     4. Alluxio Edge Cache: slow or unstable external storage
     5. Alluxio Distributed Cache: cross-region, multi-cloud, hybrid-cloud; data sharing with other compute engines
  24. THANK YOU!
     Hope Wang, Alluxio ([email protected])
     Bin Fan, Alluxio ([email protected])
     Scan the QR code for a Linktree with great learning resources, exciting meetups & a community of data & AI infra experts!