Prestissimo at IBM

Prestissimo at IBM Aditi Pandit VeloxCon April 2024

Talk Outline
• Ahana -> IBM Data & AI
• IBM Watsonx
• Prestissimo will be a core engine for the Watsonx.data open data lakehouse
• Updates
• Prestissimo in 2023
• Prestissimo in 2024
• Presto 2.0?

IBM WatsonX https://developer.ibm.com/articles/awb-watsonx-enterprise-data-and-ai-platform/

Screenshots

• SaaS
• On-premise
• BlueRay – Bring your own cloud

Prestissimo introduction
• C++ worker
• Built over the Velox library
• SIMD
• Runtime optimizations
• Smart I/O prefetching/caching
• Memory arbitration features

Benefits of Prestissimo/Velox
• Huge performance boost
  ◦ Query processing can be done with much smaller clusters.
• Avoids performance cliffs
  ◦ No Java processes, JVM or garbage collection.
  ◦ Memory management and SIMD are explicitly controlled in C++.
  ◦ Memory arbitration improves efficiency.
• Easier to build and operate at scale
  ◦ Reusable and extensible primitives across data engines (like Spark).
  ◦ Performance can be better understood.

Prestissimo in 2023 (post IBM acquisition)
Goals:
• Inner-source Prestissimo in the IBM tech stack and make it available to broader teams.
  ◦ Watsonx.data dev team, devops, testing, docs, research (DB and Storage), customer demos
• Build a solid Presto OSS team.
  ◦ S/W Dev, Devops

Features
• CTAS
• S3 Compatible Reader/Writer
• Parquet Reader/Writer
• Expand SQL Coverage
• TPC-H and TPC-DS for 1K and 10K
• Full Presto SQL SELECT statement syntax supported
• Type/Function coverage
• JWT and Cert based TLS

presto-native-tests status (presto-tests for native)
Bugs opened: 33
Bugs closed: 11

Prestissimo in 2024
• Iceberg Reader
• Prestissimo production readiness
• Metrics collection with Prometheus
• New Velox system implementation
• AsyncDataCache
• Spilling
• Memory arbitration
• TPC-DS benchmark runs

Performance testing and tracking framework
• Pbench tool -> Automates deployments, choosing workloads and running them.
• Performance dashboards
• Plan comparison tool

TPC-DS 1K results (Presto OSS 0.287)
Native: 33.6 mins
Java: 1.15 hrs
Cluster: r5.4xlarge, 8 workers (vCPU: 128, Memory: 1024 GB)

TPC-DS 10K results (Presto OSS 0.287)
Native: 1.6 hrs
Java: 2.83 hrs
Cluster: r5.8xlarge, 16 workers (vCPU: 512, Memory: 4096 GB)
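The two TPC-DS results above are quoted in mixed units (minutes and hours), so the speedup is easier to read after converting everything to minutes. The sketch below does only that arithmetic on the four numbers from the slides; the dictionary layout and the function names are illustrative assumptions, not part of any Presto or Pbench tooling.

```python
# Wall-clock times quoted on the TPC-DS 1K and 10K slides, in minutes.
# The structure and names here are hypothetical, chosen for illustration.
RESULTS = {
    "TPC-DS 1K": {"native_min": 33.6, "java_min": 1.15 * 60},
    "TPC-DS 10K": {"native_min": 1.6 * 60, "java_min": 2.83 * 60},
}


def speedup(native_min, java_min):
    """Return how many times faster the native run is than the Java run."""
    return java_min / native_min


def summarize(results):
    """Map each benchmark name to its Java/native ratio, rounded to 2 places."""
    return {
        name: round(speedup(r["native_min"], r["java_min"]), 2)
        for name, r in results.items()
    }


if __name__ == "__main__":
    for name, ratio in summarize(RESULTS).items():
        print(f"{name}: native is about {ratio}x faster than Java")
```

On these figures the native engine is roughly 2x faster at the 1K scale and somewhat under 1.8x faster at the 10K scale. The two runs use different cluster sizes, so the ratios compare engines within a scale factor, not across them.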

Top issues from TPC-DS runs
• Control Velox memory usage: https://github.com/facebookincubator/velox/discussions/9008
• HashProbe performance: listJoinResults is very slow for joins whose keys have many duplicate matches: https://github.com/facebookincubator/velox/issues/9078
• Exchange performance
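To see why join keys with many matches stress the probe side, it helps to count how many rows an inner equi-join has to emit. The sketch below is not Velox's HashProbe or listJoinResults code; it is a minimal, assumed model that only counts output rows, and the function name is hypothetical.

```python
from collections import Counter


def join_output_rows(build_keys, probe_keys):
    """Count the rows an inner equi-join emits.

    Each probe row produces one output row per build row with the same key,
    so heavily duplicated keys multiply the output size.
    """
    build_counts = Counter(build_keys)
    return sum(build_counts[key] for key in probe_keys)


if __name__ == "__main__":
    unique = join_output_rows(range(1000), range(1000))
    skewed = join_output_rows([1] * 1000, [1] * 1000)
    print(f"unique keys: {unique} output rows")
    print(f"one hot key: {skewed} output rows")
```

Both inputs have 1000 rows per side, but the single hot key yields a thousand times more output rows, which is the kind of case where the time spent listing join results can dominate.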

Presto 2.0
Native Engine
• SQL feature complete
• Production readiness
• Performance improvements
Lakehouse formats
• Iceberg (Reader/Writer)
• Hudi, Delta
Connector SPI
• Feature lockdown, UDF, UDAs
• Connector Optimizer
Expand Presto optimizer for enterprise use-cases with Prestissimo
• Open search space. Multi-query block merges, comprehensive join enumeration, cost based logical tx. CTEs.
• Cardinality estimation. New theoretically sound architecture. Add histogram estimators.
• HBO. Plan lockdown.
• Prestissimo focused physical plans.

Team acknowledgements
• Ahana: Ethan Zhang (mgr), Aditi Pandit, Deepak Majeti, Ying Su, Tim Meehan, Karteek Murthy, Pramod Satya
• IBM: Ashok Kumar (mgr), Christian Zentgraf, Michael Ohsaka, Minhan Cao, Sujit Madiraju, Prateek Dabre, Soumya Duriseti, Manoj Negi…

Please join our community!
• Presto Native Worker Working Group
• Prestissimo: GitHub, Slack, LinkedIn, Meetup
• Velox: GitHub, Slack, Website
