Recap of PrestoCon Day 2024: Presto C++, performance, new connectors, use cases, and so much more

Last week the Linux Foundation/Presto Foundation hosted PrestoCon Day 2024, our annual virtual community event about the open-source Presto project. Throughout the day hundreds of data enthusiasts, engineers, and industry leaders from around the globe joined us to learn more about and celebrate the power of Presto. Big shout out to our sponsors who helped make the event possible – IBM, Uber and Women in Big Data.

We had 31 speakers from 16 companies come together to discuss topics ranging from Presto C++, the native Presto engine that’s in active development, to real-world use cases, performance, connectors and integrations, hardware, and GenAI and LLMs. It was awesome!

In this blog I’ll quickly recap the day. Over the next few weeks, we’ll be putting together detailed recaps of many of our PrestoCon Day sessions, so stay tuned for those. If you want to check out any of the on-demand sessions, they’re all available on our event platform (register for access); eventually, we’ll move them to our Presto Foundation YouTube channel.

Welcome Remarks and Keynote

The event kicked off with a welcome and keynote from Curt Hu, our Chair of the Presto Foundation, Tim Meehan, our Chair of the Technical Steering Committee, and me. We highlighted the growth and achievements of the Presto community over the last year, including 3X growth in commits, much faster times to close issues and PRs, and many new unique committers and submitters to the project.

Tim also shared more about the latest status of Presto 2.0, the native C++ engine – we now have some supported use cases! These include Hive for reads and writes, Iceberg for reads (writes coming soon), V1 and V2 tables, and the TPCH connector.

Curt also shared how Presto continues to grow within Uber for SQL across all of their data analytics, as well as Uber’s development and initial POCs of Presto C++, both of which are exciting to see.

A Glimpse into the Sessions

It’s hard to pick out what to highlight from PrestoCon Day – all our sessions were jam-packed with new features our community is developing, performance and benchmarking numbers, running Presto in different environments, and what our ecosystem partners are working on.

Here’s a quick rundown of what we heard at PrestoCon Day:

Optimizing Data Analytics at Etisalat Egypt: Mohamed Taha shared why they chose Presto over Trino, and why Presto is a game changer for Etisalat Egypt’s data analytics

Enabling Analytics with Presto at Apna: The Apna team shared their journey with Presto and some impressive stats of their scale

Presto C++ TPC-DS Updates: Aditi Pandit shared the latest TPC-DS benchmarking results and future enhancements planned for Presto C++

Presto 2.0 benchmarking internals at IBM: The IBM team presented their latest benchmarking results for Presto C++ v0.286 and the query optimizer

Accelerating Iceberg Queries for CDC: Roy Hasson discussed optimizing merge-on-read (MoR) with equality deletes in Presto to improve query performance, and the results are great – from 5.5 hours down to 39 seconds, roughly a 500x improvement

Exploring Cloud Intelligence on AWS: Henry Clavo shared insights on leveraging Presto for advanced SQL queries and accelerating analytical workflows on AWS, and put together a helpful comparison chart of Presto vs. AWS Athena

Other sessions included:

Presto Native Iceberg Support: Ying Su of IBM discussed the latest developments in supporting Apache Iceberg in the Presto native C++ engine.
Unraveling the Non-Deterministic Query Conundrum: Ge Gao, Krishna Pai, and Wei He from Meta presented their work on correctness verification of Prestissimo on non-deterministic queries.
Presto C++ and IBM watsonx.data for the Open Data Lakehouse: We learned more about IBM watsonx.data, the first platform offering Presto C++ for enhanced price-performance, as well as its components and use cases.
Leveraging TTL in Presto’s Local Cache: Chunxu Tang and Jianjian Xie from Alluxio discussed the implementation of caching time-to-live (TTL) in Presto’s local cache for data privacy and performance optimization.
Detecting and Resolving Performance Hurdles: Goutam Verma of WSO2 explored advanced monitoring strategies for detecting and resolving performance issues in Presto clusters.
Presto OpenAPI/HTTP Connector: Andrei Savu from Rippling introduced his work on the OpenAPI HTTP/JSON alternative to the Thrift Presto connector.
Presto Pinot DataLake Segment Reader: Mingjia Hang of Uber presented a new connector to access Pinot segments stored in deep store.
Enhancing Query Performance with Hudi: Ethan Guo from Onehouse explored the innovation and future enhancements of the Presto Hudi connector.
Streamlining Data Analytics with NeuroBlade’s SPU: Deepak Narain from NeuroBlade discussed enhancing the Velox analytics engine through specialized hardware acceleration.
Supporting ML Users with Presto: Pedro Pedreira of Meta highlighted the challenges and opportunities for supporting large-scale ML training datasets with Presto.
Bridging the Divide with Lance Data Lake: Lei Xu of LanceDB and Beinan Wang introduced a vector data lake based on Lance format, integrated with Presto.
Unlocking Language Insights: Satej Sahu discussed integrating Large Language Models into data ecosystems using a custom Presto connector.
Introducing Nimble File Format: Jialiang Tan and Jimmy Lu from Meta presented Nimble, a new file format designed for efficient handling of large datasets.

Reflections and Future Direction

PrestoCon Day 2024 was an incredible showcase of the passion and innovation within the Presto community. The event highlighted not just the technical advancements, but the collaborative spirit that makes Presto special. From tackling complex data challenges to sharing real-world success stories, each session underscored just how powerful and versatile Presto is.

I’ve heard from so many who attended that they left buzzing with new ideas and a renewed enthusiasm for what’s next, especially for what’s to come with Presto C++. The benchmarking numbers are just the start!

I can’t wait to see where we go from here. The future of Presto is bright, and it’s all thanks to the amazing contributions and energy of everyone involved. Let’s keep building, exploring, and pushing the boundaries of what’s possible with Presto. Stay tuned for more exciting developments! Together, we’re making data analytics better and more accessible for everyone.