The Computer History Museum in Mountain View, California, once again played host to the Linux Foundation’s PrestoCon 2023 conference on December 5 and 6. Part education, part celebration, and part good old-fashioned networking, this year’s event highlighted the excitement about Prestissimo, aka Presto 2.0.
The three hands-on sessions that made up the first day of PrestoCon — learning the basics of Presto, building an open data lakehouse, and getting started with Prestissimo — were filled to capacity. Note for next year: make sure you sign up for hands-on sessions early.
By the way – we plan to turn these into virtual workshops next year! Make sure to sign up for our community mailing list so you get notified.
Keynotes and Sponsored Keynotes
Following a welcome and opening remarks from Ali LeClerc, PrestoCon Chair and Product Manager at IBM, and Girish Baliga, Presto Foundation Chair and Director of Engineering at Uber, Day 2 of PrestoCon got rolling with a full morning of keynotes:
Tim Meehan, Chair of the Presto Technical Steering Committee and Software Engineer at IBM, explained how the Presto project is at an inflection point, declaring Presto to be “the biggest and healthiest community building a vertically integrated engine for the Open Data Lakehouse.” Tim outlined trends in data management over the years and how the Presto project is positioned to be the first choice for analytics for the lakehouse.
Naveen Cherukuri, Senior Engineering Manager at Meta, shared stories of the evolution and usage of Presto at Meta, and explained Meta’s future roadmap for Presto.
Vikram Murali, IBM’s Vice President, Development – Data and AI Software, discussed how Presto is used at IBM, including why IBM chose Presto to power watsonx.data, the open data lakehouse at IBM, why open source in general is critical to watsonx.data, and some key areas and features within Presto that the IBM Data & AI team is working on and contributing back to the open source project.
Pablo Álvarez-Yanez, a product manager at Denodo, described how Presto works with Denodo, a platform that provides data management capabilities across the distributed modern data landscape, to expand a data lake to an enterprise data fabric.
Srini Gurrapu, Founder & CEO, and Alonso Vega, Software Architect, at Bhuma talked about building real-time data apps for Presto open data lake architectures using modern APIs, with the goal of delivering actionable insights to facilitate business outcomes across all of an organization’s data sources.
Satya Krishnaswamy, IBM’s Program Director for Development – IBM Data & AI, provided an in-depth exploration of IBM’s forthcoming strategic investments to elevate query performance within watsonx.data, including preliminary performance results directly from IBM Labs.
Two Tracks of Technical Sessions
Not surprisingly, there was tremendous interest in the PrestoCon track devoted to Prestissimo, the project code name for the new C++ native Presto worker that builds on the open source Velox project and represents the next generation of Presto. Prestissimo sessions included:
- An introduction to Prestissimo presented by Aditi Pandit, Principal Software Engineer at IBM.
- Prestissimo benchmarks, showing a 1.7X performance gain compared with PrestoDB on TPC-DS end-to-end tests, from Changyang Gu and Shengxuan Liu, both Software Engineers at ByteDance.
- Accelerating Elasticsearch through Velox, by Sungho Park, Software Engineer at ByteDance.
- A case study presented by Shiyu Gan, Software Engineer at ByteDance, on how an accidental discovery of the effect of batch size on aggregation led to two HashAggregation optimizations and an appreciation for efficient memory management.
- An exploration of the integration of open source support for hardware acceleration into Presto and Velox, revealing its transformative potential in data analytics, by Krishna Maheshwari, CPO, NeuroBlade.
- A synopsis by Amit Dutta, Software Engineer at Meta, of Meta’s experience running Prestissimo in production for over a year — and what they’ve learned that might help others considering Veloxification.
While Prestissimo garnered plenty of attention, the level of innovation happening in Presto overall was clear from the sessions in our other track, which included:
- Three Uber engineers — Hitarth Trivedi, Yasaman Samei, and Gurmeet Singh — described Presto production best practices at Uber and the infrastructure they’ve developed and use to govern the Presto query workload at their company.
- Zac Blanco, Software Developer at IBM, discussed statistics with sampling using Iceberg on Presto, highlighting how statistics help the optimizer make better decisions during Presto’s query planning phase.
- Meta Research Scientist Feilong Liu presented several optimization rules based on sub-optimal query patterns found in the past year in workloads within Meta.
- Two IBM Research Scientists — Berthold Reinwald and Nasrullah Sheikh — explained how they built a Presto Connector that brings vector search to Presto, and they demonstrated the Presto vector database connector.
- Representing a research collaboration, Dave Cohen, Senior Principal Engineer at Intel; Nesime Tatbul, Senior Research Scientist at Intel Labs and MIT; Christoph Anneser, Manager at the Technical University of Munich; and Ryan Marcus, Assistant Professor at the University of Pennsylvania, discussed their development of AutoSteer: an ML-based solution that automatically drives query optimization in any SQL database that exposes tunable optimizer knobs.
- Mingjia Hang, Software Engineer at Uber, outlined Presto Express, a sub-project under Presto governance at Uber that leverages historical data to predict upcoming query execution times and optimize cluster routing.
- Nadine Farah, Head of Developer Relations at Onehouse, talked about the use of Hudi, DBT, and Presto to push the boundaries of data processing speeds, leading to blazing-fast analytics.
- Two Alluxio representatives — Beinan Wang, Software Engineer, and Hope Wang, Developer Advocate — provided best practices and hints for how to use caching in Presto to overcome challenges such as slow, inconsistent query performance and high API and egress costs when using cloud storage like S3.
- Lyublena Antova, Software Engineer at Meta, explained how a History Based Optimizer (HBO) makes query plans more efficient by learning from similar queries in the past.
- Beinan Wang, Software Engineer at Alluxio, and Ajay Gupte, Software Engineer at IBM, teamed up to present a case study showing how PrestoDB and Iceberg can accelerate AI/ML pipelines for computer vision use cases.
Working together as a true collective, the Presto Foundation is advancing the Presto project in ways that would not be possible if it were owned by a single vendor. PrestoCon 2023 highlighted our progress and was a great opportunity to celebrate our advances and look forward to the future. Plus, it was so great to get together in person. A big thanks to everyone who participated in making PrestoCon 2023 such a huge success, including our sponsors Denodo, Bhuma, IBM, and Uber, and community partners CDInsights and Women in Big Data.
Stay tuned for our upcoming blog series where we’ll put together session-specific recaps!