
Agenda
All times are in Pacific Daylight Time (PDT)


Welcome to PrestoCon Day! Join us for a day of all things open-source Presto. You’ll hear from Presto Foundation Chairs Curt and Ali as they share the latest updates from the community and what to expect for the day.

Tim shares the state of the project and what to look forward to next. The talk covers advancements in native processing and improvements to the C++ experience, new plugins and extension points for customizing Presto, support for table formats such as Iceberg and Delta Lake, and more.

In this session, we will delve into how Presto is empowering various use cases for both internal and external interactive analytics. We will explore the unique challenges that come with operating at Meta’s scale and discuss our strategies for overcoming them.




This panel brings together data leaders from Intuit, Meta, and IBM to share how large-scale organizations are architecting modern data platforms for speed, scale, and flexibility. The discussion spans real-world B2C challenges, internal innovations at hyperscale, and perspectives on where the stack is headed, especially in the age of AI.


At Uber, Presto is a critical engine for interactive analytics, processing hundreds of thousands of queries and scanning hundreds of petabytes of data daily. To meet the immense demands for low-latency queries and high reliability, Uber advanced its Alluxio deployment by engineering key architectural enhancements for greater scalability and reliability. This customized Alluxio system forms the backbone of our distributed remote caching layer, managing a cache that has scaled from 3 to 4 petabytes. This talk will delve into Uber’s strategies for achieving 99.99% cache reliability with this enhanced system, featuring robust client fallback mechanisms and the use of consistent hashing to maintain efficiency during cluster scaling.
A significant outcome of this implementation is substantial egress-bandwidth savings from the underlying storage, which is particularly crucial for performance and cost efficiency during peak hours. We will share insights into managing these large-scale cache clusters, highlighting our adaptive cache filter, which has been instrumental in achieving over 80% cache hit rates and optimizing resource utilization. Attendees will learn about the tangible benefits, best practices for leveraging Alluxio with Presto in high-throughput environments, and key takeaways for deploying a similar high-performance caching solution.
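The consistent-hashing technique mentioned above keeps cache placement stable as nodes join or leave: only keys owned by the departing node move, so hit rates survive cluster scaling. A minimal illustrative sketch (not Uber's or Alluxio's actual implementation; node names, hash choice, and replica count are assumptions):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps cache keys to cache nodes via a hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        # replicas = virtual nodes per physical node; more replicas
        # smooths the key distribution across nodes.
        self.replicas = replicas
        self._keys = []   # sorted hash positions on the ring
        self._ring = []   # (hash, node) pairs, parallel to _keys
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            idx = bisect.bisect(self._keys, h)
            self._keys.insert(idx, h)
            self._ring.insert(idx, (h, node))

    def remove_node(self, node):
        # Only keys that mapped to this node are reassigned;
        # every other key keeps its current owner.
        kept = [(h, n) for h, n in self._ring if n != node]
        self._ring = kept
        self._keys = [h for h, _ in kept]

    def node_for(self, key):
        if not self._ring:
            raise RuntimeError("empty ring")
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]
```

For example, removing one node from a three-node ring leaves every key that was owned by the other two nodes exactly where it was, which is the property that makes scaling events cheap for a cache.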


This talk will go over the current status of the Fusionnext project, focusing on what the native sidecar and sidecar plugins support, and how they should be configured in a Presto C++ deployment.


For an open-source project to thrive, it’s crucial to simplify the onboarding process for new contributors. This talk will guide you through setting up a development environment for Presto C++ projects using dev containers. By leveraging dev containers, new contributors can quickly start working on these projects, ensuring consistency and enhancing productivity across various operating systems.



In this talk, we present a recent extension to Prestissimo to support AI training data normalization at Meta. We describe the AI Data Storage system built at Meta to deduplicate user sequence data and enable fast retrieval of aggregated data across different dimensions. We then take a deep dive into the changes made to Prestissimo to allow user sequence data exploration through Presto SQL, including the introduction of a new Index Join Operator and an AI Data Storage Connector. Our extension enables optimized index join query plan generation and end-to-end query execution optimization.

When Presto queries fail—due to memory limits, skewed joins, or connector issues—engineers often scramble to diagnose and fix them manually. What if Presto could help fix itself? In this talk, I’ll present a prototype system that captures failed queries, analyzes failure patterns using LLMs, and automatically suggests (or retries) mitigated versions—e.g., adding session configs, breaking queries into smaller parts, or rewriting joins. We’ll cover how it works with query logs, the EXPLAIN plan, retry logic, and Presto’s session properties. This approach offers a glimpse into a future where Presto is not just fast, but smart and resilient.
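The retry flow described above, matching a failure message against known patterns and re-running the query with adjusted session properties, can be sketched in a few lines. This is a hedged illustration, not the presenter's prototype: the patterns, property values, and the `run_query` callable are assumptions, though `query_max_memory_per_node` and `query_max_execution_time` are real Presto session-property names.

```python
import re

# (failure-message pattern, session properties to set on retry) — assumed
# examples; real mitigations depend on the Presto version and deployment.
MITIGATIONS = [
    (re.compile(r"exceeded.*memory limit", re.I),
     {"query_max_memory_per_node": "8GB"}),
    (re.compile(r"exceeded.*time limit", re.I),
     {"query_max_execution_time": "2h"}),
]

def run_with_mitigation(run_query, sql, max_retries=2):
    """Run `sql`, retrying with mitigating session properties on failure.

    `run_query(sql, session)` is any callable that executes the query with
    the given session-property dict and raises RuntimeError carrying the
    engine's failure message on error.
    """
    session = {}
    for attempt in range(max_retries + 1):
        try:
            return run_query(sql, dict(session))
        except RuntimeError as err:
            fix = next((props for pat, props in MITIGATIONS
                        if pat.search(str(err))), None)
            if fix is None or attempt == max_retries:
                raise  # unrecognized failure, or out of retries
            session.update(fix)  # apply mitigation and retry
```

A fuller system would consult an LLM to classify novel failure messages or rewrite the query itself; the loop structure (capture failure, propose mitigation, retry) stays the same.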

In this session, we will cover how to get started with creating dynamically loaded user-defined functions in Presto C++. We will introduce the pros and cons of this new functionality, walk through the development process, and finish with a live demo showing these functions being loaded.

This talk introduces a lightweight developer playground that demonstrates how to ingest change data from a transactional database (like Postgres or MySQL), register it via an open-source REST catalog (e.g., Polaris or LakeKeeper), and instantly make it queryable in Presto. The demo will walk through the setup, tools, and real-time experience of how quickly one can go from source data to interactive Presto queries using open standards and pluggable components. Ideal for developers and data engineers exploring modern lakehouse and federated query patterns.


Presto has a TPCDS connector that lets users generate TPCDS tables at different scale factors. Recently we worked on adding a TPCDS connector to Presto C++, building on DuckDB’s TPCDS extension. DuckDB’s TPCDS extension provides C++ files that wrap the dsdgen data generator, which is implemented in C and provided by the TPC organization. We initially added the TPCDS connector in Presto C++; subsequently, the data-generation parts, including the dsdgen source files, were moved to Velox. The TPCDS connector lets us generate TPCDS data on the fly at different scale factors in Presto C++ and write microbenchmarks in Velox for various TPCDS queries. In this talk, we will provide an overview of the implementation, look at the challenges faced in ensuring correctness, and compare the performance of the connector in Presto and Presto C++.

PrestoCon 2025 closing remarks.