Agenda

    All times are in PDT.

    9:00 AM – 9:15 AM | Welcome Remarks
    Curt Hu
    Presto Foundation Governing Board Chair & Sr. Engineering Manager, Uber
    Ali LeClerc
    Presto Foundation Community Chair & Head of Open Source Strategy, IBM

    Welcome to PrestoCon Day! Join us for a day of all things open-source Presto. You’ll hear from Presto Foundation Chairs Curt and Ali as they share the latest updates from the community and what to expect for the day.

    9:15 AM – 9:45 AM | TSC Keynote
    Tim Meehan
    Presto Foundation TSC Chair & Software Engineer, IBM

    Tim shares the state of the project and what to look for next. We go over advancements in native processing and improvements in the C++ experience, new plugins and functionality to modify Presto, support for table formats such as Iceberg and Delta, and more.

    9:45 AM – 10:15 AM | Interactive Warehouse at Meta
    Rohit Jain
    Software Engineering Manager, Meta

    In this session, we will delve into how Presto is empowering various use cases for both internal and external interactive analytics. We will explore the unique challenges that come with operating at Meta’s scale and discuss our strategies for overcoming them.

    10:15 AM – 11:05 AM | Panel: Building for today’s workloads, designing for tomorrow’s AI: Lessons from Meta, Intuit, and IBM on scaling infrastructure for the future
    Shradha Ambekar
    Sr. Staff Software Engineer, Intuit
    Kaushik Ravichandran
    Software Engineering Manager, Meta
    Anson Kokkat
    Principal Product Manager, IBM
    Ali LeClerc
    Head of Open Source Strategy, IBM

    This panel brings together data leaders from Intuit, Meta, and IBM to share how large-scale organizations are architecting modern data platforms for speed, scale, and flexibility. The discussion spans real-world B2C challenges, internal innovations at hyperscale, and perspectives on where the stack is headed, especially in the age of AI.

    11:05 AM – 11:35 AM | Powering a Petabyte-Scale Cache: Uber’s Alluxio Implementation
    Yangjun Zhang
    Software Engineer, Uber
    Beinan Wang
    Software Engineer, Uber

    At Uber, Presto is a critical engine for interactive analytics, processing hundreds of thousands of queries and scanning hundreds of petabytes of data daily. To meet the immense demands for low-latency queries and high reliability, Uber advanced its Alluxio deployment by engineering key architectural enhancements for greater scalability and reliability. This customized Alluxio system forms the backbone of our distributed remote caching layer, managing a cache size scaling from 3 to 4 petabytes. This talk will delve into Uber’s strategies for achieving 99.99% cache reliability with this enhanced system, featuring robust client fallback mechanisms and the use of consistent hashing to maintain efficiency during cluster scaling.

    A significant outcome of this implementation is substantial egress bandwidth savings from underlying storage, which is particularly crucial for performance and cost efficiency during peak hours. We will share insights into managing these large-scale cache clusters, highlighting our adaptive cache filter, which has been instrumental in achieving cache hit rates above 80% and optimizing resource utilization. Attendees will learn about the tangible benefits, best practices for leveraging Alluxio with Presto in high-throughput environments, and key takeaways for deploying a similar high-performance caching solution.

    11:35 AM – 12:05 PM | Enhancing Presto C++ capabilities using Sidecar
    Pramod Satya
    Software Engineer, IBM
    Pratik Dabre
    Software Engineer, IBM

    This talk will go over the current status of the Fusionnext project, focusing on what the native sidecar and sidecar plugin support and how they should be configured in a Presto C++ deployment.

    12:05 PM – 12:20 PM | Lightning Talk: Setting Up a Cross-Platform Development Environment for Presto C++ Using Dev Containers
    Miguel Blanco Godón
    Software Developer, Denodo
    Paula Santos García-Toriello
    Sr. Software Architect, Denodo

    For an open-source project to thrive, it’s crucial to simplify the onboarding process for new contributors. This talk will guide you through setting up a development environment for Presto C++ projects using dev containers. By leveraging dev containers, new contributors can quickly start working on these projects, ensuring consistency and enhancing productivity across various operating systems.

    12:20 PM – 1:00 PM | Break

    1:00 PM – 1:15 PM | Lightning Talk: Prestissimo Extension for AI Training Data Normalization at Meta
    Zac Wen
    Software Engineer, Meta
    Xiaoxuan Meng
    Software Engineer, Meta
    Wenqi Wu
    Software Engineer, Meta

    In this talk, we present a recent extension to Prestissimo to support AI training data normalization at Meta. We describe the AI Data Storage system built at Meta to deduplicate user sequence data and enable fast retrieval of aggregated data in different dimensions. We then deep dive into the changes made to Prestissimo to allow user sequence data exploration through Presto SQL, including the introduction of a new Index Join Operator and AI Data Storage Connector. Our extension enables optimized index join query plan generation and end-to-end query execution optimization.

    1:15 PM – 1:45 PM | Self-Healing Queries: A Prototype for AI-Assisted Troubleshooting and Auto-Retry in Presto Workloads
    Satej Kumar Sahu
    Principal Data Engineer, Zalando SE

    When Presto queries fail—due to memory limits, skewed joins, or connector issues—engineers often scramble to diagnose and fix them manually. What if Presto could help fix itself? In this talk, I’ll present a prototype system that captures failed queries, analyzes failure patterns using LLMs, and automatically suggests (or retries) mitigated versions—e.g., adding session configs, breaking queries into smaller parts, or rewriting joins. We’ll cover how it works with query logs, the EXPLAIN plan, retry logic, and Presto’s session properties. This approach offers a glimpse into a future where Presto is not just fast, but smart and resilient.

    1:45 PM – 2:15 PM | Unfenced UDF Deep Dive
    Soumya Duriseti
    Software Engineer, IBM

    In this session, we will cover how to get started with creating dynamically loaded user-defined functions in Presto C++. We begin with an introduction to the pros and cons of this new functionality, follow with an overview of the process, and finish with a live demo of loading these functions.

    2:15 PM – 2:45 PM | From Source to Presto: A Developer Playground for Fast Analytics
    Rohan Khameshra
    Co-founder, Datazip

    This talk introduces a lightweight developer playground that demonstrates how to ingest change data from a transactional database (like Postgres or MySQL), register it via an open-source REST catalog (e.g., Polaris or LakeKeeper), and instantly make it queryable in Presto. The demo will walk through the setup, tools, and real-time experience of how quickly one can go from source data to interactive Presto queries using open standards and pluggable components. Ideal for developers and data engineers exploring modern lakehouse and federated query patterns.

    2:45 PM – 3:00 PM | Lightning Talk: TPCDS connector in Presto C++
    Pramod Satya
    Software Engineer, IBM
    Pratik Dabre
    Software Engineer, IBM

    Presto has a TPCDS connector that lets users generate TPCDS tables with different scale factors. Recently we worked on adding a TPCDS connector in Presto C++, building on DuckDB’s TPCDS extension. DuckDB’s TPCDS extension provides C++ files that wrap the dsdgen data generator, which is implemented in C and provided by the TPC organization. We initially added the TPCDS connector in Presto C++; subsequently, the data generation parts, including the dsdgen source files, were moved to Velox. The TPCDS connector lets us generate TPCDS data on the fly for different scale factors in Presto C++ and write microbenchmarks in Velox for various TPCDS queries. In this talk, we will provide an overview of the implementation, look at the challenges faced in ensuring correctness, and compare the performance of the connector in Presto and Presto C++.

    3:00 PM – 3:05 PM | Closing Remarks
    Ali LeClerc
    Head of Open Source Strategy, IBM

    PrestoCon 2025 closing remarks.