Videos

On-Demand Recordings from PrestoCons, Webinars, Meetups, and more

    • Unraveling the Non Deterministic Query Conundrum for Prestissimo Verification

      We will present our work on enabling the correctness verification of Prestissimo on non-deterministic queries for Meta’s Presto production release. Non-deterministic queries constitute a large portion of production traffic, yet their results are not comparable between engines or between engine versions, posing a significant challenge to correctness verification for Prestissimo. In this talk, we will share how we divide the problem and leverage Presto Verifier and Velox Fuzzer to rewrite non-deterministic queries and verify correctness at both the query level and the expression level.

    • Sponsored session: Presto C++ and IBM watsonx.data for the Open Data Lakehouse

      Learn more about IBM watsonx.data, the open data lakehouse and the first platform to offer Presto C++ for better price-performance. In this session, Kevin will dive into the watsonx.data components, including Presto C++, Apache Spark, Milvus, and more. Learn how companies are using the watsonx.data platform to power all of their workloads at scale.

    • Enabling analytics with Presto at Apna

      Apna is the largest and fastest-growing professional opportunity platform in India. In this session, we will explore Apna’s journey with Presto, including its deployment on Kubernetes and the optimizations implemented to significantly reduce query times. Discover the strategies that have helped Apna achieve efficient and scalable data analytics.


    • PrestoCon Day 2024 Opening Remarks

      Welcome to PrestoCon Day! Join us for a day of all things open-source Presto. You’ll hear from Presto Foundation Chairs Curt and Ali as they share the latest updates from the community and what to expect for the day.

    • Getting started with the new Redis HBO for Presto (Aug 30, 2023)

      Learn more about the new open-source Redis-based Historical Statistics Provider for Presto from Jay Narale, the software engineer at Uber who built it. Redis is an open-source in-memory database that integrates with Presto through a dedicated connector. Now, with a Redis history-based optimizer (HBO), you can enhance the efficiency and speed of query execution in Presto by using historical stats to generate optimized plans for your queries. Jay will cover how the Redis HBO utilizes the in-memory capabilities of Redis to store and analyze historical query execution data. This helps the optimizer make informed decisions about query planning and resource allocation based on historical query patterns, leading to improved execution times and resource utilization.
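To make the history-based idea in this abstract concrete, here is a minimal sketch of the concept: record observed runtime stats keyed by a canonical fingerprint of a query plan, then look them up for future queries. This is illustrative only — a plain dict stands in for Redis, and the class, method, and stat names are hypothetical, not Presto's actual HBO interfaces.

```python
import hashlib

class HistoryStore:
    """Toy history-based stats store: plan fingerprint -> observed runtime stats."""

    def __init__(self):
        self._store = {}  # stand-in for a Redis instance

    @staticmethod
    def fingerprint(plan: str) -> str:
        # Canonicalize so trivially different texts of the same plan match.
        canonical = " ".join(plan.lower().split())
        return hashlib.sha256(canonical.encode()).hexdigest()

    def record(self, plan: str, output_rows: int, cpu_ms: int) -> None:
        # Append one observed execution to the plan's history.
        history = self._store.setdefault(self.fingerprint(plan), [])
        history.append({"output_rows": output_rows, "cpu_ms": cpu_ms})

    def estimate(self, plan: str):
        """Return averaged historical stats for this plan, or None if unseen."""
        history = self._store.get(self.fingerprint(plan))
        if not history:
            return None  # an optimizer would fall back to cost-model estimates
        n = len(history)
        return {
            "output_rows": sum(h["output_rows"] for h in history) / n,
            "cpu_ms": sum(h["cpu_ms"] for h in history) / n,
        }

store = HistoryStore()
store.record("SELECT region, COUNT(*) FROM orders GROUP BY region", 5, 1200)
store.record("select region, count(*) from orders group by region", 5, 800)
est = store.estimate("SELECT region, COUNT(*) FROM orders GROUP BY region")
# Averaged history: 5 output rows, 1000 ms CPU
```

The key design point the talk highlights is that the store is external and in-memory (Redis), so historical stats survive coordinator restarts and can be shared across the cluster.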

    • Fireside Chat: Journey to Innovation: Unleashing the Power of Open Source Through Open Governance

      The Presto Foundation is the organization that oversees the development of the Presto open source project. Hosted at the Linux Foundation, the Presto Foundation operates under a community governance model with representation from all its members. In this fireside chat, we’ll hear more from Girish Baliga, Chair of the Presto Foundation, on what it actually means to be a Presto Foundation member and why this governance model is so important for open source projects. We’ll also talk with Vikram Murali of IBM, the newest member of the Presto Foundation. He’ll share more about IBM’s journey to Presto, how they’re using it in IBM’s new watsonx.data lakehouse, and why the Presto Foundation played an important role in IBM’s decision to choose Presto.

    • Velociraptor – The Next Generation of RaptorX – Vladimir Rodionov, Carrot Cache

      Vladimir Rodionov, founder of Carrot Cache, will present Velociraptor – the next evolution of the PrestoDB hierarchical caching framework RaptorX. Velociraptor enables efficient data and metadata caching well beyond RaptorX’s limits in terms of number of data files (multiple billions), table partitions (multiple millions), and table columns (multiple thousands). Velociraptor replaces all five RaptorX caches (Hive metadata, file list, query result fragments, ORC/Parquet metadata, and data I/O) with a scalable solution based on Carrot Cache that does not pollute JVM heap memory, does not affect the Java garbage collector, keeps all data and metadata off the Java heap or on disk, and can scale well beyond a server’s physical RAM limit. Velociraptor supports server restarts by quickly saving and loading data to/from disk for automatic cache warm-up.

    • Presto at Varsity Tutors: Using Federated Queries to Power External Reporting – John Cross

      Varsity Tutors is a learning platform that enables online academic, professional, and enrichment learning. A growing part of their offering partners with school districts to provide customized support for teachers and students. Varsity Tutors for Schools provides external reporting capabilities including student assessments, progress reports, and more. To provide these timely reports, Varsity Tutors (an AWS shop) uses Presto scripts to perform federated queries across MySQL, Postgres, and Redshift and writes data back to S3. They use Ahana Cloud as their managed service for Presto. In this session, John will discuss what technologies they evaluated, why they chose Presto, and their current data architecture including how they handle security for cross-account writes and how they perform upserts into the final reporting database.

    • Utilizing Presto in Projects Involving Billions of Data Points – Biddut Sarker Bijoy, Goava

      Biddut Sarker Bijoy will discuss his experience using Presto in projects involving billions of data points. Many of the big data projects he has worked on had computational issues, such as the need to shrink datasets or the challenge of making sense of terabytes of data. Recently, he worked on a project that required cleaning up a large database – something we all need to do sometimes – and it was not straightforward for many reasons. He found Presto to be the best fit for this problem because of its architecture and its ability to work with billions of data points in a very short amount of time and at low cost.

    • Keynote Panel: Presto at Scale – Shradha Ambekar, Gurmeet Singh, Neerad Somanchi & Rupa Gangatirkar

      Over the last decade Presto has become one of the most widely adopted open source SQL query engines. In use at companies large and small, Presto’s performance, reliability, and efficiency at scale have become critical to many companies’ data infrastructures. In this panel we’ll hear from three of the largest companies running Presto at scale – Meta, Uber, and Intuit. They’ll share more about their learnings, some of their impressive performance metrics with Presto, and what they envision going forward for Presto at their respective companies.

    • Quick Stats – Runtime ANALYZE for Better Query Plans – Anant Aneja, Ahana

      An optimizer’s plans are only as good as the estimates available for the tables it’s querying. For queries over recently ingested data that has not yet been ANALYZE-d to update table or partition stats, the Presto optimizer flies blind: it is unable to make good query plans and resorts to syntactic join orders. To solve this problem, we propose building ‘Quick Stats’: by utilizing file-level metadata available in open data lake formats such as Delta and Hudi, and by examining stats from Parquet and ORC footers, we can build a representative stats sample at a per-partition level. These stats can be cached for use by newer queries and can also be persisted back to the metastore. New strategies for tuning these stats, such as sampling, can be added to improve their precision.
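The rollup this abstract describes – per-file footer stats aggregated to per-partition stats – can be sketched roughly as follows. This is a toy illustration under stated assumptions: the file paths, stat fields (`rows`, `min_price`, `max_price`), and the `quick_stats` helper are hypothetical, not the actual Presto implementation or Parquet/ORC footer schema.

```python
# Hypothetical per-file stats as they might be read from Parquet/ORC footers.
footer_stats = {
    "date=2023-06-01/part-0.parquet": {"rows": 1000, "min_price": 1.0, "max_price": 90.0},
    "date=2023-06-01/part-1.parquet": {"rows": 500,  "min_price": 5.0, "max_price": 120.0},
    "date=2023-06-02/part-0.parquet": {"rows": 2000, "min_price": 2.0, "max_price": 75.0},
}

def quick_stats(files: dict) -> dict:
    """Roll per-file footer stats up to per-partition stats."""
    partitions: dict = {}
    for path, stats in files.items():
        partition = path.rsplit("/", 1)[0]  # e.g. "date=2023-06-01"
        agg = partitions.setdefault(
            partition,
            {"rows": 0, "min_price": float("inf"), "max_price": float("-inf")},
        )
        # Row counts add up; min/max bounds widen across files.
        agg["rows"] += stats["rows"]
        agg["min_price"] = min(agg["min_price"], stats["min_price"])
        agg["max_price"] = max(agg["max_price"], stats["max_price"])
    return partitions

stats = quick_stats(footer_stats)
# stats["date=2023-06-01"] combines both files of that partition
```

Because footer metadata is cheap to read relative to scanning the data, partition-level estimates like these can be computed at query time and, as the abstract notes, cached or persisted back to the metastore for reuse.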