Videos

On-Demand Recordings from PrestoCon’s, Webinars, Meetups, and more

    • Presto on ARM – Chunxu Tang & Jiaming Mai, Alluxio

      Presto on ARM – Chunxu Tang & Jiaming Mai, Alluxio

      Traditionally, the deployment of Presto has been limited to Intel processors with the x86 architecture. However, with the growing popularity of ARM architecture, Chunxu and Jiaming have extended the Presto ecosystem to ARM and conducted a series of benchmark experiments. Their objective is to evaluate the performance of Presto on ARM architecture and identify key insights from the experiments. In this presentation, Chunxu and Jiaming will share the results of their performance evaluation and discuss some of the most significant findings from their research.

    • PrestoDB in HPE Ezmeral Unified Analytics – Milind Bhandarkar, HPE

      PrestoDB in HPE Ezmeral Unified Analytics – Milind Bhandarkar, HPE


      HPE Ezmeral Unified Analytics is an end-to-end data & AI/ML platform that consists of several popular open-source frameworks for data engineering, data analytics, data science, & ML engineering in a well-integrated packaging. These open-source frameworks include Apache Spark, Apache Airflow, Apache Superset, PrestoDB, MLFlow, Kubeflow, and Feast. This platform is built atop Kubernetes and provides built in security. In this talk we will focus on the role of PrestoDB in Unified Analytics as a fast SQL query engine, and also as a secure data access layer. We will discuss some of our value-additions to PrestoDB, such as a distributed memory-centric columnar caching layer that provides both explicit and transparent caching for dataset fragments, often leading to 3x to 4x query performance. We will conclude by proposing to make caching pluggable in PrestoDB and discussing future directions.

    • Implementing Lakehouse Architecture with Presto at Bolt – Kostiantyn Tsykulenko, Bolt.eu

      Implementing Lakehouse Architecture with Presto at Bolt – Kostiantyn Tsykulenko, Bolt.eu

      Bolt.eu is the first European mobility super-app. We have over 100M users across Europe and Africa and have to deal with data at a large scale on a daily basis (over 100k queries daily). Previously we were using a traditional data warehouse solution based on Redshift but we’ve faced scalability issues that were hard to overcome and after doing our research we chose Presto as the solution. In just a single year we’ve managed to migrate to the Lakehouse architecture using AWS, Presto, Spark and Delta lake. We would like to talk about our journey, some of the challenges we’ve encountered and how we solved them.

    • Exploring New Frontiers: How Apache Flink, Apache Hudi and Presto Power New Insights and Gold Nuggets at Scale

      Exploring New Frontiers: How Apache Flink, Apache Hudi and Presto Power New Insights and Gold Nuggets at Scale

      Danny Chan & Sagar Sumit, Onehouse – In this talk, attendees will walk away with: – The current challenges of analytics on transactional data systems with data streams at scale – How the Hudi unlocks incremental processing on the lake – How Presto allows ad-hoc queries that support data exploration on Flink data – How you can leverage Flink, Hudi and Presto to build incremental materialized views

    • Presto at Adobe: How Adobe Advertising uses Presto for Adhoc Query, Custom Reporting, and Internal Pipelines

      Presto at Adobe: How Adobe Advertising uses Presto for Adhoc Query, Custom Reporting, and Internal Pipelines

      Rajmani Arya, Varun Senthilnathan & Manoj Kumar Dhakad, Adobe Advertising: We are from the Product Engineering team in Adobe Advertising (https://business.adobe.com/in/product…. Adobe Advertising is a digital advertisement platform. We take care of accumulating all data, providing platform intelligence, building and maintaining machine leaning capabilities, building and maintaining internal pipelines that form derived data to be used by other teams. The volume of total incoming raw data ranges between 8 to 10 tb/ day spread across 7 regions. The total data in the system currently is about 7pb. This data is largely stored in Hive tables with a central metastore. We use Presto in three ways: 1. Data studio – an internal tool to enable data analysts, sales, marketing and other teams to do adhoc querying. This is also used by data engineers to do adhoc querying for engineering tasks. 2. Custom Reports – We create reports for customers to get performance insights on their campaigns. We have 100s of reports that are run on a daily basis. 3. Internal Pipelines – Presto is used to retrieve data to power 100s of pipelines run daily to generate derived data.

    • Simplifying Data Management through Metadata Integrations and AI Infusion – Kevin Shen, IBM

      Simplifying Data Management through Metadata Integrations and AI Infusion – Kevin Shen, IBM

      In this demo we’ll go through two key pieces of watsonx.data, IBM’s new Data Lakehouse offering. Multiple analytics engines working on the same data: – Demo: Multiple engines working on the same data set so you can use the analytics tools you love without having to deal with the ugly plumbing Semantic Automation: Leverage AI to simplify data discovery and manipulation, allowing your data to work for you – Demo: Using a chat interface to find tables of relevance and how AI can enrich data sets with semantic information

    • Speeding Up Presto in ByteDance – Shengxuan Liu, Bytedance & Beinan Wang, Alluxio

      Speeding Up Presto in ByteDance – Shengxuan Liu, Bytedance & Beinan Wang, Alluxio

      Shengxuan Liu from ByteDance and Beinan Wang from Alluxio will present the practical problems and interesting findings during the launch of Presto Router and Alluxio Local Cache. Their talk covers how ByteDance’s Presto team implements the cache invalidation and dashboard for Alluxio’s Local Cache. Shengxuan will also share his experience using a customized cache strategy to improve the cache efficiency and system reliability.

    • Presto on AWS Journey at Twilio – Lesson Learned and Optimization – Aakash Pradeep & Badri Tripathy

      Presto on AWS Journey at Twilio – Lesson Learned and Optimization – Aakash Pradeep & Badri Tripathy

      Twilio as a leader in cloud communication platforms is very heavy on data and data-based decision making. Most data related use cases are currently powered by the Presto engine. Two years back we started the Journey with Presto in Twilio and today the system has scaled to a multi-PB data lakehouse and supports more than 75k queries per day. In this journey, we learned a lot about how to effectively operationalize Presto on AWS and some of the tricks to have better query reliability, query performance, guard-railing the clusters and save cost. With this talk, we want to share this experience with the community.

    • Customer-Facing Presto at Rippling – Andy Li, Rippling

      Customer-Facing Presto at Rippling – Andy Li, Rippling

      Presto is used for a variety of cases, but tends to be used for larger scale analytical queries. We have been transitioning to using Presto to power our data platform and customer-facing scripting language, RQL (Rippling Query Language) to run arbitrary customer queries to power core products. Presto helps enable diverse, federated querying at scale. In this talk, Andy will cover where Presto sits in Rippling’s ecosystem as a core query layer, our collaboration and contributions for closer integration with Apache Pinot, and learnings on using Presto to handle a large variety of query patterns.

    • Shared Foundations Of Composable Data Systems – Biswapesh Chattopadhyay, Google

      Shared Foundations Of Composable Data Systems – Biswapesh Chattopadhyay, Google

      Data processing systems have evolved significantly over the last decade, driven by various factors such as the advent of cloud computing, increasingly complexity of applications such as ML, HTAP, Streaming, Observability and Graph processing. However, historically, these frameworks have evolved independently, leading to significant fragmentation of the stack. In this talk, I will talk about how this has evolved in the open source and at Meta, and how we are solving this problem through the Shared Foundations effort, leading to composable systems. This has resulted in significantly better performance, more features, higher engineering velocity and a more consistent user experience.

    • Building Large-scale Query Operators and Window Functions for Prestissimo using Velox – Aditi Pandit

      Building Large-scale Query Operators and Window Functions for Prestissimo using Velox – Aditi Pandit

      In this talk, Aditi Pandit, Principal Software Engineer at Ahana and Presto/Velox contributor, will throw the covers back on some of the most interesting portions of working in Prestissimo and Velox. The talk will be based on the experience of implementing the windowing functions in Velox. It will cover the nitty gritty on the vectorized operator, memory management and spilling. This talk is perfect for anyone who is using Presto in production and wants to understand more about the internals, or someone who is new to Presto and is looking for a deep technical understanding of the architecture.

    • The Future of Presto’s Query Optimizer – Bill McKenna, Ahana

      The Future of Presto’s Query Optimizer – Bill McKenna, Ahana

      In this talk, you will hear from the query optimizer OG himself, Bill McKenna (Principal software engineer at Ahana, Architect for the query optimizer that became the code base of the Amazon Redshift query optimizer, and co-author of The Volcano Optimizer Generator: Extensibility and Efficient Search) go into detail about the state of modern query optimizers, and how Presto stacks up against them and where it will go in the near future. If database theory is your jam, you won’t want to miss this deeply technical presentation from one of the pioneers in the field.