Videos

On-Demand Recordings from PrestoCon, Webinars, Meetups, and more

    • How to Speed up your Lakehouse Queries by an Order of Magnitude with Multi-modal Index Subsystem using Apache Hudi and Presto

      Sivabalan Narayanan of Onehouse explains how Apache Hudi brought transactions and incremental processing to data lakes, both of which are considered foundational pillars of the Lakehouse architecture. In this session, we will discuss Apache Hudi and how it fills key technology gaps in the modern data architecture. Viewed through a data engineering lens, Hudi also plays a key unifying role between the batch and stream processing worlds through its incremental processing model. We will look at the capabilities of the native Hudi connector in Presto and dive deep into the key optimizations and features it unlocks. Presto users can now use the Hudi metadata table for optimized file listing and avoid a large number of list operations against cloud storage. We will also look at how to improve query latency in Presto using the advanced data skipping techniques enabled by Hudi's multi-modal index subsystem.

    • Building Modern Data Lakes for Analytics Using Object Storage – Satish Ramakrishnan, MinIO

      The modern data lake is distributed, unstructured and demands performance and scale, or better stated, performance at scale. Modern object stores are the ideal platform to pair with MPP query engines like Presto, particularly as the scale reaches tens or hundreds of petabytes with tens to hundreds of concurrent queries. In this talk, Satish Ramakrishnan will outline the better-together attributes of the two technologies with a focus on the most sophisticated modern object storage features, from throughput optimizations and multi-cloud capabilities to cross-cloud active-active replication and lifecycle management. Participants will come away with a reference architecture suited to query processing at object scale.

    • Women in Open Source & Presto – Getting started in the Presto open source ecosystem

      Panelists: Neha Pawar, Startree; Rebecca Schlussel, Meta; RongRong Zhong, Celonis. Moderated by Dipti Borkar, Microsoft.

      Among GitHub users with at least ten contributions, a mere 6% were women. This is far below the roughly 26% share of women in tech reported by various studies. Given the investment going into companies based on open source, their growth and success, and the enormous demand for open source developers, it is a ratio we need to strive to improve. In this panel, we will discuss a few areas:

      – The journey of each panelist into open source projects
      – The benefits they have seen from participating in open source projects, particularly Presto
      – The challenges women face in male-dominated open source communities
      – Ideas, suggestions and guidance for budding engineers on getting started with open source, including Presto

    • Keynote: Data Lakehouse: Country Club or Community Center? – Steven Mih, Co-founder & CEO, Ahana

      Over the last two decades, we’ve seen the birth and emergence of data lake systems, from the internal walls of Google to modern Lakehouses at Meta/Facebook, which promise the best of both the data lake and data warehouse worlds. Equally important is the role that open source, and more broadly openness, has played and will play in this journey. In this talk, Steven will draw on his experience with open source distributed systems (Couchbase, Mesosphere, Alluxio, Linux Foundation Presto) to explore the significance of the “5 shades of openness” with respect to the composable open data lakehouse ecosystem.

    • Scaling Cache for Presto Iceberg Connector – Beinan Wang, Alluxio & Chunxu Tang

      When using the Presto Iceberg connector, the in-heap cache in Presto is likely to become overloaded. In this talk, Beinan and Chunxu will share the design, implementation, and optimization of an off-heap cache that addresses these scalability challenges. You will learn how to cache Iceberg data and metadata for the Presto Iceberg connector, followed by future work on improving table scans using Apache Arrow.

    • Scalable Feature Engineering with Tecton on Athena – Derek Salama, Tecton

      Tecton is the leading feature platform for real-time machine learning. Rather than build new SQL engines from scratch, Tecton connects to your existing engine to transform raw data into features for machine learning. This talk will cover Tecton’s new integration with Athena for feature engineering. Derek will demonstrate how Tecton with Athena is the fastest way to build feature pipelines and put new models in production.

    • 5 Reasons Why AI Is the Future of SQL – Jared Zhao, AskEdith

      SQL remains ubiquitous for data retrieval and analytics, yet can be tedious to write, and is downright unusable for business users. The 2-5 business day turnaround time for data projects is both disruptive and frustrating for business users. Data teams are becoming increasingly overwhelmed, and organizations are pushing to empower their “citizen data analysts.” With the advent of AI English-to-SQL platforms like AskEdith, now anyone can work with and query Presto using plain English questions. AskEdith integrates natively with web interfaces like Ahana for a seamless analytics experience.

    • Ending DAG Distress: Building Self-Orchestrating Pipelines for Presto – Roy Hasson, Upsolver

      dbt and Airflow are a popular combination for creating and scheduling batch data modeling and transformation jobs that execute in a data warehouse like Snowflake. Presto users querying the data lake need a similar solution that is simple to use and makes it easy to ingest, model, transform and maintain datasets, without having to write or manage complex DAGs. In this session you will learn how Upsolver built a tool that allows engineers, developers and analysts to write data pipelines using SQL. Pipelines are automatically orchestrated, are data-aware and maintain a consistent data contract between each stage of the pipeline. You will also learn how to introduce the idea of data products into your company to enable more self-service for your Presto users.

    • Headless BI Architecture and Trade-offs – Pavel Tiunov, Cube Dev

      There has been a proliferation of tools in different categories of the modern data stack. This talk will focus on the Headless BI category and Cube’s implementation of Headless BI. Headless BI injects a component between data warehouses and other data sources on one side of the stack and tools on the other side (e.g. CDP, data exploration tools, custom data apps, etc.). This new component encapsulates several critical functions like data modeling, access control, and aggregate awareness while deliberately omitting others, like data visualization and presentation. We’ll explore:

      – Keeping data models separate from data sources and not substituting data modeling with mere data transformation.
      – Managing access control centrally, aggregate awareness, and caching in a separate layer upstack from data consumers.
      – Removing data presentation features and embracing data accessibility via a set of APIs.

    • The Past, Present, and Future of Presto – Philip Bell, Meta

      PrestoDB recently underwent major architectural updates as the Presto Foundation grows its membership and looks to vastly increase the number of new commits and forks. Achieving this desired end state required successfully refactoring Presto and improving its already impressive speed, efficiency, reliability, and extensibility. Establishing PrestoDB as a premier open source project required a major commitment of time and resources from Meta, both to ensure the community can benefit from the project for years to come and to position PrestoDB to evolve beyond what Meta alone could create. Members of the Presto Foundation need more of you to get involved in this major evolution of Presto’s history and core components, and to bring your own inventive ideas to the mix.

    • A Git-like Repository for your Data Lake – Vinodhini Sivakami Duraisamy, Treeverse

      We tend to adopt practices that improve the flexibility of development and the velocity of code deployment, but how confident are we that the complex data system is safe once it arrives in production? We must be able to experiment in production and automate actions while minimizing customer pain and reducing damage to code and data. If your product’s value is derived from data in the shape of analytics or machine learning, losing it, or having corrupted data, can easily translate into pain. In this session, you will discover how chaos engineering principles apply to distributed data systems and the tools that enable us to make our data workloads more resilient.

    • Presto Tech Talk: Intro to Presto and Superset

      Presto and Superset are a powerful combination because together they enable analysts to query data from a data lake environment or join data from multiple data sources. In this event, we’ll give an introductory demo on how to query data from S3 using Presto to build a Superset dashboard.

    • Build & Query Secure S3 Data Lakes with Ahana Cloud and AWS Lake Formation

      AWS Lake Formation is a service that allows data platform users to set up a secure data lake in days. Creating a data lake with Presto and AWS Lake Formation is as simple as defining data sources and what data access and security policies you want to apply. In this talk, Wen will walk through the recently announced AWS Lake Formation and Ahana integration.

    • Real Time Analytics at Uber with Presto-Pinot

      In this talk, seasoned engineers at Uber will walk through Uber's real-time analytics use cases and the work they have done on the Presto architecture and the Presto-Pinot connector to address them.

    • Presto at Meta: A Guide to Tuning Clusters at Enormous Scale

      Facebook operates Presto at an enormous scale. A critical part of the success of Presto is properly tuning the clusters according to the use case they target. Swapnil Tailor, Basar Onat and Tim Meehan describe important session properties and configuration properties used to configure Presto, and offer guidance on when and how to use them.

    • Presto at Bytedance – Hive UDF Wrapper for Presto

      Presto is widely used at Bytedance in areas such as the data warehouse, BI tools, and ads. The Presto team at Bytedance has also delivered many key features and optimizations, such as the Hive UDF wrapper, coordinator, and runtime filter, which extend Presto's use cases and improve its stability. Today, most companies use Hive (or Spark) and Presto together. But Presto UDFs have very different syntax and internal mechanisms compared with Hive UDFs. This limits Presto usage, because users need to maintain two kinds of functions. In this talk, we will present a way to execute Hive UDFs and UDAFs inside Presto.

    • Querying streaming data with Presto, Amazon Athena and Upsolver

      In this session, Yoni will present on querying streaming data with Presto and Amazon Athena, including performance, data partitioning and compaction. The session includes a demo of the Upsolver platform with Amazon Athena, and Yoni will also share what they are working on with PrestoDB.