Videos

On-Demand Recordings from PrestoCons, Webinars, Meetups, and more

    • How Carbon uses PrestoDB in the Cloud with Ahana to Power its Real-time Customer Dashboards

      Carbon is a real-time revenue management platform that consolidates revenue and audience analytics, data management, and yield operations into a single solution. Real-time analytics is critical: their customers rely on real-time data to make revenue decisions. After running into issues with performance, visibility and ease of use, and the serverless pricing model of AWS Athena, the team moved to Ahana Cloud, a managed service for PrestoDB in the cloud, to power their customer-facing dashboards. In this session, Jordan will discuss some of the reasons the team moved from AWS Athena to a managed PrestoDB on Intel-optimized AWS instances. He will also dive into their current architecture, which includes an Ahana-managed Hive Metastore along with the Apache ORC file format and an S3-based data lake. Finally, he’ll share some performance benchmarks and talk about what’s next for PrestoDB at Carbon.

    • Speeding up Presto Queries Using Apache Hudi Clustering – Satish Kotha & Nishith Agarwal, Uber

      Apache Hudi is a data lake platform that supercharges data lakes. Originally created at Uber, Hudi provides various ways to strike trade-offs between ingestion speed and query performance by supporting user-defined partitioners and automatic file sizing, both of which favor query performance. Hudi integrates with PrestoDB to make this data available for queries. During ingestion, data is typically co-located based on arrival time. However, query engines perform better when frequently queried data is co-located, and that layout may differ from arrival-time order. We will discuss a new framework called “data clustering” that makes data lakes adaptable to query patterns, thereby improving query latencies. Finally, we will discuss future work on improving data locality through custom bucketing of data during ingestion, which avoids some of the rewrite costs.

    • Using Presto’s BigQuery Connector for Better Performance and Ad-hoc Query in the Cloud – George Wang & Roderick Yao

      The Google BigQuery connector gives users the ability to query tables in the BigQuery service, Google Cloud’s fully managed data warehouse. In this presentation, we’ll discuss the BigQuery Connector plugin for Presto, which uses the BigQuery Storage API to stream data in parallel, allowing users to query BigQuery tables via gRPC and achieve better read performance. We’ll also discuss how the connector enables interactive ad-hoc queries that join data across distributed systems for data lake analytics.

    • Drag and Drop Query Builder for PrestoDB – Ravishankar Nair, PassionBytes

      You use multiple tools for databases: for example, Azure Data Studio for SQL Server access, Toad or SQL Developer for Oracle access, and MySQL Workbench for MySQL databases. Imagine having one tool that can query any database and bring any table from any catalog onto a single canvas! Once you define a join, the underlying PrestoDB-compatible query is generated. Click a button, and you get the profiled data, including distributions and correlations. An amazing tool in action.

    • Level 101 for Presto: What is PrestoDB?

      In Level 101, you’ll get an overview of Presto, including: a high-level overview of Presto and its most common use cases; the problems it solves and why you should use it; a live, hands-on demo of getting Presto running on Docker; and a real-world example of how Twitter uses Presto at scale.

    • (Chinese) Presto at Bytedance – Hive UDF Wrapper for Presto

      Presto is widely used at Bytedance in areas such as the data warehouse, BI tools, and ads. The Presto team at Bytedance has also delivered many key features and optimizations, such as the Hive UDF wrapper, coordinator, and runtime filter, which extend Presto’s use cases and improve its stability. Today, most companies use Hive (or Spark) and Presto together, but Presto UDFs differ substantially from Hive UDFs in syntax and internal mechanisms. This restricts Presto usage, because users need to maintain two kinds of functions. In this talk, we will present a way to execute Hive UDFs/UDAFs inside Presto.

    • Real Time Analytics at Uber with Presto-Pinot

      In this talk, seasoned engineers at Uber will walk through Uber’s real-time analytics use cases and the work they have done on the Presto architecture and the Presto-Pinot connector to address them.

    • Build & Query Secure S3 Data Lakes with Ahana Cloud and AWS Lake Formation

      AWS Lake Formation is a service that allows data platform users to set up a secure data lake in days. Creating a data lake with Presto and AWS Lake Formation is as simple as defining data sources and what data access and security policies you want to apply. In this talk, Wen will walk through the recently announced AWS Lake Formation and Ahana integration