Videos

On-Demand Recordings from PrestoCon, Webinars, Meetups, and More

    • Presto On Spark: Scaling not Failing with Spark – Ariel Weisberg, Meta & Shradha Ambekar, Intuit

      Presto on Spark is an integration between Presto and Spark that leverages Presto’s compiler/evaluation as a library and Spark’s large-scale processing capabilities. It enables a unified SQL experience between interactive and batch use cases. A unified option for batch data processing and ad hoc querying is important for giving users queries that scale instead of failing, without requiring rewrites between different SQL dialects. In this session, we’ll talk about the Presto on Spark architecture, why it matters, and its implementation and usage at Intuit.

    • Presto on Kafka at Scale – Yang Yang & Yupeng Fu, Uber

      Presto is a popular distributed SQL query engine for running interactive analytic queries. Presto provides a Connector API that allows plugins for dozens of data sources, and thus positions itself as a single point of access to a wide variety of data. At Uber, we significantly improved Presto’s Kafka connector to meet Uber’s scale. For example, the new connector allows dynamic Kafka cluster and topic discovery, so users can directly query existing Kafka topics without any registration or onboarding process; dynamic schema discovery allows fetching the latest schema without any Presto restart or deployment; and smart time range suggestions, based on Kafka metadata analysis, help users avoid large-range scans and thus keep queries interactive.

    • Presto on Elastic Capacity – Neerad Somanchi & Abhisek Saikia, Meta

      Elasticity of a shared fleet is one of the fundamental pillars of the IaaS (Infrastructure-as-a-Service) world. The ability of services to efficiently use both guaranteed and non-guaranteed (opportunistic) capacity is important in such a setting. Presto is great when it runs on guaranteed capacity (i.e., capacity that is fixed and stable). But what if we want Presto to leverage elastic (opportunistic) capacity, i.e., capacity that is shifting, but in a predictable manner (think Amazon EC2 Spot Blocks)? In this lightning presentation, Neerad Somanchi and Abhisek Saikia will talk about how a recent feature developed for Presto can help it efficiently utilize such elastic compute.

    • Presto Connector for DataCTRL – Mario Ceste, Jr., SAP NS2

      DataCTRL is a data management platform for ingesting large quantities of disparate data sets. We’ve written a connector for Presto which allows our users to leverage the data they’ve ingested using SQL. Integrating Presto with our platform has given our customers a quick and effective way to query their data while also building additional data products.

    • Presto Authorization with Apache Ranger – Reetika Agrawal, Ahana & William Brooks, Privacera

      Apache Ranger has been the user’s choice to support authorization in various data platforms, from small-scale to enterprise-grade production environments. At Ahana, engineers are working on the Presto-Ranger integration, aiming to support global fine-grained data access control across all catalogs for Presto, while also providing auditing and monitoring of user access. We would like to collaborate with Privacera and share our learnings and what we have developed so far, and we also hope to shed light on the future work of the Ranger Presto Plugin with an Apache Ranger committer.

    • Presto at Tencent at Scale: Usability Extension, Stability Improvement and Performance Optimization – Junyi Huang & Pan Liu

      Presto has been adopted at Tencent at scale to serve ad hoc and interactive query scenarios for different business units. In this talk, we’d like to share our practice of running Presto in production. In detail, we’ll talk about our work to further improve the stability, extend the usability, and optimize the performance of Presto. Together, this work makes Presto a better fit for our production environment, and we think it will also benefit the community.

    • Presto at Bytedance – Pengfei Chang, Bytedance

      Presto has been widely used at Bytedance, e.g. in the data warehouse, BI tools, ads, and so on. Meanwhile, the Presto team at Bytedance has also delivered many important features and optimizations, such as a Hive UDF wrapper, multiple coordinators, and runtime filters, which extend Presto’s usage and enhance its stability.

    • Prestissimo – Presto-on-Velox for Faster More Efficient Queries – Orri Erling, Meta

      We built a drop-in replacement for the Presto worker using C++ and Velox and saw dramatic improvements in CPU efficiency and latency for interactive queries. We embraced the adaptive execution provided by Velox to efficiently evaluate filters pushed down into scans and to automatically enable array-based aggregations and joins. We make extensive use of dictionary encodings to achieve zero-copy execution throughout the engine. We allow for vectorization-friendly function implementations, provide ASCII-only fast paths, and use many other tricks. We’d like to share our learnings, early results, and future plans. We look forward to inviting the community to join our efforts in building the next generation of Presto together.

    • Open Source Data Lake Analytics: Trends and Opportunities – Biswapesh Chattopadhyay, Meta

      Open source data analytics is undergoing an interesting transformation as the industry rapidly evolves around it. Accelerating migration to the cloud, the rise of immensely well-funded proprietary vendors, and the fast-evolving needs of users all contribute to this. This talk goes into detail about the trends and opportunities in the OSS data analytics space and issues a call to action on how this space can stay relevant.

    • Introducing Materialized View in Presto – Rohit Jain, Meta

      The materialized view is a well-known technique in the data world. It is used to increase the performance and efficiency of queries by precomputing and persisting results. In this talk, we are announcing materialized view support in PrestoDB. Please join us to learn more about it.

    • Handling Billions of Messages with PrestoDB in the Country of Pyramids – Ravishankar Nair

      Millions of messages are legacy, and in the new modern world of data, we like “billions”. This is exactly the terminology in the use case we faced from a very prominent client in Egypt. The scenario demanded extra attention because this valuable client had run multiple proofs of concept with many other open source tools and could not meet its exact SLA and needs. The client wanted more than a hundred billion (yes, “b”) messages to be ingested in eight hours and then queried without much latency. The presentation will be a live demonstration of how we can architect such a solution with PrestoDB under the hood, along with some simple but advanced ingestion capabilities and data formats.