PrestoDB Blog - PrestoDB

Avoid Data Silos in Presto in Meta: the journey from Raptor to RaptorX

By Rongrong Zhong, James Sun & Ke Wang | January 28, 2022 | September 21, 2023

Raptor is a Presto connector (presto-raptor) that powers some critical interactive query workloads at Meta (previously Facebook). Though it is mentioned in the ICDE 2019 paper Presto: SQL on Everything, it remains somewhat mysterious to many Presto users because there is no available documentation for this feature. This article will shed some light…

Scaling with Presto on Spark

By Rohan Pednekar, Shradha Ambekar & Ariel Weisberg | October 26, 2021 | September 21, 2023

Overview: Presto was originally designed to run interactive queries against data warehouses, but now it has evolved into a unified SQL engine on top of open data lake analytics for both interactive and batch workloads. Popular workloads on data lakes include: 1. Reporting and dashboarding: This includes serving custom reporting for both internal and external…

RaptorX: Building a 10X Faster Presto

By James Sun, Ke Wang, Rohit Jain, Saksham Sachdev, Shixuan Fan, Bin Fan, Zhenxiao Luo & Lu Niu | February 4, 2021 | September 21, 2023

RaptorX is the internal name of a project that aims to improve query latency significantly beyond what vanilla Presto is capable of. This blog post introduces the hierarchical cache work, which is the key building block for RaptorX. With the support of the cache, we are able to boost query performance by 10X. This new architecture can beat…

Improving Presto Latencies with Alluxio Data Caching

By Rohit Jain | June 16, 2020 | September 21, 2023

The Facebook Presto team has been collaborating with Alluxio on an open source data caching solution for Presto. This is required for multiple Facebook use-cases to improve query latency for queries that scan data from remote sources such as HDFS. We have observed significant improvements in query latencies and IO scans in our experiments. We…

Presto Unlimited: MPP SQL Engine at Scale

By Wenlei Xie, Andrii Rosa, Shixuan Fan, Tim Meehan & Rebecca Schlussel | August 5, 2019 | September 21, 2023

Presto is an open source distributed SQL query engine for running analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. Presto was originally designed for interactive use cases; however, after seeing the merit in having a single interface for both batch and interactive workloads, it is now also used heavily for processing…