2021 - PrestoDB

Common Sub-Expression optimization

By Rongrong Zhong November 22, 2021 (updated September 21, 2023)

The problem: One common pattern we see in some analytical workloads is the repeated use of the same, often expensive, expression. Look at the following query plan for example: The expression JSON_PARSE(features) is used 6 times and cast to different ROW structures for further processing. Traditionally, Presto would just execute the expression 6 times,…
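The pattern in this teaser can be illustrated with a minimal Python sketch. It is not Presto's implementation: `json_parse`, `PROJECTIONS`, `evaluate_naive`, and `evaluate_shared` are hypothetical names chosen for illustration, and `json.loads` merely stands in for an expensive expression such as JSON_PARSE(features). The sketch only contrasts evaluating a shared sub-expression once per use with evaluating it once per row and reusing the result.

```python
import json

# Counts how often the "expensive" expression is evaluated (illustration only).
CALLS = {"json_parse": 0}


def json_parse(text):
    """Hypothetical stand-in for an expensive expression like JSON_PARSE(features)."""
    CALLS["json_parse"] += 1
    return json.loads(text)


# Six hypothetical projections that all consume the same parsed value.
PROJECTIONS = [
    (lambda parsed, key=key: parsed.get(key))
    for key in ("a", "b", "c", "d", "e", "f")
]


def evaluate_naive(features):
    # The shared expression is re-evaluated once per projection.
    return [project(json_parse(features)) for project in PROJECTIONS]


def evaluate_shared(features):
    # The shared expression is evaluated once and the result is reused.
    parsed = json_parse(features)
    return [project(parsed) for project in PROJECTIONS]
```

Both functions return the same values for a given row; only the number of `json_parse` calls differs, which is the cost that common sub-expression optimization targets.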

What is Presto on Spark?

By Rohan Pednekar, Shradha Ambekar & Ariel Weisberg November 15, 2021 (updated October 19, 2023)

1. Reporting and dashboarding: This includes serving custom reporting for both internal and external developers for business insights, as well as the many organizations that use Presto for interactive A/B testing analytics. A defining characteristic of this use case is a requirement for low latency. It requires tens to hundreds of milliseconds at very high QPS, and not…

Scaling with Presto on Spark

By Rohan Pednekar, Shradha Ambekar & Ariel Weisberg October 26, 2021 (updated September 21, 2023)

Overview: Presto was originally designed to run interactive queries against data warehouses, but now it has evolved into a unified SQL engine on top of open data lake analytics for both interactive and batch workloads. Popular workloads on data lakes include: 1. Reporting and dashboarding: This includes serving custom reporting for both internal and external…

Native Parquet Writer for Presto

By Lu Niu & Zhenxiao Luo June 29, 2021 (updated September 21, 2023)

Overview: With the wide deployment of Presto in a growing number of companies, Presto is used not only for queries, but also for data ingestion and ETL jobs. There is a need to improve Presto’s file writer performance, especially for popular columnar file formats, e.g. Parquet and ORC. In this article, we introduce the brand…

Presto Foundation and PrestoDB: Our Commitment to the Presto Open Source Community

By Girish Baliga, Tim Meehan, Dipti Borkar, Zhenxiao Luo, Steven Mih & Bin Fan June 14, 2021 (updated September 21, 2023)

We recently wrapped up an amazing PrestoCon Day attended by over 600 people from across the globe. The technical discussions and the panel were a clear indication of the growing community. We showcased a number of features contributed by various companies that continue to advance the mission of Presto open source, reiterating our commitment to…

RaptorX: Building a 10X Faster Presto

By James Sun, Ke Wang, Rohit Jain, Saksham Sachdev, Shixuan Fan, Bin Fan, Zhenxiao Luo & Lu Niu February 4, 2021 (updated September 21, 2023)

RaptorX is the internal name of a project that aims to improve query latency significantly beyond what vanilla Presto is capable of. This blog post introduces the hierarchical cache work, which is the key building block for RaptorX. With the support of the cache, we are able to boost query performance by 10X. This new architecture can beat…

2020 Recap – A Year with Presto

By Dipti Borkar January 12, 2021 (updated September 21, 2023)

TL;DR: 2020 was a huge year for the Presto community. We held our first major conference, PrestoCon, the biggest Presto event ever. We had a massive expansion of our meetup groups, with more than 20 sessions held throughout the year, and significant innovations were contributed to Presto! This year has certainly been unique, to say…