2019 - PrestoDB

Improving the Presto planner for better push down and data federation

By Yi He, James Sun, Maria Basmanova, Rongrong Zhong, Jiexi Lin, Saksham Sachdev & Akshay Pall | December 23, 2019 | September 21, 2023

Presto defines a connector API that allows Presto to query any data source that has a connector implementation. The existing connector API provides basic predicate pushdown functionality, allowing connectors to perform filtering at the underlying data source. However, the existing predicate pushdown functionality has certain limitations that restrict what connectors can do. The…

5 design choices—and 1 weird trick — to get 2x efficiency gains in Presto repartitioning

By Ying Su, Orri Erling, Tim Meehan, Sahar Massachi, Bhavani Hari & Maria Basmanova | December 20, 2019 | September 21, 2023

We like Presto. We like it a lot — so much we want to make it better in every way. Here’s an example: we just optimized the PartitionedOutputOperator. It’s now 2-3x more CPU efficient, which, when measured against Facebook’s production workload, translates to 6% gains overall. That’s huge. The optimized repartitioning is in use on…

Join Us! Growing the Presto Foundation in 2020 and Beyond

By Brian Hsieh | December 16, 2019 | September 21, 2023

The Presto Foundation (PF) was established in September 2019 as an openly governed and vendor-neutral body dedicated to scaling and diversifying the Presto community. Hosted by the Linux Foundation, PF and its Governing Board are in a unique position to make Presto the fastest and the most reliable SQL engine for massively distributed data processing….

Table Scan: Doing The Right Thing With Structured Types

By Orri Erling | September 26, 2019 | September 21, 2023

In the previous article we saw what gains are possible when filtering early and in the right order. In this article we look at how we do this with nested and structured types. We use the 100G TPC-H dataset, but now we group top-level columns into structs or maps. Maps, lists and structs are…

Presto now hosted under the Linux Foundation

By Ariel Weisberg | September 23, 2019 | September 21, 2023

We are excited to announce today, in partnership with Alibaba, Facebook, Twitter, and Uber, the launch of the Presto Foundation, a non-profit organization under the umbrella of the Linux Foundation. Hosting by the Linux Foundation opens up the Presto community to a broader ecosystem of users and contributors. The Presto Foundation’s open and neutral governance…

Memory Management in Presto

By Nezih Yigitbasi | August 19, 2019 | October 20, 2023

In a multi-tenant system like Presto, careful memory management is required to keep the system stable and prevent individual queries from taking over all the resources. However, tracking the memory usage of data structures in an application (Presto) running on the Java Virtual Machine (JVM) requires a significant amount of work. In addition, Presto is…

Presto Unlimited: MPP SQL Engine at Scale

By Wenlei Xie, Andrii Rosa, Shixuan Fan, Tim Meehan & Rebecca Schlussel | August 5, 2019 | September 21, 2023

Presto is an open source distributed SQL query engine for running analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. Presto was originally designed for interactive use cases; however, after seeing the merit in having a single interface for both batch and interactive, it is now also used heavily for processing…

Complete Table Scan: A Quantitative Assessment

By Orri Erling | July 29, 2019 | September 21, 2023

In the previous article we looked at the abstract problem statement and possibilities inherent in scanning tables. In this piece we look at the quantitative upside with Presto. We look at a number of queries and explain the findings. The initial impulse motivating this work is the observation that table scan is by far the…

Everything You Always Wanted To Do in Table Scan

By Orri Erling, Maria Basmanova, Ying Su, Tim Meehan & Elon Azoulay | June 29, 2019 | September 21, 2023

Table scan, on the face of it, sounds trivial and boring. What’s there in just reading a long bunch of records from first to last? Aren’t indexing and other kinds of physical design more interesting? As data has gotten bigger, the columnar table scan has only gotten more prominent. The columnar scan is a fairly…

Introducing the Presto blog

By Orri Erling | June 28, 2019 | September 21, 2023

Presto is a key piece of data infrastructure at many companies. The community has many ongoing projects for taking it to new levels of performance and functionality plus unique experience and insight into challenges of scale. We are opening this blog as an informal channel for discussing our work as well as technology trends and…