Table Scan: Doing The Right Thing With Structured Types
In the previous article we saw what gains are possible when filtering early and in the right order. In this article we look at how we do this with nested and structured types.
September 26, 2019
September 23, 2019
We are excited to announce today, in partnership with Alibaba, Facebook, Twitter, and Uber, the launch of the Presto Foundation, a non-profit organization under the umbrella of the Linux Foundation.
Hosting by the Linux Foundation opens up the Presto community to a broader ecosystem of users and contributors. The Presto Foundation's open and neutral governance enables the community to influence Presto's future, which will also make it more attractive to developers. Together, we will raise Presto's performance, scalability, and reliability to new heights that could never have been reached alone.
This is a new chapter for the Presto open source project. We are very excited for what lies ahead!
Full announcement on the Linux Foundation’s website.
Uber's announcement of their participation.
Media coverage
August 19, 2019
In a multi-tenant system like Presto, careful memory management is required to keep the system stable and to prevent individual queries from taking over all the resources. However, tracking the memory usage of data structures in an application such as Presto, running on the Java Virtual Machine (JVM), requires a significant amount of work. In addition, Presto is a distributed system, which makes the problem more complicated. This post provides an overview of how memory management works in Presto and describes the various memory-management-related JMX counters and endpoints that can be used for monitoring production clusters.
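Presto's actual accounting is implemented in Java inside the engine; as a rough illustration of the idea, here is a simplified Python model in which operator-level reservations roll up to a per-query total and a query that exceeds its limit is failed. All class, field, and limit names here are hypothetical, not Presto's real API.

```python
# Simplified sketch of per-query memory accounting: operators reserve and
# free bytes against a shared query-level budget, and a reservation that
# would exceed the limit fails the query. Names are illustrative only.

class QueryContext:
    def __init__(self, query_id, memory_limit_bytes):
        self.query_id = query_id
        self.memory_limit_bytes = memory_limit_bytes
        self.reserved = 0  # total bytes reserved by all operators

    def reserve(self, num_bytes):
        if self.reserved + num_bytes > self.memory_limit_bytes:
            raise MemoryError(
                f"query {self.query_id} exceeded limit of "
                f"{self.memory_limit_bytes} bytes")
        self.reserved += num_bytes

    def free(self, num_bytes):
        self.reserved -= num_bytes

query = QueryContext("20190819_0001", memory_limit_bytes=1024)
query.reserve(512)   # e.g. a hash table in a join operator
query.reserve(256)   # e.g. an output buffer
query.free(256)
print(query.reserved)  # 512
try:
    query.reserve(2048)  # would exceed the per-query limit
except MemoryError as e:
    print("query failed:", e)
```

In the real system the picture is more involved: reservations are split across memory pools, tracked per node, and aggregated by a cluster-level coordinator, which is what the JMX counters mentioned above expose.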
August 5, 2019
Wenlei Xie, Andrii Rosa, Shixuan Fan, Rebecca Schlussel, Tim Meehan
Presto is an open source distributed SQL query engine for running analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
Presto was originally designed for interactive use cases; however, after seeing the merit of having a single interface for both batch and interactive workloads, it is now also used heavily for processing batch workloads [6]. As a concrete example, more than 80% of new warehouse batch workloads at Facebook are developed on Presto. Its flexible “connector” design makes it possible to run queries against heterogeneous data sources, such as joining Hive and MySQL tables together without preloading the data.
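To make the connector idea concrete, here is a minimal Python sketch, not Presto's actual SPI: each "connector" knows how to produce rows from its own source, and the engine joins the two row streams directly, without first copying either table into a shared store. The table contents and function names are invented for illustration.

```python
# Minimal sketch of the "connector" idea: each connector yields rows from
# its own source; the engine joins the streams directly. All names and
# data here are illustrative, not Presto's actual SPI.

def hive_orders():
    # stand-in for rows scanned from a Hive table: (customer_id, item)
    yield from [(1, "widget"), (2, "gadget"), (3, "gizmo")]

def mysql_customers():
    # stand-in for rows scanned from a MySQL table: (customer_id, name)
    yield from [(1, "alice"), (2, "bob")]

def hash_join(left_rows, right_rows, left_key, right_key):
    """Join two row streams on the given key positions."""
    table = {}
    for row in right_rows:
        table.setdefault(row[right_key], []).append(row)
    for row in left_rows:
        for match in table.get(row[left_key], []):
            yield row + match

joined = list(hash_join(hive_orders(), mysql_customers(), 0, 0))
print(joined)  # [(1, 'widget', 1, 'alice'), (2, 'gadget', 2, 'bob')]
```

The point of the abstraction is that the join logic never needs to know which system the rows came from; each source only has to implement the scan.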
However, memory-intensive (many TBs) and long-running (multi-hour) queries have been major pain points for Presto users. It is difficult to reason about how much memory a query will use and when it will hit the memory limit, and failures in long-running queries cause retries that create landing-time variance. To improve the user experience and scale the MPP database model to large ETL workloads, we started the Presto Unlimited project.
July 23, 2019
In the previous article we looked at the abstract problem statement and possibilities inherent in scanning tables. In this piece we look at the quantitative upside with Presto. We look at a number of queries and explain the findings.
The initial impulse motivating this work is the observation that table scan is by far the #1 operator in the Presto workloads I have seen: it accounts for a little over half of all Presto CPU, with repartitioning a distant second at around 1/10 of the total. The other half of the motivation is ready opportunity: Presto in its pre-Aria state does almost none of the things that are common in table scan.
June 29, 2019
Orri Erling, Maria Basmanova, Ying Su, Timothy Meehan, Elon Azoulay
Table scan, on the face of it, sounds trivial and boring. What’s there in just reading a long bunch of records from first to last? Aren’t indexing and other kinds of physical design more interesting?
As data has gotten bigger, the columnar table scan has only gotten more prominent. The columnar scan is a fairly safe baseline operation: the cost of writing data is low, and the cost of reading it is predictable.
Another factor that makes the table scan the main operation is the omnipresent denormalization in data warehouses. This only goes further with the ubiquitous use of lists, maps, and other non-first-normal-form data.
The aim of this series of articles is to lay out the full theory and practice of table scan with all angles covered. We will see that this is mostly a matter of common sense and the systematic application of a few principles: do not do extra work, and always do the work that you do in bulk. Many systems, like Google’s BigQuery, implement some subset of the optimizations outlined here. Doing all of them, however, is far from universal in the big data world, so there is a point in laying this all out and making a model implementation on top of Presto. We are talking here about the ORC format, but the same things apply equally to Parquet or to JSON shredded into columns.
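The two principles can be sketched in a few lines of Python. This is not Presto code, and the column data and selectivity assumption are invented for illustration: filters run column-at-a-time over a selection vector ("in bulk"), and the filter assumed most selective runs first so that later filters only touch surviving rows ("do not do extra work").

```python
# Sketch of the two principles above: evaluate filters column-at-a-time
# over a selection vector, and order them so the most selective filter
# runs first. Data and names are illustrative.

def filter_columns(columns, predicates):
    """predicates: list of (column_name, test_fn), most selective first.
    Returns the row positions that pass every predicate."""
    n = len(next(iter(columns.values())))
    selection = range(n)  # start with all row positions selected
    for name, test in predicates:
        col = columns[name]
        # one tight loop per filter, touching only surviving positions
        selection = [i for i in selection if test(col[i])]
    return selection

columns = {
    "ship_mode": ["AIR", "RAIL", "AIR", "SHIP", "AIR"],
    "quantity":  [5,     12,     7,     3,      20],
}
# ship_mode == "AIR" is assumed to be the more selective filter here
rows = filter_columns(columns, [
    ("ship_mode", lambda v: v == "AIR"),
    ("quantity",  lambda v: v > 6),
])
print(rows)  # positions 2 and 4 pass both filters
```

A real scan does the same thing against encoded column data on disk, which is where the rest of the wins in this series come from, but the shape of the computation is the one shown here.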
June 28, 2019
Presto is a key piece of data infrastructure at many companies. The community has many ongoing projects for taking it to new levels of performance and functionality, plus unique experience and insight into the challenges of scale.
We are opening this blog as an informal channel for discussing our work, as well as technology trends and issues that affect the big data and data warehouse world at large. Our development continues to take place on GitHub and can thus be followed by everybody. Here we seek to have a channel that is more concise and interesting to a broad readership than GitHub issues and code comments would be.
We have current projects like Aria Presto, for doubling CPU efficiency, and Presto Unlimited, for enabling fault-tolerant execution of very large queries. We are running one of the world’s largest data warehouses and thus have a unique perspective on platform technologies, e.g. C++ vs. Java, data analytics usage patterns, integration of machine learning and databases, data center infrastructure for supporting all of these, and much more. Some of the big questions we are facing have to do with optimizing infrastructure at scale and designing the future of interoperable file formats and metadata. Today we are running ORC on Presto and Spark, and system-specific file formats for diverse online systems. We are constantly navigating the strait between universality and specialization, and we keep looking for ways to generalize while advancing functionality and performance.
The Presto user and developer community includes many of the world’s leading technology players, and there is exciting work in progress around Presto at many of these companies. We look forward to tracking that work here too. Articles from the Presto world are welcome. Stay tuned for everything Presto.