RaptorX: Building a 10X Faster Presto

February 4, 2021

James Sun

Facebook: Amit Dutta, Baldeep Hira, Biswapesh Chattopadhyay, James Sun, Jialiang Tan, Ke Wang, Lin Liu, Naveen Cherukuri, Nikhil Collooru, Peter Na, Rohit Jain, Saksham Sachdev, Sergey Pershin, Shixuan Fan, Varun Gajjala

Alluxio: Bin Fan, Calvin Jia, Haoyuan Li

Twitter: Zhenxiao Luo

Pinterest: Lu Niu

RaptorX is the internal name of a project that aims to reduce query latency significantly beyond what vanilla Presto can achieve. This blog post introduces the hierarchical cache work, which is the key building block for RaptorX. With the support of the cache, we are able to boost query performance by 10X. This new architecture can beat performance-oriented connectors like Raptor, with the added benefit of continuing to work with disaggregated storage.

2020 Recap - A Year with Presto

January 12, 2021

Dipti Borkar

Tl;dr: 2020 was a huge year for the Presto community. We held our first major conference, PrestoCon, the biggest Presto event ever. We had a massive expansion of our meetup groups with more than 20 sessions held throughout the year, and significant innovations were contributed to Presto!

This year has certainly been unique, to say the least. As chairperson of the Presto Foundation Outreach Committee, I saw the term “outreach” take on a whole new meaning this year. But through the challenges of 2020, we adopted new ways to connect. We continued to build and engage with the Presto community in a new “virtual” way, and I couldn’t be more proud of all we’ve accomplished as a community in 2020.

So what did the Presto Foundation do in 2020?

Using OptimizedTypedSet to Improve Map and Array Functions

December 4, 2020

Ying Su

Function evaluation is a big part of projection CPU cost. Recently we optimized a set of functions that use TypedSet, e.g. map_concat, array_union, array_intersect, and array_except. By introducing a new OptimizedTypedSet, the above functions saw improvements in several dimensions:

  • Up to 80% reduction in wall time and CPU time in JMH benchmarks
  • Reserved memory reduced by 5%
  • Allocation rate reduced by 80%

Furthermore, OptimizedTypedSet resolves the long-standing issue of throwing EXCEEDED_FUNCTION_MEMORY_LIMIT for large incoming blocks: “The input to function_name is too large. More than 4MB of memory is needed to hold the intermediate hash set.”

The OptimizedTypedSet and the improvements to the above-mentioned functions have been merged to master and will be available from Presto 0.244.
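
For readers unfamiliar with these functions, here is a minimal Python sketch of the set semantics they compute over arrays. It is only an illustration built on Python's built-in `set`; it is not Presto's TypedSet or OptimizedTypedSet code, and it assumes hashable, non-NULL elements and first-seen output order, neither of which is taken from the post.

```python
from typing import Hashable, Iterable, List


def array_union(x: Iterable[Hashable], y: Iterable[Hashable]) -> List[Hashable]:
    """Distinct elements of x followed by distinct elements of y not yet seen."""
    seen = set()
    out = []
    for value in list(x) + list(y):
        if value not in seen:
            seen.add(value)
            out.append(value)
    return out


def array_intersect(x: Iterable[Hashable], y: Iterable[Hashable]) -> List[Hashable]:
    """Distinct elements of x that also appear in y."""
    right = set(y)
    seen = set()
    out = []
    for value in x:
        if value in right and value not in seen:
            seen.add(value)
            out.append(value)
    return out


def array_except(x: Iterable[Hashable], y: Iterable[Hashable]) -> List[Hashable]:
    """Distinct elements of x that do not appear in y."""
    right = set(y)
    seen = set()
    out = []
    for value in x:
        if value not in right and value not in seen:
            seen.add(value)
            out.append(value)
    return out


union_result = array_union([1, 2, 2, 3], [3, 4])
intersect_result = array_intersect([1, 2, 2, 3], [2, 3, 5])
except_result = array_except([1, 2, 2, 3], [3])
```

Each function builds an intermediate hash set whose size grows with the input, which is the kind of structure the memory limit quoted above refers to.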

PrestoCon and Growing Industry Consortium - Intel and Upsolver Join Presto Foundation

November 20, 2020

Girish Baliga

Presto Foundation joined the Linux Foundation over a year ago, and has been focused on growing the Presto open source project and community. We encourage industry involvement with an open charter, clear guiding principles, and community-oriented goals. We recently hosted PrestoCon 2020, our first annual community conference, which was widely attended and well represented by Presto community members. We also warmly welcome Intel and Upsolver, who recently joined the Presto Foundation.

Presto Enables Internal Log Data Analysis at Drift

October 29, 2020

Arun Venkateswaran

I’m a Senior Software Engineer in the data group at Drift, a conversational marketing platform that is used for qualifying leads faster, automatically booking meetings and connecting customers to the right business solutions more efficiently. I’ve used Presto quite a bit throughout my career, and I want to first give readers a quick overview of how Presto has enabled my team at Drift to quickly and cost-effectively analyze distributed logs at scale. Then I will share how we used and benefited from Presto at Vistaprint, where I worked previously.

Even Faster Unnest

August 20, 2020

Ying Su

Ying Su, Masha Basmanova, Orri Erling

Unnest is a common operation in Facebook’s daily Presto workload. It converts an ARRAY, MAP, or ROW into a flat relation. Its original implementation always deep-copied its input and was very inefficient. In Unnest Operator Performance Enhancement with Dictionary Blocks, the author improved the Unnest operator by up to 10x in CPU and elapsed times by using DictionaryBlock when possible. We went one step further and improved it by another 5-10x.
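
To make the idea concrete, the following Python sketch flattens an array column while representing the repeated key column as row indices rather than copied values, which is roughly the intuition behind a dictionary-encoded block. This is a simplified illustration, not Presto's Unnest operator or its DictionaryBlock class; the function names and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def unnest_with_indices(
    arrays: Sequence[Sequence[int]],
) -> Tuple[List[int], List[int]]:
    """Flatten `arrays` and record, for each output row, its source row index.

    The index list plays the role of a dictionary over the key column, so the
    key values themselves are never copied during the unnest.
    """
    key_ids: List[int] = []
    flat_values: List[int] = []
    for row, elements in enumerate(arrays):
        for element in elements:
            key_ids.append(row)
            flat_values.append(element)
    return key_ids, flat_values


def materialize(
    keys: Sequence[str], key_ids: Sequence[int], flat_values: Sequence[int]
) -> List[Tuple[str, int]]:
    """Resolve the indices into key values only when rows are actually read."""
    return [(keys[i], value) for i, value in zip(key_ids, flat_values)]


keys = ["a", "b", "c"]
arrays = [[1, 2], [], [3]]

key_ids, flat_values = unnest_with_indices(arrays)
rows = materialize(keys, key_ids, flat_values)
```

In this sketch a row with an empty array contributes no output rows, and wide or expensive key columns are referenced by position instead of being duplicated once per array element.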

Getting Started with PrestoDB and Aria Scan Optimizations

August 14, 2020

Adam Shook

This article was originally published by Adam on June 15th, 2020 over at his blog at datacatessen.com.

PrestoDB recently released a set of experimental features under their Aria project in order to increase table scan performance of data stored in ORC files via the Hive Connector. In this post, we'll check out these new features at a very basic level using a test environment of PrestoDB on Docker. To find out more about the Aria features, you can check out the Facebook Engineering blog post, which was published in June 2019.

Building a high-performance platform on AWS to support real-time gaming services using Presto and Alluxio

August 6, 2020

Teng Wang

Authors: Teng Wang, Du Li, Yu Jin and Sundeep Narravula

Electronic Arts (EA) is a leading company in the gaming industry, providing dozens of games to serve billions of users worldwide each year. Making near real-time decisions for EA’s online services is critical for our business. This blog describes a data platform on AWS based on Presto and Alluxio to support online services with instantaneous response within the gaming industry.

PrestoDB and Apache Hudi

August 4, 2020

Bhavani Sudha Saktheeswaran

Co-author: Brandon Scheller

Apache Hudi is a fast-growing data lake storage system that helps organizations build and manage petabyte-scale data lakes. Hudi brings stream-style processing to batch-like big data by introducing primitives such as upserts, deletes and incremental queries. These features help surface faster, fresher data on a unified serving layer. Hudi tables can be stored on the Hadoop Distributed File System (HDFS) or cloud stores, and they integrate well with popular query engines such as Presto, Apache Hive, Apache Spark and Apache Impala. Given that Hudi pioneered a new model that moved beyond just writing files to a more managed storage layer that interoperates with all major query engines, there were interesting learnings on how integration points evolved.

In this blog we are going to discuss how the Presto-Hudi integration has evolved over time and also discuss upcoming file listing and query planning improvements to Presto-Hudi queries.

Running Presto in a Hybrid Cloud Architecture

July 17, 2020

Adit Madan

Migrating SQL workloads from a fully on-premises environment to cloud infrastructure has numerous benefits, including alleviating resource contention and reducing costs by paying for computation resources on an on-demand basis. In the case of Presto running on data stored in HDFS, the separation of compute in the cloud and storage on-premises is apparent since Presto’s architecture enables the storage and compute components to operate independently. The critical issue in this hybrid environment of Presto in the cloud retrieving HDFS data from an on-premises environment is the network latency between the two clusters.

This crucial bottleneck severely limits the performance of any workload, since a significant portion of its time is spent transferring the requested data between networks that could be residing in geographically disparate locations. As a result, most companies copy their data into a cloud environment and maintain that duplicate data, an approach also known as Lift and Shift. Compliance and data sovereignty requirements may even prevent organizations from copying data into the cloud. This approach is not scalable and requires a lot of manual effort to achieve reasonable results. This article introduces Alluxio as a data orchestration layer that helps serve data to Presto efficiently, as opposed to either directly querying the distant HDFS cluster or manually providing a localized copy of the data to Presto in a cloud cluster.

Copyright © The Presto Foundation.
All rights reserved. Presto is a registered trademark of LF Projects, LLC.
Please see our Trademark Policy for more information.
Privacy Policy | Terms of Use.