PrestoDB Blog - PrestoDB

2022 PrestoDB Community in Review

By Ali LeClerc December 30, 2022 (updated September 14, 2023)

Hello Presto enthusiasts! We here at the Presto Outreach Committee are absolutely thrilled to be entering the new year of 2023. It’s hard to believe that another year has passed, but as we reflect on the past year, we can’t help but feel grateful for the amazing growth and progress we’ve seen in the Presto…

Our Presto Credo for the Truly Open Source SQL Query Engine

By Steven Mih, Girish Baliga & Tim Meehan December 8, 2022 (updated September 21, 2023)

We believe that data analytics should be democratized, which is why we innovate Presto with state-of-the-art database technology. Trusted governance is important to us, which is why we model our project governance and bylaws after the Linux Foundation. TO OUR FELLOW DATA ENGINEERS, SOFTWARE DEVELOPERS, AND DATA PLATFORM ENTHUSIASTS: As the use of data analytics and…

Is PrestoDB the most popular Open Source Data Analytics project?

By Ali LeClerc November 30, 2022 (updated September 21, 2023)

The Presto Foundation is thrilled to announce that today Presto has been awarded “2022 Editors Choice for Top 3 Data and AI Open Source Projects to Watch” from BigDATAwire. Past winners are a true who’s who in the data world including Apache Spark (2020), Apache Kafka (2018), MongoDB (2019), Apache Cassandra, ElasticSearch and Redis (2021)….

Common Sub-Expression optimization

By Rongrong Zhong November 22, 2021 (updated September 21, 2023)

The problem: One common pattern we see in some analytical workloads is the repeated use of the same, often expensive, expression. Look at the following query plan for example: The expression JSON_PARSE(features) is used 6 times and cast to different ROW structures for further processing. Traditionally, Presto would just execute the expression 6 times,…
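To make the cost of repeated evaluation concrete, here is a minimal Python sketch of the general idea behind common sub-expression elimination. It is not Presto's implementation: the function names, the sample row, and the use of `json.loads` as a stand-in for JSON_PARSE are all assumptions chosen for illustration.

```python
import json


def project_naive(features, extractors, counter):
    """Re-parse the JSON string once per output column (the repeated-work pattern)."""
    results = []
    for extract in extractors:
        counter["parses"] += 1
        results.append(extract(json.loads(features)))
    return results


def project_shared(features, extractors, counter):
    """Parse the JSON string once and reuse the parsed value for every column."""
    counter["parses"] += 1
    parsed = json.loads(features)
    return [extract(parsed) for extract in extractors]


# Hypothetical input row and three projections that all depend on the same parse.
row = '{"age": 31, "country": "NZ", "score": 0.5}'
extractors = [
    lambda d: d["age"],
    lambda d: d["country"],
    lambda d: d["score"],
]

naive_count = {"parses": 0}
shared_count = {"parses": 0}
naive = project_naive(row, extractors, naive_count)
shared = project_shared(row, extractors, shared_count)
```

Both functions return the same values, but the shared version performs one parse per row instead of one per projection, which is where the savings come from when the shared expression is expensive.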

Presto Foundation and PrestoDB: Our Commitment to the Presto Open Source Community

By Girish Baliga, Tim Meehan, Dipti Borkar, Zhenxiao Luo, Steven Mih & Bin Fan June 14, 2021 (updated September 21, 2023)

We recently wrapped up an amazing PrestoCon Day attended by over 600 people from across the globe. The technical discussions and the panel were a clear indication of the growing community. We showcased a number of features contributed by various companies that continue to advance the mission of Presto open source, reiterating our commitment to…

2020 Recap – A Year with Presto

By Dipti Borkar January 12, 2021 (updated September 21, 2023)

Tl;dr: 2020 was a huge year for the Presto community. We held our first major conference, PrestoCon, the biggest Presto event ever. We had a massive expansion of our meetup groups with more than 20 sessions held throughout the year, and significant innovations were contributed to Presto! This year has certainly been unique, to say…

Using OptimizedTypedSet to Improve Map and Array Functions

By Ying Su December 4, 2020 (updated September 21, 2023)

Function evaluation is a big part of projection CPU cost. Recently we optimized a set of functions that use TypedSet, e.g. map_concat, array_union, array_intersect, and array_except. By introducing a new OptimizedTypedSet, the above functions saw improvements in several dimensions: Furthermore, OptimizedTypedSet resolves the long-standing issue of throwing EXCEEDED_FUNCTION_MEMORY_LIMIT for large incoming blocks: “The input…

Even Faster Unnest

By Ying Su, Maria Basmanova & Orri Erling August 20, 2020 (updated September 21, 2023)

Unnest is a common operation in Facebook’s daily Presto workload. It converts an ARRAY, MAP, or ROW into a flat relation. Its original implementation always deep-copied the data, which was very inefficient. In Unnest Operator Performance Enhancement with Dictionary Blocks, the author improved the Unnest operator by up to 10x in CPU and…

Getting Started with PrestoDB and Aria Scan Optimizations

By Adam Shook August 14, 2020 (updated September 21, 2023)

This article was originally published by Adam on June 15th, 2020 over at his blog at datacatessen.com. PrestoDB recently released a set of experimental features under their Aria project in order to increase table scan performance of data stored in ORC files via the Hive Connector. In this post, we’ll check out these new features…

PrestoDB and Apache Hudi

By Bhavani Sudha Saktheeswaran August 4, 2020 (updated September 21, 2023)

Apache Hudi is a fast-growing data lake storage system that helps organizations build and manage petabyte-scale data lakes. Hudi brings stream-style processing to batch-like big data by introducing primitives such as upserts, deletes and incremental queries. These features help surface faster, fresher data on a unified serving layer. Hudi tables can be stored…

Spatial Joins 1: Local Spatial Joins

By James Gill May 7, 2020 (updated September 21, 2023)

A common type of spatial query involves relating one table of geometric objects (e.g., a table population_centers with columns population, latitude, longitude) with another such table (e.g., a table counties with columns county_name, boundary_wkt), such as calculating for each county the population sum of all population centers contained within it. These kinds of calculations are…
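The per-county population sum described above can be sketched in plain Python as a naive nested-loop join. This is an illustration only, not how Presto executes spatial joins: the sample data is made up, county boundaries are given as lists of (longitude, latitude) vertices rather than WKT strings, and `point_in_polygon` is a hypothetical helper using the standard ray-casting test.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical tables: (population, latitude, longitude) and (county_name, boundary).
population_centers = [
    (100, 1.0, 1.0),
    (50, 1.5, 0.5),
    (70, 1.0, 3.0),
    (10, 5.0, 5.0),  # falls outside every county
]
counties = [
    ("A", [(0, 0), (2, 0), (2, 2), (0, 2)]),
    ("B", [(2, 0), (4, 0), (4, 2), (2, 2)]),
]

# Naive join: test every population center against every county boundary.
totals = {name: 0 for name, _ in counties}
for population, latitude, longitude in population_centers:
    for name, boundary in counties:
        if point_in_polygon(longitude, latitude, boundary):
            totals[name] += population
```

The nested loop checks every pair of rows, so its cost grows with the product of the two table sizes; avoiding that blow-up is the usual motivation for a dedicated spatial join strategy.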

Engineering SQL Support on Apache Pinot at Uber

By Haibo Wang March 18, 2020 (updated September 21, 2023)

The article, Engineering SQL Support on Apache Pinot at Uber, was originally published by Uber on the Uber Engineering Blog on January 15, 2020. Check out eng.uber.com for more articles about Uber’s engineering work and follow Uber Engineering at @UberEng and Uber Open Source at @UberOpenSouce on Twitter for updates from our teams. Uber leverages…

Improving the Presto planner for better push down and data federation

By Yi He, James Sun, Maria Basmanova, Rongrong Zhong, Jiexi Lin, Saksham Sachdev & Akshay Pall December 23, 2019 (updated September 21, 2023)

Presto defines a connector API that allows Presto to query any data source that has a connector implementation. The existing connector API provides basic predicate pushdown functionality, allowing connectors to perform filtering at the underlying data source. However, the existing predicate pushdown functionality has certain limitations that restrict what connectors can do. The…

5 design choices—and 1 weird trick — to get 2x efficiency gains in Presto repartitioning

By Ying Su, Orri Erling, Tim Meehan, Sahar Massachi, Bhavani Hari & Maria Basmanova December 20, 2019 (updated September 21, 2023)

We like Presto. We like it a lot — so much we want to make it better in every way. Here’s an example: we just optimized the PartitionedOutputOperator. It’s now 2-3x more CPU efficient, which, when measured against Facebook’s production workload, translates to 6% gains overall. That’s huge. The optimized repartitioning is in use on…

Introducing the Presto blog

By Orri Erling June 28, 2019 (updated September 21, 2023)

Presto is a key piece of data infrastructure at many companies. The community has many ongoing projects for taking it to new levels of performance and functionality plus unique experience and insight into challenges of scale. We are opening this blog as an informal channel for discussing our work as well as technology trends and…