PrestoDB Blog - PrestoDB

Top 3 reasons why you should attend PrestoCon 2023: Halloween Edition

By Ali LeClerc, October 31, 2023

Two days of Presto, hands-on workshops, and Prestissimo…oh my! Happy Halloween, Presto community! Over the last few weeks, I’ve had people reach out to ask more about PrestoCon 2023, so I figured I’d write a blog post to share my thoughts. First, a quick overview: When: December 5-6, 2023 at the Computer History Museum in Mountain View,…

Recapping PrestoCon Day 2023 – Presto for the Data Lakehouse, Presto at scale

By Ali LeClerc, June 23, 2023 (updated September 14, 2023)

Just a few weeks ago we hosted PrestoCon Day, our annual virtual community conference. Thank you to everyone who attended – it was an awesome day! We had a fantastic agenda with many Presto users sharing why they chose Presto and how they’re using it to power some pretty sizable workloads. Sign up for the…

A recap of PrestoCon 2022 – Bringing Data Lakehouse Analytics to Life (plus a special video recap)

By Ali LeClerc, January 9, 2023 (updated September 14, 2023)

Last month the Computer History Museum in Mountain View, California, reverberated with “all things Presto,” at our PrestoCon 2022 conference. Back for the third time—and the first time post-pandemic—PrestoCon was ground zero for training, knowledge sharing, and inspiration about the open-source Presto for data analytics and lakehouses, as well as for the vibrant Presto community….

5 Reasons to attend PrestoCon 2022 on Dec. 7-8.

By Rohan Pednekar & Steven Mih, November 28, 2022 (updated September 22, 2023)

The annual PrestoCon is coming back for its 3rd year and it’s going to be better than ever! If you want to learn how to use Presto with confidence and/or network with data engineers, this is the event for you. PrestoCon 2022 will be held in Mountain View, California on December 7th and 8th. The…

Common Sub-Expression optimization

By Rongrong Zhong, November 22, 2021 (updated September 21, 2023)

The problem: One common pattern we see in some analytical workloads is the repeated use of the same, often expensive, expression. Look at the following query plan for example: The expression JSON_PARSE(features) is used 6 times and cast to different ROW structures for further processing. Traditionally, Presto would just execute the expression 6 times,…
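As a rough illustration of the idea in this excerpt (not Presto's implementation), the sketch below contrasts re-evaluating an expensive expression for every projection with evaluating it once and sharing the result. Python's json.loads stands in for JSON_PARSE, and the function names project_naive and project_shared are hypothetical.

```python
import json


def project_naive(features, keys):
    """Evaluate the expensive expression once per projection (the repeated work)."""
    parse_calls = 0
    results = []
    for key in keys:
        parsed = json.loads(features)  # same expression, evaluated again
        parse_calls += 1
        results.append(parsed.get(key))
    return results, parse_calls


def project_shared(features, keys):
    """Evaluate the common sub-expression once and reuse it for every projection."""
    parsed = json.loads(features)  # evaluated a single time
    parse_calls = 1
    results = [parsed.get(key) for key in keys]
    return results, parse_calls
```

Both functions return the same projected values; only the number of parse calls differs, which is the cost a common sub-expression optimization aims to remove.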

Presto Foundation and PrestoDB: Our Commitment to the Presto Open Source Community

By Girish Baliga, Tim Meehan, Dipti Borkar, Zhenxiao Luo, Steven Mih & Bin Fan, June 14, 2021 (updated September 21, 2023)

We recently wrapped up an amazing PrestoCon Day attended by over 600 people from across the globe. The technical discussions and the panel were a clear indication of the growing community. We showcased a number of features contributed by various companies that continue to advance the mission of Presto open source, reiterating our commitment to…

2020 Recap – A Year with Presto

By Dipti Borkar, January 12, 2021 (updated September 21, 2023)

TL;DR: 2020 was a huge year for the Presto community. We held our first major conference, PrestoCon, the biggest Presto event ever. We had a massive expansion of our meetup groups with more than 20 sessions held throughout the year, and significant innovations were contributed to Presto! This year has certainly been unique, to say…

Using OptimizedTypedSet to Improve Map and Array Functions

By Ying Su, December 4, 2020 (updated September 21, 2023)

Function evaluation is a big part of projection CPU cost. Recently we optimized a set of functions that use TypedSet, e.g. map_concat, array_union, array_intersect, and array_except. By introducing a new OptimizedTypedSet, the above functions saw improvements in several dimensions: Furthermore, OptimizedTypedSet resolves the long-standing issue of throwing EXCEEDED_FUNCTION_MEMORY_LIMIT for large incoming blocks: “The input…

PrestoCon and Growing Industry Consortium – Intel and Upsolver Join Presto Foundation

By Girish Baliga, November 20, 2020 (updated September 21, 2023)

Presto Foundation joined the Linux Foundation over a year ago, and has been focused on growing the Presto open source project and community. We encourage industry involvement with an open charter, clear guiding principles, and community-oriented goals. We recently hosted PrestoCon 2020, our first annual community conference, which was widely attended and well represented by…

Even Faster Unnest

By Ying Su, Maria Basmanova & Orri Erling, August 20, 2020 (updated September 21, 2023)

Unnest is a common operation in Facebook’s daily Presto workload. It converts an ARRAY, MAP, or ROW into a flat relation. Its original implementation always made deep copies, which was very inefficient. In Unnest Operator Performance Enhancement with Dictionary Blocks, the author improved the Unnest operator by up to 10x in CPU and…
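For readers new to the operation, here is a minimal sketch of what unnesting an ARRAY column into a flat relation means. It shows only the semantics of a cross join with the unnested elements, not the operator's internals; the function name unnest and the (key, array) row shape are assumptions made for illustration.

```python
def unnest(rows):
    """Yield one (key, element) row for each element of each row's array.

    Rows whose array is empty produce no output, mirroring a cross join
    against the unnested elements.
    """
    for key, elements in rows:
        for element in elements:
            yield (key, element)
```

For example, list(unnest([(1, ["a", "b"]), (2, [])])) yields two rows for key 1 and none for key 2.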

Getting Started with PrestoDB and Aria Scan Optimizations

By Adam Shook, August 14, 2020 (updated September 21, 2023)

This article was originally published by Adam on June 15th, 2020 over at his blog at datacatessen.com. PrestoDB recently released a set of experimental features under their Aria project in order to increase table scan performance of data stored in ORC files via the Hive Connector. In this post, we’ll check out these new features…

PrestoDB and Apache Hudi

By Bhavani Sudha Saktheeswaran, August 4, 2020 (updated September 21, 2023)

Apache Hudi is a fast-growing data lake storage system that helps organizations build and manage petabyte-scale data lakes. Hudi brings stream-style processing to batch-like big data by introducing primitives such as upserts, deletes, and incremental queries. These features help surface faster, fresher data on a unified serving layer. Hudi tables can be stored…

Spatial Joins 1: Local Spatial Joins

By James Gill, May 7, 2020 (updated September 21, 2023)

A common type of spatial query involves relating one table of geometric objects (e.g., a table population_centers with columns population, latitude, longitude) with another such table (e.g., a table counties with columns county_name, boundary_wkt), such as calculating for each county the population sum of all population centers contained within it. These kinds of calculations are…
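To make the example query concrete, the sketch below computes the population sum per county with a naive nested loop and a ray-casting point-in-polygon test. It is a baseline under stated assumptions, not how Presto evaluates spatial joins: county boundaries are given as lists of (longitude, latitude) vertices rather than WKT, and the names point_in_polygon and population_by_county are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def population_by_county(population_centers, counties):
    """Sum population over the centers contained in each county.

    population_centers: iterable of (population, latitude, longitude)
    counties: iterable of (county_name, boundary), where boundary is a list
              of (longitude, latitude) vertices (an assumption; not WKT)
    """
    totals = {}
    for county_name, boundary in counties:
        total = 0
        for population, latitude, longitude in population_centers:
            if point_in_polygon(longitude, latitude, boundary):
                total += population
        totals[county_name] = total
    return totals
```

Every county is compared against every population center, so the cost grows with the product of the two table sizes.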

Engineering SQL Support on Apache Pinot at Uber

By Haibo Wang, March 18, 2020 (updated September 21, 2023)

The article, Engineering SQL Support on Apache Pinot at Uber, was originally published by Uber on the Uber Engineering Blog on January 15, 2020. Check out eng.uber.com for more articles about Uber’s engineering work and follow Uber Engineering at @UberEng and Uber Open Source at @UberOpenSource on Twitter for updates from our teams. Uber leverages…

Announcing PrestoCon 2020: Advancing the Big Data Ecosystem with Presto

By Nezih Yigitbasi, February 13, 2020 (updated September 21, 2023)

On March 24, 2020 in San Mateo, the Presto Foundation, in partnership with The Linux Foundation, will be hosting the organization’s first-ever PrestoCon. The event, one of the first Presto-focused full-day conferences ever held, will feature speakers from Uber, Facebook, and Twitter, as well as tech talks from other major Presto contributors and enthusiasts. Presto,…

Improving the Presto planner for better push down and data federation

By Yi He, James Sun, Maria Basmanova, Rongrong Zhong, Jiexi Lin, Saksham Sachdev & Akshay Pall, December 23, 2019 (updated September 21, 2023)

Presto defines a connector API that allows Presto to query any data source that has a connector implementation. The existing connector API provides basic predicate pushdown functionality, allowing connectors to perform filtering at the underlying data source. However, the existing predicate pushdown functionality has certain limitations that restrict what connectors can do. The…

5 design choices—and 1 weird trick — to get 2x efficiency gains in Presto repartitioning

By Ying Su, Orri Erling, Tim Meehan, Sahar Massachi, Bhavani Hari & Maria Basmanova, December 20, 2019 (updated September 21, 2023)

We like Presto. We like it a lot — so much we want to make it better in every way. Here’s an example: we just optimized the PartitionedOutputOperator. It’s now 2-3x more CPU efficient, which, when measured against Facebook’s production workload, translates to 6% gains overall. That’s huge. The optimized repartitioning is in use on…