PrestoDB Blog - Page 2 of 4

A recap of PrestoCon 2022 – Bringing Data Lakehouse Analytics to Life (plus a special video recap)

By Ali LeClerc, January 9, 2023 (updated September 14, 2023)

Last month the Computer History Museum in Mountain View, California, reverberated with “all things Presto,” at our PrestoCon 2022 conference. Back for the third time—and the first time post-pandemic—PrestoCon was ground zero for training, knowledge sharing, and inspiration about the open-source Presto for data analytics and lakehouses, as well as for the vibrant Presto community….

2022 PrestoDB Community in Review

By Ali LeClerc, December 30, 2022 (updated September 14, 2023)

Hello Presto enthusiasts! We here at the Presto Outreach Committee are absolutely thrilled to be entering the new year of 2023. It’s hard to believe that another year has passed, but as we reflect on the past year, we can’t help but feel grateful for the amazing growth and progress we’ve seen in the Presto…

Presto on AWS at Twilio – Lessons Learned and Optimization

By Ali LeClerc, December 28, 2022 (updated September 21, 2023)

Earlier this month we hosted PrestoCon, a fantastic in-person event that showcased the innovation around the Presto project. In this blog we’ll detail Twilio’s presentation on their Presto use case, including their architecture, key optimizations, and lessons learned. You can also check out their full presentation here. In their session, Twilio engineers Aakash Pradeep and…

Our Presto Credo for the Truly Open Source SQL Query Engine

By Steven Mih, Girish Baliga & Tim Meehan, December 8, 2022 (updated September 21, 2023)

We believe that data analytics should be democratized, and that is why we innovate on Presto with state-of-the-art database technology. Trusted governance is important to us, and that is why we model our project governance and bylaws after the Linux Foundation. TO OUR FELLOW DATA ENGINEERS, SOFTWARE DEVELOPERS, AND DATA PLATFORM ENTHUSIASTS: As the use of data analytics and…

Is PrestoDB the most popular Open Source Data Analytics project?

By Ali LeClerc, November 30, 2022 (updated September 21, 2023)

The Presto Foundation is thrilled to announce that today Presto has been awarded “2022 Editors Choice for Top 3 Data and AI Open Source Projects to Watch” from BigDATAwire. Past winners are a true who’s who in the data world, including Apache Spark (2020), Apache Kafka (2018), MongoDB (2019), Apache Cassandra, Elasticsearch and Redis (2021)….

5 Reasons to attend PrestoCon 2022 on Dec. 7-8.

By Rohan Pednekar & Steven Mih, November 28, 2022 (updated September 22, 2023)

The annual PrestoCon is coming back for its 3rd year and it’s going to be better than ever! If you want to learn how to use Presto with confidence and/or network with data engineers, this is the event for you. PrestoCon 2022 will be held in Mountain View, California on December 7th and 8th. The…

Presto Parquet Column Encryption

By Xinli Shang, July 10, 2022 (updated September 21, 2023)

Apache Parquet modular encryption provides encryption at rest and in transit at a finer granularity. In the big data world, analytic tables are usually very wide, with hundreds of columns, while only a small number of those columns need to be protected. Finer-grained access control is therefore a better fit than coarse-grained approaches such as table-level access control….

Faster Presto Queries with Parquet Page Index

By Xinli Shang, May 10, 2022 (updated September 21, 2023)

Today’s data is growing very fast, which creates challenges for query engines like Presto. Presto is a popular interactive query engine because of its scalability, high performance, and smooth integration with Hadoop. As the volume of data grows, Presto needs to read larger chunks of data and load them into memory, which causes higher…

Disaggregated Coordinator

By Swapnil Tailor, Tim Meehan, Vaishnavi Batni, Abhisek Saikia & Neerad Somanchi, April 15, 2022 (updated November 15, 2023)

Presto’s architecture originally supported only a single coordinator and a pool of workers. This worked well for many years but created some challenges. To overcome them, we came up with a new design with a disaggregated coordinator that allows the coordinator to be horizontally scaled out across a single pool of workers….

Native Delta Lake Connector for Presto

By Rohan Pednekar & Denny Lee, March 15, 2022 (updated September 21, 2023)

This is a joint publication by the PrestoDB and Delta Lake communities. Due to the popularity of both the PrestoDB and Delta Lake projects (more on this below), in early 2020 the Delta Lake community announced that one could query Delta tables from PrestoDB. While popular, this method entailed the use of a manifest file…

Avoid Data Silos in Presto in Meta: the journey from Raptor to RaptorX

By Rongrong Zhong, James Sun & Ke Wang, January 28, 2022 (updated September 21, 2023)

Raptor is a Presto connector (presto-raptor) that is used to power some critical interactive query workloads at Meta (previously Facebook). Though it is mentioned in the ICDE 2019 paper “Presto: SQL on Everything,” it remains somewhat mysterious to many Presto users because there is no available documentation for this feature. This article will shed some light…

Common Sub-Expression optimization

By Rongrong Zhong, November 22, 2021 (updated September 21, 2023)

One common pattern we see in some analytical workloads is the repeated use of the same, often expensive, expression. In one example query plan, the expression JSON_PARSE(features) is used six times and cast to different ROW structures for further processing. Traditionally, Presto would just execute the expression six times,…
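To make the idea concrete, here is a minimal sketch of common sub-expression reuse, assuming a toy expression representation (nested tuples) and hypothetical helpers `evaluate` and `project`. It is not Presto's implementation: it memoizes structurally identical sub-expressions per row at evaluation time, so a shared `json_parse` runs once instead of once per projection.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Toy expression form (hypothetical): ("col", name), ("lit", value),
# or (func_name, arg1, arg2, ...). Tuples are hashable, so structurally
# identical sub-expressions map to the same cache key.
Expr = Tuple[Any, ...]

FUNCS: Dict[str, Callable[..., Any]] = {
    "json_parse": json.loads,
    "get": lambda obj, key: obj[key],
}


def evaluate(expr: Expr, row: Dict[str, Any],
             cache: Dict[Expr, Any], stats: Dict[str, int]) -> Any:
    """Evaluate expr against one row, reusing any cached sub-expression."""
    if expr in cache:
        return cache[expr]
    op, *args = expr
    if op == "col":
        value = row[args[0]]
    elif op == "lit":
        value = args[0]
    else:
        stats[op] = stats.get(op, 0) + 1  # count real function invocations
        value = FUNCS[op](*(evaluate(a, row, cache, stats) for a in args))
    cache[expr] = value
    return value


def project(projections: List[Expr], row: Dict[str, Any],
            share_cache: bool = True) -> Tuple[List[Any], Dict[str, int]]:
    """Evaluate several projections; share_cache=True enables reuse."""
    stats: Dict[str, int] = {}
    shared: Dict[Expr, Any] = {}
    results = []
    for p in projections:
        cache = shared if share_cache else {}  # fresh cache = no reuse
        results.append(evaluate(p, row, cache, stats))
    return results, stats


if __name__ == "__main__":
    parsed = ("json_parse", ("col", "features"))
    projs = [("get", parsed, ("lit", k)) for k in ("a", "b", "c")]
    row = {"features": '{"a": 1, "b": 2, "c": 3}'}
    print(project(projs, row, share_cache=True))   # json_parse counted once
    print(project(projs, row, share_cache=False))  # json_parse counted 3 times
```

With reuse enabled, the three projections share one parse of `features`; without it, the parse repeats for each projection, which mirrors the repeated-execution problem described above.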

What is Presto on Spark?

By Rohan Pednekar, Shradha Ambekar & Ariel Weisberg, November 15, 2021 (updated October 19, 2023)

1. Reporting and dashboarding: This includes serving custom reports to both internal and external developers for business insights; many organizations also use Presto for interactive A/B testing analytics. A defining characteristic of this use case is a requirement for low latency: tens to hundreds of milliseconds at very high QPS, and not…

Scaling with Presto on Spark

By Rohan Pednekar, Shradha Ambekar & Ariel Weisberg, October 26, 2021 (updated September 21, 2023)

Presto was originally designed to run interactive queries against data warehouses, but it has since evolved into a unified SQL engine on top of open data lake analytics for both interactive and batch workloads. Popular workloads on data lakes include: 1. Reporting and dashboarding: This includes serving custom reports to both internal and external…

Native Parquet Writer for Presto

By Lu Niu & Zhenxiao Luo, June 29, 2021 (updated September 21, 2023)

With the wide deployment of Presto in a growing number of companies, Presto is used not only for queries but also for data ingestion and ETL jobs. There is a need to improve Presto’s file writer performance, especially for popular columnar file formats such as Parquet and ORC. In this article, we introduce the brand…

Presto Foundation and PrestoDB: Our Commitment to the Presto Open Source Community

By Girish Baliga, Tim Meehan, Dipti Borkar, Zhenxiao Luo, Steven Mih & Bin Fan, June 14, 2021 (updated September 21, 2023)

We recently wrapped up an amazing PrestoCon Day attended by over 600 people from across the globe. The technical discussions and the panel were a clear indication of the growing community. We showcased a number of features contributed by various companies that continue to advance the mission of Presto open source, reiterating our commitment to…

RaptorX: Building a 10X Faster Presto

By James Sun, Ke Wang, Rohit Jain, Saksham Sachdev, Shixuan Fan, Bin Fan, Zhenxiao Luo & Lu Niu, February 4, 2021 (updated September 21, 2023)

RaptorX is an internal project name for an effort aimed at reducing query latency significantly below what vanilla Presto can achieve. This blog post introduces the hierarchical cache work, which is the key building block for RaptorX. With the support of the cache, we are able to boost query performance by 10X. This new architecture can beat…

2020 Recap – A Year with Presto

By Dipti Borkar, January 12, 2021 (updated September 21, 2023)

TL;DR: 2020 was a huge year for the Presto community. We held our first major conference, PrestoCon, the biggest Presto event ever. We had a massive expansion of our meetup groups with more than 20 sessions held throughout the year, and significant innovations were contributed to Presto! This year has certainly been unique, to say…