A recap of PrestoCon 2022 - Bringing Data Lakehouse Analytics to Life (plus a special video recap)

January 9, 2023

Ali LeClerc

Last month the Computer History Museum in Mountain View, California, reverberated with “all things Presto” at our PrestoCon 2022 conference. Back for the third time (and the first time post-pandemic), PrestoCon was ground zero for training, knowledge sharing, and inspiration about open-source Presto for data analytics and lakehouses, as well as for the vibrant Presto community. This year was special, however, as it was the first ever in-person PrestoCon event, and I couldn’t have been more thrilled to meet the community, hear how companies are using Presto in production, and learn what’s coming up on the engineering roadmap.

To memorialize this awesome event, we put together a quick 3-minute video of PrestoCon! Check it out, and we hope you enjoy it 🙂

Read More

Customer-Facing Presto at Rippling - Andy Li, Rippling

January 9, 2023

Ali LeClerc

Last month we hosted PrestoCon, a return to in-person events that showcased the community development of Presto. In this blog we’ll detail Rippling’s presentation on their Presto use case, including their architecture, key optimizations, and hard-earned lessons. You can also check out their full presentation here.

Read More

2022 PrestoDB Community in Review

December 30, 2022

Ali LeClerc

Hello Presto enthusiasts!

We here at the Presto Outreach Committee are absolutely thrilled to be entering the new year of 2023. It's hard to believe that another year has passed, but as we reflect on the past year, we can't help but feel grateful for the amazing growth and progress we've seen in the Presto community in 2022.

Read More

Presto on AWS at Twilio - Lesson Learned and Optimization

December 28, 2022

Ali LeClerc

Earlier this month we hosted PrestoCon, a fantastic in-person event that showcased the innovation around the Presto project. In this blog we’ll detail Twilio’s presentation on their Presto use case, including their architecture, key optimizations, and lessons learned. You can also check out their full presentation here.

Read More

Our Presto Credo for the Truly Open Source SQL Query Engine.

December 8, 2022

Steven Mih, Board member & Treasurer Presto Foundation and Co-founder at Ahana

Co-authors:

  • Girish Baliga, Chair, Presto Foundation, Presto Foundation Member, Engineering at Uber

  • Tim Meehan, Chair, Presto Foundation TSC and Software Engineer at Meta

We believe that data analytics should be democratized, which is why we innovate Presto with state-of-the-art database technology. Trusted governance is important to us, which is why we model our project governance and bylaws after the Linux Foundation.

TO OUR FELLOW DATA ENGINEERS, SOFTWARE DEVELOPERS, AND DATA PLATFORM ENTHUSIASTS:

As the use of data analytics and SQL lakehouses grows, the open-forever Presto distributed SQL query engine has the enduring power to change the world with better data-driven decisions.

We take this moment to reflect on the open source Presto query engine and especially why open source Presto, hosted by the Linux Foundation’s Presto Foundation, is the best choice for those who care about data platforms and state-of-the-art database technology.

We believe:

Read More

Is PrestoDB the most popular Open Source Data Analytics project?

November 30, 2022

Ali LeClerc

Co-author: Rachel Pedreschi

The Presto Foundation is thrilled to announce that today Presto has been awarded “2022 Editors Choice for Top 3 Data and AI Open Source Projects to Watch” from BigDATAwire. Past winners are a true who’s who in the data world including Apache Spark (2020), Apache Kafka (2018), MongoDB (2019), Apache Cassandra, ElasticSearch and Redis (2021). This award underscores what the Linux Foundation's Presto Foundation has known for a long time: PrestoDB continues to be extremely popular. We have recently dug into the data to find out more.

Read More

5 Reasons to attend PrestoCon 2022 on Dec. 7-8.

November 28, 2022

Rohan Pednekar

Co-author: Steven Mih, Board member, Presto Foundation Member: Ahana

The annual PrestoCon is coming back for its 3rd year and it’s going to be better than ever! If you want to learn how to use Presto with confidence and/or network with data engineers, this is the event for you. PrestoCon 2022 will be held in Mountain View, California on December 7th and 8th. The conference features two days of in-depth training sessions and talks led by some of the best minds in the industry. If you want to learn how to use Presto for data analytics and lakehouses, or simply to get the most out of your data infrastructure, register now and get ready for two exciting days of learning and networking!

Read More

Presto Parquet Column Encryption

July 10, 2022

Xinli Shang

Uber: Xinli Shang

Introduction

Apache Parquet modular encryption provides finer-grained encryption at rest and in transit. In the big data world, analytic tables are usually very wide, with hundreds of columns, while only a small number of columns need to be protected. Finer-grained access control is therefore a better fit than coarse-grained control such as table-level access control.

In addition, data access restrictions, retention, and encryption at rest are fundamental security controls. Column encryption with access control at the encryption key can solve all three problems with one unified solution, as discussed in another blog post, One Stone, Three Birds: Finer-Grained Encryption @ Apache Parquet.

Apache Parquet modular encryption has been released in Parquet 1.12.0 and Presto has been updated to 1.12.1. This enables the Presto repository to incorporate the Parquet column encryption.

Read More

Faster Presto Queries with Parquet Page Index

May 10, 2022

Xinli Shang

Uber: Xinli Shang

Uber: Chen Liang

Introduction

Today’s data is growing very fast, which creates challenges for query engines like Presto. Presto is a popular interactive query engine because of its scalability, high performance, and smooth integration with Hadoop. As the volume of data grows, Presto needs to read larger chunks of data and load them into memory, which increases IO, memory usage, and GC time.

Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.

Earlier initiatives sped up how Presto reads Parquet data, but there is still a lot of data to read. Since the release of the Java version of Parquet (parquet-mr 1.11.0), a feature called Page Index has been available to speed up queries by filtering out unnecessary Parquet pages in column chunks.

This article discusses this feature, the status of porting it into Presto, and the benchmark results.
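
To make the page-filtering idea concrete, here is a minimal sketch in Python. It is not Presto or parquet-mr code: the `PageStats` class, the `pages_to_read` function, and the sample values are hypothetical, and it assumes only that each page in a column chunk carries a minimum and maximum value that a reader can compare against a range predicate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStats:
    """Hypothetical per-page summary: the value range of one page in a column chunk."""
    page_id: int
    min_value: int
    max_value: int


def pages_to_read(index: List[PageStats], lo: int, hi: int) -> List[int]:
    """Return ids of pages whose [min, max] range overlaps the predicate range [lo, hi].

    Pages with no overlap cannot contain a matching value, so they can be skipped.
    """
    return [p.page_id for p in index if p.max_value >= lo and p.min_value <= hi]


# Illustrative index for one column chunk with three pages (made-up values).
column_index = [
    PageStats(page_id=0, min_value=1, max_value=100),
    PageStats(page_id=1, min_value=101, max_value=200),
    PageStats(page_id=2, min_value=201, max_value=300),
]

# A predicate such as "value BETWEEN 150 AND 180" only needs page 1.
selected = pages_to_read(column_index, 150, 180)
print(selected)
```

The filter is conservative: a page that overlaps the range is still read in full and may contain no matching rows, but a skipped page is guaranteed to contain none.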

Read More

Disaggregated Coordinator

April 15, 2022

Swapnil Tailor

Meta: Swapnil Tailor, Tim Meehan, Vaishnavi Batni, Abhisek Saikia, Neerad Somanchi

Overview

Presto's architecture originally only supported a single coordinator and a pool of workers. This has worked well for many years but created some challenges.

  • With a single coordinator, the cluster can scale reliably only up to a certain number of workers. A large worker pool running complex, multi-stage queries can overwhelm an inadequately provisioned coordinator, requiring upgraded hardware to support the increase in worker load.
  • A single coordinator is a single point of failure for the Presto cluster.

To overcome these challenges, we came up with a new design with a disaggregated coordinator that allows the coordinator to be horizontally scaled out across a single pool of workers.

Read More
Copyright © The Presto Foundation.
All rights reserved. Presto is a registered trademark of LF Projects, LLC.
Please see our Trademark Policy for more information.
Privacy Policy | Terms of Use.