What is Presto?

Fast and reliable SQL query engine for data analytics and the open lakehouse

For data engineers who struggle with managing multiple query languages and interfaces to siloed databases and storage, Presto is the fast and reliable engine that provides one simple ANSI SQL interface for all your data analytics and your open lakehouse.

Presto Tech Talk: Intro to Presto and Superset video

Key Innovations

Some of the biggest companies in the world are contributing to the Presto open source project. These key innovations are only available in Linux Foundation Presto today.

Presto C++

Presto C++ is a full rewrite of the Presto query execution engine, built on Velox, a state-of-the-art execution engine designed to be composable across compute engines. The goal is a 3-4x improvement in performance and scalability.
Blog | Docs

History-Based Optimization Framework

The HBO framework enables advanced query optimization techniques by leveraging historical execution statistics. This approach offers a more efficient query execution strategy through its unique cost estimation, plan transformations, and the incorporation of historical data.
Blog | Paper
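
To make the idea of reusing historical execution statistics concrete, here is a toy sketch in Python. It is not Presto's implementation: the real framework works on query plans inside the optimizer, while this example keys on simplified query text. The names HistoryStore, canonicalize, record, and estimate are hypothetical, as are the sample queries and row counts.

```python
import re
from collections import defaultdict
from statistics import mean


def canonicalize(sql: str) -> str:
    """Collapse whitespace and replace literals so similar queries share a key."""
    text = re.sub(r"'[^']*'", "?", sql)           # string literals
    text = re.sub(r"\b\d+(\.\d+)?\b", "?", text)  # numeric literals
    return " ".join(text.lower().split())


class HistoryStore:
    """Toy store of observed output row counts, keyed by canonical query text."""

    def __init__(self, window: int = 5):
        self.window = window
        self.history = defaultdict(list)

    def record(self, sql: str, output_rows: int) -> None:
        """Remember the row count observed for one finished run."""
        runs = self.history[canonicalize(sql)]
        runs.append(output_rows)
        del runs[:-self.window]  # keep only the most recent runs

    def estimate(self, sql: str, fallback: float) -> float:
        """Prefer the average of past runs; otherwise use the fallback guess."""
        runs = self.history.get(canonicalize(sql))
        return mean(runs) if runs else fallback


# Hypothetical history: two earlier runs that differ only in a literal.
store = HistoryStore()
store.record("SELECT * FROM orders WHERE region = 'EU'", 1200)
store.record("SELECT * FROM orders WHERE region = 'US'", 1800)

seen = store.estimate("SELECT * FROM orders WHERE region = 'APAC'", fallback=10_000)
unseen = store.estimate("SELECT * FROM customers", fallback=10_000)
print(seen, unseen)
```

The design point is the fallback: a query shape that has run before gets an estimate grounded in what actually happened, and anything new falls back to an ordinary guess.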

Caching with RaptorX

RaptorX adds caching so that Presto can keep storage disaggregated from compute and still deliver low latency, providing a unified, cheap, fast, and scalable solution for OLAP and interactive use cases.
Blog | Presentation

Disaggregated Coordinator (aka Fireball)

Scale out the coordinator horizontally and revamp the RPC stack.
GitHub | Blog

ETL with Presto-on-Spark

Presto on Spark is an integration between Presto and Spark that leverages Presto’s compiler/evaluation as a library and Spark’s large-scale processing capabilities. It enables a unified SQL experience across interactive and batch use cases.
Docs

User Defined Functions

Support for dynamic SQL functions (available in experimental mode)
Docs

Research Papers

Presto: SQL on Everything (2019)
Meta

Presto: A Decade of SQL Analytics at Meta (2023)
Meta, Alluxio, Ahana Cloud

Presto’s History-based Query Optimizer (2024)
Meta, Uber

Why Presto?

One Language

Different engines for different workloads means you will have to re-platform down the road.

With Presto, you get one familiar ANSI SQL language and one engine for your data analytics, so you don’t need to graduate to another lakehouse engine. Presto can be used for interactive and batch workloads, for small and large amounts of data, and it scales from a few users to thousands.

One Interface

Most data teams have different engines for different workloads on their data lake storage, and each engine has its own language and interface.

Presto gives you one simple ANSI SQL interface for all of your data in various siloed data systems, helping you join your data ecosystem together. Presto’s connector architecture enables you to query data where it lives.
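
As a minimal sketch of what querying data where it lives can look like, the Python snippet below builds a single SQL statement that joins tables from two catalogs. Presto addresses tables as catalog.schema.table. The catalog names (hive, mysql), schemas, tables, and columns here are hypothetical and depend on how your connectors are configured.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_join(orders_table: str, customers_table: str) -> str:
    """Build one SQL statement that joins tables from two different catalogs."""
    return (
        "SELECT c.name, sum(o.total) AS revenue\n"
        f"FROM {orders_table} AS o\n"
        f"JOIN {customers_table} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.name\n"
        "ORDER BY revenue DESC"
    )


# Hypothetical catalogs: "hive" for data lake files, "mysql" for an operational database.
orders = qualified("hive", "sales", "orders")
customers = qualified("mysql", "crm", "customers")
query = build_federated_join(orders, customers)
print(query)
```

The join is ordinary ANSI SQL; only the catalog prefix on each table name tells Presto which system to read from.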

Fast, Reliable & Efficient

Data infrastructure costs can explode, especially with proprietary systems like data warehouses, as data sizes and user workloads grow.

Presto is battle-tested at Meta and Uber and can scale to meet growing data sizes and workloads. It’s faster and more efficient than other engines because it’s optimized for large numbers of small queries, so you can query data at better price-performance compared to proprietary systems.

Use Cases

Ad-hoc Query

Use SQL to run ad hoc queries whenever you want, wherever your data resides. Presto allows you to query data where it’s stored so you don’t have to ETL data into a separate system.
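
For illustration, an ad hoc query can be submitted from a short script through Presto's HTTP client protocol: POST the SQL text to /v1/statement, then follow each nextUri link until none remains. The sketch below uses only the Python standard library. The server address, user name, and helper names (build_statement_request, run_query) are assumptions made for the example, and a production client would add authentication, retries, and pauses between polls.

```python
import json
import urllib.request


def build_statement_request(sql, server, user, catalog=None, schema=None):
    """Build (but do not send) the POST request that submits a query."""
    headers = {"X-Presto-User": user}
    if catalog:
        headers["X-Presto-Catalog"] = catalog
    if schema:
        headers["X-Presto-Schema"] = schema
    return urllib.request.Request(
        url=f"{server.rstrip('/')}/v1/statement",
        data=sql.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def run_query(request):
    """Send the request, then follow nextUri links and collect result rows."""
    rows = []
    with urllib.request.urlopen(request) as response:
        page = json.load(response)
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if next_uri is None:
            return rows
        with urllib.request.urlopen(next_uri) as response:
            page = json.load(response)


# Assumed coordinator address and user; adjust both for your deployment.
request = build_statement_request("SELECT 1", "http://localhost:8080", user="analyst")
print(request.get_method(), request.full_url)
# rows = run_query(request)  # requires a reachable Presto coordinator
```

Only the request construction runs here; the run_query call is left commented out because it needs a live coordinator.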

Reporting and dashboarding

Query data across multiple sources to build a single view for reports and dashboards, enabling self-service business intelligence (BI).

Open Lakehouse

Presto is more than a query engine. Through one interface, it sits at the core of your data ecosystem and helps tie it all together, solving data problems at scale.