What is Presto?

Fast and reliable SQL query engine for data analytics and the open lakehouse

For data engineers who struggle with managing multiple query languages and interfaces to siloed databases and storage, Presto is the fast and reliable engine that provides one simple ANSI SQL interface for all your data analytics and your open lakehouse.

Key Innovations

Some of the biggest companies in the world are contributing to the Presto open source project. These key innovations are only available in Linux Foundation Presto today.

Project Aria
Push down entire expressions to the data source for some file formats like ORC.
Project Presto Unlimited
Exchange materialization to create temporary in-memory bucketed tables to use significantly less memory.
Caching with RaptorX
Disaggregate storage from compute for low latency to provide a unified, cheap, fast and scalable solution to OLAP and interactive use cases.
Disaggregated Coordinator (aka Fireball)
Scale out the coordinator horizontally and revamp the RPC stack.
ETL with Presto-on-Spark
Presto on Spark is an integration between Presto and Spark that uses Presto's compiler and evaluation engine as a library together with Spark's large-scale processing capabilities. It enables a unified SQL experience across interactive and batch use cases.
User Defined Functions
Support for dynamic SQL functions (available in experimental mode)
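
To make the Project Aria item above more concrete, here is a toy model of expression pushdown. It is only a sketch of the general idea, not Presto's reader code: the `scan` and `run_query` helpers, the table, and the `is_expensive` expression are all hypothetical. The point it illustrates is that when the reader evaluates the expression itself, far fewer rows are handed to the engine while the query result stays the same.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def scan(rows: Iterable[Row], pushed_down: Optional[Predicate] = None) -> List[Row]:
    """Toy reader: hands rows to the engine, applying a pushed-down filter if given."""
    if pushed_down is None:
        return list(rows)
    return [row for row in rows if pushed_down(row)]


def run_query(rows: List[Row], predicate: Predicate, push_down: bool) -> Tuple[List[Row], int]:
    """Return (result rows, number of rows the reader handed to the engine)."""
    if push_down:
        # The reader evaluates the expression, so only matching rows leave it.
        handed_over = scan(rows, pushed_down=predicate)
        return handed_over, len(handed_over)
    # The reader returns everything and the engine filters afterwards.
    handed_over = scan(rows)
    return [row for row in handed_over if predicate(row)], len(handed_over)


def is_expensive(row: Row) -> bool:
    """An entire expression, not just a bare column comparison."""
    return row["price"] * 2 > 1900


# Hypothetical table: 100 rows with prices 10, 20, ..., 1000.
table = [{"id": i, "price": i * 10} for i in range(1, 101)]

without_pushdown, moved_without = run_query(table, is_expensive, push_down=False)
with_pushdown, moved_with = run_query(table, is_expensive, push_down=True)

print(f"rows handed to engine without pushdown: {moved_without}")
print(f"rows handed to engine with pushdown:    {moved_with}")
```

Both runs return the same rows; only the amount of data crossing the reader boundary differs.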

Why Presto?

One Language

Using different engines for different workloads means you will have to re-platform down the road.

With Presto, you get one familiar ANSI SQL language and one engine for your data analytics, so you don't need to graduate to another lakehouse engine. Presto can be used for interactive and batch workloads, for small and large amounts of data, and scales from a few users to thousands.

One Interface

Most data teams have different engines for different workloads on their data lake storage, and each engine has its own language and interface.

Presto gives you one simple ANSI SQL interface for all of your data in various siloed data systems, helping you join your data ecosystem together. Presto's connector architecture enables you to query data where it lives.
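
As a rough sketch of what querying data where it lives can look like, the example below joins tables from two catalogs in a single statement, addressing each table as `catalog.schema.table`. The catalog, schema, table, and column names (`hive.sales.orders`, `mysql.crm.customers`, and so on), the coordinator address, and the user name are all assumptions for illustration; real names depend on how connectors are configured on your cluster. The helper only builds the HTTP request for the coordinator's `/v1/statement` endpoint and does not send it.

```python
import urllib.request

# Hypothetical catalogs and tables: one backed by a Hive connector,
# one backed by a MySQL connector.
FEDERATED_SQL = """
SELECT c.region, sum(o.total_price) AS revenue
FROM hive.sales.orders AS o
JOIN mysql.crm.customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.region
ORDER BY revenue DESC
""".strip()


def build_statement_request(coordinator_url: str, user: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the coordinator's /v1/statement endpoint."""
    return urllib.request.Request(
        url=coordinator_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Presto-User": user},
        method="POST",
    )


# Assumed local coordinator address and user name.
request = build_statement_request("http://localhost:8080", "analyst", FEDERATED_SQL)
print(request.get_method(), request.full_url)
```

Against a running cluster, passing the request to `urllib.request.urlopen` should return a JSON document that includes a `nextUri` field to poll for results. In practice, a client library or the Presto CLI usually handles that loop for you.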

Fast, Reliable & Efficient

Data infrastructure costs can explode, especially with proprietary systems like data warehouses, as data sizes and user workloads grow.

Presto is battle-tested at Meta and Uber and can scale to meet growing data sizes and workloads. It's faster and more efficient than other engines because it's optimized for large numbers of small queries, so you can query data at better price-performance compared to proprietary systems.

Use Cases

Ad-hoc Query

Use SQL to run ad hoc queries whenever you want, wherever your data resides. Presto allows you to query data where it’s stored so you don’t have to ETL data into a separate system.

Reporting and dashboarding

Query data across multiple sources to build a single view for reports and dashboards, enabling self-service business intelligence (BI).

Open Lakehouse

Through one interface, Presto acts as more than just a query engine: it sits at the core of your data ecosystem and helps tie it all together, solving data problems at scale.