What is Presto?

    Fast and reliable SQL query engine for data analytics and the open lakehouse

    For data engineers who struggle with managing multiple query languages and interfaces to siloed databases and storage, Presto is the fast and reliable engine that provides one simple ANSI SQL interface for all your data analytics and your open lakehouse.


    Key Innovations

    Some of the biggest companies in the world are contributing to the Presto open source project. These key innovations are only available in Linux Foundation Presto today.

    Project Aria

    Push down entire expressions to the data source for some file formats like ORC.
    Blog | Design
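
    To make the idea of pushing a filter down to the data source concrete, here is a minimal Python sketch. It is not Project Aria's implementation: `Stripe`, `scan_then_filter`, and `scan_with_pushdown` are hypothetical names, and the "file" is just a list of in-memory chunks carrying min/max statistics. It only illustrates why evaluating a filter inside the reader can avoid decoding rows the query will discard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Hypothetical chunk of a columnar file that exposes min/max statistics."""
    values: List[int]

    @property
    def min_value(self) -> int:
        return min(self.values)

    @property
    def max_value(self) -> int:
        return max(self.values)


def scan_then_filter(stripes: List[Stripe], lo: int, hi: int) -> Tuple[List[int], int]:
    """Engine-side filtering: decode every row, then apply lo <= v <= hi."""
    decoded, out = 0, []
    for stripe in stripes:
        for v in stripe.values:
            decoded += 1
            if lo <= v <= hi:
                out.append(v)
    return out, decoded


def scan_with_pushdown(stripes: List[Stripe], lo: int, hi: int) -> Tuple[List[int], int]:
    """Reader-side filtering: skip stripes whose statistics rule out a match."""
    decoded, out = 0, []
    for stripe in stripes:
        if stripe.max_value < lo or stripe.min_value > hi:
            continue  # nothing in this stripe can satisfy the filter
        for v in stripe.values:
            decoded += 1
            if lo <= v <= hi:
                out.append(v)
    return out, decoded


STRIPES = [Stripe([1, 2, 3]), Stripe([10, 11, 12]), Stripe([20, 21, 22])]
ROWS_PLAIN, DECODED_PLAIN = scan_then_filter(STRIPES, 10, 11)
ROWS_PUSHED, DECODED_PUSHED = scan_with_pushdown(STRIPES, 10, 11)
```

    Both scans return the same rows; the pushed-down version simply decodes fewer values because two of the three stripes are skipped.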

    Project Presto Unlimited

    Uses exchange materialization to create temporary in-memory bucketed tables, so that queries use significantly less memory.
    Github | Blog
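
    The memory benefit of bucketing can be illustrated with a toy join. The Python sketch below is an assumption-laden illustration, not Presto code: `bucketize` and `bucketed_join` are hypothetical helpers, buckets are plain Python lists, and keys are assigned to buckets with a simple modulo. It shows that when both sides are bucketed on the join key, the join can run one bucket at a time, so the hash table held in memory is only as large as the biggest bucket.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def bucketize(rows: List[Row], num_buckets: int) -> Dict[int, List[Row]]:
    """Assign each row to a bucket by key modulo the bucket count."""
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for key, value in rows:
        buckets[key % num_buckets].append((key, value))
    return buckets


def bucketed_join(left: List[Row], right: List[Row], num_buckets: int):
    """Join bucket by bucket; return the joined rows and the peak build size."""
    left_buckets = bucketize(left, num_buckets)
    right_buckets = bucketize(right, num_buckets)
    results, peak_build_rows = [], 0
    for b in range(num_buckets):
        # Build a hash table for this bucket only.
        table: Dict[int, List[str]] = defaultdict(list)
        for key, value in left_buckets.get(b, []):
            table[key].append(value)
        peak_build_rows = max(peak_build_rows, sum(len(v) for v in table.values()))
        # Probe with the matching bucket from the other side.
        for key, right_value in right_buckets.get(b, []):
            for left_value in table.get(key, []):
                results.append((key, left_value, right_value))
    return results, peak_build_rows


LEFT = [(k, f"L{k}") for k in range(8)]
RIGHT = [(k, f"R{k}") for k in (1, 3, 5, 7, 8)]
JOINED, PEAK_BUCKETED = bucketed_join(LEFT, RIGHT, num_buckets=4)
_, PEAK_SINGLE = bucketed_join(LEFT, RIGHT, num_buckets=1)
```

    With four buckets the build side never holds more than two rows at once, compared with all eight rows when everything is joined in a single pass.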

    Caching with RaptorX

    Disaggregates storage from compute while keeping latency low, providing a unified, cheap, fast, and scalable solution for OLAP and interactive use cases.
    Blog | Presentation

    Disaggregated Coordinator (aka Fireball)

    Scale out the coordinator horizontally and revamp the RPC stack.
    Github | Blog

    ETL with Presto-on-Spark

    Presto on Spark is an integration between Presto and Spark that uses Presto’s compiler and evaluation engine as a library together with Spark’s large-scale processing capabilities. It enables a unified SQL experience across interactive and batch use cases.

    User Defined Functions

    Support for dynamic SQL functions (available in experimental mode).

    Why Presto?

    One Language

    Different engines for different workloads mean you will have to re-platform down the road.

    With Presto, you get one familiar ANSI SQL language and one engine for your data analytics, so you don’t need to graduate to another lakehouse engine. Presto can be used for interactive and batch workloads, for small and large amounts of data, and scales from a few users to thousands.

    One Interface

    Most data teams have different engines for different workloads on their data lake storage, and each engine has its own language and interface.

    Presto gives you one simple ANSI SQL interface for all of your data in various siloed data systems, helping you join your data ecosystem together. Presto’s connector architecture enables you to query data where it lives.
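
    As a rough sketch of what querying data where it lives looks like, the Python snippet below builds a single SQL statement that joins tables from two different systems. Presto addresses tables as `catalog.schema.table`; everything else here is an assumption for illustration: the catalog names `hive` and `mysql` depend on how a deployment is configured, the schemas, tables, and columns are hypothetical, and `qualified` and `federated_join_sql` are helper names invented for this example. The snippet only composes the query text and does not connect to a cluster.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified table name in catalog.schema.table form."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql(orders: str, customers: str) -> str:
    """Compose one query that joins tables living in two different catalogs."""
    return (
        "SELECT c.name, sum(o.total) AS revenue\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.name"
    )


# Hypothetical catalogs: order data in a data lake, customer data in MySQL.
QUERY = federated_join_sql(
    qualified("hive", "sales", "orders"),
    qualified("mysql", "crm", "customers"),
)
```

    The point of the sketch is that both sources appear in one statement, so no data has to be copied into a common system before the join.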

    Fast, Reliable & Efficient

    Data infrastructure costs can explode as data size and user workloads grow, especially with proprietary systems like data warehouses.

    Presto is battle-tested at Meta and Uber and can scale to meet growing data sizes and workloads. It’s faster and more efficient than other engines because it’s optimized for large numbers of small queries, so you can query data with better price-performance than proprietary systems offer.

    Use Cases

    Ad-hoc Query

    Use SQL to run ad hoc queries whenever you want, wherever your data resides. Presto allows you to query data where it’s stored so you don’t have to ETL data into a separate system.

    Reporting and dashboarding

    Query data across multiple sources to build one unified view of reports and dashboards for self-service business intelligence (BI).

    Open Lakehouse

    With one interface, Presto is more than just a query engine: it sits at the core of your data ecosystem and helps tie it all together by solving data problems at scale.