Presto Blog - PrestoDB

GPU-Accelerated Presto C++ is Here: Nightly Images for NVIDIA GPUs
By Luis Garcés-Erice, Zoltan Arnold Nagy, Daniel Bauer & Sean Rooney June 24, 2026
TL;DR: GPUs can run analytical SQL dramatically faster than CPUs. Published numbers show a single-node Presto C++ deployment with GPU-accelerated operators running a TPC-H-style benchmark in ~100 seconds versus ~1,200 seconds on a high-end CPU, roughly 12× faster, and a UCX/NVLink exchange running >6× faster with multi-node Presto. Together…
Presto-Lance Connector: Querying Vector Embeddings with Distributed SQL
By Saurabh Mahawar & Jianjian Xie June 21, 2026 (updated June 22, 2026)
As artificial intelligence and machine learning (ML) models become integral to modern software, organizations are storing billions of high-dimensional vectors (embeddings) alongside traditional metadata. Analyzing this multi-modal data at scale requires a bridge between high-performance vector databases and distributed SQL query engines. Recently, the Presto open-source community introduced a native Lance Connector (added in release…
Bridging Java and Native Execution: Presto Connector Federation with Apache Arrow Flight
By Bryan Cutler, Pratik Joseph Dabre & Saurabh Mahawar June 15, 2026
Introduction: The Federation Challenge in Modern Query Engines. The evolution of distributed query engines has created a fundamental architectural tension. Over the past decade, the Presto ecosystem has developed an extensive collection of production-grade Java-based connectors, providing federated access to hundreds of heterogeneous data sources. Concurrently, native C++ execution engines built on frameworks like Velox…
Solving Cross-Warehouse Joins for AI Systems Using Presto: Without Breaking Latency, Cost, or Correctness
By Apoorv Garg May 18, 2026
This is a guest post from Apoorv, AI Lead Engineer at Sylus. The Distributed Intelligence Challenge. Problem Statement: “Show me last quarter’s ad spend next to the revenue it drove, by campaign.” While this seems like a routine request for a human analyst, it creates a significant technical hurdle for AI systems when the required…
Presto Benchmarking Tutorial – TPC-H & TPC-DS on Iceberg with Google Cloud Storage (GCS)
By Saurabh Mahawar May 3, 2026
TL;DR: What You Will Build. In this comprehensive guide, you will deploy a Presto benchmarking setup using Docker Compose. We will construct a cloud-native Data Lakehouse by mapping raw data into Apache Iceberg tables stored on Google Cloud Storage (GCS). Finally, we will execute the industry-standard TPC-H and TPC-DS benchmark suites using PBench, and visualize our query latencies in real time using a persistent MySQL and Grafana observability stack. Whether you are stress-testing…
Password Authentication Setup on Local
By Pratyaksh Sharma April 30, 2026 (updated May 3, 2026)
Authentication and authorization are the two main pillars of data security. While a Presto cluster can run without authentication for development purposes, production clusters must be secured at all times. Setting up a secure cluster has its own challenges, given the setup and configuration changes involved. In this blog, we…
Iceberg Branches and Tags with Presto
By Reetika Agrawal March 25, 2026
Modern data lakehouses increasingly require versioned data access, auditability, and safe experimentation without affecting production systems. Apache Iceberg allows you to maintain multiple concurrent timelines of a table through Branches and capture static historical points using Tags. This mechanism is heavily inspired by Git but operates on underlying table snapshots. In this blog, we are going to see…
Deploy Presto on Kubernetes using Helm: Query S3 Data with Hive Metastore
By Saurabh Mahawar February 27, 2026
Deploying Presto on Kubernetes transforms this powerful engine into a cloud-native, resilient service that automatically handles failures, scales seamlessly, and optimizes resource utilization. When combined with Helm charts, the deployment becomes standardized, version-controlled, and easily reproducible across environments. This comprehensive guide will walk you through deploying a production-capable baseline Presto cluster on Kubernetes using the official Presto Helm…
PBench 1.2.1: End-to-End Benchmarking and Performance Testing for Presto
By Ethan Zhang February 24, 2026
Benchmarking a distributed SQL engine like Presto involves much more than running a few queries and recording wall-clock times. Real-world performance evaluation demands multi-phase test execution, concurrent workloads, production traffic replay, and deep offline analysis. PBench is a purpose-built benchmarking toolkit for Presto that handles all of this through a declarative, composable stage system. With the 1.2.1…
TPC-H vs TPC-DS: Benchmarking Modern Distributed SQL Engines like Presto
By Saurabh Mahawar January 30, 2026 (updated May 11, 2026)
In the world of big data, performance is the ultimate currency. But when you are processing petabytes of data across a distributed cluster, speed isn’t just about a stopwatch; it’s a high-stakes engineering challenge. Whether you are evaluating Presto, Spark, or any other engine, you need an objective yardstick. Performance in a distributed SQL engine…
Presto vs Prestissimo – Known differences and workarounds
By Amit Dutta & Krishna Pai January 22, 2026 (updated January 30, 2026)
TL;DR: This blog outlines the known differences between Presto and Prestissimo where existing Presto queries require adjustment to work in Prestissimo. Details: Prestissimo is generally available and has feature parity with Presto Java, except for a few functions. The two stacks use different libraries. Also, we have ensured that bugs…
From Zero to Contributor: A Complete Guide to Contributing to Presto Open Source
By Saurabh Mahawar January 9, 2026
PrestoDB is a powerful distributed SQL query engine used widely for large-scale data analytics. Contributing to Presto is an excellent way to gain hands-on experience with distributed systems, Java, SQL engines, and large open-source codebases. This step-by-step tutorial is designed specifically for beginners and first-time contributors who want to build Presto from source, run the…
Understanding Presto UI: A Deep Dive into the Web Interface Architecture
By Saurabh Mahawar December 1, 2025
Presto UI is a modern, React-based web interface that provides real-time monitoring, query management, and cluster administration capabilities for the Presto distributed SQL query engine. Whether you’re a database administrator, data engineer, or developer, Presto UI offers intuitive tools to visualize query execution, monitor cluster health, and interact with the Presto coordinator. Key Benefits of…
Seamless Integration: Connecting PrestoDB to SingleStore for High-Performance Analytics
By Saurabh Mahawar September 11, 2025 (updated November 19, 2025)
In today’s data-driven landscape, organizations are constantly seeking ways to analyze massive datasets quickly and efficiently. PrestoDB, a powerful open-source SQL query engine, and SingleStore, a distributed SQL database, are two technologies that, when combined, offer unparalleled capabilities for high-performance data querying and distributed analytics. This guide provides a hands-on, step-by-step tutorial on how to…
Presto Takes a Leap: Upgrading to Java 17 for Enhanced Performance and Security
By Zachary Blanco August 25, 2025
We’re excited to announce that the core Presto engine is migrating to Java 17. This upgrade reinforces our commitment to providing a robust, high-performance, and secure SQL query engine. This change allows Presto to leverage Java 17’s improvements, bringing enhancements in performance, stability, and security, and laying a strong foundation for future upgrades. Why Java…
Prestissimo Extension for AI Training Data Normalization at Meta: A Deep Dive for Developers (Lightning Talk)
By Saurabh Mahawar August 23, 2025
At PrestoCon Day 2025, Meta’s Presto team unveiled the Prestissimo extension, a powerful enhancement designed to optimize AI training data normalization. This article explores the technical underpinnings and developer-centric features of this extension, providing a comprehensive understanding of how it supports large-scale AI workloads at Meta. Understanding AI Training Data Storage at Meta. At…
Presto C++ Unleashed: Dynamically Load Unfenced UDFs, End Rebuilds, and Boost Performance
By Saurabh Mahawar August 22, 2025
Dynamic loading in Presto C++ is revolutionizing how developers build and deploy user-defined functions (UDFs). At PrestoCon Day 2025, Soumya Duriseti explained how Presto C++ now supports dynamic loading of unfenced UDFs, eliminating the need for time-consuming static builds and making it easier than ever to add custom logic without rebuilding the entire binary….
Building Connectors in Presto C++: Deep Dive into the TPCDS Connector (Lightning Talk)
By Saurabh Mahawar, Pramod Satya & Pratik Joseph Dabre August 20, 2025
At PrestoCon Day 2025, engineers from IBM presented a deep dive into how connectors in Presto C++ extend the engine’s modular capabilities, focusing on the newly implemented TPCDS benchmark connector. Connectors are central to Presto’s architecture, enabling the query engine to communicate seamlessly with external systems such as databases, file formats, or benchmark data generators….