PrestoDB Blog - PrestoDB

Hudi tables via Presto-Hive connector: A Deep Dive

By Pratyaksh Sharma | May 30, 2023 | September 14, 2023

With the growing popularity of the lakehouse approach, it has become increasingly important for query engines to support new table formats such as Hudi. A previous blog post discusses the evolution of the presto-hudi integration via the Hive connector at a high level. With the latest community developments, a separate presto-hudi connector has come up but it is…

Presto Parquet Column Encryption

By Xinli Shang | July 10, 2022 | September 21, 2023

Introduction: Apache Parquet modular encryption provides encryption at rest and in transit at a finer granularity. In the big data world, data analytics tables are usually very wide, with hundreds of columns, while only a small number of columns need to be protected. So finer-grained access control is a better fit than coarse-grained control such as table-level access control…

Faster Presto Queries with Parquet Page Index

By Xinli Shang | May 10, 2022 | September 21, 2023

Introduction: Today’s data is growing very fast, which creates challenges for query engines like Presto. Presto is a popular interactive query engine because of its scalability, high performance, and smooth integration with Hadoop. As the volume of data grows, Presto needs to read larger chunks of data and load them into memory, which causes higher…

Native Parquet Writer for Presto

By Lu Niu & Zhenxiao Luo | June 29, 2021 | September 21, 2023

Overview: With the wide deployment of Presto in a growing number of companies, Presto is used not only for queries but also for data ingestion and ETL jobs. There is a need to improve Presto’s file writer performance, especially for popular columnar file formats such as Parquet and ORC. In this article, we introduce the brand…

PrestoDB and Apache Hudi

By Bhavani Sudha Saktheeswaran | August 4, 2020 | September 21, 2023

Apache Hudi is a fast-growing data lake storage system that helps organizations build and manage petabyte-scale data lakes. Hudi brings stream-style processing to batch-like big data by introducing primitives such as upserts, deletes, and incremental queries. These features help surface faster, fresher data on a unified serving layer. Hudi tables can be stored…

Running Presto in a Hybrid Cloud Architecture

By Adit Madan | July 17, 2020 | September 21, 2023

Migrating SQL workloads from a fully on-premises environment to cloud infrastructure has numerous benefits, including alleviating resource contention and reducing costs by paying for computation resources on demand. In the case of Presto running on data stored in HDFS, the separation of compute in the cloud and storage on-premises is apparent since Presto’s…