Hudi tables via Presto-Hive connector: A Deep Dive

With the growing popularity of the lakehouse approach, it has become increasingly important for query engines to support these new formats such as Hudi. A previous blog discusses the evolution of presto-hudi integration via hive connector at a high level. With the latest community developments, a separate presto-hudi connector has come up but it is…

Disaggregated Coordinator

Overview Presto’s architecture originally only supported a single coordinator and a pool of workers. This has worked well for many years but created some challenges. To overcome these challenges, we came up with a new design with a disaggregated coordinator that allows the coordinator to be horizontally scaled out across a single pool of workers….

PrestoDB and Apache Hudi

Apache Hudi is a fast growing data lake storage system that helps organizations build and manage petabyte-scale data lakes. Hudi brings stream style processing to batch-like big data by introducing primitives such as upserts, deletes and incremental queries. These features help surface faster, fresher data on a unified serving layer. Hudi tables can be stored…

Running Presto in a Hybrid Cloud Architecture

Migrating SQL workloads from a fully on-premise environment to cloud infrastructure has numerous benefits, including alleviating resource contention and reducing costs by paying for computation resources on an on-demand basis. In the case of Presto running on data stored in HDFS, the separation of compute in the cloud and storage on-premises is apparent since Presto’s…