Hudi tables via Presto-Hive connector: A Deep Dive

With the growing popularity of the lakehouse approach, it has become increasingly important for query engines to support these new formats such as Hudi. A previous blog discusses the evolution of presto-hudi integration via hive connector at a high level. With the latest community developments, a separate presto-hudi connector has come up but it is…

Presto Enables Internal Log Data Analysis at Drift

I’m a Senior Software Engineer in the data group at Drift, a conversational marketing platform that is used for qualifying leads faster, automatically booking meetings and connecting customers to the right business solutions more efficiently. I’ve used Presto quite a bit throughout my career, and I want to first give readers a quick overview of…

PrestoDB and Apache Hudi

Apache Hudi is a fast growing data lake storage system that helps organizations build and manage petabyte-scale data lakes. Hudi brings stream style processing to batch-like big data by introducing primitives such as upserts, deletes and incremental queries. These features help surface faster, fresher data on a unified serving layer. Hudi tables can be stored…

Running Presto in a Hybrid Cloud Architecture

Migrating SQL workloads from a fully on-premise environment to cloud infrastructure has numerous benefits, including alleviating resource contention and reducing costs by paying for computation resources on an on-demand basis. In the case of Presto running on data stored in HDFS, the separation of compute in the cloud and storage on-premises is apparent since Presto’s…