Building Modern Data Lakes for Analytics Using Object Storage – Satish Ramakrishnan, MinIO

The modern data lake is distributed and unstructured, and it demands performance and scale or, better stated, performance at scale. Modern object stores are the ideal platform to pair with MPP query engines like Presto, particularly as the scale reaches tens or hundreds of petabytes with tens to hundreds of concurrent queries. In this talk, Satish Ramakrishnan will outline the better-together attributes of the two technologies, with a focus on the most sophisticated modern object storage features: throughput optimizations, multi-cloud capabilities, cross-cloud active-active replication, and lifecycle management. Participants will come away with a reference architecture suited to query processing at object scale.

Predicting Resource Usages of Future Queries Based on 10M Presto Queries at Twitter

Here, Chunxu and Beinan share what they have learned in developing a highly scalable query predictor service, which applies machine learning algorithms to ~10 million historical Presto queries to classify queries by their CPU time and peak memory bytes. At Twitter, this service helps improve the performance of Presto clusters and provides expected execution statistics on Business Intelligence dashboards.

A Tour of Presto Iceberg Connector – Beinan Wang, Alluxio & Chunxu Tang, Twitter

Apache Iceberg is an open table format for huge analytic datasets. The Presto Iceberg connector brings the SQL engine and the table format together to enable high-performance data analytics. Here, Beinan and Chunxu discuss the architectural design of the Presto Iceberg connector, its support for advanced Iceberg features (such as the native Iceberg connector, row-level deletion, and Iceberg v2 support), and the future roadmap.

Presto and Apache Iceberg – Chunxu Tang, Twitter

Apache Iceberg is an open table format for huge analytic datasets. At Twitter, engineers are working on the Presto-Iceberg connector, aiming to bring high-performance data analytics on Iceberg to the Presto ecosystem. Here, Chunxu shares what they have learned during development, hoping to shed light on future work on interactive queries.

Level 101 for Presto: What is PrestoDB?

In Level 101, you’ll get an overview of Presto, including: a high-level overview of Presto and its most common use cases; the problems it solves and why you should use it; a live, hands-on demo of getting Presto running on Docker; and a real-world example of how Twitter uses Presto at scale.