Quick Stats – Runtime ANALYZE for Better Query Plans – Anant Aneja, Ahana

Quick Stats – Runtime ANALYZE for Better Query Plans – Anant Aneja, Ahana

An optimizer’s plans are only as good as the estimates available for the tables its querying. For queries over recently ingested data that is not yet ANALYZE-d to update table or partition stats, the Presto optimizer flies blind; it is unable to make good query plans and resorts to syntactic join orders. To solve this problem, we propose building ‘Quick Stats’ : By utilizing file level metadata available in open data lake formats such as Delta & Hudi, and by examining stats from Parquet & ORC footers, we can build a representative stats sample at a per partition level. These stats can be cached for use be newer queries, and can also be persisted back to the metastore. New strategies for tuning these stats, such as sampling, can be added to improve their precision.

Implementing Lakehouse Architecture with Presto at Bolt – Kostiantyn Tsykulenko, Bolt.eu

Implementing Lakehouse Architecture with Presto at Bolt – Kostiantyn Tsykulenko, Bolt.eu

Bolt.eu is the first European mobility super-app. We have over 100M users across Europe and Africa and have to deal with data at a large scale on a daily basis (over 100k queries daily). Previously we were using a traditional data warehouse solution based on Redshift but we’ve faced scalability issues that were hard to overcome and after doing our research we chose Presto as the solution. In just a single year we’ve managed to migrate to the Lakehouse architecture using AWS, Presto, Spark and Delta lake. We would like to talk about our journey, some of the challenges we’ve encountered and how we solved them.

Delta Lake Connector for Presto – Denny Lee, Databricks

Delta Lake Connector for Presto – Denny Lee, Databricks

Delta lake is an open-source project that enables building a lakehouse architecture on top of existing storage systems such as S3, ADLS, GCS, and HDFS. We – the Presto and Delta Lake communities – have come together to make it easier for Presto to leverage the reliability of data lakes by integrating with Delta Lake. In this session, we would like to share the design decisions and internals of the Presto/Delta connector.