Quick Stats – Runtime ANALYZE for Better Query Plans – Anant Aneja, Ahana

An optimizer’s plans are only as good as the estimates available for the tables it is querying. For queries over recently ingested data that has not yet been ANALYZE-d to update table or partition stats, the Presto optimizer flies blind: it is unable to make good query plans and resorts to syntactic join orders. To solve this problem, we propose building ‘Quick Stats’: by utilizing file-level metadata available in open data lake formats such as Delta & Hudi, and by examining stats from Parquet & ORC footers, we can build a representative stats sample at a per-partition level. These stats can be cached for use by newer queries, and can also be persisted back to the metastore. New strategies for tuning these stats, such as sampling, can be added to improve their precision.
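
The idea of rolling file-level footer stats up to a partition can be illustrated with a minimal Python sketch. It is not Presto's implementation: the class and function names (ColumnFileStats, ColumnPartitionStats, merge_file_stats) are hypothetical, and the per-file numbers are assumed to have already been read from Parquet or ORC footers.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ColumnFileStats:
    """Stats for one column in one data file (hypothetical shape,
    modeled on what a Parquet/ORC footer typically records)."""
    row_count: int
    null_count: int
    min_value: Optional[float]  # None when the file holds no non-null values
    max_value: Optional[float]


@dataclass(frozen=True)
class ColumnPartitionStats:
    """Partition-level estimate derived only from file footers."""
    row_count: int
    null_fraction: float
    min_value: Optional[float]
    max_value: Optional[float]


def merge_file_stats(files: Iterable[ColumnFileStats]) -> ColumnPartitionStats:
    """Combine per-file stats into one per-partition estimate.

    Row and null counts add up; min and max are taken across files.
    No data pages are scanned.
    """
    files = list(files)
    rows = sum(f.row_count for f in files)
    nulls = sum(f.null_count for f in files)
    mins = [f.min_value for f in files if f.min_value is not None]
    maxs = [f.max_value for f in files if f.max_value is not None]
    return ColumnPartitionStats(
        row_count=rows,
        null_fraction=(nulls / rows) if rows else 0.0,
        min_value=min(mins) if mins else None,
        max_value=max(maxs) if maxs else None,
    )


# Two files in the same partition, with made-up footer values.
footers = [
    ColumnFileStats(row_count=1000, null_count=10, min_value=3.0, max_value=250.0),
    ColumnFileStats(row_count=500, null_count=5, min_value=1.0, max_value=90.0),
]
stats = merge_file_stats(footers)
```

Counts and ranges merge exactly this way, but a distinct-value count cannot be recovered from min/max alone, which is one place where an additional strategy such as sampling could help.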

Building a Modern Data Platform with Presto – Denis Krivenko, Platform24

The Hadoop era is gone. Cloud computing is today’s reality. But… What if you cannot use public clouds? What if your cloud does not provide data platform capabilities? What if you want your solution to be cloud agnostic? In that case, you create your own cloud-native data platform on Kubernetes. In this session, Denis will talk about the reasons for building an analytics data platform at Platform24, the architecture principles of a cloud-native data platform, the data stack they use, and why Presto plays one of the key roles in it.