Quick Stats – Runtime ANALYZE for Better Query Plans – Anant Aneja, Ahana

An optimizer’s plans are only as good as the estimates available for the tables it is querying. For queries over recently ingested data that has not yet been ANALYZE-d to update table or partition stats, the Presto optimizer flies blind: it is unable to make good query plans and resorts to syntactic join orders. To solve this problem, we propose building ‘Quick Stats’: by utilizing file-level metadata available in open data lake formats such as Delta & Hudi, and by examining stats from Parquet & ORC footers, we can build a representative stats sample at a per-partition level. These stats can be cached for use by newer queries, and can also be persisted back to the metastore. New strategies for tuning these stats, such as sampling, can be added to improve their precision.
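
As a rough illustration of the idea of rolling file-level footer stats up to a partition, here is a minimal Python sketch. It is not the Presto implementation: the `ColumnFileStats` shape and the `merge_partition_stats` function are hypothetical names invented for this example, and it assumes each file footer exposes a row count, a null count, and optional min/max values for a numeric column.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ColumnFileStats:
    """Stats for one column of one file (hypothetical shape, not a Presto type)."""
    row_count: int
    null_count: int
    min_value: Optional[float]  # None when the footer carries no min/max
    max_value: Optional[float]


def merge_partition_stats(files: Iterable[ColumnFileStats]) -> ColumnFileStats:
    """Combine per-file stats into one partition-level estimate for a column."""
    rows = 0
    nulls = 0
    lo: Optional[float] = None
    hi: Optional[float] = None
    bounds_known = True

    for f in files:
        rows += f.row_count
        nulls += f.null_count
        if f.min_value is None or f.max_value is None:
            # A file with non-null values but no recorded bounds means the
            # partition-level range cannot be trusted.
            if f.null_count < f.row_count:
                bounds_known = False
            continue
        lo = f.min_value if lo is None else min(lo, f.min_value)
        hi = f.max_value if hi is None else max(hi, f.max_value)

    if not bounds_known:
        lo = hi = None
    return ColumnFileStats(rows, nulls, lo, hi)


if __name__ == "__main__":
    partition_files = [
        ColumnFileStats(row_count=100, null_count=5, min_value=1.0, max_value=50.0),
        ColumnFileStats(row_count=200, null_count=0, min_value=-3.0, max_value=20.0),
    ]
    print(merge_partition_stats(partition_files))
```

Row and null counts add up exactly, while min and max merge into a conservative range. Statistics such as distinct-value counts cannot be combined this way, which is one place where the sampling strategies mentioned in the abstract might come in.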

Presto on AWS using Ahana Cloud at Cartona – Omar Mohamed, Cartona

Cartona is one of the fastest-growing B2B e-commerce marketplaces in Egypt, connecting retailers with suppliers, wholesalers, and production companies. We needed to federate across multiple data sources, including transactional databases like Postgres and an AWS S3 data lake. In this session, we’ll talk about how Presto allows us to join across all of these data sources without having to copy or ingest data: it’s all done in place. In addition, we’ll talk about how we were up and running in less than an hour with the Ahana Cloud managed service. It gives us the power of Presto with ease of use, without the need to manage it ourselves or have deep skills to deploy and operate it.