Quick Stats – Runtime ANALYZE for Better Query Plans – Anant Aneja, Ahana

An optimizer’s plans are only as good as the estimates available for the tables it’s querying. For queries over recently ingested data that has not yet been ANALYZE-d to update table or partition stats, the Presto optimizer flies blind; it is unable to make good query plans and resorts to syntactic join orders. To solve this problem, we propose building ‘Quick Stats’: by utilizing file-level metadata available in open data lake formats such as Delta & Hudi, and by examining stats from Parquet & ORC footers, we can build a representative stats sample at a per-partition level. These stats can be cached for use by newer queries, and can also be persisted back to the metastore. New strategies for tuning these stats, such as sampling, can be added to improve their precision.
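The core idea of rolling file-level stats up to a partition can be sketched as follows. This is only an illustration in Python, not Presto's implementation: the `ColumnStats` and `FileStats` classes and the `build_partition_stats` function are hypothetical names, and the sketch assumes each file footer has already been read into a row count plus per-column min, max, and null count.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ColumnStats:
    """Hypothetical per-column stats, as might be read from a file footer."""
    min_value: Optional[float]
    max_value: Optional[float]
    null_count: int


@dataclass
class FileStats:
    """Hypothetical stats for one data file (also reused for the partition total)."""
    row_count: int
    columns: Dict[str, ColumnStats]


def merge_column(a: ColumnStats, b: ColumnStats) -> ColumnStats:
    # A missing min/max (None) means the footer carried no value for it.
    mins = [v for v in (a.min_value, b.min_value) if v is not None]
    maxs = [v for v in (a.max_value, b.max_value) if v is not None]
    return ColumnStats(
        min_value=min(mins) if mins else None,
        max_value=max(maxs) if maxs else None,
        null_count=a.null_count + b.null_count,
    )


def build_partition_stats(files: Iterable[FileStats]) -> FileStats:
    """Combine the stats of every file in a partition into one estimate."""
    total_rows = 0
    merged: Dict[str, ColumnStats] = {}
    for f in files:
        total_rows += f.row_count
        for name, stats in f.columns.items():
            if name in merged:
                merged[name] = merge_column(merged[name], stats)
            else:
                merged[name] = stats
    return FileStats(row_count=total_rows, columns=merged)


# Example: two files in the same partition.
file_a = FileStats(100, {"price": ColumnStats(1.0, 50.0, 2)})
file_b = FileStats(250, {"price": ColumnStats(0.5, 20.0, 0)})
partition = build_partition_stats([file_a, file_b])
```

Row counts, null counts, and min/max values combine exactly in this way. Distinct-value counts do not, which is one place where the sampling strategies mentioned in the abstract could help.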

Scalable Feature Engineering with Tecton on Athena – Derek Salama, Tecton

Tecton is the leading feature platform for real-time machine learning. Rather than build new SQL engines from scratch, Tecton connects to your existing engine to transform raw data into features for machine learning. This talk will cover Tecton’s new integration with Athena for feature engineering. Derek will demonstrate how Tecton with Athena is the fastest way to build feature pipelines and put new models in production.

Querying streaming data with Presto, Amazon Athena and Upsolver

In this session, Yoni will present on querying streaming data with Presto and Amazon Athena, covering performance, data partitioning, and compaction. The session includes a demo of the Upsolver platform with Amazon Athena, and he will also share what they are working on with PrestoDB.