How we accelerated our Iceberg queries for CDC with MoR and Equality Deletes

    Ingesting and maintaining a Change Data Capture (CDC) stream from transactional databases in an Iceberg lakehouse is not easy. As the frequency and volume of changes increase, query performance quickly degrades, forcing users to make hard choices: copy-on-write (CoW) vs. merge-on-read (MoR), small vs. large files, and even whether to delay refreshing the table. In this lightning talk, you’ll learn how Apache Iceberg manages deleted rows, the difference between position delete files and equality delete files, and how recent enhancements to Presto optimize MoR with equality deletes by using joins, improving query performance by 400X.
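To make the two kinds of delete files concrete, here is a minimal Python sketch of how a MoR reader might apply them. It is an illustration under stated assumptions, not Presto's or Iceberg's actual code: the `Row` type, the file names, and both helper functions are hypothetical. It relies only on the behavior described in the Iceberg table spec, where a position delete names a (data file, row position) pair and an equality delete names column values and applies only to data written with a strictly lower sequence number. Building a lookup of deleted keys once and probing it per row is conceptually similar to an anti-join.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """Hypothetical in-memory stand-in for one row of an Iceberg data file."""
    file: str   # data file the row lives in
    pos: int    # row position inside that file
    seq: int    # data sequence number of the file
    id: int     # primary key carried by the CDC stream
    value: str


def apply_position_deletes(rows, position_deletes):
    """Drop rows whose (file, position) pair is listed in a position delete."""
    deleted = set(position_deletes)
    return [r for r in rows if (r.file, r.pos) not in deleted]


def apply_equality_deletes(rows, equality_deletes):
    """Drop rows whose key matches an equality delete written later.

    equality_deletes is a list of (key, delete_sequence_number) pairs.
    The lookup table is built once and probed per row, like the build
    and probe sides of a hash anti-join.
    """
    latest = {}
    for key, seq in equality_deletes:
        latest[key] = max(seq, latest.get(key, seq))
    # A delete only affects data with a strictly lower sequence number.
    return [r for r in rows if not (r.id in latest and r.seq < latest[r.id])]


rows = [
    Row("data-1.parquet", 0, 1, 100, "alice v1"),
    Row("data-1.parquet", 1, 1, 101, "bob v1"),
    Row("data-1.parquet", 2, 1, 102, "carol v1"),
    # Assumed CDC upsert of id=100: the new row and the equality delete
    # for id=100 are committed together with sequence number 2.
    Row("data-2.parquet", 0, 2, 100, "alice v2"),
]
position_deletes = [("data-1.parquet", 1)]  # removes "bob v1" by location
equality_deletes = [(100, 2)]               # removes older rows with id=100

live = apply_equality_deletes(
    apply_position_deletes(rows, position_deletes), equality_deletes
)
print([r.value for r in live])  # ['carol v1', 'alice v2']
```

The position delete requires the writer to know where the old row is stored, while the equality delete only needs the key, which is why streaming CDC writers tend to favor equality deletes and leave the matching work to the reader.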

    See slides.