Building Modern Data Lakes for Analytics Using Object Storage – Satish Ramakrishnan, MinIO

The modern data lake is distributed and unstructured, and it demands performance and scale or, better stated, performance at scale. Modern object stores are the ideal platform to pair with MPP query engines like Presto, particularly as the scale reaches tens or hundreds of petabytes with tens to hundreds of concurrent queries. In this talk, Satish Ramakrishnan will outline the better-together attributes of the two technologies, with a focus on the most sophisticated modern object storage features: throughput optimizations, multi-cloud capabilities, cross-cloud active-active replication, and lifecycle management. Participants will come away with a reference architecture suited to query processing at object scale.

Headless BI Architecture and Trade-offs – Pavel Tiunov, Cube Dev

There has been a proliferation of tools in different categories of the modern data stack. This talk will focus on the Headless BI category and Cube’s implementation of Headless BI. Headless BI inserts a component between data warehouses and other data sources on one side of the stack and the tools that consume data on the other (e.g. CDPs, data exploration tools, custom data apps). This new component encapsulates several critical functions, such as data modeling, access control, and aggregate awareness, while deliberately omitting others, such as data visualization and presentation. We’ll explore:
– Keeping data models separate from data sources and not substituting mere data transformation for data modeling.
– Managing access control, aggregate awareness, and caching centrally in a separate layer upstack from data consumers.
– Removing data presentation features and embracing data accessibility via a set of APIs.