Shared Foundations Of Composable Data Systems – Biswapesh Chattopadhyay, Google

Shared Foundations Of Composable Data Systems – Biswapesh Chattopadhyay, Google

Data processing systems have evolved significantly over the last decade, driven by various factors such as the advent of cloud computing, increasingly complexity of applications such as ML, HTAP, Streaming, Observability and Graph processing. However, historically, these frameworks have evolved independently, leading to significant fragmentation of the stack. In this talk, I will talk about how this has evolved in the open source and at Meta, and how we are solving this problem through the Shared Foundations effort, leading to composable systems. This has resulted in significantly better performance, more features, higher engineering velocity and a more consistent user experience.

Headless BI Architecture and Trade-offs – Pavel Tiunov, Cube Dev

Headless BI Architecture and Trade-offs – Pavel Tiunov, Cube Dev

There has been a proliferation of tools in different categories of the modern data stack. This talk will focus on the Headless BI category and Cube’s implementation of Headless BI. Headless BI injects a component between data warehouses and other data sources and tools on the other side of the stack (e.g. CDP, data exploration tools, custom data apps, etc.). This new component encapsulates several critical functions like data modeling, access control, and aggregate awareness while deliberately omitting others, like data visualization and presentation. We’ll explore: – Keeping data models separate from data sources and not substituting data modeling with mere data transformation. – Managing access control centrally, aggregate awareness, and caching in a separate layer upstack from data consumers. – Removing data presentation features and embracing data accessibility via a set of APIs.

Using Presto’s BigQuery Connector for Better Performance and Ad-hoc Query connector for better performance and ad-hoc query in the Cloud – George Wang & Roderick Yao

Using Presto’s BigQuery Connector for Better Performance and Ad-hoc Query connector for better performance and ad-hoc query in the Cloud – George Wang & Roderick Yao

The Google BigQuery connector gives users the ability to query tables in the BigQuery service, Google Cloud’s fully managed data warehouse. In this presentation, we’ll discuss the BigQuery Connector plugin for Presto which uses the BigQuery Storage API to stream data in parallel, allowing users to query from BigQuery tables via gPRC to achieve a better read performance. We’ll also discuss how the connector enables interactive ad-hoc query to join data across distributed systems for data lake analytics.