A Git-like Repository for your Data Lake – Vinodhini Sivakami Duraisamy, Treeverse

We tend to adopt practices that improve the flexibility of development and the velocity of code deployment, but how confident are we that a complex data system is safe once it reaches production? We must be able to experiment in production and automate actions while minimizing customer pain and limiting damage to code and data. If your product's value is derived from data, in the form of analytics or machine learning, then losing that data or corrupting it can easily translate into pain. In this session, you will discover how chaos engineering principles apply to distributed data systems and which tools help make data workloads more resilient.

Extending Presto at LinkedIn with a Smart Catalog Layer – LinkedIn

In this talk, Walaa describes how LinkedIn extended its Presto Hive Catalog with a smart logical abstraction layer that can reason about logical views with UDFs, using two core components: Coral and Transport UDFs. Coral is a view virtualization library, powered by Apache Calcite, that represents views using their logical query plans. Walaa shows how LinkedIn leverages Coral abstractions to decouple the view expression language from the execution engine, which makes it possible to execute non-Presto-SQL views inside Presto and to rewrite queries on the fly for data governance and query optimization.