A Git-like Repository for your Data Lake – Vinodhini Sivakami Duraisamy, Treeverse

A Git-like Repository for your Data Lake – Vinodhini Sivakami Duraisamy, Treeverse

A Git-like Repository for your Data Lake – Vinodhini Sivakami Duraisamy, Treeverse We tend to adopt practices that improve the flexibility of development and the velocity of code deployment, but how confident are we that the complex data system is safe once it arrives in production? We must be able to experiment in production and automate actions while minimizing customer pain and reducing damage to code and data. If your product’s value is derived from data in the shape of analytics or machine learning, losing it, or having corrupted data, can easily translate into pain. In this session, you will discover how chaos engineering principles apply to distributed data systems and the tools that enable us to make our data workloads more resilient. 

Presto on Kafka at Scale – Yang Yang & Yupeng Fu, Uber

Presto on Kafka at Scale – Yang Yang & Yupeng Fu, Uber

Presto is a popular distributed SQL query engine for running interactive analytic queries. Presto provides a Connector API that allows plugins to dozens of data sources, and thus positions itself as a single point of access to a wide variety of data. At Uber, we significantly improved Presto’s Kafka connector to meet Uber’s scale. For example, the new connector allows dynamic Kafka cluster and topic discovery so users can directly query existing Kafka topics without any registration and onboarding process; dynamic schema discovery allows fetching the latest schema without any Presto restart or deployment; smart time range suggestions to users based on Kafka metadata analysis to avoid large-range scans and thus keep the query interactive.