Shared Foundations Of Composable Data Systems – Biswapesh Chattopadhyay, Google

Data processing systems have evolved significantly over the last decade, driven by factors such as the advent of cloud computing and the increasing complexity of applications such as ML, HTAP, Streaming, Observability and Graph processing. Historically, however, these frameworks have evolved independently, leading to significant fragmentation of the stack. In this talk, I will discuss how this has evolved in open source and at Meta, and how we are solving this problem through the Shared Foundations effort, which leads to composable systems. This has resulted in significantly better performance, more features, higher engineering velocity and a more consistent user experience.

Presto & the Foundations of Open Lake House: Trends & Opportunities – Biswapesh Chattopadhyay, Meta

Building open, shared foundational technology for a lake house architecture can provide a best-of-breed user experience across the Analytics and ML domains, and potentially beyond. In this talk, Biswa will share examples drawn from the evolution of the data stack at Meta over the last few years, including efforts towards dialect unification (Sapphire, aka Presto-on-Spark, and the Xstream-IE streaming engine), eval unification (using Velox as the base), eliminating data duplication for interactive analytics by building smart caching (RaptorX), building a best-of-breed file format that works across Analytics and ML (Alpha), and building an open source ML data preprocessing engine (TorchArrow) that shares the core dialect and eval components with Presto.

Open Source Data Lake Analytics: Trends and Opportunities – Biswapesh Chattopadhyay, Meta

Open source data analytics is undergoing an interesting transformation as the industry rapidly evolves around it. Accelerating migration to the cloud, the rise of immensely well-funded proprietary vendors, and the fast-evolving needs of users all contribute to this. This talk goes into detail about the trends and opportunities in the OSS data analytics space, and ends with a call to action on how this space can stay relevant.

How Carbon uses PrestoDB in the Cloud with Ahana to Power its Real-time Customer Dashboards

Carbon is a real-time revenue management platform that consolidates revenue and audience analytics, data management, and yield operations into a single solution. Real-time analytics is critical: Carbon's customers rely on real-time data to make revenue decisions. After running into issues with AWS Athena around performance, visibility and ease of use, and its serverless pricing model, the team moved to a managed service for PrestoDB in the cloud, Ahana Cloud, to power their customer-facing dashboards. In this session, Jordan will discuss some of the reasons the team moved from AWS Athena to a managed PrestoDB on Intel-optimized AWS instances. He will also dive into their current architecture, which includes an Ahana-managed Hive Metastore along with the Apache ORC file format and an S3-based data lake. Last, he'll share some performance benchmarks and talk about what's next for PrestoDB at Carbon.