The Real Time Analytics Platform at Uber serves 100M+ queries daily and is used for several critical features: from end-user app features to radius selection for Uber Eats. All these queries are proxied via a custom internal fork of Presto (named Neutrino) that is optimized for low-latency/high-throughput (50ms latency at 1000s of RPS). With this talk we plan to share our learnings over the last 6 months and how we run Presto reliably at this scale for real-time analytics.
In this world, most analytics products either focus on ad-hoc analytics, which requires query flexibility without guaranteed latency, or low latency analytics with limited query capability. In this talk, we will explore how to get the best of both worlds using Apache Pinot and Presto: 1. How people do analytics today to trade-off Latency and Flexibility: Comparison over analytics on raw data vs pre-join/pre-cube dataset. 2. Introduce Apache Pinot as a column store for fast real-time data analytics and Presto Pinot Connector to cover the entire landscape. 3. Deep dive into Presto Pinot Connector to see how the connector does predicate and aggregation push down. 4. Benchmark results for Presto Pinot connector.