Real Time Analytics at Uber with Presto-Pinot

In this talk, seasoned engineers at Uber will walk through the real time analytics use cases at Uber and the work they have done on the Presto architecture and the Presto-Pinot connector to address them.

Presto for Real Time Analytics at Uber – Ankit Sultana, Uber

The Real Time Analytics Platform at Uber serves 100M+ queries daily and is used for several critical features: from end-user app features to radius selection for Uber Eats. All these queries are proxied via a custom internal fork of Presto (named Neutrino) that is optimized for low-latency/high-throughput (50ms latency at 1000s of RPS). With this talk we plan to share our learnings over the last 6 months and how we run Presto reliably at this scale for real-time analytics.

Query Execution Optimization for Broadcast Join using Replicated-Reads Strategy – George Wang, Ahana

Today, Presto supports broadcast joins by having one worker fetch data from the small data source to build a hash table and then send the entire data set over the network to all other workers, where it is probed by rows from the large data source. This can be optimized with a new query execution strategy in which all workers pull the source data of small tables directly, known as replicated reads from dimension tables. This feature comes with a nice caching property: because all N worker nodes now participate in scanning the data from remote sources, the table scan for dimension tables is cacheable on every worker node. In addition, resource utilization improves because the Presto scheduler can reduce the number of plan fragments to execute, as the same workers run tasks in parallel within a single stage, which reduces data shuffles.
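The difference between the two strategies can be sketched with a toy simulation. This is only an illustration under stated assumptions, not Presto code: the tables, the function names (`broadcast_join`, `replicated_reads_join`), and the counters are all hypothetical, and each "worker" is just one loop iteration over a split of the large table.

```python
# Toy model of the two join strategies. All names and data here are
# hypothetical; nothing below is part of the Presto code base.

# Small dimension table: (city_id, city_name)
DIM_ROWS = [(1, "SF"), (2, "NYC")]

# Large fact table, already divided into one split per worker: (trip_id, city_id)
FACT_SPLITS = [
    [(101, 1), (102, 2)],
    [(103, 2)],
    [(104, 1), (105, 3)],
]


def build_hash_table(dim_rows):
    """Build side of the hash join: key -> value."""
    return {key: value for key, value in dim_rows}


def probe(hash_table, fact_rows):
    """Probe side: keep fact rows whose key exists in the hash table."""
    return [(trip_id, hash_table[city_id])
            for trip_id, city_id in fact_rows
            if city_id in hash_table]


def broadcast_join(dim_rows, fact_splits):
    """One worker scans the source, then ships the rows to every other worker."""
    source_scans = 1
    scanned = list(dim_rows)
    shuffled_rows = 0
    output = []
    for worker_index, split in enumerate(fact_splits):
        if worker_index != 0:
            # Worker-to-worker network transfer of the whole small table.
            shuffled_rows += len(scanned)
        output.extend(probe(build_hash_table(scanned), split))
    return sorted(output), source_scans, shuffled_rows


def replicated_reads_join(dim_rows, fact_splits):
    """Every worker scans the small table itself, so nothing is shuffled."""
    source_scans = 0
    output = []
    for split in fact_splits:
        local_copy = list(dim_rows)  # each worker reads from the source
        source_scans += 1
        output.extend(probe(build_hash_table(local_copy), split))
    return sorted(output), source_scans, 0


if __name__ == "__main__":
    print(broadcast_join(DIM_ROWS, FACT_SPLITS))
    print(replicated_reads_join(DIM_ROWS, FACT_SPLITS))
```

Both functions return the same join result. The sketch trades worker-to-worker shuffles for repeated source scans, which is why the per-worker caching of dimension table scans mentioned in the abstract matters.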

Realtime Analytics with Presto and Apache Pinot – Xiang Fu

Today, most analytics products focus either on ad-hoc analytics, which requires query flexibility without guaranteed latency, or on low-latency analytics with limited query capability. In this talk, we will explore how to get the best of both worlds using Apache Pinot and Presto:

1. How people do analytics today and trade off latency against flexibility: a comparison of analytics on raw data versus pre-joined/pre-cubed datasets.
2. An introduction to Apache Pinot as a column store for fast real-time data analytics, and to the Presto Pinot connector, to cover the entire landscape.
3. A deep dive into the Presto Pinot connector to see how it performs predicate and aggregation pushdown.
4. Benchmark results for the Presto Pinot connector.
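The idea behind predicate and aggregation pushdown can be illustrated with a small sketch. It is a toy model under stated assumptions, not the connector's implementation: `store_scan` is a hypothetical stand-in for the column store, the rows are made up, and "rows transferred" simply counts what the store hands back to the query engine.

```python
from collections import Counter

# Hypothetical rows held by the column store.
ROWS = [
    {"city": "SF", "status": "completed"},
    {"city": "SF", "status": "cancelled"},
    {"city": "NYC", "status": "completed"},
    {"city": "SF", "status": "completed"},
]


def store_scan(rows, predicate=None, group_by=None):
    """Toy store: optionally filters and counts per group before returning rows."""
    if predicate is not None:
        rows = [row for row in rows if predicate(row)]
    if group_by is not None:
        counts = Counter(row[group_by] for row in rows)
        return [{group_by: key, "count": n} for key, n in counts.items()]
    return list(rows)


def count_without_pushdown(rows, predicate, group_by):
    """The engine pulls every row, then filters and aggregates on its own side."""
    transferred = store_scan(rows)
    counts = Counter(row[group_by] for row in transferred if predicate(row))
    return dict(counts), len(transferred)


def count_with_pushdown(rows, predicate, group_by):
    """The filter and the GROUP BY count run in the store; only results come back."""
    transferred = store_scan(rows, predicate, group_by)
    result = {row[group_by]: row["count"] for row in transferred}
    return result, len(transferred)


def is_completed(row):
    return row["status"] == "completed"


if __name__ == "__main__":
    print(count_without_pushdown(ROWS, is_completed, "city"))
    print(count_with_pushdown(ROWS, is_completed, "city"))
```

Both paths produce the same per-city counts, but with pushdown the store returns one row per group instead of every stored row, which is the latency benefit the abstract alludes to.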