Velociraptor – The Next Generation of RaptorX – Vladimir Rodionov, Carrot Cache

Velociraptor – The Next Generation of RaptorX – Vladimir Rodionov, Carrot Cache

Vladimir Rodionov, founder of Carrot Cache will present the Velociraptor – the next evolution of PrestoDB hierarchical caching framework RaptorX. Velociraptor enables efficient data and meta-data caching well beyond RaptorX limits in terms of number of data files (multi-billions), number of table partitions (multi-millions) and number of table columns (multi-thousands). Velociraptor replaces all five RaptorX caches (Hive meta-data, file list, query result fragments, ORC/Parquet meta-data and data I/O) with a scalable solution, based on Carrot Cache, which does not pollute JVM heap memory, does not affect Java Garbage Collector, keeps all data and meta-data off Java heap memory or on disk and can scale well beyond server’s physical RAM limit. Velociraptor supports server restart, by quickly saving and loading data to/from disk for automatic cache warm up.

Customer-Facing Presto at Rippling – Andy Li, Rippling

Customer-Facing Presto at Rippling – Andy Li, Rippling

Presto is used for a variety of cases, but tends to be used for larger scale analytical queries. We have been transitioning to using Presto to power our data platform and customer-facing scripting language, RQL (Rippling Query Language) to run arbitrary customer queries to power core products. Presto helps enable diverse, federated querying at scale. In this talk, Andy will cover where Presto sits in Rippling’s ecosystem as a core query layer, our collaboration and contributions for closer integration with Apache Pinot, and learnings on using Presto to handle a large variety of query patterns.

Real Time Analytics at Uber with Presto-Pinot

Real Time Analytics at Uber with Presto-Pinot

In this talk, seasoned engineers at Uber will walk through the real time analytics use cases at Uber and the work they have done on the Presto architecture and the Presto-Pinot connector to address them.

After RaptorX: Improve Performance Understanding and Workload Analysis in Presto – Ke Wang & Bin Fan

After RaptorX: Improve Performance Understanding and Workload Analysis in Presto – Ke Wang & Bin Fan

RaptorX, an umbrella project presented in PrestoCon Day in March, enabled the Presto interactive fleet in Facebook to reduce latency by 10x, based on a set of architectural improvements and optimizations with hierarchical caching. This presentation provides an update on the follow-up enhancement. Bin Fan from Alluxio will talk about the exploration of a probabilistic algorithm in Alluxio caching to estimate cache working set and the implementation of shadow cache Ke Wang from Facebook will talk about how shadow cache is used to understand the system bottleneck for better resource allocation and query routing decisions. She will also cover a recent improvement in collecting and aggregating per-query runtime statistics on the Presto engine to better understand the time breakdown, resource usage breakdown and cache hit rate on a per-query basis, which can help identify areas of improvement.

RaptorX: Building a 10X Faster Presto – James Sun, Facebook, Inc

RaptorX: Building a 10X Faster Presto – James Sun, Facebook, Inc

RaptorX is an internal project name aiming to boost query latency significantly beyond what vanilla Presto is capable of. For this session, we introduce the hierarchical cache work including Alluxio data cache, fragment result cache, etc. Cache is the key building block for RaptorX. With the support of the cache, we are able to boost query performance by 10X. This new architecture can beat performance oriented connectors like Raptor with the added benefit of continuing to work with disaggregated storage.

Realtime Analytics with Presto and Apache Pinot – Xiang Fu

Realtime Analytics with Presto and Apache Pinot – Xiang Fu

In this world, most analytics products either focus on ad-hoc analytics, which requires query flexibility without guaranteed latency, or low latency analytics with limited query capability. In this talk, we will explore how to get the best of both worlds using Apache Pinot and Presto: 1. How people do analytics today to trade-off Latency and Flexibility: Comparison over analytics on raw data vs pre-join/pre-cube dataset. 2. Introduce Apache Pinot as a column store for fast real-time data analytics and Presto Pinot Connector to cover the entire landscape. 3. Deep dive into Presto Pinot Connector to see how the connector does predicate and aggregation push down. 4. Benchmark results for Presto Pinot connector.

Panel: The Presto Ecosystem

Panel: The Presto Ecosystem

The Presto Ecosystem – Moderated by Dipti Borkar, Ahana; Maxime Beauchemin, Preset; Vinoth Chandar, Apache Hudi; Kishore Gopalakrishna, Apache Pinot & James Sun, Facebook, Inc.