Vladimir Rodionov, founder of Carrot Cache will present the Velociraptor – the next evolution of PrestoDB hierarchical caching framework RaptorX. Velociraptor enables efficient data and meta-data caching well beyond RaptorX limits in terms of number of data files (multi-billions), number of table partitions (multi-millions) and number of table columns (multi-thousands). Velociraptor replaces all five RaptorX caches (Hive meta-data, file list, query result fragments, ORC/Parquet meta-data and data I/O) with a scalable solution, based on Carrot Cache, which does not pollute JVM heap memory, does not affect Java Garbage Collector, keeps all data and meta-data off Java heap memory or on disk and can scale well beyond server’s physical RAM limit. Velociraptor supports server restart, by quickly saving and loading data to/from disk for automatic cache warm up.
RaptorX, an umbrella project presented in PrestoCon Day in March, enabled the Presto interactive fleet in Facebook to reduce latency by 10x, based on a set of architectural improvements and optimizations with hierarchical caching. This presentation provides an update on the follow-up enhancement. Bin Fan from Alluxio will talk about the exploration of a probabilistic algorithm in Alluxio caching to estimate cache working set and the implementation of shadow cache Ke Wang from Facebook will talk about how shadow cache is used to understand the system bottleneck for better resource allocation and query routing decisions. She will also cover a recent improvement in collecting and aggregating per-query runtime statistics on the Presto engine to better understand the time breakdown, resource usage breakdown and cache hit rate on a per-query basis, which can help identify areas of improvement.
RaptorX is an internal project name aiming to boost query latency significantly beyond what vanilla Presto is capable of. For this session, we introduce the hierarchical cache work including Alluxio data cache, fragment result cache, etc. Cache is the key building block for RaptorX. With the support of the cache, we are able to boost query performance by 10X. This new architecture can beat performance oriented connectors like Raptor with the added benefit of continuing to work with disaggregated storage.