PrestoDB in HPE Ezmeral Unified Analytics – Milind Bhandarkar, HPE

PrestoDB in HPE Ezmeral Unified Analytics – Milind Bhandarkar, HPE


HPE Ezmeral Unified Analytics is an end-to-end data & AI/ML platform that consists of several popular open-source frameworks for data engineering, data analytics, data science, & ML engineering in a well-integrated packaging. These open-source frameworks include Apache Spark, Apache Airflow, Apache Superset, PrestoDB, MLFlow, Kubeflow, and Feast. This platform is built atop Kubernetes and provides built in security. In this talk we will focus on the role of PrestoDB in Unified Analytics as a fast SQL query engine, and also as a secure data access layer. We will discuss some of our value-additions to PrestoDB, such as a distributed memory-centric columnar caching layer that provides both explicit and transparent caching for dataset fragments, often leading to 3x to 4x query performance. We will conclude by proposing to make caching pluggable in PrestoDB and discussing future directions.

Presto Query Analysis for Data Layout Formatting and Query Result Caching – Gurmeet Singh, Uber

Presto Query Analysis for Data Layout Formatting and Query Result Caching – Gurmeet Singh, Uber

In this talk, I will be talking about a microservice that we have built at Uber to be able to analyze Presto queries. The Presto Query Engine does not provide endpoints for query analysis purposes. One has to either execute the query or gather insights from the query explain plan. In this talk, I will talk about 1. The work that we had to do to do the query analysis in a microservice using Presto as a library. 2. Doing predicate analysis on the queries to come up with data formatting recommendations in order to improve query performance. 3. Using the analysis service for query result cache invalidation. The analysis figures out whether the results from a previous run of the query are still valid and can be reused.