Build & Query Secure S3 Data Lakes with Ahana Cloud and AWS Lake Formation

Build & Query Secure S3 Data Lakes with Ahana Cloud and AWS Lake Formation

AWS Lake Formation is a service that allows data platform users to set up a secure data lake in days. Creating a data lake with Presto and AWS Lake Formation is as simple as defining data sources and what data access and security policies you want to apply. In this talk, Wen will walk through the recently announced AWS Lake Formation and Ahana integration.

Parquet Column Level Access Control with Presto

Parquet Column Level Access Control with Presto

Apache Parquet is the major columnar file storage format used by Apache Presto and several other query engines in many big data analytic frameworks today. In a lot of use cases, a portion of the column data is highly sensitive and must be protected. Column encryption at the file format level is supported in the Parquet community. Due to the rewritten code of Parquet in Presto, Parquet column encryption at Presto needs to be ported with modifications to the Presto code page. And the integration with Key Management Service (KMS) and other query engines like Hive and Spark is another challenge. In this talk, we will show the work we have done for enabling Presto for Parquet column decryption including challenges, solutions, integration with Hive/Spark Parquet column encryption and look forward to the next step of encryption work.

Presto Query Analysis for Data Layout Formatting and Query Result Caching – Gurmeet Singh, Uber

Presto Query Analysis for Data Layout Formatting and Query Result Caching – Gurmeet Singh, Uber

In this talk, I will be talking about a microservice that we have built at Uber to be able to analyze Presto queries. The Presto Query Engine does not provide endpoints for query analysis purposes. One has to either execute the query or gather insights from the query explain plan. In this talk, I will talk about 1. The work that we had to do to do the query analysis in a microservice using Presto as a library. 2. Doing predicate analysis on the queries to come up with data formatting recommendations in order to improve query performance. 3. Using the analysis service for query result cache invalidation. The analysis figures out whether the results from a previous run of the query are still valid and can be reused.

Presto Authorization with Apache Ranger – Reetika Agrawal, Ahana & William Brooks, Privacera

Presto Authorization with Apache Ranger – Reetika Agrawal, Ahana & William Brooks, Privacera

Apache Ranger has been the user’s choice to support authorization in various data platforms from small-scale to enterprise-grade production environments. At Ahana, engineers are working on the Presto-Ranger integration, aiming to support global fine-grained data access control across all catalogs for Presto, while also providing auditing and monitoring of user access. We would like to collaborate with the Privacera and share our learnings, what we developed so far, and also hope to shed light on the future work of the Ranger Presto Plugin with Apache Ranger committer.

Authorizing Presto with AWS Lake Formation – Jalpreet Singh Nanda, Ahana & Roy Hasson, Amazon

Authorizing Presto with AWS Lake Formation – Jalpreet Singh Nanda, Ahana & Roy Hasson, Amazon

AWS Lake Formation is a service that allows data platform users to set up a secure data lake in days. Creating a data lake with Presto and Lake Formation is as simple as defining data sources and what data access and security policies you want to apply. At Ahana and Amazon, engineers are working on Presto and Lake Formation integration to support Authorization on Presto. This means that Presto clusters will be enforce data permissions on user queries against Lake Formation backed data lakes, which is a tightly integrated Lake Formation, AWS Glue, and Amazon S3 data lake stack. In this session we will present high level design, our leanings, future plans and demo how data platform users can use Lake Formation integration to support fine-grained data access controls on Presto.