Shared Foundations Of Composable Data Systems – Biswapesh Chattopadhyay, Google

Data processing systems have evolved significantly over the last decade, driven by factors such as the advent of cloud computing and the increasing complexity of applications such as ML, HTAP, streaming, observability, and graph processing. Historically, however, these frameworks have evolved independently, leading to significant fragmentation of the stack. In this talk, I will describe how this fragmentation developed in open source and at Meta, and how we are solving the problem through the Shared Foundations effort, leading to composable systems. This has resulted in significantly better performance, more features, higher engineering velocity, and a more consistent user experience.

Build & Query Secure S3 Data Lakes with Ahana Cloud and AWS Lake Formation

AWS Lake Formation is a service that allows data platform users to set up a secure data lake in days. Creating a data lake with Presto and AWS Lake Formation is as simple as defining data sources and what data access and security policies you want to apply. In this talk, Wen will walk through the recently announced AWS Lake Formation and Ahana integration.

Parquet Column Level Access Control with Presto

Apache Parquet is the major columnar file storage format used by Presto and several other query engines in many big data analytic frameworks today. In many use cases, a portion of the column data is highly sensitive and must be protected. The Parquet community supports column encryption at the file format level. Because Presto uses its own rewritten Parquet code, Parquet column encryption must be ported to the Presto code base with modifications. Integration with Key Management Service (KMS) and with other query engines such as Hive and Spark is another challenge. In this talk, we will show the work we have done to enable Parquet column decryption in Presto, including the challenges, the solutions, and the integration with Hive/Spark Parquet column encryption, and we will look ahead to the next steps of the encryption work.

Presto Authorization with Apache Ranger – Reetika Agrawal, Ahana & William Brooks, Privacera

Apache Ranger has been the user’s choice to support authorization in various data platforms, from small-scale to enterprise-grade production environments. At Ahana, engineers are working on the Presto-Ranger integration, aiming to support global fine-grained data access control across all catalogs for Presto, while also providing auditing and monitoring of user access. We would like to collaborate with Privacera to share our learnings and what we have developed so far, and we hope to shed light on the future work of the Ranger Presto Plugin together with an Apache Ranger committer.

Authorizing Presto with AWS Lake Formation – Jalpreet Singh Nanda, Ahana & Roy Hasson, Amazon

AWS Lake Formation is a service that allows data platform users to set up a secure data lake in days. Creating a data lake with Presto and Lake Formation is as simple as defining data sources and the data access and security policies you want to apply. At Ahana and Amazon, engineers are working on the Presto and Lake Formation integration to support authorization on Presto. This means that Presto clusters will be able to enforce data permissions on user queries against Lake Formation backed data lakes, which are a tightly integrated stack of Lake Formation, AWS Glue, and Amazon S3. In this session we will present the high-level design, our learnings, and future plans, and demo how data platform users can use the Lake Formation integration to support fine-grained data access controls on Presto.

Using Presto’s BigQuery Connector for Better Performance and Ad-hoc Query in the Cloud – George Wang & Roderick Yao

The Google BigQuery connector gives users the ability to query tables in the BigQuery service, Google Cloud’s fully managed data warehouse. In this presentation, we’ll discuss the BigQuery Connector plugin for Presto, which uses the BigQuery Storage API to stream data in parallel, allowing users to read from BigQuery tables via gRPC for better read performance. We’ll also discuss how the connector enables interactive ad-hoc queries that join data across distributed systems for data lake analytics.