Bolt.eu is the first European mobility super-app. We have over 100M users across Europe and Africa and deal with data at a large scale every day (over 100k queries daily). We previously used a traditional data warehouse solution based on Redshift, but we faced scalability issues that were hard to overcome, and after doing our research we chose Presto as the solution. In just a single year we managed to migrate to the Lakehouse architecture using AWS, Presto, Spark and Delta Lake. We would like to talk about our journey, some of the challenges we encountered, and how we solved them.
In this talk, you will hear the query optimizer OG himself, Bill McKenna (Principal Software Engineer at Ahana, architect of the query optimizer that became the code base of the Amazon Redshift query optimizer, and co-author of The Volcano Optimizer Generator: Extensibility and Efficient Search), go into detail about the state of modern query optimizers, how Presto stacks up against them, and where it will go in the near future. If database theory is your jam, you won’t want to miss this deeply technical presentation from one of the pioneers in the field.
AWS Lake Formation is a service that allows data platform users to set up a secure data lake in days. Creating a data lake with Presto and AWS Lake Formation is as simple as defining data sources and what data access and security policies you want to apply. In this talk, Wen will walk through the recently announced AWS Lake Formation and Ahana integration.
PrestoDB is built to be cloud agnostic and container-friendly, but getting it to run on Kubernetes in the cloud can be challenging. In this talk, Gary Stafford (AWS) and Dipti Borkar (Ahana) will discuss why to use the in-VPC deployment model with AWS, and will demo deploying PrestoDB on AWS EKS using the Ahana Cloud managed service within the user’s AWS account.
In this round table moderated by Eric Kavanagh of The Bloor Group, panelists from Uber, Facebook, Ahana, and Alibaba will discuss all aspects of building a thriving open source community around PrestoDB including why Presto is so popular & the problems it solves, the open source model the foundation follows, why governance and transparency are so important to an open source community, and what the community looks for in open source projects.
Getting started with a do-it-yourself approach to standing up an open SQL Lakehouse can be challenging and cumbersome. Ahana Cloud Community Edition dramatically simplifies it and gives you the ability to learn and validate Presto for your open SQL Lakehouse—for free. In this session, we’ll show you how easy it is to register for, stand up, and use the Ahana Cloud Community Edition to query on top of your Lakehouse.
Blinkit, India’s leading instant delivery service, uses Presto on AWS to help them deliver on their promise of “everything delivered in 10 minutes”. In this session, Satyam and Akshay will discuss why they moved to Presto on S3 from their cloud data warehouse for more flexibility and better price performance. They’ll also share more on their open data lakehouse architecture which includes Presto as their SQL engine for ad hoc reporting, Ahana as SaaS for Presto, Apache Hudi and Iceberg to help manage transactions, and AWS S3 as their data lake.
Today, Presto supports broadcast joins by having one worker fetch data from a small data source to build a hash table and then send the entire data set over the network to all other workers, where it is probed with rows from the large data source. This can be optimized with a new query execution strategy in which source data from small tables is pulled directly by all workers, known as replicated reads from dimension tables. This feature comes with a nice caching property: because all N worker nodes now participate in scanning the data from remote sources, the table scan operation for dimension tables is cacheable on every worker node. In addition, resource utilization improves because the Presto scheduler can reduce the number of plan fragments to execute, as the same workers run tasks in parallel within a single stage, which reduces data shuffles.
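The replicated-read idea can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Presto code: the function names (`scan_dimension_table`, `build_hash_table`, `probe`, `replicated_read_join`) and the sample rows are hypothetical, and each "worker" is simply a loop iteration that scans the small table itself instead of receiving it from another worker.

```python
from collections import defaultdict


def scan_dimension_table():
    """Hypothetical stand-in for a remote scan of a small dimension table."""
    return [(1, "Tallinn"), (2, "Cairo"), (3, "Delhi")]


def build_hash_table(rows):
    """Build side of the hash join: map each join key to its values."""
    table = defaultdict(list)
    for key, value in rows:
        table[key].append(value)
    return table


def probe(fact_partition, hash_table):
    """Probe side: look up each fact row's key in the local hash table."""
    return [
        (fact_id, city)
        for fact_id, key in fact_partition
        for city in hash_table.get(key, [])
    ]


def replicated_read_join(fact_partitions):
    """Each simulated worker scans the dimension table directly and
    builds its own hash table, so no worker-to-worker broadcast occurs."""
    results = []
    for partition in fact_partitions:  # one iteration per simulated worker
        local_table = build_hash_table(scan_dimension_table())
        results.extend(probe(partition, local_table))
    return results


# Two simulated workers, each holding one partition of the large fact table.
fact_partitions = [
    [("ride-1", 1), ("ride-2", 3)],
    [("ride-3", 2), ("ride-4", 4)],  # key 4 has no match (inner join)
]
joined = replicated_read_join(fact_partitions)
```

Because every worker performs the same dimension-table scan, the result of that scan is a natural candidate for per-worker caching, which is the property the abstract highlights.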
Today’s digital-native companies need a modern data infra that can handle data wrangling and data-driven analytics for the ever-increasing amount of data needed to drive business. Specifically, they need to address challenges like complexity, cost, and lock-in. An Open SQL Data Lakehouse approach enables flexibility and better cost performance by leveraging open technologies and formats. Join us for this panel where leading technologists from the Presto open source project will share their vision of the SQL Data Lakehouse and why Presto is a critical component.
Apache Ranger has been the user’s choice to support authorization in various data platforms, from small-scale to enterprise-grade production environments. At Ahana, engineers are working on the Presto-Ranger integration, aiming to support global fine-grained data access control across all catalogs for Presto, while also providing auditing and monitoring of user access. We would like to collaborate with Privacera and share our learnings and what we have developed so far, and we also hope to shed light on the future work of the Ranger Presto Plugin together with an Apache Ranger committer.
AWS Lake Formation is a service that allows data platform users to set up a secure data lake in days. Creating a data lake with Presto and Lake Formation is as simple as defining data sources and what data access and security policies you want to apply. At Ahana and Amazon, engineers are working on Presto and Lake Formation integration to support authorization on Presto. This means that Presto clusters will be able to enforce data permissions on user queries against Lake Formation backed data lakes, that is, a tightly integrated Lake Formation, AWS Glue, and Amazon S3 data lake stack. In this session we will present the high-level design, our learnings, and future plans, and demo how data platform users can use the Lake Formation integration to support fine-grained data access controls on Presto.
Cartona is one of the fastest growing B2B e-commerce marketplaces in Egypt, connecting retailers with suppliers, wholesalers, and production companies. We needed to federate across multiple data sources, including transactional databases like Postgres and an AWS S3 data lake. In this session, we’ll talk about how Presto allows us to join across all of these data sources without having to copy or ingest data – it’s all done in place. In addition, we’ll talk about how we were up and running in less than an hour with the Ahana Cloud managed service. It gives us the power of Presto and the ease of use without the need to manage it or have deep skills to deploy and operate it.
Presto is complicated with many intricacies. Ahana Cloud is the only managed service for Presto on AWS that simplifies Presto, bringing its power to platform teams of any size or skill set. In this session we’ll give you a quick overview of Ahana Cloud, including managing multiple Presto clusters seamlessly, querying a range of data sources, as well as just-released capabilities.