As part of Adobe Advertising, the Adobe Data Processing platform uses Presto for three key use cases: scheduled pipelines, ad-hoc query, and custom reporting. Their platform handles data throughput of 12B events daily, in addition to 20B user profiles for audience targeting and segmentation and 200B auctions/day for their real-time ad bidding. Presto plays a central role in managing these tasks, including approximately 4K ad-hoc queries/month.
Adroitts’ mission is to build secure, reliable and cost-effective products for their customers. When it comes to data analytics, they must be able to link transactions to respective customers with their profile information to get the full view of the customer. They chose Presto as the underlying query engine to do this.
Alibaba Data Lake Analytics embraces Presto’s federated query engine capability and has accumulated a number of successful business use cases that signify the power of Presto’s analytics capability.
Bhuma’s platform, built on Presto, enables real-time data apps on the open data lake using modern APIs. The Bhuma team built the Presto JS client along with a Presto orchestrator that manages query prioritization and provides deeper insights into runtime query analytics.
Blinkit, India’s leading instant delivery service, uses Presto on AWS to help them deliver on their promise of “everything delivered in 10 minutes”. Blinkit moved from their cloud data warehouse to Presto on S3 for more flexibility and better price performance. They created an open data lakehouse architecture that includes Presto as their SQL engine for ad hoc reporting, Ahana as SaaS for Presto, Apache Hudi and Iceberg to help manage transactions, and AWS S3 as their data lake.
Bolt is a ride-sharing app with 100M users across 45 European countries. Presto replaced Redshift as the primary query engine for BI, with over 10 Presto clusters supporting different business verticals. This enabled them to move to a Presto-based open data lakehouse that alleviated critical issues in their legacy architecture and also reduced costs.
Presto is widely used at Bytedance in areas such as the data warehouse, BI tools, and ads. At Bytedance, the OLAP Platform migrated their ad-hoc workloads from Apache Hive and Apache Spark to Presto, and it quickly became popular and expanded fast. Today, Presto at Bytedance runs on tens of thousands of compute cores and serves about 1 million queries per day, which cover more than 90 percent of interactive queries. This dramatically reduced query latency and saved a lot of compute resources.
Carbon is a real-time revenue management platform that consolidates revenue and audience analytics, data management, and yield operations into a single solution. Real-time analytics is critical: their customers rely on real-time data to make revenue decisions. After facing issues with AWS Athena around performance, visibility and ease of use, and the serverless pricing model, the team moved to Ahana Cloud, a managed service for PrestoDB in the cloud, to power their customer-facing dashboards.
Cartona is one of the fastest growing B2B e-commerce marketplaces in Egypt, connecting retailers with suppliers, wholesalers, and production companies. We needed to federate across multiple data sources, including transactional databases like Postgres and an AWS S3 data lake. Presto allowed us to join across all of these data sources without having to copy or ingest data; it’s all done in place.
The Denodo platform is a solution that provides data management capabilities across the distributed modern data landscape. Its goal is to provide a consistent engine to enable and enforce security, data integration, self-service and governance across all data, regardless of location and technology. As part of its capabilities, it includes a distribution of Presto as its data lake engine.
HPE Ezmeral Unified Analytics is an end-to-end data and AI/ML platform that consists of several popular open-source frameworks for data engineering, data analytics, data science, and ML engineering. These include Presto, which serves Unified Analytics both as a fast SQL query engine and as a secure data access layer.
IBM uses Presto to power watsonx.data, its open data lakehouse. IBM is also a key contributor to the open source project, including Presto optimizer and performance work.
As a member of the Presto Foundation, Intel works closely on the open-source development of Presto. They recently developed AutoSteer, an ML-based solution that automatically drives query optimization in any SQL database that exposes tunable optimizer knobs, including PrestoDB. In their latest benchmarking, they achieved up to 40% improvement in query performance compared with PrestoDB’s native query optimizer.
Intuit handles a massive amount of data, including hundreds of thousands of tables and petabytes of data. They serve over 100 million customers and have a comprehensive data architecture in place. Intuit primarily uses AWS S3 for data persistence and relies on Presto and Spark for data processing. They utilize Presto on Spark for exploration, operationalize queries through data pipelines, and leverage Tableau and Qlik Sense for data visualization.
Facebook uses Presto for interactive queries against several internal data stores, including their 300PB data warehouse. Over 1,000 Facebook employees use Presto daily to run more than 30,000 queries that together scan over a petabyte per day.
Facing challenges with PostgreSQL and MySQL as their data volume and compute needs grew rapidly, Metropolis needed a flexible and horizontally scalable data lake architecture that gave them more control while still supporting fine-grained security policies. They moved to a data lake architecture that includes AWS S3 for data storage, AWS Lake Formation for fine-grained security control, and Ahana for Presto for SQL on S3. Metropolis augments their own data with third-party sources like Zendesk, Heap, and Stripe, storing the resulting datasets in the data lake.
Platform24, a digital healthcare SaaS platform for healthcare providers based in Sweden, uses Presto as the SQL engine for their Modern Data Platform. Presto meets their need for a solution that is open source, self-hosted, Kubernetes-friendly, reliable, scalable, and cost-effective.
Rippling, a popular HR and payroll platform, uses Presto to power their data platform and enable real-time querying at scale. Coupled with Apache Pinot, Presto lets them query large amounts of data. Specifically, they’ve focused on using Presto for projection pushdown: for complex SQL queries that get translated into even more complex SQL statements, they can push certain operations down to the data source to save time. They’ve also implemented dynamic filtering pushdown, which pushes the filter down into the connector so that the scan itself is filtered. This results in a significant reduction in the amount of data being scanned and processed.
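To make the pushdown idea above concrete, here is a minimal conceptual sketch in Python. It is not Presto or Pinot code and does not reflect Rippling's implementation: `ToyConnector`, the `EMPLOYEES` and `TEAMS` data, and all function names are hypothetical. It only illustrates why handing the projection and the filter to the connector reduces how much data leaves the source, and how a filter built at runtime from the smaller side of a join can be applied during the scan.

```python
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, object]

# Hypothetical in-memory tables standing in for a connector's data source.
EMPLOYEES: List[Row] = [
    {"id": 1, "name": "Ana", "dept": "eng", "salary": 120},
    {"id": 2, "name": "Bo", "dept": "ops", "salary": 90},
    {"id": 3, "name": "Cy", "dept": "eng", "salary": 110},
    {"id": 4, "name": "Di", "dept": "hr", "salary": 80},
]
TEAMS: List[Row] = [{"dept": "eng", "floor": 3}]


class ToyConnector:
    """Toy connector that counts how many column values leave the source."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.values_returned = 0

    def scan(
        self,
        columns: Optional[Sequence[str]] = None,
        predicate: Optional[Callable[[Row], bool]] = None,
    ) -> List[Row]:
        out: List[Row] = []
        for row in self.rows:
            # Filter pushdown: rows are dropped inside the connector.
            if predicate is not None and not predicate(row):
                continue
            # Projection pushdown: only the requested columns are returned.
            wanted = dict(row) if columns is None else {c: row[c] for c in columns}
            self.values_returned += len(wanted)
            out.append(wanted)
        return out


def query_without_pushdown(conn: ToyConnector) -> List[Row]:
    # The engine pulls every row and column, then filters and projects itself.
    rows = conn.scan()
    return [{"name": r["name"]} for r in rows if r["dept"] == "eng"]


def query_with_pushdown(conn: ToyConnector) -> List[Row]:
    # The connector applies both the projection and the filter during the scan.
    return conn.scan(columns=["name"], predicate=lambda r: r["dept"] == "eng")


def join_with_dynamic_filter(conn: ToyConnector, build_rows: List[Row]) -> List[Row]:
    # The filter is derived at runtime from the smaller (build) side of the join
    # and then pushed into the scan of the larger (probe) side.
    keys = {r["dept"] for r in build_rows}
    probe = conn.scan(columns=["name", "dept"], predicate=lambda r: r["dept"] in keys)
    by_dept = {r["dept"]: r for r in build_rows}
    return [{**p, "floor": by_dept[p["dept"]]["floor"]} for p in probe]
```

In this toy setup both query functions return the same rows, but the connector's `values_returned` counter is much smaller when the projection and filter are pushed down, which is the effect the paragraph above describes.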