Facebook uses Presto for interactive queries against several internal data stores, including their 300PB data warehouse. Over 1,000 Facebook employees use Presto daily to run more than 30,000 queries that in total scan over a petabyte each per day.
Uber uses Presto for the SQL Data Lakehouse where over 7K weekly active users run 500K queries/day on 59 PB HDFS bytes/day.
At Twitter, to overcome performance issues that can arise in developing and maintaining SQL systems with increasing volumes of data, we designed a large-scale SQL federation system across on-premises and cloud Hadoop and Google Cloud Storage (GCS) clusters. leveraging Presto as the core of SQL engine clusters, pursuing high scalability and availability to fulfill the increasing need for data analytics on a petabyte (PB) scale of data. All the presto clusters have ran more than 10 million presto sql queries.
Ahana offers the only managed service for Presto as a feature rich, next-gen SQL query engine in Ahana Cloud. It plays a critical role for our data platform users who want easy-to-use, fully integrated and cloud native SQL engine on their S3 data lakes and other data sources. Ahana has everything you need to get started with SQL on the Open Data Lakehouse.
Alibaba Data Lake Analytics embraces Presto’s federated query engine capability and has accumulated a number of successful business use cases that signify the power of Presto’s analytics capability.
Carbon is a real-time revenue management platform that consolidates revenue and audience analytics, data management, and yield operations into a single solution. Real-time analytics is super critical – their customers rely on real-time data to make revenue decisions. After facing issues around performance, visibility & ease of use, and serverless pricing model with AWS Athena, the team moved to a managed service for PrestoDB in the cloud – Ahana Cloud – to power their customer-facing dashboards.
Cartona is one of the fastest growing B2B e-commerce marketplaces in Egypt that connects retailers with suppliers, wholesalers, and production companies. We needed to federate across multiple data sources, including transactional databases like Postgres and AWS S3 data lake and Presto allowed us to join across all of these data sources without having to copy or ingest data – it’s all done in place.
Blinkit, India’s leading instant delivery service, uses Presto on AWS to help them deliver on their promise of “everything delivered in 10 minutes”. Blinkit moved to Presto on S3 from their cloud data warehouse for more flexibility and better price performance and created open data lakehouse architecture which includes Presto as their SQL engine for ad hoc reporting, Ahana as SaaS for Presto, Apache Hudi and Iceberg to help manage transactions, and AWS S3 as their data lake.
Securonix uses Ahana Cloud for Presto to enable extremely fast SQL queries on AWS S3 for ‘Threat Hunting’. “Before Presto we were using a Hadoop cluster, and the challenge was on scale…not only was it expensive but the scaling factors were not linear. The Presto engine was designed for scale, and it’s feature-built just for a query engine. Ahana Cloud made it easy for us to use Presto in the cloud.” – Derrick Harcey, Chief Architect at Securonix
Facing challenges with PostgreSQL and MySQL when it came to their rapid increase in data volume and compute, Metropolis needed a flexible and horizontally scalable data lake architecture to address their challenges associated with control and flexibility but still needed fine-grained security policies. They moved to a data lake architecture which includes AWS S3 for data storage, AWS Lake Formation for fine-grained security control, and Ahana for Presto for SQL on S3. Metropolis augments their own data with third-party sources like Zendesk, Heap, and Stripe, storing the resulting datasets in the data lake.
Adroitts’ mission is to build secure, reliable and cost-effective products for their customers. When it comes to data analytics, they must be able to link transactions to respective customers with their profile information to get the full view of the customer. They chose Presto as the underlying query engine to do this.
Platform24, the digital healthcare SaaS platform for healthcare providers, based in Sweden uses Presto as SQL engine for their Modern Data Platform to address the need of Open Source, Self hosted, Kubernetes friendly, Reliable, Scalable and Cost effective, etc.
Presto has been widely used in Bytedance, e.g. DataWarehouse, BI Tools, Ads and so on. At Bytedance, OLAP Platform migrated their ad-hoc workloads from Apache Hive and Apache Spark to Presto and It quickly become popular and expanded fast. Today, Presto cluster at Bytedance have tens of thousands compute cores and serves about 1 million queries per day which cover more than 90 percent of interactive queries. This dramatically reduced the query latency and saved a lot of compute resources.
HermesDB is the next generation of OLAP engine at Tencent with the architecture featuring separation of storage and calculation. HermesDB characterizes efficient indexing files in storage data, equipping with customized Presto as the core query engine. With the help of Presto connector, HermesDB could not only support full ANSI syntax but also ultilize Apache Lucene as underlying computer core.
Rippling, a popular HR and payroll platform, uses Presto to power their data platform and enable real-time querying at scale. Coupled with Apache Pinot, it can handle large amounts of data at significant scale. Specifically, they’ve focused on utilizing Presto for projection push down. For complex SQL queries that get translated into even more complex SQL statements, they can push down certain operations to save time. They’ve also implemented dynamic filtering push down, which pushes the filtering down into the connector and allows for the scan to be filtered as well. This results in a significant reduction in the amount of data being scanned and processed.
Twilio uses Presto on AWS. Approximately 80% of Twilio’s data comes from product teams that use Kafka or MySQL databases. In addition to this, the company receives data from external sources such as Salesforce, Zendesk, and Marketo, as well as internal CSV files generated by accounting and finance teams. This data is loaded into the S3 data lake using config-driven Python and Spark-based loaders. With Presto, they can decouple the storage and compute layers and scale without affecting performance. In addition to data exploration and ad-hoc analysis by data analysts, Presto has also been used as a data source for real-time dashboards and machine learning models.