How Jio Platforms Leverages Presto for Large-Scale Analytics
In our latest community spotlight, we sat down with Sonal Holankar, Associate Data Engineer at Jio Platforms, to talk about how the team uses Presto to power analytics at scale. Jio Platforms, a subsidiary of Reliance Industries, is one of India’s leading digital service providers, offering a suite of applications and services that includes JioMart, JioMoney, and JioGames. Managing vast amounts of data efficiently is a critical part of their operations, and Presto plays a key role in this effort.
Presto at Jio Platforms
Jio Platforms operates seven Hadoop clusters and uses Presto as its primary SQL query engine to analyze and retrieve business-critical data. Sonal’s team runs Presto across these clusters to process and query massive amounts of data, delivering fast insights to internal teams across the company.
“We handle terabytes of data daily and rely on Presto to execute queries and fetch results in seconds,” Sonal shared. “It is incredibly fast and works seamlessly with various file formats, including ORC and Parquet.”
The team connects Presto to their Hive Metastore (HMS), enabling smooth integration with Hadoop-based data storage. They also leverage the Presto REST API and have built an internal data application that allows users to query data via Presto, providing a fast and efficient way to extract insights.
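Jio’s internal application isn’t described in detail, so here is a minimal sketch of how any client can talk to Presto over its REST statement protocol: POST the SQL text to `/v1/statement` with an `X-Presto-User` header, then keep following the `nextUri` in each JSON response until it disappears, collecting `columns` and `data` along the way. It assumes an unauthenticated coordinator reachable over plain HTTP; the names `run_query`, `http_fetch`, and the injectable `fetch` parameter are hypothetical, and this is not Jio’s implementation. For production use, the official Presto client libraries handle this protocol (plus auth and retries) for you.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List, Optional, Tuple

# A fetcher takes (method, url, body, headers) and returns decoded JSON.
# It is injectable so the paging logic can be exercised without a live cluster.
JsonFetcher = Callable[[str, str, Optional[bytes], Dict[str, str]], Any]


def http_fetch(method: str, url: str, body: Optional[bytes], headers: Dict[str, str]) -> Any:
    """Send one HTTP request to the coordinator and decode the JSON reply."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_query(server: str, sql: str, user: str,
              fetch: JsonFetcher = http_fetch) -> Tuple[List[str], List[list]]:
    """Submit SQL via POST /v1/statement and follow nextUri until the query completes."""
    headers = {"X-Presto-User": user, "Content-Type": "text/plain"}
    page = fetch("POST", f"{server}/v1/statement", sql.encode("utf-8"), headers)
    columns: Optional[List[str]] = None
    rows: List[list] = []
    while True:
        # A failed query reports an "error" object instead of more data.
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # Column metadata arrives once the query has been analyzed.
        if columns is None and "columns" in page:
            columns = [col["name"] for col in page["columns"]]
        # Result rows may be spread across many pages.
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if not next_uri:  # no nextUri means the result set is complete
            break
        page = fetch("GET", next_uri, None, headers)
    return columns or [], rows
```

Usage against a hypothetical coordinator: `columns, rows = run_query("http://presto-coordinator:8080", "SHOW CATALOGS", "analyst")`. Injecting the fetcher keeps the paging loop separate from the network, which makes it easy to test or to swap in a client that adds authentication headers.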
Presto + Tableau for Interactive Analytics
One of their key use cases is pairing Tableau with Presto, which lets analysts build interactive dashboards directly on top of Presto-powered queries.
“With Tableau, our users can connect to Presto, fetch schema details, and generate insights interactively. Presto’s ability to handle complex queries at speed is a major advantage,” Sonal explained.
Managing Presto at Scale
With multiple Presto clusters in production, monitoring and performance tuning are key aspects of Jio’s workflow. Their team has implemented various measures to keep Presto running smoothly:
- Query Execution Limits: They set a query execution time limit of 5 minutes to prevent excessive load on the Presto coordinator.
- Shell Scripts for Monitoring: They monitor their Presto clusters daily through the Presto REST API, using custom scripts that fetch daily query metrics to track usage patterns and optimize cluster performance.
- Troubleshooting & Debugging: The team manually monitors Presto for network issues or downtime, and they restart nodes when necessary to maintain stability.
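Jio’s monitoring scripts are shell scripts that aren’t shown here; the sketch below expresses the same idea in Python. It pulls the coordinator’s query list from `GET /v1/query`, counts queries by state and by user, and flags queries whose elapsed time has reached the 5-minute limit mentioned above (in Presto, a cluster-wide limit like that is typically set with the `query.max-execution-time` property). The JSON field names (`state`, `session.user`, `queryStats.elapsedTime`) and the duration format (`"1.50m"`, `"300.00ms"`) are assumptions based on recent Presto releases, so verify them against your version. The function names are hypothetical. Because the coordinator keeps only a bounded history of recent queries, a true daily report probably needs to sample this endpoint several times a day.

```python
import json
import re
import urllib.request
from collections import Counter
from typing import Any, Dict, List

# Duration units as they appear in Presto's JSON (e.g. "1.50m", "300.00ms").
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0,
                 "m": 60.0, "h": 3600.0, "d": 86400.0}
_DURATION_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([a-z]+)\s*$")


def duration_seconds(text: str) -> float:
    """Convert a duration string such as '6.00m' into seconds."""
    match = _DURATION_RE.match(text)
    if not match or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unrecognized duration: {text!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def fetch_queries(server: str) -> List[Dict[str, Any]]:
    """Fetch the coordinator's list of recent and running queries."""
    req = urllib.request.Request(f"{server}/v1/query",
                                 headers={"X-Presto-User": "monitor"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_queries(queries: List[Dict[str, Any]],
                      limit_seconds: float = 300.0) -> Dict[str, Any]:
    """Count queries by state and user; list IDs at or above the time limit."""
    by_state = Counter(q.get("state", "UNKNOWN") for q in queries)
    by_user = Counter(q.get("session", {}).get("user", "unknown") for q in queries)
    long_running = []
    for q in queries:
        elapsed = q.get("queryStats", {}).get("elapsedTime")
        if elapsed and duration_seconds(elapsed) >= limit_seconds:
            long_running.append(q.get("queryId"))
    return {"by_state": dict(by_state), "by_user": dict(by_user),
            "long_running": long_running}
```

A cron job could call `summarize_queries(fetch_queries(url))` for each cluster (URLs are deployment-specific) and append the result to a log, giving a per-day view of usage patterns and of which users hit the time limit most often.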
Scaling with Iceberg, Kudu, and More
Beyond traditional Hadoop-based storage, Jio Platforms is also exploring Apache Iceberg and Kudu for modern data lakehouse architectures.
“We created separate Presto catalogs for Iceberg and Kudu tables, allowing us to access them efficiently. However, we found that only Hive 4 supports Iceberg with HMS, so we built a dedicated catalog for that integration,” Sonal noted.
Challenges & Future Improvements
While Presto performs well, managing large workloads comes with its own challenges. Sonal highlighted a few key areas the team aims to improve:
- Automated Monitoring: Currently, monitoring is a manual process, and automating it would enhance operational efficiency.
- Scaling for High Query Loads: Performance can degrade when users submit more than 400 queries simultaneously, and the team plans to fine-tune cluster configurations to address this.
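As one possible starting point for automating these checks, here is a minimal sketch (not Jio’s tooling) of a health probe that could run on a schedule against each cluster. It treats a coordinator as unhealthy if `GET /v1/info` or `GET /v1/query` cannot be reached, if `/v1/info` reports that the server is still `starting`, or if the number of non-terminal queries exceeds a threshold. The 400 default echoes the load level mentioned above; the field names, the choice of `FINISHED` and `FAILED` as the terminal states, and the helper names are assumptions to verify against your Presto version.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, List

# Queries in any other state are treated as still occupying the cluster.
TERMINAL_STATES = {"FINISHED", "FAILED"}


def get_json(url: str) -> Any:
    """GET a coordinator endpoint and decode its JSON body."""
    req = urllib.request.Request(url, headers={"X-Presto-User": "monitor"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_cluster(server: str, max_active: int = 400,
                  fetch: Callable[[str], Any] = get_json) -> List[str]:
    """Return alert messages for one coordinator; an empty list means healthy."""
    try:
        info = fetch(f"{server}/v1/info")
        queries = fetch(f"{server}/v1/query")
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Network failures and malformed JSON both count as "unreachable".
        return [f"{server}: coordinator unreachable ({exc})"]

    alerts = []
    if info.get("starting"):
        alerts.append(f"{server}: coordinator is still starting")
    active = sum(1 for q in queries if q.get("state") not in TERMINAL_STATES)
    if active > max_active:
        alerts.append(f"{server}: {active} active queries exceeds threshold {max_active}")
    return alerts
```

Looping `check_cluster` over a list of coordinator URLs and forwarding any non-empty result to an alerting channel would replace part of the manual watching described earlier; deciding when to restart nodes would still be a human (or a separate, more careful automation) decision.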
Conclusion
Jio Platforms showcases how Presto can handle large-scale analytics workloads efficiently, delivering fast query performance across vast datasets. Their setup with Hadoop, Hive, Tableau, and Iceberg highlights the flexibility of Presto in modern data architectures.
We’re excited to see how Jio continues to leverage Presto!
Are you using Presto at scale? Join our Slack community and share your experiences with us!