PrestoCon Day 2024: Optimizing Data Analytics at Etisalat Egypt with Presto 

    This is the first blog in our PrestoCon Day 2024 recap series! At PrestoCon Day, Mohamed Taha, Senior Data Engineer at Etisalat Egypt, presented his team’s journey in optimizing data analytics using Presto. In this blog we’ll highlight the key points and takeaways from Mohamed’s talk (which was awesome!).

    Quick Overview of Etisalat Egypt

    Etisalat, now rebranded as e&, is a telecommunications and technology company that started in 1976 in the UAE. It has expanded to 32 countries, serving over 170 million subscribers. Etisalat Egypt, a part of this global entity since 2007, provides diverse services to more than 30 million customers, including telecom services, ADSL, e-commerce websites, entertainment, and B2B services.

    Legacy Data Landscape 

    Initially, Etisalat Egypt’s data landscape was relatively simple, supporting telecom operations with systems like billing, customer service, and charging. Data from these systems went through ETL processes into a Teradata data warehouse, supporting dashboards, AI, machine learning, and reporting.  

    However, scaling this model was expensive and complex and led to inefficiencies like data silos. 

    To address these inefficiencies, the team added a data lake: petabyte-scale storage holding data in Parquet format. Spark handled transformations, and Impala served as the presentation layer. Even so, Impala’s high latency on data lake queries and the data silos that remained were still significant challenges.

    Key Requirements for Data Transformation 

    The team identified several key requirements for their data transformation: 

    • Minimize data movement 
    • Eliminate data silos 
    • Enable faster queries on the data lake  
    • Ensure tools are scalable and customizable 
    • Integrate seamlessly with existing BI and reporting tools
    • Have strong community support and a short learning curve

    Journey to Presto 

    The journey began with the data mesh concept, aiming to decentralize data and facilitate federated queries across various sources using a data virtualization tool. However, reliance on Impala for querying the data lake and limitations of their data virtualization tool led to continued challenges. 

    Next, they explored ClickHouse, an OLAP database and query engine, for real-time user-facing analytics dashboards. ClickHouse supports different types of tables, including internal tables optimized for queries and external tables from HDFS or Hive catalogs. 

    Despite its potential, implementation issues arose: 

    • Hive connector instability 
    • Poor performance of the JDBC engine 
    • Complexity in migrating SQL logic from Teradata and Hive to ClickHouse 
    • Manual partitioning required for HDFS tables due to ClickHouse’s shared-nothing architecture 

    Finally, the team adopted Presto, which proved to be a game-changer. Presto’s compatibility with Hive SQL and its ability to connect directly to the Hive catalog to process Parquet files on HDFS solved many of the performance issues. Key advantages included:

    • Easy setup 
    • Great scalability by adding more workers 
    • Familiar SQL syntax similar to Hive 
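    As a rough illustration of what querying the Hive catalog through Presto can look like from client code, here is a minimal Python sketch. It assumes the open-source presto-python-client package (imported as prestodb); the coordinator host, the schema and table names (datalake, usage_events), and the columns (subscriber_id, bytes_used, event_date) are hypothetical and not taken from the talk.

```python
import re
from datetime import date

# Accept only plain identifiers so table names cannot smuggle in extra SQL.
_IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_]*"


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a catalog.schema.table name after validating each part."""
    for part in (catalog, schema, table):
        if not re.fullmatch(_IDENTIFIER, part):
            raise ValueError(f"unsafe identifier: {part!r}")
    return f"{catalog}.{schema}.{table}"


def build_top_subscribers_query(table_name: str, day: date, limit: int = 10) -> str:
    """Build a query over one day of a Parquet-backed Hive table.

    The columns subscriber_id, bytes_used and event_date are hypothetical.
    """
    return (
        "SELECT subscriber_id, sum(bytes_used) AS total_bytes\n"
        f"FROM {table_name}\n"
        f"WHERE event_date = DATE '{day.isoformat()}'\n"
        "GROUP BY subscriber_id\n"
        "ORDER BY total_bytes DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str, host: str, port: int = 8080, user: str = "analyst"):
    """Run the SQL on a Presto coordinator and return all rows.

    Requires presto-python-client; host, port and user are placeholders.
    """
    import prestodb  # imported lazily so the builders work without the client

    conn = prestodb.dbapi.connect(
        host=host, port=port, user=user, catalog="hive", schema="default"
    )
    cursor = conn.cursor()
    cursor.execute(sql)
    return cursor.fetchall()


if __name__ == "__main__":
    table = qualified_name("hive", "datalake", "usage_events")
    print(build_top_subscribers_query(table, date(2024, 6, 1), limit=5))
```

    The query builder is kept separate from run_query so the SQL can be inspected or tested without a running cluster. Filtering on a date column such as event_date is a common way to limit how much Parquet data a query has to read, assuming the table is partitioned on that column.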

    Challenges with Presto included: 

    • No Teradata connector 
    • Inconsistent pushdown capabilities across connectors 

    Despite these limitations, Presto significantly improved query performance and scalability.
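    To make the pushdown point concrete, here is a toy Python model (not Presto code) of why it matters whether a connector can push a filter down to the source. The ToyConnector class and engine_query function are hypothetical names invented for this sketch; the only thing it measures is how many rows the source hands to the engine.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class ToyConnector:
    """Stand-in for a data source that may or may not support pushdown."""

    def __init__(self, rows: Iterable[Row], supports_pushdown: bool) -> None:
        self._rows = list(rows)
        self.supports_pushdown = supports_pushdown

    def scan(self, predicate: Predicate) -> List[Row]:
        """Return the rows the source sends to the query engine."""
        if self.supports_pushdown:
            # The source applies the filter itself and ships only matches.
            return [row for row in self._rows if predicate(row)]
        # Without pushdown the source ships everything.
        return list(self._rows)


def engine_query(connector: ToyConnector, predicate: Predicate) -> Tuple[List[Row], int]:
    """Run a filtered scan; return (result rows, rows transferred)."""
    transferred = connector.scan(predicate)
    # The engine always re-applies the filter, so results stay correct
    # whether or not the connector pushed it down.
    result = [row for row in transferred if predicate(row)]
    return result, len(transferred)


if __name__ == "__main__":
    rows = [{"id": i, "region": "cairo" if i % 10 == 0 else "other"} for i in range(1000)]
    in_cairo = lambda row: row["region"] == "cairo"
    for pushdown in (True, False):
        result, moved = engine_query(ToyConnector(rows, pushdown), in_cairo)
        print(f"pushdown={pushdown}: {len(result)} result rows, {moved} rows transferred")
```

    Both connectors return the same answer, but the one without pushdown moves ten times as many rows in this example. With connectors that differ in what they can push down, the same SQL can perform very differently depending on the source.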

    Presto vs. Trino 

    The team evaluated Presto against Trino and ultimately chose Presto. A key consideration was the ongoing work on Presto C++ and its associated performance gains. Their detailed takeaways are covered in the session recording.

    Future Work and Suggestions 

    Looking ahead, Etisalat Egypt plans to enhance their Presto usage with the following: 

    • UI Enhancements: Improve administration capabilities, including controlling query queues, rushing queries, and managing sessions. 
    • Technical Improvements: Explore the C++ engine and a capability like Trino’s feature for embedding source-native SQL queries within Trino SQL.
    • Community Contributions: Develop tutorials on contributing to the Presto codebase, such as adding new connectors and UDFs.
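    The Trino feature mentioned above most likely refers to Trino's query pass-through table function, which Trino documents for several of its JDBC-based connectors. As a hedged sketch of what that looks like, the hypothetical Python helper below wraps a source-native statement in that form; the function name passthrough_query and the catalog name postgresql are assumptions made for illustration, and this is Trino syntax, not something Presto is claimed to support.

```python
import re


def passthrough_query(catalog: str, native_sql: str) -> str:
    """Wrap source-native SQL in Trino's query pass-through table function.

    The native SQL is embedded as a string literal, so single quotes
    inside it are doubled.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", catalog):
        raise ValueError(f"unsafe catalog name: {catalog!r}")
    escaped = native_sql.replace("'", "''")
    return f"SELECT * FROM TABLE({catalog}.system.query(query => '{escaped}'))"


if __name__ == "__main__":
    # Hypothetical catalog and table names.
    print(passthrough_query("postgresql", "SELECT name FROM users WHERE city = 'Cairo'"))
```

    The appeal of this style is that the inner statement runs as-is on the source system, which can ease migration of existing source-specific SQL.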

    Thank you to Mohamed for sharing this use case at PrestoCon Day, and we can’t wait to see what’s next for the Etisalat Egypt team! 

    Resources 

    Watch the Etisalat Egypt session 
    Join the Presto community Slack
    See upcoming community events