Presto’s Intelligent Future: Leveraging RAG and LLMs for Smarter Query Execution 

    At PrestoCon Day 2025, Satej Sahu (Principal Data Engineer at Zalando SE) introduced the Self-Healing Query Connector for Presto, an AI-powered upgrade designed to make query troubleshooting faster, smarter, and more reliable. By combining Large Language Models with live query data, including logs, explain plans, and schema details, it delivers accurate, context-aware solutions that improve query success rates and reduce operational overhead. With self-healing capabilities baked directly into the Presto engine, data teams can spend less time firefighting and more time unlocking business insights. 

    The Pain Points of Presto Failures

    Running Presto queries is often accompanied by a range of failures and optimization challenges that can be quite “painful”. Common causes include: 

    • Syntactical errors
    • Schema-based issues
    • Incorrect references based on data structure
    • Cluster sizing problems
    • Memory exhaustion

    These failures necessitate manual troubleshooting, a process that significantly slows down engineering velocity and consumes valuable time that could be better spent on solving business problems. Rerunning queries without understanding or fixing the root cause is inherently wasteful and unproductive. 

    Unleashing Self-Healing Capabilities with AI 

    The core idea is to enable Presto to become self-healing, interpreting and resolving issues without extensive manual intervention. This is where AI and automation step in. AI can provide solutions, but the feedback loop and tooling layers (like agents) are crucial for continuous improvement and reducing manual toil. 

    To achieve this, the AI system needs key contextual inputs, including: 

    • Query logs
    • Explain plans
    • Error messages
    • Session properties and system configuration
    • Presto system documentation and schema context

    While manually processing this information can be cognitively overwhelming for humans, feeding it into a Large Language Model (LLM) or Generative AI allows the system to identify incremental (or even multi-step) solutions, acting as a powerful assistant. 

    Grounding AI: The Power of Retrieval Augmented Generation (RAG) 

    Large Language Models are trained on vast datasets, which can sometimes lead to biases or “hallucinations”, generating seemingly valid but factually inaccurate or irrelevant responses. To counteract this and ground the LLM’s responses in accurate, relevant information, Retrieval Augmented Generation (RAG) is employed. 

    Here’s how RAG works in this context: 

    1. User Query: A query is initiated by the user. 
    2. Embeddings: An embedding model converts the user query into a mathematical vector representation (an embedding). Embeddings allow similar words or sentences to be represented in close proximity within a multi-dimensional space, enabling correlation. 
    3. Vector Store Search: This query embedding is then used to perform a similarity search in a vector store containing embeddings of relevant documents. This step retrieves the most relevant documents. 
    4. Contextualized LLM Prompt: The retrieved relevant documents are then fed along with the original query to the LLM. This provides the LLM with specific, factual context. 
    5. Natural Language Response: The LLM, now “grounded” by the provided documents, generates a more accurate and human-friendly response. 
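    The RAG steps above can be sketched in a few lines of Python. This is a minimal illustration, not connector code: the document embeddings are hard-coded toy vectors, and a real system would call an actual embedding model and a proper vector store.

```python
# Toy RAG flow: embed query -> similarity search -> contextualized prompt.
# The vectors below are invented stand-ins for real embedding-model output.
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Vector store: documents paired with their (toy) embeddings
vector_store = [
    ("Presto docs: EXPLAIN shows the logical and distributed plan.", [0.9, 0.1, 0.0]),
    ("Presto docs: query.max-memory bounds per-query memory.",       [0.1, 0.9, 0.1]),
]

def retrieve(query_embedding, store, k=1):
    # Similarity search: rank documents by cosine similarity to the query
    ranked = sorted(store, key=lambda d: cosine(query_embedding, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Pretend the embedding model mapped the user query to this vector
user_query = "Why did my query exceed memory limits?"
query_embedding = [0.2, 0.8, 0.1]

# Contextualized prompt = retrieved documents + original query
context = "\n".join(retrieve(query_embedding, vector_store))
prompt = f"Context:\n{context}\n\nQuestion: {user_query}"
```

    Because the memory question lands closest to the memory-related document, the prompt is grounded in the relevant fragment before the LLM ever sees it.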

    Architectural Approaches for AI Assistance

    1. External Chat Ecosystem 

    • Description: Building a completely independent chatbot interface (e.g., mobile or web) that interacts with an LLM without direct integration into the Presto system. 
    • Pros: Quick and easy setup. 
    • Cons: 
      • Lacks contextual system information: Requires the user to manually pass query explanations, schema details, configuration, and documentation as part of the prompt. 
      • Prompt Window Limitations: LLMs have finite context window sizes, making it challenging to include all necessary information for complex scenarios. 
      • Response Quality: The quality of the response is highly dependent on the complexity and completeness of the prompt. The LLM may also lack inherent knowledge of Presto specifics like “explain plans”. 

    2. Connector Ecosystem (The Preferred Approach) 

    • Description: Extending the Presto system by adding a custom connector that integrates directly with the LLM. 
    • Pros: 
      • Contextual Information: The system itself can fetch necessary contextual information by running explain queries and accessing config parameters, reducing manual input. 
      • Real Execution Details: Provides the LLM with actual execution details rather than simulations. 
      • Familiar Environment: Leverages the existing appeal of Presto’s connector ecosystem. 
    • Cons: 
      • Complexity: Can be more complex to build depending on the desired features. 
      • Integration Needed: Requires careful integration with the LLM and defining the sequence of events. 

    The Self-Healing Connector Architecture in Detail 

    In the connector ecosystem approach, Presto acts as a core system that the LLM can interface with, running queries, gathering results, and processing them iteratively. 

    Here’s how a self-healing custom connector would operate: 

    1. Query Execution: A user executes a query. 
    2. Custom Connector Intervention: A custom connector intercepts the query. 
    3. Input Parameters: This connector takes specific inputs, such as an action (e.g., “explain” to fetch the query plan, or “query” to execute it) and, optionally, schema information (though the system can often fetch this itself). 
    4. Client Request Execution: The connector executes client requests (e.g., to MySQL) or uses custom client APIs to fetch data. 
    5. Result Retrieval: It retrieves results from the data sources. 
    6. Feedback to LLM: The retrieved results, along with the original query and any error messages, are fed to the chosen LLM (e.g., OpenAI) via a prompt. 
    7. LLM Processing and Remediation: 
      • The LLM, now understanding the schema and error context, can decide to fix the query. 
      • It can re-execute the refined query. 
      • Few-shot examples can be provided to the LLM to guide it towards correct syntax, especially for SQL-oriented models. 
      • Retry thresholds can be set for auto-retries. 
    8. User Feedback: The LLM’s refined query or the final, fixed result is presented back to the user in a more user-friendly format. 
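    The execute/diagnose/re-execute loop in steps 1–8 can be sketched as follows. Everything here is a hypothetical stand-in: `run_query` fakes the engine, and `ask_llm_for_fix` patches a known typo where a real connector would prompt the LLM with the query, error message, and schema context.

```python
# Sketch of a self-healing retry loop with a configurable retry threshold.
def run_query(sql):
    # Toy engine: rejects a misspelled keyword, accepts the fixed query
    if "SELCT" in sql:
        raise ValueError("line 1:1: mismatched input 'SELCT'")
    return [("ok",)]

def ask_llm_for_fix(sql, error):
    # Stand-in for the LLM call; a real prompt would include the query,
    # the error message, and fetched schema context
    return sql.replace("SELCT", "SELECT")

def self_healing_execute(sql, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            return sql, run_query(sql)            # success: refined query + rows
        except ValueError as err:
            if attempt == max_retries:
                raise                             # retry threshold exhausted
            sql = ask_llm_for_fix(sql, str(err))  # feed the error back to the LLM

refined, rows = self_healing_execute("SELCT 1")
```

    The refined query is returned alongside the result, so the user sees both the fix and the data, matching the user-feedback step above.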

    This continuous loop of execution, error detection, LLM analysis, and re-execution forms the basis of the self-healing system. 

    Beyond Troubleshooting: Natural Language to SQL 

    The connector can evolve further to bridge the gap between business users and SQL. Instead of requiring a SQL query, the connector can take a natural language problem statement (e.g., “What’s the maximum revenue across sales?”). 

    1. The connector, along with fetched schema information, feeds this natural language problem to the LLM. 
    2. The LLM then interprets the schema and generates the appropriate SQL query, abstracting the complexity from the business user. 
    3. This generated SQL query is then executed, and the results are presented back to the user. 
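    The natural-language path boils down to prompt assembly: schema context plus the question in, SQL out. In this sketch, `call_llm` and the `sales` schema are invented placeholders; a real connector would invoke an actual model and fetch the schema from the catalog.

```python
# Sketch of NL-to-SQL: the connector supplies schema context so the LLM
# can ground the generated query in real table and column names.
SCHEMA = "sales(region VARCHAR, revenue DOUBLE)"  # placeholder schema

def call_llm(prompt):
    # Stand-in: a real LLM would generate this from the prompt
    return "SELECT max(revenue) FROM sales"

def nl_to_sql(question, schema):
    prompt = (
        f"Schema: {schema}\n"
        f"Question: {question}\n"
        "Return a single Presto SQL query."
    )
    return call_llm(prompt)

sql = nl_to_sql("What's the maximum revenue across sales?", SCHEMA)
```

    The generated SQL would then be handed to the same execution (and, if needed, self-healing) path as a user-written query.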

    Technical Considerations and Future Directions 

    Implementing such a system involves: 

    • API Interactions: Sending requests to LLMs (like OpenAI) via RESTful API endpoints. 
    • Tooling: Using frameworks like LangChain4j or Spring AI can simplify LLM integration. 
    • Query Analysis: Executing explain queries to get detailed execution plans and feeding them to the LLM for optimization. 
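    As one concrete wiring of these pieces, the sketch below builds (but does not send) a request to OpenAI’s chat-completions REST endpoint with an explain plan in the prompt. The model name, API key, and plan text are placeholders; a Java connector would more likely use LangChain4j or Spring AI instead of raw HTTP.

```python
# Sketch: packaging an EXPLAIN plan into an OpenAI chat-completions request.
import json
import urllib.request

def build_llm_request(explain_plan, api_key):
    payload = {
        "model": "gpt-4o-mini",  # placeholder model name
        "messages": [
            {"role": "system", "content": "You optimize Presto queries."},
            {"role": "user", "content": f"Suggest optimizations:\n{explain_plan}"},
        ],
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",  # placeholder key below
            "Content-Type": "application/json",
        },
    )

req = build_llm_request("Fragment 0 [SINGLE] ...", api_key="sk-placeholder")
# urllib.request.urlopen(req) would send it; omitted here.
```

    Keeping request construction separate from sending makes the prompt assembly easy to test without network access, which also helps with the observability goals below.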

    What’s next for self-healing Presto? 

    • Expanded Error Coverage: Enhancing the system’s ability to handle a wider range of error types. 
    • Presto Coordinator Integration: Deeper integration into the Presto coordinator for more advanced control (though this involves more complex core code changes). 
    • Human-in-the-Loop: Incorporating mechanisms for human feedback (e.g., a dedicated column for hints) to continuously improve the AI’s performance. 
    • Memory Layer: Adding a memory layer to allow the AI to remember previous queries and conversations, improving conversational coherence. 
    • Observability: Crucially, monitoring the performance of the system (latency, parallelization) to understand its effectiveness and guide further improvements. 

    The Impact: Less Toil, More Value

    By leveraging AI and a well-designed connector ecosystem, Presto can evolve into a truly smarter and more resilient data platform, freeing engineers to focus on higher-value business problems. 

    Follow Presto on LinkedIn and YouTube, and join the Slack channel to interact with the community.