Setting Up Presto with Apache Superset: Hands-On Guide

    PrestoDB, an open-source distributed SQL query engine, allows you to query data from multiple disparate sources. When combined with Apache Superset, an open-source data visualization and exploration platform, it forms a powerful and flexible analytics solution. This guide provides a step-by-step approach to deploying these components within a Dockerized environment, simplifying setup and management.

    Prerequisites:

    Before proceeding, make sure you have the following:

    • A Docker runtime: this guide uses OrbStack as an example.
    • Basic knowledge of Docker commands.

    Step-by-Step Implementation Guide

    Step 1: Project Structure and Docker Compose Configuration

    The foundation of this setup is a docker-compose.yml file, which orchestrates all necessary services. The configuration specifies container images, port mappings, environment variables, and data persistence volumes.

    Step 2: Setting Up Docker Compose

    version: "3.8"
    
    services:
      superset:
        image: apache/superset:latest
        container_name: superset
        ports:
          - "8088:8088"
        environment:
          # Example value only; replace with a strong random key outside local testing.
          SUPERSET_SECRET_KEY: 'supersecretkey'
          PYTHONUNBUFFERED: 1
        depends_on:
          - db
        volumes:
          - superset_home:/app/superset_home
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:8088/health"]
          interval: 30s
          timeout: 10s
          retries: 5
        command: >
          /bin/bash -c "
          sleep 10 &&
          superset db upgrade &&
          superset fab create-admin --username admin --firstname Admin --lastname User --email admin@superset.com --password admin &&
          superset init &&
          superset run -h 0.0.0.0 -p 8088
          "
    
      db:
        image: postgres:15
        container_name: superset_db
        environment:
          POSTGRES_DB: superset
          POSTGRES_USER: superset
          POSTGRES_PASSWORD: superset
        volumes:
          - db_data:/var/lib/postgresql/data
    
      mysql:
        image: mysql:latest
        container_name: mysql
        environment:
          MYSQL_ROOT_PASSWORD: root
          MYSQL_DATABASE: testdb
        ports:
          - "3307:3306"
        volumes:
          - mysql_data:/var/lib/mysql
    
      mongo:
        image: mongo:latest
        container_name: mongodb
        ports:
          - "27018:27017"
        volumes:
          - mongo_data:/data/db
    
      presto:
        image: prestodb/presto:latest
        container_name: presto
        ports:
          # Host port 8081 maps to Presto's default port 8080 inside the container.
          - "8081:8080"
        volumes:
          - ./presto/etc/catalog/mongodb.properties:/opt/presto-server/etc/catalog/mongodb.properties
          - ./presto/etc/catalog/mysql.properties:/opt/presto-server/etc/catalog/mysql.properties
        depends_on:
          - mysql
          - mongo
    
    volumes:
      superset_home:
      db_data:
      mysql_data:
      mongo_data:

    Step 3: Creating Presto Catalog Files

    Presto uses catalog files to define connections to external data sources; each file's name (without the .properties extension) becomes the catalog name used in queries. For this setup, create two files in a presto/etc/catalog directory (relative to your docker-compose.yml):

    mysql.properties (To connect to MySQL Database):

    connector.name=mysql
    connection-url=jdbc:mysql://mysql:3306
    connection-user=root
    connection-password=root

    mongodb.properties (To connect to MongoDB Database):

    connector.name=mongodb
    mongodb.seeds=mongodb:27017
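
The two catalog files above can also be created from the terminal. The following is a minimal sketch, assuming it is run from the directory that holds docker-compose.yml; the file contents are copied from this step unchanged.

```shell
# Create the catalog directory that the compose file mounts into the Presto container.
mkdir -p presto/etc/catalog

# MySQL catalog: "mysql" is the compose service name, 3306 the container port.
cat > presto/etc/catalog/mysql.properties <<'EOF'
connector.name=mysql
connection-url=jdbc:mysql://mysql:3306
connection-user=root
connection-password=root
EOF

# MongoDB catalog: "mongodb" is the container_name set in the compose file.
cat > presto/etc/catalog/mongodb.properties <<'EOF'
connector.name=mongodb
mongodb.seeds=mongodb:27017
EOF
```

Create these files before starting the stack: if a bind-mount source path does not exist, Docker creates a directory at that path, and Presto will then not find a usable catalog file.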

    Step 4: Orchestrating Services with Docker Compose

    • In your terminal, navigate to the directory containing docker-compose.yml and start all services in detached mode:
    docker-compose up -d
    • Once the images have been pulled and the containers started, check their status:
    docker ps
    • Finally, confirm in your web browser that Apache Superset is reachable at http://localhost:8088/ and PrestoDB at http://localhost:8081/.
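
Both services can take some time to start, so the browser check may fail on the first attempt. As a rough sketch, a small retry helper can poll the endpoints from the terminal instead. The `retry` function is a hypothetical helper written for this guide, not part of Docker; the `/health` path comes from the compose health check, and `/v1/info` is Presto's REST info endpoint.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0, giving up after MAX_ATTEMPTS tries.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Usage once the stack is up (host ports taken from the compose file):
#   retry 30 5 curl -fsS -o /dev/null http://localhost:8088/health
#   retry 30 5 curl -fsS -o /dev/null http://localhost:8081/v1/info
```

Each call returns a non-zero status if the service never answers, which makes it easy to chain with `&&` in a setup script.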

    Step 5: Integrating PrestoDB with Apache Superset

    Apache Superset does not include the Presto driver by default, necessitating a manual installation within the Superset container.

    • Open an interactive bash shell inside the superset container:
    docker exec -it superset bash
    • Inside the container, install the pyhive[presto] Python package, which provides the Presto driver Superset needs:
    pip install "pyhive[presto]"
    • Exit the container and restart it so that Superset picks up the new package:
    docker restart superset
    • Open Superset in your browser at http://localhost:8088/.
    • Log in with the admin account created by the compose file: Username: admin, Password: admin.
    • Navigate to Settings -> Database Connections -> + Database in the Superset interface.
    • Add PrestoDB as a new database. Once the connection test reports “Connection looks good”, click CONNECT.
    • Presto is now connected to Apache Superset.
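
When adding the database, Superset asks for a SQLAlchemy URI. The sketch below prints URIs in the `presto://host:port/catalog` form that PyHive accepts, one per catalog defined in Step 3. The `presto_sqlalchemy_uri` function is a hypothetical helper used only for illustration.

```shell
# presto_sqlalchemy_uri HOST PORT CATALOG
# Prints a SQLAlchemy URI in the presto://host:port/catalog form used by PyHive.
presto_sqlalchemy_uri() {
  printf 'presto://%s:%s/%s\n' "$1" "$2" "$3"
}

# Superset reaches Presto over the Compose network, so the host is the
# service name "presto" and the port is the container port 8080
# (not the host-mapped 8081).
presto_sqlalchemy_uri presto 8080 mysql
presto_sqlalchemy_uri presto 8080 mongodb
```

Using `localhost:8081` here would likely fail, because inside the Superset container `localhost` refers to the Superset container itself.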

    Step 6: Verifying Data Access and Querying
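
One way to check that Presto can see both data sources is to list its catalogs and schemas from the command line. This is a sketch under the assumption that the prestodb/presto image ships the `presto-cli` client on its PATH; `presto_query` and `verify_catalogs` are hypothetical helper names.

```shell
# presto_query SQL
# Runs one statement through the Presto CLI inside the "presto" container.
# Assumption: the prestodb/presto image provides presto-cli on its PATH.
presto_query() {
  docker exec presto presto-cli --execute "$1"
}

# verify_catalogs
# Lists all catalogs, then the schemas exposed by the two catalogs from Step 3.
verify_catalogs() {
  presto_query "SHOW CATALOGS" &&
  presto_query "SHOW SCHEMAS FROM mysql" &&
  presto_query "SHOW SCHEMAS FROM mongodb"
}
```

Run `verify_catalogs` once the stack is up. The mysql catalog should include a testdb schema, since the compose file sets MYSQL_DATABASE to testdb. The same statements can be run from SQL Lab in Superset against the Presto connection.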

    Conclusion:

    Follow Presto on LinkedIn and YouTube, and join the Slack channel to interact with the community.