As its data volume and compute needs grew rapidly, Metropolis ran into limits with PostgreSQL and MySQL. The company needed a flexible, horizontally scalable data lake architecture that would give it more control and flexibility while still supporting fine-grained security policies. It moved to a data lake architecture that uses Amazon S3 for data storage, AWS Lake Formation for fine-grained access control, and Ahana's managed Presto service to run SQL directly on S3. Metropolis augments its own data with third-party sources such as Zendesk, Heap, and Stripe, and stores the resulting datasets in the data lake.
Twilio uses Presto on AWS. Approximately 80% of Twilio's data comes from product teams that use Kafka or MySQL databases. The company also receives data from external sources such as Salesforce, Zendesk, and Marketo, as well as internal CSV files generated by its accounting and finance teams. This data is loaded into the S3 data lake using config-driven Python and Spark-based loaders. Presto lets Twilio decouple the storage and compute layers and scale without affecting performance. Beyond data exploration and ad hoc analysis by data analysts, Presto also serves as a data source for real-time dashboards and machine learning models.