Presto on AWS Journey at Twilio – Lesson Learned and Optimization – Aakash Pradeep & Badri Tripathy

Twilio, a leader in cloud communication platforms, relies heavily on data and data-driven decision making. Most data-related use cases are currently powered by the Presto engine. Two years ago we started our journey with Presto at Twilio, and today the system has scaled to a multi-PB data lakehouse that supports more than 75k queries per day. Along the way, we learned a lot about how to operationalize Presto effectively on AWS, along with some tricks for improving query reliability and query performance, guard-railing the clusters, and saving cost. In this talk, we want to share this experience with the community.

Disaggregated Coordinator – Swapnil Tailor, Facebook

In the existing Presto architecture, the single coordinator has become a bottleneck for cluster scalability in a number of ways. With an increasing number of workers, the coordinator can slow down because of the high number of tasks. In high-QPS use cases, we have found that workers can become starved of splits because excessive CPU is spent on task updates in the coordinator. A single coordinator also imposes an upper limit on the size of the worker pool, for the reasons above. To overcome these challenges, we are introducing a new architecture that supports multiple coordinators in a single cluster.