How can Presto better support ML users?
In this talk, I’ll discuss some of the challenges ML users face as they leverage Presto to prepare large-scale training datasets. Based on our experience supporting these workloads at Meta, I’ll present how they differ from traditional analytic workloads, and discuss the opportunities such new requirements offer for the design of modern compute engines. I’ll present our findings along three dimensions:
- More efficient storage and in-memory data layout.
- Compressed execution and its impact on operator design.
- (Extremely) late materialization.
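As a minimal illustration of the second point: compressed execution lets an operator work directly on dictionary-encoded data, evaluating a predicate once per distinct value rather than once per row. The toy sketch below (plain Python, not Presto or Velox code; all names are illustrative) shows the idea:

```python
def dict_encode(values):
    """Encode a column as (dictionary, codes) — one code per row."""
    dictionary, codes, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def filter_encoded(dictionary, codes, predicate):
    """Evaluate the predicate once per dictionary entry, then
    select matching rows by their codes."""
    keep = [predicate(v) for v in dictionary]  # |dictionary| evaluations
    return [i for i, c in enumerate(codes) if keep[c]]

country = ["US", "BR", "US", "IN", "BR", "US"]
dictionary, codes = dict_encode(country)
rows = filter_encoded(dictionary, codes, lambda v: v == "US")
print(rows)  # → [0, 2, 5]
```

On columns with few distinct values, which are common in feature data, this turns a per-row string comparison into a per-dictionary-entry one plus a cheap integer scan.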
I’ll also share the recent progress the Meta team has made in supporting these workloads, initial results, and the existing and new open source projects that support this stack, and highlight areas where more research, development, and collaboration are needed.
