Women in Open Source & Presto – Getting started in the Presto open source ecosystem

Women in Open Source & Presto – Getting Started in the Presto Open Source Ecosystem – Neha Pawar, StarTree; Rebecca Schlussel, Meta; RongRong Zhong, Celonis; moderated by Dipti Borkar, Microsoft

Among GitHub users with at least ten contributions, a mere 6% were women, far below the 26% share of women in tech that various studies report. Given the investment flowing into open source, the growth and success of companies built on it, and the enormous demand for open source developers, this is a ratio we need to strive to improve. In this panel, we will discuss a few areas:
– The journey of each panelist into open source projects
– The benefits they have seen from participating in open source projects, particularly Presto
– The challenges women face in male-dominated open source communities
– Ideas, suggestions, and guidance for budding engineers on getting started with open source, including Presto

Query Execution Optimization for Broadcast Join using Replicated-Reads Strategy – George Wang, Ahana

Today Presto supports broadcast join by having one worker fetch data from a small data source, build a hash table, and send the entire data set over the network to all other workers, where it is used for hash lookups probed by the large data source. This can be optimized with a new query execution strategy in which the source data of small tables is pulled directly by all workers, known as replicated reads from dimension tables. This feature comes with a nice caching property: because all N worker nodes now participate in scanning the data from remote sources, the table scan operation for dimension tables is cacheable on every worker node. In addition, resource utilization improves: the Presto scheduler can reduce the number of plan fragments to execute, because the same workers run tasks in parallel within a single stage, which reduces data shuffles.
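
The difference between the two strategies can be illustrated with a toy simulation. This is only a sketch and not Presto code: the Worker class, the function names, and the scan_cache dictionary are hypothetical names invented here, and rows are modeled as plain Python dictionaries. It counts how many dimension-table rows each worker receives from a peer versus scans from the source itself.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical stand-in for a worker node (not a Presto class)."""
    name: str
    hash_table: dict = field(default_factory=dict)
    rows_received_from_peers: int = 0
    rows_scanned_from_source: int = 0


def build_hash_table(rows, key):
    """Group dimension rows by join key for later lookups."""
    table = {}
    for row in rows:
        table.setdefault(row[key], []).append(row)
    return table


def broadcast_join_build(workers, dim_rows, key):
    """One worker scans the small table, then ships it to every peer."""
    scanner, *peers = workers
    scanner.rows_scanned_from_source += len(dim_rows)
    scanner.hash_table = build_hash_table(dim_rows, key)
    for peer in peers:
        # Worker-to-worker network transfer of the whole small table.
        peer.rows_received_from_peers += len(dim_rows)
        peer.hash_table = build_hash_table(dim_rows, key)


def replicated_reads_build(workers, dim_rows, key, scan_cache):
    """Every worker scans the small table itself; nothing is shuffled."""
    for worker in workers:
        if worker.name not in scan_cache:
            # First scan on this worker: read from the remote source.
            worker.rows_scanned_from_source += len(dim_rows)
            scan_cache[worker.name] = build_hash_table(dim_rows, key)
        # Later queries on the same worker reuse the cached scan.
        worker.hash_table = scan_cache[worker.name]


def probe(worker, fact_rows, key):
    """Probe the worker's local hash table with rows of the large table."""
    return [(fact, dim)
            for fact in fact_rows
            for dim in worker.hash_table.get(fact[key], [])]
```

With three workers and a two-row dimension table, the broadcast build scans the source once and sends two rows to each of the two peers, while the replicated-reads build has each worker scan two rows and transfers nothing between workers. Calling replicated_reads_build a second time with the same scan_cache adds no further scans, which mirrors the per-worker caching property described above.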