PrestoDB Blog - PrestoDB

What is Presto on Spark?

By Rohan Pednekar, Shradha Ambekar & Ariel Weisberg | November 15, 2021 | October 19, 2023

1. Reporting and dashboarding: This includes serving custom reporting to both internal and external developers for business insights, and many organizations also use Presto for interactive A/B testing analytics. A defining characteristic of this use case is a requirement for low latency: queries must return in tens to hundreds of milliseconds at very high QPS, and not…

Scaling with Presto on Spark

By Rohan Pednekar, Shradha Ambekar & Ariel Weisberg | October 26, 2021 | September 21, 2023

Overview: Presto was originally designed to run interactive queries against data warehouses, but it has since evolved into a unified SQL engine on top of open data lake analytics for both interactive and batch workloads. Popular workloads on data lakes include: 1. Reporting and dashboarding: This includes serving custom reporting for both internal and external…

Using OptimizedTypedSet to Improve Map and Array Functions

By Ying Su | December 4, 2020 | September 21, 2023

Function evaluation is a big part of projection CPU cost. Recently we optimized a set of functions that use TypedSet, e.g. map_concat, array_union, array_intersect, and array_except. By introducing a new OptimizedTypedSet, the above functions saw improvements in several dimensions. Furthermore, OptimizedTypedSet resolves the long-standing issue of throwing EXCEEDED_FUNCTION_MEMORY_LIMIT for large incoming blocks: “The input…