Ying Su, Author at PrestoDB

Using OptimizedTypedSet to Improve Map and Array Functions

By Ying Su December 4, 2020September 21, 2023

Function evaluation is a big part of projection CPU cost. Recently we optimized a set of functions that use TypedSet, e.g. map_concat, array_union, array_intersect, and array_except. By introducing a new OptimizedTypeSet, the above functions saw improvements in several dimensions: Furthermore, OptimizedTypeSet resolves the long standing issue of throwing EXCEEDED_FUNCTION_MEMORY_LIMIT for large incoming blocks: “The input…

Even Faster Unnest

By Ying Su, Maria Basmanova & Orri Erling August 20, 2020September 21, 2023

Unnest is a common operation in Facebook’s daily Presto workload. It converts an ARRAY, MAP, or ROW into a flat relation. Its original implementation used deep copy all the time and was very inefficient. In Unnest Operator Performance Enhancement with Dictionary Blocks, the author improved the Unnest operator by up to 10x in CPU and…

5 design choices—and 1 weird trick — to get 2x efficiency gains in Presto repartitioning

By Ying Su, Orri Erling, Tim Meehan, Sahar Massachi, Bhavani Hari & Maria Basmanova December 20, 2019September 21, 2023

We like Presto. We like it a lot — so much we want to make it better in every way. Here’s an example: we just optimized the PartitionedOutputOperator. It’s now 2-3x more CPU efficient, which, when measured against Facebook’s production workload, translates to 6% gains overall. That’s huge. The optimized repartitioning is in use on…

Everything You Always Wanted To Do in Table Scan

By Orri Erling, Maria Basmanova, Ying Su, Tim Meehan & Elon Azoulay June 29, 2019September 21, 2023

Table scan, on the face of it, sounds trivial and boring. What’s there in just reading a long bunch of records from first to last? Aren’t indexing and other kinds of physical design more interesting? As data has gotten bigger, the columnar table scan has only gotten more prominent. The columnar scan is a fairly…