5 design choices—and 1 weird trick — to get 2x efficiency gains in Presto repartitioning

We like Presto. We like it a lot — so much we want to make it better in every way. Here’s an example: we just optimized the PartitionedOutputOperator. It’s now 2-3x more CPU efficient, which, when measured against Facebook’s production workload, translates to 6% gains overall. That’s huge. The optimized repartitioning is in use on…

Everything You Always Wanted To Do in Table Scan

Table scan, on the face of it, sounds trivial and boring. What’s there in just reading a long bunch of records from first to last? Aren’t indexing and other kinds of physical design more interesting? As data has gotten bigger, the columnar table scan has only gotten more prominent. The columnar scan is a fairly…