PrestoDB Blog - Page 3 of 4

PrestoCon and Growing Industry Consortium – Intel and Upsolver Join Presto Foundation

By Girish Baliga | November 20, 2020 | September 21, 2023

Presto Foundation joined the Linux Foundation over a year ago, and has been focused on growing the Presto open source project and community. We encourage industry involvement with an open charter, clear guiding principles, and community-oriented goals. We recently hosted PrestoCon 2020, our first annual community conference, which was widely attended and well represented by…

Presto Enables Internal Log Data Analysis at Drift

By Arun Venkateswaran | October 29, 2020 | September 21, 2023

I’m a Senior Software Engineer in the data group at Drift, a conversational marketing platform that is used for qualifying leads faster, automatically booking meetings and connecting customers to the right business solutions more efficiently. I’ve used Presto quite a bit throughout my career, and I want to first give readers a quick overview of…

Even Faster Unnest

By Ying Su, Maria Basmanova & Orri Erling | August 20, 2020 | September 21, 2023

Unnest is a common operation in Facebook’s daily Presto workload. It converts an ARRAY, MAP, or ROW into a flat relation. Its original implementation always performed deep copies and was very inefficient. In Unnest Operator Performance Enhancement with Dictionary Blocks, the author improved the Unnest operator by up to 10x in CPU and…
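
To make "converts an ARRAY ... into a flat relation" concrete, here is a minimal Python sketch of the idea. It is not Presto's implementation: the `unnest_array` helper, the `orders` rows, and the column names are all hypothetical, and rows with an empty array simply produce no output in this sketch.

```python
from typing import Any, Dict, Iterator, List


def unnest_array(rows: List[Dict[str, Any]], column: str) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per element of the array stored in `column`."""
    for row in rows:
        for element in row[column]:
            # Copy the non-array columns and replace the array with one element.
            flat = {k: v for k, v in row.items() if k != column}
            flat[column] = element
            yield flat


# Hypothetical input: each order carries an array of item names.
orders = [
    {"order_id": 1, "items": ["apple", "pear"]},
    {"order_id": 2, "items": ["plum"]},
    {"order_id": 3, "items": []},
]

flat_rows = list(unnest_array(orders, "items"))
for r in flat_rows:
    print(r)
```

Note that this toy version copies every non-array column for every output row, which is the kind of repeated copying the teaser describes as inefficient.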

Getting Started with PrestoDB and Aria Scan Optimizations

By Adam Shook | August 14, 2020 | September 21, 2023

This article was originally published by Adam on June 15th, 2020 over at his blog at datacatessen.com. PrestoDB recently released a set of experimental features under their Aria project in order to increase table scan performance of data stored in ORC files via the Hive Connector. In this post, we’ll check out these new features…

Building a high-performance platform on AWS to support real-time gaming services using Presto and Alluxio

By Teng Wang | August 6, 2020 | September 21, 2023

Electronic Arts (EA) is a leading company in the gaming industry, providing dozens of games to serve billions of users worldwide each year. Making near real-time decisions for EA’s online services is critical for our business. This blog describes a data platform on AWS based on Presto and Alluxio to support online services with instantaneous…

PrestoDB and Apache Hudi

By Bhavani Sudha Saktheeswaran | August 4, 2020 | September 21, 2023

Apache Hudi is a fast growing data lake storage system that helps organizations build and manage petabyte-scale data lakes. Hudi brings stream style processing to batch-like big data by introducing primitives such as upserts, deletes and incremental queries. These features help surface faster, fresher data on a unified serving layer. Hudi tables can be stored…

Running Presto in a Hybrid Cloud Architecture

By Adit Madan | July 17, 2020 | September 21, 2023

Migrating SQL workloads from a fully on-premise environment to cloud infrastructure has numerous benefits, including alleviating resource contention and reducing costs by paying for computation resources on an on-demand basis. In the case of Presto running on data stored in HDFS, the separation of compute in the cloud and storage on-premises is apparent since Presto’s…

Data Lake Analytics: Alibaba’s Federated Cloud Strategy

By George Wang | June 30, 2020 | September 21, 2023

Presto is known to be a high-performance, distributed SQL query engine for Big Data. It offers large-scale data analytics with multiple connectors for accessing various data sources. This capability lets Presto users extend some features to build a large-scale data federation service in the cloud. Alibaba Data Lake Analytics embraces Presto’s federated query…

Improving Presto Latencies with Alluxio Data Caching

By Rohit Jain | June 16, 2020 | September 21, 2023

The Facebook Presto team has been collaborating with Alluxio on an open source data caching solution for Presto. This is required for multiple Facebook use-cases to improve query latency for queries that scan data from remote sources such as HDFS. We have observed significant improvements in query latencies and IO scans in our experiments. We…

Spatial Joins 1: Local Spatial Joins

By James Gill | May 7, 2020 | September 21, 2023

A common type of spatial query involves relating one table of geometric objects (e.g., a table population_centers with columns population, latitude, longitude) with another such table (e.g., a table counties with columns county_name, boundary_wkt), such as calculating for each county the population sum of all population centers contained within it. These kinds of calculations are…
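
The per-county population sum described above can be sketched in plain Python as a naive nested-loop join. This is only an illustration under stated assumptions, not how Presto evaluates spatial joins: the `contains` helper uses a simple ray-casting test, county boundaries are given as hypothetical vertex lists rather than parsed from `boundary_wkt`, and all sample data is invented.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def contains(polygon: List[Point], point: Point) -> bool:
    """Ray-casting point-in-polygon test; points exactly on an edge are not handled specially."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical counties: name -> boundary vertices as (longitude, latitude).
counties: Dict[str, List[Point]] = {
    "A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}

# Hypothetical population centers: (population, latitude, longitude).
population_centers = [
    (100, 1.0, 1.0),
    (250, 1.5, 0.5),
    (70, 1.0, 3.0),
    (40, 5.0, 5.0),  # lies outside every county
]

# Naive join: test every population center against every county boundary.
population_by_county = {name: 0 for name in counties}
for population, latitude, longitude in population_centers:
    for name, boundary in counties.items():
        if contains(boundary, (longitude, latitude)):
            population_by_county[name] += population

print(population_by_county)
```

Every point is tested against every polygon here, so the work grows with the product of the two table sizes.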

Engineering SQL Support on Apache Pinot at Uber

By Haibo Wang | March 18, 2020 | September 21, 2023

The article, Engineering SQL Support on Apache Pinot at Uber, was originally published by Uber on the Uber Engineering Blog on January 15, 2020. Check out eng.uber.com for more articles about Uber’s engineering work and follow Uber Engineering at @UberEng and Uber Open Source at @UberOpenSouce on Twitter for updates from our teams. Uber leverages…

Querying Nested Data with Lambda Functions

By Wenlei Xie | March 2, 2020 | September 21, 2023

Denormalized data with nested values (e.g. array/map) have become omnipresent in this Big Data era, as a lot of data naturally conforms to a nested representation [1, 2]. As a result it is important to provide an efficient and convenient way to query nested data. SQL traditionally does not include support for this. The pioneering…

Announcing PrestoCon 2020: Advancing the Big Data Ecosystem with Presto

By Nezih Yigitbasi | February 13, 2020 | September 21, 2023

On March 24, 2020 in San Mateo, the Presto Foundation, in partnership with The Linux Foundation, will be hosting the organization’s first-ever PrestoCon. The event, one of the first Presto-focused full-day conferences ever held, will feature speakers from Uber, Facebook, and Twitter, as well as tech talks from other major Presto contributors and enthusiasts. Presto,…

Improving the Presto planner for better push down and data federation

By Yi He, James Sun, Maria Basmanova, Rongrong Zhong, Jiexi Lin, Saksham Sachdev & Akshay Pall | December 23, 2019 | September 21, 2023

Presto defines a connector API that allows Presto to query any data source that has a connector implementation. The existing connector API provides basic predicate pushdown functionality, allowing connectors to perform filtering at the underlying data source. However, the existing predicate pushdown functionality has certain limitations that restrict what connectors can do. The…

5 design choices—and 1 weird trick — to get 2x efficiency gains in Presto repartitioning

By Ying Su, Orri Erling, Tim Meehan, Sahar Massachi, Bhavani Hari & Maria Basmanova | December 20, 2019 | September 21, 2023

We like Presto. We like it a lot — so much we want to make it better in every way. Here’s an example: we just optimized the PartitionedOutputOperator. It’s now 2-3x more CPU efficient, which, when measured against Facebook’s production workload, translates to 6% gains overall. That’s huge. The optimized repartitioning is in use on…

Join Us! Growing the Presto Foundation in 2020 and Beyond

By Brian Hsieh | December 16, 2019 | September 21, 2023

The Presto Foundation (PF) was established in September 2019 as an openly governed and vendor-neutral body dedicated to scaling and diversifying the Presto community. Hosted by the Linux Foundation, PF and its Governing Board are in a unique position to make Presto the fastest and the most reliable SQL engine for massively distributed data processing….

Table Scan: Doing The Right Thing With Structured Types

By Orri Erling | September 26, 2019 | September 21, 2023

In the previous article we saw what gains are possible when filtering early and in the right order. In this article we look at how we do this with nested and structured types. We use the 100G TPC-H dataset, but now we group top level columns into structs or maps. Maps, lists and structs are…

Presto now hosted under the Linux Foundation

By Ariel Weisberg | September 23, 2019 | September 21, 2023

We are excited to announce today, in partnership with Alibaba, Facebook, Twitter, and Uber, the launch of the Presto Foundation, a non-profit organization under the umbrella of the Linux Foundation. Hosting by the Linux Foundation opens up the Presto community to a broader ecosystem of users and contributors. The Presto Foundation’s open and neutral governance…