Time Travel for Iceberg Tables in Presto

Presto, an open-source distributed SQL query engine, excels at querying large data sets spread across diverse data sources, and it has remained a high-performance data analytics tool for over a decade. As data collection capabilities expand, businesses increasingly recognize the value of historical data alongside current data. With the recent release of…

Presto Parquet Column Encryption

Apache Parquet modular encryption provides at-rest and in-transit encryption at a finer granularity than whole files. In the big data world, analytic tables are usually very wide, with hundreds of columns, while only a small number of those columns need protection. Finer-grained access control is therefore a better fit than coarse-grained approaches such as table-level access control…

Faster Presto Queries with Parquet Page Index

Today’s data is growing very fast, which creates challenges for query engines like Presto. Presto is a popular interactive query engine because of its scalability, high performance, and smooth integration with Hadoop. As data volume grows, Presto needs to read larger chunks of data and load them into memory, which causes higher…