SerializedPage Wire Format#

Presto uses SerializedPage binary columnar format for exchanging data between stages.

The data can be compressed, encrypted and include a checksum. The layout is a header followed by a number of columns, followed by individual columns.

../_images/serialized-page-layout.png

Columns#

Each column starts with a header. The data follows.

Column Header#

The column header specifies the encoding of the column.

  • Length of the encoding name - 4 bytes

  • Name of the encoding

The supported encodings and mappings for Presto types are:

Encoding Name

Presto Types

BYTE_ARRAY

BOOLEAN, TINYINT, UNKNOWN

SHORT_ARRAY

SMALLINT

INT_ARRAY

INTEGER, REAL

LONG_ARRAY

BIGINT, DOUBLE, TIMESTAMP

INT128_ARRAY

Not used

VARIABLE_WIDTH

VARCHAR, VARBINARY

ARRAY

ARRAY

MAP

MAP

MAP_ELEMENT

n/a

ROW

ROW

DICTIONARY

n/a

RLE

n/a

See presto-common/src/main/java/com/facebook/presto/common/block/BlockEncodingManager.java

For example, the header for an INTEGER column describes INT_ARRAY encoding. The length of the encoding name is 9, hence, the first 4 bytes will be 0 0 0 9. The following 9 bytes store the INT_ARRAY string.

../_images/serialized-page-int-array.png

Null Flags#

All columns include the 1-byte has-nulls flag. 0 means no nulls. 1 means may have nulls. If has-nulls byte is 1, individual null flags are specified in the following bytes using 1 bit per flag. 0 means the value is not null. 1 means the value is null.

  • Has-nulls flag - 1 byte

  • [optional] Null flags - number of rows / 8 bytes; 1 bit per flag; the bits are stored in bytes in reverse order; the first flag in each byte is the high bit.

Let’s say we have 10 rows with nulls in zero-based rows 1, 4, 6, 7, 9. The null flags will be represented with 3 bytes. First byte stores the has-null flag: 1. Second byte stores null flags for the first 8 rows. Third byte stores null flags for the last 2 rows.

../_images/serialized-page-nulls.png

XXX_ARRAY Encodings#

BYTE_ARRAY, INT_ARRAY, SHORT_ARRAY, LONG_ARRAY and INT128_ARRAY encodings differ only by the number of bytes used per value.

The data layout is:

  • Number of rows - 4 bytes

  • Null flags

  • Values - (number of rows - number of nulls) * <number of bytes per value> bytes; only rows with non-null values are represented

The number of bytes per value is:

Encoding Name

Number of bytes per value

BYTE_ARRAY

1

SHORT_ARRAY

2

INT_ARRAY

4

LONG_ARRAY

8

INT128_ARRAY

16

Let’s take the example from the Null Flags section and say that we have an integer column with 10 rows with nulls in zero-based rows 1, 4, 6, 7, 9. We’ll have 4 bytes storing the number of rows: 10, followed by 3 bytes of null flags, followed by 20 bytes representing 5 non-null integer values for rows 0, 2, 3, 5, 8..

../_images/serialized-page-int-column.png

VARIABLE_WIDTH Encoding#

  • Number of rows - 4 bytes

  • Offsets - number of rows * 4 bytes; 4 bytes per offset

  • Null flags

  • Total number of bytes in all values - 4 bytes

  • Concatenated values

Let’s again take the example from the Null Flags section and say that we have a string column with 10 rows with nulls in zero-based rows 1, 4, 6, 7, 9. The non-null rows will have values: 0 - Denali, 2 - Reinier, 3 - Whitney, 5 - Bona, 8 - Bear. We’ll have 4 bytes storing the number of rows: 10, followed by 40 bytes of offsets, followed by 3 bytes of null flags, followed by 1 bytes storing total size of all strings: 28, followed by the concatenated string values. Notice that we have offsets for all rows, not just the non-null rows.

../_images/serialized-page-string-column.png

ARRAY Encoding#

  • Elements column

  • Number of rows - 4 bytes

  • Offsets - (number of rows + 1) * 4 bytes; 4 bytes per offset

  • Null flags

An array column with 10 rows is represented like this:

../_images/serialized-page-array-column.png

MAP Encoding#

  • Keys column

  • Values column

  • Hash table size (number of 4-byte chunks in the Hash table) - 4 bytes

  • [optional] Hash table: <Hash table size> * <4 bytes>

  • Number of rows - 4 bytes

  • Offsets - (number of rows + 1) * 4 bytes; 4 bytes per offset

  • Null flags

A map column with 10 rows is represented like this:

../_images/serialized-page-map-column.png

ROW Encoding#

  • Number of fields - 4 bytes

  • One column per field

  • Number of rows - 4 bytes

  • Offsets - (number of rows + 1) * 4 bytes; 4 bytes per offset

  • Null flags

Nested columns are serialized only for non-null rows. In the presence of null rows, the row numbers of the nested columns don’t match the top-level row numbers. Offsets are specifying the row numbers of the nested columns.

Let’s again take the example from the Null Flags section and say that we have a column of type ROW(a, b, c, d) with 10 rows with nulls in zero-based rows 1, 4, 6, 7, 9. Nested columns will have only 5 rows and offsets will be: 0, 0, 1, 2, 0, 3, 0, 0, 4, 0. Offsets for null rows are zero.

Note: Offsets are redundant information as they can be reconstructed from the null flags.

../_images/serialized-page-row-column.png

DICTIONARY Encoding#

  • Number of rows - 4 bytes

  • Dictionary values column. This column itself is a serialized block whose encoding can be any of the encodings mentioned in this document.

  • Indices - number of rows * 4 bytes; 4 bytes per index

  • Dictionary ID - 24 bytes

RLE Encoding#

  • Number of rows - 4 bytes

  • Single-row constant value column

Additional Usage#

SerializedPage format is also used to specify constant values in the plan fragment sent from the coordinator to the workers. In this case, the binary representation is converted to an ASCII string using base64 encoding.

For example, the plan for the SELECT array[1, 23, 456] query includes a Project node with array[1, 23, 456] value represented as base64 encoded binary of the SerializedPage format.

- Project[projectLocality = LOCAL] => [expr:array(integer)]
        Estimates: {rows: 1 (51B), cpu: 51.00, memory: 0.00, network: 0.00}
        expr := [Block: position count: 3; size: 92 bytes]

Also, this format is used to store intermediate data when session property exchange_materialization_strategy is ALL and temporary_table_storage_format is PAGEFILE.