# Data Types#

Presto has a set of built-in data types, described below. Additional types can be provided by plugins.

Note

Connectors are not required to support all types. See connector documentation for details on supported types.

## Boolean#

### `BOOLEAN`#

This type captures boolean values `true` and `false`.

## Integer#

### `TINYINT`#

A 8-bit signed two’s complement integer with a minimum value of `-2^7` and a maximum value of `2^7 - 1`.

### `SMALLINT`#

A 16-bit signed two’s complement integer with a minimum value of `-2^15` and a maximum value of `2^15 - 1`.

### `INTEGER`#

A 32-bit signed two’s complement integer with a minimum value of `-2^31` and a maximum value of `2^31 - 1`. The name `INT` is also available for this type.

### `BIGINT`#

A 64-bit signed two’s complement integer with a minimum value of `-2^63` and a maximum value of `2^63 - 1`.

## Floating-Point#

### `REAL`#

A real is a 32-bit inexact, variable-precision implementing the IEEE Standard 754 for Binary Floating-Point Arithmetic.

### `DOUBLE`#

A double is a 64-bit inexact, variable-precision implementing the IEEE Standard 754 for Binary Floating-Point Arithmetic.

## Fixed-Precision#

### `DECIMAL`#

A fixed precision decimal number. Precision up to 38 digits is supported but performance is best up to 18 digits.

The decimal type takes two literal parameters:

• precision - total number of digits

• scale - number of digits in fractional part. Scale is optional and defaults to 0.

Example type definitions: `DECIMAL(10,3)`, `DECIMAL(20)`

Example literals: `DECIMAL '10.3'`, `DECIMAL '1234567890'`, `1.1`

Note

For compatibility reasons decimal literals without explicit type specifier (e.g. `1.2`) are treated as values of the `DOUBLE` type by default up to version 0.198. After 0.198 they are parsed as DECIMAL.

• System wide property: `parse-decimal-literals-as-double`

• Session wide property: `parse_decimal_literals_as_double`

## String#

### `VARCHAR`#

Variable length character data with an optional maximum length.

Example type definitions: `varchar`, `varchar(20)`.

SQL supports simple and Unicode string literals:
• Literal string : `'Hello winter !'`

• Unicode string with default escape character: `U&'Hello winter \2603 !'`

• Unicode string with custom escape character: `U&'Hello winter #2603 !' UESCAPE '#'`

A Unicode string is prefixed with `U&` and requires an escape character before any Unicode character usage with 4 digits. In these examples `\2603` and `#2603` represent a snowman character. Long Unicode codes with 6 digits require a plus symbol `+` before the code. For example, use `\+01F600` for a grinning face emoji.

Single quotes in string literals can be escaped by using another single quote: `'It''s a beautiful day!'`

### `CHAR`#

Fixed length character data. A CHAR type without length specified has a default length of 1. A CHAR(x) value always has x characters. For example, casting dog to CHAR(7) adds 4 implicit trailing spaces. Leading and trailing spaces are included in comparisons of CHAR values. As a result, two character values with different lengths (CHAR(x) and CHAR(y) where x != y) are never equal, but comparison of such values implicitly converts the types to the same length and pads with spaces so that the following query returns true:

````sql SELECT cast('example' AS char(20)) = cast('example    ' AS char(25)); ````

As with VARCHAR, a single quote in a CHAR literal can be escaped with another single quote:

````sql SELECT CHAR 'All right, Mr. DeMille, I''m ready for my close-up.' ````

### `VARBINARY`#

Variable length binary data.

Note

Binary strings with length are not yet supported: `varbinary(n)`

### `JSON`#

JSON value type, which can be a JSON object, a JSON array, a JSON number, a JSON string, `true`, `false` or `null`.

## Date and Time#

### `DATE`#

Calendar date (year, month, day).

Example: `DATE '2001-08-22'`

### `TIME`#

Time of day (hour, minute, second, millisecond) without a time zone. Values of this type are parsed and rendered in the session time zone.

Example: `TIME '01:02:03.456'`

### `TIME WITH TIME ZONE`#

Time of day (hour, minute, second, millisecond) with a time zone. Values of this type are rendered using the time zone from the value.

Example: `TIME '01:02:03.456 America/Los_Angeles'`

### `TIMESTAMP`#

Instant in time that includes the date and time of day without a time zone. Values of this type are parsed and rendered in the session time zone.

Example: `TIMESTAMP '2001-08-22 03:04:05.321'`

### `TIMESTAMP WITH TIME ZONE`#

Instant in time that includes the date and time of day with a time zone. Values of this type are rendered using the time zone from the value.

Example: `TIMESTAMP '2001-08-22 03:04:05.321 America/Los_Angeles'`

### `INTERVAL YEAR TO MONTH`#

Span of years and months.

Example: `INTERVAL '3' MONTH`

### `INTERVAL DAY TO SECOND`#

Span of days, hours, minutes, seconds and milliseconds.

Example: `INTERVAL '2' DAY`

## Structural#

### `ARRAY`#

An array of the given component type.

Example: `ARRAY[1, 2, 3]`

### `MAP`#

A map between the given component types.

Example: `MAP(ARRAY['foo', 'bar'], ARRAY[1, 2])`

### `ROW`#

A structure made up of named fields. The fields may be of any SQL type, and are accessed with field reference operator `.`

Example: `CAST(ROW(1, 2.0) AS ROW(x BIGINT, y DOUBLE))`

### `IPADDRESS`#

An IP address that can represent either an IPv4 or IPv6 address.

Internally, the type is a pure IPv6 address. Support for IPv4 is handled using the IPv4-mapped IPv6 address range (RFC 4291#section-2.5.5.2). When creating an `IPADDRESS`, IPv4 addresses will be mapped into that range.

When formatting an `IPADDRESS`, any address within the mapped range will be formatted as an IPv4 address. Other addresses will be formatted as IPv6 using the canonical format defined in RFC 5952.

Examples: `IPADDRESS '10.0.0.1'`, `IPADDRESS '2001:db8::1'`

### `IPPREFIX`#

An IP routing prefix that can represent either an IPv4 or IPv6 address.

Internally, an address is a pure IPv6 address. Support for IPv4 is handled using the IPv4-mapped IPv6 address range (RFC 4291#section-2.5.5.2). When creating an `IPPREFIX`, IPv4 addresses will be mapped into that range. Additionally, addresses will be reduced to the first address of a network.

`IPPREFIX` values will be formatted in CIDR notation, written as an IP address, a slash (‘/’) character, and the bit-length of the prefix. Any address within the IPv4-mapped IPv6 address range will be formatted as an IPv4 address. Other addresses will be formatted as IPv6 using the canonical format defined in RFC 5952.

Examples: `IPPREFIX '10.0.1.0/24'`, `IPPREFIX '2001:db8::/48'`

## UUID#

### `UUID`#

This type represents a UUID (Universally Unique IDentifier), also known as a GUID (Globally Unique IDentifier), using the format defined in RFC 4122.

Example: `UUID '12151fd2-7586-11e9-8f9e-2a86e4085a59'`

## HyperLogLog#

Calculating the approximate distinct count can be done much more cheaply than an exact count using the HyperLogLog data sketch. See HyperLogLog Functions.

### `HyperLogLog`#

A HyperLogLog sketch allows efficient computation of `approx_distinct()`. It starts as a sparse representation, switching to a dense representation when it becomes more efficient.

### `P4HyperLogLog`#

A P4HyperLogLog sketch is similar to HyperLogLog, but it starts (and remains) in the dense representation.

## KHyperLogLog#

### `KHyperLogLog`#

A KHyperLogLog is a data sketch that can be used to compactly represents the association of two columns. See KHyperLogLog Functions.

## SetDigest#

### `SetDigest`#

A SetDigest (setdigest) is a data sketch structure used in calculating Jaccard similarity coefficient between two sets.

SetDigest encapsulates the following components:

The HyperLogLog structure is used for the approximation of the distinct elements in the original set.

The MinHash structure is used to store a low memory footprint signature of the original set. The similarity of any two sets is estimated by comparing their signatures.

SetDigests are additive, meaning they can be merged together.

## Quantile Digest#

### `QDigest`#

A quantile digest (qdigest) is a summary structure which captures the approximate distribution of data for a given input set, and can be queried to retrieve approximate quantile values from the distribution. The level of accuracy for a qdigest is tunable, allowing for more precise results at the expense of space.

A qdigest can be used to give approximate answer to queries asking for what value belongs at a certain quantile. A useful property of qdigests is that they are additive, meaning they can be merged together without losing precision.

A qdigest may be helpful whenever the partial results of `approx_percentile` can be reused. For example, one may be interested in a daily reading of the 99th percentile values that are read over the course of a week. Instead of calculating the past week of data with `approx_percentile`, `qdigest`s could be stored daily, and quickly merged to retrieve the 99th percentile value.

## T-Digest#

### `TDigest`#

A t-digest is similar to qdigest, but it uses a different algorithm to represent the approximate distribution of a set of numbers. T-digest has better performance than quantile digests but only supports the `DOUBLE` type. See T-Digest Functions.