Presto vs Prestissimo – Known differences and workarounds

    TL;DR

    This blog outlines the known differences between Presto and Prestissimo where existing Presto queries require adjustment to work in Prestissimo. 

    Details

    Prestissimo is generally available to use and has feature parity (except for a few functions) with Presto Java. There are differences in libraries used in both stacks. Also we have ensured that bugs aren’t ported from the Java stack and have been fixed in the Java stack when applicable. As a result, in rare cases queries that run in Presto Java can fail or provide different results in Prestissimo. These queries should be changed so that they continue to work. Below we publish the known differences between two engines. 

    Array Functions

    Array sort with lambda comparator:

    Array sort with transform lambda works significantly better than comparator lambda. Writing a proper comparator is tricky and error prone.

    1. For lambda usage, ‘Case’ is not supported. Use ‘If’ Instead. The following is not supported:

    (x, y) ->
    CASE
    WHEN x.event_time < y.event_time THEN
      -1
    WHEN x.event_time > y.event_time THEN
      1
      ELSE 0
    END

    To work with Prestissimo, rewrite using transform lambda. For example: (x) -> x.event_time
    Alternately it can be written with if, like below:

    (x, y) -> IF (x.event_time < y.event_time, -1, IF (x.event_time > y.event_time, 1, 0))

    When If is used to rewrite, follow these rules for using a lambda in array sort:

    • The lambda should use if else (Case is not supported)
    • The lambda should return 1, 0, -1 (i.e cover all cases)
    • The lambda should use the same expression when doing the comparison, for example in the above case event_time is used for comparison throughout the lambda. If we rewrote the expression as following, where x and y have different fields, it will fail: (x, y) -> if (x.event_time < y.event_start_time, -1, if (x.event_time > y.event_start_time, 1, 0))
    • Note that any additional nesting apart from the two if’s shown above will fail

    The best option is to use transform lambda whenever possible.

    2. Array_sort can support any transformation lambda that returns a comparable type. For example the lambda below is not supported:

    "array_sort"("map_values"(m), (a, b) -> (
                    CASE WHEN (a[1] [2] > b[1] [2]) THEN 1 
                         WHEN (a[1] [2] < b[1] [2]) THEN -1 
                         WHEN (a[1] [2] = b[1] [2]) THEN 
                              IF((a[3] > b[3]), 1, -1) END)

    However this can be rewritten to the below format , which is supported:

    “array_sort”(“map_values”(m), (a) -> ROW(a[1][2], a[3]))

    JSON Functions

    • json_extract:
      • Use of functions in JSON path
        • Issue: Using functions inside a JSON path is not supported
        • Workaround: Rewrite paths to use equivalent and often faster UDFs (User-Defined Functions) outside the JSON path, improving job portability and efficiency.
        • Example:
        • Original:
    CAST(JSON_EXTRACT(config, '$.table_name_to_properties.keys()'
      ) AS ARRAY(ARRAY(VARCHAR)))

    Rewritten:

    map_keys(JSON_EXTRACT( config, '$.table_name_to_properties') )
    • This rewritten approach works in both Presto and Prestissimo. Aggregates might occasionally be necessary. Generally, functions should be extracted from the JSON path for better portability.
    • Use of expressions in JSON path
      • Issue: Using paths that have filter expressions aren’t supported 
      • Workaround: Do the filtering as a part of the SQL expression, query rather than in the JSON path – This will be more efficient and faster.
      • Example:
        • Original:
    JSON_EXTRACT(config, '$.store.book[?(@.price > 10)]')
    • Rewritten
    filter(
       CAST(json_extract(data, '$.store.book') AS ARRAY<JSON>), 
         x -> CAST(json_extract_scalar(x.value, '$.price') AS   DOUBLE) > 10)
       )
    • Erroring on Invalid JSON
      • Issue: Presto Java can successfully run json_extract on certain invalid json but Prestissimo will always fail
      • Workaround: Extracting data from invalid JSON is indeterminate, and thus relying on that behavior can have unintended consequences. Prestissimo takes the safe approach to always throw on invalid JSON. Wrap your calls in a try to ensure the query succeeds and validate that the results correspond to your expectations. 
    • Canonicalization

    Regex Functions

    Prestissimo uses RE2, a widely adopted modern regular expression parsing library. RE2 provides most of the functionality of PCRE using a C++ interface very close to PCRE’s, but it guarantees linear time execution and a fixed stack footprint. Presto Java uses JONI (a port of ONI, now archived/deprecated). While both frameworks support almost all regular expression syntaxes, RE2 differs from JONI/PCRE in certain cases. A full list of unsupported regular expressions can be found in this wiki. A very interesting read about why RE2 skips certain perl syntax is also available here. Fundamentals and detailed design of RE2 can be found in this three part amazing post (part1, part2 and part3).

    Regex Compilation Limit in Velox

    The number of regular expressions that can be compiled for a query is limited to 250. The hard cap is placed so that each individual query does not compile more than 250 regexes to keep the overall shared cluster environment healthy. If this limit is hit, try to optimize the query to use fewer compiled regular expressions. The reason for this limitation is that Regex compilation is CPU intensive and we have observed several cases where unbounded compilation has caused problems in the cluster historically.
    NOTE: This number is only for regex’s that are created dynamically , for example:

    code_location_path LIKE '%' || test_name || '%'

    In this example the regex can change based on the test_name column value and a large enough number of test_name values can quickly exceed the 250 limit. 

    Instead this can be simply rewritten as:
    strpos(code_location, test_name) > 0

    Note: Velox also does several optimizations to improve performance by skipping regex completely and using string comparisons when applicable.

    Unsupported cases

    • RE2 does not support
      • before text matching (?=re)
      • before text not matching (?!re)
      • after text matching (?<=re) 
      • after text not matching (?<!re)

    These are supported in JONI

    • RE2 wiki has a list of unsupported PCRE syntaxes, some of which are supported in JONI today.
    • Unsupported queries will fail in Prestissimo and must be re-written in a different way. However each rewrite may be different and the solution depends on the use case.

    Aggregate Functions:

    • Reduce Lambda Function:
      • Support is controlled by a session property (native_expression_max_array_size_in_reduce) as it is inefficient to support such cases for arbitrarily large arrays. Currently this is capped at 100K. Queries that fail due to this limit must be revised to meet this limit.

    Window Functions

    • Issue: Aggregate window functions do not support `IGNORE NULLS`.
    • Error Message: `!ignoreNulls Aggregate window functions do not support IGNORE NULLS.`
    • Workaround: When encountering this issue, remove the `IGNORE NULLS` clause. This clause is only defined for value functions and does not apply to aggregate window functions; In Presto the results obtained with and without the clause are similar, Prestissimo includes this clause whereas Presto just warns.

    Casting 

      Unicode Numerals

    • Casting of unicode strings to digits is not supported
      • Example
    CAST (‘Ⅶ’ as integer)  -- Unsupported

    URL Functions

    Presto and Prestissimo implement different URL functions specs which can lead to some URL function mismatches. Prestissimo implements RFC-3986 whereas Presto implements RFC-2396. This can lead to subtle differences as highlighted in this issue.

    Date Time Functions

    From_unixtime

    • Currently the maximum date range supported for from_unixtime is between (292 Million BCE, 292 Million CE). The exact values corresponding to this are [292,275,055-05-16 08:54:06.192 BC, +292,278,994-08-17 00:12:55.807 CE], and this corresponds to a unix time between [-9223372036854775, 9223372036854775].  Presto Java also supports the same range (with JODA library), however it silently truncates (hence queries used to succeed), whereas Prestissimo throws an error if the values exceed this range. 

    Geospatial Differences

    There are cosmetic representation changes as well as numerical precision differences. Some of these differences result in different output for spatial predicates such as ST_Intersects. Differences include:

    1. Equivalent but different representations for geometries. Polygons may have their rings rotated, EMPTY geometries may be of a different type, MULTI-types and GEOMETRYCOLLECTIONs may have their elements in a different order. In general, WKTs/WKBs may be different.
    2. Numerical precision: Differences in numerical techniques may result in different coordinate values, and also different results for predicates (ST_Relates and children, including ST_Contains, ST_Crosses, ST_Disjoint, ST_Equals, ST_Intersects, ST_Overlaps, ST_Relate, ST_Touches, ST_Within).
    3. ST_IsSimple, ST_IsValid, simplify_geometry and geometry_invalid_reason may give different results.

    Time and Time with Time Zone

    IANA Named Timezones Support Removed

    • The support for IANA named time zones (e.g., ‘Europe/London’, ‘UTC’, ‘America/New_York’, ‘Asia/Kolkata’) in TIME and TIME WITH TIME ZONE has been removed to align with the SQL standard. For more details, refer to this Presto issue.
    • Only fixed-offset time zones (e.g., +02:00) are now supported for these types.
    • Note: Named time zones may still work when the Presto Java coordinator handles the query, but this support will be removed in the future. Migrate to fixed-offset time zones as soon as possible.

    Example of Impacted Queries:

    -- These will fail in Prestissimo (C++ engine), but may still work in legacy Presto:
    
    cast('14:00:01 UTC' as TIME WITH TIME ZONE)           -- ❌ Error
    cast('14:00:01 Europe/Paris' as TIME WITH TIME ZONE)   -- ❌ Error
    cast('14:00:01 America/New_York' as TIME WITH TIME ZONE) -- ❌ Error
    cast('14:00:01 Asia/Kolkata' as TIME WITH TIME ZONE)   -- ❌ Error
    
    -- These will work (fixed offset):
    cast('14:00:01 +00:00' as TIME WITH TIME ZONE)         -- ✅ OK
    cast('14:00:01 +05:30' as TIME WITH TIME ZONE)         -- ✅ OK

    Casting from TIMESTAMP to TIME

    • One key difference is when TIME and TIME WITH TIME ZONE are produced when casting from TIMESTAMP (with or without TIME ZONE). (Note that TIMESTAMP behavior in Presto/Prestissimo is unchanged).
    • Previously, the result of CAST(TIMESTAMP AS TIME/TIME WITH TIME ZONE) would change based on the session property legacy_timestamp (true by default), when applied to the user’s time zone. In Prestissimo for TIME/TIME WITH TIMEZONE the behavior will be equivalent to the property being false.

    Example of Impacted Queries:

    Legacy/Default Presto Behavior

    -- Default behavior with legacy_timestamp=true:
    -- Session Timezone - America/Los_Angeles
    
    -- DST Active Dates 
    select cast(TIMESTAMP '2023-08-05 10:15:00.000' as TIME);
    -- Returns: 09:15:00.000
    select cast(TIMESTAMP '2023-08-05 10:15:00.000' as TIME WITH TIME ZONE);
    -- Returns: 09:15:00.000 America/Los_Angeles
    select cast(TIMESTAMP '2023-08-05 10:15:00.000 America/Los_Angeles' as TIME);
    -- Returns: 09:15:00.000
    select cast(TIMESTAMP '2023-08-05 10:15:00.000 America/Los_Angeles' as TIME WITH TIME ZONE);
    -- Returns: 09:15:00.000
    
    -- DST Inactive Dates
    select cast(TIMESTAMP '2023-12-05 10:15:00.000' as TIME);
    -- Returns: 10:15:00.000
    select cast(TIMESTAMP '2023-12-05 10:15:00.000' as TIME WITH TIME ZONE);
    -- Returns: 10:15:00.000 America/Los_Angeles
    select cast(TIMESTAMP '2023-08-05 10:15:00.000 America/Los_Angeles' as TIME);
    -- Returns: 10:15:00.000
    select cast(TIMESTAMP '2023-12-05 10:15:00.000 America/Los_Angeles' as TIME WITH TIME ZONE);
    -- 10:15:00.000 America/Los_Angeles

    New Behavior With Prestissimo (Velox)

    -- New Expected behavior similar to what currently exists if legacy_timestamp=false:
    -- Session Timezone - America/Los_Angeles
    
    
    -- DST Active Dates 
    select cast(TIMESTAMP '2023-08-05 10:15:00.000' as TIME);
    -- Returns: 10:15:00.000
    select cast(TIMESTAMP '2023-08-05 10:15:00.000' as TIME WITH TIME ZONE);
    -- Returns: 10:15:00.000 -07:00
    select cast(TIMESTAMP '2023-08-05 10:15:00.000 America/Los_Angeles' as TIME);
    -- Returns: 10:15:00.000
    select cast(TIMESTAMP '2023-08-05 10:15:00.000 America/Los_Angeles' as TIME WITH TIME ZONE);
    -- Returns: 10:15:00.000 -07:00
    
    -- DST Inactive Dates
    select cast(TIMESTAMP '2023-12-05 10:15:00.000' as TIME);
    -- Returns: 10:15:00.000
    select cast(TIMESTAMP '2023-12-05 10:15:00.000' as TIME WITH TIME ZONE);
    -- Returns: 10:15:00.000 -08:00
    select cast(TIMESTAMP '2023-08-05 10:15:00.000 America/Los_Angeles' as TIME);
    -- Returns: 10:15:00.000
    select cast(TIMESTAMP '2023-12-05 10:15:00.000 America/Los_Angeles' as TIME WITH TIME ZONE);
    -- Returns: 10:15:00.000 -08:00
    • Note: TIMESTAMP will continue to support named time zones unlike TIME and TIME WITH TIME ZONE

    DST Implications

    • Since IANA zones are not supported for TIME (still supported for Timestamp!), Prestissimo does not manage DST transitions. All time interpretation is strictly in the provided offset, not local civil time.
    • For example, ’14:00:00 +02:00′ always means 14:00 at a +02:00 fixed offset, regardless of DST changes that might apply under an IANA zone.

    Recommendations

    • Don’t use IANA time zone names for TIME/TIME WITH TIME ZONE: Use fixed-offset time zones like +02:00 instead of Europe/Paris or similar for TIME/TIME WITH TIME ZONE.
    • Confirm your Prestissimo usage does not depend on legacy timestamp behavior: If your workload depends on having legacy TIME behavior (including support of IANA timezones), handle this outside Presto or reach out so that we can discuss alternative solutions.
    • Test: Try your most critical workflows with these settings.

    Follow Us