Release 0.233
Warning
There is a bug in this release that will cause queries with the predicate IS NULL on bucketed columns to produce incorrect results.
General Changes
Fix an optimizer failure introduced in 0.229, where a LIKE pattern is deduced into a constant, e.g., col LIKE 'a' and col = 'b'.
Fix correctness issue in queries with joins over UNNEST. (#14257).
Fix ArbitraryOutputBuffer to avoid skewing output data distribution. (#14083).
Fix an issue where classification_fall_out() cannot be found.
Add support for async page transport with non-blocking IO. This can be enabled by the exchange.async-page-transport-enabled configuration property.
Add support to handle HTTP request timeouts using multiple thread pools. This can be controlled by the task.http-timeout-concurrency configuration property.
Add support for soft affinity scheduling, which makes the best effort to fetch the same piece of data from the same worker, while allowing fallback to random workers if the preferred workers are too busy to handle additional splits. (See Connectors).
Add hash functions fnv1_32, fnv1_64, fnv1a_32, and fnv1a_64.
Add IP address functions ip_subnet_min(), ip_subnet_max(), ip_subnet_range(), and is_subnet_of().
Improve performance of StreamingAggregationOperator.
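The four new hash functions correspond to the two FNV variants at two widths. The sketch below is a plain Python illustration of the published FNV-1 and FNV-1a algorithms, not Presto's implementation. The function names mirror the SQL names for readability only. The return representation is an assumption: the sketch returns unsigned integers, and Presto may expose the result differently (for example, as signed values).

```python
# Illustrative sketch of FNV-1 and FNV-1a; not taken from Presto's source.
# Offset bases and primes are the standard published FNV parameters.
FNV32_OFFSET = 0x811C9DC5
FNV32_PRIME = 0x01000193
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def _fnv1(data: bytes, offset: int, prime: int, bits: int) -> int:
    """FNV-1: multiply by the prime first, then XOR in the next byte."""
    mask = (1 << bits) - 1
    h = offset
    for byte in data:
        h = (h * prime) & mask
        h ^= byte
    return h


def _fnv1a(data: bytes, offset: int, prime: int, bits: int) -> int:
    """FNV-1a: XOR in the next byte first, then multiply by the prime."""
    mask = (1 << bits) - 1
    h = offset
    for byte in data:
        h ^= byte
        h = (h * prime) & mask
    return h


def fnv1_32(data: bytes) -> int:
    return _fnv1(data, FNV32_OFFSET, FNV32_PRIME, 32)


def fnv1_64(data: bytes) -> int:
    return _fnv1(data, FNV64_OFFSET, FNV64_PRIME, 64)


def fnv1a_32(data: bytes) -> int:
    return _fnv1a(data, FNV32_OFFSET, FNV32_PRIME, 32)


def fnv1a_64(data: bytes) -> int:
    return _fnv1a(data, FNV64_OFFSET, FNV64_PRIME, 64)
```

The only difference between the two variants is the order of the multiply and XOR steps, which is why they agree on empty input (both return the offset basis) and diverge on any non-empty input.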
Hive Changes
Fix an issue where Presto fails to start when configuration property hive.s3-file-system-type is set to HADOOP_DEFAULT.
Add directory listing cache for Hive Connector. This can be enabled by setting configuration property hive.file-status-cache-tables.
Add support for storing column names and types for views in the Hive metastore. Views in the Hive connector can now only use types supported by Hive.
Add configuration property hive.insert-overwrite-immutable-partitions-enabled to allow admins to set insert overwrite as the default insertion behavior for the Hive connector.
Add configuration property hive.node-selection-strategy to choose NodeSelectionStrategy. When SOFT_AFFINITY is selected, the scheduler will make the best effort to request the same worker to fetch the same file.
Remove configuration property hive.force-local-scheduling. The same functionality can be achieved by setting hive.node-selection-strategy to HARD_AFFINITY.
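To make the SOFT_AFFINITY behavior concrete, here is a minimal Python sketch of the idea described above: prefer the same workers for the same file, and fall back to a random worker when the preferred ones are too busy. Everything in it is hypothetical and chosen for illustration: the function names, the hash-based choice of preferred workers, the number of preferred workers, and the pending-split limit. It does not reflect how the Presto scheduler is actually implemented.

```python
import hashlib
import random

# Hypothetical sketch of soft affinity scheduling; names, the hashing scheme,
# and the busy threshold are assumptions, not Presto's scheduler code.


def preferred_workers(file_path, sorted_workers, count=2):
    """Map a file to a stable, ordered subset of workers.

    hashlib is used instead of the built-in hash(), which is salted per
    process and would not give the same answer across runs.
    """
    digest = hashlib.sha256(file_path.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(sorted_workers)
    n = min(count, len(sorted_workers))
    return [sorted_workers[(start + i) % len(sorted_workers)] for i in range(n)]


def pick_worker(file_path, sorted_workers, pending_splits, max_pending, rng=random):
    """Return a worker for the split, or None if every worker is saturated."""
    # Best effort: try the workers that handled this file before.
    for worker in preferred_workers(file_path, sorted_workers):
        if pending_splits.get(worker, 0) < max_pending:
            return worker
    # Fallback: any worker that still has capacity, chosen at random.
    candidates = [w for w in sorted_workers if pending_splits.get(w, 0) < max_pending]
    return rng.choice(candidates) if candidates else None
```

Under HARD_AFFINITY the fallback branch would not exist, so the split would wait for one of its preferred workers. The fallback is what distinguishes the soft strategy.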
Verifier Changes
Fix an issue where invalid checksum queries can be generated for certain queries containing columns of RowType.
Fix an issue where checksum query would fail for queries containing map columns whose key or value types are arrays or rows.
Fix incorrect decision for determinism analysis of queries with top-level LIMIT clause. (#14176).
Add checks for keys, values, and cardinality sum when validating a map column.
Add support to disable individual failure resolvers (#14148).
Add support to auto-resolve control checksum query failures with COMPILER_ERROR, instead of skipping the verification.
Add support for specifying non-deterministic catalogs by the determinism.non-determinism-catalogs configuration property. Queries explicitly referencing tables from those catalogs are treated as non-deterministic.
Improve query performance during determinism analysis of queries with top-level LIMIT clause.
Improve correctness check for floating point columns where the mean value of either the control query or the test query is close to 0.
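The floating point item above addresses a general numerical problem: a relative-error comparison becomes unstable when the values being compared are close to 0. The Python sketch below illustrates one common way to handle this, using a relative margin with an absolute-margin fallback near zero. The function names and both margin values are assumptions made for this example and are not taken from the Verifier's code or configuration.

```python
# Hypothetical sketch: comparing control and test means with a relative
# margin, falling back to an absolute margin near zero. The thresholds
# below are illustrative assumptions, not Verifier defaults.


def relative_error(control_mean: float, test_mean: float) -> float:
    """Symmetric relative error; caller must ensure the values are not both 0."""
    scale = (abs(control_mean) + abs(test_mean)) / 2
    return abs(control_mean - test_mean) / scale


def means_match(
    control_mean: float,
    test_mean: float,
    relative_margin: float = 1e-4,
    absolute_margin: float = 1e-12,
) -> bool:
    # Near zero, tiny absolute differences produce huge relative errors,
    # so treat both values as "effectively zero" and accept the match.
    if abs(control_mean) <= absolute_margin and abs(test_mean) <= absolute_margin:
        return True
    return relative_error(control_mean, test_mean) <= relative_margin
```

For instance, means of 1e-15 and -1e-15 differ only by floating point noise, yet their relative error is large. The absolute-margin branch accepts them, while a pair such as 0.0 and 1.0 still fails the relative check.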
Druid Changes
Add Druid Connector.
Geospatial Changes
Improve ST_Points() to add support for major well-known spatial objects. ST_Points() now supports POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON and GEOMETRYCOLLECTION.
Improve ST_IsValid() and ST_IsSimple() to adhere to the ISO/OGC standards more closely. The two functions used to return the same result but may now be different. Users should check both functions to be sure their geometries are well-behaved. geometry_invalid_reason() will return different but semantically similar strings.
Improve performance of ST_Intersection() by simply returning the geometry if it has an enclosing envelope. This can reduce CPU cost by up to 10^5x for complex polygons.
SPI Changes
Add parameter NodeSelectionStrategy nodeSelectionStrategy in method ConnectorBucketNodeMap#createBucketNodeMap to indicate which affinity strategy to use when creating a bucket node map.
Add parameter List<Node> sortedNodes in method ConnectorNodePartitioningProvider#getBucketNodeMap to provide a sorted list of nodes from which a connector can choose to perform affinity scheduling.
Add enum NodeSelectionStrategy. NO_PREFERENCE indicates data is remotely accessible from workers, HARD_AFFINITY indicates data and workers are collocated, and SOFT_AFFINITY indicates data is remotely accessible but the scheduler will make the best effort to fetch the same piece of data from the same worker.
Replace ConnectorSplit#isRemoteAccessible with getNodeSelectionStrategy.
Replace ConnectorSplit#getAddresses with getPreferredNodes, to provide hints to the scheduler where to schedule splits.
Replace the SchemaTableName parameter in ConnectorMetadata#createView with a ConnectorTableMetadata.
Move JsonType to SPI.