String Functions and Operators#
String Operators#
The || operator performs concatenation.
String Functions#
Note
These functions assume that the input strings contain valid UTF-8 encoded
Unicode code points. There are no explicit checks for valid UTF-8 and
the functions may return incorrect results on invalid UTF-8.
Invalid UTF-8 data can be corrected with from_utf8().
Additionally, the functions operate on Unicode code points and not user visible characters (or grapheme clusters). Some languages combine multiple code points into a single user-perceived character, the basic unit of a writing system for a language, but the functions will treat each code point as a separate unit.
The lower() and upper() functions do not perform
locale-sensitive, context-sensitive, or one-to-many mappings required for
some languages. Specifically, this will return incorrect results for
Lithuanian, Turkish and Azeri.
- chr(n) varchar#
Returns the Unicode code point
nas a single character string.
- codepoint(string) integer#
Returns the Unicode code point of the only character of
string.
- concat(string1, ..., stringN) varchar#
Returns the concatenation of
string1,string2,...,stringN. This function provides the same functionality as the SQL-standard concatenation operator (||).
- hamming_distance(string1, string2) bigint#
Returns the Hamming distance of
string1andstring2, i.e. the number of positions at which the corresponding characters are different. Note that the two strings must have the same length.
- length(string) bigint#
Returns the length of
stringin characters.
- levenshtein_distance(string1, string2) bigint#
Returns the Levenshtein edit distance of
string1andstring2, i.e. the minimum number of single-character edits (insertions, deletions or substitutions) needed to changestring1intostring2.
- lower(string) varchar#
Converts
stringto lowercase.
- lpad(string, size, padstring) varchar#
Left pads
stringtosizecharacters withpadstring. Ifsizeis less than the length ofstring, the result is truncated tosizecharacters.sizemust not be negative andpadstringmust be non-empty.
- ltrim(string) varchar#
Removes leading whitespace from
string.
- replace(string, search) varchar#
Removes all instances of
searchfromstring.
- replace(string, search, replace) varchar#
Replaces all instances of
searchwithreplaceinstring.If
searchis an empty string, insertsreplacein front of every character and at the end of thestring.
- reverse(string) varchar#
Returns
stringwith the characters in reverse order.
- rpad(string, size, padstring) varchar#
Right pads
stringtosizecharacters withpadstring. Ifsizeis less than the length ofstring, the result is truncated tosizecharacters.sizemust not be negative andpadstringmust be non-empty.
- rtrim(string) varchar#
Removes trailing whitespace from
string.
- split(string, delimiter)#
Splits
stringondelimiterand returns an array.
- split(string, delimiter, limit)#
Splits
stringondelimiterand returns an array of size at mostlimit. The last element in the array always contain everything left in thestring.limitmust be a positive number.
- split_part(string, delimiter, index) varchar#
Splits
stringondelimiterand returns the fieldindex. Field indexes start with1. If the index is larger than than the number of fields, then null is returned.
- split_to_map(string, entryDelimiter, keyValueDelimiter) map<varchar, varchar>#
Splits
stringbyentryDelimiterandkeyValueDelimiterand returns a map.entryDelimitersplitsstringinto key-value pairs.keyValueDelimitersplits each pair into key and value. Note thatentryDelimiterandkeyValueDelimiterare interpreted literally, i.e., as full string matches.
- split_to_map(string, entryDelimiter, keyValueDelimiter, function(K, V1, V2, R)) map<varchar, varchar>#
Splits
stringbyentryDelimiterandkeyValueDelimiterand returns a map.entryDelimitersplitsstringinto key-value pairs.keyValueDelimitersplits each pair into key and value. Note thatentryDelimiterandkeyValueDelimiterare interpreted literally, i.e., as full string matches.function(K,V1,V2,R)is invoked in cases of duplicate keys to resolve the value that should be in the map.SELECT(split_to_map(‘a:1;b:2;a:3’, ‘;’, ‘:’, (k, v1, v2) -> v1)); – {“a”: “1”, “b”: “2”} SELECT(split_to_map(‘a:1;b:2;a:3’, ‘;’, ‘:’, (k, v1, v2) -> CONCAT(v1, v2))); – {“a”: “13”, “b”: “2”}
- split_to_multimap(string, entryDelimiter, keyValueDelimiter)#
Splits
stringbyentryDelimiterandkeyValueDelimiterand returns a map containing an array of values for each unique key.entryDelimitersplitsstringinto key-value pairs.keyValueDelimitersplits each pair into key and value. The values for each key will be in the same order as they appeared instring. Note thatentryDelimiterandkeyValueDelimiterare interpreted literally, i.e., as full string matches.
- strpos(string, substring) bigint#
Returns the starting position of the first instance of
substringinstring. Positions start with1. If not found,0is returned.
- strpos(string, substring, instance) bigint#
Returns the position of the N-th
instanceofsubstringinstring.instancemust be a positive number. Positions start with1. If not found,0is returned.
- strrpos(string, substring) bigint#
Returns the starting position of the last instance of
substringinstring. Positions start with1. If not found,0is returned.
- strrpos(string, substring, instance) bigint#
Returns the position of the N-th
instanceofsubstringinstringstarting from the end of the string.instancemust be a positive number. Positions start with1. If not found,0is returned.
- position(substring IN string) bigint#
Returns the starting position of the first instance of
substringinstring. Positions start with1. If not found,0is returned.
- substr(string, start) varchar#
Returns the rest of
stringfrom the starting positionstart. Positions start with1. A negative starting position is interpreted as being relative to the end of the string.
- substr(string, start, length) varchar#
Returns a substring from
stringof lengthlengthfrom the starting positionstart. Positions start with1. A negative starting position is interpreted as being relative to the end of the string.
- trim(string) varchar#
Removes leading and trailing whitespace from
string.
- upper(string) varchar#
Converts
stringto uppercase.
- word_stem(word) varchar#
Returns the stem of
wordin the English language.
- word_stem(word, lang) varchar#
Returns the stem of
wordin thelanglanguage.
Unicode Functions#
- normalize(string) varchar#
Transforms
stringwith NFC normalization form.
- normalize(string, form) varchar#
Transforms
stringwith the specified normalization form.formmust be be one of the following keywords:Form
Description
NFDCanonical Decomposition
NFCCanonical Decomposition, followed by Canonical Composition
NFKDCompatibility Decomposition
NFKCCompatibility Decomposition, followed by Canonical Composition
Note
This SQL-standard function has special syntax and requires specifying
formas a keyword, not as a string.
- to_utf8(string) varbinary#
Encodes
stringinto a UTF-8 varbinary representation.
- from_utf8(binary) varchar#
Decodes a UTF-8 encoded string from
binary. Invalid UTF-8 sequences are replaced with the Unicode replacement characterU+FFFD.
- from_utf8(binary, replace) varchar#
Decodes a UTF-8 encoded string from
binary. Invalid UTF-8 sequences are replaced with replace. The replacement string replace must either be a single character or empty (in which case invalid characters are removed).
- key_sampling_percent(varchar) double#
Generates a double value between 0.0 and 1.0 based on the hash of the given
varchar. This function is useful for deterministic sampling of data.