String Functions and Operators

    Note

    These functions assume that the input strings contain valid UTF-8 encoded Unicode code points. There are no explicit checks for valid UTF-8 and the functions may return incorrect results on invalid UTF-8. Invalid UTF-8 data can be corrected with from_utf8().

    Additionally, the functions operate on Unicode code points and not user visible characters (or grapheme clusters). Some languages combine multiple code points into a single user-perceived character, the basic unit of a writing system for a language, but the functions will treat each code point as a separate unit.

    The and upper() functions do not perform locale-sensitive, context-sensitive, or one-to-many mappings required for some languages. Specifically, this will return incorrect results for Lithuanian, Turkish and Azeri.

    chr(n) → varchar

    Returns the Unicode code point n as a single character string.

    codepoint(string) → integer

    Returns the Unicode code point of the only character of string.

    concat(string1, , stringN) → varchar

    Returns the concatenation of string1, string2, ..., stringN. This function provides the same functionality as the SQL-standard concatenation operator (||).

    hamming_distance(string1, string2) → bigint

    Returns the Hamming distance of string1 and string2, i.e. the number of positions at which the corresponding characters are different. Note that the two strings must have the same length.

    length(string) → bigint

    Returns the length of string in characters.

    levenshtein_distance(string1, string2) → bigint

    Returns the Levenshtein edit distance of string1 and string2, i.e. the minimum number of single-character edits (insertions, deletions or substitutions) needed to change string1 into string2.

    lower(string) → varchar

    Converts string to lowercase.

    lpad(string, size, padstring) → varchar

    Left pads string to size characters with padstring. If size is less than the length of string, the result is truncated to size characters. size must not be negative and padstring must be non-empty.

    ltrim(string) → varchar

    Removes leading whitespace from string.

    replace(string, search) → varchar

    Removes all instances of search from string.

    replace(string, search, replace) → varchar

    Replaces all instances of search with replace in string.

    If search is an empty string, inserts replace in front of every character and at the end of the string.

    reverse(string) → varchar

    Returns string with the characters in reverse order.

    rpad(string, size, padstring) → varchar

    (string) → varchar

    Removes trailing whitespace from string.

    split(string, delimiter)

    Splits string on delimiter and returns an array.

    split(string, delimiter, limit)

    Splits string on delimiter and returns an array of size at most limit. The last element in the array always contain everything left in the string. limit must be a positive number.

    split_part(string, delimiter, index) → varchar

    Splits string on delimiter and returns the field index. Field indexes start with 1. If the index is larger than than the number of fields, then null is returned.

    split_to_map(string, entryDelimiter, keyValueDelimiter) → map<varchar, varchar>

    Splits string by entryDelimiter and keyValueDelimiter and returns a map. entryDelimiter splits string into key-value pairs. keyValueDelimiter splits each pair into key and value. Note that entryDelimiter and keyValueDelimiter are interpreted literally, i.e., as full string matches.

    split_to_map(string, entryDelimiter, keyValueDelimiter, function(K, V1, V2, R)) → map<varchar, varchar>

    Splits string by entryDelimiter and keyValueDelimiter and returns a map. entryDelimiter splits string into key-value pairs. keyValueDelimiter splits each pair into key and value. Note that entryDelimiter and keyValueDelimiter are interpreted literally, i.e., as full string matches. function(K,V1,V2,R) is invoked in cases of duplicate keys to resolve the value that should be in the map.

    split_to_multimap(string, entryDelimiter, keyValueDelimiter)

    Splits string by entryDelimiter and keyValueDelimiter and returns a map containing an array of values for each unique key. entryDelimiter splits string into key-value pairs. keyValueDelimiter splits each pair into key and value. The values for each key will be in the same order as they appeared in string. Note that entryDelimiter and keyValueDelimiter are interpreted literally, i.e., as full string matches.

    strpos(string, substring) → bigint

    Returns the starting position of the first instance of substring in string. Positions start with 1. If not found, 0 is returned.

    strpos(string, substring, instance) → bigint

    Returns the position of the N-th instance of substring in string. instance must be a positive number. Positions start with . If not found, 0 is returned.

    strrpos(string, substring) → bigint

    Returns the starting position of the last instance of substring in string. Positions start with 1. If not found, 0 is returned.

    strrpos(string, substring, instance) → bigint

    Returns the position of the N-th instance of substring in string starting from the end of the string. instance must be a positive number. Positions start with 1. If not found, 0 is returned.

    position(substring IN string) → bigint

    Returns the starting position of the first instance of substring in string. Positions start with 1. If not found, 0 is returned.

    substr(string, start) → varchar

    Returns the rest of string from the starting position start. Positions start with 1. A negative starting position is interpreted as being relative to the end of the string.

    substr(string, start, length) → varchar

    Returns a substring from string of length length from the starting position start. Positions start with 1. A negative starting position is interpreted as being relative to the end of the string.

    trim(string) → varchar

    upper(string) → varchar

    Converts string to uppercase.

    word_stem(word) → varchar

    Returns the stem of word in the English language.

    word_stem(word, lang) → varchar

    Returns the stem of word in the lang language.

    normalize(string) → varchar

    Transforms string with NFC normalization form.

    normalize(string, form) → varchar

    Transforms string with the specified normalization form. form must be be one of the following keywords:

    Note

    This SQL-standard function has special syntax and requires specifying form as a keyword, not as a string.

    to_utf8(string) → varbinary

    Encodes string into a UTF-8 varbinary representation.

    from_utf8(binary) → varchar

    Decodes a UTF-8 encoded string from binary. Invalid UTF-8 sequences are replaced with the Unicode replacement character U+FFFD.

    from_utf8(binary, replace) → varchar

    Decodes a UTF-8 encoded string from binary. Invalid UTF-8 sequences are replaced with replace. The replacement string replace must either be a single character or empty (in which case invalid characters are removed).

    key_sampling_percent(varchar) → double

    Generates a double value between 0.0 and 1.0 based on the hash of the given . This function is useful for deterministic sampling of data.