SPLIT_BY_REGEXP/REGEXP_SPLIT_TO_ARRAY

Description

The SPLIT_BY_REGEXP function splits a string into an array of strings based on a specified regular expression pattern. Unlike SPLIT_BY_STRING, this function supports complex regular expression matching for more flexible splitting rules. It optionally supports a maximum split count limit, which is useful for structured text processing, data cleansing, and pattern matching.

Syntax

SPLIT_BY_REGEXP(<str>, <pattern> [, <max_limit>])

Parameters

Parameter	Description
`<str>`	The source string to be split. Type: VARCHAR
`<pattern>`	Regular expression pattern used as the delimiter. Type: VARCHAR
`<max_limit>`	Optional parameter, limits the maximum number of elements in the returned array. Type: INT

Return Value

Returns ARRAY type, representing an array of strings split by the regular expression.

Splitting rules:

Uses regular expression pattern to match split points
Supports standard regular expression syntax
Empty string pattern will split the string into individual characters
If pattern doesn't match anything, returns single-element array containing the original string
max_limit restricts the maximum length of the result array

Special cases:

If any parameter is NULL, returns NULL
If string is empty, returns single-element array containing empty string
If regular expression is empty string, splits by characters
If max_limit is 0 or negative, no limit is applied
Consecutive matches produce empty string elements

Examples

Empty pattern splits by characters

SELECT SPLIT_BY_REGEXP('abcde', '');

+------------------------------+
| SPLIT_BY_REGEXP('abcde', '') |
+------------------------------+
| ["a", "b", "c", "d", "e"]    |
+------------------------------+

Digit pattern splitting

SELECT SPLIT_BY_REGEXP('a12bc23de345f', '\\d+');

+-----------------------------------------+
| SPLIT_BY_REGEXP('a12bc23de345f', '\d+') |
+-----------------------------------------+
| ["a", "bc", "de", "f"]                  |
+-----------------------------------------+

NULL value handling

SELECT SPLIT_BY_REGEXP(NULL, '\\d+'), SPLIT_BY_REGEXP('test', NULL);

+--------------------------------+--------------------------------+
| SPLIT_BY_REGEXP(NULL, '\d+')   | SPLIT_BY_REGEXP('test', NULL)  |
+--------------------------------+--------------------------------+
| NULL                           | NULL                           |
+--------------------------------+--------------------------------+

Empty string handling

SELECT SPLIT_BY_REGEXP('', ','), SPLIT_BY_REGEXP('hello', 'xyz');

+---------------------------+-------------------------------+
| SPLIT_BY_REGEXP('', ',')  | SPLIT_BY_REGEXP('hello', 'xyz') |
+---------------------------+-------------------------------+
| [""]                      | ["hello"]                     |
+---------------------------+-------------------------------+

Using maximum limit parameter

SELECT SPLIT_BY_REGEXP('a,b,c,d,e', ',', 3), SPLIT_BY_REGEXP('1-2-3-4-5', '-', 2);

+--------------------------------------+--------------------------------------+
| SPLIT_BY_REGEXP('a,b,c,d,e', ',', 3) | SPLIT_BY_REGEXP('1-2-3-4-5', '-', 2) |
+--------------------------------------+--------------------------------------+
| ["a", "b", "c,d,e"]                  | ["1", "2-3-4-5"]                    |
+--------------------------------------+--------------------------------------+

Whitespace pattern

SELECT SPLIT_BY_REGEXP('hello world  test', '\\s+'), SPLIT_BY_REGEXP('a\tb\nc\rd', '\\s');

+------------------------------------------+------------------------------------+
| SPLIT_BY_REGEXP('hello world  test', '\s+') | SPLIT_BY_REGEXP('a\tb\nc\rd', '\s') |
+------------------------------------------+------------------------------------+
| ["hello", "world", "test"]               | ["a", "b", "c", "d"]               |
+------------------------------------------+------------------------------------+

Special characters and escaping

SELECT SPLIT_BY_REGEXP('a.b.c.d', '\\.'), SPLIT_BY_REGEXP('x(y)z[w]', '[\\(\\)\\[\\]]');

+----------------------------------+--------------------------------------------+
| SPLIT_BY_REGEXP('a.b.c.d', '\.') | SPLIT_BY_REGEXP('x(y)z[w]', '[\(\)\[\]]') |
+----------------------------------+--------------------------------------------+
| ["a", "b", "c", "d"]             | ["x", "y", "z", "w"]                   |
+----------------------------------+--------------------------------------------+

Word boundaries and complex patterns

SELECT SPLIT_BY_REGEXP('TheQuickBrownFox', '[A-Z]'), SPLIT_BY_REGEXP('user@example.com', '@|\\.');

+------------------------------------------+-------------------------------------------+
| SPLIT_BY_REGEXP('TheQuickBrownFox', '[A-Z]') | SPLIT_BY_REGEXP('user@example.com', '@|\.') |
+------------------------------------------+-------------------------------------------+
| ["", "he", "uick", "rown", "ox"]         | ["user", "example", "com"]                |
+------------------------------------------+-------------------------------------------+

UTF-8 multi-byte characters

SELECT SPLIT_BY_REGEXP('ṭṛì→ḍḍumai→hello', '→'), SPLIT_BY_REGEXP('αβγδε', '[βδ]');

+------------------------------------------+----------------------------------+
| SPLIT_BY_REGEXP('ṭṛì→ḍḍumai→hello', '→') | SPLIT_BY_REGEXP('αβγδε', '[βδ]') |
+------------------------------------------+----------------------------------+
| ["ṭṛì", "ḍḍumai", "hello"]               | ["α", "γ", "ε"]                  |
+------------------------------------------+----------------------------------+

Consecutive matches and empty elements

SELECT SPLIT_BY_REGEXP('a,,b,c', ','), SPLIT_BY_REGEXP('123abc456def', '[a-z]+');

+-------------------------------+---------------------------------------+
| SPLIT_BY_REGEXP('a,,b,c', ',') | SPLIT_BY_REGEXP('123abc456def', '[a-z]+') |
+-------------------------------+---------------------------------------+
| ["a", "", "b", "c"]           | ["123", "456", ""]                    |
+-------------------------------+---------------------------------------+

Keywords

SPLIT_BY_REGEXP, SPLIT, REGEXP, REGEXP_SPLIT_TO_ARRAY

Description​

Syntax​

Parameters​

Return Value​

Examples​

Keywords​