ARRAY_DISTINCT
array_distinct​
Description​
Removes duplicate elements from an array and returns a new array containing unique elements. The function maintains the original order of elements, keeping only the first occurrence of each element.
Syntax​
array_distinct(ARRAY<T> arr)
Parameters​
arr
:ARRAY<T> type, the array to deduplicate. Supports column names or constant values.
T supported types:
- Numeric types: TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL
- String types: CHAR, VARCHAR, STRING
- Date and time types: DATE, DATETIME, DATEV2, DATETIMEV2
- Boolean type: BOOLEAN
- IP types: IPV4, IPV6
Return Value​
Return type: ARRAY<T>
Return value meaning:
- The deduplicated array containing all unique elements from the original array
- Maintains the original order of elements
- NULL: if the input array is NULL
Usage notes:
- The function traverses the array from left to right, keeping the first occurrence of each element and removing subsequent duplicate elements
- Empty array returns empty array, NULL array returns NULL
- Deduplication maintains the relative order of elements in the original array, does not reorder
- For null values in array elements: null elements will be deduplicated, multiple nulls only keep one
Examples​
Query examples:
Integer array deduplication, the original array [1, 2, 3, 4, 5] has no duplicate elements, so the result after deduplication is the same as the original array.
SELECT array_distinct([1, 2, 3, 4, 5]);
+---------------------------------+
| array_distinct([1, 2, 3, 4, 5]) |
+---------------------------------+
| [1, 2, 3, 4, 5] |
+---------------------------------+
String array deduplication: removes duplicate string elements. In the original array ['a', 'b', 'a', 'c', 'b', 'd'], 'a' appears twice (keeping the first occurrence), 'b' appears twice (keeping the first occurrence), after deduplication it becomes ["a", "b", "c", "d"].
SELECT array_distinct(['a', 'b', 'a', 'c', 'b', 'd']);
+------------------------------------------------+
| array_distinct(['a', 'b', 'a', 'c', 'b', 'd']) |
+------------------------------------------------+
| ["a", "b", "c", "d"] |
+------------------------------------------------+
Array containing null values: null elements will also be deduplicated, multiple nulls only keep one. In the original array [1, null, 2, null, 3, null], null appears three times, after deduplication only the first null is kept, resulting in [1, null, 2, 3].
SELECT array_distinct([1, null, 2, null, 3, null]);
+---------------------------------------------+
| array_distinct([1, null, 2, null, 3, null]) |
+---------------------------------------------+
| [1, null, 2, 3] |
+---------------------------------------------+
IP type array deduplication: deduplication of IPv4 address array. In the original array ['192.168.1.1', '192.168.1.2', '192.168.1.1'], '192.168.1.1' appears twice, after deduplication only the first occurrence of the address is kept, resulting in [192.168.1.1, 192.168.1.2].
SELECT array_distinct(CAST(['192.168.1.1', '192.168.1.2', '192.168.1.1'] AS ARRAY<IPV4>));
+------------------------------------------------------------------------------------+
| array_distinct(CAST(['192.168.1.1', '192.168.1.2', '192.168.1.1'] AS ARRAY<IPV4>)) |
+------------------------------------------------------------------------------------+
| ["192.168.1.1", "192.168.1.2"] |
+------------------------------------------------------------------------------------+
IPv6 type array deduplication: deduplication of IPv6 address array. In the original array ['2001:db8::1', '2001:db8::2', '2001:db8::1'], '2001:db8::1' appears twice, after deduplication only the first occurrence of the address is kept, resulting in [2001:db8::1, 2001:db8::2].
SELECT array_distinct(CAST(['2001:db8::1', '2001:db8::2', '2001:db8::1'] AS ARRAY<IPV6>));
+------------------------------------------------------------------------------------+
| array_distinct(CAST(['2001:db8::1', '2001:db8::2', '2001:db8::1'] AS ARRAY<IPV6>)) |
+------------------------------------------------------------------------------------+
| ["2001:db8::1", "2001:db8::2"] |
+------------------------------------------------------------------------------------+
Empty array returns empty array: empty array has no elements to deduplicate, directly returns empty array.
+--------------------+
| array_distinct([]) |
+--------------------+
| [] |
+--------------------+
NULL array returns NULL: returning NULL when the input array is NULL without throwing an error.
+----------------------+
| array_distinct(NULL) |
+----------------------+
| NULL |
+----------------------+
Single element array returns the original array: array with only one element has no duplicate elements, the result after deduplication is the same as the original array.
SELECT array_distinct([42]);
+----------------------+
| array_distinct([42]) |
+----------------------+
| [42] |
+----------------------+
Complex types not supported:
Nested array type not supported, throws error.
SELECT array_distinct([[1,2,3], [4,5,6], [1,2,3]]);
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[RUNTIME_ERROR]execute failed or unsupported types for function array_distinct(Array(Nullable(Array(Nullable(TINYINT)))))
Map type not supported, throws error.
SELECT array_distinct([{'a':1}, {'b':2}, {'a':1}]);
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[RUNTIME_ERROR]execute failed or unsupported types for function array_distinct(Array(Nullable(Map(Nullable(String), Nullable(TINYINT)))))
Struct type not supported, throws error.
SELECT array_distinct(array(named_struct('name','Alice','age',20), named_struct('name','Bob','age',30), named_struct('name','Alice','age',20)));
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[RUNTIME_ERROR]execute failed or unsupported types for function array_distinct(Array(Nullable(Struct(name:Nullable(String), age:Nullable(TINYINT)))))
Incorrect number of parameters will cause an error: the array_distinct function only accepts one array parameter, passing multiple parameters will cause an error.
SELECT array_distinct([1, 2, 3], [4, 5, 6]);
ERROR 1105 (HY000): errCode = 2, detailMessage = Can not found function 'array_distinct' which has 2 arity. Candidate functions are: [array_distinct(Expression)]
Passing non-array type will cause an error: the array_distinct function only accepts array type parameters, passing non-array types like strings will cause an error.
SELECT array_distinct('not_an_array');
ERROR 1105 (HY000): errCode = 2, detailMessage = Can not find the compatibility function signature: array_distinct(VARCHAR(12))
Keywords​
ARRAY, DISTINCT, ARRAY_DISTINCT