Skip to main content

ARRAY_DISTINCT

array_distinct​

SinceVersion 2.0.0

Description​

Removes duplicate elements from an array and returns a new array containing unique elements. The function maintains the original order of elements, keeping only the first occurrence of each element.

Syntax​

array_distinct(ARRAY<T> arr)

Parameters​

  • arr:ARRAY<T> type, the array to deduplicate. Supports column names or constant values.

T supported types:

  • Numeric types: TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL
  • String types: CHAR, VARCHAR, STRING
  • Date and time types: DATE, DATETIME, DATEV2, DATETIMEV2
  • Boolean type: BOOLEAN
  • IP types: IPV4, IPV6

Return Value​

Return type: ARRAY<T>

Return value meaning:

  • The deduplicated array containing all unique elements from the original array
  • Maintains the original order of elements
  • NULL: if the input array is NULL

Usage notes:

  • The function traverses the array from left to right, keeping the first occurrence of each element and removing subsequent duplicate elements
  • Empty array returns empty array, NULL array returns NULL
  • Deduplication maintains the relative order of elements in the original array, does not reorder
  • For null values in array elements: null elements will be deduplicated, multiple nulls only keep one

Examples​

Query examples:

Integer array deduplication, the original array [1, 2, 3, 4, 5] has no duplicate elements, so the result after deduplication is the same as the original array.

SELECT array_distinct([1, 2, 3, 4, 5]);
+---------------------------------+
| array_distinct([1, 2, 3, 4, 5]) |
+---------------------------------+
| [1, 2, 3, 4, 5] |
+---------------------------------+

String array deduplication: removes duplicate string elements. In the original array ['a', 'b', 'a', 'c', 'b', 'd'], 'a' appears twice (keeping the first occurrence), 'b' appears twice (keeping the first occurrence), after deduplication it becomes ["a", "b", "c", "d"].

SELECT array_distinct(['a', 'b', 'a', 'c', 'b', 'd']);
+------------------------------------------------+
| array_distinct(['a', 'b', 'a', 'c', 'b', 'd']) |
+------------------------------------------------+
| ["a", "b", "c", "d"] |
+------------------------------------------------+

Array containing null values: null elements will also be deduplicated, multiple nulls only keep one. In the original array [1, null, 2, null, 3, null], null appears three times, after deduplication only the first null is kept, resulting in [1, null, 2, 3].

SELECT array_distinct([1, null, 2, null, 3, null]);
+---------------------------------------------+
| array_distinct([1, null, 2, null, 3, null]) |
+---------------------------------------------+
| [1, null, 2, 3] |
+---------------------------------------------+

IP type array deduplication: deduplication of IPv4 address array. In the original array ['192.168.1.1', '192.168.1.2', '192.168.1.1'], '192.168.1.1' appears twice, after deduplication only the first occurrence of the address is kept, resulting in [192.168.1.1, 192.168.1.2].

SELECT array_distinct(CAST(['192.168.1.1', '192.168.1.2', '192.168.1.1'] AS ARRAY<IPV4>));
+------------------------------------------------------------------------------------+
| array_distinct(CAST(['192.168.1.1', '192.168.1.2', '192.168.1.1'] AS ARRAY<IPV4>)) |
+------------------------------------------------------------------------------------+
| ["192.168.1.1", "192.168.1.2"] |
+------------------------------------------------------------------------------------+

IPv6 type array deduplication: deduplication of IPv6 address array. In the original array ['2001:db8::1', '2001:db8::2', '2001:db8::1'], '2001:db8::1' appears twice, after deduplication only the first occurrence of the address is kept, resulting in [2001:db8::1, 2001:db8::2].

SELECT array_distinct(CAST(['2001:db8::1', '2001:db8::2', '2001:db8::1'] AS ARRAY<IPV6>));
+------------------------------------------------------------------------------------+
| array_distinct(CAST(['2001:db8::1', '2001:db8::2', '2001:db8::1'] AS ARRAY<IPV6>)) |
+------------------------------------------------------------------------------------+
| ["2001:db8::1", "2001:db8::2"] |
+------------------------------------------------------------------------------------+

Empty array returns empty array: empty array has no elements to deduplicate, directly returns empty array.

+--------------------+
| array_distinct([]) |
+--------------------+
| [] |
+--------------------+

NULL array returns NULL: returning NULL when the input array is NULL without throwing an error.

+----------------------+
| array_distinct(NULL) |
+----------------------+
| NULL |
+----------------------+

Single element array returns the original array: array with only one element has no duplicate elements, the result after deduplication is the same as the original array.

SELECT array_distinct([42]);
+----------------------+
| array_distinct([42]) |
+----------------------+
| [42] |
+----------------------+

Complex types not supported:

Nested array type not supported, throws error.

SELECT array_distinct([[1,2,3], [4,5,6], [1,2,3]]);
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[RUNTIME_ERROR]execute failed or unsupported types for function array_distinct(Array(Nullable(Array(Nullable(TINYINT)))))

Map type not supported, throws error.

SELECT array_distinct([{'a':1}, {'b':2}, {'a':1}]);
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[RUNTIME_ERROR]execute failed or unsupported types for function array_distinct(Array(Nullable(Map(Nullable(String), Nullable(TINYINT)))))

Struct type not supported, throws error.

SELECT array_distinct(array(named_struct('name','Alice','age',20), named_struct('name','Bob','age',30), named_struct('name','Alice','age',20)));
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[RUNTIME_ERROR]execute failed or unsupported types for function array_distinct(Array(Nullable(Struct(name:Nullable(String), age:Nullable(TINYINT)))))

Incorrect number of parameters will cause an error: the array_distinct function only accepts one array parameter, passing multiple parameters will cause an error.

SELECT array_distinct([1, 2, 3], [4, 5, 6]);
ERROR 1105 (HY000): errCode = 2, detailMessage = Can not found function 'array_distinct' which has 2 arity. Candidate functions are: [array_distinct(Expression)]

Passing non-array type will cause an error: the array_distinct function only accepts array type parameters, passing non-array types like strings will cause an error.

SELECT array_distinct('not_an_array');
ERROR 1105 (HY000): errCode = 2, detailMessage = Can not find the compatibility function signature: array_distinct(VARCHAR(12))

Keywords​

ARRAY, DISTINCT, ARRAY_DISTINCT