ARRAY Documentation
Type Descriptionβ
The ARRAY<T>
type is used to represent an ordered collection of elements, where each element has the same data type. For example, an array of integers can be represented as [1, 2, 3]
, and an array of strings as ["a", "b", "c"]
.
ARRAY<T>
represents an array composed of elements of type T, where T is nullable. Supported types for T include:BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6, STRUCT, MAP, VARIANT, JSONB, ARRAY<T>
.- Note: Among the above T types,
JSONB
andVARIANT
are only supported in the computation layer of Doris and do not support usingARRAY<JSONB>
andARRAY<VARIANT>
in table creation in Doris.
- Note: Among the above T types,
Type Constraintsβ
- The maximum nesting depth supported by
ARRAY<T>
type is 9. - Conversion between
ARRAY<T>
types depends on whether T can be converted.Array<T>
type cannot be converted to other types.- For example:
ARRAY<INT>
can be converted toARRAY<BIGINT>
becauseINT
andBIGINT
can be converted. Variant
type can be converted toArray<T>
type.- String type can be converted to
ARRAY<T>
type (through parsing, returning NULL if parsing fails).
- For example:
- In the
AGGREGATE
table model,ARRAY<T>
type only supportsREPLACE
andREPLACE_IF_NOT_NULL
. In any table model, it cannot be used as a KEY column, nor as a partition or bucket column. - Columns of
ARRAY<T>
type supportORDER BY
andGROUP BY
operations.- T types that support
ORDER BY
andGROUP BY
include:BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6
.
- T types that support
- Columns of
ARRAY<T>
type do not support being used asJOIN KEY
and do not support being used inDELETE
statements.
Constant Constructionβ
-
Use the
ARRAY()
function to construct a value of typeARRAY<T>
, where T is the common type of the parameters.-- [1, 2, 3] T is INT
SELECT ARRAY(1, 2, 3);
-- ["1", "2", "abc"] , T is STRING
SELECT ARRAY(1, 2, 'abc'); -
Use
[]
to construct a value of typeARRAY<T>
, where T is the common type of the parameters.-- ["abc", "def", "efg"] T is STRING
SELECT ["abc", "def", "efg"];
-- ["1", "2", "abc"] , T is STRING
SELECT [1, 2, 'abc'];
Modifying Typeβ
-
Modification is only allowed when the element type inside
ARRAY
isVARCHAR
.- Only allows changing the parameter of
VARCHAR
from smaller to larger, not the other way around.
CREATE TABLE `array_table` (
`k` INT NOT NULL,
`array_column` ARRAY<VARCHAR(10)>
) ENGINE=OLAP
DUPLICATE KEY(`k`)
DISTRIBUTED BY HASH(`k`) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);
ALTER TABLE array_table MODIFY COLUMN array_column ARRAY<VARCHAR(20)>; - Only allows changing the parameter of
-
The default value for columns of type
ARRAY<T>
can only be specified as NULL, and once specified, it cannot be modified.
Element Accessβ
-
Use
[k]
to access the k-th element ofARRAY<T>
, where k starts from 1. If out of bounds, returns NULL.SELECT [1, 2, 3][1];
+--------------+
| [1, 2, 3][1] |
+--------------+
| 1 |
+--------------+
SELECT ARRAY(1, 2, 3)[2];
+-------------------+
| ARRAY(1, 2, 3)[2] |
+-------------------+
| 2 |
+-------------------+
SELECT [[1,2,3],[2,3,4]][1][3];
+-------------------------+
| [[1,2,3],[2,3,4]][1][3] |
+-------------------------+
| 3 |
+-------------------------+ -
Use
ELEMENT_AT(ARRAY, k)
to access the k-th element ofARRAY<T>
, where k starts from 1. If out of bounds, returns NULL.SELECT ELEMENT_AT(ARRAY(1, 2, 3) , 2);
+--------------------------------+
| ELEMENT_AT(ARRAY(1, 2, 3) , 2) |
+--------------------------------+
| 2 |
+--------------------------------+
SELECT ELEMENT_AT([1, 2, 3] , 3);
+---------------------------+
| ELEMENT_AT([1, 2, 3] , 3) |
+---------------------------+
| 3 |
+---------------------------+
SELECT ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3);
+-------------------------------------------------------+
| ELEMENT_AT([["abc", "def"], ["def", "gef"], [3]] , 3) |
+-------------------------------------------------------+
| ["3"] |
+-------------------------------------------------------+
Query Accelerationβ
- Columns of type
ARRAY<T>
in Doris tables support adding inverted indexes to accelerate computations involvingARRAY
functions on this column.- T types supported by inverted indexes:
BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING, IPV4, IPV6
. - Accelerated
ARRAY
functions:ARRAY_CONTAINS
,ARRAYS_OVERLAP
, but when the function parameters include NULL, it falls back to regular vectorized computation.
- T types supported by inverted indexes:
Examplesβ
-
Multidimensional Arrays
-- Create table
CREATE TABLE IF NOT EXISTS array_table (
id INT,
two_dim_array ARRAY<ARRAY<INT>>,
three_dim_array ARRAY<ARRAY<ARRAY<STRING>>>
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);
-- Insert
INSERT INTO array_table VALUES (1, [[1, 2, 3], [4, 5, 6]], [[['ab', 'cd', 'ef'], ['gh', 'ij', 'kl']], [['mn', 'op', 'qr'], ['st', 'uv', 'wx']]]);
INSERT INTO array_table VALUES (2, ARRAY(ARRAY(1, 2, 3), ARRAY(4, 5, 6)), ARRAY(ARRAY(ARRAY('ab', 'cd', 'ef'), ARRAY('gh', 'ij', 'kl')), ARRAY(ARRAY('mn', 'op', 'qr'), ARRAY('st', 'uv', 'wx'))));
-- Query
SELECT two_dim_array[1][2], three_dim_array[1][1][2] FROM array_table ORDER BY id;
+---------------------+--------------------------+
| two_dim_array[1][2] | three_dim_array[1][1][2] |
+---------------------+--------------------------+
| 2 | cd |
| 2 | cd |
+---------------------+--------------------------+ -
Nested Complex Types
-- Create table
CREATE TABLE IF NOT EXISTS array_map_table (
id INT,
array_map ARRAY<MAP<STRING, INT>>
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);
-- Insert
INSERT INTO array_map_table VALUES (1, ARRAY(MAP('key1', 1), MAP('key2', 2)));
INSERT INTO array_map_table VALUES (2, ARRAY(MAP('key1', 1), MAP('key2', 2)));
-- Query
SELECT array_map[1], array_map[2] FROM array_map_table ORDER BY id;
+--------------+--------------+
| array_map[1] | array_map[2] |
+--------------+--------------+
| {"key1":1} | {"key2":2} |
| {"key1":1} | {"key2":2} |
+--------------+--------------+
-- Create table
CREATE TABLE IF NOT EXISTS array_table (
id INT,
array_struct ARRAY<STRUCT<id: INT, name: STRING>>,
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO array_table VALUES (1, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane')));
INSERT INTO array_table VALUES (2, ARRAY(STRUCT(1, 'John'), STRUCT(2, 'Jane')));
SELECT array_struct[1], array_struct[2] FROM array_table ORDER BY id;
+-------------------------+-------------------------+
| array_struct[1] | array_struct[2] |
+-------------------------+-------------------------+
| {"id":1, "name":"John"} | {"id":2, "name":"Jane"} |
| {"id":1, "name":"John"} | {"id":2, "name":"Jane"} |
+-------------------------+-------------------------+ -
Modifying Type
-- Create table
CREATE TABLE array_table (
id INT,
array_varchar ARRAY<VARCHAR(10)>
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
-- Modify ARRAY type
ALTER TABLE array_table MODIFY COLUMN array_varchar ARRAY<VARCHAR(20)>;
-- Check column type
DESC array_table;
+---------------+--------------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------------+------+-------+---------+-------+
| id | int | Yes | true | NULL | |
| array_varchar | array<varchar(20)> | Yes | false | NULL | NONE |
+---------------+--------------------+------+-------+---------+-------+ -
Inverted Index
-- Create table statement
CREATE TABLE `array_table` (
`k` int NOT NULL,
`array_column` ARRAY<INT>,
INDEX idx_array_column (array_column) USING INVERTED
) ENGINE=OLAP
DUPLICATE KEY(`k`)
DISTRIBUTED BY HASH(`k`) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);
-- Insert
INSERT INTO array_table VALUES (1, [1, 2, 3]), (2, [4, 5, 6]), (3, [7, 8, 9]);
-- The inverted index accelerates the execution of the ARRAY_CONTAINS function
SELECT * FROM array_table WHERE ARRAY_CONTAINS(array_column, 5);
+------+--------------+
| k | array_column |
+------+--------------+
| 2 | [4, 5, 6] |
+------+--------------+
-- The inverted index accelerates the execution of the ARRAYS_OVERLAP function
SELECT * FROM array_table WHERE ARRAYS_OVERLAP(array_column, [6, 9]);
+------+--------------+
| k | array_column |
+------+--------------+
| 2 | [4, 5, 6] |
| 3 | [7, 8, 9] |
+------+--------------+