Data Types

The list of data types supported by Apache Doris is as follows:

Numeric Types

Type Name | Storage Size (Bytes) | Description
BOOLEAN | 1 | Boolean value. 0 represents false, 1 represents true.
TINYINT | 1 | Signed integer, range [-128, 127].
SMALLINT | 2 | Signed integer, range [-32768, 32767].
INT | 4 | Signed integer, range [-2147483648, 2147483647].
BIGINT | 8 | Signed integer, range [-9223372036854775808, 9223372036854775807].
LARGEINT | 16 | Signed integer, range [-2^127 + 1, 2^127 - 1].
FLOAT | 4 | Floating-point number, range [-3.4 × 10^38, 3.4 × 10^38].
DOUBLE | 8 | Floating-point number, range [-1.79 × 10^308, 1.79 × 10^308].
DECIMAL | 4/8/16/32 | High-precision fixed-point number. Format: DECIMAL(P[,S]). P is the total number of significant digits (precision), and S is the number of digits after the decimal point (scale). The range of P is [1, MAX_P]. When enable_decimal256=false, MAX_P=38; when enable_decimal256=true, MAX_P=76. The range of S is [0, P].

The default value of enable_decimal256 is false. Setting it to true yields more precise results but incurs some performance overhead.

DECIMAL storage size:
  • When 0 < precision <= 9, occupies 4 bytes.
  • When 9 < precision <= 18, occupies 8 bytes.
  • When 18 < precision <= 38, occupies 16 bytes.
  • When 38 < precision <= 76, occupies 32 bytes.
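
The DECIMAL rules above can be sketched as a small lookup. This is a minimal illustration in Python, not code from Doris: the function names are hypothetical, and the byte thresholds follow the precision boundaries 9, 18, 38, and 76 that match the 4/8/16/32-byte sizes in the table.

```python
def decimal_max_precision(enable_decimal256: bool = False) -> int:
    """Return MAX_P: 38 by default, 76 when enable_decimal256 is true."""
    return 76 if enable_decimal256 else 38


def decimal_storage_bytes(precision: int, scale: int = 0,
                          enable_decimal256: bool = False) -> int:
    """Return the storage size in bytes for DECIMAL(precision, scale).

    Hypothetical helper: it only mirrors the documented ranges and sizes.
    """
    max_p = decimal_max_precision(enable_decimal256)
    if not 1 <= precision <= max_p:
        raise ValueError(f"precision must be in [1, {max_p}], got {precision}")
    if not 0 <= scale <= precision:
        raise ValueError(f"scale must be in [0, {precision}], got {scale}")
    if precision <= 9:
        return 4
    if precision <= 18:
        return 8
    if precision <= 38:
        return 16
    return 32  # only reachable when enable_decimal256 is true
```

For example, decimal_storage_bytes(27, 9) falls in the 16-byte band, while a precision of 39 is rejected unless enable_decimal256 is passed as True.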

Date Types

Type Name | Storage Size (Bytes) | Description
DATE | 4 | Date type. The current value range is ['0000-01-01', '9999-12-31'], and the default print format is 'yyyy-MM-dd'.
DATETIME | 8 | Date and time type. Format: DATETIME([P]). The optional parameter P is the time precision, with a value range of [0, 6], meaning up to 6 fractional-second digits (microseconds) are supported. The default value when not set is 0. The value range is ['0000-01-01 00:00:00[.000000]', '9999-12-31 23:59:59[.999999]']. The print format is 'yyyy-MM-dd HH:mm:ss.SSSSSS'.
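
To make the meaning of the precision parameter P concrete, the following Python sketch prints a timestamp with P fractional-second digits. The function name is hypothetical, and the sketch simply truncates extra digits. That truncation is an assumption made for illustration; how Doris itself handles digits beyond P is not stated here.

```python
from datetime import datetime


def format_with_precision(value: datetime, precision: int = 0) -> str:
    """Render value as 'yyyy-MM-dd HH:mm:ss' plus `precision` fractional digits.

    Assumption: digits beyond `precision` are truncated, not rounded.
    """
    if not 0 <= precision <= 6:
        raise ValueError(f"precision must be in [0, 6], got {precision}")
    base = value.strftime("%Y-%m-%d %H:%M:%S")
    if precision == 0:
        return base
    # microsecond is always rendered as 6 digits, then cut to `precision`.
    fraction = f"{value.microsecond:06d}"[:precision]
    return f"{base}.{fraction}"
```

With a timestamp of 2024-01-02 03:04:05.123456, a precision of 3 keeps only the milliseconds and a precision of 0 drops the fractional part entirely.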

String Types

Type Name | Storage Size (Bytes) | Description
CHAR | M | Fixed-length string. M is the byte length of the fixed-length string. The range of M is 1-255.
VARCHAR | Variable | Variable-length string. M is the byte length of the variable-length string. The range of M is 1-65533. Variable-length strings are stored in UTF-8 encoding, so an English character typically occupies 1 byte, while a Chinese character occupies 3 bytes.
STRING | Variable | Variable-length string. By default supports 1048576 bytes (1 MB), and can be increased up to 2147483643 bytes (2 GB). The limit can be adjusted via the BE configuration string_type_length_soft_limit_bytes. The STRING type can only be used in Value columns, not in Key columns or partition/bucketing columns.
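
Because M counts bytes rather than characters, it can help to measure a value's UTF-8 length before choosing a column width. The Python sketch below does that; the helper names are hypothetical and the check only mirrors the documented range of M for VARCHAR.

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_varchar(text: str, m: int) -> bool:
    """Return True if text fits in a byte budget of m, as for VARCHAR(m).

    Hypothetical helper: validates m against the documented range 1-65533.
    """
    if not 1 <= m <= 65533:
        raise ValueError(f"M must be in [1, 65533], got {m}")
    return utf8_byte_length(text) <= m
```

For instance, the five-letter string "Doris" needs 5 bytes, while the three-character string "数据库" needs 9, so it does not fit in a byte length of 8.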

Semi-Structured Types

Type Name | Storage Size (Bytes) | Description
ARRAY<T> | Variable | An array composed of elements of type T. It cannot be used as a Key column. Currently supported in tables of the Duplicate and Unique models.
MAP<K, V> | Variable | A map composed of keys of type K and values of type V. It cannot be used as a Key column. Currently supported in tables of the Duplicate and Unique models.
STRUCT | Variable | A struct composed of multiple fields, which can also be understood as a collection of multiple columns. It cannot be used as a Key column. Currently, STRUCT is only supported in tables of the Duplicate model. The names and number of fields in a STRUCT are fixed, and the fields are always Nullable.
JSON | Variable | Binary JSON type. Data is stored in a binary JSON format, and internal fields are accessed via JSON functions. The length limit and configuration method are the same as for STRING.
VARIANT | Variable | Dynamically typed data type, designed for semi-structured data such as JSON. It can store any JSON and automatically splits the JSON fields into sub-columns, which improves storage efficiency and query performance. The length limit and configuration method are the same as for STRING. The VARIANT type can only be used in Value columns, not in Key columns or partition/bucketing columns.
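
As a rough mental model of how a JSON document can be split into sub-columns, the Python sketch below flattens nested objects into dotted paths, one path per candidate column. This is only an illustration of the idea: the function name and the dotted-path naming are assumptions, and it is not the algorithm Doris uses for VARIANT.

```python
import json


def flatten_json(document: dict, prefix: str = "") -> dict:
    """Flatten nested JSON objects into {dotted_path: leaf_value}.

    Illustrative only: arrays and scalars are kept as leaf values.
    """
    columns = {}
    for key, value in document.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects so each leaf gets its own path.
            columns.update(flatten_json(value, path))
        else:
            columns[path] = value
    return columns


row = json.loads('{"user": {"id": 7, "name": "ann"}, "tags": ["a", "b"]}')
sub_columns = flatten_json(row)
```

Here sub_columns has the keys user.id, user.name, and tags, each of which could be stored and scanned independently in a columnar layout.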

Aggregate Types

Type Name | Storage Size (Bytes) | Description
HLL | Variable | HLL (HyperLogLog) provides approximate deduplication. On large data volumes it outperforms COUNT DISTINCT. The error is typically around 1% and occasionally reaches 2%. HLL cannot be used as a Key column, and the aggregation type HLL_UNION must be specified when creating the table. Users do not need to specify a length or default value; the system controls the length internally based on the degree of data aggregation. HLL columns can only be queried or used through the corresponding hll_union_agg, hll_raw_agg, hll_cardinality, and hll_hash functions.
BITMAP | Variable | BITMAP columns can be used in Aggregate, Unique, or Duplicate tables, and must always be non-Key columns. In Aggregate tables, the aggregation type BITMAP_UNION must also be specified when creating the table. Users do not need to specify a length or default value; the system controls the length internally based on the degree of data aggregation. BITMAP columns can only be queried or used through the corresponding bitmap_union_count, bitmap_union, bitmap_hash, bitmap_hash64, and other functions.
QUANTILE_STATE | Variable | A type used for computing approximate quantile values. During ingestion, it pre-aggregates different Values for the same Key. When the number of values does not exceed 2048, it records every value exactly. When the number of values exceeds 2048, it uses the TDigest algorithm to cluster the data and stores the resulting centroids. QUANTILE_STATE cannot be used as a Key column, and the aggregation type QUANTILE_UNION must be specified when creating the table. Users do not need to specify a length or default value; the system controls the length internally based on the degree of data aggregation. QUANTILE_STATE columns can only be queried or used through the corresponding QUANTILE_PERCENT, QUANTILE_UNION, TO_QUANTILE_STATE, and other functions.
AGG_STATE | Variable | Intermediate state of an aggregate function. It can only be used together with the state/merge/union function combinators. AGG_STATE cannot be used as a Key column, and the signature of the aggregate function must be declared when creating the table. Users do not need to specify a length or default value. The actual size of the stored data depends on the function implementation.

IP Types

Type Name | Storage Size (Bytes) | Description
IPv4 | 4 | Stores an IPv4 address as 4 bytes of binary data, used together with the ipv4_* family of functions.
IPv6 | 16 | Stores an IPv6 address as 16 bytes of binary data, used together with the ipv6_* family of functions.
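
The 4-byte and 16-byte sizes correspond to the standard binary forms of the two address families. The Python sketch below shows this with the standard-library ipaddress module; the helper name is hypothetical, and the code says nothing about how Doris lays out the bytes internally.

```python
import ipaddress


def packed_size(address: str) -> int:
    """Return the length in bytes of the binary form of an IP address."""
    return len(ipaddress.ip_address(address).packed)


ipv4_bytes = ipaddress.ip_address("192.168.0.1").packed  # 4 bytes
ipv6_bytes = ipaddress.ip_address("2001:db8::1").packed  # 16 bytes
```

Storing the binary form instead of the text form keeps every value at a fixed width, whereas the textual IPv6 form can take up to 39 characters.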

You can also view all data types supported by Apache Doris by running the SHOW DATA TYPES; statement.