Skip to main content

Export Overview

The data export function is used to write the query result set or Doris table data into the specified storage system in the specified file format.

The differences between the export function and the data backup function are as follows:

 Data ExportData Backup
Final Storage LocationHDFS, Object Storage, Local File SystemHDFS, Object Storage
Data FormatOpen file formats such as Parquet, ORC, CSVDoris internal storage format
Execution SpeedModerate (requires reading data and converting to the target data format)Fast (no parsing and conversion required, directly upload Doris data files)
FlexibilityCan flexibly define the data to be exported through SQL statementsOnly supports table-level full backup
Use CasesResult set download, data exchange between different systemsData backup, data migration between Doris clusters

Choosing Export Methods​

Doris provides three different data export methods:

  • SELECT INTO OUTFILE: Supports the export of any SQL result set.
  • EXPORT: Supports the export of partial or full table data.
  • MySQL DUMP: Compatible with the MySQL dump command for data export.

The similarities and differences between the three export methods are as follows:

 SELECT INTO OUTFILEEXPORTMySQL DUMP
Synchronous/AsynchronousSynchronousAsynchronous (submit EXPORT tasks and check task progress via SHOW EXPORT command)Synchronous
Supports any SQLYesNoNo
Export specific partitionsYesYesNo
Export specific tabletsYesNoNo
Concurrent exportSupported with high concurrency (depends on whether the SQL statement has operators such as ORDER BY that need to be processed on a single node)Supported with high concurrency (supports tablet-level concurrent export)Not supported, single-threaded export only
Supported export data formatsParquet, ORC, CSVParquet, ORC, CSVMySQL Dump proprietary format
Supports exporting external tablesYesPartially supportedNo
Supports exporting viewsYesYesYes
Supported export locationsS3, HDFSS3, HDFSLOCAL

SELECT INTO OUTFILE​

Suitable for the following scenarios:

  • Data needs to be exported after complex calculations, such as filtering, aggregation, joins, etc.
  • Suitable for scenarios that require synchronous tasks.

EXPORT​

Suitable for the following scenarios:

  • Large-scale single table export, with simple filtering conditions.
  • Scenarios that require asynchronous task submission.

MySQL Dump​

Suitable for the following scenarios:

  • Compatible with the MySQL ecosystem, requires exporting both table structure and data.
  • Only for development testing or scenarios with very small data volumes.

Export File Column Type Mapping​

Parquet and ORC file formats have their own data types. Doris's export function can automatically map Doris's data types to the corresponding data types in Parquet and ORC file formats. The CSV format does not have types, all data is output as text.

The following table shows the mapping between Doris data types and Parquet, ORC file format data types:

  • ORC

    Doris TypeOrc Type
    booleanboolean
    tinyinttinyint
    smallintsmallint
    intint
    bigintbigint
    largeIntstring
    datestring
    datev2string
    datetimestring
    datetimev2timestamp
    floatfloat
    doubledouble
    char / varchar / stringstring
    decimaldecimal
    structstruct
    mapmap
    arrayarray
    jsonstring
    variantstring
    bitmapbinary
    quantile_statebinary
    hllbinary
  • Parquet

    When Doris is exported to the Parquet file format, the Doris memory data is first converted to the Arrow memory data format, and then written out to the Parquet file format by Arrow.

    Doris TypeArrow TypeParquet Physical TypeParquet Logical Type
    booleanbooleanBOOLEAN
    tinyintint8INT32INT_8
    smallintint16INT32INT_16
    intint32INT32INT_32
    bigintint64INT64INT_64
    largeIntutf8BYTE_ARRAYUTF8
    dateutf8BYTE_ARRAYUTF8
    datev2date32INT32DATE
    datetimeutf8BYTE_ARRAYUTF8
    datetimev2timestampINT96/INT64TIMESTAMP(MICROS/MILLIS/SECONDS)
    floatfloat32FLOAT
    doublefloat64DOUBLE
    char / varchar / stringutf8BYTE_ARRAYUTF8
    decimaldecimal128FIXED_LEN_BYTE_ARRAYDECIMAL(scale, precision)
    structstructParquet Group
    mapmapParquet Map
    arraylistParquet List
    jsonutf8BYTE_ARRAYUTF8
    variantutf8BYTE_ARRAYUTF8
    bitmapbinaryBYTE_ARRAY
    quantile_statebinaryBYTE_ARRAY
    hllbinaryBYTE_ARRAY

    Note: In versions 2.1.11 and 3.0.7, you can specify the parquet.enable_int96_timestamps property to determine whether Doris's datetimev2 type uses Parquet's INT96 storage or INT64. INT96 is used by default. However, INT96 has been deprecated in the Parquet standard and is only used for compatibility with some older systems (such as versions before Hive 4.0).