Skip to main content

MaxCompute Catalog

MaxCompute is an enterprise-level SaaS (Software as a Service) cloud data warehouse on Alibaba Cloud. Through the open storage SDK provided by MaxCompute, Doris can retrieve MaxCompute table information and perform queries and writes.

Applicable Scenarios

ScenarioDescription
Data IntegrationRead MaxCompute data and write to Doris internal tables.
Data Write-backUsing INSERT command to write data into MaxCompute Table. (Supported since version 4.1.0)

Usage Notes

  1. Starting from version 2.1.7, MaxCompute Catalog is developed based on the Open Storage SDK. Prior to this version, it was developed based on the Tunnel API.

  2. There are certain limitations when using the Open Storage SDK. Please refer to the Usage Limitations section in this document.

  3. Before Doris version 3.1.3, a Project in MaxCompute corresponds to a Database in Doris. In version 3.1.3, you can introduce the MaxCompute schema hierarchy through the mc.enable.namespace.schema parameter.

Configuring Catalog

Syntax

CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
'type' = 'max_compute',
{McRequiredProperties},
{McOptionalProperties},
{CommonProperties}
);
  • {McRequiredProperties}

    Property NameDescriptionSupported Doris Version
    mc.default.projectThe name of the MaxCompute project to access. You can create and manage projects in the MaxCompute Project List.
    mc.access_keyAccessKey. You can create and manage it in the Alibaba Cloud Console.
    mc.secret_keySecretKey. You can create and manage it in the Alibaba Cloud Console.
    mc.regionThe region where MaxCompute is activated. You can find the corresponding Region from the Endpoint.Before 2.1.7 (exclusive)
    mc.endpointThe region where MaxCompute is activated. Please refer to the "How to Obtain Endpoint and Quota" section below for configuration.2.1.7 (inclusive) and later
  • {McOptionalProperties}

    Property NameDefault ValueDescriptionSupported Doris Version
    mc.tunnel_endpointNoneRefer to "Custom Service Address" in the appendix.Before 2.1.7 (exclusive)
    mc.odps_endpointNoneRefer to "Custom Service Address" in the appendix.Before 2.1.7 (exclusive)
    mc.quotapay-as-you-goQuota name. Please refer to the "How to Obtain Endpoint and Quota" section below for configuration.2.1.7 (inclusive) and later
    mc.split_strategybyte_sizeSets the split partitioning method. Can be set to partition by byte size byte_size or by row count row_count.2.1.7 (inclusive) and later
    mc.split_byte_size268435456The file size each split reads, in bytes. Default is 256MB. Only effective when "mc.split_strategy" = "byte_size".2.1.7 (inclusive) and later
    mc.split_row_count1048576Number of rows each split reads. Only effective when "mc.split_strategy" = "row_count".2.1.7 (inclusive) and later
    mc.split_cross_partitionfalseWhether the generated splits cross partitions.2.1.8 (inclusive) and later
    mc.connect_timeout10sConnection timeout for MaxCompute.2.1.8 (inclusive) and later
    mc.read_timeout120sRead timeout for MaxCompute.2.1.8 (inclusive) and later
    mc.retry_count4Number of retries after timeout.2.1.8 (inclusive) and later
    mc.datetime_predicate_push_downtrueWhether to allow predicate push-down for timestamp/timestamp_ntz types. Doris loses precision (9 -> 6) when syncing these two types. Therefore, if the original data precision is higher than 6 digits, predicate push-down may lead to inaccurate results.2.1.9/3.0.5 (inclusive) and later
    mc.account_formatnameThe account systems of Alibaba Cloud International and China sites are inconsistent. For international site users, if you encounter errors like user 'RAM$xxxxxx:xxxxx' is not a valid aliyun account, you can set this parameter to id.3.0.9/3.1.1 (inclusive) and later
    mc.enable.namespace.schemafalseWhether to support MaxCompute schema hierarchy. See: https://help.aliyun.com/zh/maxcompute/user-guide/schema-related-operations3.1.3 (inclusive) and later
  • {CommonProperties}

    The CommonProperties section is used to fill in common properties. Please refer to the "Common Properties" section in Catalog Overview.

Supported MaxCompute Versions

Only the public cloud version of MaxCompute is supported. For private cloud version support, please contact Doris community support.

Supported MaxCompute Tables

  • Supports reading partitioned tables, clustered tables, and materialized views.

  • Does not support reading MaxCompute external tables, logical views, or Delta Tables.

Hierarchy Mapping

  • When mc.enable.namespace.schema is false

    DorisMaxCompute
    CatalogN/A
    DatabaseProject
    TableTable
  • When mc.enable.namespace.schema is true

    DorisMaxCompute
    CatalogProject
    DatabaseSchema
    TableTable

Column Type Mapping

MaxCompute TypeDoris TypeComment
booleanboolean
tinytinyint
tinyinttinyint
smallintsmallint
intint
bigintbigint
floatfloat
doubledouble
decimal(P, S)decimal(P, S)1 <= P <= 38, 0 <= scale <= 18
char(N)char(N)
varchar(N)varchar(N)
stringstring
datedate
datetimedatetime(3)Fixed mapping to precision 3. You can specify the timezone via SET [GLOBAL] time_zone = 'Asia/Shanghai'.
timestamp_ntzdatetime(6)MaxCompute's timestamp_ntz precision is 9, while Doris's DATETIME maximum precision is only 6, so the extra part is truncated when reading data.
timestampdatetime(6)Supported since 2.1.9/3.0.5. MaxCompute's timestamp precision is 9, while Doris's DATETIME maximum precision is only 6, so the extra part is truncated when reading data.
arrayarray
mapmap
structstruct
otherUNSUPPORTED

Basic Example

CREATE CATALOG mc_catalog PROPERTIES (
'type' = 'max_compute',
'mc.default.project' = 'project',
'mc.access_key' = 'sk',
'mc.secret_key' = 'ak',
'mc.endpoint' = 'http://service.cn-beijing-vpc.MaxCompute.aliyun-inc.com/api'
);

If using a version before 2.1.7 (exclusive), please use the following statement. (It is recommended to upgrade to 2.1.8 or later)

CREATE CATALOG mc_catalog PROPERTIES (
'type' = 'max_compute',
'mc.region' = 'cn-beijing',
'mc.default.project' = 'project',
'mc.access_key' = 'ak',
'mc.secret_key' = 'sk',
'mc.odps_endpoint' = 'http://service.cn-beijing.maxcompute.aliyun-inc.com/api',
'mc.tunnel_endpoint' = 'http://dt.cn-beijing.maxcompute.aliyun-inc.com'
);

With Schema support:

CREATE CATALOG mc_catalog PROPERTIES (
'type' = 'max_compute',
'mc.region' = 'cn-beijing',
'mc.default.project' = 'project',
'mc.access_key' = 'ak',
'mc.secret_key' = 'sk',
'mc.odps_endpoint' = 'http://service.cn-beijing.maxcompute.aliyun-inc.com/api',
'mc.tunnel_endpoint' = 'http://dt.cn-beijing.maxcompute.aliyun-inc.com',
'mc.enable.namespace.schema' = 'true'
);

Query Operations

Basic Query

-- 1. switch to catalog, use database and query
SWITCH mc_ctl;
USE mc_ctl;
SELECT * FROM mc_tbl LIMIT 10;

-- 2. use mc database directly
USE mc_ctl.mc_db;
SELECT * FROM mc_tbl LIMIT 10;

-- 3. use full qualified name to query
SELECT * FROM mc_ctl.mc_db.mc_tbl LIMIT 10;

Write Operations

Starting from version 4.1.0, Doris supports write operations to MaxCompute tables. You can use standard INSERT statements to write data from other data sources directly to MaxCompute tables through Doris.

note
  • This is an experimental feature, supported since version 4.1.0.
  • Supports writing to both partitioned and non-partitioned tables.
  • Does not support writing to clustered tables, transactional tables, Delta Tables, or external tables.

INSERT INTO

The INSERT operation appends data to the target table.

Example:

INSERT INTO mc_tbl values (val1, val2, val3, val4);
INSERT INTO mc_tbl SELECT col1, col2 FROM internal.db1.tbl1;

INSERT INTO mc_tbl(col1, col2) values (val1, val2);
INSERT INTO mc_tbl(col1, col2, partition_col1, partition_col2) values (1, 2, "beijing", "2023-12-12");

-- Write to specified partition (you can specify only some partition columns, with remaining partitions written dynamically)
INSERT INTO mc_tbl PARTITION(ds='20250201') SELECT id, name FROM source_tbl;
INSERT INTO mc_tbl PARTITION(ds='20250101', region='bj') VALUES (1, 'v1'), (2, 'v2');

INSERT OVERWRITE

INSERT OVERWRITE completely overwrites the existing data in the table with new data.

INSERT OVERWRITE TABLE mc_tbl VALUES(val1, val2, val3, val4);
INSERT OVERWRITE TABLE mc_tbl(col1, col2) SELECT col1, col2 FROM internal.db1.tbl1;

-- Write to specified partition
INSERT OVERWRITE TABLE mc_tbl PARTITION(ds='20250101') VALUES (10, 'new1');

CTAS

You can create a MaxCompute table and write data using the CTAS statement:

CREATE TABLE mc_tbl AS SELECT * FROM other_table;

Database and Table Management

Starting from version 4.1.0, Doris supports creating and dropping MaxCompute databases and tables.

note
  • This is an experimental feature, supported since version 4.1.0.
  • Supports creating and dropping partitioned and non-partitioned tables.
  • Does not support creating clustered tables, transactional tables, Delta Tables, or external tables.

This feature is only available when the mc.enable.namespace.schema property is set to true.

Creating and Dropping Databases

You can switch to the corresponding Catalog using the SWITCH statement and execute the CREATE DATABASE statement:

SWITCH mc;
CREATE DATABASE [IF NOT EXISTS] mc_schema;

You can also create using the fully qualified name:

CREATE DATABASE [IF NOT EXISTS] mc.mc_schema;

Drop database:

DROP DATABASE [IF EXISTS] mc.mc_schema;
caution

For MaxCompute Database, after deletion, all tables under it will also be deleted.

Creating and Dropping Tables

  • Create

    Doris supports creating partitioned or non-partitioned tables in MaxCompute.

    Example:

    CREATE TABLE mc_schema.mc_tbl1 (
    bool_col BOOLEAN,
    int_col INT,
    bigint_col BIGINT,
    float_col FLOAT,
    double_col DOUBLE,
    decimal_col DECIMAL(18,6),
    string_col STRING,
    varchar_col VARCHAR(200),
    char_col CHAR(50),
    date_col DATE,
    datetime_col DATETIME,
    arr_col ARRAY<STRING>,
    map_col MAP<STRING, STRING>,
    struct_col STRUCT<f1:STRING, f2:INT>
    );

    CREATE TABLE mc_schema.mc_tbl2 (
    id INT,
    val STRING,
    ds STRING,
    region STRING
    )
    PARTITION BY (ds, region)();
  • Drop

    You can drop a MaxCompute table using the DROP TABLE statement. Currently, dropping a table will also delete the data, including partition data.

    Example:

    DROP TABLE [IF EXISTS] mc_tbl;

Appendix

How to Obtain Endpoint and Quota (Applicable for Doris 2.1.7 and Later)

  1. If using exclusive resource groups for data transfer service

    Please refer to the "2. Authorization" section in the "Using Exclusive Data Service Resource Groups" chapter in this document to enable the corresponding permissions. In the "Quota Management" list, view and copy the corresponding QuotaName, and specify "mc.quota" = "QuotaName". At this point, you can choose either VPC or public network to access MaxCompute, but VPC has guaranteed bandwidth while public network bandwidth resources are limited.

  2. If using pay-as-you-go

    Please refer to the "Using Open Storage (Pay-as-you-go)" section in this document to enable the Open Storage (Storage API) switch and grant permissions to the user corresponding to the AK and SK. In this case, mc.quota is the default value pay-as-you-go, and you do not need to specify this value additionally. With pay-as-you-go, you can only use VPC to access MaxCompute and cannot access via public network. Only prepaid users can access MaxCompute via public network.

  3. Configure mc.endpoint according to the "Region Endpoint Reference Table" in the Alibaba Cloud Endpoints documentation

    Users accessing via VPC need to configure mc.endpoint according to the "VPC Network Endpoint" column in the "Regional Endpoint Reference Table (Alibaba Cloud VPC Network Connection Method)" table. Users accessing via public network can choose from the "Classic Network Endpoint" column in the "Regional Endpoint Reference Table (Alibaba Cloud Classic Network Connection Method)" table, or the "External Network Endpoint" column in the "Regional Endpoint Reference Table (External Network Connection Method)" table to configure mc.endpoint.

Custom Service Address (Applicable for Versions Before Doris 2.1.7)

In versions before Doris 2.1.7, the Tunnel SDK is used to interact with MaxCompute, so the following two endpoint properties are required:

  • mc.odps_endpoint: MaxCompute Endpoint, used to retrieve MaxCompute metadata (database and table information).

  • mc.tunnel_endpoint: Tunnel Endpoint, used to read MaxCompute data.

By default, MaxCompute Catalog generates endpoints based on mc.region and mc.public_access.

The generated format is as follows:

mc.public_accessmc.odps_endpointmc.tunnel_endpoint
falsehttp://service.{mc.region}.maxcompute.aliyun-inc.com/apihttp://dt.{mc.region}.maxcompute.aliyun-inc.com
truehttp://service.{mc.region}.maxcompute.aliyun.com/apihttp://dt.{mc.region}.maxcompute.aliyun.com

Users can also specify mc.odps_endpoint and mc.tunnel_endpoint individually to customize the service address, which is suitable for some privately deployed MaxCompute environments.

For configuring MaxCompute Endpoint and Tunnel Endpoint, please refer to Endpoints for Different Regions and Network Connection Methods.

Resource Usage Control

Users can adjust the table-level request concurrency by adjusting the two Session Variables parallel_pipeline_task_num and num_scanner_threads to control resource consumption in the data transfer service. The corresponding concurrency equals max(parallel_pipeline_task_num * be num * num_scanner_threads).

Note:

  1. This method can only control the concurrent request count for a single table within a single Query, and cannot control resource usage across multiple SQL statements.

  2. Reducing concurrency means increasing the Query execution time.