MaxCompute Catalog
MaxCompute is an enterprise-level SaaS (Software as a Service) cloud data warehouse on Alibaba Cloud. Through the open storage SDK provided by MaxCompute, Doris can retrieve MaxCompute table information and perform queries and writes.
Applicable Scenarios
| Scenario | Description |
|---|---|
| Data Integration | Read MaxCompute data and write to Doris internal tables. |
| Data Write-back | Using INSERT command to write data into MaxCompute Table. (Supported since version 4.1.0) |
Usage Notes
-
Starting from version 2.1.7, MaxCompute Catalog is developed based on the Open Storage SDK. Prior to this version, it was developed based on the Tunnel API.
-
There are certain limitations when using the Open Storage SDK. Please refer to the
Usage Limitationssection in this document. -
Before Doris version 3.1.3, a Project in MaxCompute corresponds to a Database in Doris. In version 3.1.3, you can introduce the MaxCompute schema hierarchy through the
mc.enable.namespace.schemaparameter.
Configuring Catalog
Syntax
CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
'type' = 'max_compute',
{McRequiredProperties},
{McOptionalProperties},
{CommonProperties}
);
-
{McRequiredProperties}Property Name Description Supported Doris Version mc.default.projectThe name of the MaxCompute project to access. You can create and manage projects in the MaxCompute Project List. mc.access_keyAccessKey. You can create and manage it in the Alibaba Cloud Console. mc.secret_keySecretKey. You can create and manage it in the Alibaba Cloud Console. mc.regionThe region where MaxCompute is activated. You can find the corresponding Region from the Endpoint. Before 2.1.7 (exclusive) mc.endpointThe region where MaxCompute is activated. Please refer to the "How to Obtain Endpoint and Quota" section below for configuration. 2.1.7 (inclusive) and later -
{McOptionalProperties}Property Name Default Value Description Supported Doris Version mc.tunnel_endpointNone Refer to "Custom Service Address" in the appendix. Before 2.1.7 (exclusive) mc.odps_endpointNone Refer to "Custom Service Address" in the appendix. Before 2.1.7 (exclusive) mc.quotapay-as-you-goQuota name. Please refer to the "How to Obtain Endpoint and Quota" section below for configuration. 2.1.7 (inclusive) and later mc.split_strategybyte_sizeSets the split partitioning method. Can be set to partition by byte size byte_sizeor by row countrow_count.2.1.7 (inclusive) and later mc.split_byte_size268435456The file size each split reads, in bytes. Default is 256MB. Only effective when "mc.split_strategy" = "byte_size".2.1.7 (inclusive) and later mc.split_row_count1048576Number of rows each split reads. Only effective when "mc.split_strategy" = "row_count".2.1.7 (inclusive) and later mc.split_cross_partitionfalseWhether the generated splits cross partitions. 2.1.8 (inclusive) and later mc.connect_timeout10sConnection timeout for MaxCompute. 2.1.8 (inclusive) and later mc.read_timeout120sRead timeout for MaxCompute. 2.1.8 (inclusive) and later mc.retry_count4Number of retries after timeout. 2.1.8 (inclusive) and later mc.datetime_predicate_push_downtrueWhether to allow predicate push-down for timestamp/timestamp_ntztypes. Doris loses precision (9 -> 6) when syncing these two types. Therefore, if the original data precision is higher than 6 digits, predicate push-down may lead to inaccurate results.2.1.9/3.0.5 (inclusive) and later mc.account_formatnameThe account systems of Alibaba Cloud International and China sites are inconsistent. For international site users, if you encounter errors like user 'RAM$xxxxxx:xxxxx' is not a valid aliyun account, you can set this parameter toid.3.0.9/3.1.1 (inclusive) and later mc.enable.namespace.schemafalseWhether to support MaxCompute schema hierarchy. See: https://help.aliyun.com/zh/maxcompute/user-guide/schema-related-operations 3.1.3 (inclusive) and later -
{CommonProperties}The CommonProperties section is used to fill in common properties. Please refer to the "Common Properties" section in Catalog Overview.
Supported MaxCompute Versions
Only the public cloud version of MaxCompute is supported. For private cloud version support, please contact Doris community support.
Supported MaxCompute Tables
-
Supports reading partitioned tables, clustered tables, and materialized views.
-
Does not support reading MaxCompute external tables, logical views, or Delta Tables.
Hierarchy Mapping
-
When
mc.enable.namespace.schemais falseDoris MaxCompute Catalog N/A Database Project Table Table -
When
mc.enable.namespace.schemais trueDoris MaxCompute Catalog Project Database Schema Table Table
Column Type Mapping
| MaxCompute Type | Doris Type | Comment |
|---|---|---|
| boolean | boolean | |
| tiny | tinyint | |
| tinyint | tinyint | |
| smallint | smallint | |
| int | int | |
| bigint | bigint | |
| float | float | |
| double | double | |
| decimal(P, S) | decimal(P, S) | 1 <= P <= 38, 0 <= scale <= 18 |
| char(N) | char(N) | |
| varchar(N) | varchar(N) | |
| string | string | |
| date | date | |
| datetime | datetime(3) | Fixed mapping to precision 3. You can specify the timezone via SET [GLOBAL] time_zone = 'Asia/Shanghai'. |
| timestamp_ntz | datetime(6) | MaxCompute's timestamp_ntz precision is 9, while Doris's DATETIME maximum precision is only 6, so the extra part is truncated when reading data. |
| timestamp | datetime(6) | Supported since 2.1.9/3.0.5. MaxCompute's timestamp precision is 9, while Doris's DATETIME maximum precision is only 6, so the extra part is truncated when reading data. |
| array | array | |
| map | map | |
| struct | struct | |
| other | UNSUPPORTED |
Basic Example
CREATE CATALOG mc_catalog PROPERTIES (
'type' = 'max_compute',
'mc.default.project' = 'project',
'mc.access_key' = 'sk',
'mc.secret_key' = 'ak',
'mc.endpoint' = 'http://service.cn-beijing-vpc.MaxCompute.aliyun-inc.com/api'
);
If using a version before 2.1.7 (exclusive), please use the following statement. (It is recommended to upgrade to 2.1.8 or later)
CREATE CATALOG mc_catalog PROPERTIES (
'type' = 'max_compute',
'mc.region' = 'cn-beijing',
'mc.default.project' = 'project',
'mc.access_key' = 'ak',
'mc.secret_key' = 'sk',
'mc.odps_endpoint' = 'http://service.cn-beijing.maxcompute.aliyun-inc.com/api',
'mc.tunnel_endpoint' = 'http://dt.cn-beijing.maxcompute.aliyun-inc.com'
);
With Schema support:
CREATE CATALOG mc_catalog PROPERTIES (
'type' = 'max_compute',
'mc.region' = 'cn-beijing',
'mc.default.project' = 'project',
'mc.access_key' = 'ak',
'mc.secret_key' = 'sk',
'mc.odps_endpoint' = 'http://service.cn-beijing.maxcompute.aliyun-inc.com/api',
'mc.tunnel_endpoint' = 'http://dt.cn-beijing.maxcompute.aliyun-inc.com',
'mc.enable.namespace.schema' = 'true'
);
Query Operations
Basic Query
-- 1. switch to catalog, use database and query
SWITCH mc_ctl;
USE mc_ctl;
SELECT * FROM mc_tbl LIMIT 10;
-- 2. use mc database directly
USE mc_ctl.mc_db;
SELECT * FROM mc_tbl LIMIT 10;
-- 3. use full qualified name to query
SELECT * FROM mc_ctl.mc_db.mc_tbl LIMIT 10;
Write Operations
Starting from version 4.1.0, Doris supports write operations to MaxCompute tables. You can use standard INSERT statements to write data from other data sources directly to MaxCompute tables through Doris.
- This is an experimental feature, supported since version 4.1.0.
- Supports writing to both partitioned and non-partitioned tables.
- Does not support writing to clustered tables, transactional tables, Delta Tables, or external tables.
INSERT INTO
The INSERT operation appends data to the target table.
Example:
INSERT INTO mc_tbl values (val1, val2, val3, val4);
INSERT INTO mc_tbl SELECT col1, col2 FROM internal.db1.tbl1;
INSERT INTO mc_tbl(col1, col2) values (val1, val2);
INSERT INTO mc_tbl(col1, col2, partition_col1, partition_col2) values (1, 2, "beijing", "2023-12-12");
-- Write to specified partition (you can specify only some partition columns, with remaining partitions written dynamically)
INSERT INTO mc_tbl PARTITION(ds='20250201') SELECT id, name FROM source_tbl;
INSERT INTO mc_tbl PARTITION(ds='20250101', region='bj') VALUES (1, 'v1'), (2, 'v2');
INSERT OVERWRITE
INSERT OVERWRITE completely overwrites the existing data in the table with new data.
INSERT OVERWRITE TABLE mc_tbl VALUES(val1, val2, val3, val4);
INSERT OVERWRITE TABLE mc_tbl(col1, col2) SELECT col1, col2 FROM internal.db1.tbl1;
-- Write to specified partition
INSERT OVERWRITE TABLE mc_tbl PARTITION(ds='20250101') VALUES (10, 'new1');
CTAS
You can create a MaxCompute table and write data using the CTAS statement:
CREATE TABLE mc_tbl AS SELECT * FROM other_table;
Database and Table Management
Starting from version 4.1.0, Doris supports creating and dropping MaxCompute databases and tables.
- This is an experimental feature, supported since version 4.1.0.
- Supports creating and dropping partitioned and non-partitioned tables.
- Does not support creating clustered tables, transactional tables, Delta Tables, or external tables.
This feature is only available when the
mc.enable.namespace.schemaproperty is set totrue.
Creating and Dropping Databases
You can switch to the corresponding Catalog using the SWITCH statement and execute the CREATE DATABASE statement:
SWITCH mc;
CREATE DATABASE [IF NOT EXISTS] mc_schema;
You can also create using the fully qualified name:
CREATE DATABASE [IF NOT EXISTS] mc.mc_schema;
Drop database:
DROP DATABASE [IF EXISTS] mc.mc_schema;
For MaxCompute Database, after deletion, all tables under it will also be deleted.
Creating and Dropping Tables
-
Create
Doris supports creating partitioned or non-partitioned tables in MaxCompute.
Example:
CREATE TABLE mc_schema.mc_tbl1 (
bool_col BOOLEAN,
int_col INT,
bigint_col BIGINT,
float_col FLOAT,
double_col DOUBLE,
decimal_col DECIMAL(18,6),
string_col STRING,
varchar_col VARCHAR(200),
char_col CHAR(50),
date_col DATE,
datetime_col DATETIME,
arr_col ARRAY<STRING>,
map_col MAP<STRING, STRING>,
struct_col STRUCT<f1:STRING, f2:INT>
);
CREATE TABLE mc_schema.mc_tbl2 (
id INT,
val STRING,
ds STRING,
region STRING
)
PARTITION BY (ds, region)(); -
Drop
You can drop a MaxCompute table using the
DROP TABLEstatement. Currently, dropping a table will also delete the data, including partition data.Example:
DROP TABLE [IF EXISTS] mc_tbl;
Appendix
How to Obtain Endpoint and Quota (Applicable for Doris 2.1.7 and Later)
-
If using exclusive resource groups for data transfer service
Please refer to the "2. Authorization" section in the "Using Exclusive Data Service Resource Groups" chapter in this document to enable the corresponding permissions. In the "Quota Management" list, view and copy the corresponding
QuotaName, and specify"mc.quota" = "QuotaName". At this point, you can choose either VPC or public network to access MaxCompute, but VPC has guaranteed bandwidth while public network bandwidth resources are limited. -
If using pay-as-you-go
Please refer to the "Using Open Storage (Pay-as-you-go)" section in this document to enable the Open Storage (Storage API) switch and grant permissions to the user corresponding to the AK and SK. In this case,
mc.quotais the default valuepay-as-you-go, and you do not need to specify this value additionally. With pay-as-you-go, you can only use VPC to access MaxCompute and cannot access via public network. Only prepaid users can access MaxCompute via public network. -
Configure
mc.endpointaccording to the "Region Endpoint Reference Table" in the Alibaba Cloud Endpoints documentationUsers accessing via VPC need to configure
mc.endpointaccording to the "VPC Network Endpoint" column in the "Regional Endpoint Reference Table (Alibaba Cloud VPC Network Connection Method)" table. Users accessing via public network can choose from the "Classic Network Endpoint" column in the "Regional Endpoint Reference Table (Alibaba Cloud Classic Network Connection Method)" table, or the "External Network Endpoint" column in the "Regional Endpoint Reference Table (External Network Connection Method)" table to configuremc.endpoint.
Custom Service Address (Applicable for Versions Before Doris 2.1.7)
In versions before Doris 2.1.7, the Tunnel SDK is used to interact with MaxCompute, so the following two endpoint properties are required:
-
mc.odps_endpoint: MaxCompute Endpoint, used to retrieve MaxCompute metadata (database and table information). -
mc.tunnel_endpoint: Tunnel Endpoint, used to read MaxCompute data.
By default, MaxCompute Catalog generates endpoints based on mc.region and mc.public_access.
The generated format is as follows:
mc.public_access | mc.odps_endpoint | mc.tunnel_endpoint |
|---|---|---|
| false | http://service.{mc.region}.maxcompute.aliyun-inc.com/api | http://dt.{mc.region}.maxcompute.aliyun-inc.com |
| true | http://service.{mc.region}.maxcompute.aliyun.com/api | http://dt.{mc.region}.maxcompute.aliyun.com |
Users can also specify mc.odps_endpoint and mc.tunnel_endpoint individually to customize the service address, which is suitable for some privately deployed MaxCompute environments.
For configuring MaxCompute Endpoint and Tunnel Endpoint, please refer to Endpoints for Different Regions and Network Connection Methods.
Resource Usage Control
Users can adjust the table-level request concurrency by adjusting the two Session Variables parallel_pipeline_task_num and num_scanner_threads to control resource consumption in the data transfer service. The corresponding concurrency equals max(parallel_pipeline_task_num * be num * num_scanner_threads).
Note:
-
This method can only control the concurrent request count for a single table within a single Query, and cannot control resource usage across multiple SQL statements.
-
Reducing concurrency means increasing the Query execution time.