MaxCompute Catalog

MaxCompute is an enterprise-level SaaS (Software as a Service) cloud data warehouse on Alibaba Cloud. Through the open storage SDK provided by MaxCompute, Doris can access MaxCompute table information and perform queries.

Applicable Scenarios

| Scenario | Description |
| --- | --- |
| Data Integration | Read MaxCompute data and write it to Doris internal tables. |
| Data Write-back | Not supported. |
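
The data integration scenario can be sketched as follows. This is a minimal illustration rather than a prescribed workflow: the catalog name mc_ctl, the MaxCompute project mc_db, the source table mc_tbl, and the Doris internal table demo_db.mc_tbl_copy are all hypothetical placeholders.

```sql
-- Hypothetical names: mc_ctl (MaxCompute catalog), mc_db (MaxCompute project),
-- mc_tbl (source table), demo_db.mc_tbl_copy (Doris internal table).

-- Option 1: create a Doris internal table from a MaxCompute query (CTAS).
CREATE TABLE internal.demo_db.mc_tbl_copy
AS SELECT * FROM mc_ctl.mc_db.mc_tbl;

-- Option 2: append MaxCompute rows into an internal table that already exists.
INSERT INTO internal.demo_db.mc_tbl_copy
SELECT * FROM mc_ctl.mc_db.mc_tbl;
```

The reverse direction (writing Doris data into a MaxCompute table) is not supported.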

Notes

  1. Starting from version 2.1.7, the MaxCompute Catalog is built on the open storage SDK. Earlier versions are built on the Tunnel API.

  2. There are certain restrictions on the use of the open storage SDK. Please refer to the Usage Restrictions section in this document.

  3. A Project in MaxCompute is equivalent to a Database in Doris.
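
As a brief illustration of the Project-to-Database mapping, the statements below browse a MaxCompute project the same way as a Doris database. The catalog name mc_ctl and the project name project are assumed placeholders.

```sql
-- mc_ctl is a hypothetical catalog name.
SWITCH mc_ctl;

-- The MaxCompute project is listed as a database.
SHOW DATABASES;

-- 'project' stands for the MaxCompute project name.
USE project;
SHOW TABLES;
```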

Configuring Catalog

Syntax

CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
'type' = 'max_compute',
{McRequiredProperties},
{McOptionalProperties},
{CommonProperties}
);
  • {McRequiredProperties}

    | Property Name | Description | Supported Doris Version |
    | --- | --- | --- |
    | mc.default.project | The name of the MaxCompute project you want to access. You can create and manage it in the MaxCompute Project List. | |
    | mc.access_key | AccessKey. You can create and manage it in the Alibaba Cloud Console. | |
    | mc.secret_key | SecretKey. You can create and manage it in the Alibaba Cloud Console. | |
    | mc.region | The region where MaxCompute is enabled. You can find the corresponding region from the Endpoint. | Before 2.1.7 |
    | mc.endpoint | The MaxCompute service endpoint. Refer to the appendix section "How to Obtain Endpoint and Quota" for configuration. | 2.1.7 and later |
  • {McOptionalProperties}

    | Property Name | Default Value | Description | Supported Doris Version |
    | --- | --- | --- | --- |
    | mc.tunnel_endpoint | None | Refer to the appendix on Custom Service Address. | Before 2.1.7 |
    | mc.odps_endpoint | None | Refer to the appendix on Custom Service Address. | Before 2.1.7 |
    | mc.quota | pay-as-you-go | Quota name. Refer to the appendix section "How to Obtain Endpoint and Quota" for configuration. | 2.1.7 and later |
    | mc.split_strategy | byte_size | Sets the split strategy. Can be set to byte_size (split by byte size) or row_count (split by number of rows). | 2.1.7 and later |
    | mc.split_byte_size | 268435456 | The file size (in bytes) read by each split. Default is 256 MB. Effective only when "mc.split_strategy" = "byte_size". | 2.1.7 and later |
    | mc.split_row_count | 1048576 | The number of rows read by each split. Effective only when "mc.split_strategy" = "row_count". | 2.1.7 and later |
    | mc.split_cross_partition | false | Whether the generated split crosses partitions. | 2.1.8 and later |
    | mc.connect_timeout | 10s | Timeout for connecting to MaxCompute. | 2.1.8 and later |
    | mc.read_timeout | 120s | Timeout for reading from MaxCompute. | 2.1.8 and later |
    | mc.retry_count | 4 | Number of retries after a timeout. | 2.1.8 and later |
  • {CommonProperties}

{CommonProperties} holds the properties shared by all catalog types. Refer to the Common Properties section of the Catalog Overview.
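
To show how the optional properties combine with the required ones, here is a hedged sketch for Doris 2.1.8 or later. The catalog name mc_catalog_tuned and every value shown (project, keys, row count, timeouts, retry count) are illustrative assumptions, not recommended settings.

```sql
-- Illustrative values only; mc_catalog_tuned is a hypothetical catalog name.
-- Splits are generated by row count instead of the default byte_size strategy,
-- and the timeout and retry settings (2.1.8 and later) are raised above their defaults.
CREATE CATALOG IF NOT EXISTS mc_catalog_tuned PROPERTIES (
    'type' = 'max_compute',
    'mc.default.project' = 'project',
    'mc.access_key' = 'ak',
    'mc.secret_key' = 'sk',
    'mc.endpoint' = 'http://service.cn-beijing-vpc.maxcompute.aliyun-inc.com/api',
    'mc.split_strategy' = 'row_count',
    'mc.split_row_count' = '2097152',
    'mc.connect_timeout' = '20s',
    'mc.read_timeout' = '300s',
    'mc.retry_count' = '6'
);
```

Because mc.split_strategy is set to row_count here, mc.split_byte_size has no effect and is left out.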

Supported MaxCompute Versions

Only the public cloud version of MaxCompute is supported. For private cloud support, please contact the Doris community.

Supported MaxCompute Formats

  • Supports reading partitioned tables, clustered tables, and materialized views.

  • Does not support reading MaxCompute external tables, logical views, or Delta Tables.

Column Type Mapping

| MaxCompute Type | Doris Type | Comment |
| --- | --- | --- |
| boolean | boolean | |
| tiny | tinyint | |
| tinyint | tinyint | |
| smallint | smallint | |
| int | int | |
| bigint | bigint | |
| float | float | |
| double | double | |
| decimal(P, S) | decimal(P, S) | |
| char(N) | char(N) | |
| varchar(N) | varchar(N) | |
| string | string | |
| date | date | |
| datetime | datetime(3) | Fixed mapping to precision 3. You can specify the time zone using SET [GLOBAL] time_zone = 'Asia/Shanghai'. |
| timestamp_ntz | datetime(6) | The precision of MaxCompute's timestamp_ntz is 9, but Doris' DATETIME supports a maximum precision of 6, so the extra digits are truncated when reading data. |
| array | array | |
| map | map | |
| struct | struct | |
| other | UNSUPPORTED | |
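
One way to check how a particular table's columns were mapped is to set the session time zone and describe the table through the catalog. The names mc_ctl, mc_db, and mc_tbl are hypothetical placeholders.

```sql
-- Session time zone used when reading MaxCompute datetime columns.
SET time_zone = 'Asia/Shanghai';

-- Inspect the Doris types that the MaxCompute columns were mapped to.
DESC mc_ctl.mc_db.mc_tbl;
```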

Examples

CREATE CATALOG mc_catalog PROPERTIES (
'type' = 'max_compute',
'mc.default.project' = 'project',
'mc.access_key' = 'ak',
'mc.secret_key' = 'sk',
'mc.endpoint' = 'http://service.cn-beijing-vpc.maxcompute.aliyun-inc.com/api'
);

If you are using a version earlier than 2.1.7, use the following statement instead. (Upgrading to 2.1.8 or later is recommended.)

CREATE CATALOG mc_catalog PROPERTIES (
'type' = 'max_compute',
'mc.region' = 'cn-beijing',
'mc.default.project' = 'project',
'mc.access_key' = 'ak',
'mc.secret_key' = 'sk',
'mc.odps_endpoint' = 'http://service.cn-beijing.maxcompute.aliyun-inc.com/api',
'mc.tunnel_endpoint' = 'http://dt.cn-beijing.maxcompute.aliyun-inc.com'
);

Query Operations

Basic Query

-- 1. Switch to catalog, use database, and query
SWITCH mc_ctl;
USE mc_db;
SELECT * FROM mc_tbl LIMIT 10;

-- 2. Use mc database directly
USE mc_ctl.mc_db;
SELECT * FROM mc_tbl LIMIT 10;

-- 3. Use fully qualified name to query
SELECT * FROM mc_ctl.mc_db.mc_tbl LIMIT 10;

Appendix

How to Obtain Endpoint and Quota (For Doris 2.1.7 and Later)

  1. If using a dedicated resource group for Data Transmission Service (DTS)

    Refer to the documentation under the section "Use Dedicated Data Service Resource Groups", specifically "2. Authorization", to enable the required permissions. Then navigate to the "Quota Management" list to view and copy the corresponding QuotaName, and specify it using "mc.quota" = "QuotaName". With a dedicated resource group, you can access MaxCompute over either VPC or the public network. VPC provides guaranteed bandwidth, whereas public network bandwidth is limited.

  2. If using pay-as-you-go

    Refer to the documentation under the section "Using Open Storage (Pay-As-You-Go)" to enable the Open Storage (Storage API) switch and grant permissions to the user corresponding to the AK and SK. In this case, mc.quota defaults to pay-as-you-go, and no additional value needs to be specified. When using the pay-as-you-go model, MaxCompute can only be accessed via VPC; public network access is not available. Only prepaid users can access MaxCompute via the public network.

  3. Configure mc.endpoint based on the Alibaba Cloud Endpoints Documentation

    For users accessing via VPC, refer to the "VPC Network Endpoint" column in the "Regional Endpoint Table (Alibaba Cloud VPC Network Connection Method)" to configure mc.endpoint.

    For users accessing via the public network, you can choose from the "Classic Network Endpoint" column in the "Regional Endpoint Table (Alibaba Cloud Classic Network Connection Method)", or the "External Network Endpoint" column in the "Regional Endpoint Table (External Network Connection Method)" to configure mc.endpoint.
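
Putting these steps together, a catalog that uses a dedicated resource group over VPC might be declared as in the sketch below. QuotaName is a placeholder for the name copied from the "Quota Management" list, and the catalog name mc_dedicated, the project, and the cn-beijing VPC endpoint are assumptions chosen for illustration.

```sql
-- Hypothetical catalog using a dedicated resource group over VPC.
-- Replace QuotaName with the name copied from the "Quota Management" list.
CREATE CATALOG IF NOT EXISTS mc_dedicated PROPERTIES (
    'type' = 'max_compute',
    'mc.default.project' = 'project',
    'mc.access_key' = 'ak',
    'mc.secret_key' = 'sk',
    'mc.endpoint' = 'http://service.cn-beijing-vpc.maxcompute.aliyun-inc.com/api',
    'mc.quota' = 'QuotaName'
);
```

For pay-as-you-go, the same statement without 'mc.quota' applies, since the property then falls back to its default of pay-as-you-go.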

Custom Service Address (For Doris Versions before 2.1.7)

In Doris versions before 2.1.7, the Tunnel SDK is used to interact with MaxCompute. Therefore, the following two endpoint properties need to be configured:

  • mc.odps_endpoint: MaxCompute Endpoint, used to retrieve MaxCompute metadata (e.g., database and table information).
  • mc.tunnel_endpoint: Tunnel Endpoint, used to read MaxCompute data.

By default, the MaxCompute Catalog generates endpoints based on the values of mc.region and mc.public_access.

The generated endpoint formats are as follows:

| mc.public_access | mc.odps_endpoint | mc.tunnel_endpoint |
| --- | --- | --- |
| false | http://service.{mc.region}.maxcompute.aliyun-inc.com/api | http://dt.{mc.region}.maxcompute.aliyun-inc.com |
| true | http://service.{mc.region}.maxcompute.aliyun.com/api | http://dt.{mc.region}.maxcompute.aliyun.com |

Users can also manually specify mc.odps_endpoint and mc.tunnel_endpoint to customize the service addresses. This is particularly useful for private MaxCompute deployments.

For details on configuring MaxCompute Endpoint and Tunnel Endpoint, refer to the documentation on Endpoints for Different Regions and Network Connection Methods.
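
For Doris versions before 2.1.7, the generated-endpoint behaviour described above can be sketched as follows. The catalog name mc_legacy is hypothetical, and passing mc.public_access as the string 'true' is an assumption based on the table above, not a confirmed syntax.

```sql
-- For Doris versions before 2.1.7 only; mc_legacy is a hypothetical catalog name.
-- Neither mc.odps_endpoint nor mc.tunnel_endpoint is set, so both are generated
-- from mc.region and mc.public_access. With the values below they would be:
--   mc.odps_endpoint   = http://service.cn-beijing.maxcompute.aliyun.com/api
--   mc.tunnel_endpoint = http://dt.cn-beijing.maxcompute.aliyun.com
CREATE CATALOG IF NOT EXISTS mc_legacy PROPERTIES (
    'type' = 'max_compute',
    'mc.region' = 'cn-beijing',
    'mc.default.project' = 'project',
    'mc.access_key' = 'ak',
    'mc.secret_key' = 'sk',
    'mc.public_access' = 'true'
);
```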