MaxCompute Catalog
MaxCompute is an enterprise-level SaaS (Software as a Service) cloud data warehouse on Alibaba Cloud. Through the open storage SDK provided by MaxCompute, Doris can access MaxCompute table information and perform queries.
Applicable Scenarios
Scenario | Description |
---|---|
Data Integration | Read MaxCompute data and write it to Doris internal tables. |
Data Write-back | Not supported. |
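The data integration scenario above can be sketched as a single `INSERT INTO ... SELECT` statement. This is a minimal sketch: the catalog name `mc_ctl`, the source table `mc_db.mc_tbl`, and the target table `internal.doris_db.doris_tbl` are hypothetical names, and the target table is assumed to already exist with a schema compatible with the source.

```sql
-- Hypothetical names: mc_ctl is a MaxCompute catalog, mc_db.mc_tbl is the source table,
-- and internal.doris_db.doris_tbl is an existing Doris internal table with a compatible schema.
INSERT INTO internal.doris_db.doris_tbl
SELECT * FROM mc_ctl.mc_db.mc_tbl;
```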
Notes
- Starting from version 2.1.7, the MaxCompute Catalog is developed based on the open storage SDK. Before this, it was developed based on the Tunnel API.
- There are certain restrictions on the use of the open storage SDK. Please refer to the Usage Restrictions section in this document.
- A Project in MaxCompute is equivalent to a Database in Doris.
Configuring Catalog
Syntax
CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
'type' = 'max_compute',
{McRequiredProperties},
{McOptionalProperties},
{CommonProperties}
);
- {McRequiredProperties}
Property Name | Description | Supported Doris Version |
---|---|---|
mc.default.project | The name of the MaxCompute project you want to access. You can create and manage it in the MaxCompute Project List. | |
mc.access_key | AccessKey. You can create and manage it in the Alibaba Cloud Console. | |
mc.secret_key | SecretKey. You can create and manage it in the Alibaba Cloud Console. | |
mc.region | The region where MaxCompute is enabled. You can find the corresponding region from the Endpoint. | Before 2.1.7 |
mc.endpoint | The endpoint of the region where MaxCompute is enabled. Please refer to the section below on how to obtain Endpoint and Quota for configuration. | 2.1.7 and later |
- {McOptionalProperties}
Property Name | Default Value | Description | Supported Doris Version |
---|---|---|---|
mc.tunnel_endpoint | None | Refer to the appendix on Custom Service Address. | Before 2.1.7 |
mc.odps_endpoint | None | Refer to the appendix on Custom Service Address. | Before 2.1.7 |
mc.quota | pay-as-you-go | Quota name. Please refer to the section on how to obtain Endpoint and Quota for configuration. | 2.1.7 and later |
mc.split_strategy | byte_size | Sets the split strategy. Can be set to byte_size (split by byte size) or row_count (split by number of rows). | 2.1.7 and later |
mc.split_byte_size | 268435456 | The file size (in bytes) read by each split. Default is 256 MB. Effective only when "mc.split_strategy" = "byte_size". | 2.1.7 and later |
mc.split_row_count | 1048576 | The number of rows read by each split. Effective only when "mc.split_strategy" = "row_count". | 2.1.7 and later |
mc.split_cross_partition | false | Whether the generated split crosses partitions. | 2.1.8 and later |
mc.connect_timeout | 10s | Timeout for connecting to MaxCompute. | 2.1.8 and later |
mc.read_timeout | 120s | Timeout for reading from MaxCompute. | 2.1.8 and later |
mc.retry_count | 4 | Number of retries after a timeout. | 2.1.8 and later |
- {CommonProperties}
The CommonProperties section is used to fill in common properties. Please refer to the Catalog Overview section on Common Properties.
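As an illustration of how the optional properties combine with the required ones, the following sketch creates a catalog that splits by row count and sets the timeout and retry properties explicitly. The catalog name `mc_catalog_tuned` and the credential and project placeholders are assumptions; the timeout and retry values simply restate the documented defaults, and the `mc.split_cross_partition`, timeout, and retry properties require Doris 2.1.8 or later.

```sql
-- Sketch only: 'project', 'ak', and 'sk' are placeholders, and mc_catalog_tuned is a hypothetical name.
CREATE CATALOG IF NOT EXISTS mc_catalog_tuned PROPERTIES (
    'type' = 'max_compute',
    'mc.default.project' = 'project',
    'mc.access_key' = 'ak',
    'mc.secret_key' = 'sk',
    'mc.endpoint' = 'http://service.cn-beijing-vpc.maxcompute.aliyun-inc.com/api',
    'mc.quota' = 'pay-as-you-go',          -- the default quota, stated explicitly
    'mc.split_strategy' = 'row_count',     -- split by number of rows instead of byte size
    'mc.split_row_count' = '1048576',      -- only effective with the row_count strategy
    'mc.connect_timeout' = '10s',          -- documented default (2.1.8 and later)
    'mc.read_timeout' = '120s',            -- documented default (2.1.8 and later)
    'mc.retry_count' = '4'                 -- documented default (2.1.8 and later)
);
```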
Supported MaxCompute Versions
Only the public cloud version of MaxCompute is supported. For support with the private cloud version, please contact the Doris community.
Supported MaxCompute Formats
- Supports reading partitioned tables, clustered tables, and materialized views.
- Does not support reading MaxCompute external tables, logical views, or Delta Tables.
Column Type Mapping
MaxCompute Type | Doris Type | Comment |
---|---|---|
boolean | boolean | |
tiny | tinyint | |
tinyint | tinyint | |
smallint | smallint | |
int | int | |
bigint | bigint | |
float | float | |
double | double | |
decimal(P, S) | decimal(P, S) | |
char(N) | char(N) | |
varchar(N) | varchar(N) | |
string | string | |
date | date | |
datetime | datetime(3) | Fixed mapping to precision 3. You can specify the time zone using SET [GLOBAL] time_zone = 'Asia/Shanghai'. |
timestamp_ntz | datetime(6) | The precision of MaxCompute's timestamp_ntz is 9, but Doris' DATETIME supports a maximum precision of 6. Therefore, the extra part will be directly truncated when reading data. |
array | array | |
map | map | |
struct | struct | |
other | UNSUPPORTED | |
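One way to check how a particular table's columns were mapped is to set the session time zone and describe the table from Doris. This is a sketch under assumptions: `mc_ctl`, `mc_db`, and `mc_tbl` are hypothetical names for a MaxCompute catalog, project, and table.

```sql
-- Session-level time zone, as mentioned in the datetime row of the mapping table.
SET time_zone = 'Asia/Shanghai';

-- Hypothetical catalog, database, and table names.
USE mc_ctl.mc_db;

-- Per the mapping table, a MaxCompute datetime column is expected to appear as datetime(3)
-- and a timestamp_ntz column as datetime(6).
DESC mc_tbl;
```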
Examples
CREATE CATALOG mc_catalog PROPERTIES (
'type' = 'max_compute',
'mc.default.project' = 'project',
'mc.access_key' = 'ak',
'mc.secret_key' = 'sk',
'mc.endpoint' = 'http://service.cn-beijing-vpc.maxcompute.aliyun-inc.com/api'
);
If you are using a version earlier than 2.1.7, use the following statement instead. (Upgrading to 2.1.8 or later is recommended.)
CREATE CATALOG mc_catalog PROPERTIES (
'type' = 'max_compute',
'mc.region' = 'cn-beijing',
'mc.default.project' = 'project',
'mc.access_key' = 'ak',
'mc.secret_key' = 'sk',
'mc.odps_endpoint' = 'http://service.cn-beijing.maxcompute.aliyun-inc.com/api',
'mc.tunnel_endpoint' = 'http://dt.cn-beijing.maxcompute.aliyun-inc.com'
);
Query Operations
Basic Query
-- 1. Switch to the catalog, select a database, and query
SWITCH mc_ctl;
USE mc_db;
SELECT * FROM mc_tbl LIMIT 10;
-- 2. Use mc database directly
USE mc_ctl.mc_db;
SELECT * FROM mc_tbl LIMIT 10;
-- 3. Use fully qualified name to query
SELECT * FROM mc_ctl.mc_db.mc_tbl LIMIT 10;
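Before querying, it can help to browse what the catalog exposes. The sketch below reuses the `mc_ctl`, `mc_db`, and `mc_tbl` names from the queries above; the partition column `pt` and its value are hypothetical and only illustrate filtering a partitioned table.

```sql
-- List catalogs, then the databases (MaxCompute projects) and tables visible through mc_ctl.
SHOW CATALOGS;
SWITCH mc_ctl;
SHOW DATABASES;
SHOW TABLES FROM mc_db;

-- Filter on a partition column; `pt` is a hypothetical partition column name.
SELECT COUNT(*) FROM mc_ctl.mc_db.mc_tbl WHERE pt = '20240101';
```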
Appendix
How to Obtain Endpoint and Quota (For Doris 2.1.7 and Later)
- If using a dedicated resource group for Data Transmission Service (DTS)
  Refer to the documentation under the section "Use Dedicated Data Service Resource Groups", specifically "2. Authorization", to enable the required permissions. Then navigate to the "Quota Management" list to view and copy the corresponding QuotaName, and specify it using "mc.quota" = "QuotaName". In this case, you can access MaxCompute via either VPC or the public network. VPC provides guaranteed bandwidth, while public network bandwidth is limited.
- If using pay-as-you-go
  Refer to the documentation under the section "Using Open Storage (Pay-As-You-Go)" to enable the Open Storage (Storage API) switch and grant permissions to the user corresponding to the AK and SK. In this case, mc.quota defaults to pay-as-you-go, and no additional value needs to be specified. With the pay-as-you-go model, MaxCompute can only be accessed via VPC; public network access is not available. Only prepaid users can access MaxCompute via the public network.
- Configure mc.endpoint based on the Alibaba Cloud Endpoints Documentation
  For users accessing via VPC, refer to the "VPC Network Endpoint" column in the "Regional Endpoint Table (Alibaba Cloud VPC Network Connection Method)" to configure mc.endpoint.
  For users accessing via the public network, you can choose from the "Classic Network Endpoint" column in the "Regional Endpoint Table (Alibaba Cloud Classic Network Connection Method)", or the "External Network Endpoint" column in the "Regional Endpoint Table (External Network Connection Method)" to configure mc.endpoint.
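Putting the steps above together, a catalog that uses a dedicated resource group over VPC might look like the following sketch. The catalog name `mc_dedicated`, the project and credential placeholders, and the literal `QuotaName` are assumptions; replace `QuotaName` with the name copied from the "Quota Management" list and pick the endpoint for your own region.

```sql
-- Sketch only: placeholders must be replaced with real values for your account and region.
CREATE CATALOG IF NOT EXISTS mc_dedicated PROPERTIES (
    'type' = 'max_compute',
    'mc.default.project' = 'project',
    'mc.access_key' = 'ak',
    'mc.secret_key' = 'sk',
    -- VPC endpoint for cn-beijing, taken from the example earlier in this document.
    'mc.endpoint' = 'http://service.cn-beijing-vpc.maxcompute.aliyun-inc.com/api',
    -- Dedicated resource group quota; omit this property to fall back to pay-as-you-go.
    'mc.quota' = 'QuotaName'
);
```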
Custom Service Address (For Doris Versions before 2.1.7)
In Doris versions before 2.1.7, the Tunnel SDK is used to interact with MaxCompute. Therefore, the following two endpoint properties need to be configured:
- mc.odps_endpoint: MaxCompute Endpoint, used to retrieve MaxCompute metadata (e.g., database and table information).
- mc.tunnel_endpoint: Tunnel Endpoint, used to read MaxCompute data.
By default, the MaxCompute Catalog generates endpoints based on the values of mc.region and mc.public_access.
The generated endpoint formats are as follows:
mc.public_access | mc.odps_endpoint | mc.tunnel_endpoint |
---|---|---|
false | http://service.{mc.region}.maxcompute.aliyun-inc.com/api | http://dt.{mc.region}.maxcompute.aliyun-inc.com |
true | http://service.{mc.region}.maxcompute.aliyun.com/api | http://dt.{mc.region}.maxcompute.aliyun.com |
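For example, on a Doris version before 2.1.7, a catalog that relies on the generated endpoints could be sketched as follows. This assumes that `mc.public_access` is passed as a catalog property with the values `'true'` or `'false'`, as the table above implies; the catalog name `mc_legacy` and the project and credential values are placeholders.

```sql
-- Sketch for Doris versions before 2.1.7; names and credentials are placeholders.
CREATE CATALOG IF NOT EXISTS mc_legacy PROPERTIES (
    'type' = 'max_compute',
    'mc.region' = 'cn-beijing',
    'mc.default.project' = 'project',
    'mc.access_key' = 'ak',
    'mc.secret_key' = 'sk',
    'mc.public_access' = 'true'
);
-- With the settings above, the table implies these generated endpoints:
--   mc.odps_endpoint   -> http://service.cn-beijing.maxcompute.aliyun.com/api
--   mc.tunnel_endpoint -> http://dt.cn-beijing.maxcompute.aliyun.com
```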
Users can also manually specify mc.odps_endpoint and mc.tunnel_endpoint to customize the service addresses. This is particularly useful for private deployments of MaxCompute environments.
For details on configuring MaxCompute Endpoint and Tunnel Endpoint, refer to the documentation on Endpoints for Different Regions and Network Connection Methods.