AWS Glue

This document describes the parameters to configure when using CREATE CATALOG to access Iceberg tables or Hive tables through AWS Glue Catalog.

Supported Glue Catalog Types

AWS Glue Catalog currently supports three types of Catalogs:

| Catalog Type | Type Identifier (type) | Description |
| --- | --- | --- |
| Hive | glue | Catalog for connecting to Hive Metastore |
| Iceberg | glue | Catalog for connecting to Iceberg table format |
| Iceberg | rest | Catalog for connecting to Iceberg table format via Glue Rest Catalog |

The sections below describe the parameters for each type in detail.

Common Parameters Overview

| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| glue.region | AWS Glue region, e.g., us-east-1 | Yes | None |
| glue.endpoint | AWS Glue endpoint, e.g., https://glue.us-east-1.amazonaws.com | Yes | None |
| glue.access_key | AWS Access Key ID | Yes | Empty |
| glue.secret_key | AWS Secret Access Key | Yes | Empty |
| glue.catalog_id | Glue Catalog ID (not supported yet) | No | Empty |
| glue.role_arn | IAM Role ARN for accessing Glue (supported since 3.1.2+) | No | Empty |
| glue.external_id | IAM External ID for accessing Glue (supported since 3.1.2+) | No | Empty |

Authentication Parameters

Accessing Glue requires credentials. The following two authentication methods are supported:

  1. Access Key Authentication

    Authenticates to Glue with the Access Key pair supplied in glue.access_key and glue.secret_key.

  2. IAM Role Authentication (supported since 3.1.2+)

    Authenticates to Glue with the IAM Role specified in glue.role_arn.

    This method requires Doris to be deployed on AWS EC2, and the EC2 instance must be bound to an IAM Role that has permission to access Glue.

    If the role must be accessed with an External ID, also configure glue.external_id.

Notes:

  • At least one of the two methods must be configured. If both methods are configured, Access Key authentication takes priority.

Example:

```sql
CREATE CATALOG hive_glue_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.type' = 'glue',
'glue.region' = 'us-east-1',
'glue.endpoint' = 'https://glue.us-east-1.amazonaws.com',
-- Using Access Key authentication
'glue.access_key' = '<YOUR_ACCESS_KEY>',
'glue.secret_key' = '<YOUR_SECRET_KEY>'
-- Or using IAM Role authentication
-- 'glue.role_arn' = '<YOUR_ROLE_ARN>',
-- 'glue.external_id' = '<YOUR_EXTERNAL_ID>'
);
```
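For comparison, the IAM Role variant of the same statement might look like the following sketch. It only uses parameters from the table above; the catalog name hive_glue_role_catalog is hypothetical, and the statement assumes Doris runs on an EC2 instance bound to a suitable IAM Role, as described in method 2.

```sql
-- Sketch only: IAM Role authentication (supported since 3.1.2+).
-- The catalog name is a hypothetical example.
CREATE CATALOG hive_glue_role_catalog PROPERTIES (
    'type' = 'hms',
    'hive.metastore.type' = 'glue',
    'glue.region' = 'us-east-1',
    'glue.endpoint' = 'https://glue.us-east-1.amazonaws.com',
    'glue.role_arn' = '<YOUR_ROLE_ARN>',
    -- Set this only if the role must be accessed with an External ID
    'glue.external_id' = '<YOUR_EXTERNAL_ID>'
);
```

Leaving out glue.access_key and glue.secret_key here avoids the precedence rule in the note above, under which Access Key authentication would take priority.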

Hive Glue Catalog

Hive Glue Catalog accesses Hive tables through AWS Glue's Hive Metastore compatible interface. Configure it with the following parameters:

| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| type | Fixed as hms | Yes | None |
| hive.metastore.type | Fixed as glue | Yes | None |
| glue.region | AWS Glue region, e.g., us-east-1 | Yes | None |
| glue.endpoint | AWS Glue endpoint, e.g., https://glue.us-east-1.amazonaws.com | Yes | None |
| glue.access_key | AWS Access Key ID | No | Empty |
| glue.secret_key | AWS Secret Access Key | No | Empty |
| glue.catalog_id | Glue Catalog ID (not supported yet) | No | Empty |
| glue.role_arn | IAM Role ARN for accessing Glue | No | Empty |
| glue.external_id | IAM External ID for accessing Glue | No | Empty |

Example

CREATE CATALOG hive_glue_catalog PROPERTIES (
'type' = 'hms',
'hive.metastore.type' = 'glue',
'glue.region' = 'us-east-1',
'glue.endpoint' = 'https://glue.us-east-1.amazonaws.com',
'glue.access_key' = '<YOUR_ACCESS_KEY>',
'glue.secret_key' = '<YOUR_SECRET_KEY>'
);
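Once the catalog exists, it can be queried like any other Doris catalog. The statements below are a minimal sketch of that step; example_db and example_table are hypothetical names standing in for a database and table registered in your Glue Catalog.

```sql
-- Switch the session to the Glue-backed catalog created above
SWITCH hive_glue_catalog;

-- List the databases that Glue exposes through this catalog
SHOW DATABASES;

-- Query a table by its fully qualified name
-- (example_db and example_table are hypothetical)
SELECT * FROM hive_glue_catalog.example_db.example_table LIMIT 10;
```

Listing databases only needs Glue permissions, while the SELECT also reads data files, so it additionally depends on the S3 permissions mentioned in the notes at the end of this document.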

Iceberg Glue Catalog

Iceberg Glue Catalog accesses Glue through the Glue Client. Configure it with the following parameters:

| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| type | Fixed as iceberg | Yes | None |
| iceberg.catalog.type | Fixed as glue | Yes | None |
| warehouse | Iceberg data warehouse path, e.g., s3://my-bucket/iceberg-warehouse/ | Yes | s3://doris |
| glue.region | AWS Glue region, e.g., us-east-1 | Yes | None |
| glue.endpoint | AWS Glue endpoint, e.g., https://glue.us-east-1.amazonaws.com | Yes | None |
| glue.access_key | AWS Access Key ID | No | Empty |
| glue.secret_key | AWS Secret Access Key | No | Empty |
| glue.catalog_id | Glue Catalog ID (not supported yet) | No | Empty |
| glue.role_arn | IAM Role ARN for accessing Glue (not supported yet) | No | Empty |
| glue.external_id | IAM External ID for accessing Glue (not supported yet) | No | Empty |

Example

CREATE CATALOG iceberg_glue_catalog PROPERTIES (
'type' = 'iceberg',
'iceberg.catalog.type' = 'glue',
'glue.region' = 'us-east-1',
'glue.endpoint' = 'https://glue.us-east-1.amazonaws.com',
'glue.access_key' = '<YOUR_ACCESS_KEY>',
'glue.secret_key' = '<YOUR_SECRET_KEY>'
);
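The example above omits warehouse, so it relies on the default listed in the parameter table (s3://doris). A variant that sets the path explicitly might look like the following sketch; the catalog name iceberg_glue_wh_catalog is hypothetical, and the bucket path is the illustrative value from the table, not a real location.

```sql
-- Sketch only: same catalog type, with an explicit warehouse path.
-- The catalog name and bucket path are illustrative assumptions.
CREATE CATALOG iceberg_glue_wh_catalog PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'glue',
    'warehouse' = 's3://my-bucket/iceberg-warehouse/',
    'glue.region' = 'us-east-1',
    'glue.endpoint' = 'https://glue.us-east-1.amazonaws.com',
    'glue.access_key' = '<YOUR_ACCESS_KEY>',
    'glue.secret_key' = '<YOUR_SECRET_KEY>'
);
```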

Iceberg Glue Rest Catalog

Iceberg Glue Rest Catalog accesses Glue through the Glue Rest Catalog interface. Currently, only Iceberg tables stored in an AWS S3 Table Bucket are supported. Configure it with the following parameters:

| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| type | Fixed as iceberg | Yes | None |
| iceberg.catalog.type | Fixed as rest | Yes | None |
| iceberg.rest.uri | Glue Rest service endpoint, e.g., https://glue.ap-east-1.amazonaws.com/iceberg | Yes | None |
| warehouse | Iceberg data warehouse path, e.g., <account_id>:s3tablescatalog/<bucket_name> | Yes | None |
| iceberg.rest.sigv4-enabled | Enable V4 signature format, fixed as true | Yes | None |
| iceberg.rest.signing-name | Signature type, fixed as glue | Yes | Empty |
| iceberg.rest.access-key-id | Access Key for accessing Glue (also used for accessing S3 Bucket) | Yes | Empty |
| iceberg.rest.secret-access-key | Secret Key for accessing Glue (also used for accessing S3 Bucket) | Yes | Empty |
| iceberg.rest.signing-region | AWS Glue region, e.g., us-east-1 | Yes | Empty |

Example

CREATE CATALOG glue_s3 PROPERTIES (
'type' = 'iceberg',
'iceberg.catalog.type' = 'rest',
'iceberg.rest.uri' = 'https://glue.<region>.amazonaws.com/iceberg',
'warehouse' = '<account_id>:s3tablescatalog/<s3_table_bucket_name>',
'iceberg.rest.sigv4-enabled' = 'true',
'iceberg.rest.signing-name' = 'glue',
'iceberg.rest.access-key-id' = '<ak>',
'iceberg.rest.secret-access-key' = '<sk>',
'iceberg.rest.signing-region' = '<region>'
);

Permission Policies

Depending on the usage scenario, the IAM policy for accessing Glue can be either read-only or read-write.

1. Read-Only Permissions

Only allows reading database and table information from Glue Catalog.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueCatalogReadOnly",
      "Effect": "Allow",
      "Action": [
        "glue:GetCatalog",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartitions"
      ],
      "Resource": [
        "arn:aws:glue:<region>:<account-id>:catalog",
        "arn:aws:glue:<region>:<account-id>:database/*",
        "arn:aws:glue:<region>:<account-id>:table/*/*"
      ]
    }
  ]
}

2. Read-Write Permissions

Extends the read-only permissions to also allow creating, modifying, and deleting databases and tables.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueCatalogReadWrite",
      "Effect": "Allow",
      "Action": [
        "glue:GetCatalog",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartitions",
        "glue:CreateDatabase",
        "glue:UpdateDatabase",
        "glue:DeleteDatabase",
        "glue:CreateTable",
        "glue:UpdateTable",
        "glue:DeleteTable"
      ],
      "Resource": [
        "arn:aws:glue:<region>:<account-id>:catalog",
        "arn:aws:glue:<region>:<account-id>:database/*",
        "arn:aws:glue:<region>:<account-id>:table/*/*"
      ]
    }
  ]
}

Notes

  1. Placeholder Replacement

    • <region> → Your AWS region (e.g., us-east-1).
    • <account-id> → Your AWS account ID (12-digit number).

  2. Principle of Least Privilege

    • If you only run queries, do not grant write permissions.
    • To restrict permissions further, replace * with specific database or table ARNs.

  3. S3 Permissions

    • The policies above cover only Glue Catalog.
    • Reading data files requires additional S3 permissions (such as s3:GetObject and s3:ListBucket).
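
The S3 note can be sketched as a separate policy like the one below. It grants only the two actions named in the note; <bucket-name> is a placeholder introduced here for the bucket that holds your data files, and the Sid is an arbitrary label. Treat it as a starting point under those assumptions, since your deployment may need a narrower prefix or additional actions.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3DataFilesReadOnly",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket-name>",
        "arn:aws:s3:::<bucket-name>/*"
      ]
    }
  ]
}
```

Both resource forms are listed because s3:ListBucket applies to the bucket ARN, while s3:GetObject applies to the object ARNs under it.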