Hive Catalog
By connecting to a Hive Metastore, or a metadata service compatible with the Hive Metastore interface, Doris can automatically obtain Hive database and table information and query the data.
In addition to Hive, many other systems use Hive Metastore to store metadata. Therefore, through the Hive Catalog, you can access not only Hive tables but also other table formats that use Hive Metastore for metadata storage, such as Iceberg and Hudi.
Applicable Scenarios
Scenario | Description |
---|---|
Query Acceleration | Use Doris's distributed computing engine to directly access Hive data for query acceleration. |
Data Integration | Read Hive data and write it to Doris internal tables, or perform ZeroETL operations using the Doris computing engine. |
Data Write-back | Process data from any source supported by Doris and write it back to Hive tables. |
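To make these scenarios concrete, the following hedged sketch assumes a Hive Catalog named `hive_ctl` containing a table `hive_db.hive_tbl`, and a Doris internal table `internal.db1.doris_tbl`; all names are illustrative:
-- Query acceleration: query Hive data directly with the Doris engine
SELECT count(*) FROM hive_ctl.hive_db.hive_tbl;
-- Data integration: read Hive data and write it into a Doris internal table
INSERT INTO internal.db1.doris_tbl SELECT * FROM hive_ctl.hive_db.hive_tbl;
-- Data write-back: process data in Doris and write it back to the Hive table
INSERT INTO hive_ctl.hive_db.hive_tbl SELECT * FROM internal.db1.doris_tbl;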
Configuring Catalog
Syntax
CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
'type'='hms', -- required
'hive.metastore.type' = '<hive_metastore_type>', -- optional
'hive.version' = '<hive_version>', -- optional
'fs.defaultFS' = '<fs_defaultfs>', -- optional
{MetaStoreProperties},
{StorageProperties},
{CommonProperties}
);
- `<hive_metastore_type>`

  Specifies the type of Hive Metastore:

  - `hms`: Standard Hive Metastore service.
  - `glue`: Access the AWS Glue metadata service through its Hive Metastore compatible interface.
  - `dlf`: Access the Alibaba Cloud DLF metadata service through its Hive Metastore compatible interface.

- `<hive_version>`

  Specifies the version of the Hive cluster (see the "Hive Transactional Tables" and "Column Type Mapping" notes below for cases where this is required).

- `<fs_defaultfs>`

  This parameter is required when writing data from Doris to tables in this Hive Catalog. Example:

  'fs.defaultFS' = 'hdfs://namenode:port'

- `{MetaStoreProperties}`

  The MetaStoreProperties section is for entering connection and authentication information for the Metastore metadata service. Refer to the "Supported Metadata Services" section for details.

- `{StorageProperties}`

  The StorageProperties section is for entering connection and authentication information for the storage system. Refer to the "Supported Storage Systems" section for details.

- `{CommonProperties}`

  The CommonProperties section is for entering common properties. Refer to the "Common Properties" section in the Catalog Overview.
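For example, a catalog that connects to a standard Hive Metastore and explicitly sets `hive.version` and `fs.defaultFS` might look like the following sketch; the metastore address, version number, and namenode address are placeholders to adjust for your environment:
CREATE CATALOG hive_catalog PROPERTIES (
    'type' = 'hms',
    'hive.metastore.type' = 'hms',
    'hive.version' = '3.1.2',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9083',
    'fs.defaultFS' = 'hdfs://namenode:8020'
);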
Supported Hive Versions
Supports Hive 1.x, 2.x, 3.x, and 4.x.
Hive transactional tables are supported from version 3.x onwards. For details, refer to the "Hive Transactional Tables" section.
Supported Metadata Services
Supported Storage Systems
To create Hive tables and write data through Doris, you need to explicitly add the `fs.defaultFS` property to the Catalog properties. If the Catalog is created only for querying, this parameter can be omitted.
Supported Data Formats
Column Type Mapping
Hive Type | Doris Type | Comment |
---|---|---|
boolean | boolean | |
tinyint | tinyint | |
smallint | smallint | |
int | int | |
bigint | bigint | |
date | date | |
timestamp | datetime(6) | Mapped to datetime with precision 6 |
float | float | |
double | double | |
decimal(P, S) | decimal(P, S) | Defaults to decimal(9, 0) if precision not specified |
char(N) | char(N) | |
varchar(N) | varchar(N) | |
string | string | |
binary | string | |
array | array | |
map | map | |
struct | struct | |
other | unsupported | |
Examples
Hive on HDFS
CREATE CATALOG hive_hdfs PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://172.0.0.1:9083'
);
Hive on HDFS with HA
CREATE CATALOG hive_hdfs_ha PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
'dfs.nameservices' = 'your-nameservice',
'dfs.ha.namenodes.your-nameservice' = 'nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1' = '172.21.0.2:8088',
'dfs.namenode.rpc-address.your-nameservice.nn2' = '172.21.0.3:8088',
'dfs.client.failover.proxy.provider.your-nameservice' = 'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
Hive on ViewFS
CREATE CATALOG hive_viewfs PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
'dfs.nameservices' = 'your-nameservice',
'dfs.ha.namenodes.your-nameservice' = 'nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1' = '172.21.0.2:8088',
'dfs.namenode.rpc-address.your-nameservice.nn2' = '172.21.0.3:8088',
'dfs.client.failover.proxy.provider.your-nameservice' = 'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'fs.defaultFS' = 'viewfs://your-cluster',
'fs.viewfs.mounttable.your-cluster.link./ns1' = 'hdfs://your-nameservice/',
'fs.viewfs.mounttable.your-cluster.homedir' = '/ns1'
);
Hive on S3
CREATE CATALOG hive_s3 PROPERTIES (
'type' = 'hms',
'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
's3.endpoint' = 's3.us-east-1.amazonaws.com',
's3.region' = 'us-east-1',
's3.access_key' = 'ak',
's3.secret_key' = 'sk',
'use_path_style' = 'true'
);
Query Operations
Basic Query
After configuring the Catalog, you can query the table data within the Catalog using the following method:
-- 1. switch to catalog, use database and query
SWITCH hive_ctl;
USE hive_db;
SELECT * FROM hive_tbl LIMIT 10;
-- 2. use hive database directly
USE hive_ctl.hive_db;
SELECT * FROM hive_tbl LIMIT 10;
-- 3. use full qualified name to query
SELECT * FROM hive_ctl.hive_db.hive_tbl LIMIT 10;
Querying Hive Partitions
You can query Hive partition information using the following two methods:

- `SHOW PARTITIONS FROM [catalog.][db.]hive_table`

  This statement lists all partitions and their values for the specified Hive table.

SHOW PARTITIONS FROM hive_table;
+--------------------------------+
| Partition |
+--------------------------------+
| pt1=2024-10-10/pt2=beijing |
| pt1=2024-10-10/pt2=shanghai |
| pt1=2024-10-11/pt2=beijing |
| pt1=2024-10-11/pt2=shanghai |
| pt1=2024-10-12/pt2=nanjing |
+--------------------------------+
- The `table$partitions` metadata table

  Starting from versions 2.1.7 and 3.0.3, you can query Hive partition information through the `table$partitions` metadata table. This metadata table is essentially relational: each partition column is exposed as a regular column, so it can be used in any SELECT statement.

SELECT * FROM hive_table$partitions;
+------------+-------------+
| pt1 | pt2 |
+------------+-------------+
| 2024-10-10 | beijing |
| 2024-10-10 | shanghai |
| 2024-10-12 | nanjing |
| 2024-10-11 | beijing |
| 2024-10-11 | shanghai |
+------------+-------------+
Querying Hive Transactional Tables
Hive Transactional tables support ACID semantics. For more details, see Hive Transactions.

- Support for Hive Transactional Tables

Table Type | Supported Operations in Hive | Hive Table Properties | Supported Hive Versions |
---|---|---|---|
Full-ACID Transactional Table | Insert, Update, Delete | 'transactional'='true' | 4.x, 3.x, 2.x (2.x requires running Major Compaction in Hive before the table can be read) |
Insert-Only Transactional Table | Insert only | 'transactional'='true', 'transactional_properties'='insert_only' | 4.x, 3.x, 2.x (specify `hive.version` when creating the catalog) |

- Current Limitations

  Original Files scenarios are not supported. When a table is converted to a transactional table, new data files use the Hive transactional table schema, but existing data files are not converted. These files are referred to as Original Files.
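For example, to query insert-only transactional tables on a Hive 2.x cluster, the catalog would declare the Hive version as noted in the table above; a minimal sketch with a placeholder metastore address and version number:
CREATE CATALOG hive_acid_ctl PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://127.0.0.1:9083',
    'hive.version' = '2.3.7'
);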
Querying Hive Views
You can query Hive Views, but there are some limitations:

- The Hive View definition (HiveQL) must be compatible with SQL statements supported by Doris. Otherwise, a parsing error will occur.
- Some functions supported by HiveQL may have the same name as those in Doris but behave differently. This could lead to discrepancies between the results obtained from Hive and Doris. If you encounter such issues, please report them to the community.
Write Operations
Data can be written to Hive tables using the INSERT statement. This is supported for Hive tables created by Doris or existing Hive tables with compatible formats.
For partitioned tables, data will automatically be written to the corresponding partition or a new partition will be created based on the data. Currently, specifying a partition for writing is not supported.
INSERT INTO
The INSERT operation appends data to the target table. Specifying a partition for writing is currently not supported.
INSERT INTO hive_tbl VALUES (val1, val2, val3, val4);
INSERT INTO hive_ctl.hive_db.hive_tbl SELECT col1, col2 FROM internal.db1.tbl1;
INSERT INTO hive_tbl(col1, col2) VALUES (val1, val2);
INSERT INTO hive_tbl(col1, col2, partition_col1, partition_col2) VALUES (1, 2, "beijing", "2023-12-12");
INSERT OVERWRITE
INSERT OVERWRITE completely replaces the existing data in the table with new data. Specifying a partition for writing is currently not supported.
INSERT OVERWRITE TABLE hive_tbl VALUES (val1, val2, val3, val4);
INSERT OVERWRITE TABLE hive_ctl.hive_db.hive_tbl(col1, col2) SELECT col1, col2 FROM internal.db1.tbl1;
The semantics of INSERT OVERWRITE are consistent with Hive, with the following behaviors:

- If the target table is partitioned and the source table is empty, the operation has no effect; the target table remains unchanged.
- If the target table is non-partitioned and the source table is empty, the target table will be cleared.
- Since specifying a partition for writing is not supported, INSERT OVERWRITE automatically handles the relevant partitions in the target table based on the source data values. If the target table is partitioned, only the affected partitions are overwritten; unaffected partitions remain unchanged.
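As an illustration of the partition-level semantics, the following sketch assumes a hypothetical target table partitioned by `pt1` and a hypothetical source table `internal.db1.src_tbl`; only partitions whose `pt1` values appear in the query result are replaced, while the rest remain unchanged:
-- Overwrites only the partitions produced by the SELECT (here pt1 = '2024-10-10');
-- all other partitions of the target table are left untouched.
INSERT OVERWRITE TABLE hive_ctl.hive_db.hive_tbl
SELECT col1, col2, pt1 FROM internal.db1.src_tbl WHERE pt1 = '2024-10-10';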
CTAS
You can create a Hive table and insert data using a CTAS (CREATE TABLE AS SELECT) statement:
CREATE TABLE hive_ctas ENGINE=hive AS SELECT * FROM other_table;
CTAS supports specifying file formats, partitioning methods, and more, as shown below:
CREATE TABLE hive_ctas ENGINE=hive
PARTITION BY LIST (pt1, pt2) ()
AS SELECT col1, pt1, pt2 FROM part_ctas_src WHERE col1 > 0;
CREATE TABLE hive_ctl.hive_db.hive_ctas (col1, col2, pt1) ENGINE=hive
PARTITION BY LIST (pt1) ()
PROPERTIES (
"file_format"="parquet",
"compression"="zstd"
)
AS SELECT col1, pt1 AS col2, pt2 AS pt1 FROM test_ctas.part_ctas_src WHERE col1 > 0;
Related Parameters

- BE

Parameter Name | Default Value | Description |
---|---|---|
`hive_sink_max_file_size` | 1GB | Maximum size of a single data file. When the written data exceeds this size, the current file is closed and a new file is created for further writing. |
`table_sink_partition_write_max_partition_nums_per_writer` | 128 | Maximum number of partitions each instance can write to on a BE node. |
`table_sink_non_partition_write_scaling_data_processed_threshold` | 25MB | Data volume threshold for starting scaling-write on non-partitioned tables. A new writer (instance) is used for every additional `table_sink_non_partition_write_scaling_data_processed_threshold` of data. This mechanism adjusts the number of writers (instances) based on data volume to enhance concurrent write throughput, while saving resources and minimizing the number of files for smaller data volumes. |
`table_sink_partition_write_min_data_processed_rebalance_threshold` | 25MB | Minimum data volume threshold to trigger rebalancing for partitioned tables. Rebalancing is triggered when (current accumulated data volume) - (data volume at the last rebalancing, or at initial accumulation) >= `table_sink_partition_write_min_data_processed_rebalance_threshold`. If the final file sizes vary significantly, reduce this threshold to improve balance; a lower threshold, however, increases rebalancing cost and may affect performance. |
`table_sink_partition_write_min_partition_data_processed_rebalance_threshold` | 15MB | Minimum partition data volume threshold to trigger rebalancing. Rebalancing occurs when (current partition data volume) >= (threshold) * (number of tasks already allocated to the partition). If the final file sizes vary significantly, reduce this threshold to improve balance; a lower threshold, however, increases rebalancing cost and may affect performance. |
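As a hedged sketch, these items could be tuned in the BE configuration file `be.conf`; the values below are illustrative only, and the assumption here is that `hive_sink_max_file_size` is specified in bytes:
# be.conf (illustrative values, not recommendations)
hive_sink_max_file_size = 2147483648
table_sink_partition_write_max_partition_nums_per_writer = 128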
Database and Table Management
Users can create and delete databases and tables in the Hive Metastore through Doris. Note that Doris only calls the Hive Metastore API for these operations and does not store or persist any Hive metadata itself.
Creating and Dropping Databases
You can switch to the appropriate Catalog using the `SWITCH` statement and execute the `CREATE DATABASE` statement:
SWITCH hive_ctl;
CREATE DATABASE [IF NOT EXISTS] hive_db;
You can also create a database using a fully qualified name or specify a location, such as:
CREATE DATABASE [IF NOT EXISTS] hive_ctl.hive_db;
CREATE DATABASE [IF NOT EXISTS] hive_ctl.hive_db
PROPERTIES ('location'='hdfs://172.21.16.47:4007/path/to/db/');
You can view the location information of the Database using the `SHOW CREATE DATABASE` command:
mysql> SHOW CREATE DATABASE hive_db;
+----------+---------------------------------------------------------------------------------------------+
| Database | Create Database |
+----------+---------------------------------------------------------------------------------------------+
| hive_db | CREATE DATABASE hive_db LOCATION 'hdfs://172.21.16.47:4007/usr/hive/warehouse/hive_db.db' |
+----------+---------------------------------------------------------------------------------------------+
To drop a database:
DROP DATABASE [IF EXISTS] hive_ctl.hive_db;
For a Hive Database, you must first delete all tables under that Database before you can delete the Database itself; otherwise, an error will occur. This operation will also delete the corresponding Database in Hive.
Creating and Dropping Tables

- Creating Tables
Doris supports creating both partitioned and non-partitioned tables in Hive.
-- Create unpartitioned hive table
CREATE TABLE unpartitioned_table (
`col1` BOOLEAN COMMENT 'col1',
`col2` INT COMMENT 'col2',
`col3` BIGINT COMMENT 'col3',
`col4` CHAR(10) COMMENT 'col4',
`col5` FLOAT COMMENT 'col5',
`col6` DOUBLE COMMENT 'col6',
`col7` DECIMAL(9,4) COMMENT 'col7',
`col8` VARCHAR(11) COMMENT 'col8',
`col9` STRING COMMENT 'col9'
) ENGINE=hive
PROPERTIES (
'file_format'='parquet'
);
-- Create partitioned hive table
-- The partition columns must be in table's column definition list
CREATE TABLE partition_table (
`col1` BOOLEAN COMMENT 'col1',
`col2` INT COMMENT 'col2',
`col3` BIGINT COMMENT 'col3',
`col4` DECIMAL(2,1) COMMENT 'col4',
`pt1` VARCHAR COMMENT 'pt1',
`pt2` VARCHAR COMMENT 'pt2'
) ENGINE=hive
PARTITION BY LIST (pt1, pt2) ()
PROPERTIES (
'file_format'='orc',
'compression'='zlib'
);
-- Create text format table(Since 2.1.7 & 3.0.3)
CREATE TABLE text_table (
`id` INT,
`name` STRING
) PROPERTIES (
'file_format'='text',
'compression'='gzip',
'field.delim'='\t',
'line.delim'='\n',
'collection.delim'=';',
'mapkey.delim'=':',
'serialization.null.format'='\\N',
'escape.delim'='\\'
);

After creating a table, you can view the Hive table creation statement using the `SHOW CREATE TABLE` command.

Note that unlike Hive's table creation syntax, when creating a partitioned table in Doris, partition columns must be included in the table schema. Additionally, partition columns must be placed at the end of the schema and maintain the same order.

Tip: For Hive clusters where ACID transaction features are enabled by default, tables created by Doris will have the `transactional` property set to `true`. Since Doris only supports certain features of Hive transactional tables, this may lead to issues where Doris cannot read the Hive tables it creates. To avoid this, explicitly set `'transactional' = 'false'` in the table properties to create non-transactional Hive tables:

CREATE TABLE non_acid_table(
`col1` BOOLEAN COMMENT 'col1',
`col2` INT COMMENT 'col2',
`col3` BIGINT COMMENT 'col3'
) ENGINE=hive
PROPERTIES (
'transactional'='false'
);
- Dropping Tables

  You can drop a Hive table using the `DROP TABLE` statement. When a table is dropped, all of its data, including partition data, is removed as well. (See the example after this list.)

- Column Type Mapping

  Refer to the "Column Type Mapping" section above. Note the following restrictions:

  - Columns must be of the default nullable type; `NOT NULL` is not supported.
  - Hive 3.0 supports setting default values. To set default values, explicitly add `"hive.version" = "3.0.0"` in the catalog properties.
  - If the inserted data type is incompatible (e.g., inserting `'abc'` into a numeric type), the value will be converted to `null`.
- Partitioning

  Hive partition types correspond to List partitions in Doris. Therefore, when creating a Hive partitioned table in Doris, use the List partition syntax, but there is no need to explicitly enumerate each partition. Doris automatically creates the corresponding Hive partition based on the data values during insertion. Single-column and multi-column partitioned tables are supported.

- File Formats

  - ORC (default)

  - Parquet

  - Text (supported since versions 2.1.7 and 3.0.3)

    Text format supports the following table properties:

    - `field.delim`: Column delimiter. Default is `\1`.
    - `line.delim`: Line delimiter. Default is `\n`.
    - `collection.delim`: Delimiter for elements in complex types. Default is `\2`.
    - `mapkey.delim`: Delimiter for map key-value pairs. Default is `\3`.
    - `serialization.null.format`: Format for storing `NULL` values. Default is `\N`.
    - `escape.delim`: Escape character. Default is `\`.

- Compression Formats

  - Parquet: snappy (default), zstd, plain (no compression)
  - ORC: snappy, zlib (default), zstd, plain (no compression)
  - Text: gzip, deflate, bzip2, zstd, lz4, lzo, snappy, plain (default, no compression)

- Storage Medium

  - HDFS
  - Object Storage
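As a concrete follow-up to the Dropping Tables item above, a table created in this catalog can be removed with a fully qualified name (a sketch; the catalog, database, and table names are the hypothetical ones used earlier):
DROP TABLE IF EXISTS hive_ctl.hive_db.unpartitioned_table;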
Subscribing to Hive Metastore Events
By having the FE nodes periodically read Notification Events from the HMS, Doris can detect real-time changes in Hive table metadata, improving metadata timeliness. Currently, the following events are supported:
Event | Action and Corresponding Behavior |
---|---|
CREATE DATABASE | Creates a database in the corresponding data directory. |
DROP DATABASE | Deletes a database in the corresponding data directory. |
ALTER DATABASE | Mainly affects changes to database properties, comments, and default storage locations. These changes do not affect Doris's ability to query external data directories, so this event is currently ignored. |
CREATE TABLE | Creates a table in the corresponding database. |
DROP TABLE | Deletes a table in the corresponding database and invalidates the table cache. |
ALTER TABLE | If renamed, deletes the old table and creates a new one with the new name; otherwise, invalidates the table cache. |
ADD PARTITION | Adds a partition to the cached partition list of the corresponding table. |
DROP PARTITION | Removes a partition from the cached partition list and invalidates the partition cache. |
ALTER PARTITION | If renamed, deletes the old partition and creates a new one with the new name; otherwise, invalidates the partition cache. |
- When a data import changes files, partitioned tables trigger an `ALTER PARTITION` event, while non-partitioned tables trigger an `ALTER TABLE` event.

- If you bypass HMS and manipulate the file system directly, HMS will not generate the corresponding events, and Doris will not detect the metadata changes.

The following parameters in `fe.conf` are related to this feature:

- `enable_hms_events_incremental_sync`: Enables automatic incremental metadata synchronization. Disabled by default.
- `hms_events_polling_interval_ms`: Interval for reading events. Default is 10000 milliseconds.
- `hms_events_batch_size_per_rpc`: Maximum number of events to read per RPC. Default is 500.
To use this feature (excluding Huawei Cloud MRS), you need to modify the `hive-site.xml` of HMS and restart both HMS and HiveServer2.
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.dml.events</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.transactional.event.listeners</name>
<value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
</property>
For Huawei Cloud MRS, you need to modify `hivemetastore-site.xml` and restart both HMS and HiveServer2.
<property>
<name>metastore.transactional.event.listeners</name>
<value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
</property>
Appendix
Transaction Mechanism
Write operations to Hive are placed in a separate transaction. Before the transaction is committed, the data is not visible externally. Only after the transaction is committed do the related table operations become visible to others.
Transactions ensure the atomicity of operations, meaning all operations within a transaction either succeed together or fail together.
Transactions cannot fully guarantee isolation of operations, but they strive to minimize inconsistencies by separating file system operations from Hive Metastore metadata operations.
For example, in a transaction that requires modifying multiple partitions of a Hive table, if the task is divided into two batches, the first batch might be visible externally before the second batch is completed. This means the first batch of partitions can be read, but the second batch cannot.
If any exceptions occur during the transaction commit process, the transaction will be rolled back completely, including modifications to HDFS files and Hive Metastore metadata, without requiring any additional user intervention.
Concurrent Writing Mechanism
Apache Doris currently supports concurrent writing using multiple insert statements. However, users need to ensure that concurrent writes do not result in potential conflicts.
Since regular non-transactional Hive tables lack a complete transaction mechanism, the Apache Doris transaction mechanism aims to minimize inconsistency windows but cannot guarantee true ACID properties. Therefore, concurrent writes to Hive tables in Apache Doris may lead to data consistency issues.
- Concurrent `INSERT` Operations

  `INSERT` operations append data and do not conflict when executed concurrently; they produce the expected results.

- Concurrent `INSERT OVERWRITE` Operations

  Concurrent `INSERT OVERWRITE` operations on the same table or partition may lead to data loss or corruption, resulting in unpredictable outcomes.

  Common solutions include:

  - For partitioned tables, write data to different partitions. Concurrent operations on different partitions do not conflict.
  - For non-partitioned tables, use `INSERT` instead of `INSERT OVERWRITE` to avoid conflicts.
  - For operations that may conflict, ensure on the business side that only one write operation occurs at a time.
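For instance, two jobs running concurrently could safely target different partitions of the same hypothetical partitioned table (all table and column names are illustrative):
-- Job 1 writes only to partition pt1 = '2024-10-10'
INSERT INTO hive_ctl.hive_db.hive_tbl SELECT col1, col2, '2024-10-10' FROM internal.db1.src_a;
-- Job 2 writes only to partition pt1 = '2024-10-11'
INSERT INTO hive_ctl.hive_db.hive_tbl SELECT col1, col2, '2024-10-11' FROM internal.db1.src_b;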
HDFS File Operations
For Hive table data on HDFS, data is typically written to a temporary directory first and then finalized with file system operations such as `rename`. Below is a detailed explanation of the specific HDFS file operations for the different data operations.
The temporary directory format is: `/tmp/.doris_staging/<username>/<uuid>`
The format of the written data file names is: `<query-id>_<uuid>-<index>.<compress-type>.<file-type>`
Here are examples of file operations under various scenarios:
- Non-Partitioned Table

  - Append (add data)

    - Target table directory: `hdfs://ns/usr/hive/warehouse/example.db/table1`
    - Temporary file: `hdfs://ns/tmp/.doris_staging/root/f02247cb662846038baae272af5eeb05/b35fdbcea3a4e39-86d1f36987ef1492_7e3985bf-9de9-4fc7-b84e-adf11aa08756-0.orc`
    - During the commit phase, all temporary files are moved to the target table directory.

  - Overwrite (replace data)

    - Target table directory: `hdfs://ns/usr/hive/warehouse/example.db/table1`
    - Temporary file: `hdfs://ns/tmp/.doris_staging/root/f02247cb662846038baae272af5eeb05/b35fdbcea3a4e39-86d1f36987ef1492_7e3985bf-9de9-4fc7-b84e-adf11aa08756-0.orc`
    - Commit phase steps:
      1. Rename the target table directory to a temporary directory: `hdfs://ns/usr/hive/warehouse/example.db/_temp_b35fdbcea3a4e39-86d1f36987ef1492_table1`
      2. Rename the temporary directory to the target table directory.
      3. Delete the temporary target table directory.

- Partitioned Table

  - Add (write to a new partition)

    - Target table directory: `hdfs://ns/usr/hive/warehouse/example.db/table2/part_col=2024-01-01`
    - Temporary file: `hdfs://ns/tmp/.doris_staging/root/a7eac7505d7a42fdb06cb9ef1ea3e912/par1=a/d678a74d232345e0-b659e2fb58e86ffd_549ad677-ee75-4fa1-b8a6-3e821e1dae61-0.orc`
    - During the commit phase, the temporary directory is renamed to the target table directory.

  - Append (add data to an existing partition)

    - Target table directory: `hdfs://ns/usr/hive/warehouse/example.db/table2/part_col=2024-01-01`
    - Temporary file: `hdfs://ns/tmp/.doris_staging/root/a7eac7505d7a42fdb06cb9ef1ea3e912/par1=a/d678a74d232345e0-b659e2fb58e86ffd_549ad677-ee75-4fa1-b8a6-3e821e1dae61-0.orc`
    - During the commit phase, files from the temporary directory are moved to the target table directory.

  - Overwrite (replace an existing partition)

    - Target table directory: `hdfs://ns/usr/hive/warehouse/example.db/table2/part_col=2024-01-01`
    - Temporary file: `hdfs://ns/tmp/.doris_staging/root/a7eac7505d7a42fdb06cb9ef1ea3e912/par1=a/d678a74d232345e0-b659e2fb58e86ffd_549ad677-ee75-4fa1-b8a6-3e821e1dae61-0.orc`
    - Commit phase steps:
      1. Rename the target partition directory to a temporary partition directory: `hdfs://ns/usr/hive/warehouse/example.db/table2/_temp_d678a74d232345e0-b659e2fb58e86ffd_part_col=2024-01-01`
      2. Rename the temporary partition directory to the target partition directory.
      3. Delete the temporary target partition directory.
Change Log
Doris Version | Feature Support |
---|---|
2.1.6 | Support for writing back to Hive tables |
3.0.4 | Support for Hive tables in JsonSerDe format. Support for transactional tables in Hive4. |