
Delta Lake Catalog

Delta Lake Catalog uses the Trino Connector compatibility framework to access Delta Lake tables through the Delta Lake Connector.

This feature is experimental and has been supported since version 3.0.1.

Application Scenarios

| Scenario | Description |
| --- | --- |
| Data Integration | Read Delta Lake data and write it into Doris internal tables. |
| Data Writeback | Not supported. |

Environment Preparation

Compile the Delta Lake Connector Plugin

JDK 17 is required.

$ git clone https://github.com/apache/doris-thirdparty.git
$ cd doris-thirdparty
$ git checkout trino-435
$ cd plugin/trino-delta-lake
$ mvn clean install -DskipTests
$ cd ../../lib/trino-hdfs
$ mvn clean install -DskipTests

After compiling, you will find the trino-delta-lake-435 directory under plugin/trino-delta-lake/target/ and the hdfs directory under lib/trino-hdfs/target/ in the doris-thirdparty repository.

You can also directly download the precompiled trino-delta-lake-435-20240724.tar.gz and hdfs.tar.gz, then extract them.

Deploy the Delta Lake Connector

Place the trino-delta-lake-435/ directory in the connectors/ directory of every FE and BE deployment path (create connectors/ manually if it does not exist), then extract hdfs.tar.gz into the trino-delta-lake-435/ directory. The resulting layout looks like this:

├── bin
├── conf
├── connectors
│   ├── trino-delta-lake-435
│   │   ├── hdfs
...
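The copy-and-extract step has to be repeated on every FE and BE node, so it can be convenient to script it. The following is a minimal sketch, assuming bash. The function name deploy_delta_lake_connector and its arguments are hypothetical (not part of Doris), and it assumes hdfs.tar.gz unpacks into a top-level hdfs/ directory, as the layout above suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Doris): copies the Delta Lake connector
# into one FE or BE deployment path and unpacks hdfs.tar.gz inside it.
# Assumption: hdfs.tar.gz contains a top-level hdfs/ directory.
# Usage: deploy_delta_lake_connector <deploy_path> <plugin_dir> <hdfs_tarball>
deploy_delta_lake_connector() {
  local deploy_path="${1%/}"   # FE or BE deployment path (trailing slash stripped)
  local plugin_dir="${2%/}"    # the trino-delta-lake-435 directory
  local hdfs_tarball="$3"      # path to hdfs.tar.gz
  local connectors_dir="$deploy_path/connectors"
  local target
  target="$connectors_dir/$(basename "$plugin_dir")"

  # Create connectors/ if it does not exist yet
  mkdir -p "$connectors_dir" || return 1
  # Copy the plugin directory into connectors/
  cp -r "$plugin_dir" "$connectors_dir/" || return 1
  # Unpack hdfs.tar.gz inside the copied plugin directory
  tar -xzf "$hdfs_tarball" -C "$target" || return 1
}

# Example (paths are placeholders), run once per FE and BE deployment path:
# deploy_delta_lake_connector /path/to/fe ./trino-delta-lake-435 ./hdfs.tar.gz
# deploy_delta_lake_connector /path/to/be ./trino-delta-lake-435 ./hdfs.tar.gz
```

Stripping trailing slashes from the paths keeps cp from copying only the directory's contents on BSD-style cp implementations.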

After deployment, it is recommended to restart the FE and BE nodes to ensure the Connector is loaded correctly.

Configuring Catalog

Syntax

CREATE CATALOG [IF NOT EXISTS] catalog_name
PROPERTIES (
'type' = 'trino-connector', -- required
'trino.connector.name' = 'delta_lake', -- required
{TrinoProperties},
{CommonProperties}
);
  • {TrinoProperties}

    The TrinoProperties section specifies properties that are passed to the Trino Connector. These properties must carry the trino. prefix, for example trino.hive.metastore.uri. In principle, any property supported by the Trino Delta Lake Connector can be set here; refer to the Trino documentation for the full list.

  • {CommonProperties}

    The CommonProperties section specifies general catalog properties. For details, see the "Common Properties" section of the Catalog Overview.

Supported Delta Lake Versions

Supported Delta Lake versions are determined by the Trino Delta Lake Connector. Refer to the Trino documentation for details.

Supported Metadata Services

Supported metadata services are those of the Trino Delta Lake Connector. Refer to the Trino documentation for details.

Supported Storage Systems

Supported storage systems are those of the Trino Delta Lake Connector. Refer to the Trino documentation for details.

Column Type Mapping

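| Delta Lake Type | Trino Type | Doris Type | Comment |
| --- | --- | --- | --- |
| boolean | boolean | boolean | |
| int | int | int | |
| byte | tinyint | tinyint | |
| short | smallint | smallint | |
| long | bigint | bigint | |
| float | real | float | |
| double | double | double | |
| decimal(P, S) | decimal(P, S) | decimal(P, S) | |
| string | varchar | string | |
| binary | varbinary | string | |
| date | date | date | |
| timestamp_ntz | timestamp(N) | datetime(N) | |
| timestamp | timestamp with time zone(N) | datetime(N) | |
| array | array | array | |
| map | map | map | |
| struct | row | struct | |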

Examples

CREATE CATALOG delta_lake_hms properties ( 
'type' = 'trino-connector',
'trino.connector.name' = 'delta_lake',
'trino.hive.metastore' = 'thrift',
'trino.hive.metastore.uri'= 'thrift://ip:port',
'trino.hive.config.resources'='/path/to/core-site.xml,/path/to/hdfs-site.xml'
);

Query Operations

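After configuring the catalog, you can query its tables in any of the following ways. Replace delta_lake_ctl, delta_lake_db, and delta_lake_tbl with your own catalog, database, and table names (for example, delta_lake_hms for the catalog created above):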

-- 1. Switch to the catalog, use the database, and query
SWITCH delta_lake_ctl;
USE delta_lake_db;
SELECT * FROM delta_lake_tbl LIMIT 10;

-- 2. Use the Delta Lake database directly
USE delta_lake_ctl.delta_lake_db;
SELECT * FROM delta_lake_tbl LIMIT 10;

-- 3. Use the fully qualified name to query
SELECT * FROM delta_lake_ctl.delta_lake_db.delta_lake_tbl LIMIT 10;