Regression Testing

Basic Concepts

Suite: A test case (the file name)
Group: A test set (the directory that the test case belongs to)
Action: An encapsulated test action, such as sql_action for SQL execution, test_action for result verification, and streamLoad_action for data ingestion.

Steps

Get the clusters ready
Modify the configuration files ${DORIS_HOME}/regression-test/conf/regression-conf.groovy, set items, such as JDBC URL and user
Create the test case files and write the test cases
If a test case includes a qt Action, you also need to create the relevant data files. For example, the case suites/demo/qt_action.groovy will require a TSV file data/demo/qt_action.out for output verification.
Run ${DORIS_HOME}/run-regression-test.sh to test all cases, or run ${DORIS_HOME}/run-regression-test.sh --run <suiteName> to test a few cases. For more examples, please refer to the "Startup Script Examples" section on this page.

Directory Structure

Key files and directories to pay attention to:

run-regression-test.sh: Startup script
regression-conf.groovy: Default configurations for regression test
data: where the the input and output data is
suites: where test cases are

./${DORIS_HOME}
    |-- **run-regression-test.sh**           Startup script for regression test
    |-- regression-test
    |   |-- plugins                          Plug-ins
    |   |-- conf
    |   |   |-- logback.xml                  Log configurations
    |   |   |-- **regression-conf.groovy**   Default configurations
    |   |
    |   |-- framework                        framework source code for regression test
    |   |-- **data**                         Input and output data files of test cases
    |   |   |-- demo                         Input and output data files of demo
    |   |   |-- correctness                  Input and output data files of correctness test cases
    |   |   |-- performance                  Input and output data files of performance test cases
    |   |   |-- utils                        Input and output data files of other utilities
    |   |
    |   |-- **suites**                       Test cases for regression testing
    |       |-- demo                         Demo of test cases
    |       |-- correctness                  Test cases of correctness tests
    |       |-- performance                  Test cases of performance tests
    |       |-- utils                        Other utilities
    |
    |-- output
        |-- regression-test
            |-- log                          Logs of regression testing

Default Configurations of the Framework

Modify the JDBC and FE configurations based on the case.

/* ============ Key parts to pay attention to ============ */
// The default database, which will be created if the user does not create a database
defaultDb = "regression_test"

// JDBC configurations
jdbcUrl = "jdbc:mysql://127.0.0.1:9030/?"
jdbcUser = "root"
jdbcPassword = ""

// FE address configurations, used for Stream Load
feHttpAddress = "127.0.0.1:8030"
feHttpUser = "root"
feHttpPassword = ""

/* ============ Configurations that usually do not require modifications ============ */

// DORIS_HOME is loaded by run-regression-test.sh
// which is java -DDORIS_HOME=./

// Set the path for the test cases
suitePath = "${DORIS_HOME}/regression-test/suites"
// Set the path for the input and output data
dataPath = "${DORIS_HOME}/regression-test/data"
// Set the path for the plug-ins
pluginPath = "${DORIS_HOME}/regression-test/plugins"

// By default, all groups will be read. The groups are separated by comma. For example, "demo,performance".
// This is dynamically specified and overwritten via run-regression-test.sh --run -g
testGroups = ""
// By default, all suites will be read. This is dynamically specified and overwritten via run-regression-test.sh --run -s
testSuites = ""
// Directories specified in this parameter will be loaded by default. It can be dynamically specified and overwritten via run-regression-test.sh --run -d
testDirectories = ""

// Groups specified in this parameter will be excluded. It can be dynamically specified and overwritten via run-regression-test.sh --run -xg
excludeGroups = ""
// Suites specified in this parameter will be excluded. It can be dynamically specified and overwritten via run-regression-test.sh --run -xs
excludeSuites = ""
// Directories specified in this parameter will be excluded. It can be dynamically specified and overwritten via run-regression-test.sh --run -xd
excludeDirectories = ""

// Other self-defined configurations
customConf1 = "test_custom_conf_value"

Steps to Write Test Cases

Enter the ${DORIS_HOME}/regression-test directory
Choose a directory based on the test. For correctness test, place the test in suites/correctness; for performance test, put the test case in suites/performance.
Create a groovy test case file, and add a few Action for testing.

Action

Action refers to a test action, which is defined by DSL and provided in the test framework.

sql action

A sql_action is used to commit SQL and obtain results. If the query fails, an error will be thrown.

Parameters:

String sql: the input SQL string
return List<List<Object>>: the query result. If it is DDL/DML, a result of one row and one column will be returned and the value will be updateRowCount.

You can find the following sample code in ${DORIS_HOME}/regression-test/suites/demo/sql_action.groovy:

suite("sql_action", "demo") {
    // execute sql and ignore result
    sql "show databases"

    // execute sql and get result, outer List denote rows, inner List denote columns in a single row
    List<List<Object>> tables = sql "show tables"

    // assertXxx() will invoke junit5's Assertions.assertXxx() dynamically
    assertTrue(tables.size() >= 0) // test rowCount >= 0

    // syntax error
    try {
        sql "a b c d e"
        throw new IllegalStateException("Should be syntax error")
    } catch (java.sql.SQLException t) {
        assertTrue(true)
    }

    def testTable = "test_sql_action1"

    try {
        sql "DROP TABLE IF EXISTS ${testTable}"

        // multi-line sql
        def result1 = sql """
                        CREATE TABLE IF NOT EXISTS ${testTable} (
                            id int
                        )
                        DISTRIBUTED BY HASH(id) BUCKETS 1
                        PROPERTIES (
                          "replication_num" = "1"
                        ) 
                        """

        // DDL/DML return 1 row and 1 column, the only value is update row count
        assertTrue(result1.size() == 1)
        assertTrue(result1[0].size() == 1)
        assertTrue(result1[0][0] == 0, "Create table should update 0 rows")

        def result2 = sql "INSERT INTO test_sql_action1 values(1), (2), (3)"
        assertTrue(result2.size() == 1)
        assertTrue(result2[0].size() == 1)
        assertTrue(result2[0][0] == 3, "Insert should update 3 rows")
    } finally {
        /**
         * try_xxx(args) means:
         *
         * try {
         *    return xxx(args)
         * } catch (Throwable t) {
         *     // do nothing
         *     return null
         * }
         */
        try_sql("DROP TABLE IF EXISTS ${testTable}")

        // you can see the error sql will not throw exception and return
        try {
            def errorSqlResult = try_sql("a b c d e f g")
            assertTrue(errorSqlResult == null)
        } catch (Throwable t) {
            assertTrue(false, "Never catch exception")
        }
    }

    // order_sql(sqlStr) equals to sql(sqlStr, isOrder=true)
    // sort result by string dict
    def list = order_sql """
                select 2
                union all
                select 1
                union all
                select null
                union all
                select 15
                union all
                select 3
                """

    assertEquals(null, list[0][0])
    assertEquals(1, list[1][0])
    assertEquals(15, list[2][0])
    assertEquals(2, list[3][0])
    assertEquals(3, list[4][0])
}

qt action

A qt_action is used to commit SQL and verify the output results based on the .out TSV file.

String sql: the input SQL string
return void

You can find the following sample code in ${DORIS_HOME}/regression-test/suites/demo/qt_action.groovy:

suite("qt_action", "demo") {
    /**
     * qt_xxx sql equals to quickTest(xxx, sql) in which xxx is tag.
     * The result will be compared to the relevant file: ${DORIS_HOME}/regression_test/data/qt_action.out.
     *
     * If you want to generate .out tsv file, you can run -genOut or -forceGenOut.
     * e.g
     *   ${DORIS_HOME}/run-regression-test.sh --run qt_action -genOut
     *   ${DORIS_HOME}/run-regression-test.sh --run qt_action -forceGenOut
     */
    qt_select "select 1, 'beijing' union all select 2, 'shanghai'"

    qt_select2 "select 2"

    // Order result by string dict then compare to .out file.
    // order_qt_xxx sql equals to quickTest(xxx, sql, true).
    order_qt_union_all  """
                select 2
                union all
                select 1
                union all
                select null
                union all
                select 15
                union all
                select 3
                """
}

test action

A test_action is to test with more complicated verification rules, such as to verify the number of rows, execution time, and the existence of errors

Parameters:

String sql: the input SQL string
List<List<Object>> result: provide a list to check if the output is equal to that list
Iterator<Object> resultIterator: provide an iterator to check if the output is equal to that iterator
String resultFile: provide a file URI (a relative path of a local file or a http(s) path) to check if the output is equal to the http response stream. The format is similar to that of the .out file, but there is no header or comment.
String exception: check if the error thrown includes certain strings
long rowNum: check the number of rows
long time: check if the execution time is shorter than this value, which is measured in millisecond
Closure<List<List<Object>>, Throwable, Long, Long> check: self-defined callback validation function, in which you can input the query result, error thrown, and response time. When there is a callback validation function, other verification methods will fail.

You can find the following sample code in ${DORIS_HOME}/regression-test/suites/demo/test_action.groovy:

suite("test_action", "demo") {
    test {
        sql "abcdefg"
        // check exception message contains
        exception "errCode = 2, detailMessage = Syntax error"
    }

    test {
        sql """
            select *
            from (
                select 1 id
                union all
                select 2
            ) a
            order by id"""

        // multi check condition

        // check return 2 rows
        rowNum 2
        // execute time must <= 5000 millisecond
        time 5000
        // check result, must be 2 rows and 1 column, the first row is 1, second is 2
        result(
            [[1], [2]]
        )
    }

    test {
        sql "a b c d e f g"

        // other check will not work because already declared a check callback
        exception "aaaaaaaaa"

        // callback
        check { result, exception, startTime, endTime ->
            // assertXxx() will invoke junit5's Assertions.assertXxx() dynamically
            assertTrue(exception != null)
        }
    }

    test {
        sql  """
                select 2
                union all
                select 1
                union all
                select null
                union all
                select 15
                union all
                select 3
                """

        check { result, ex, startTime, endTime ->
            // same as order_sql(sqlStr)
            result = sortRows(result)

            assertEquals(null, result[0][0])
            assertEquals(1, result[1][0])
            assertEquals(15, result[2][0])
            assertEquals(2, result[3][0])
            assertEquals(3, result[4][0])
        }
    }

    // execute sql and order query result, then compare to iterator
    def selectValues = [1, 2, 3, 4]
    test {
        order true
        sql selectUnionAll(selectValues)
        resultIterator(selectValues.iterator())
    }

    // compare to data/demo/test_action.csv
    test {
        order true
        sql selectUnionAll(selectValues)

        // you can set to http://xxx or https://xxx
        // and compare to http response body
        resultFile "test_action.csv"
    }
}

explain action

An explain_action is used to check if the returned string contains certain strings.

Parameters:

String sql: the SQL that needs to be explained
String contains: check if certain strings are included. You can check for multiple strings at a time.
String notContains: check if certain strings are not included. You can check for multiple strings at a time.
Closure<String> check: self-defined callback validation function, by which you can obtain the returned string. When there is a callback validation function, other verification methods will fail.
Closure<String, Throwable, Long, Long> check: self-defined callback validation function, by which you can obtain the error thrown and response time

You can find the following sample code in ${DORIS_HOME}/regression-test/suites/demo/explain_action.groovy:

suite("explain_action", "demo") {
    explain {
        sql("select 100")

        // contains("OUTPUT EXPRS:<slot 0> 100\n") && contains("PARTITION: UNPARTITIONED\n")
        contains "OUTPUT EXPRS:<slot 0> 100\n"
        contains "PARTITION: UNPARTITIONED\n"
    }

    explain {
        sql("select 100")

        // contains(" 100\n") && !contains("abcdefg") && !("1234567")
        contains " 100\n"
        notContains "abcdefg"
        notContains "1234567"
    }

    explain {
        sql("select 100")
        // simple callback
        check { explainStr -> explainStr.contains("abcdefg") || explainStr.contains(" 100\n") }
    }

    explain {
        sql("a b c d e")
        // callback with exception and time
        check { explainStr, exception, startTime, endTime ->
            // assertXxx() will invoke junit5's Assertions.assertXxx() dynamically
            assertTrue(exception != null)
        }
    }
}

streamLoad action

A streamLoad_action is used to ingest data.

Parameters:

String db: database, set to the defaultDb in regression-conf.groovy
String table: table name
String file: the path of the file to be loaded. It can be a relative path under the data directory, or a HTTP URL.
Iterator<List<Object>> inputIterator: the iterator to be loaded
String inputText: the text to be loaded (rarely used)
InputStream inputStream: the stream to be loaded (rarely used)
long time: check if the exection time is shorter than this value, which is measure in millisecond
void set(String key, String value): set the header of the HTTP request for Stream Load, such as label and columnSeparator.
Closure<String, Throwable, Long, Long> check: self-defined callback validation function, by which you can obtain the returned result, error thrown, and response time. When there is a callback validation function, other verification methods will fail.

You can find the following sample code in ${DORIS_HOME}/regression-test/suites/demo/streamLoad_action.groovy:

suite("streamLoad_action", "demo") {

    def tableName = "test_streamload_action1"

    sql """
            CREATE TABLE IF NOT EXISTS ${tableName} (
                id int,
                name varchar(255)
            )
            DISTRIBUTED BY HASH(id) BUCKETS 1
            PROPERTIES (
              "replication_num" = "1"
            ) 
        """

    streamLoad {
        // you can skip declare db, because a default db already specify in ${DORIS_HOME}/conf/regression-conf.groovy
        // db 'regression_test'
        table tableName

        // default label is UUID:
        // set 'label' UUID.randomUUID().toString()

        // default column_separator is specify in doris fe config, usually is '\t'.
        // this line change to ','
        set 'column_separator', ','

        // relate to ${DORIS_HOME}/regression-test/data/demo/streamload_input.csv.
        // also, you can stream load a http stream, e.g. http://xxx/some.csv
        file 'streamload_input.csv'

        time 10000 // limit inflight 10s

        // stream load action will check result, include Success status, and NumberTotalRows == NumberLoadedRows
    }


    // stream load 100 rows
    def rowCount = 100
    // range: [0, rowCount)
    // or rangeClosed: [0, rowCount]
    def rowIt = range(0, rowCount)
            .mapToObj({i -> [i, "a_" + i]}) // change Long to List<Long, String>
            .iterator()

    streamLoad {
        table tableName
        // also, you can upload a memory iterator
        inputIterator rowIt

        // if declared a check callback, the default check condition will ignore.
        // So you must check all condition
        check { result, exception, startTime, endTime ->
            if (exception != null) {
                throw exception
            }
            log.info("Stream load result: ${result}".toString())
            def json = parseJson(result)
            assertEquals("success", json.Status.toLowerCase())
            assertEquals(json.NumberTotalRows, json.NumberLoadedRows)
            assertTrue(json.NumberLoadedRows > 0 && json.LoadBytes > 0)
        }
    }
}

Other Actions

Other actions include thread, lazyCheck, events, connect, and selectUnionAll. You can find examples for them in ${DORIS_HOME}/regression-test/suites/demo.

Startup Script Examples

# View parameter descriptions for scripts
./run-regression-test.sh h

# View parameter descriptions for framework
./run-regression-test.sh --run -h

# Test all cases
./run-regression-test.sh 

# Delete test framework compilation results and test logs
./run-regression-test.sh --clean

# Test the suite named "sql_action". Currently, the test case file suffix is the same as the suiteName, so the corresponding test case file in this case is sql_action.groovy.
./run-regression-test.sh --run sql_action 

# Test the suite whose name contains 'sql'. **Remember to use single quotation marks.**
./run-regression-test.sh --run '*sql*' 

# Test the "demo" and "perfomance" group
./run-regression-test.sh --run -g 'demo,performance'

# Test the suite "sql_action" in the "demo" group
./run-regression-test.sh --run -g demo -s sql_action

# Test the suite "sql_action" in the demo directory
./run-regression-test.sh --run -d demo -s sql_action

# Test the suites under the demo directory except the "sql_action" suite
./run-regression-test.sh --run -d demo -xs sql_action

# Exclude the test cases in the demo directory
./run-regression-test.sh --run -xd demo

# Exclude the test cases in the "demo" group
./run-regression-test.sh --run -xg demo

# Self-defined configurations
./run-regression-test.sh --run -conf a=b

# Parallel execution
./run-regression-test.sh --run -parallel 5 -suiteParallel 10 -actionParallel 20

Generate `.out` File Based on the Query Result

# Automatically generate an .out file for the sql_action test case based on the query result. Ignore this if the .out file already exists.
./run-regression-test.sh --run sql_action -genOut

# Automatically generate an .out file for the sql_action test case based on the query result. If an .out file already exist, it will be overwritten.
./run-regression-test.sh --run sql_action -forceGenOut

Suite Plug-in

Sometimes there might be a need to expand the suite classes, but it is not easy to change the source code of suite classes. In this case, you might do that by plug-ins. The default directory for plug-ins is ${DORIS_HOME}/regression-test/plugins, in whic you can define how to expand suite classes with a groovy script. For example, this is how to add a testPlugin function for log printing to plugin_example.groovy:

import org.apache.doris.regression.suite.Suite

// Register `testPlugin` function to Suite,
// and invoke in ${DORIS_HOME}/regression-test/suites/demo/test_plugin.groovy
Suite.metaClass.testPlugin = { String info /* param */ ->

    // Which suite invoke current function?
    Suite suite = delegate as Suite

    // Function body
    suite.getLogger().info("Test plugin: suiteName: ${suite.name}, info: ${info}".toString())

    // Optional return value
    return "OK"
}

logger.info("Added 'testPlugin' function to Suite")

After adding the testPlugin function, you can use it in a regular test case. For example, in ${DORIS_HOME}/regression-test/suites/demo/test_plugin.groovy:

suite("test_plugin", "demo") {
    // register testPlugin function in ${DORIS_HOME}/regression-test/plugins/plugin_example.groovy
    def result = testPlugin("message from suite")
    assertEquals("OK", result)
}

CI/CD Support

TeamCity

You can use TeamCity to recognize Service Message via stdout. If you start the regression test framework using the --teamcity parameter, the TeamCity Service Message will be printed in stdout. TeamCity will automatically read the event logs in stdout, and show Tests in the current pipeline, in which there will be the test and logs. Therefore, you only need to configure the following command to start the regression test framework. In the following snippet, -Dteamcity.enableStdErr=false means to print error logs to stdout, too, so that the logs can be chronologically organized.

JAVA_OPTS="-Dteamcity.enableStdErr=${enableStdErr}" ./run-regression-test.sh --teamcity --run

E2E Test with External Data Sources

Doris supports queries on external data sources, so the regression testing framework allows users to build external data sources using Docker Compose, so that they can run end-to-end tests with external data sources.

Preparation

To begin with, modify CONTAINER_UID in docker/thirdparties/custom_settings.env. For example, doris-10002-18sda1-. The follow-up startup scripts will replace the corresponding names in Docker Compose to ensure consistency across multiple containers environment.

Before starting the containers, you need to check the network configuration of your server/cloud host to ensure that the mapping relationship between hostname and host IP address (hostname -i) is configured in /etc/hosts.

It should look like: 10.0.0.46 iZj6cbwlx5pl6y0681t6scZ iZj6cbwlx5pl6y0681t6scZ, which are IP address (output of hostname -i command, usually the IP of eth0), hostname (output of hostname command), and alias (same as the previous one).
Start the Container

So far, Doris has supported Docker Compose for data sources including Elasticsearch, MySQL, PostgreSQL, Hive, SQLServer, Oracle, Iceberg, Hudi, and Trino. The relevant files can be found in the directory docker/thirdparties/docker-compose.

By default, you can use the following command to start the Docker containers for all external data sources:

(Note that for Hive and Hudi containers, you also need to download the pre-built data files. See the relevant documentation of Hive and Hudi.)
```
cd docker/thirdparties && sh run-thirdparties-docker.sh
```
Executing this command requires root or sudo privilege. If the command returns, that means all containers are started. You can check on them by inputing a docker ps -a command.

During the container startup process, you can use the docker logs -f <container-name> command to view the container logs.

To stop all containers, you can use the following command:
```
cd docker/thirdparties && sh run-thirdparties-docker.sh --stop
```
To start or stop some specific components, you can use the following commands:
```
cd docker/thirdparties
# Start MySQL
sh run-thirdparties-docker.sh -c mysql
# Start MySQL, PostgreSQL, Iceberg
sh run-thirdparties-docker.sh -c mysql,pg,iceberg
# Stop MySQL, PostgreSQL, Iceberg
sh run-thirdparties-docker.sh -c mysql,pg,iceberg --stop
```
1. MySQL
  
  You can find the MySQL-related Docker Compose files in docker/thirdparties/docker-compose/mysql .
  - mysql-5.7.yaml.tpl: Docker Compose file template, no modification required. The default username and password is root/123456
  - mysql-5.7.env: Configuration file, in which you can configure the external port of the MySQL container (default number: 3316).
  - init/: SQL files in this directory will be automatically executed once the container is created.
  - data/: The container will be mounted to this local data directory after it is started. The run-thirdparties-docker.sh script will clear and rebuild this directory every time it is started.
2. PostgreSQL
  
  You can find the PostgreSQL-related Docker Compose files in docker/thirdparties/docker-compose/postgresql .
  - postgresql-14.yaml.tpl: Docker Compose file template, no modification required. The default username and password is postgres/123456
  - postgresql-14.env: Configuration file, in which you can configure the external port of the PostgreSQL container (default number: 5442).
  - init/: SQL files in this directory will be automatically executed once the container is created. By default, that includes the creation of database, table, and some data input.
  - data/: The container will be mounted to this local data directory after it is started. The run-thirdparties-docker.sh script will clear and rebuild this directory every time it is started.
3. Hive
  
  You can find the Hive-related Docker Compose files in docker/thirdparties/docker-compose/hive, supporting both Hive2 and Hive3.
  - hive-2x.yaml.tpl, hive-3x.yaml.tpl: Docker Compose file templates, no modification required.
  - hadoop-hive.env.tpl, hadoop-hive-2x.env.tpl and hadoop-hive-3x.env.tpl: Configuration file templates, no modification required.
  - hive-2x_settings.env: Hive2 initialization configuration script, which will be automatically called when run-thirdparties-docker.sh starts. You can modify four external ports in this file: FS_PORT, HMS_PORT, HS_PORT, and PG_PORT, which correspond to hive2HdfsPort, hive2HmsPort, hive2ServerPort, and hive2PgPort in regression-conf.groovy. The first two are for Hadoop's defaultFs and Hive metastore ports, with defaults of 8020 and 9083.
  - hive-3x_settings.env: Hive3 initialization configuration script, which will be automatically called when run-thirdparties-docker.sh starts. You can modify four external ports in this file: FS_PORT, HMS_PORT, HS_PORT, and PG_PORT, which correspond to hive3HdfsPort, hive3HmsPort, hive3ServerPort, and hive3PgPort in regression-conf.groovy. The first two are for Hadoop's defaultFs and Hive metastore ports, with defaults of 8320 and 9383.
  - The scripts/ directory will be mounted to the container once it is started. Files in this directory require no modifications. Note that you need to download the pre-built files before you start the container:
    
    Download files from https://doris-regression-hk.oss-cn-hongkong.aliyuncs.com/regression/datalake/pipeline_data/tpch1.db.tar.gz to the scripts/ directory and decompress.
4. Elasticsearch
  
  You can find three Docker images (for Elasticsearch 6, Elasticsearch 7, and Elasticsearch 8) in docker/thirdparties/docker-compose/elasticsearch/ .
  - es.yaml.tpl: Docker Compose file template, three versions included, no modifications required.
  - es.env: Configuration file, in which you need to configure the port number for Elasticsearch.
  - scripts : In this directory, you can find the initialization script after the image is started.
5. Oracle
  
  You can find the image for Oracle 11 in docker/thirdparties/docker-compose/oracle/.
  - oracle-11.yaml.tpl: Docker Compose file template, no modifications required.
  - oracle-11.env: Configure the external port for Oracle (default number: 1521).
6. SQLServer
  
  You can find the image for SQLServer 2022 in docker/thirdparties/docker-compose/sqlserver/ .
  - sqlserver.yaml.tpl: Docker Compose file template, no modifications required.
  - sqlserver.env: Configure the external port for SQLServer (default number: 1433).
7. ClickHouse
  
  You can find the image for ClickHouse 22 in docker/thirdparties/docker-compose/clickhouse/.
  - clickhouse.yaml.tpl: Docker Compose file template, no modifications required.
  - clickhouse.env: Configure the external port for ClickHouse (default number: 8123).
8. Iceberg
  
  You can find the image of Iceberg + Spark + Minio in docker/thirdparties/docker-compose/iceberg/.
  - iceberg.yaml.tpl: Docker Compose file template, no modifications required.
  - entrypoint.sh.tpl: The template of the initialization script after the image is started, no modifications required.
  - spark-defaults.conf.tpl: Configuration file template for Spark, no modifications required.
  - iceberg.env: Configuration file for external ports. You need to modify every external port to avoid conflicts.
  After the image is started, execute the following command to start spark-sql:
  
  docker exec -it doris-xx-spark-iceberg spark-sql
  
  In this command, doris-xx-spark-iceberg is the container name.
  
  Execution examples for spark-sql iceberg:
```
create database db1;
show databases;
create table db1.test1(k1 bigint, k2 bigint, k3 string) partitioned by (k1);
insert into db1.test1 values(1,2,'abc');
select * from db1.test1;
quit;
```
  You can also access by spark-shell:
```
docker exec -it doris-xx-spark-iceberg spark-shell

spark.sql(s"create database db1")
spark.sql(s"show databases").show()
spark.sql(s"create table db1.test1(k1 bigint, k2 bigint, k3 string) partitioned by (k1)").show()
spark.sql(s"show tables from db1").show()
spark.sql(s"insert into db1.test1 values(1,2,'abc')").show()
spark.sql(s"select * from db1.test1").show()
:q
```
  For more usage guide, see Tabular Documentation.
9. Hudi
  
  You can find the Hudi-related Docker Compose file in docker/thirdparties/docker-compose/hudi.
  - hudi.yaml.tpl: Docker Compose file template, no modifications required.
  - hadoop.env: Configuration file template, no modifications required.
  - The scripts/ directory will be mounted to the container once it is started. Files in this directory require no modifications. Note that you need to download the pre-built files before you start the container:
    
    Download files from https://doris-build-hk-1308700295.cos.ap-hongkong.myqcloud.com/regression/load/hudi/hudi_docker_compose_attached_file.zip to the scripts/ directory and decompress.
  - Before starting, add the following configurations to /etc/hosts to avoid UnknownHostException:
```
127.0.0.1 adhoc-1
127.0.0.1 adhoc-2
127.0.0.1 namenode
127.0.0.1 datanode1
127.0.0.1 hiveserver
127.0.0.1 hivemetastore
127.0.0.1 sparkmaster
```
  After starting, you can execute the following command to start a Hive query:
```
docker exec -it adhoc-2 /bin/bash

beeline -u jdbc:hive2://hiveserver:10000 \
--hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat \
--hiveconf hive.stats.autogather=false

show tables;
show partitions stock_ticks_mor_rt;
select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG';
select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
exit;
```
  You can also access by spark-shell:
```
docker exec -it adhoc-1 /bin/bash

$SPARK_INSTALL/bin/spark-shell \
  --jars /var/scripts/hudi_docker_compose_attached_file/jar/hoodie-hive-sync-bundle.jar \
  --master local[2] \
  --driver-class-path $HADOOP_CONF_DIR \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  --deploy-mode client \
  --driver-memory 1G \
  --executor-memory 3G \
  --num-executors 1

spark.sql("show tables").show(100, false)
spark.sql("select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG'").show(100, false)
spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG'").show(100, false)
spark.sql("select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG'").show(100, false)
spark.sql("select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG'").show(100, false)
spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG'").show(100, false)
:q
```
  For more usage guide, see Hudi Documentation.
10. Trino You can find the Trino-related Docker Compose file in docker/thirdparties/docker-compose/trino.
  
  Template files:
  - gen_env.sh.tpl: This is used to generate HDFS-related port numbers, no modifications required, but you can change the port numbers in the case of port conflicts.
  - hive.properties.tpl: This is used to configure the Trino catalog, no modifications required.
  - trino_hive.env.tpl: Environment configurations of Hive, no modifications required.
  - trino_hive.yaml.tpl: Docker Compose file, no modifications required.
    
    After the Trino Docker is started, a Trino + Hive Catalog will be configured, and then Trino will have two catalogs.
  1. Hive
  2. TPCH (self-contained in Trino Docker)
  For more usage guide, see Trino Documentation.
Run the regression test

Regression test for external data sources is disabled by default. You can enable it by modifying the following configurations in regression-test/conf/regression-conf.groovy :
- enableJdbcTest: This is to enable test for JDBC external tables. For this purpose, you need to start the MySQL and PostgreSQL containers.
- mysql_57_port and pg_14_port are the external port of MySQL and PostgreSQL, respectively. Default port numbers: 3316 and 5442.
- enableHiveTest: This is to enable test for Hive external tables. For this purpose, you need to start the Hive container.
- hive2HmsPort is the external port for Hive2 metastore. Default number: 9083.
- hive2HdfsPort is the external port for Hive2 HDFS namenode. Default number: 8020.
- enableEsTest: This is to enable test for Elasticsearch external tables. For this purpose, you need to start the Elasticsearch container.
- es_6_port: Port for Elasticsearch 6.
- es_7_port: Port for Elasticsearch 7.
- es_8_port: Port for Elasticsearch 8.

Basic Concepts​

Steps​

Directory Structure​

Default Configurations of the Framework​

Steps to Write Test Cases​

Action​

sql action​

qt action​

test action​

explain action​

streamLoad action​

Other Actions​

Startup Script Examples​

Generate .out File Based on the Query Result​

Suite Plug-in​

CI/CD Support​

TeamCity​

E2E Test with External Data Sources​

Basic Concepts

Steps

Directory Structure

Default Configurations of the Framework

Steps to Write Test Cases

Action

sql action

qt action

test action

explain action

streamLoad action

Other Actions

Startup Script Examples

Generate `.out` File Based on the Query Result

Suite Plug-in

CI/CD Support

TeamCity

E2E Test with External Data Sources