TPC-DS Benchmark
TPC-DS (Transaction Processing Performance Council Decision Support Benchmark) is a benchmark test that focuses on decision support and aims to evaluate the performance of data warehousing and analytics systems. It was developed by the Transaction Processing Performance Council (TPC) organization to compare the capabilities of different systems in handling complex queries and large-scale data analysis.
The design goal of TPC-DS is to simulate complex decision support workloads in the real world. It tests the performance of systems through a series of complex queries and data operations, including joins, aggregations, sorting, filtering, subqueries, and more. These query patterns cover various scenarios ranging from simple to complex, such as report generation, data mining, and OLAP (Online Analytical Processing).
This document presents the performance of Doris on the TPC-DS 1000G test set.
We ran the 99 queries of the TPC-DS standard test data set on Apache Doris 3.0.3-rc03 (Compute-Storage Coupled Mode) and Apache Doris 2.1.7-rc03 and compared the results. In this test, the Compute-Storage Coupled Mode of version 3.x performs on par with version 2.1.x.
1. Hardware Environment
Hardware | Configuration Instructions |
---|---|
Number of Machines | 4 Aliyun virtual machines (1 FE, 3 BEs) |
CPU | Intel Xeon (Ice Lake) Platinum 8369B 32C |
Memory | 128G |
Disk | Enterprise SSD (PL0) |
2. Software Environment
- Doris deployment: 3 BEs and 1 FE
- Kernel Version: Linux version 5.15.0-101-generic
- OS version: Ubuntu 20.04 LTS (Focal Fossa)
- Doris software version: Apache Doris 3.0.3-rc03 (Compute-Storage Coupled Mode), Apache Doris 2.1.7-rc03
- JDK: openjdk version "17.0.2"
3. Test Data Volume
The full TPC-DS 1000G data set generated for the test was imported into both Apache Doris 3.0.3-rc03 (Compute-Storage Coupled Mode) and Apache Doris 2.1.7-rc03. The tables and their row counts are listed below.
TPC-DS Table Name | Rows |
---|---|
customer_demographics | 1,920,800 |
reason | 65 |
warehouse | 20 |
date_dim | 73,049 |
catalog_sales | 1,439,980,416 |
call_center | 42 |
inventory | 783,000,000 |
catalog_returns | 143,996,756 |
household_demographics | 7,200 |
customer_address | 6,000,000 |
income_band | 20 |
catalog_page | 30,000 |
item | 300,000 |
web_returns | 71,997,522 |
web_site | 54 |
promotion | 1,500 |
web_sales | 720,000,376 |
store | 1,002 |
web_page | 3,000 |
time_dim | 86,400 |
store_returns | 287,999,764 |
store_sales | 2,879,987,999 |
ship_mode | 20 |
customer | 12,000,000 |
4. Test SQL
TPC-DS 99 test query statements: TPC-DS-Query-SQL
5. Test Results
Here we compare Apache Doris 3.0.3-rc03 (Compute-Storage Coupled Mode) with Apache Doris 2.1.7-rc03, using query time (ms) as the main performance indicator. The test results are as follows:
Query | Apache Doris 3.0.3-rc03 Compute-Storage Coupled Mode (ms) | Apache Doris 2.1.7-rc03 (ms) |
---|---|---|
query01 | 580 | 630 |
query02 | 5540 | 4930 |
query03 | 350 | 360 |
query04 | 10790 | 11070 |
query05 | 710 | 620 |
query06 | 230 | 220 |
query07 | 590 | 550 |
query08 | 350 | 330 |
query09 | 7520 | 6830 |
query10 | 390 | 370 |
query11 | 6560 | 6960 |
query12 | 120 | 100 |
query13 | 780 | 790 |
query14 | 13200 | 13470 |
query15 | 400 | 510 |
query16 | 410 | 520 |
query17 | 1300 | 1310 |
query18 | 650 | 560 |
query19 | 250 | 200 |
query20 | 110 | 100 |
query21 | 110 | 80 |
query22 | 1570 | 2300 |
query23 | 37180 | 38240 |
query24 | 7470 | 8340 |
query25 | 920 | 780 |
query26 | 200 | 200 |
query27 | 550 | 530 |
query28 | 7300 | 5940 |
query29 | 920 | 940 |
query30 | 300 | 270 |
query31 | 2000 | 1890 |
query32 | 70 | 60 |
query33 | 400 | 350 |
query34 | 760 | 750 |
query35 | 1290 | 1370 |
query36 | 460 | 530 |
query37 | 80 | 60 |
query38 | 5450 | 7520 |
query39 | 760 | 560 |
query40 | 140 | 150 |
query41 | 50 | 50 |
query42 | 110 | 100 |
query43 | 1170 | 1150 |
query44 | 2120 | 2020 |
query45 | 280 | 430 |
query46 | 1390 | 1250 |
query47 | 2160 | 2660 |
query48 | 660 | 630 |
query49 | 810 | 730 |
query50 | 1570 | 1640 |
query51 | 6030 | 6430 |
query52 | 120 | 110 |
query53 | 280 | 250 |
query54 | 1540 | 1280 |
query55 | 130 | 110 |
query56 | 300 | 290 |
query57 | 1240 | 1480 |
query58 | 260 | 240 |
query59 | 10120 | 7760 |
query60 | 370 | 380 |
query61 | 560 | 540 |
query62 | 920 | 740 |
query63 | 230 | 210 |
query64 | 1660 | 5790 |
query65 | 4800 | 4900 |
query66 | 400 | 480 |
query67 | 24190 | 27320 |
query68 | 1400 | 1600 |
query69 | 1170 | 380 |
query70 | 3160 | 3480 |
query71 | 440 | 460 |
query72 | 4090 | 3160 |
query73 | 660 | 660 |
query74 | 5720 | 5990 |
query75 | 4560 | 4610 |
query76 | 1800 | 1590 |
query77 | 330 | 300 |
query78 | 16300 | 17970 |
query79 | 3160 | 3040 |
query80 | 590 | 570 |
query81 | 540 | 460 |
query82 | 320 | 270 |
query83 | 230 | 220 |
query84 | 130 | 130 |
query85 | 780 | 520 |
query86 | 660 | 760 |
query87 | 6200 | 8000 |
query88 | 5620 | 5560 |
query89 | 400 | 430 |
query90 | 150 | 150 |
query91 | 160 | 150 |
query92 | 50 | 40 |
query93 | 2380 | 2440 |
query94 | 290 | 340 |
query95 | 410 | 350 |
query96 | 680 | 660 |
query97 | 4870 | 5020 |
query98 | 200 | 190 |
query99 | 1940 | 1560 |
Total | 251620 | 261320 |
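To put the two totals in perspective, the relative difference can be computed directly from the Total row. The following is only an illustrative sketch: the helper name `pct_faster` is hypothetical, and the two totals are copied from the table above.

```shell
#!/bin/sh
# Totals in milliseconds, copied from the "Total" row of the results table.
total_3x=251620    # Apache Doris 3.0.3-rc03 (Compute-Storage Coupled Mode)
total_21x=261320   # Apache Doris 2.1.7-rc03

# pct_faster NEW OLD (hypothetical helper):
# prints by what percentage NEW is lower than OLD, to two decimals.
pct_faster() {
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%.2f\n", (old - new) * 100 / old }'
}

pct_faster "$total_3x" "$total_21x"   # prints 3.71
```

The same helper can be applied to any single row of the table. A negative result means the first argument is the slower of the two, as happens for query69.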
6. Environmental Preparation
Refer to the official documentation to install and deploy Doris and obtain a working Doris cluster (at least 1 FE and 1 BE; 1 FE and 3 BEs are recommended).
7. Data Preparation
7.1 Download and Install TPC-DS Data Generation Tool
Execute the following script to download and compile the tpcds-tools tool.
sh bin/build-tpcds-dbgen.sh
7.2 Generating the TPC-DS Test Set
Execute the following script to generate the TPC-DS dataset:
sh bin/gen-tpcds-data.sh -s 1000
Note 1: Check the script help via `sh gen-tpcds-data.sh -h`.
Note 2: The data will be generated under the `tpcds-data/` directory with the suffix `.dat`. The total file size is about 1000GB, and generation may take from a few minutes to an hour.
Note 3: A standard test data set of 100G is generated by default.
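Before creating tables and loading data, it can help to confirm that the generator actually produced the `.dat` files. This is a minimal sketch, assuming the default `tpcds-data/` output directory mentioned in Note 2. The `DATA_DIR` variable and the `count_dat_files` helper are hypothetical names introduced here and are not part of tpcds-tools.

```shell
#!/bin/sh
# Directory the generator writes to (per Note 2); override via DATA_DIR.
DATA_DIR="${DATA_DIR:-tpcds-data}"

# count_dat_files DIR (hypothetical helper): number of *.dat files under DIR.
count_dat_files() {
    find "$1" -type f -name '*.dat' | wc -l | tr -d ' '
}

if [ -d "$DATA_DIR" ]; then
    echo "dat files: $(count_dat_files "$DATA_DIR")"
    # Total on-disk size of the generated data set.
    du -sh "$DATA_DIR"
else
    echo "directory not found: $DATA_DIR" >&2
fi
```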
7.3 Create Table
7.3.1 Prepare the doris-cluster.conf File
Before running the import script, write the FE's IP address, ports, and other connection information into the `doris-cluster.conf` file.
The file is located under `${DORIS_HOME}/tools/tpcds-tools/conf/`.
The file contains the FE's IP address, HTTP port, query port, user name, password, and the name of the database into which the data will be imported:
# Any of FE host
export FE_HOST='127.0.0.1'
# http_port in fe.conf
export FE_HTTP_PORT=8030
# query_port in fe.conf
export FE_QUERY_PORT=9030
# Doris username
export USER='root'
# Doris password
export PASSWORD=''
# The database where the TPC-DS tables are located
export DB='tpcds'
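The connection settings can be checked by hand before running the scripts. The sketch below assumes the `mysql` command-line client is installed, since the FE accepts MySQL-protocol connections on `query_port`. The helper name `doris_sql` is hypothetical and is not part of tpcds-tools.

```shell
#!/bin/sh
# Load the connection settings if the conf file is present.
conf="${DORIS_HOME}/tools/tpcds-tools/conf/doris-cluster.conf"
if [ -f "$conf" ]; then
    . "$conf"
fi

# doris_sql STATEMENT (hypothetical helper): run one statement on the FE
# through the MySQL protocol, using the settings from doris-cluster.conf.
doris_sql() {
    if [ -n "$PASSWORD" ]; then
        mysql -h "$FE_HOST" -P "$FE_QUERY_PORT" -u "$USER" -p"$PASSWORD" -e "$1"
    else
        mysql -h "$FE_HOST" -P "$FE_QUERY_PORT" -u "$USER" -e "$1"
    fi
}

# Example (requires a running cluster):
#   doris_sql 'SHOW DATABASES;'
```

If the example statement returns a list of databases, the host, query port, and credentials in the file are usable.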
7.3.2 Execute the Following Script to Generate and Create the TPC-DS Tables
sh bin/create-tpcds-tables.sh -s 1000
Alternatively, copy the table creation statements in create-tpcds-tables and execute them in Doris.
7.4 Import Data
Please perform data import with the following command:
sh bin/load-tpcds-data.sh
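After the load finishes, row counts can be spot-checked against the Test Data Volume table in section 3. The sketch below covers only a few small tables. The helpers `expected_rows` and `check_table` are hypothetical names, and the commented example assumes the `mysql` client and the variables from `doris-cluster.conf`.

```shell
#!/bin/sh
# expected_rows TABLE (hypothetical helper): expected row count for a few
# small tables, copied from the "Test Data Volume" table for TPC-DS 1000G.
expected_rows() {
    case "$1" in
        reason)      echo 65 ;;
        call_center) echo 42 ;;
        web_site)    echo 54 ;;
        warehouse|ship_mode|income_band) echo 20 ;;
        *) return 1 ;;
    esac
}

# check_table TABLE ACTUAL (hypothetical helper): compare a measured count
# with the expected one; returns 0 on match, 1 on mismatch, 2 if unknown.
check_table() {
    want=$(expected_rows "$1") || { echo "$1: no expected count"; return 2; }
    if [ "$2" = "$want" ]; then
        echo "$1: OK ($2 rows)"
    else
        echo "$1: MISMATCH (got $2, want $want)"
        return 1
    fi
}

# Example against a live cluster, assuming the mysql client is installed and
# the variables from doris-cluster.conf are set:
#   n=$(mysql -h "$FE_HOST" -P "$FE_QUERY_PORT" -u "$USER" -N -s \
#         -e "SELECT COUNT(*) FROM ${DB}.reason")
#   check_table reason "$n"
```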
7.5 Query Test
7.5.1 Executing Query Scripts
Execute the test SQL above, or run the following command:
sh bin/run-tpcds-queries.sh -s 1000
7.5.2 Single SQL Execution
You can also retrieve the latest TPC-DS test query statements from the code repository and execute them one at a time.