Star Schema Benchmark
Star Schema Benchmark(SSB) 是一个轻量级的数仓场景下的性能测试集。SSB 基于 TPC-H 提供了一个简化版的星型模型数据集,主要用于测试在星型模型下,多表关联查询的性能表现。
本文档主要介绍 Apache Doris 在 SSB SF1000 测试集上的性能表现。
在 SSB 标准测试数据集上的 13 个查询上,我们对 Apache Doris 进行了测试。
1. 硬件环境
| 硬件 | 配置说明 |
|---|---|
| 机器数量 | 4 台阿里云g9i实例(1 个 FE,3 个 BE) |
| CPU | Intel® Xeon® Granite Rapids 32 核 |
| 内存 | 128G |
| 磁盘 | 阿里云 ESSD (PL0) |
2. 软件环境
- Doris 部署 3BE 1FE
- 内核版本:Linux version 5.15.0-101-generic
- 操作系统版本:Ubuntu 20.04 LTS (Focal Fossa)
- JDK:openjdk 17.0.2
3. 测试数据量
| SSB 表名 | 行数 | 备注 |
|---|---|---|
| lineorder | 5,999,989,709 | 商品订单明细表表 |
| customer | 30,000,000 | 客户信息表 |
| part | 2,000,000 | 零件信息表 |
| supplier | 2,000,000 | 供应商信息表 |
| dates | 2,556 | 日期表 |
| lineorder_flat | 5,999,989,709 | 数据展平后的宽表 |
4. 标准 SSB 测试结果
| Query | Doris 2.1.11 (ms) | Doris 3.1.4 (ms) | Doris 4.0.5 (ms) | Doris 4.1.0 (ms) |
|---|---|---|---|---|
| Total | 13270 | 11591 | 12495 | 10934 |
| q1.1 | 140 | 179 | 151 | 126 |
| q1.2 | 70 | 105 | 114 | 82 |
| q1.3 | 70 | 96 | 107 | 79 |
| q2.1 | 1520 | 1066 | 1263 | 1096 |
| q2.2 | 1630 | 1425 | 1311 | 1293 |
| q2.3 | 1250 | 1086 | 1199 | 1008 |
| q3.1 | 2470 | 2020 | 2174 | 2142 |
| q3.2 | 1450 | 1165 | 1484 | 1395 |
| q3.3 | 870 | 847 | 1080 | 314 |
| q3.4 | 130 | 167 | 148 | 68 |
| q4.1 | 2860 | 2485 | 2517 | 2427 |
| q4.2 | 520 | 597 | 563 | 563 |
| q4.3 | 290 | 353 | 384 | 341 |
5. 环境准备
请先参照 官方文档 进行 Apache Doris 的安装部署,以获得一个正常运行中的 Doris 集群(至少包含 1 FE 1 BE,推荐 1 FE 3 BE)。
6. 数据准备
6.1 下载安装 SSB 数据生成工具。
执行以下脚本下载并编译 ssb-tools 工具。
sh bin/build-ssb-dbgen.sh
安装成功后,将在 ssb-dbgen/ 目录下生成 dbgen 二进制文件。
6.2 生成 SSB 测试集
执行以下脚本生成 SSB 数据集:
sh bin/gen-ssb-data.sh -s 1000
注 1:通过
sh gen-ssb-data.sh -h查看脚本帮助。注 2:数据会以
.tbl为后缀生成在ssb-data/目录下。文件总大小约 600GB。生成时间可能在数分钟到 1 小时不等。注 3:默认生成 SF100 的标准测试数据集
6.3 建表
6.3.1 准备 doris-cluster.conf 文件
在调用导入脚本前,需要将 FE 的 ip 端口等信息写在 doris-cluster.conf 文件中。
文件位置在 ${DORIS_HOME}/tools/ssb-tools/conf/ 目录下。
文件内容包括 FE 的 ip,HTTP 端口,用户名,密码以及待导入数据的 DB 名称:
# Any of FE host
export FE_HOST='127.0.0.1'
# http_port in fe.conf
export FE_HTTP_PORT=8030
# query_port in fe.conf
export FE_QUERY_PORT=9030
# Doris username
export USER='root'
# Doris password
export PASSWORD=''
# The database where SSB tables located
export DB='ssb'
6.3.2 执行以下脚本生成创建 SSB 表
sh bin/create-ssb-tables.sh -s 1000
或者复制 create-ssb-tables.sql 和 create-ssb-flat-table.sql 中的建表语句,在 MySQL 客户端中执行。
6.4 导入数据
我们使用以下命令完成 SSB 测试集所有数据导入及 SSB FLAT 宽表数据合成并导入到表里。
sh bin/load-ssb-data.sh
6.5 检查导入数据
select count(*) from part;
select count(*) from customer;
select count(*) from supplier;
select count(*) from dates;
select count(*) from lineorder;
select count(*) from lineorder_flat;
6.6 查询测试
SSB-FlAT 查询语句:ssb-flat-queries
标准 SSB 查询语句:ssb-queries
6.6.1 SSB 标准测试 SQL
--Q1.1
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_year = 1993
AND lo_discount BETWEEN 1 AND 3
AND lo_quantity < 25;
--Q1.2
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_yearmonth = 'Jan1994'
AND lo_discount BETWEEN 4 AND 6
AND lo_quantity BETWEEN 26 AND 35;
--Q1.3
SELECT
SUM(lo_extendedprice * lo_discount) AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_weeknuminyear = 6
AND d_year = 1994
AND lo_discount BETWEEN 5 AND 7
AND lo_quantity BETWEEN 26 AND 35;
--Q2.1
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_category = 'MFGR#12'
AND s_region = 'AMERICA'
GROUP BY d_year, p_brand
ORDER BY p_brand;
--Q2.2
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228'
AND s_region = 'ASIA'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand;
--Q2.3
SELECT SUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_brand = 'MFGR#2239'
AND s_region = 'EUROPE'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand;
--Q3.1
SELECT
c_nation,
s_nation,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND c_region = 'ASIA'
AND s_region = 'ASIA'
AND d_year >= 1992
AND d_year <= 1997
GROUP BY c_nation, s_nation, d_year
ORDER BY d_year ASC, REVENUE DESC;
--Q3.2
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND c_nation = 'UNITED STATES'
AND s_nation = 'UNITED STATES'
AND d_year >= 1992
AND d_year <= 1997
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;
--Q3.3
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND (
c_city = 'UNITED KI1'
OR c_city = 'UNITED KI5'
)
AND (
s_city = 'UNITED KI1'
OR s_city = 'UNITED KI5'
)
AND d_year >= 1992
AND d_year <= 1997
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;
--Q3.4
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND (
c_city = 'UNITED KI1'
OR c_city = 'UNITED KI5'
)
AND (
s_city = 'UNITED KI1'
OR s_city = 'UNITED KI5'
)
AND d_yearmonth = 'Dec1997'
GROUP BY c_city, s_city, d_year
ORDER BY d_year ASC, REVENUE DESC;
--Q4.1
SELECT
d_year,
c_nation,
SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND c_region = 'AMERICA'
AND s_region = 'AMERICA'
AND (
p_mfgr = 'MFGR#1'
OR p_mfgr = 'MFGR#2'
)
GROUP BY d_year, c_nation
ORDER BY d_year, c_nation;
--Q4.2
SELECT
d_year,
s_nation,
p_category,
SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND c_region = 'AMERICA'
AND s_region = 'AMERICA'
AND (
d_year = 1997
OR d_year = 1998
)
AND (
p_mfgr = 'MFGR#1'
OR p_mfgr = 'MFGR#2'
)
GROUP BY d_year, s_nation, p_category
ORDER BY d_year, s_nation, p_category;
--Q4.3
SELECT
d_year,
s_city,
p_brand,
SUM(lo_revenue - lo_supplycost) AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND s_nation = 'UNITED STATES'
AND (
d_year = 1997
OR d_year = 1998
)
AND p_category = 'MFGR#14'
GROUP BY d_year, s_city, p_brand
ORDER BY d_year, s_city, p_brand;