Skip to main content

Diagnostic Tools

Overview​

Efficient and effective performance diagnostic tools are crucial for database system tuning, as they determine whether problematic business SQL queries can be quickly identified, and subsequently, performance bottlenecks can be rapidly pinpointed and resolved, ensuring that the database system meets its Service Level Agreements (SLAs).

Currently, Doris considers SQL queries with execution times exceeding 5 seconds as slow SQL by default. This threshold can be configured via config.qe_slow_log_ms. Doris currently offers the following three diagnostic channels to help quickly identify slow SQL queries with performance issues:

Doris Manager Logs​

The log module in Doris Manager provides a slow SQL filtering function. Users can view slow SQL by selecting the fe.audit.log on a specific FE node. By simply entering slow_query in the search box, the historical slow SQL information of the current system will be displayed on the page, as shown in the figure below:

Doris Manager Monitoring and Logging

Audit Log​

Currently, Doris FE provides four types of Audit Logs, including slow_query, query, load, and stream_load. Besides accessing the logs through the log page on the cluster where the Manager service is installed and deployed, Audit Logs can also be directly obtained by accessing the fe/log/fe.audit.log file on the node where FE is located.

By directly searching for the slow_query tag in fe.audit.log, you can quickly filter out slow-executing SQL queries, as shown below:

2024-07-18 11:23:13,042 [slow_query] |Client=127.0.0.1:63510|User=root|Ctl=internal|Db=tpch_sf1000|State=EOF|ErrorCode=0|ErrorMessage=|Time(ms)=11603|ScanBytes=236667379712|ScanRows=13649979418|ReturnRows=100|StmtId=1689|QueryId=91ff336304f14182-9ca537eee75b3856|IsQuery=true|isNereids=true|feIp=172.21.0.10|Stmt=select     c_name,     c_custkey,     o_orderkey,     o_orderdate,     o_totalprice,     sum(l_quantity) from     customer,     orders,     lineitem where     o_orderkey  in  (         select             l_orderkey         from             lineitem         group  by             l_orderkey  having                 sum(l_quantity)  >  300     )     and  c_custkey  =  o_custkey     and  o_orderkey  =  l_orderkey group  by     c_name,     c_custkey,     o_orderkey,     o_orderdate,     o_totalprice order  by     o_totalprice  desc,     o_orderdate limit  100|CpuTimeMS=918556|ShuffleSendBytes=3267419|ShuffleSendRows=89668|SqlHash=b4e1de9f251214a30188180f37907f7d|peakMemoryBytes=38720935552|SqlDigest=d41d8cd98f00b204e9800998ecf8427e|cloudClusterName=UNKNOWN|TraceId=|WorkloadGroup=normal|FuzzyVariables=|scanBytesFromLocalStorage=0|scanBytesFromRemoteStorage=0
2024-07-18 11:23:33,043 [slow_query] |Client=127.0.0.1:26672|User=root|Ctl=internal|Db=tpch_sf1000|State=EOF|ErrorCode=0|ErrorMessage=|Time(ms)=8978|ScanBytes=334985555968|ScanRows=10717654374|ReturnRows=100|StmtId=1815|QueryId=6e1fae453cb04d9a-b1e5f94d9cea1885|IsQuery=true|isNereids=true|feIp=172.21.0.10|Stmt=select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'SAUDI ARABIA' group by s_name order by numwait desc, s_name limit 100|CpuTimeMS=990127|ShuffleSendBytes=59208164|ShuffleSendRows=3651504|SqlHash=f8a30e4182d72cce3eff6cb385005b1f|peakMemoryBytes=10495660672|SqlDigest=d41d8cd98f00b204e9800998ecf8427e|cloudClusterName=UNKNOWN|TraceId=|WorkloadGroup=normal|FuzzyVariables=|scanBytesFromLocalStorage=0|scanBytesFromRemoteStorage=0
2024-07-18 11:23:41,044 [slow_query] |Client=127.0.0.1:26684|User=root|Ctl=internal|Db=tpch_sf1000|State=EOF|ErrorCode=0|ErrorMessage=|Time(ms)=8514|ScanBytes=334986551296|ScanRows=10717654374|ReturnRows=100|StmtId=1833|QueryId=4f91483464ce4aa8-beeed7dcb8675bc8|IsQuery=true|isNereids=true|feIp=172.21.0.10|Stmt=select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'SAUDI ARABIA' group by s_name order by numwait desc, s_name limit 100|CpuTimeMS=925841|ShuffleSendBytes=59223190|ShuffleSendRows=3651602|SqlHash=f8a30e4182d72cce3eff6cb385005b1f|peakMemoryBytes=10505123104|SqlDigest=d41d8cd98f00b204e9800998ecf8427e|cloudClusterName=UNKNOWN|TraceId=|WorkloadGroup=normal|FuzzyVariables=|scanBytesFromLocalStorage=0|scanBytesFromRemoteStorage=0
2024-07-18 11:23:49,044 [slow_query] |Client=127.0.0.1:10748|User=root|Ctl=internal|Db=tpch_sf1000|State=EOF|ErrorCode=0|ErrorMessage=|Time(ms)=8660|ScanBytes=334987673600|ScanRows=10717654374|ReturnRows=100|StmtId=1851|QueryId=4599cb1bab204f80-ac430dd78b45e3da|IsQuery=true|isNereids=true|feIp=172.21.0.10|Stmt=select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'SAUDI ARABIA' group by s_name order by numwait desc, s_name limit 100|CpuTimeMS=932664|ShuffleSendBytes=59223178|ShuffleSendRows=3651991|SqlHash=f8a30e4182d72cce3eff6cb385005b1f|peakMemoryBytes=10532849344|SqlDigest=d41d8cd98f00b204e9800998ecf8427e|cloudClusterName=UNKNOWN|TraceId=|WorkloadGroup=normal|FuzzyVariables=|scanBytesFromLocalStorage=0|scanBytesFromRemoteStorage=0

The slow SQL obtained through fe.audit.log allows users to easily access detailed information such as execution time, number of rows scanned, number of rows returned, and the SQL statement itself, laying the foundation for further reproducing and locating performance issues.

audit_log System Table​

Starting from Doris version 2.1, the audit_log system table is provided under the __internal_schema database for users to view the execution status of SQL queries. Before using it, the global configuration set global enable_audit_plugin=true; needs to be enabled (this switch is disabled by default).

mysql> use __internal_schema;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+-----------------------------+
| Tables_in___internal_schema |
+-----------------------------+
| audit_log |
| column_statistics |
| histogram_statistics |
| partition_statistics |
+-----------------------------+
4 rows in set (0.00 sec)

mysql> desc audit_log;
+-------------------+--------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-------+---------+-------+
| query_id | varchar(48) | Yes | true | NULL | |
| time | datetime | Yes | true | NULL | |
| client_ip | varchar(128) | Yes | true | NULL | |
| user | varchar(128) | Yes | false | NULL | NONE |
| catalog | varchar(128) | Yes | false | NULL | NONE |
| db | varchar(128) | Yes | false | NULL | NONE |
| state | varchar(128) | Yes | false | NULL | NONE |
| error_code | int | Yes | false | NULL | NONE |
| error_message | text | Yes | false | NULL | NONE |
| query_time | bigint | Yes | false | NULL | NONE |
| scan_bytes | bigint | Yes | false | NULL | NONE |
| scan_rows | bigint | Yes | false | NULL | NONE |
| return_rows | bigint | Yes | false | NULL | NONE |
| stmt_id | bigint | Yes | false | NULL | NONE |
| is_query | tinyint | Yes | false | NULL | NONE |
| frontend_ip | varchar(128) | Yes | false | NULL | NONE |
| cpu_time_ms | bigint | Yes | false | NULL | NONE |
| sql_hash | varchar(128) | Yes | false | NULL | NONE |
| sql_digest | varchar(128) | Yes | false | NULL | NONE |
| peak_memory_bytes | bigint | Yes | false | NULL | NONE |
| stmt | text | Yes | false | NULL | NONE |
+-------------------+--------------+------+-------+---------+-------+

Through the audit_log internal table, users can query detailed SQL execution information and perform detailed statistical analysis such as slow query filtering.

Summary​

Doris Manager logs, audit logs, and the audit_log system table provide capabilities such as automatic or manual filtering of slow SQL queries, as well as fine-grained statistical analysis of SQL execution information. These tools offer powerful support for systematic performance diagnosis and tuning.