Descriptionβ
The EXPLAIN statement displays Doris's query execution plan for a given query. Doris's query optimizer aims to create an efficient plan using statistical data, data characteristics, and features like HASH JOIN, partitioning, and bucketing. However, due to theoretical and practical constraints, the plan may sometimes under perform.
To improve performance, it's essential to analyze the current plan. This article teaches how to use the EXPLAIN statement for optimization.
{EXPLAIN | DESC} [VERBOSE] <query_block>
Required Parametersβ
<query_block>
This is the query statement for which you want the explain plan.
Optional Parametersβ
[VERBOSE]
Whether to display detailed information is determined by the VERBOSE
specification. With VERBOSE
, comprehensive details are shown, including specifics on each operator, the tuple IDs they use, and detailed descriptions for each tuple. Without it, concise information is provided.
Return Resultsβ
Basic Conceptsβ
To better understand the information displayed by EXPLAIN
, let's introduce a few core concepts of the Doris execution plan.
Name | Explanation |
---|
PLAN | Execution plan. A query is translated into an execution plan by the execution planner, which is then executed by the execution engine. |
FRAGMENT | Execution fragment. Since Doris is a distributed execution engine, a complete execution plan is divided into multiple single-node execution fragments. A FRAGMENT table represents a complete single-node execution fragment. Multiple FRAGMENTS combine to form a complete PLAN. |
PLAN NODE | Operator. The smallest unit of the execution plan. A FRAGMENT consists of multiple operators. Each operator is responsible for a specific execution logic, such as aggregation, joins, etc. |
Return Result Structureβ
The result of the Doris EXPLAIN
statement is a complete PLAN. Within the PLAN, FRAGMENTS are ordered from back to front based on the execution sequence. Within each FRAGMENT, operators (PLAN NODES) are also ordered from back to front based on the execution sequence.
An example is provided below:
+
| Explain String(Nereids Planner) |
+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS: |
| cnt[
| cnt[
| PARTITION: UNPARTITIONED |
| |
| HAS_COLO_PLAN_NODE: false |
| |
| VRESULT SINK |
| MYSQL_PROTOCAL |
| |
| 7:VEXCHANGE |
| offset: 0 |
| distribute expr lists: |
| |
| PLAN FRAGMENT 1 |
| |
| PARTITION: RANDOM |
| |
| HAS_COLO_PLAN_NODE: false |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 07 |
| UNPARTITIONED |
| |
| 6:VHASH JOIN(354) |
| | join op: INNER JOIN(BROADCAST)[] |
| | equal join conjunct: cnt[
| | cardinality=1 |
| | vec output tuple id: 8 |
| | vIntermediate tuple ids: 7 |
| | hash output slot ids: 5 7 |
| | distribute expr lists: |
| | distribute expr lists: |
| | |
| |
| | offset: 0 |
| | distribute expr lists: |
| | |
| 5:VEXCHANGE |
| offset: 0 |
| distribute expr lists: |
| |
| PLAN FRAGMENT 2 |
| ... |
| |
| PLAN FRAGMENT 3 |
| ... |
+
Operators are linked to their child nodes with dashed lines. When an operator has multiple children, they are arranged vertically, representing a right-to-left order. In the example above, operator 6 (VHASH JOIN) has operator 5 (EXCHANGE) as its left child and operator 4 (EXCHANGE) as its right child.
Fragment Field Descriptionsβ
Name | Description |
---|
PARTITION | Displays the data distribution of the current Fragment |
HAS_COLO_PLAN_NODE | Indicates if the fragment contains colocate operators |
Sink | The method of fragment data output, see the table below for details |
Sink Methods
Name | Description |
---|
STREAM DATA SINK | Outputs data to the next Fragment. It includes two lines of information. First line: The downstream EXCHANGE NODE to which data is sent. Second line: The method of data distribution. - UNPARTITIONED means each downstream instance receives the full data set. This typically occurs in broadcast joins or when single-instance logic is required, such as global limit or order by. - RANDOM means each downstream instance receives a random subset of data without repetition. - HASH_PARTITIONED uses the listed slots as keys to hash and send data shards to the same downstream instance. This is often used upstream of partition hash joins or the second stage of two-phase aggregations. |
RESULT SINK | Sends result data to the FE. The first line indicates the protocol used for data transmission, currently supporting MySQL and arrow protocols. |
OLAP TABLE SINK | Writes data to an OLAP table. |
MultiCastDataSinks | A multicast operator that contains multiple STREAM DATA SINKs. Each STREAM DATA SINK sends the full data set to its downstream. |
When using VERBOSE mode, Tuple information is output. Tuple information describes the SLOT details within a row of data, including SLOT type, nullable status, etc.
The output contains multiple TupleDescriptors, each containing multiple SlotDescriptors. An example is shown below:
Tuples:
TupleDescriptor{id=0, tbl=t1}
SlotDescriptor{id=0, col=c1, colUniqueId=0, type=int, nullable=true, isAutoIncrement=false, subColPath=null}
SlotDescriptor{id=2, col=c3, colUniqueId=2, type=int, nullable=true, isAutoIncrement=false, subColPath=null}
TupleDescriptorβ
Name | Description |
---|
id | The id of the tuple descriptor |
tbl | The corresponding table of the tuple, or null if not applicable |
SlotDescriptorβ
Name | Description |
---|
id | The id of the slot descriptor |
col | The corresponding column of the slot, or left blank if not applicable |
colUniqueId | The unique id of the corresponding column, or -1 if not applicable |
type | The type of the slot |
nullable | Indicates if the corresponding data can be null |
isAutoIncrement | Indicates if the column is auto-incremented |
subColPath | The sub-column path within the column, currently only applies to variant types |
Operator Descriptionsβ
Operator Listβ
Name | Description |
---|
AGGREGATE | Aggregation operator |
ANALYTIC | Window function operator |
ASSERT NUMBER OF ROWS | Operator to check the number of downstream output rows |
EXCHANGE | Data exchange receiver operator |
MERGING-EXCHANGE | Data exchange receiver with sorting and row limit functionality |
HASH JOIN | Hash join operator |
NESTED LOOP JOIN | Nested loop join operator |
PartitionTopN | Intra-partition data pre-filtering operator |
REPEAT_NODE | Data replication operator |
DataGenScanNode | Table-valued function operator |
EsScanNode | ES table scan operator |
HIVE_SCAN_NODE | Hive table scan operator |
HUDI_SCAN_NODE | Hudi table scan operator |
ICEBERG_SCAN_NODE | Iceberg table scan operator |
PAIMON_SCAN_NODE | Paimon table scan operator |
JdbcScanNode | Jdbc table scan operator |
OlapScanNode | Olap table scan operator |
SELECT | Filtering operator |
UNION | Set union operator |
EXCEPT | Set difference operator |
INTERSECT | Set intersection operator |
SORT | Sorting operator |
TOP-N | Sort and return top N results operator |
TABLE FUNCTION NODE | Table function operator (lateral view) |
Common Fieldsβ
Name | Description |
---|
limit | Limits the number of output rows |
offset | Number of rows to skip before outputting |
conjuncts | Filters the results of the current node. Executed before projections. |
projections | Projection operations after the current operator. Executed after conjuncts. |
project output tuple id | The output tuple after projection. The slot arrangement within the data tuple can be seen via tuple desc. |
cardinality | Estimated row count by the optimizer |
distribute expr lists | The original data distribution method for the child nodes of the current node |
Expression's slot id | The specific slot corresponding to the slot id can be found in the tuple list in verbose mode. This list provides information such as slot type and nullable attributes. Represented as [#5] after the expression. |
AGGREGATEβ
Name | Description |
---|
(Aggregation Phase) | The aggregation phase is represented by two terms. The first term can be either update (local aggregation) or merge (global aggregation). The second term indicates whether the current data is serialized (serialize) or has completed final calculations (finalize). |
STREAMING | Only local aggregation operators in multi-stage aggregation truncation have this flag. Indicates that the current aggregation node may use STREAMING mode, where input data is passed directly to the next stage of aggregation without actual computation. |
output | The output of the current aggregation operator. All local pre-aggregation functions are prefixed with partial. |
group by | The key for aggregation |
ANALYTICβ
Name | Description |
---|
functions | The name of the current window function |
partition by | Corresponds to the partition by clause in the over clause of the window function. Windowing expression. |
order by | Sorting expression and order within the window |
window | Window range |
ASSERT NUMBER OF ROWSβ
Name | Description |
---|
EQ | The downstream output must match this row count constraint |
HASH JOINβ
Name | Description |
---|
join op | Type of join |
equal join conjunct | Equality condition in the join condition |
other join predicates | Conditions in the join condition, excluding equality |
mark join predicates | Conditions used in mark join |
other predicates | Filtering predicates after join execution |
runtime filters | Generated runtime filters |
output slot ids | List of final output slots |
hash output slot ids | List of output slots after hash join execution, but before other join conditions are applied |
isMarkJoin | Indicates whether it is a mark join |
NESTED LOOP JOINβ
Name | Description |
---|
join op | Type of join operation |
join conjuncts | Conditions for joining |
mark join predicates | Conditions used in mark join |
predicates | Filter predicates after join |
runtime filters | Generated runtime filters |
output slot ids | List of final output slots |
isMarkJoin | Whether it is a mark join |
PartitionTopNβ
Name | Description |
---|
functions | Window functions applying grouped filter optimization |
has global limit | Presence of a global limit on the number of rows |
partition limit | Limit on the number of rows within each partition |
partition topn phase | Current phase: TWO_PHASE_GLOBAL_PTOPN for global phase after shuffling by partition key, TWO_PHASE_LOCAL_PTOPN for local phase before shuffling by partition key |
REPEAT_NODEβ
Name | Description |
---|
repeat | Number of repetitions for each row and corresponding slot ids for aggregation columns |
exprs | List of expressions for output data after repetition |
DataGenScanNodeβ
Name | Description |
---|
table value function | Table function name |
EsScanNodeβ
Name | Description |
---|
SORT COLUMN | Columns for sorting results |
LOCAL_PREDICATES | Filters executed within Doris |
REMOTE_PREDICATES | Filters executed within ES |
ES index/type | ES index and type for querying |
HIVE_SCAN_NODEβ
Name | Description |
---|
inputSplitNum | Number of scan splits |
totalFileSize | Total file size being scanned |
scanRanges | Information on scan splits |
partition | Number of partitions being scanned |
backends | Specific data info for each BE to scan |
cardinality | Estimated number of rows by optimizer |
avgRowSize | Estimated average row size by optimizer |
numNodes | Number of BEs used by the current operator |
pushdown agg | Aggregations pushed down to scan |
HUDI_SCAN_NODEβ
Name | Description |
---|
inputSplitNum | Number of scan splits |
totalFileSize | Total file size being scanned |
scanRanges | Information on scan splits |
partition | Number of partitions being scanned |
backends | Specific data info for each BE to scan |
cardinality | Estimated number of rows by optimizer |
avgRowSize | Estimated average row size by optimizer |
numNodes | Number of BEs used by the current operator |
pushdown agg | Aggregations pushed down to scan |
hudiNativeReadSplits | Number of splits read using native method |
ICEBERG_SCAN_NODEβ
Name | Description |
---|
inputSplitNum | Number of scan splits |
totalFileSize | Total file size being scanned |
scanRanges | Information on scan splits |
partition | Number of partitions being scanned |
backends | Specific data info for each BE to scan |
cardinality | Estimated number of rows by optimizer |
avgRowSize | Estimated average row size by optimizer |
numNodes | Number of BEs used by the current operator |
pushdown agg | Aggregations pushed down to scan |
icebergPredicatePushdown | Filters pushed down to iceberg API |
PAIMON_SCAN_NODEβ
Name | Description |
---|
inputSplitNum | Number of scan splits |
totalFileSize | Total file size being scanned |
scanRanges | Information on scan splits |
partition | Number of partitions being scanned |
backends | Specific data info for each BE to scan |
cardinality | Estimated number of rows by optimizer |
avgRowSize | Estimated average row size by optimizer |
numNodes | Number of BEs used by the current operator |
pushdown agg | Aggregations pushed down to scan |
paimonNativeReadSplits | Number of splits read using native method |
NESTED LOOP JOINβ
Name | Description |
---|
join op | Type of join operation |
join conjuncts | Conditions for joining |
mark join predicates | Conditions used in mark join |
predicates | Filter predicates after join |
runtime filters | Generated runtime filters |
output slot ids | List of final output slots |
isMarkJoin | Whether it is a mark join |
PartitionTopNβ
Name | Description |
---|
functions | Window functions applying grouped filter optimization |
has global limit | Presence of a global limit on the number of rows |
partition limit | Limit on the number of rows within each partition |
partition topn phase | Current phase: TWO_PHASE_GLOBAL_PTOPN for global phase after shuffling by partition key, TWO_PHASE_LOCAL_PTOPN for local phase before shuffling by partition key |
REPEAT_NODEβ
Name | Description |
---|
repeat | Number of repetitions for each row and corresponding slot ids for aggregation columns |
exprs | List of expressions for output data after repetition |
DataGenScanNodeβ
Name | Description |
---|
table value function | Table function name |
EsScanNodeβ
Name | Description |
---|
SORT COLUMN | Columns for sorting results |
LOCAL_PREDICATES | Filters executed within Doris |
REMOTE_PREDICATES | Filters executed within ES |
ES index/type | ES index and type for querying |
HIVE_SCAN_NODEβ
Name | Description |
---|
inputSplitNum | Number of scan splits |
totalFileSize | Total file size being scanned |
scanRanges | Information on scan splits |
partition | Number of partitions being scanned |
backends | Specific data info for each BE to scan |
cardinality | Estimated number of rows by optimizer |
avgRowSize | Estimated average row size by optimizer |
numNodes | Number of BEs used by the current operator |
pushdown agg | Aggregations pushed down to scan |
HUDI_SCAN_NODEβ
Name | Description |
---|
inputSplitNum | Number of scan splits |
totalFileSize | Total file size being scanned |
scanRanges | Information on scan splits |
partition | Number of partitions being scanned |
backends | Specific data info for each BE to scan |
cardinality | Estimated number of rows by optimizer |
avgRowSize | Estimated average row size by optimizer |
numNodes | Number of BEs used by the current operator |
pushdown agg | Aggregations pushed down to scan |
hudiNativeReadSplits | Number of splits read using native method |
ICEBERG_SCAN_NODEβ
Name | Description |
---|
inputSplitNum | Number of scan splits |
totalFileSize | Total file size being scanned |
scanRanges | Information on scan splits |
partition | Number of partitions being scanned |
backends | Specific data info for each BE to scan |
cardinality | Estimated number of rows by optimizer |
avgRowSize | Estimated average row size by optimizer |
numNodes | Number of BEs used by the current operator |
pushdown agg | Aggregations pushed down to scan |
icebergPredicatePushdown | Filters pushed down to iceberg API |
PAIMON_SCAN_NODEβ
Name | Description |
---|
inputSplitNum | Number of scan splits |
totalFileSize | Total file size being scanned |
scanRanges | Information on scan splits |
partition | Number of partitions being scanned |
backends | Specific data info for each BE to scan |
cardinality | Estimated number of rows by optimizer |
avgRowSize | Estimated average row size by optimizer |
numNodes | Number of BEs used by the current operator |
pushdown agg | Aggregations pushed down to scan |
paimonNativeReadSplits | Number of splits read using native method |
JdbcScanNodeβ
Name | Description |
---|
TABLE | JDBC-side table name to scan |
QUERY | Query used for scanning |
OlapScanNodeβ
Name | Description |
---|
TABLE | Table being scanned. Parentheses indicate the name of the hit synchronized materialized view. |
SORT INFO | Present when SCAN pre-sorting is planned. Indicates partial pre-sorting and pre-truncation of SCAN output. |
SORT LIMIT | Present when SCAN pre-sorting is planned. Indicates the truncation length for pre-truncation. |
TOPN OPT | Present when TOP-N Runtime Filter is planned. |
PREAGGREGATION | Indicates whether pre-aggregation is enabled. Relevant for MOR aggregation and primary key models. ON means data at the storage layer satisfies upper-layer needs without extra aggregation. OFF means extra aggregation is performed. |
partitions | Number of partitions currently scanned, total partitions, and list of scanned partition names. |
tablets | Number of tablets scanned and total tablets in the table. |
tabletList | List of tablets scanned. |
avgRowSize | Estimated row size by the optimizer. |
numNodes | Number of BEs assigned to the current scan. |
pushAggOp | Results are returned by reading zonemap metadata. Supports MIN, MAX, COUNT aggregation information. |
Name | Description |
---|
constant exprs | List of constant expressions to be included in the output. |
child exprs | Children's outputs projected through this expression list as input to the set operator. |
Name | Description |
---|
child exprs | Children's outputs projected through this expression list as input to the set operator. |
INTERSECTβ
Name | Description |
---|
child exprs | Children's outputs projected through this expression list as input to the set operator. |
Name | Description |
---|
order by | Sorting key and specific sort order. |
TABLE FUNCTION NODEβ
Name | Description |
---|
table function | Name of the table function used. |
lateral view tuple id | Tuple ID corresponding to newly generated columns. |
output slot id | List of slot IDs for columns output after column pruning. |
Name | Description |
---|
order by | Sorting key and specific sort order. |
TOPN OPT | Present when TOP-N runtime filter optimization is hit. |
OPT TWO PHASE | Present when TOP-N deferred materialization is hit. |