Skip to main content

Spill Query Intermediate Results to Disk

Why spilling is needed

The Doris compute layer is based on an MPP architecture. All compute tasks run in BE memory, and data exchange between nodes also relies on memory, so memory management directly affects query stability. As more users migrate ETL data processing, multi-table materialized view processing, and complex ad-hoc queries to Doris, a single node often cannot hold all intermediate state in memory.

Spilling (spill to disk) writes intermediate state such as aggregation state and sort temporary data to disk, allowing memory-constrained queries to continue executing. It brings three benefits:

BenefitDescription
ScalabilityHandles datasets that exceed a single node's memory limit
StabilityReduces query errors or process crashes caused by out-of-memory conditions
FlexibilityRuns more complex queries without adding hardware

Operators that currently support spilling: Hash Join, Aggregation, Sort, and CTE.

Note

Spilling adds disk I/O, and query latency may increase significantly. Increase the session variable query_timeout accordingly, and mount a dedicated disk or use SSDs for the spill directory to reduce the impact on normal load and query workloads.

Query spilling is disabled by default.

How spilling is triggered

Doris uses a reserve memory mechanism to control when spilling happens. The flow is as follows:

  1. During execution, Doris estimates the memory required to process each Block and requests it from the unified memory manager.
  2. The global memory allocator checks whether this request exceeds the memory limit of the Query, Workload Group, or process.
  3. When the limit is exceeded, the request fails. Doris suspends the current Query and triggers spilling on its largest operator.
  4. After spilling completes, the Query continues to execute.

Memory management hierarchy

Doris memory management has three levels: process level -> Workload Group level -> Query level. Spilling behavior is constrained by all three.

Process-level memory (BE)

The mem_limit parameter in be.conf controls the upper memory limit of the entire BE process. When memory usage exceeds this threshold, Doris cancels the Query that is currently requesting memory, and an asynchronous background task kills some Queries or releases caches.

Two common problem scenarios:

  • Mixed deployment: When BE shares a host with FE, Kafka, HDFS, or other processes, the actual available memory may be far smaller than mem_limit. The memory release mechanism then fails, and the OS OOM Killer is triggered.
  • Containerized deployment: In K8s or Cgroup environments, Doris automatically detects the container's memory configuration, so no manual adjustment is needed.

Workload Group memory

ParameterDescription
max_memory_percentThe maximum percentage of process memory this Workload Group can use. Exceeding it triggers spilling or kills queries.
min_memory_percentThe minimum percentage of memory guaranteed to this Workload Group. When memory is tight, the system allocates by this value to ensure other groups have enough memory.
memory_low_watermarkLow watermark of memory usage. Default is 80%.
memory_high_watermarkHigh watermark of memory usage. Default is 95%. When this value is exceeded, reserve memory requests fail and spilling is triggered.

Constraints: the sum of min_memory_percent across all Workload Groups must not exceed 100%, and a single group's min_memory_percent must not be greater than its max_memory_percent.

Query-level memory

Static memory allocation

exec_mem_limit is set before a Query runs through a session variable and cannot be modified dynamically during execution.

Upgrade note

The default value of exec_mem_limit was 2 GB before version 3.1, and was changed to 100 GB in version 3.1 and later, where it actually takes effect on the BE side. Before upgrading to version 3.1 or later, explicitly set this parameter to 100g to prevent existing queries from being canceled due to exceeding the limit or from triggering unexpected spilling.

Slot-based dynamic memory allocation

Under static allocation, users often cannot accurately estimate the memory a single Query needs and tend to set it too high (such as half of the process memory), which breaks fine-grained control. The Workload Group slot mechanism solves this problem.

How it works:

  • When a Workload Group has max_memory_percent and max_concurrency set, BE memory is logically divided into max_concurrency slots, and each slot has memory = max_memory_percent x mem_limit / max_concurrency.
  • By default, each Query takes 1 slot. To use more memory, modify the session variable query_slot_count.
  • When a Query takes more slots, the number of Queries the Workload Group can run concurrently decreases automatically, and new Queries enter the queue.

Possible values of slot_memory_policy:

ValueDescription
noneDefault. Not enabled. A Query uses as much memory as possible, and spilling is triggered when the Workload Group limit is reached.
fixedMemory per Query = workload group mem_limit x query_slot_count / max_concurrency. Fixed allocation by concurrency.
dynamicMemory per Query = workload group mem_limit x query_slot_count / sum(running query slots). Idle slot memory is dynamically reallocated to large running queries.

Both fixed and dynamic are hard limits. Exceeding them triggers spilling or kills the Query, and they also override the statically allocated exec_mem_limit. When setting slot_memory_policy, configure max_concurrency carefully; otherwise out-of-memory issues may occur.

Enable query spilling

Step 1: Configure BE spill paths

Add the following configuration to be.conf. The BE must be restarted after the change for it to take effect:

spill_storage_root_path=/mnt/disk1/spilltest/doris/be/storage;/mnt/disk2/doris-spill;/mnt/disk3/doris-spill
spill_storage_limit=100%
ParameterDescription
spill_storage_root_pathStorage path for spill files. Defaults to the same value as storage_root_path. A dedicated disk path is recommended.
spill_storage_limitMaximum disk usage for spill files. Supports an absolute value (such as 100G or 1T) or a percentage (default 20%). If a dedicated disk is used, this can be set to 100%.

Step 2: Configure FE session variables

SET enable_spill = true;
SET exec_mem_limit = 10g;
SET query_timeout = 3600;
VariableDescription
enable_spillWhether to enable spilling. Default is false. When enabled, spilling is triggered automatically when memory is tight.
exec_mem_limitMaximum memory available to a single Query.
query_timeoutSpilling increases query latency, so increase the timeout accordingly (in seconds).

Step 3: Configure Workload Group (optional)

Adjust max_memory_percent to prevent a single Workload Group from exhausting process memory:

ALTER WORKLOAD GROUP normal PROPERTIES ('max_memory_percent'='90%');

Enable slot-based dynamic memory allocation so that large queries spill first:

ALTER WORKLOAD GROUP normal PROPERTIES ('slot_memory_policy'='dynamic');

Monitor spilling status

Audit log

The FE Audit Log adds the following fields to record spill read and write volumes:

SpillWriteBytesToLocalStorage=503412182|SpillReadBytesFromLocalStorage=503412182
FieldDescription
SpillWriteBytesToLocalStorageTotal data written to disk during spilling (bytes).
SpillReadBytesFromLocalStorageTotal data read from disk during spilling (bytes).

Query Profile

After a query triggers spilling, counters prefixed with Spill appear in the Profile. Using HashJoin Build HashTable as an example:

PARTITIONED_HASH_JOIN_SINK_OPERATOR  (id=4  ,  nereids_id=179):(ExecTime:  6sec351ms)
- Spilled: true
- CloseTime: 528ns
- ExecTime: 6sec351ms
- InitTime: 5.751us
- InputRows: 6.001215M (6001215)
- MemoryUsage: 0.00
- MemoryUsagePeak: 554.42 MB
- MemoryUsageReserved: 1024.00 KB
- OpenTime: 2.267ms
- PendingFinishDependency: 0ns
- SpillBuildTime: 2sec437ms
- SpillInMemRow: 0
- SpillMaxRowsOfPartition: 68.569K (68569)
- SpillMinRowsOfPartition: 67.455K (67455)
- SpillPartitionShuffleTime: 836.302ms
- SpillPartitionTime: 131.839ms
- SpillTotalTime: 5sec563ms
- SpillWriteBlockBytes: 714.13 MB
- SpillWriteBlockCount: 1.344K (1344)
- SpillWriteFileBytes: 244.40 MB
- SpillWriteFileTime: 350.754ms
- SpillWriteFileTotalCount: 32
- SpillWriteRows: 6.001215M (6001215)
- SpillWriteSerializeBlockTime: 4sec378ms
- SpillWriteTaskCount: 417
- SpillWriteTaskWaitInQueueCount: 0
- SpillWriteTaskWaitInQueueTime: 8.731ms
- SpillWriteTime: 5sec549ms

Spilled: true indicates that the operator has triggered spilling.

System table backend_active_tasks

Two new columns are added to information_schema.backend_active_tasks to view the spill volume of in-progress queries in real time:

ColumnDescription
SPILL_WRITE_BYTES_TO_LOCAL_STORAGESpill data already written to disk by the current query (bytes).
SPILL_READ_BYTES_FROM_LOCAL_STORAGESpill data already read from disk by the current query (bytes).
SELECT * FROM information_schema.backend_active_tasks;

Sample output:

+-------+------------+-------------------+-----------------------------------+--------------+------------------+-----------+------------+----------------------+---------------------------+--------------------+-------------------+------------+------------------------------------+-------------------------------------+
| BE_ID | FE_HOST | WORKLOAD_GROUP_ID | QUERY_ID | TASK_TIME_MS | TASK_CPU_TIME_MS | SCAN_ROWS | SCAN_BYTES | BE_PEAK_MEMORY_BYTES | CURRENT_USED_MEMORY_BYTES | SHUFFLE_SEND_BYTES | SHUFFLE_SEND_ROWS | QUERY_TYPE | SPILL_WRITE_BYTES_TO_LOCAL_STORAGE | SPILL_READ_BYTES_FROM_LOCAL_STORAGE |
+-------+------------+-------------------+-----------------------------------+--------------+------------------+-----------+------------+----------------------+---------------------------+--------------------+-------------------+------------+------------------------------------+-------------------------------------+
| 10009 | 10.16.10.8 | 1 | 6f08c74afbd44fff-9af951270933842d | 13612 | 11025 | 12002430 | 1960955904 | 733243057 | 70113260 | 0 | 0 | SELECT | 508110119 | 26383070 |
| 10009 | 10.16.10.8 | 1 | 871d643b87bf447b-865eb799403bec96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | SELECT | 0 | 0 |
+-------+------------+-------------------+-----------------------------------+--------------+------------------+-----------+------------+----------------------+---------------------------+--------------------+-------------------+------------+------------------------------------+-------------------------------------+

Performance reference (TPC-DS 10TB)

The following data comes from a single-concurrency test on Alibaba Cloud servers, which verifies that spilling allows all 99 TPC-DS queries to finish in an extreme scenario where the ratio of memory to data is about 1:52.

Test environment:

  • 1 FE: 16 vCPU cores, 32 GiB memory (ecs.c6.4xlarge)
  • 3 BE: 16 vCPU cores, 64 GiB memory (ecs.g6.4xlarge)
  • Test data: TPC-DS 10TB, mounted through an Alibaba Cloud DLF Catalog

Total time: 28,102.386 seconds

QueryTime (ms)QueryTime (ms)QueryTime (ms)
query129092query3484055query673939554
query2130003query3569885query68183648
query396119query36148662query6911031
query41199097query3721598query70137901
query5212719query38164746query71166454
query662259query395874query722859001
query7209154query4051602query7392015
query862433query41563query74336694
query9579371query4293005query75838989
query1054260query4367769query76174235
query11560169query4479527query77174525
query1226084query4526575query781956786
query13228756query46134991query79162259
query141137097query47161873query80602088
query1527509query48153657query8116184
query1684806query49259387query8256292
query17288164query50141421query8326211
query1894770query51158056query8411906
query19124955query5291392query8557739
query2030970query5389497query8634350
query214333query54124118query87173631
query229890query5582584query88449003
query231757755query56152110query89113799
query24399553query5783417query9030825
query25291474query58259580query9112239
query2679832query59177125query9226695
query27175894query60161729query93275828
query28647497query61258058query9456464
query291299597query6239619query9564932
query3011434query6391258query9648102
query31106665query64234882query97597371
query3233481query65278610query98112399
query33146101query6690246query9964472

In the future, spilling support will be extended to more operators (such as Window Function and Intersect), and spill performance will continue to be optimized.