Load FAQ

General Load FAQ

Error "[DATA_QUALITY_ERROR] Encountered unqualified data"

Problem Description: Data quality error during loading.

Solution:

  • Stream Load and INSERT INTO return an error URL directly in their results; for Broker Load, you can find the error URL in the output of the SHOW LOAD command.
  • Open the error URL in a browser or fetch it with curl to view the specific data quality error reasons (see the example after this list).
  • Use the strict_mode and max_filter_ratio parameters to control how many erroneous rows are tolerated.
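
For example, a minimal sketch of checking the error details and then retrying with a higher tolerance; the host, port, database, table, and file names are placeholders:

# Paste the ErrorURL field from the load result to see the rejected rows
curl "$ERROR_URL"

# Retry the load, tolerating up to 10% filtered rows instead of failing the whole load
curl --location-trusted -u root:"" \
-H "strict_mode:false" \
-H "max_filter_ratio:0.1" \
-H "column_separator:," \
-T data.csv \
http://127.0.0.1:8030/api/example_db/example_tbl/_stream_load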

Error "[E-235] Failed to init rowset builder"

Problem Description: Error -235 occurs when data is loaded too frequently and compaction cannot keep up, causing a tablet to exceed the version limit.

Solution:

  • Increase the batch size of each load and reduce the loading frequency.
  • Increase the max_tablet_version_num parameter in be.conf (see the sketch after this list); it is recommended not to exceed 5000.
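
A rough sketch of checking how many versions a suspect tablet has accumulated and raising the limit; the BE host, webserver port (8040), tablet ID, and config path are assumptions for illustration:

# Inspect compaction status (including accumulated rowsets/versions) for a tablet
curl "http://127.0.0.1:8040/api/compaction/show?tablet_id=10101"

# Raise the version limit in be.conf, then restart the BE for it to take effect
echo "max_tablet_version_num = 2000" >> /path/to/be/conf/be.conf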

Error "[E-238] Too many segments in rowset"

Problem Description: Error -238 occurs when the number of segments under a single rowset exceeds the limit.

Common Causes:

  • The bucket number configured during table creation is too small (see the sketch after this list).
  • Data skew occurs; consider choosing a higher-cardinality, more evenly distributed bucket key.
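
A minimal sketch of creating a table with more buckets and a reasonably distributed bucket key; the schema, bucket count, and connection parameters are assumptions, not recommendations for any specific workload:

mysql -h 127.0.0.1 -P 9030 -u root -e "
CREATE TABLE example_db.example_tbl (
    user_id BIGINT,
    event_time DATETIME,
    payload VARCHAR(1024)
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 32
PROPERTIES ('replication_num' = '1');
"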

Error "Transaction commit successfully, BUT data will be visible later"

Problem Description: Data load is successful but temporarily not visible.

Cause: Usually due to transaction publish delay caused by system resource pressure.

Error "Failed to commit kv txn [...] Transaction exceeds byte limit"

Problem Description: In shared-storage (compute-storage decoupled) mode, a single load involves too many partitions and tablets, and the resulting transaction exceeds the byte limit.

Solution:

  • Load data partition by partition in batches to reduce the number of partitions involved in a single load (see the sketch after this list).
  • Optimize table structure to reduce the number of partitions and tablets.
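
A minimal sketch of splitting one large load into per-partition Stream Loads, assuming the source data has already been split into one CSV file per partition and that each file is named after its target partition; the paths, table, and endpoint are placeholders:

# Load one partition's file at a time so each transaction touches few partitions and tablets
for f in /data/example_tbl/p*.csv; do
    curl --location-trusted -u root:"" \
    -H "column_separator:," \
    -H "partitions:$(basename "$f" .csv)" \
    -T "$f" \
    http://127.0.0.1:8030/api/example_db/example_tbl/_stream_load
done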

Extra "\r" in the last column of CSV file

Problem Description: The source file uses Windows-style line endings (\r\n), so the trailing \r ends up in the last column.

Solution: Specify the correct line delimiter: -H "line_delimiter:\r\n"
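
A complete Stream Load invocation with the line delimiter set explicitly; the connection details, file, and table names are placeholders:

curl --location-trusted -u root:"" \
-H "line_delimiter:\r\n" \
-H "column_separator:," \
-T windows_file.csv \
http://127.0.0.1:8030/api/example_db/example_tbl/_stream_load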

CSV data with quotes imported as null

Problem Description: CSV data with quotes becomes null after import.

Solution: Use the trim_double_quotes parameter to remove double quotes around fields.
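
For example, a row like "1","alice" can be loaded with the surrounding quotes stripped by passing the property as a Stream Load header; the connection details, file, and table names are placeholders:

curl --location-trusted -u root:"" \
-H "trim_double_quotes:true" \
-H "column_separator:," \
-T quoted.csv \
http://127.0.0.1:8030/api/example_db/example_tbl/_stream_load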

Stream Load

Reasons for Slow Loading

  • Bottlenecks in CPU, IO, memory, or network card resources.
  • Slow network between the client machine and the BE machines; a first-pass diagnosis is to check ping latency from the client to the BEs.
  • Webserver thread count bottleneck: too many concurrent Stream Loads on a single BE (exceeding the webserver_num_workers setting in be.conf) can exhaust webserver threads.
  • Memtable flush thread count bottleneck: check the BE metric doris_be_flush_thread_pool_queue_size to see whether queuing is severe (see the checks after this list). This can be relieved by increasing flush_thread_num_per_store in be.conf.
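
Two quick checks for the network and flush-queue items above, assuming the BE webserver listens on port 8040 and exposes the standard metrics endpoint; replace be_host with an actual BE address:

# Check network latency from the load client to a BE node
ping -c 5 be_host

# Check whether memtable flushes are queuing up on that BE
curl -s http://be_host:8040/metrics | grep doris_be_flush_thread_pool_queue_size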

Handling Special Characters in Column Names

When column names contain special characters, wrap the value of the columns header in single quotes and enclose each such column name in backticks:

curl --location-trusted -u root:"" \
-H 'columns:`@coltime`,colint,colvar' \
-T a.csv \
-H "column_separator:," \
http://127.0.0.1:8030/api/db/loadtest/_stream_load

Routine Load

Major Bug Fixes

| Issue Description | Trigger Conditions | Impact Scope | Temporary Solution | Affected Versions | Fixed Versions | Fix PR |
| --- | --- | --- | --- | --- | --- | --- |
| When at least one job times out while connecting to Kafka, it affects the import of other jobs, slowing down global Routine Load imports. | At least one job times out while connecting to Kafka. | Shared-nothing and shared-storage | Stop or manually pause the job to resolve the issue. | < 2.1.9, < 3.0.5 | 2.1.9, 3.0.5 | #47530 |
| User data may be lost after restarting the FE Master. | The job's offset is set to OFFSET_END, and the FE is restarted. | Shared-storage | Change the consumption mode to OFFSET_BEGINNING. | 3.0.2-3.0.4 | 3.0.5 | #46149 |
| A large number of small transactions are generated during import, causing compaction to fail and resulting in continuous -235 errors. | Doris consumes data too quickly, or Kafka data flow is in small batches. | Shared-nothing and shared-storage | Pause the Routine Load job and execute: ALTER ROUTINE LOAD FOR jobname FROM kafka ("property.enable.partition.eof" = "false"); | < 2.1.8, < 3.0.4 | 2.1.8, 3.0.4 | #45528, #44949, #39975 |
| Kafka third-party library destructor hangs, causing data consumption to fail. | Kafka topic deletion (possibly other conditions). | Shared-nothing and shared-storage | Restart all BE nodes. | < 2.1.8, < 3.0.4 | 2.1.8, 3.0.4 | #44913 |
| Routine Load scheduling hangs. | Timeout occurs when FE aborts a transaction in Meta Service. | Shared-storage | Restart the FE node. | < 3.0.2 | 3.0.2 | #41267 |
| Routine Load restart issue. | Restarting BE nodes. | Shared-nothing and shared-storage | Manually resume the job. | < 2.1.7, < 3.0.2 | 2.1.7, 3.0.2 | #3727 |

Default Configuration Optimizations

| Optimization Content | Applied Versions | Corresponding PR |
| --- | --- | --- |
| Increased the timeout duration for Routine Load. | 2.1.7, 3.0.3 | #42042, #40818 |
| Adjusted the default value of max_batch_interval. | 2.1.8, 3.0.3 | #42491 |
| Removed the restriction on max_batch_interval. | 2.1.5, 3.0.0 | #29071 |
| Adjusted the default values of max_batch_rows and max_batch_size. | 2.1.5, 3.0.0 | #36632 |

Observability Optimizations

| Optimization Content | Applied Versions | Corresponding PR |
| --- | --- | --- |
| Added observability-related metrics. | 3.0.5 | #48209, #48171, #48963 |

Error "failed to get latest offset"

Problem Description: Routine Load cannot get the latest Kafka offset.

Common Causes:

  • Usually caused by network connectivity issues with Kafka. Verify by pinging the Kafka broker's domain name or testing the port with telnet (see the check after this list).
  • A timeout caused by a third-party library bug, reported as: java.util.concurrent.TimeoutException: Waited X seconds
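
A basic connectivity check, run from each BE node; the broker hostname and port (9092) are placeholders:

ping -c 3 kafka-broker.example.com
telnet kafka-broker.example.com 9092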

Error "failed to get partition meta: Local:'Broker transport failure"

Problem Description: Routine Load cannot get Kafka Topic Partition Meta.

Common Causes:

  • Usually caused by network connectivity issues with Kafka. Verify by pinging the Kafka broker's domain name or testing the port with telnet.
  • If brokers are addressed by domain name, try configuring the domain name mapping in /etc/hosts (see the example after this list).
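
A minimal /etc/hosts entry, added on every FE and BE node; the IP address and broker hostname are placeholders:

# Requires root; map the Kafka broker's advertised hostname to its IP
echo "192.168.1.10 kafka-broker.example.com" >> /etc/hosts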

Error "Broker: Offset out of range"

Problem Description: The consumed offset no longer exists in Kafka, possibly because it has been cleaned up by Kafka's log retention policy.

Solution:

  • Specify a new offset to consume from, for example, set the offset to OFFSET_BEGINNING (see the sketch after this list).
  • Set appropriate Kafka log cleanup parameters based on the import speed: log.retention.hours, log.retention.bytes, etc.
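
A sketch of resetting the job's offsets and resuming it; the job name, partition list, and FE connection parameters are assumptions, and the data-source property names should be checked against the ALTER ROUTINE LOAD documentation for your version:

mysql -h 127.0.0.1 -P 9030 -u root -e "
PAUSE ROUTINE LOAD FOR example_db.example_job;
ALTER ROUTINE LOAD FOR example_db.example_job
FROM KAFKA (
    'kafka_partitions' = '0,1,2',
    'kafka_offsets' = 'OFFSET_BEGINNING,OFFSET_BEGINNING,OFFSET_BEGINNING'
);
RESUME ROUTINE LOAD FOR example_db.example_job;
"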