Release 3.0.3
Dear community members, Apache Doris 3.0.3 was officially released on December 02, 2024. This version further enhances the performance and stability of the system.
Quick Download: https://doris.apache.org/download/
GitHub Release: https://github.com/apache/doris/releases
Behavioral Changes
- Prohibited column updates on Merge-on-Write (MOW) tables that have synchronous materialized views. #40190
- Adjusted the default parameters of Routine Load to improve import efficiency. #42968
- When a Stream Load fails, the returned LoadedRows value is now 0. #41946 #42291
- Adjusted the default memory limit of the segment cache to 5%. #42308 #42436
New Features
- Introduced the session variable `enable_cooldown_replica_affinity` to control the affinity of cold and hot tiered replicas. #42677
- Added the `table$partition` syntax for querying partition information of Hive tables. #40774
- Supported creating Hive tables in Text format. #41860 #42175
Asynchronous Materialized Views
- Introduced the new materialized view property `use_for_rewrite`. When `use_for_rewrite` is set to false, the materialized view does not participate in transparent rewriting. #40332
Query Optimizer
- Supported correlated non-aggregate subqueries. #42236
Query Execution
- Added the functions `ngram_search`, `normal_cdf`, `to_iso8601`, `from_iso8601_date`, `SESSION_USER()`, and `last_query_id`. #38226 #40695 #41075 #41600 #39575 #40739
- The `aes_encrypt` and `aes_decrypt` functions support GCM mode. #40004
- Profile outputs the values of changed session variables. #41016 #41318
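To make two of the new numeric/string functions concrete, here is a minimal Python sketch of their behavior. It assumes `normal_cdf(mean, sd, value)` takes its arguments in that order, and that `ngram_search(text, pattern, gram_num)` returns the similarity 2 × |common n-grams| / (|text n-grams| + |pattern n-grams|) computed over distinct n-grams. Both are assumptions to check against the Doris function reference; this is not the BE implementation.

```python
import math


def normal_cdf(mean: float, sd: float, value: float) -> float:
    """CDF of the normal distribution N(mean, sd^2) evaluated at value."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    # Closed form via the error function.
    return 0.5 * (1.0 + math.erf((value - mean) / (sd * math.sqrt(2.0))))


def ngrams(s: str, n: int) -> set:
    """Distinct contiguous substrings of length n (set semantics is an assumption)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_search(text: str, pattern: str, gram_num: int) -> float:
    """Dice-style n-gram similarity in [0, 1]; an illustrative sketch only."""
    if gram_num < 1:
        raise ValueError("gram_num must be >= 1")
    a, b = ngrams(text, gram_num), ngrams(pattern, gram_num)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


print(normal_cdf(0, 1, 0))                    # 0.5
print(ngram_search("123456789", "12345", 3))  # 0.6
```

In the second call, the text has 7 distinct 3-grams, the pattern has 3, and all 3 are shared, giving 2 × 3 / (7 + 3) = 0.6.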
Semi-structured Data Management
- Added the array functions `array_match_all` and `array_match_any`. #40605 #43514
- The array function `array_agg` supports nesting ARRAY/MAP/STRUCT within ARRAY. #42009
- Added the approximate aggregate statistical functions `approx_top_k` and `approx_top_sum`. #44082
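Approximate top-k aggregates trade exactness for bounded memory. A widely used technique for this is the Space-Saving algorithm, sketched below in Python to show the idea. Whether Doris's `approx_top_k` uses exactly this algorithm is an assumption, and the `capacity` default of 10 × k is a hypothetical knob chosen for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class SpaceSaving:
    """Space-Saving heavy-hitter sketch that keeps at most `capacity` counters."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}
        self.errors: Dict[Hashable, int] = {}  # upper bound on each item's overcount

    def add(self, item: Hashable) -> None:
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer inherits that count.
            victim = min(self.counts, key=self.counts.__getitem__)
            floor = self.counts.pop(victim)
            self.errors.pop(victim)
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top_k(self, k: int) -> List[Tuple[Hashable, int]]:
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


def approx_top_k(values: Iterable[Hashable], k: int, capacity: int = 0) -> List[Tuple[Hashable, int]]:
    """Hypothetical helper: feed a stream into a Space-Saving sketch and read the top k."""
    sketch = SpaceSaving(capacity or 10 * k)
    for v in values:
        sketch.add(v)
    return sketch.top_k(k)


print(approx_top_k(["a"] * 5 + ["b"] * 3 + ["c"], 2))  # [('a', 5), ('b', 3)]
```

When the number of distinct values fits within the capacity, the counts are exact. Once eviction starts, a count can overestimate the true frequency by at most the recorded error, which is why heavy hitters remain reliable while rare items may be misranked.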
Improvements
Storage
- Supported `bitmap_empty` as a column default value. #40364
- Introduced the session variable `insert_timeout` to control the timeout of DELETE statements. #41063
- Improved some error messages. #41048 #39631
- Improved the priority scheduling of replica repair. #41076
- Enhanced the robustness of timezone handling when creating tables. #41926 #42389
- Added validity checks for partition expressions when creating tables. #40158
- Supported Unicode-encoded column names in DELETE operations. #39381
Compute-Storage Decoupled
- Supported deployment on ARM architecture in compute-storage decoupled mode. #42467 #43377
- Optimized the eviction strategy and lock contention of the file cache, improving the hit rate and high-concurrency point query performance. #42451 #43201 #41818 #43401
- S3 storage vaults support `use_path_style`, which solves the problem of using custom domain names for object storage. #43060 #43343 #43330
- Optimized the configuration and deployment of compute-storage decoupled mode to prevent misoperations across different modes. #43381 #43522 #43434 #40764 #43891
- Optimized observability and provided an interface for deleting the file cache of specified segments. #38489 #42896 #41037 #43412
- Optimized the Meta-service operation and maintenance interfaces, covering RPC rate limiting and tablet metadata correction. #42413 #43884 #41782 #43460
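The `use_path_style` option relates to how S3-compatible object URLs are addressed. In virtual-hosted style the bucket becomes a subdomain of the endpoint, so a custom domain must also resolve (and usually present a TLS certificate) for `<bucket>.<domain>`. In path style the bucket is the first path segment instead, which works with a single custom hostname. The Python sketch below shows the two forms; `object_url` is a hypothetical helper, not a Doris or AWS SDK API.

```python
from urllib.parse import quote, urlsplit


def object_url(endpoint: str, bucket: str, key: str, use_path_style: bool) -> str:
    """Build the URL of an object in an S3-compatible store (illustrative only)."""
    parts = urlsplit(endpoint)
    if not parts.scheme or not parts.netloc:
        raise ValueError("endpoint must include a scheme, e.g. https://host")
    encoded_key = quote(key, safe="/")  # percent-encode the key but keep '/' separators
    if use_path_style:
        # Path style: https://<endpoint>/<bucket>/<key>
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{encoded_key}"
    # Virtual-hosted style: https://<bucket>.<endpoint>/<key>,
    # which needs DNS for the bucket subdomain of the custom domain.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{encoded_key}"


print(object_url("https://oss.example.com", "logs", "2024/12/a.parquet", True))
print(object_url("https://oss.example.com", "logs", "2024/12/a.parquet", False))
```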
Lakehouse
- Paimon Catalog supports Alibaba Cloud DLF and OSS-HDFS storage. #41247 #42585
- Supported reading Hive tables in OpenCSV format. #42257 #42942
- Optimized the performance of accessing the `information_schema.columns` table in External Catalogs. #41659 #41962
- Switched to the new MaxCompute Open Storage API to access MaxCompute data sources. #41614
- Optimized the scheduling policy of the JNI part of Paimon tables, making scan tasks more balanced. #43310
- Optimized the read performance of small ORC files. #42004 #43467
- Supported reading Parquet files compressed with Brotli. #42177
- Added the `file_cache_statistics` table under the `information_schema` database to view metadata cache statistics. #42160
Query Optimizer
- Queries that differ only in comments can now reuse the same SQL cache entry. #40049
- Improved the stability of statistics when data is frequently updated. #43865 #39788 #43009 #40457 #42409 #41894
- Enhanced the stability of constant folding. #42910 #41164 #39723 #41394 #42256 #40441
- Column pruning can now generate better execution plans. #41719 #41548
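One way to let queries that differ only in comments share a cache entry is to strip comments before deriving the cache key. The Python sketch below is a hypothetical illustration; Doris's actual SQL cache key derivation is not described in these notes. It keeps string literals and `/*+ ... */` hints intact, since those can change results or plans, and collapses whitespace only outside quoted text.

```python
import hashlib


def normalize_sql(sql: str) -> str:
    """Drop -- and /* */ comments (not /*+ hints); collapse whitespace outside quotes."""
    out = []
    i, n = 0, len(sql)

    def emit_space() -> None:
        # Emit at most one separating space, and none at the very start.
        if out and out[-1] != " ":
            out.append(" ")

    while i < n:
        ch = sql[i]
        if ch in "'\"`":
            # Copy a quoted literal or identifier verbatim, honoring backslash escapes.
            j = i + 1
            while j < n and sql[j] != ch:
                j += 2 if sql[j] == "\\" else 1
            out.append(sql[i:j + 1])
            i = j + 1
        elif sql.startswith("--", i):
            end = sql.find("\n", i)
            i = n if end == -1 else end
            emit_space()
        elif sql.startswith("/*", i) and not sql.startswith("/*+", i):
            end = sql.find("*/", i + 2)
            i = n if end == -1 else end + 2
            emit_space()  # keep tokens on either side of the comment separated
        elif ch.isspace():
            emit_space()
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out).strip()


def sql_cache_key(sql: str) -> str:
    """Hypothetical cache key: a hash of the comment-free, normalized SQL text."""
    return hashlib.sha256(normalize_sql(sql).encode("utf-8")).hexdigest()


a = "SELECT * FROM t -- dashboard A\nWHERE k = 1"
b = "/* dashboard B */ SELECT * FROM t WHERE k = 1"
print(sql_cache_key(a) == sql_cache_key(b))  # True
```

Whitespace inside literals is deliberately preserved, so `'a  b'` and `'a b'` still produce different keys.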
Query Execution
- Optimized the memory usage of the sort operator. #39306
- Optimized computation performance on ARM. #38888 #38759
- Optimized the computational performance of a series of functions. #40366 #40821 #40670 #41206 #40162
- Used SSE instructions to optimize the performance of the `match_ipv6_subnet` function. #38755
- Supported automatic creation of new partitions during INSERT OVERWRITE. #38628 #42645
- Added the status of each PipelineTask to the Profile. #42981
- The IP type supports runtime filters. #39985
Semi-structured Data Management
- Audit logs now record the actual SQL of prepared statements. #43321
- The Filebeat Doris output plugin supports fault tolerance and progress reporting. #36355
- Optimized the performance of inverted index queries. #41547 #41585 #41567 #41577 #42060 #42372
- The array function `arrays_overlap` supports acceleration using inverted indexes. #41571
- The IP function `is_ip_address_in_range` supports acceleration using inverted indexes. #41571
- Optimized the CAST performance of the VARIANT data type. #41775 #42438 #43320
- Optimized the CPU resource consumption of the Variant data type. #42856 #43062 #43634
- Optimized the metadata and execution memory resource consumption of the Variant data type. #42448 #43326 #41482 #43093 #43567 #43620
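The semantics of `is_ip_address_in_range` can be sketched with Python's standard `ipaddress` module. The second helper shows why index acceleration is plausible: a CIDR membership test is a closed range predicate on the integer value of the address, which an index can evaluate as a range lookup. How Doris's inverted index actually evaluates the predicate is not described in these notes, so that connection is an assumption; `cidr_to_bounds` is a hypothetical helper.

```python
import ipaddress
from typing import Tuple


def is_ip_address_in_range(ip: str, cidr: str) -> bool:
    """True if ip lies in the CIDR block (IPv4 or IPv6); False across address families."""
    network = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(ip) in network


def cidr_to_bounds(cidr: str) -> Tuple[int, int]:
    """Inclusive [first, last] integer range covered by a CIDR block."""
    network = ipaddress.ip_network(cidr, strict=False)
    return int(network.network_address), int(network.broadcast_address)


lo, hi = cidr_to_bounds("10.0.0.0/8")
print(lo, hi)                                            # 167772160 184549375
print(lo <= int(ipaddress.ip_address("10.1.2.3")) <= hi) # True
```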
Permissions
- Added the configuration item `ldap_group_filter` to LDAP for custom group filtering. #43292
Other
- FE monitoring metrics now display connection counts per user. #39200
Bug Fixes
Storage
- Fixed issues with using IPv6 hostnames. #40074
- Fixed the inaccurate display of Broker Load and S3 Load progress. #43535
- Fixed the issue where queries might hang on the FE. #41303 #42382
- Fixed the issue of duplicate auto-increment IDs under exceptional circumstances. #43774 #43983
- Fixed occasional NPE issues with group commit. #43635
- Fixed the inaccurate calculation of auto bucket. #41675 #41835
- Fixed the issue where the FE might not correctly plan multi-table flows after a restart. #41677 #42290
Compute-Storage Decoupled
- Fixed the issue that MOW primary key tables with large delete bitmaps might cause a coredump. #43088 #43457 #43479 #43407 #43297 #43613 #43615 #43854 #43968 #44074 #41793 #42142
- Fixed the issue that segment files whose size is a multiple of 5MB failed to upload to object storage. #43254
- Fixed the issue that the default retry policy of the AWS SDK did not take effect. #43575 #43648
- Fixed the issue that altering a storage vault continued executing even when a wrong type was specified. #43489 #43352 #43495
- Fixed the issue that tablet_id might be 0 during the delayed commit of large transactions. #42043 #42905
- Fixed the issue that constant folding RPCs and SQL forwarded by the FE might not be executed in the expected compute group. #43110 #41819 #41846
- Fixed the issue that meta-service did not strictly check instance_id when receiving RPCs. #43253 #43832
- Fixed the issue that the information_schema version on FE followers was not updated in time. #43496
- Fixed atomicity issues with file cache renames, and inaccurate file cache metrics. #42869 #43504 #43220
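The notes do not say what caused the 5MB upload failure. Purely as a hypothetical illustration, the Python sketch below shows why sizes that are exact multiples of a part size are a classic edge case for multipart uploads: S3 requires every part except the last to be at least 5 MiB, and a naive part count of `size // part_size + 1` produces an extra, empty trailing part whenever the size divides evenly. Both helpers are invented for illustration and are not Doris code.

```python
from typing import List, Tuple

PART_SIZE = 5 * 1024 * 1024  # 5 MiB: S3's minimum size for every part but the last


def plan_parts(total_size: int, part_size: int = PART_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering total_size bytes, never emitting an empty part."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    parts = []
    offset = 0
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((offset, length))
        offset += length
    return parts  # empty for a zero-byte file, which would need a single plain PUT instead


def naive_part_count(total_size: int, part_size: int = PART_SIZE) -> int:
    # Buggy pattern: yields one extra (empty) part when total_size % part_size == 0.
    return total_size // part_size + 1


print(len(plan_parts(2 * PART_SIZE)), naive_part_count(2 * PART_SIZE))  # 2 3
```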
Lakehouse
- Prohibited pushing down predicates that contain implicit conversions to JDBC data sources, to avoid inconsistent query results. #42102
- Fixed some read issues with transactional tables from newer Hive versions. #42226
- Fixed the issue that the Export command might cause deadlocks. #43083 #43402
- Fixed the issue of being unable to query Hive views created by Spark. #43552
- Fixed the issue that Hive partition paths containing special characters led to incorrect partition pruning. #42906
- Fixed the issue that Iceberg Catalog could not use AWS Glue. #41084
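Hive stores partition values in directory names and percent-escapes characters that are unsafe in paths (for example, `:` becomes `%3A`), so a partition pruner has to decode directory names before comparing them with predicate literals. The Python sketch below approximates Hive's escaping convention; the escaped character set is our best understanding of Hive's non-Windows defaults, and the release notes do not state which step the Doris fix changed. All function names are hypothetical.

```python
import string
from typing import Dict

# Approximately the characters Hive escapes in partition path names (non-Windows).
_ESCAPED = {chr(c) for c in range(0x01, 0x20)} | set('"#%\'*/:=?\\\x7f{[]^')


def escape_path_name(value: str) -> str:
    """Percent-escape unsafe characters as %XX (uppercase hex)."""
    return "".join(f"%{ord(ch):02X}" if ch in _ESCAPED else ch for ch in value)


def unescape_path_name(name: str) -> str:
    """Decode %XX sequences; leave a '%' that is not followed by two hex digits as-is."""
    out = []
    i = 0
    while i < len(name):
        code = name[i + 1:i + 3]
        if name[i] == "%" and len(code) == 2 and all(c in string.hexdigits for c in code):
            out.append(chr(int(code, 16)))
            i += 3
        else:
            out.append(name[i])
            i += 1
    return "".join(out)


def parse_partition_path(path: str) -> Dict[str, str]:
    """'dt=2024-12-02/hour=12%3A00' -> {'dt': '2024-12-02', 'hour': '12:00'}"""
    spec = {}
    for segment in path.strip("/").split("/"):
        # '=' inside values is escaped, so the first '=' separates key from value.
        key, _, raw_value = segment.partition("=")
        spec[unescape_path_name(key)] = unescape_path_name(raw_value)
    return spec


print(parse_partition_path("dt=2024-12-02/hour=12%3A00"))
```

A pruner that compared the raw directory name `12%3A00` with the literal `'12:00'` would wrongly discard the partition, which is the kind of mismatch this decoding step avoids.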
Asynchronous Materialized Views
- Fixed the issue that asynchronous materialized views might not refresh after the base table is rebuilt. #41762
Query Optimizer
- Fixed the issue that partition pruning results might be incorrect when using multi-column range partitioning. #43332
- Fixed incorrect results in some LIMIT with OFFSET scenarios. #42576
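In multi-column range partitioning, partition bounds are compared as tuples, column by column from left to right, so pruning must reason about whole tuples rather than each column independently. The Python sketch below shows that ordering and a conservative pruning rule for an equality predicate on the first partition column. It illustrates the boundary subtlety that makes such pruning error-prone; it is not the Doris optimizer's actual rule, and the function names are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

Partition = Tuple[str, tuple, tuple]  # (name, lower bound inclusive, upper bound exclusive)


def locate_partition(partitions: List[Partition], key: tuple) -> Optional[str]:
    """Find the partition whose [lower, upper) range contains key.

    partitions must be sorted by lower bound and non-overlapping. Python
    compares tuples lexicographically, matching multi-column range semantics.
    """
    lowers = [lower for _, lower, _ in partitions]
    idx = bisect.bisect_right(lowers, key) - 1
    if idx >= 0:
        name, lower, upper = partitions[idx]
        if lower <= key < upper:
            return name
    return None


def prune_on_first_column(partitions: List[Partition], value) -> List[str]:
    """Conservatively keep partitions that may hold rows whose first column == value.

    A partition whose upper bound starts with value must be kept: the range
    [(2024, 6), (2025, 1)) still contains the row (2025, 0).
    """
    return [name for name, lower, upper in partitions if lower[0] <= value <= upper[0]]


parts = [("p1", (2024, 1), (2024, 6)), ("p2", (2024, 6), (2025, 1))]
print(locate_partition(parts, (2024, 10)))  # p2
print(prune_on_first_column(parts, 2025))   # ['p2']
```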
Query Execution
- Fixed the issue that hash joins on array types larger than 4GB could cause a BE core dump. #43861
- Fixed the issue that `IS NULL` predicates might return incorrect results in some scenarios. #43619
- Fixed the issue that bitmap types might produce incorrect results in hash joins. #43718
- Fixed some issues where function results were calculated incorrectly. #40710 #39358 #40929 #40869 #40285 #39891 #40530 #41948 #43588
- Fixed some issues with JSON type parsing. #39937
- Fixed issues with varchar and char types in runtime filter operations. #43758 #43919
- Fixed some issues with the use of decimal256 in scalar and aggregate functions. #42136 #42356
- Fixed the issue that Arrow Flight reported `Reach limit of connections` errors when connecting. #39127
- Fixed incorrect BE memory usage statistics in Kubernetes environments. #41123
Semi-structured Data Management
- Adjusted the default values of `segment_cache_fd_percentage` and `inverted_index_fd_number_limit_percent`. #42224
- Logstash now supports group commit. #40450
- Fixed a coredump when building indexes. #43246 #43298
- Fixed issues with VARIANT indexes. #43375 #43773
- Fixed potential fd and memory leaks when compaction terminates abnormally. #42374
- An inverted index MATCH on NULL now correctly returns NULL instead of false. #41786
- Fixed a coredump when the `bf_size` of an ngram bloom filter index is set to 65536. #43645
- Fixed a potential coredump during JOINs on complex data types. #40398
- Fixed a coredump when reading JSON data through a TVF. #43187
- Fixed the precision of bloom filter calculations for date and time types. #43612
- Fixed a coredump with IPv6 type storage. #43251
- Fixed a coredump when using the VARIANT type with light_schema_change disabled. #40908
- Improved cache performance for high-concurrency point queries. #44077
- Fixed the issue that bloom filter indexes were not synchronized when columns were deleted. #43378
- Fixed instability in the ES Catalog under special circumstances, such as mixed array and scalar data. #40314 #40385 #43399 #40614
- Fixed coredumps caused by abnormal regular expression pattern matching. #43394
Permissions
- Fixed several issues where permissions were not properly restricted after authorization. #43193 #41723 #42107 #43306
- Enhanced several permission checks. #40688 #40533 #41791 #42106
Other
- Added missing fields to audit log tables and audit log files. #43303