Async Materialized View Overview

An async materialized view is a query acceleration solution that combines the flexibility of a view with the high performance of a physical table. By precomputing and storing query results, subsequent queries can hit the materialized result set directly, avoiding the overhead of repeatedly executing complex SQL.

Quick Tour

Before using async materialized views, confirm the following points:

Does the business need query acceleration, ETL simplification, lakehouse external table acceleration, or write optimization?
Does the query follow the SPJG (SELECT-PROJECT-JOIN-GROUP-BY) pattern?
Can the data tolerate eventual consistency (non-real-time synchronization)?
Is the Catalog of the base table within the supported scope (Internal/Hive/Iceberg/Paimon/Hudi/JDBC/ES)?
Is partitioned incremental refresh needed to lower refresh costs?

Use Cases

Async materialized views mainly target the following four scenarios:

Scenario	Value
Query acceleration and concurrency improvement	Significantly improves query speed, enhances concurrency, and reduces resource consumption
Simplified ETL workflow	Simplifies the extract, transform, and load (ETL) process and improves development efficiency
Accelerated lakehouse external table queries	Significantly improves query speed against external data sources in a lakehouse architecture
Improved write efficiency	Reduces resource contention, optimizes the data write process, and ensures consistency and integrity

Limitations

Before using async materialized views, be aware of the following limitations:

Data consistency: An async materialized view is eventually consistent with the base table, but it cannot synchronize in real time and does not guarantee real-time consistency.
Window functions: Currently, queries that contain window functions are not supported for transparent rewriting into a form based on materialized views.
Multi-table scenarios: If the materialized view joins more tables than the query involves (for example, the query only involves t1 and t2, while the materialized view contains t1, t2, and t3), transparent rewriting is currently not supported.
Operators not supported by transparent rewriting: If the materialized view contains UNION ALL, LIMIT, ORDER BY, or CROSS JOIN, the materialized view can still be built normally, but it cannot be used for transparent rewriting.
Data type restrictions: Creating a materialized view does not currently support the VARBINARY type.

Principles

An async materialized view is essentially an internal table of type MTMV. When the materialized view is created, the system also registers a refresh task. When the task runs, it writes the latest data into the materialized view through an INSERT OVERWRITE statement.

Refresh Mechanism

Unlike the real-time incremental refresh of synchronous materialized views, async materialized views provide more flexible refresh options:

Refresh Mode	Description	Applicable Scenarios
Full refresh	Recomputes all data referenced by the materialized view definition SQL and writes it into the materialized view in full	Small data volume or when overall consistency must be guaranteed
Partitioned incremental refresh	Intelligently identifies changes in base table partition data and refreshes only the affected partitions	Large data volume with changes concentrated in specific partitions

Full refresh: Ensures that the materialized view data is fully consistent with the base table, but may consume more compute resources and time.
Partitioned incremental refresh: Significantly reduces the resources and time required for refresh, while ensuring eventual consistency.

Transparent Rewriting

Transparent rewriting is an important means by which a database optimizes query performance. When processing a user query, the system automatically rewrites the SQL to hit a suitable materialized view, thereby improving execution efficiency and reducing computation costs. The entire process is transparent to the user and requires no intervention.

Doris async materialized views use a transparent rewriting algorithm based on the SPJG (SELECT-PROJECT-JOIN-GROUP-BY) pattern:

Performs in-depth analysis of the structural information of the SQL.
Automatically finds and selects a suitable materialized view for transparent rewriting.
When multiple candidate materialized views exist, selects the optimal materialized view to respond to the query based on strategies such as a cost model, further improving performance.

Creating an Async Materialized View on a Data Lake

The syntax for creating an async materialized view on a data lake is identical to creating one on an internal table, but note the following:

Metadata source: The partition version and other information required for materialized view refresh come from the metadata cache of the data lake, not directly from the external environment. Therefore, after the refresh completes, the data is consistent with the result of querying the data lake through Doris, but it may be inconsistent with the results from other engines, depending on how the cache has been refreshed.
External change awareness: If the underlying Hive data is changed by an external process not controlled by Doris (such as a Spark, Hive, or Flink job) but the metadata does not change (for example, when insert overwrite is executed), the materialized view will incorrectly consider itself consistent with the base table, but the query results will be inconsistent with the results of querying the data lake through Doris. This issue can be resolved by manually forcing a refresh of the materialized view.
Iceberg limitations: When creating a partitioned materialized view on Iceberg, only Iceberg tables with a single partition column are supported, and partition evolution is supported to a limited extent. For example, changes to the time range of a time-type partition are supported; if the partition field itself changes, the materialized view refresh will fail.
Hudi limitations: When creating a materialized view on Hudi, the system cannot detect whether the base table data has changed. As long as the materialized view (or some of its partitions) has been refreshed, it is considered synchronized with the base table. Therefore, creating a materialized view on Hudi is only suitable for manual on-demand refresh scenarios.

Materialized View Refresh Support on Data Lakes

The following table shows the level of refresh support for different table types and Catalogs:

Table Type	Catalog Type	Refresh Method		Refresh Trigger
Table Type	Catalog Type	Full Refresh	Partitioned Refresh	Auto Trigger
Internal Table	Internal	Supported in 2.1	Supported in 2.1	Supported in 2.1.4
Hive	Hive	Supported in 2.1	Supported in 2.1	Not supported
Iceberg	Iceberg	Supported in 2.1	Supported in 3.1	Not supported
Paimon	Paimon	Supported in 2.1	Supported in 3.1	Not supported
Hudi	Hudi	Supported in 2.1	Supported in 3.1	Not supported
JDBC	JDBC	Supported in 2.1	Not supported	Not supported
ES	ES	Supported in 2.1	Not supported	Not supported

Transparent Rewriting Support on Data Lakes

The transparent rewriting feature of async materialized views currently supports the following table types and Catalogs.

Real-time awareness of base table data: This means that when the base table data used by the materialized view changes, the materialized view can detect the change in real time and use the latest data in queries.

Table Type	Catalog Type	Transparent Rewriting Support	Real-time Awareness of Base Table Data
Internal Table	Internal	Supported	Supported
Hive	Hive	Supported	Supported in 3.1
Iceberg	Iceberg	Supported	Supported in 3.1
Paimon	Paimon	Supported	Supported in 3.1
Hudi	Hudi	Supported	Not supported
JDBC	JDBC	Supported	Not supported
ES	ES	Supported	Not supported

Enabling Transparent Rewriting for External Table Materialized Views

When a materialized view uses external tables, it does not participate in transparent rewriting by default. To enable it, run:

SET materialized_view_rewrite_enable_contain_external_table = true;

Performance Optimization for External Table Transparent Rewriting

Starting from 2.1.11, Doris has optimized the transparent rewriting performance for external tables, mainly by improving the performance of obtaining available materialized views that contain external tables.

If transparent rewriting on a partitioned materialized view that contains external tables is slow, you can adjust the following parameters in fe.conf:

Parameter	Default	Description	Version
`max_hive_partition_cache_num`	10000	The maximum number of table-level partition caches in Hive Metastore. When a Hive external table has many partitions, set it to `20000` or higher	-
`external_cache_expire_time_minutes_after_access`	10 minutes	The expiration time of a cached object after its last access. It can be increased appropriately (applies to both external table schema cache and Hive metadata cache)	-
`external_cache_refresh_time_minutes`	10 minutes	The auto-refresh interval of external table metadata cache objects. It can be increased appropriately	3.1+

For more configuration details on external table metadata cache, see Metadata Cache.

Relationship Between Materialized Views and OLAP Internal Tables

There is no restriction on the base table model used in the SQL definition of an async materialized view. It can be a Duplicate model, a Unique model (merge-on-write or merge-on-read), an Aggregate model, and so on.

The underlying implementation of the materialized view itself relies on an OLAP table of the Duplicate model and can in theory support all core features of the Duplicate model. However, to ensure that the materialized view can stably and efficiently execute data refresh tasks, a series of necessary restrictions are placed on its functionality:

Partition operations: The partitions of a materialized view are automatically created and maintained based on its base tables. Users cannot perform partition operations on a materialized view.
Drop and rename: Because there are associated jobs (JOB) behind a materialized view, the DROP TABLE or RENAME TABLE commands cannot be used to operate on a materialized view. Use the materialized view's own commands to perform the corresponding operations.
Column data types: The column data types of a materialized view are automatically derived from the query statement specified at creation time and cannot be modified, otherwise the refresh task may fail.
Property modification: A materialized view has some properties that a Duplicate table does not have. These must be modified through the materialized view's commands. Other common properties are modified using the ALTER TABLE command.

FAQ

Q1: Can an async materialized view guarantee real-time data consistency with the base table?

No. An async materialized view is eventually consistent with the base table, but it cannot synchronize in real time. If real-time consistency is required, consider a synchronous materialized view.

Q2: Why can a query that contains window functions not be rewritten?

The current transparent rewriting algorithm does not yet support window functions. Queries that contain window functions are not rewritten into a form based on materialized views.

Q3: Why is a materialized view based on an external table not hit by rewriting?

External-table-based materialized views do not participate in transparent rewriting by default. Set materialized_view_rewrite_enable_contain_external_table = true to make them participate in rewriting.

Q4: Why are the materialized view query results inconsistent after Hive external table data is modified by an external process?

If Hive data is modified by an external process such as Spark, Hive, or Flink but the metadata does not change (for example, insert overwrite), the materialized view cannot detect the change. This can be resolved by manually forcing a refresh of the materialized view.

Q5: Can a materialized view based on Hudi refresh automatically?

It cannot detect changes in the base table and is only suitable for manual on-demand refresh scenarios.

Quick Tour​

Use Cases​

Limitations​

Principles​

Refresh Mechanism​

Transparent Rewriting​

Creating an Async Materialized View on a Data Lake​

Materialized View Refresh Support on Data Lakes​

Transparent Rewriting Support on Data Lakes​

Enabling Transparent Rewriting for External Table Materialized Views​

Performance Optimization for External Table Transparent Rewriting​

Relationship Between Materialized Views and OLAP Internal Tables​

FAQ​

See Also​