MySQL CDC with SQL Mapping

SQL Mapping Sync is implemented through Job + CDC Stream TVF, targeting an existing Doris table (INSERT INTO tbl SELECT * FROM cdc_stream(...)). With the expressive power of Doris SQL, you can perform column mapping, filtering, and data transformation in the sync pipeline, and exactly-once semantics is guaranteed. This is suitable for real-time sync scenarios that require data processing.

By integrating the read capability of Flink CDC, Doris reads the change log (Binlog) from MySQL and completes full + incremental sync from source table to target table. If you want Doris to automatically create downstream tables and sync a group of tables on a per-database basis, see MySQL CDC with Auto Table Creation.

Applicable Scenarios

Continuous sync of a single MySQL table to Doris, where the target table schema is already planned
Column pruning, column mapping, field renaming, or data transformation is required during the sync
Real-time data integration that requires end-to-end exactly-once semantics

Prerequisites

Item	Description
Doris version	4.1.0 and above
Table type	Currently only primary key tables are supported as the target table
User privileges	`Load` privilege is required
MySQL configuration	Binlog must be enabled on the MySQL side. See the Configuration Guide
Semantic guarantee	Supports exactly-once semantics

Quick Start

The following minimal runnable example demonstrates the complete flow: create job, then check status.

Step 1: Create the Import Job

Use CREATE STREAMING JOB to create a continuous import job:

CREATE JOB mysql_single_sync
ON STREAMING
DO
INSERT INTO db1.tbl1
SELECT * FROM cdc_stream(
    "type" = "mysql",
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306",
    "driver_url" = "mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver",
    "user" = "root",
    "password" = "123456",
    "database" = "source_db",
    "table" = "source_table",
    "offset" = "initial"
)

Step 2: Check the Import Status

select * from jobs("type"="insert") where ExecuteType = "STREAMING";

Step 3: Job Operations

For more general operations (pause, resume, delete, view tasks, and so on), see the Continuous Import Overview.

Parameter Reference

Data Source Parameters

The MySQL data source parameters supported by CDC Stream TVF are as follows:

Parameter	Default	Description
type	-	Data source type. Set to `mysql`
jdbc_url	-	MySQL JDBC connection string
driver_url	-	Path to the JDBC driver jar. Supports file name, local absolute path, and HTTP URL. See JDBC Catalog Overview for details
driver_class	-	JDBC driver class name
user	-	Database username
password	-	Database password
database	-	Database name
table	-	Name of the table to sync
offset	latest	`latest`: incremental sync only; `initial`: full + incremental sync
snapshot_split_size	8096	Size of a split (in rows). During full sync, the table is divided into multiple splits for syncing
snapshot_parallelism	1	Parallelism during the full sync stage, that is, the maximum number of splits scheduled per Task

Import Configuration Parameters

Parameter	Default	Description
session.*	None	All session variables can be configured under `job_properties`. For import variables, see Insert Into Select

For more general parameters (such as max_interval), see the Continuous Import Overview.

FAQ

Q1: What is the difference between SQL Mapping Sync and Auto Table Creation Sync?

SQL Mapping Sync: The target Doris table must be created in advance. Supports column mapping and data transformation, suitable for fine-grained processing scenarios.
Auto Table Creation Sync: Doris automatically creates downstream tables and syncs as a whole on a per-database basis. See MySQL CDC with Auto Table Creation for details.

Q2: Are non-primary-key tables supported as the target table?

Currently, only primary key tables are supported as the target table.

Q3: How do I sync only incremental data without the historical full data?

Set the offset parameter to latest. The job will skip the full sync stage and only sync Binlog incremental data.

Q4: How do I optimize when full sync is too slow?

You can adjust the following two parameters to improve throughput during the full sync stage:

snapshot_split_size: Increase the number of rows per split.
snapshot_parallelism: Increase the parallelism of splits scheduled per Task.

Applicable Scenarios​

Prerequisites​

Quick Start​

Step 1: Create the Import Job​

Step 2: Check the Import Status​

Step 3: Job Operations​

Parameter Reference​

Data Source Parameters​

Import Configuration Parameters​

FAQ​

Related Documents​