Skip to main content

MySQL Table-Level Sync

Table-level sync is implemented through Job + CDC Stream TVF, targeting an existing Doris table (INSERT INTO tbl SELECT * FROM cdc_stream(...)). With the expressive power of Doris SQL, you can perform column mapping, filtering, and data transformation in the sync pipeline, and exactly-once semantics is guaranteed. This is suitable for real-time sync scenarios that require data processing.

By integrating the read capability of Flink CDC, Doris reads the change log (Binlog) from MySQL and completes full + incremental sync from source table to target table. If you want Doris to automatically create downstream tables and sync a group of tables on a per-database basis, see MySQL Database-Level Sync.

Applicable Scenarios

  • Continuous sync of a single MySQL table to Doris, where the target table schema is already planned
  • Column pruning, column mapping, field renaming, or data transformation is required during the sync
  • Real-time data integration that requires end-to-end exactly-once semantics

Prerequisites

ItemDescription
Doris version4.1.0 and above
Table typeCurrently only primary key tables are supported as the target table
User privilegesLoad privilege is required
MySQL configurationBinlog must be enabled on the MySQL side. See the Configuration Guide
Semantic guaranteeSupports exactly-once semantics

Quick Start

The following minimal runnable example demonstrates the complete flow: create job, then check status.

Step 1: Create the Import Job

Use CREATE STREAMING JOB to create a continuous import job:

CREATE JOB mysql_single_sync
ON STREAMING
DO
INSERT INTO db1.tbl1
SELECT * FROM cdc_stream(
"type" = "mysql",
"jdbc_url" = "jdbc:mysql://127.0.0.1:3306",
"driver_url" = "mysql-connector-java-8.0.25.jar",
"driver_class" = "com.mysql.cj.jdbc.Driver",
"user" = "root",
"password" = "123456",
"database" = "source_db",
"table" = "source_table",
"offset" = "initial"
)

Step 2: Check the Import Status

select * from jobs("type"="insert") where ExecuteType = "STREAMING";

Step 3: Job Operations

For more general operations (pause, resume, delete, view tasks, and so on), see the Continuous Import Overview.

Parameter Reference

Data Source Parameters

The MySQL data source parameters supported by CDC Stream TVF are as follows:

ParameterDefaultDescription
type-Data source type. Set to mysql
jdbc_url-MySQL JDBC connection string
driver_url-Path to the JDBC driver jar. Supports file name, local absolute path, and HTTP URL. See JDBC Catalog Overview for details
driver_class-JDBC driver class name
user-Database username
password-Database password
database-Database name
table-Name of the table to sync
offsetinitialinitial: full + incremental sync; latest: incremental sync only
snapshot_split_size8096Size of a split (in rows). During full sync, the table is divided into multiple splits for syncing
snapshot_parallelism1Parallelism during the full sync stage, that is, the maximum number of splits scheduled per Task

Import Configuration Parameters

ParameterDefaultDescription
session.*NoneAll session variables can be configured under job_properties. For import variables, see Insert Into Select

For more general parameters (such as max_interval), see the Continuous Import Overview.

FAQ

Q1: What is the difference between table-level sync and database-level sync?

  • Table-level sync: The target Doris table must be created in advance. Supports column mapping and data transformation, suitable for fine-grained processing scenarios.
  • Database-level sync: Doris automatically creates downstream tables and syncs as a whole on a per-database basis. See MySQL Database-Level Sync for details.

Q2: Are non-primary-key tables supported as the target table?

Currently, only primary key tables are supported as the target table.

Q3: How do I sync only incremental data without the historical full data?

Set the offset parameter to latest. The job will skip the full sync stage and only sync Binlog incremental data.

Q4: How do I optimize when full sync is too slow?

You can adjust the following two parameters to improve throughput during the full sync stage:

  • snapshot_split_size: Increase the number of rows per split.
  • snapshot_parallelism: Increase the parallelism of splits scheduled per Task.