INSERT OVERWRITE
Description
The function of this statement is to overwrite a table or some partitions of a table
INSERT OVERWRITE table table_name
[ PARTITION (p1, ... | *) ]
[ WITH LABEL label]
[ (column [, ...]) ]
[ [ hint [, ...] ] ]
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
Parameters
table_name: the destination table to overwrite. This table must exist. It can be of the form
db_name.table_namepartitions: the table partitions that needs to be overwritten. The following two formats are supported
- partition names. must be one of the existing partitions in
table_nameseparated by a comma- asterisk(*)。Enable auto-detect-partition. The write operation will automatically detect the partitions involved in the data and overwrite those partitions. This format is supported since Apache Doris 2.1.3 version.
label: specify a label for the Insert task
column_name: the specified destination column must be one of the existing columns in
table_nameexpression: the corresponding expression that needs to be assigned to a column
DEFAULT: let the column use the default value
query: a common query, the result of the query will overwrite the target.
hint: some indicator used to indicate the execution behavior of
INSERT. You can choose one of this values:/*+ STREAMING */,/*+ SHUFFLE */or/*+ NOSHUFFLE */.
- STREAMING: At present, it has no practical effect and is only reserved for compatibility with previous versions. (In the previous version, adding this hint would return a label, but now it defaults to returning a label)
- SHUFFLE: When the target table is a partition table, enabling this hint will do repartiiton.
- NOSHUFFLE: Even if the target table is a partition table, repartiiton will not be performed, but some other operations will be performed to ensure that the data is correctly dropped into each partition.
Notice:
- In the current version, the session variable
enable_insert_strictis set totrueby default. If some data that does not conform to the format of the target table is filtered out during the execution of theINSERT OVERWRITEstatement, such as when overwriting a partition and not all partition conditions are satisfied, overwriting the target table will fail. - The
INSERT OVERWRITEstatement first creates a new table, inserts the data to be overwritten into the new table, and then atomically replaces the old table with the new table and modifies its name. Therefore, during the process of overwriting the table, the data in the old table can still be accessed normally until the overwriting is completed.
For Auto Partition Table
If the target table of the INSERT OVERWRITE is an autopartitioned table, the behaviour is controlled by the Session Variable enable_auto_create_when_overwrite controls the behaviour as follows:
- If PARTITION is not specified (overwrite the whole table), when
enable_auto_create_when_overwriteistrue, the table is overwritten and partitions are created according to the table's auto-partitioning rules for data that does not have a corresponding partition, and those datas is admit. Ifenable_auto_create_when_overwriteisfalse, data for which no partition is found will accumulate error rows until it fails. - If an overwrite PARTITION is specified, the AUTO PARTITION table behaves as a normal partitioned table during this process, and data that does not satisfy the conditions of an existing partition is filtered instead of creating a new partition.
- If you specify PARTITION as
partition(*)(auto detect partition and overwrite), whenenable_auto_create_when_overwriteistrue, for the data that have corresponding partitions in the table, overwrite their corresponding partitions, and leave the other existing partitions unchanged. At the same time, for data without corresponding partitions, create partitions according to the table's auto-partitioning rules, and accommodate the data without corresponding partitions. Ifenable_auto_create_when_overwriteisfalse, data for which no partition is found will accumulate error rows until it fails.
In versions without enable_auto_create_when_overwrite, the behaviour is as if the variable had a value of false.
The quick check conclusion is as follows:
- For auto-partition tables that have
enable_auto_create_when_overwriteenabled:
| Partitions to Overwrite | Clear Other Partitions | Auto Create for Unpartitioned Data | |
|---|---|---|---|
| Unlabeled (Full Table) | All | √ | √ |
| Designated Partition | Explicit partition | × | × |
partition(*) | The partition that the data belongs to | × | √ |
- For trivial tables, auto-partition tables disabled
enable_auto_create_when_overwrite:
| Partitions to Overwrite | Clear Other Partitions | Auto Create for Unpartitioned Data | |
|---|---|---|---|
| Unlabeled (Full Table) | All | √ | × |
| Designated Partition | Explicit partition | × | × |
partition(*) | The partition that the data belongs to | × | × |
Examples are shown below:
mysql> create table auto_list(
-> k0 varchar null
-> )
-> auto partition by list (k0)
-> (
-> PARTITION p1 values in (("Beijing"), ("BEIJING")),
-> PARTITION p2 values in (("Shanghai"), ("SHANGHAI")),
-> PARTITION p3 values in (("xxx"), ("XXX")),
-> PARTITION p4 values in (("list"), ("LIST")),
-> PARTITION p5 values in (("1234567"), ("7654321"))
-> )
-> DISTRIBUTED BY HASH(`k0`) BUCKETS 1
-> properties("replication_num" = "1");
Query OK, 0 rows affected (0.14 sec)
mysql> insert into auto_list values ("Beijing"),("Shanghai"),("xxx"),("list"),("1234567");
Query OK, 5 rows affected (0.22 sec)
mysql> insert overwrite table auto_list partition(*) values ("BEIJING"), ("new1");
Query OK, 2 rows affected (0.28 sec)
mysql> select * from auto_list;
+----------+ --- p1 is overwritten, new1 gets the new partition, and the other partitions remain unchanged.
| k0 |
+----------+
| 1234567 |
| BEIJING |
| list |
| xxx |
| new1 |
| Shanghai |
+----------+
6 rows in set (0.48 sec)
mysql> insert overwrite table auto_list values ("SHANGHAI"), ("new2");
Query OK, 2 rows affected (0.17 sec)
mysql> select * from auto_list;
+----------+ --- The whole table is overwritten, and new2 gets the new partition.
| k0 |
+----------+
| new2 |
| SHANGHAI |
+----------+
2 rows in set (0.15 sec)
Example
Assuming there is a table named test. The table contains two columns c1 and c2, and two partitions p1 and p2
CREATE TABLE IF NOT EXISTS test (
`c1` int NOT NULL DEFAULT "1",
`c2` int NOT NULL DEFAULT "4"
) ENGINE=OLAP
UNIQUE KEY(`c1`)
PARTITION BY LIST (`c1`)
(
PARTITION p1 VALUES IN ("1","2","3"),
PARTITION p2 VALUES IN ("4","5","6")
)
DISTRIBUTED BY HASH(`c1`) BUCKETS 3
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2"
);
Overwrite Table
-
Overwrite the
testtable using the form ofVALUES.// Single-row overwrite.
INSERT OVERWRITE table test VALUES (1, 2);
INSERT OVERWRITE table test (c1, c2) VALUES (1, 2);
INSERT OVERWRITE table test (c1, c2) VALUES (1, DEFAULT);
INSERT OVERWRITE table test (c1) VALUES (1);
// Multi-row overwrite.
INSERT OVERWRITE table test VALUES (1, 2), (3, 2 + 2);
INSERT OVERWRITE table test (c1, c2) VALUES (1, 2), (3, 2 * 2);
INSERT OVERWRITE table test (c1, c2) VALUES (1, DEFAULT), (3, DEFAULT);
INSERT OVERWRITE table test (c1) VALUES (1), (3);
- The first and second statements have the same effect. If the target column is not specified during overwriting, the column order in the table will be used as the default target column. After the overwrite is successful, there is only one row of data in the
testtable. - The third and fourth statements have the same effect. The unspecified column
c2will be overwritten with the default value 4. After the overwrite is successful, there is only one row of data in thetesttable. - The fifth and sixth statements have the same effect. Expressions (such as
2+2,2*2) can be used in the statement. The result of the expression will be computed during the execution of the statement and then overwritten into thetesttable. After the overwrite is successful, there are two rows of data in thetesttable. - The seventh and eighth statements have the same effect. The unspecified column
c2will be overwritten with the default value 4. After the overwrite is successful, there are two rows of data in thetesttable.
-
Overwrite the
testtable in the form of a query statement. The data format of thetest2table and thetesttable must be consistent. If they are not consistent, implicit data type conversion will be triggered.INSERT OVERWRITE table test SELECT * FROM test2;
INSERT OVERWRITE table test (c1, c2) SELECT * from test2;
- The first and second statements have the same effect. The purpose of these statements is to take data from the
test2table and overwrite thetesttable with the taken data. After the overwrite is successful, the data in thetesttable will be consistent with the data in thetest2table.
-
Overwrite the
testtable and specify a label.INSERT OVERWRITE table test WITH LABEL `label1` SELECT * FROM test2;
INSERT OVERWRITE table test WITH LABEL `label2` (c1, c2) SELECT * from test2;
- Users can use the
SHOW LOAD;command to check the status of the job imported by thislabel. It should be noted that the label is unique.
Overwrite Table Partition
When using INSERT OVERWRITE to rewrite partitions, we actually encapsulate the following three steps into a single transaction and execute it. If it fails halfway through, the operations that have been performed will be rolled back:
- Assuming that partition
p1is specified to be rewritten, first create an empty temporary partitionpTMPwith the same structure as the target partition to be rewritten. - Write data to
pTMP. - replace
p1with thepTMPatom
The following is examples:
-
Overwrite partitions
P1andP2of thetesttable using the form ofVALUES.// Single-row overwrite.
INSERT OVERWRITE table test PARTITION(p1,p2) VALUES (1, 2);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) VALUES (1, 2);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) VALUES (1, DEFAULT);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1) VALUES (1);
// Multi-row overwrite.
INSERT OVERWRITE table test PARTITION(p1,p2) VALUES (1, 2), (4, 2 + 2);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) VALUES (1, 2), (4, 2 * 2);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) VALUES (1, DEFAULT), (4, DEFAULT);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1) VALUES (1), (4);Unlike overwriting an entire table, the above statements are overwriting partitions in the table. Partitions can be overwritten one at a time or multiple partitions can be overwritten at once. It should be noted that only data that satisfies the corresponding partition filtering condition can be overwritten successfully. If there is data in the overwritten data that does not satisfy any of the partitions, the overwrite will fail. An example of a failure is shown below.
INSERT OVERWRITE table test PARTITION(p1,p2) VALUES (7, 2);The data overwritten by the above statements (
c1=7) does not satisfy the conditions of partitionsP1andP2, so the overwrite will fail. -
Overwrite partitions
P1andP2of thetesttable in the form of a query statement. The data format of thetest2table and thetesttable must be consistent. If they are not consistent, implicit data type conversion will be triggered.INSERT OVERWRITE table test PARTITION(p1,p2) SELECT * FROM test2;
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) SELECT * from test2; -
Overwrite partitions
P1andP2of thetesttable and specify a label.INSERT OVERWRITE table test PARTITION(p1,p2) WITH LABEL `label3` SELECT * FROM test2;
INSERT OVERWRITE table test PARTITION(p1,p2) WITH LABEL `label4` (c1, c2) SELECT * from test2;
Overwrite Auto Detect Partition
This feature is available since version 2.1.3.
When the PARTITION clause specified by the INSERT OVERWRITE command is PARTITION(*), this overwrite will automatically detect the partition where the data is located. Example:
mysql> create table test(
-> k0 int null
-> )
-> partition by range (k0)
-> (
-> PARTITION p10 values less than (10),
-> PARTITION p100 values less than (100),
-> PARTITION pMAX values less than (maxvalue)
-> )
-> DISTRIBUTED BY HASH(`k0`) BUCKETS 1
-> properties("replication_num" = "1");
Query OK, 0 rows affected (0.11 sec)
mysql> insert into test values (1), (2), (15), (100), (200);
Query OK, 5 rows affected (0.29 sec)
mysql> select * from test order by k0;
+------+
| k0 |
+------+
| 1 |
| 2 |
| 15 |
| 100 |
| 200 |
+------+
5 rows in set (0.23 sec)
mysql> insert overwrite table test partition(*) values (3), (1234);
Query OK, 2 rows affected (0.24 sec)
mysql> select * from test order by k0;
+------+
| k0 |
+------+
| 3 |
| 15 |
| 1234 |
+------+
3 rows in set (0.20 sec)
As you can see, all data in partitions p10 and pMAX, where data 3 and 1234 are located, are overwritten, while partition p100 remains unchanged. This operation can be interpreted as syntactic sugar for specifying a specific partition to be overwritten by the PARTITION clause during an INSERT OVERWRITE operation, which is implemented in the same way as specify a partition to overwrite. The PARTITION(*) syntax eliminates the need to manually fill in all the partition names when overwriting a large number of partitions.
Keywords
INSERT OVERWRITE, OVERWRITE, AUTO DETECT