Exporting Query Result
This document describes how to use the SELECT INTO OUTFILE command to export query results.
SELECT INTO OUTFILE
is a synchronous command, which means that the operation is completed when the command returns. It also returns a row of results to show the execution result of the export.
Examples
Export to HDFS
Export simple query results to the file hdfs://path/to/result.txt
, specifying the export format as CSV.
SELECT * FROM tbl
INTO OUTFILE "hdfs://path/to/result_"
FORMAT AS CSV
PROPERTIES
(
"broker.name" = "my_broker",
"column_separator" = ",",
"line_delimiter" = "\n"
);
Export to Local Files
When exporting to a local file, you need to configure enable_outfile_to_local=true
in fe.conf first
select * from tbl1 limit 10
INTO OUTFILE "file:///home/work/path/result_";
For more usage, see OUTFILE documentation.
Concurrent Export
By default, the export of the query result set is non-concurrent, that is, a single point of export. If the user wants the query result set to be exported concurrently, the following conditions need to be met:
session variable 'enable_parallel_outfile' to enable concurrent export:
set enable_parallel_outfile = true;
The export method is S3, HDFS instead of using a broker
The query can meet the needs of concurrent export, for example, the top level does not contain single point nodes such as sort. (I will give an example later, which is a query that does not export the result set concurrently)
If the above three conditions are met, the concurrent export query result set can be triggered. Concurrency = be_instacne_num * parallel_fragment_exec_instance_num
How to Verify that the Result Set is Exported Concurrently
After the user enables concurrent export through the session variable setting, if you want to verify whether the current query can be exported concurrently, you can use the following method.
explain select xxx from xxx where xxx into outfile "s3://xxx" format as csv properties ("AWS_ENDPOINT" = "xxx", ...);
After explaining the query, Doris will return the plan of the query. If you find that RESULT FILE SINK
appears in PLAN FRAGMENT 1
, it means that the export concurrency has been opened successfully.
If RESULT FILE SINK
appears in PLAN FRAGMENT 0
, it means that the current query cannot be exported concurrently (the current query does not satisfy the three conditions of concurrent export at the same time).
Planning example for concurrent export:
+-----------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:<slot 2> | <slot 3> | <slot 4> | <slot 5> |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 1:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS:`k1` + `k2` |
| PARTITION: HASH_PARTITIONED: `default_cluster:test`.`multi_tablet`.`k1` |
| |
| RESULT FILE SINK |
| FILE PATH: s3://ml-bd-repo/bpit_test/outfile_1951_ |
| STORAGE TYPE: S3 |
| |
| 0:OlapScanNode |
| TABLE: multi_tablet |
+-----------------------------------------------------------------------------+
Usage Examples
For details, please refer to OUTFILE Document.
Return Results
The command is a synchronization command. The command returns, which means the operation is over. At the same time, a row of results will be returned to show the exported execution result.
If it exports and returns normally, the result is as follows:
mysql> select * from tbl1 limit 10 into outfile "file:///home/work/path/result_";
+------------+-----------+----------+--------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL |
+------------+-----------+----------+--------------------------------------------------------------------+
| 1 | 2 | 8 | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
+------------+-----------+----------+--------------------------------------------------------------------+
1 row in set (0.05 sec)
FileNumber: The number of files finally generated.
TotalRows: The number of rows in the result set.
FileSize: The total size of the exported file. Unit byte.
URL: If it is exported to a local disk, the Compute Node to which it is exported is displayed here.
If a concurrent export is performed, multiple rows of data will be returned.
+------------+-----------+----------+--------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL |
+------------+-----------+----------+--------------------------------------------------------------------+
| 1 | 3 | 7 | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
| 1 | 2 | 4 | file:///192.168.1.11/home/work/path/result_{fragment_instance_id}_ |
+------------+-----------+----------+--------------------------------------------------------------------+
2 rows in set (2.218 sec)
If the execution is incorrect, an error message will be returned, such as:
mysql> SELECT * FROM tbl INTO OUTFILE ...
ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ...
Notice
The CSV format does not support exporting binary types, such as BITMAP and HLL types. These types will be output as
\N
, which is null.If you do not enable concurrent export, the query result is exported by a single BE node in a single thread. Therefore, the export time and the export result set size are positively correlated. Turning on concurrent export can reduce the export time.
The export command does not check whether the file and file path exist. Whether the path will be automatically created or whether the existing file will be overwritten is entirely determined by the semantics of the remote storage system.
If an error occurs during the export process, the exported file may remain on the remote storage system. Doris will not clean these files. The user needs to manually clean up.
The timeout of the export command is the same as the timeout of the query. It can be set by
SET query_timeout = xxx
.For empty result query, there will be an empty file.
File spliting will ensure that a row of data is stored in a single file. Therefore, the size of the file is not strictly equal to
max_file_size
.For functions whose output is invisible characters, such as BITMAP and HLL types, the output is
\N
, which is NULL.At present, the output type of some geo functions, such as
ST_Point
is VARCHAR, but the actual output value is an encoded binary character. Currently these functions will output garbled characters. For geo functions, useST_AsText
for output.
More Help
For more detailed syntax and best practices for using OUTFILE, please refer to the OUTFILE command manual, you can also More help information can be obtained by typing HELP OUTFILE
at the command line of the MySql client.