This statement is used to export query results to a file using the `SELECT INTO OUTFILE` command. Currently, it supports exporting to remote storage, such as HDFS, S3, BOS, or COS (Tencent Cloud), through the Broker process, through the S3 protocol, or directly through the HDFS protocol.

    grammar:
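
    Roughly, based on the examples below (`query_stmt`, `format_as`, and `properties` are placeholders for the clauses described under "illustrate", not literal keywords):

    ```sql
    query_stmt
    INTO OUTFILE "file_path"
    [format_as]
    [properties]
    ```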

    illustrate:

    1. file_path

      `file_path` points to the path where the file is stored, including the file prefix, such as `hdfs://path/to/my_file_`.

      1. The final file name consists of `my_file_`, the file sequence number, and the file format suffix. The sequence number starts from 0, and the number of files corresponds to how many files the result is split into. Such as:
      2. my_file_abcdefg_0.csv
      3. my_file_abcdefg_1.csv
      4. my_file_abcdefg_2.csv
    2. format_as

      `FORMAT AS CSV`

      Specifies the export format. Supported formats include CSV, PARQUET, CSV_WITH_NAMES, CSV_WITH_NAMES_AND_TYPES, and ORC. The default is CSV.

    3. properties

      Specifies related properties. Currently, exporting via the Broker process, via the S3 protocol, or directly via the HDFS protocol is supported.

      1. grammar:
         `[PROPERTIES ("key"="value", ...)]`
      2. The following properties are supported:
         - column_separator: column separator. <version since="1.2.0">Multi-byte separators are supported, such as: "\\x01", "abc"</version>
         - line_delimiter: line delimiter. <version since="1.2.0">Multi-byte delimiters are supported, such as: "\\x01", "abc"</version>
         - max_file_size: the size limit of a single file. If the result exceeds this value, it will be split into multiple files.
      3. Broker related properties need to be prefixed with `broker.`:
         - broker.name: broker name
         - broker.hadoop.security.authentication: specify the authentication method as kerberos
         - broker.kerberos_principal: specify the Kerberos principal
         - broker.kerberos_keytab: specify the path to the Kerberos keytab file. It must be an absolute path on the server where the Broker process is located, and it must be readable by the Broker process.
      4. HDFS related properties:
         - fs.defaultFS: namenode address and port
         - hadoop.username: HDFS username
         - dfs.nameservices: if Hadoop HA is enabled, set the nameservice. See hdfs-site.xml
         - dfs.ha.namenodes.[nameservice ID]: unique identifiers for each NameNode in the nameservice. See hdfs-site.xml
         - dfs.namenode.rpc-address.[nameservice ID].[name node ID]: the fully-qualified RPC address for each NameNode to listen on. See hdfs-site.xml
         - dfs.client.failover.proxy.provider.[nameservice ID]: the Java class that HDFS clients use to contact the Active NameNode, usually org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

         For a Hadoop cluster with Kerberos authentication enabled, the following additional properties need to be set:
         - dfs.namenode.kerberos.principal: HDFS namenode service principal
         - hadoop.security.authentication: kerberos
         - hadoop.kerberos.principal: the Kerberos principal that Doris uses when connecting to HDFS
         - hadoop.kerberos.keytab: HDFS client keytab location
      5. For the S3 protocol, you can directly specify the S3 protocol configuration:
         - AWS_ENDPOINT
         - AWS_ACCESS_KEY
         - AWS_SECRET_KEY
         - AWS_REGION
         - use_path_style: (optional) defaults to false. The S3 SDK uses the virtual-hosted style by default. However, some object storage systems may not have virtual-hosted style access enabled or supported; in that case, the use_path_style parameter can be added to force path-style access.
    example:

    1. Use the broker method to export the simple query results to the file `hdfs://path/to/result.txt`, specifying the export format as CSV. Use `my_broker` and set the Kerberos authentication information. Specify the column separator as `,` and the line delimiter as `\n`.

      ```sql
      SELECT * FROM tbl
      INTO OUTFILE "hdfs://path/to/result_"
      FORMAT AS CSV
      PROPERTIES
      (
          "broker.name" = "my_broker",
          "broker.hadoop.security.authentication" = "kerberos",
          "broker.kerberos_principal" = "doris@YOUR.COM",
          "column_separator" = ",",
          "line_delimiter" = "\n",
          "max_file_size" = "100MB"
      );
      ```
    2. Export the simple query results to the file hdfs://path/to/result.parquet. Specify the export format as PARQUET. Use my_broker and set kerberos authentication information.

      Exporting query results to Parquet files requires an explicit schema.
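
      A minimal sketch of such a statement. The table `tbl`, the columns `c1`/`c2`, and the keytab path are illustrative assumptions; the `schema` property lists each exported column as `required,<parquet_type>,<column_name>`, following the form used in example 4 below:

      ```sql
      SELECT c1, c2 FROM tbl
      INTO OUTFILE "hdfs://path/to/result_"
      FORMAT AS PARQUET
      PROPERTIES
      (
          "broker.name" = "my_broker",
          "broker.hadoop.security.authentication" = "kerberos",
          "broker.kerberos_principal" = "doris@YOUR.COM",
          "broker.kerberos_keytab" = "/home/doris/my.keytab",
          "schema" = "required,int32,c1;required,byte_array,c2"
      );
      ```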

    3. Export the query result of the CTE statement to the file hdfs://path/to/result.txt. The default export format is CSV. Use my_broker and set hdfs high availability information. Use the default row and column separators.

      ```sql
      WITH
      x1 AS
      (SELECT k1, k2 FROM tbl1),
      x2 AS
      (SELECT k3 FROM tbl2)
      SELECT k1 FROM x1 UNION SELECT k3 FROM x2
      INTO OUTFILE "hdfs://path/to/result_"
      PROPERTIES
      (
          "broker.name" = "my_broker",
          "broker.username" = "user",
          "broker.password" = "passwd",
          "broker.dfs.nameservices" = "my_ha",
          "broker.dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2",
          "broker.dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
          "broker.dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
          "broker.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
      );
      ```

      If the final generated file is not larger than 1GB, it will be: result_0.csv. If larger than 1GB, it may be result_0.csv, result_1.csv, ....

    4. Export the query result of the UNION statement to the file bos://bucket/result.txt. Specify the export format as PARQUET. Use my_broker and set the BOS access information. The PARQUET format does not require a column separator to be specified. After the export is complete, an identification file is generated.

      ```sql
      SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
      INTO OUTFILE "bos://bucket/result_"
      FORMAT AS PARQUET
      PROPERTIES
      (
          "broker.name" = "my_broker",
          "broker.bos_endpoint" = "http://bj.bcebos.com",
          "broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxxx",
          "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyy",
          "schema" = "required,int32,k1;required,byte_array,k2"
      );
      ```
    5. Export the query result of the select statement to the file s3a://${bucket_name}/path/result.txt. Specify the export format as CSV. After the export is complete, an identification file is generated.

      ```sql
      select k1,k2,v1 from tbl1 limit 100000
      into outfile "s3a://my_bucket/export/my_file_"
      FORMAT AS CSV
      PROPERTIES
      (
          "broker.name" = "hdfs_broker",
          "broker.fs.s3a.access.key" = "xxx",
          "broker.fs.s3a.secret.key" = "xxxx",
          "broker.fs.s3a.endpoint" = "https://cos.xxxxxx.myqcloud.com/",
          "column_separator" = ",",
          "line_delimiter" = "\n",
          "max_file_size" = "1024MB",
          "success_file_name" = "SUCCESS"
      )
      ```

      If the final generated file is not larger than 1GB, it will be: my_file_0.csv. If larger than 1GB, it may be my_file_0.csv, my_file_1.csv, ....

      Verify on COS:

      1. A path that does not exist will be created automatically.
      2. access.key / secret.key / endpoint need to be confirmed with the COS administrators. In particular, the endpoint value does not need to include the bucket_name.
    6. Use the s3 protocol to export to bos, and enable concurrent export.

      The resulting files are named with the file prefix given in the export path; a minimal sketch of the statement is shown below.
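
      A sketch of such an export, assuming a table `tbl1` and placeholder BOS credentials (the bucket, keys, endpoint, and region values are not from the original example); the statement shape follows example 7 below:

      ```sql
      -- enable concurrent (parallel) export for this session
      set enable_parallel_outfile = true;

      select k1 from tbl1 limit 1000
      into outfile "s3://my_bucket/export/my_file_"
      format as csv
      properties
      (
          "AWS_ENDPOINT" = "http://s3.bd.bcebos.com",
          "AWS_ACCESS_KEY" = "xxxx",
          "AWS_SECRET_KEY" = "xxx",
          "AWS_REGION" = "bd"
      );
      ```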

    7. Use the s3 protocol to export to bos, with the concurrent-export session variable enabled. Note: since the query statement has a top-level sorting node, this query cannot be exported concurrently even though the concurrent-export session variable is enabled.

      ```sql
      set enable_parallel_outfile = true;

      -- the query statement (which contains a top-level ORDER BY) goes here, followed by:
      into outfile "s3://my_bucket/export/my_file_"
      format as csv
      properties
      (
          "AWS_ENDPOINT" = "http://s3.bd.bcebos.com",
          "AWS_ACCESS_KEY" = "xxxx",
          "AWS_SECRET_KEY" = "xxx",
          "AWS_REGION" = "bd"
      )
      ```

      Export directly through the HDFS protocol:

      ```sql
      -- the default port of fileSystem_port is 9000
      SELECT * FROM tbl
      INTO OUTFILE "hdfs://${host}:${fileSystem_port}/path/to/result_"
      FORMAT AS CSV
      PROPERTIES
      (
          "fs.defaultFS" = "hdfs://ip:port",
          "hadoop.username" = "work"
      );
      ```

      If the Hadoop cluster is highly available and Kerberos authentication is enabled, you can refer to the following SQL statement:

      ```sql
      SELECT * FROM tbl
      INTO OUTFILE "hdfs://path/to/result_"
      FORMAT AS CSV
      PROPERTIES
      (
          'fs.defaultFS'='hdfs://hacluster/',
          'dfs.nameservices'='hacluster',
          'dfs.ha.namenodes.hacluster'='n1,n2',
          'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
          'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
          'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
          'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
          'hadoop.security.authentication'='kerberos',
          'hadoop.kerberos.principal'='doris_test@REALM.COM',
          'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
      );
      ```

      If the final generated file is not larger than 100MB, it will be: `result_0.csv`. If larger than 100MB, it may be `result_0.csv, result_1.csv, ...`.
    8. Export the query result of the select statement to the file cosn://${bucket_name}/path/result.txt on Tencent Cloud Object Storage (COS). Specify the export format as CSV. After the export is complete, an identification file is generated.

      ```sql
      select k1,k2,v1 from tbl1 limit 100000
      into outfile "cosn://my_bucket/export/my_file_"
      FORMAT AS CSV
      PROPERTIES
      (
          "broker.name" = "broker_name",
          "broker.fs.cosn.userinfo.secretId" = "xxx",
          "broker.fs.cosn.userinfo.secretKey" = "xxxx",
          "broker.fs.cosn.bucket.endpoint_suffix" = "https://cos.xxxxxx.myqcloud.com/",
          "column_separator" = ",",
          "line_delimiter" = "\n",
          "max_file_size" = "1024MB",
          "success_file_name" = "SUCCESS"
      )
      ```

    keywords:

    OUTFILE

    1. Export data volume and export efficiency

      This function essentially executes an SQL query command, and the final result is written out by a single thread. Therefore, the total time of the export includes the time of the query itself and the time spent writing out the final result set. If the query is large, you need to set the session variable `query_timeout` to extend the query timeout appropriately.
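
      For example, the timeout can be raised for the current session before running a large export (the value here is illustrative):

      ```sql
      -- allow up to one hour for the query feeding the export
      SET query_timeout = 3600;
      ```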

    2. Management of export files

      Doris does not manage exported files. All files, whether exported successfully or left over after a failed export, need to be handled by the user.

    3. Export to local file

      The ability to export to a local file is not available for public cloud users; it is only for private deployments, and it assumes that the default user has full control over the cluster nodes. Doris does not check the validity of the export path entered by the user: if the Doris process user does not have write permission to the path, or the path does not exist, an error is reported. In addition, for security reasons, if a file with the same name already exists in the path, the export also fails.
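
      A minimal sketch of a local export. It assumes the target directory exists on the node that executes the statement, and that the deployment allows exporting to local files (in Doris this is controlled by the FE configuration item `enable_outfile_to_local`); the `file:///` path below is illustrative:

      ```sql
      -- write the result to a local directory on the executing node;
      -- the path must be writable by the Doris process user and must not
      -- already contain a file with the same name
      SELECT * FROM tbl
      INTO OUTFILE "file:///home/user/result_"
      FORMAT AS CSV;
      ```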

      Doris does not manage locally exported files, nor does it check disk space. These files need to be managed by the user, for example cleaned up when they are no longer needed.