Data Lakehouse FAQ

Certificate Issues

  1. When querying, the error curl 77: Problem with the SSL CA cert occurs. This indicates that the local system CA certificate bundle is too old and needs to be updated.

    • You can download the latest CA certificate from https://curl.se/docs/caextract.html.
    • Place the downloaded cacert-xxx.pem into the /etc/ssl/certs/ directory, for example: sudo cp cacert-xxx.pem /etc/ssl/certs/ca-certificates.crt.
  2. When querying, an error occurs: ERROR 1105 (HY000): errCode = 2, detailMessage = (x.x.x.x)[CANCELLED][INTERNAL_ERROR]error setting certificate verify locations: CAfile: /etc/ssl/certs/ca-certificates.crt CApath: none.

    yum install -y ca-certificates
    ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-certificates.crt

Kerberos

  1. When connecting to a Hive Metastore authenticated with Kerberos, an error GSS initiate failed is encountered.

    This is usually due to incorrect Kerberos authentication information. You can troubleshoot by following these steps:

    1. In versions prior to 1.2.1, the libhdfs3 library that Doris depends on did not enable gsasl. Please upgrade to version 1.2.2 or later.

    2. Ensure that correct keytab and principal are set for each component and verify that the keytab file exists on all FE and BE nodes.

      1. hadoop.kerberos.keytab / hadoop.kerberos.principal: used for Hadoop HDFS access; fill in the values corresponding to HDFS.
      2. hive.metastore.kerberos.principal: Used for Hive Metastore.
    3. Try replacing the IP in the principal with a domain name (do not use the default _HOST placeholder).

    4. Ensure that the /etc/krb5.conf file exists on all FE and BE nodes.
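
    For reference, a minimal Hive Catalog with Kerberos might look like the following sketch (hosts, realm, and keytab path are placeholders to replace with your own values):

    ```sql
    -- Placeholder hosts/realm/paths; adjust to your environment.
    CREATE CATALOG hive_krb PROPERTIES (
        'type' = 'hms',
        'hive.metastore.uris' = 'thrift://hms.example.com:9083',
        'hive.metastore.sasl.enabled' = 'true',
        'hive.metastore.kerberos.principal' = 'hive/hms.example.com@EXAMPLE.COM',
        'hadoop.security.authentication' = 'kerberos',
        'hadoop.kerberos.principal' = 'doris@EXAMPLE.COM',
        'hadoop.kerberos.keytab' = '/etc/security/keytabs/doris.keytab'
    );
    ```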

  2. When connecting to a Hive database through the Hive Catalog, an error occurs: RemoteException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS].

    If show databases and show tables work fine but the error occurs during the query, follow these two steps:

    • Place core-site.xml and hdfs-site.xml in the fe/conf and be/conf directories.
    • Execute Kerberos kinit on the BE node, restart BE, and then proceed with the query.
  3. When encountering the error GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos Ticket) while querying a table configured with Kerberos, restarting FE and BE nodes usually resolves the issue.

    • Before restarting all nodes, configure -Djavax.security.auth.useSubjectCredsOnly=false in the JAVA_OPTS parameter in "${DORIS_HOME}/be/conf/be.conf" to obtain JAAS credentials information through the underlying mechanism rather than the application.
    • Refer to JAAS Troubleshooting for solutions to common JAAS errors.
  4. To resolve the error Unable to obtain password from user when configuring Kerberos in the Catalog:

    • Ensure the principal used is listed in klist by checking with klist -kt your.keytab.
    • Verify the Catalog configuration for any missing settings such as yarn.resourcemanager.principal.
    • If the above checks are fine, it may be due to the JDK version installed by the system's package manager not supporting certain encryption algorithms. Consider installing JDK manually and setting the JAVA_HOME environment variable.
    • Kerberos typically uses AES-256 for encryption. For Oracle JDK, JCE must be installed. Some distributions of OpenJDK automatically provide unlimited strength JCE, eliminating the need for separate installation.
    • JCE versions correspond to JDK versions. Download the JCE zip package that matches your JDK version and extract it to the $JAVA_HOME/jre/lib/security directory.
  5. When encountering the error java.security.InvalidKeyException: Illegal key size while accessing HDFS with KMS, upgrade the JDK version to >= Java 8 u162, or install the corresponding JCE Unlimited Strength Jurisdiction Policy Files.

  6. If configuring Kerberos in the Catalog results in the error SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS], place the core-site.xml file in the "${DORIS_HOME}/be/conf" directory.

    If accessing HDFS results in the error No common protection layer between client and server, ensure that the hadoop.rpc.protection properties on the client and server are consistent.

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>

    <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
    </property>

    </configuration>
  7. When using Broker Load with Kerberos configured and encountering the error Cannot locate default realm.:

    Add the configuration item -Djava.security.krb5.conf=/your-path to the JAVA_OPTS in the start_broker.sh script for Broker Load.
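
    For example (the krb5.conf path below is an assumption; use the actual path on your Broker node):

    ```shell
    # In start_broker.sh, extend JAVA_OPTS with the krb5.conf location:
    export JAVA_OPTS="$JAVA_OPTS -Djava.security.krb5.conf=/etc/krb5.conf"
    ```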

  8. When Kerberos is configured in the Catalog, the hadoop.username property cannot be used at the same time.

  9. Accessing Kerberos with JDK 17

    When running Doris with JDK 17 and accessing Kerberos services, you may encounter issues due to the use of deprecated encryption algorithms. You need to add the allow_weak_crypto=true property in krb5.conf, or upgrade the encryption algorithm in Kerberos.
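
    A minimal krb5.conf fragment for this, assuming you choose to allow the deprecated algorithms rather than upgrade them:

    ```ini
    [libdefaults]
        allow_weak_crypto = true
    ```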

    For more details, refer to: https://seanjmullan.org/blog/2021/09/14/jdk17#kerberos

JDBC Catalog

  1. Error connecting to SQLServer via JDBC Catalog: unable to find valid certification path to requested target

    Add the trustServerCertificate=true option in the jdbc_url.

  2. Connecting to MySQL database via JDBC Catalog results in Chinese character garbling or incorrect Chinese character query conditions

    Add useUnicode=true&characterEncoding=utf-8 in the jdbc_url.

    Note: Starting from version 1.2.3, when connecting to MySQL database via JDBC Catalog, these parameters will be automatically added.

  3. Error connecting to MySQL database via JDBC Catalog: Establishing SSL connection without server's identity verification is not recommended

    Add useSSL=true in the jdbc_url.
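
    Combining the options from the items above, a jdbc_url might look like this (host, port, and database name are placeholders):

    ```text
    jdbc:mysql://mysql.example.com:3306/db?useUnicode=true&characterEncoding=utf-8&useSSL=true
    ```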

  4. When synchronizing MySQL data to Doris using the JDBC Catalog, errors may occur for date data. Verify that the MySQL driver package matches the MySQL version; for example, MySQL 8 and above require the driver com.mysql.cj.jdbc.Driver.

  5. When a single field is too large, a Java memory OOM occurs on the BE side during a query.

    When JDBC Scanner reads data through JDBC, the Session Variable batch_size determines the number of rows processed in the JVM per batch. If a single field is too large, it may cause field_size * batch_size (approximate value, considering JVM static memory and data copy overhead) to exceed the JVM memory limit, resulting in OOM.

    Solutions:

    • Reduce the batch_size value by executing set batch_size = 512;. The default value is 4064.
    • Increase the BE JVM memory by modifying the -Xmx parameter in JAVA_OPTS. For example: -Xmx8g.
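
    The OOM condition above can be estimated with simple arithmetic. The sketch below assumes a hypothetical 2 MiB field to show how quickly the per-batch footprint grows:

    ```shell
    # Rough per-batch memory estimate: field_size * batch_size (ignores JVM
    # static memory and data copy overhead, so the real footprint is larger).
    field_size=$((2 * 1024 * 1024))   # hypothetical 2 MiB per field value
    batch_size=4064                   # default value of the session variable
    total=$((field_size * batch_size))
    echo "approx per-batch footprint: $((total / 1024 / 1024)) MiB"  # -> 8128 MiB
    ```

    With the default batch_size, a single 2 MiB field already approaches 8 GiB per batch, which is why lowering batch_size or raising -Xmx resolves the OOM.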

Hive Catalog

  1. Accessing Iceberg or Hive table through Hive Catalog reports an error: failed to get schema or Storage schema reading not supported

    You can try the following methods:

    • Put the iceberg runtime-related jar package in the lib/ directory of Hive.

    • Configure in hive-site.xml:

      metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader

      After the configuration is completed, you need to restart the Hive Metastore.

    • Add "get_schema_from_table" = "true" in the Catalog properties.

      This parameter is supported since versions 2.1.10 and 3.0.6.

  2. Error connecting to Hive Catalog: Caused by: java.lang.NullPointerException

    If the fe.log contains the following stack trace:

    Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:78) ~[hive-exec-3.1.3-core.jar:3.1.3]
    at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:55) ~[hive-exec-3.1.3-core.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1548) ~[doris-fe.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1542) ~[doris-fe.jar:3.1.3]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]

    Try adding "metastore.filter.hook" = "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl" in the CREATE CATALOG statement to resolve.

  3. If after creating Hive Catalog, show tables works fine but querying results in java.net.UnknownHostException: xxxxx

    Add the following in the Catalog's PROPERTIES:

    'fs.defaultFS' = 'hdfs://<your_nameservice_or_actually_HDFS_IP_and_port>'
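
    A complete statement might look like this (metastore address and nameservice are placeholders):

    ```sql
    CREATE CATALOG hive PROPERTIES (
        'type' = 'hms',
        'hive.metastore.uris' = 'thrift://x.x.x.x:9083',
        'fs.defaultFS' = 'hdfs://your_nameservice'
    );
    ```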
  4. Tables in ORC format in Hive 1.x may encounter system column names in the underlying ORC file Schema as _col0, _col1, _col2, etc. In this case, add hive.version as 1.x.x in the Catalog configuration to map with the column names in the Hive table.

    CREATE CATALOG hive PROPERTIES (
    'hive.version' = '1.x.x'
    );
  5. If Hive Metastore errors such as Invalid method name are encountered when querying table data through the Catalog, set the hive.version parameter.

  6. When querying a table in ORC format, if the FE reports Could not obtain block or Caused by: java.lang.NoSuchFieldError: types, it may be due to the FE accessing HDFS to retrieve file information and perform file splitting by default. In some cases, the FE may not be able to access HDFS. This can be resolved by adding the following parameter: "hive.exec.orc.split.strategy" = "BI". Other options include HYBRID (default) and ETL.

  7. In Hive, you can find the partition field values of a Hudi table, but in Doris, you cannot. Doris and Hive currently have different ways of querying Hudi. In Doris, you need to add the partition fields in the avsc file structure of the Hudi table. If not added, Doris will query with partition_val being empty (even if hoodie.datasource.hive_sync.partition_fields=partition_val is set).

    {
    "type": "record",
    "name": "record",
    "fields": [{
    "name": "partition_val",
    "type": [
    "null",
    "string"
    ],
    "doc": "Preset partition field, empty string when not partitioned",
    "default": null
    },
    {
    "name": "name",
    "type": "string",
    "doc": "Name"
    },
    {
    "name": "create_time",
    "type": "string",
    "doc": "Creation time"
    }
    ]
    }
  8. When querying a Hive external table, if you encounter the error java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found, search for hadoop-lzo-*.jar in the Hadoop environment, place it in the "${DORIS_HOME}/fe/lib/" directory, and restart the FE. Starting from version 2.0.2, you can place this file in the custom_lib/ directory of the FE (if it does not exist, create it manually) to prevent file loss when upgrading the cluster due to the lib directory being replaced.

  9. When creating a Hive table specifying the serde as org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe, and encountering the error storage schema reading not supported when accessing the table, add the following configuration to the hive-site.xml file and restart the HMS service:

    <property>
    <name>metastore.storage.schema.reader.impl</name>
    <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
    </property>
  10. Error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty. The complete error message in the FE log is as follows:

    org.apache.doris.common.UserException: errCode = 2, detailMessage = S3 list path failed. path=s3://bucket/part-*,msg=errors while get file status listStatus on s3://bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    org.apache.doris.common.UserException: errCode = 2, detailMessage = S3 list path exception. path=s3://bucket/part-*, err: errCode = 2, detailMessage = S3 list path failed. path=s3://bucket/part-*,msg=errors while get file status listStatus on s3://bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    org.apache.hadoop.fs.s3a.AWSClientIOException: listStatus on s3://bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    Caused by: javax.net.ssl.SSLException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    Caused by: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    Caused by: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty

    Try updating the CA certificate on the FE node using update-ca-trust (CentOS/RockyLinux), and then restart the FE process.

  11. BE error: java.lang.InternalError. If you see an error similar to the following in be.INFO:

    W20240506 15:19:57.553396 266457 jni-util.cpp:259] java.lang.InternalError
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(Native Method)
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.<init>(ZlibDecompressor.java:114)
    at org.apache.hadoop.io.compress.GzipCodec$GzipZlibDecompressor.<init>(GzipCodec.java:229)
    at org.apache.hadoop.io.compress.GzipCodec.createDecompressor(GzipCodec.java:188)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:183)
    at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.<init>(CodecFactory.java:99)
    at org.apache.parquet.hadoop.CodecFactory.createDecompressor(CodecFactory.java:223)
    at org.apache.parquet.hadoop.CodecFactory.getDecompressor(CodecFactory.java:212)
    at org.apache.parquet.hadoop.CodecFactory.getDecompressor(CodecFactory.java:43)

    It is because the Doris built-in libz.a conflicts with the system environment's libz.so. To resolve this issue, first execute export LD_LIBRARY_PATH=/path/to/be/lib:$LD_LIBRARY_PATH, and then restart the BE process.

  12. When inserting data into Hive, an error occurred as HiveAccessControlException Permission denied: user [user_a] does not have [UPDATE] privilege on [database/table].

    After data is inserted, the corresponding statistics need to be updated, and that update requires the ALTER privilege. Therefore, the ALTER privilege needs to be granted to this user in Ranger.

  13. When querying ORC files, if an error like Orc row reader nextBatch failed. reason = Can't open /usr/share/zoneinfo/+08:00 occurs.

    First check the time_zone setting of the current session. It is recommended to use a region-based timezone name such as Asia/Shanghai.

    If the session timezone is already set to Asia/Shanghai but the query still fails, it indicates that the ORC file was generated with the timezone +08:00. During query execution, this timezone is required when parsing the ORC footer. In this case, you can try creating a symbolic link under the /usr/share/zoneinfo/ directory that points +08:00 to an equivalent timezone.
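
    The symbolic-link fix can be sketched as follows. The example runs against a scratch directory so it is safe to try; on a real node the target directory is /usr/share/zoneinfo/ and the command requires root:

    ```shell
    # Link the offset-style name "+08:00" to an equivalent region-based zone.
    zdir=$(mktemp -d)                     # stand-in for /usr/share/zoneinfo/
    ln -s Asia/Shanghai "$zdir/+08:00"
    readlink "$zdir/+08:00"               # prints Asia/Shanghai
    ```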

  14. When querying a Hive table that uses JSON SerDe (e.g., org.openx.data.jsonserde.JsonSerDe), an error occurs: failed to get schema or Storage schema reading not supported

    When a Hive table uses JSON format storage (ROW FORMAT SERDE is org.openx.data.jsonserde.JsonSerDe), the Hive Metastore may not be able to read the table's schema information through the default method, causing the following error when querying from Doris:

    errCode = 2, detailMessage = failed to get schema for table xxx in db xxx.
    reason: org.apache.hadoop.hive.metastore.api.MetaException:
    java.lang.UnsupportedOperationException: Storage schema reading not supported

    This can be resolved by adding "get_schema_from_table" = "true" in the Catalog properties. This parameter instructs Doris to retrieve the schema directly from the Hive table metadata instead of relying on the underlying storage's Schema Reader.

    CREATE CATALOG hive PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://x.x.x.x:9083',
    'get_schema_from_table' = 'true'
    );

    This parameter is supported since versions 2.1.10 and 3.0.6.

  15. When querying Hive Catalog tables, query planning is extremely slow, the nereids cost too much time error occurs, and each HMS access takes a consistently long time (e.g., around 10 seconds).

    Root Cause Analysis:

    This issue is usually not caused by slow execution of the HMS RPC itself. Instead, the most common root cause is incorrect DNS configuration on the Doris FE node. During the initialization phase of the Hive Metastore Client, hostname resolution is triggered. If the configured DNS server is unreachable or unresponsive, it causes a DNS resolution timeout (typically 10 seconds) every time a new HMS client connection is established, which severely slows down metadata fetching.

    Typical Symptoms:

    • Normal Network Connectivity: The HMS port is reachable, but metadata access in Doris remains extremely slow.
    • Consistent Delay: The delay consistently hits a fixed timeout threshold (e.g., 10 seconds).
    • Workarounds Fail: Simply increasing the HMS client timeout parameter in the Catalog properties only masks the error but does not eliminate the fixed 10-second delay on each connection.

    Troubleshooting Steps:

    Run the following commands on the Doris FE node to verify the DNS and hostname resolution:

    # Check current DNS server configuration
    cat /etc/resolv.conf
    # Test if the DNS server is reachable and measure resolution latency
    ping <nameserver_ip>
    dig @<nameserver_ip> example.com
    dig @<nameserver_ip> -x <hms_ip>

    Solutions (Choose One):

    1. Fix DNS Configuration (Recommended): Correct the nameserver entries in /etc/resolv.conf on the Doris FE node to ensure the DNS service is reachable and responds quickly. If DNS is not required in your local network environment, consider commenting out the invalid nameservers.
    2. Configure Static Hosts Mapping: Add the IP and Hostname mapping of the HMS nodes to /etc/hosts on the FE node.
    3. Standardize Catalog Properties: When creating the Catalog, it is highly recommended to use a resolvable hostname instead of a bare IP address for the hive.metastore.uris property.
  16. Queries on Hive Catalog tables occasionally experience extremely long hangs or directly report the optimizer timeout error nereids cost too much time, but subsequent queries work fine immediately after.

    Problem Description:

    This usually happens after the Catalog has been idle for a while. When an HMS RPC is initiated, if a stale connection from the pool is reused, the request will hang for the duration of the Socket Timeout (default 10s). Due to the Hive Client's internal retry mechanism, this can result in cumulative waits of 20-30 seconds if multiple retries occur. This causes the query planning phase to be extremely slow, often triggering the Doris FE optimizer timeout error nereids cost too much time. Once the connection is purged and rebuilt, performance returns to normal.

    Root Cause Analysis:

    Doris maintains a Client Pool for each HMS Catalog to reuse connections. In complex network environments (e.g., across VPCs, through firewalls, or NAT gateways), idle TCP connections are often "silently" reclaimed by network devices after an idle timeout. Since these devices typically do not send FIN/RST packets to notify the endpoints, Doris still believes the connection is valid. Reusing such a "zombie connection" requires waiting for a full Socket Timeout before the failure is detected and a retry is triggered.

    Troubleshooting Steps:

    • Verify if there are firewalls, NAT gateways, or Load Balancers between Doris FE and HMS.
    • Use the Pulse (hms-tools) diagnostic tool. If the tool shows fast network connectivity but stable delays that are multiples of 10s when executing RPCs after a long idle period, it confirms that idle connections are being silently reclaimed.

    Solution:

    Configure the connection lifetime in your Catalog properties to be slightly shorter than the network device's idle timeout. We recommend using Hive's native socket lifetime property:

    CREATE CATALOG hive_catalog PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://<hms_host>:<port>",
    -- Set a value shorter than your network's idle timeout (e.g., 300s)
    "hive.metastore.client.socket.lifetime" = "300s"
    );

    When set, the HMS Client will check the connection age before sending an RPC. If it exceeds the lifetime, it proactively reconnects, avoiding long hangs and optimizer timeouts caused by stale connections.

HDFS

  1. When accessing HDFS 3.x, if you encounter the error java.lang.VerifyError: xxx: in versions prior to 1.2.1, Doris depends on Hadoop 2.8. Update the Hadoop dependency to 2.10.2, or upgrade Doris to 1.2.2 or later.

  2. Using Hedged Read to optimize slow HDFS reads. In some cases, high load on HDFS may lead to longer read times for a data replica on a particular node, slowing down the overall query. The HDFS Client provides the Hedged Read feature: if a read request does not return within a certain threshold, another read thread is started to read the same data, and whichever result returns first is used.

    Note: This feature may increase the load on the HDFS cluster, so use it judiciously.

    You can enable this feature by:

    CREATE CATALOG regression PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://172.21.16.47:7004',
    'dfs.client.hedged.read.threadpool.size' = '128',
    'dfs.client.hedged.read.threshold.millis' = '500'
    );

    dfs.client.hedged.read.threadpool.size represents the number of threads used for Hedged Read, which are shared by an HDFS Client. Typically, for an HDFS cluster, BE nodes will share an HDFS Client.

    dfs.client.hedged.read.threshold.millis is the read threshold in milliseconds. When a read request exceeds this threshold without returning, a Hedged Read is triggered.

    When enabled, you can see the related parameters in the Query Profile:

    TotalHedgedRead: Number of times Hedged Read was initiated.

    HedgedReadWins: Number of successful Hedged Reads (times when the request was initiated and returned faster than the original request).

    Note that these values are cumulative for a single HDFS Client, not for a single query. The same HDFS Client can be reused by multiple queries.

  3. Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

    In the startup scripts of FE and BE, the environment variable HADOOP_CONF_DIR is added to the CLASSPATH. If HADOOP_CONF_DIR is set incorrectly, such as pointing to a non-existent or incorrect path, it may load the wrong xxx-site.xml file, resulting in reading incorrect information.

    Check if HADOOP_CONF_DIR is configured correctly or remove this environment variable.
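
    A quick sanity check can be sketched like this (the path below is a deliberately bad example):

    ```shell
    # Unset HADOOP_CONF_DIR if it points at a missing directory, so FE/BE
    # do not pick up a wrong or nonexistent set of xxx-site.xml files.
    HADOOP_CONF_DIR=/nonexistent/hadoop/conf
    if [ ! -d "$HADOOP_CONF_DIR" ]; then
        unset HADOOP_CONF_DIR
    fi
    echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-<unset>}"   # prints HADOOP_CONF_DIR=<unset>
    ```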

  4. BlockMissingException: Could not obtain block: BP-XXXXXXXXX No live nodes contain current block

    Possible solutions include:

    • Use hdfs fsck file -files -blocks -locations to check if the file is healthy.

    • Check connectivity with DataNodes using telnet.

      The following error may be printed in the error log:

      No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.70.150.122:50010,DS-7bba8ffc-651c-4617-90e1-6f45f9a5f896,DISK]

      You can first check the connectivity between the Doris cluster and 10.70.150.122:50010.

      In addition, in some cases, the HDFS cluster uses dual network with internal and external IPs. In this case, domain names are required for communication, and the following needs to be added to the Catalog properties: "dfs.client.use.datanode.hostname" = "true".

      At the same time, check whether dfs.client.use.datanode.hostname is set to true in the hdfs-site.xml files placed under fe/conf and be/conf.

    • Check DataNode logs.

      If you encounter the following error:

      org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to read expected SASL data transfer protection handshake from client at /XXX.XXX.XXX.XXX:XXXXX. Perhaps the client is running an older version of Hadoop which does not support SASL data transfer protection

      it means that the current HDFS has enabled encrypted transmission, but the client has not, causing the error.

      Use any of the following solutions:

      • Copy hdfs-site.xml and core-site.xml to fe/conf and be/conf. (Recommended)
      • In hdfs-site.xml, find the corresponding configuration dfs.data.transfer.protection and set this parameter in the Catalog.
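
      For the second option, the resulting Catalog property might look like this (the value must match your cluster's setting; privacy is only an example, the valid values being authentication, integrity, and privacy):

      ```sql
      CREATE CATALOG hive PROPERTIES (
          'type' = 'hms',
          'hive.metastore.uris' = 'thrift://x.x.x.x:9083',
          'dfs.data.transfer.protection' = 'privacy'
      );
      ```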
  5. When querying a Hive Catalog table, an error occurs: RPC response has a length of xxx exceeds maximum data length

    For example:

    RPC response has a length of 1213486160 exceeds maximum data length

    The value 1213486160 in hexadecimal is 0x48545450, which corresponds to the ASCII string "HTTP". This indicates that the Doris FE attempted to connect to an HDFS NameNode RPC port, but received an HTTP response instead.

    The root cause is that the HDFS NameNode port configured in the Catalog or in hdfs-site.xml is incorrect — an HTTP port was used where an RPC port is required. HDFS NameNode typically exposes two types of ports:

    • RPC port (default: 8020 or 9000): Used for HDFS client communication (this is the correct port for Doris).
    • HTTP port (default: 9870 or 50070): Used for the NameNode Web UI.

    Check the HDFS NameNode port configuration in the Catalog properties or in hdfs-site.xml under fe/conf and be/conf, and ensure it is set to the RPC port (dfs.namenode.rpc-address), not the HTTP port (dfs.namenode.http-address).
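
    The hex decoding described above can be reproduced directly:

    ```shell
    # 1213486160 interpreted as a 4-byte big-endian integer spells "HTTP":
    printf '%08x\n' 1213486160     # prints 48545450
    printf '\x48\x54\x54\x50\n'    # prints HTTP (bash printf \x escapes)
    ```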

DLF Catalog

  1. When using the DLF Catalog, if an Invalid address error occurs while BE reads JindoFS data, add a mapping from the domain name appearing in the logs to its IP address in /etc/hosts.

  2. If there is no permission to read data, use the hadoop.username property to specify a user with permission.

  3. Metadata in the DLF Catalog is expected to be consistent with DLF. When DLF manages the metadata, partitions newly imported into Hive may not be synchronized to DLF, leading to inconsistencies between DLF and Hive metadata. To address this, ensure that Hive metadata is fully synchronized to DLF.

Other Issues

  1. Query results in garbled characters after mapping Binary type to Doris

    Doris natively does not support the Binary type, so when mapping Binary types from various data lakes or databases to Doris, it is usually done using the String type. The String type can only display printable characters. If you need to query the content of Binary data, you can use the TO_BASE64() function to convert it to Base64 encoding before further processing.
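
    For example (catalog, database, table, and column names are hypothetical):

    ```sql
    -- binary_col was mapped to STRING in Doris; TO_BASE64 makes it printable.
    SELECT TO_BASE64(binary_col) FROM hive_catalog.db.tbl LIMIT 10;
    ```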

  2. Analyzing Parquet files

    When querying Parquet files, due to potential differences in the format of Parquet files generated by different systems, such as the number of RowGroups, index values, etc., sometimes it is necessary to check the metadata of Parquet files for issue identification or performance analysis. Here is a tool provided to help users analyze Parquet files more conveniently:

    1. Download and unzip Apache Parquet CLI 1.14.0.

    2. Download the Parquet file to be analyzed to your local machine, assuming the path is /path/to/file.parquet.

    3. Use the following command to analyze the metadata of the Parquet file:

      ./parquet-tools meta /path/to/file.parquet

    4. For more functionalities, refer to the Apache Parquet CLI documentation.

Diagnostic Tools

Pulse

Pulse is a lightweight connectivity testing toolkit designed to diagnose infrastructure dependencies in data lake environments. It includes several specialized tools to help users quickly pinpoint environment-related issues in external table access.

Pulse consists of the following key toolsets:

  1. HMS Diagnostic Tool (hms-tools):

    • Designed specifically for troubleshooting Hive Metastore (HMS) issues.
    • Supports health checks, ping tests, object metadata retrieval, and configuration diagnostics.
    • Performance Benchmarking: Features a bench mode to measure the response distribution and latency of HMS, helping determine if the bottleneck is at the metadata layer.
  2. Kerberos Diagnostic Tool (kerberos-tools):

    • Used to validate krb5.conf configurations in environments with Kerberos authentication.
    • Supports testing KDC reachability, inspecting keytab files, and performing login tests to ensure the security layer is not blocking the connection.
  3. Object Storage Diagnostic Tools (s3-tools, gcs-tools, azure-blob-cpp):

    • Diagnostic tools for major cloud storage services (AWS S3, Google GCS, Azure Blob Storage).
    • Used for troubleshooting common external table access issues such as "Access Denied" or "Bucket Not Found".
    • Supports validating credential sources and STS identities, and performing bucket-level operation tests.

Example Commands (e.g., HMS):

# Test basic HMS connectivity and latency details using hms-tools
java -jar hms-tools.jar ping --uris thrift://<hms_host>:<port> --count 3 --verbose

# Benchmark actual metadata RPC response distribution using hms-tools
java -jar hms-tools.jar bench --uris thrift://<hms_host>:<port> --rpc get_all_databases --iterations 10

When metadata access is slow or external table connectivity fails, it is recommended to use the corresponding Pulse tool based on the issue type (e.g., authentication failure, slow metadata, or storage reachability) for investigation. If the connect phase is extremely fast but there are significant and consistent delays during the overall initialization, please refer to the FAQ above to check the DNS and hostname resolution settings on the FE node.