
Persistent Volume and ConfigMap

Doris-Operator supports mounting PV (Persistent Volume) on pods of various Doris components.

PVs are generally created by the Kubernetes system administrator. Doris-Operator does not use PVs directly when deploying Doris services. Instead, it declares its storage requirements through a PVC (PersistentVolumeClaim) to request a PV from the Kubernetes cluster. When a PVC is created, Kubernetes attempts to bind it to an available PV that meets the requirements. A StorageClass spares administrators from creating PVs manually: when no existing PV satisfies a PVC, a PV can be provisioned dynamically based on the StorageClass. PVs are backed by a variety of storage types, which fall into two main categories: network storage and local storage. Because of their different principles and implementations, the two offer different performance and usage characteristics. Choose between them according to the type of containerized service and your own needs.

If no PVC is configured during deployment, Doris-Operator defaults to emptyDir mode for storing metadata, data files, and runtime logs. When the pod is restarted, the related data is lost.

Recommended directories to persist for each node type:

  • FE: doris-meta, log
  • BE: storage, log
  • CN: storage, log
  • BROKER: log

Doris-Operator writes logs to both the console and the specified directory. If your Kubernetes cluster has complete log collection capabilities, Doris logs at the INFO level (the default) can be collected from console output. It is still recommended to configure a PVC to persist log files, because beyond the INFO-level logs there are also fe.out, be.out, audit.log, and garbage collection logs. Persisting them makes it easier to locate problems quickly and to trace back through audit logs.

ConfigMap is a resource object used to store configuration files in Kubernetes. It allows configuration files to be mounted dynamically and decouples them from applications, making configuration management more flexible and maintainable. Like a PVC, a ConfigMap can be referenced by Pods so that the application can use its configuration data.

StorageClass

Doris-Operator uses the Kubernetes default StorageClass to support FE and BE data storage, with the storage path (mountPath) taken from the default configuration in the image. To use a specific StorageClass, set persistentVolumeClaimSpec.storageClassName in each entry of spec.feSpec.persistentVolumes (and likewise in spec.beSpec.persistentVolumes), as shown below:

apiVersion: doris.selectdb.com/v1
kind: DorisCluster
metadata:
  labels:
    app.kubernetes.io/name: doriscluster
  name: doriscluster-sample-storageclass1
spec:
  feSpec:
    replicas: 3
    image: selectdb/doris.fe-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    persistentVolumes:
    - mountPath: /opt/apache-doris/fe/doris-meta
      name: storage0
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName to its name.
        storageClassName: ${your_storageclass}
        accessModes:
        - ReadWriteOnce
        resources:
          # Note: if the storage size is less than 5G, FE will not start normally.
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/fe/log
      name: storage1
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName to its name.
        storageClassName: ${your_storageclass}
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
  beSpec:
    replicas: 3
    image: selectdb/doris.be-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    persistentVolumes:
    - mountPath: /opt/apache-doris/be/storage
      name: storage2
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName to its name.
        storageClassName: ${your_storageclass}
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/be/log
      name: storage3
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName to its name.
        storageClassName: ${your_storageclass}
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
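
The comment in the example above warns that FE will not start normally when the doris-meta volume is smaller than 5G. A pre-deployment check for this could be sketched in Python as follows. The function names, the supported subset of Kubernetes quantity suffixes, and the reading of "5G" as 5 * 10^9 bytes are all assumptions made for illustration; none of this is part of Doris-Operator.

```python
import re

# Multipliers for a subset of Kubernetes quantity suffixes
# (assumption: this subset is enough for typical storage requests).
_SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}

# Assumption: the "5G" in the sample comment is read as 5 * 10^9 bytes.
MIN_FE_META_BYTES = 5 * 10**9


def quantity_to_bytes(quantity):
    """Convert a storage quantity such as '100Gi' into a byte count."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if match is None or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(float(match.group(1)) * _SUFFIXES[match.group(2)])


def fe_meta_large_enough(storage_request):
    """Return True if the requested size reaches the assumed 5G minimum."""
    return quantity_to_bytes(storage_request) >= MIN_FE_META_BYTES
```

For example, fe_meta_large_enough("100Gi") passes for the request used in the sample, while a "4Gi" request would be flagged before the cluster is deployed.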

Customized ConfigMap

Doris uses Kubernetes ConfigMaps to decouple configuration files from services. Before deploying a doriscluster, deploy the ConfigMaps you want to use in the same namespace. In the following example, FE uses a ConfigMap named fe-configmap and BE uses a ConfigMap named be-configmap. The related YAML follows:

ConfigMap sample for FE

apiVersion: v1
kind: ConfigMap
metadata:
  name: fe-configmap
  labels:
    app.kubernetes.io/component: fe
data:
  fe.conf: |
    CUR_DATE=`date +%Y%m%d-%H%M%S`

    # the output dir of stderr and stdout
    LOG_DIR = ${DORIS_HOME}/log

    JAVA_OPTS="-Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx8192m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xloggc:$DORIS_HOME/log/fe.gc.log.$CUR_DATE"

    # For jdk 9+, this JAVA_OPTS will be used as default JVM options
    JAVA_OPTS_FOR_JDK_9="-Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx8192m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xlog:gc*:$DORIS_HOME/log/fe.gc.log.$CUR_DATE:time"

    # INFO, WARN, ERROR, FATAL
    sys_log_level = INFO

    # NORMAL, BRIEF, ASYNC
    sys_log_mode = NORMAL

    # Default dirs to put jdbc drivers,default value is ${DORIS_HOME}/jdbc_drivers
    # jdbc_drivers_dir = ${DORIS_HOME}/jdbc_drivers

    http_port = 8030
    rpc_port = 9020
    query_port = 9030
    edit_log_port = 9010

    enable_fqdn_mode = true

Note that when using a ConfigMap for FE, you must add enable_fqdn_mode = true to fe.conf. For the specific reasons, refer to the linked document.
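
Because a missing enable_fqdn_mode setting is easy to overlook, it can help to check fe.conf before creating the ConfigMap. The following is a minimal Python sketch of such a check. The helper names parse_conf and fqdn_mode_enabled are hypothetical, and the parser assumes the simple "key = value" layout with "#" comments seen in the sample above; it is not the parser Doris itself uses.

```python
def parse_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values such as JAVA_OPTS keep theirs.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def fqdn_mode_enabled(fe_conf_text):
    """Return True if fe.conf sets enable_fqdn_mode to true."""
    value = parse_conf(fe_conf_text).get("enable_fqdn_mode", "")
    return value.lower() == "true"
```

Calling fqdn_mode_enabled on the contents of the fe.conf shown above returns True; a file that omits the setting or sets it to false returns False.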

ConfigMap sample for BE

apiVersion: v1
kind: ConfigMap
metadata:
  name: be-configmap
  labels:
    app.kubernetes.io/component: be
data:
  be.conf: |
    CUR_DATE=`date +%Y%m%d-%H%M%S`

    PPROF_TMPDIR="$DORIS_HOME/log/"

    JAVA_OPTS="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xloggc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"

    # For jdk 9+, this JAVA_OPTS will be used as default JVM options
    JAVA_OPTS_FOR_JDK_9="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xlog:gc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"

    # since 1.2, the JAVA_HOME need to be set to run BE process.
    # JAVA_HOME=/path/to/jdk/

    # https://github.com/apache/doris/blob/master/docs/zh-CN/community/developer-guide/debug-tool.md#jemalloc-heap-profile
    # https://jemalloc.net/jemalloc.3.html
    JEMALLOC_CONF="percpu_arena:percpu,background_thread:true,metadata_thp:auto,muzzy_decay_ms:15000,dirty_decay_ms:15000,oversize_threshold:0,lg_tcache_max:20,prof:false,lg_prof_interval:32,lg_prof_sample:19,prof_gdump:false,prof_accum:false,prof_leak:false,prof_final:false"
    JEMALLOC_PROF_PRFIX=""

    # INFO, WARNING, ERROR, FATAL
    sys_log_level = INFO

    # ports for admin, web, heartbeat service
    be_port = 9060
    webserver_port = 8040
    heartbeat_service_port = 9050
    brpc_port = 8060

doriscluster deployment example using the two ConfigMaps above (it also references a broker-configmap, which must be created in the same way):

apiVersion: doris.selectdb.com/v1
kind: DorisCluster
metadata:
  labels:
    app.kubernetes.io/name: doriscluster
  name: doriscluster-sample-configmap
spec:
  feSpec:
    replicas: 3
    image: selectdb/doris.fe-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    configMapInfo:
      # use kubectl create configmap fe-configmap --from-file=fe.conf
      configMapName: fe-configmap
      resolveKey: fe.conf
  beSpec:
    replicas: 3
    image: selectdb/doris.be-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    configMapInfo:
      # use kubectl create configmap be-configmap --from-file=be.conf
      configMapName: be-configmap
      resolveKey: be.conf
  brokerSpec:
    replicas: 3
    image: selectdb/doris.broker-ubuntu:2.0.2
    limits:
      cpu: 2
      memory: 4Gi
    requests:
      cpu: 2
      memory: 4Gi
    configMapInfo:
      # use kubectl create configmap broker-configmap --from-file=apache_hdfs_broker.conf
      configMapName: broker-configmap
      resolveKey: apache_hdfs_broker.conf

Here resolveKey is the name of the configuration file passed in through the ConfigMap. It must be fe.conf, be.conf, or apache_hdfs_broker.conf (CN nodes also use be.conf). Doris-Operator parses this file to guide the customized deployment of the doriscluster.
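
The resolveKey rule can be checked mechanically before applying the manifests. The sketch below assumes the component-to-file mapping stated in the paragraph above; the function name check_resolve_key and the short component labels ("fe", "be", "cn", "broker") are hypothetical and chosen only for this illustration.

```python
# Expected resolveKey for each component, as described in the text.
EXPECTED_RESOLVE_KEY = {
    "fe": "fe.conf",
    "be": "be.conf",
    "cn": "be.conf",  # CN nodes also use be.conf
    "broker": "apache_hdfs_broker.conf",
}


def check_resolve_key(component, resolve_key, configmap_data):
    """Return a list of problems found; an empty list means the check passed.

    configmap_data is the 'data' mapping of the referenced ConfigMap.
    """
    problems = []
    expected = EXPECTED_RESOLVE_KEY[component]
    if resolve_key != expected:
        problems.append(
            f"{component}: resolveKey is {resolve_key!r}, expected {expected!r}"
        )
    if resolve_key not in configmap_data:
        problems.append(
            f"{component}: key {resolve_key!r} is missing from the ConfigMap data"
        )
    return problems
```

For instance, check_resolve_key("be", "be.conf", {"be.conf": "be_port = 9060"}) returns an empty list, whereas passing "backend.conf" as the resolveKey reports both a wrong name and a missing key.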

Add special configuration files to the conf directory

This section is for reference. It applies to containerized deployments that need additional files placed in the conf directory of a Doris node, for example the HDFS/Hive configuration files commonly used with Data Lake Multi-catalog.

Here we take BE's ConfigMap and the core-site.xml file that needs to be added as an example:

apiVersion: v1
kind: ConfigMap
metadata:
  name: be-configmap
  labels:
    app.kubernetes.io/component: be
data:
  be.conf: |
    be_port = 9060
    webserver_port = 8040
    heartbeat_service_port = 9050
    brpc_port = 8060
  core-site.xml: |
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
      </property>
    </configuration>
...

Note that data is a mapping of key-value pairs, where each key is a file name and each value is that file's content:

data:
  file_name_1:
    file_content_1
  file_name_2:
    file_content_2
  file_name_3:
    file_content_3
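
To make the file-name-to-content mapping concrete, the following Python sketch assembles a ConfigMap manifest from a dictionary of files and serializes it as JSON, which kubectl accepts in addition to YAML. The function names build_configmap and to_json are hypothetical; only the manifest fields already shown in the samples above (apiVersion, kind, metadata, labels, data) are used.

```python
import json


def build_configmap(name, files, component=None):
    """Build a ConfigMap manifest whose data maps file names to file contents."""
    for file_name, content in files.items():
        # ConfigMap data values must be strings.
        if not isinstance(content, str):
            raise TypeError(f"content of {file_name!r} must be a string")
    manifest = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(files),
    }
    if component is not None:
        manifest["metadata"]["labels"] = {"app.kubernetes.io/component": component}
    return manifest


def to_json(manifest):
    """Serialize the manifest so it can be saved and applied with kubectl."""
    return json.dumps(manifest, indent=2)
```

Each additional file, such as core-site.xml, is simply one more entry in the files dictionary passed to build_configmap.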

BE multi-disk configuration

Doris' BE service supports mounting multiple disks, which helps resolve mismatches between computing resources and storage resources and can also significantly improve Doris's storage efficiency. Doris can mount multiple disks on Kubernetes as well, and doing so requires a configuration file. To decouple services from configuration, Doris uses a ConfigMap to carry the configuration and mounts the configuration file dynamically for the service to use. In the following doriscluster configuration, the BE service uses a ConfigMap for its configuration file and mounts two disks:

apiVersion: doris.selectdb.com/v1
kind: DorisCluster
metadata:
  labels:
    app.kubernetes.io/name: doriscluster
  name: doriscluster-sample-storageclass1
spec:
  feSpec:
    replicas: 3
    image: selectdb/doris.fe-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    persistentVolumes:
    - mountPath: /opt/apache-doris/fe/doris-meta
      name: storage0
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, uncomment storageClassName and set it to its name.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          # Note: if the storage size is less than 5G, FE will not start normally.
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/fe/log
      name: storage1
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, uncomment storageClassName and set it to its name.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
  beSpec:
    replicas: 3
    image: selectdb/doris.be-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    configMapInfo:
      configMapName: be-configmap
      resolveKey: be.conf
    persistentVolumes:
    - mountPath: /opt/apache-doris/be/storage
      name: storage2
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, uncomment storageClassName and set it to its name.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/be/storage1
      name: storage3
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, uncomment storageClassName and set it to its name.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/be/log
      name: storage4
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, uncomment storageClassName and set it to its name.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi

Compared with the default example, this one adds configMapInfo and an extra persistentVolumes entry for the second disk. Each persistentVolumeClaimSpec fully follows the definition format of the native Kubernetes PVC spec. In the example, configMapInfo identifies which ConfigMap in the same namespace, and which key within it, supplies the configuration file once BE is deployed; the key must be be.conf. The following is the ConfigMap that must be deployed in advance for the doriscluster above:

apiVersion: v1
kind: ConfigMap
metadata:
  name: be-configmap
  labels:
    app.kubernetes.io/component: be
data:
  be.conf: |
    CUR_DATE=`date +%Y%m%d-%H%M%S`

    PPROF_TMPDIR="$DORIS_HOME/log/"

    JAVA_OPTS="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xloggc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"

    # For jdk 9+, this JAVA_OPTS will be used as default JVM options
    JAVA_OPTS_FOR_JDK_9="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xlog:gc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"

    # since 1.2, the JAVA_HOME need to be set to run BE process.
    # JAVA_HOME=/path/to/jdk/

    # https://github.com/apache/doris/blob/master/docs/zh-CN/community/developer-guide/debug-tool.md#jemalloc-heap-profile
    # https://jemalloc.net/jemalloc.3.html
    JEMALLOC_CONF="percpu_arena:percpu,background_thread:true,metadata_thp:auto,muzzy_decay_ms:15000,dirty_decay_ms:15000,oversize_threshold:0,lg_tcache_max:20,prof:false,lg_prof_interval:32,lg_prof_sample:19,prof_gdump:false,prof_accum:false,prof_leak:false,prof_final:false"
    JEMALLOC_PROF_PRFIX=""

    # INFO, WARNING, ERROR, FATAL
    sys_log_level = INFO

    # ports for admin, web, heartbeat service
    be_port = 9060
    webserver_port = 8040
    heartbeat_service_port = 9050
    brpc_port = 8060

    storage_root_path = /opt/apache-doris/be/storage,medium:ssd;/opt/apache-doris/be/storage1,medium:ssd

When using multiple disks, each path in the storage_root_path value in the ConfigMap must correspond to a mountPath of a persistentVolumes entry in the doriscluster. For the storage_root_path syntax rules, refer to the linked document. When using cloud disks, set the medium uniformly to SSD.
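
The correspondence between storage_root_path and the mount paths can be verified with a short script before deployment. The Python sketch below assumes the format used in the sample above, that is, entries separated by ";" with the directory as the first comma-separated field; the function names are hypothetical and the parsing is deliberately simplified compared with the full syntax rules.

```python
def parse_storage_root_path(value):
    """Return the directory paths listed in a storage_root_path value."""
    paths = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # The directory is the first field; later fields are attributes
        # such as medium:ssd.
        paths.append(entry.split(",")[0].strip().rstrip("/"))
    return paths


def unmounted_storage_paths(storage_root_path, mount_paths):
    """Return the storage paths that have no matching mountPath."""
    mounted = {path.rstrip("/") for path in mount_paths}
    return [p for p in parse_storage_root_path(storage_root_path) if p not in mounted]
```

With the be.conf and beSpec.persistentVolumes from the example above, unmounted_storage_paths returns an empty list; if the storage1 volume were removed from the doriscluster, /opt/apache-doris/be/storage1 would be reported.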