Load Balancing
Users connect to Doris through FE's query port (query_port
, default 9030) using the MySQL protocol. When deploying multiple FE nodes, users can deploy a load balancing layer on top of multiple FEs to achieve high availability for Doris queries.
This document introduces various load balancing solutions suitable for Doris and explains how to implement client IP passthrough using the Proxy Protocol.
Load Balancingβ
This article uses the following three FE nodes as examples for demonstrating the steps:
192.168.1.101:9030
192.168.1.102:9030
192.168.1.103:9030
Proxy server node:
192.168.1.100
01 JDBC URLβ
Use the built-in load balancing configuration in JDBC URL.
jdbc:mysql:loadbalance://192.168.1.101:9030,192.168.1.102:9030,192.168.1.103:9030/test_db
For details, please refer to MySQL Official Documentation
02 Nginxβ
Use Nginx TCP reverse proxy to implement Doris load balancing.
Install Nginxβ
Please refer to Nginx official website to install Nginx correctly. Here we demonstrate the Nginx compilation and installation steps using Ubuntu system with Nginx 1.18.0 version as an example.
-
Install compilation dependencies
sudo apt-get install build-essential
sudo apt-get install libpcre3 libpcre3-dev
sudo apt-get install zlib1g-dev
sudo apt-get install openssl libssl-dev -
Install Nginx
sudo wget http://nginx.org/download/nginx-1.18.0.tar.gz
sudo tar zxvf nginx-1.18.0.tar.gz
cd nginx-1.18.0
sudo ./configure --prefix=/usr/local/nginx --with-stream --with-http_ssl_module --with-http_gzip_static_module --with-http_stub_status_module
sudo make && make install
Configure Reverse Proxyβ
Create a new configuration file:
vim /usr/local/nginx/conf/default.conf
Content as follows:
events {
worker_connections 1024;
}
stream {
upstream mysqld {
hash $remote_addr consistent;
server 192.168.1.101:9030 weight=1 max_fails=2 fail_timeout=60s;
server 192.168.1.102:9030 weight=1 max_fails=2 fail_timeout=60s;
server 192.168.1.103:9030 weight=1 max_fails=2 fail_timeout=60s;
}
server {
# Proxy port
listen 6030;
proxy_connect_timeout 300s;
proxy_timeout 300s;
proxy_pass mysqld;
}
}
Start Nginxβ
Start with specified configuration file:
cd /usr/local/nginx
/usr/local/nginx/sbin/nginx -c conf.d/default.conf
Verifyβ
Connect using the proxy port:
mysql -uroot -P6030 -h192.168.1.100
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| test |
+--------------------+
2 rows in set (0.00 sec)
03 HAProxyβ
HAProxy is a high-performance TCP/HTTP load balancer written in C language.
Install HAProxyβ
-
Download HAProxy
Download link: https://src.fedoraproject.org/repo/pkgs/haproxy/
-
Extract
tar -zxvf haproxy-2.6.15.tar.gz -C /opt/
mv haproxy-2.6.15 haproxy
cd haproxy -
Compile
yum install gcc gcc-c++ -y
make TARGET=linux-glibc PREFIX=/usr/local/haproxy
make install PREFIX=/usr/local/haproxy
Configure HAProxyβ
-
Configure haproxy.conf file
Open configuration file:
vim /etc/rsyslog.d/haproxy.conf
Content as follows:
$ModLoad imudp
$UDPServerRun 514
local0.* /usr/local/haproxy/logs/haproxy.log
&~ -
Enable remote logging
vim /etc/sysconfig/rsyslog
Add content:
SYSLOGD_OPTIONS="-c 2 -r -m 0"
Parameter description:
-c 2
: Use compatibility mode, default is-c 5
.-r
: Enable remote logging.-m 0
: Mark timestamps. In minutes, when 0, indicates this feature is disabled.
Apply changes:
systemctl restart rsyslog
-
Edit load balancing file
vim /usr/local/haproxy/haproxy.cfg
global
maxconn 2000
ulimit-n 40075
log 127.0.0.1 local0 info
uid 200
gid 200
chroot /var/empty
daemon
group haproxy
user haproxy
defaults
log global
mode http
retries 3
option redispatch
timeout connect 5000
timeout client 5000
timeout server 5000
timeout check 2000
frontend agent-front
bind *:6030
mode tcp
default_backend forward-fe
backend forward-fe
mode tcp
balance roundrobin
server fe-1 192.168.1.101:9030 weight 1 check inter 3000 rise 2 fall 3
server fe-2 192.168.1.102:9030 weight 1 check inter 3000 rise 2 fall 3
server fe-3 192.168.1.103:9030 weight 1 check inter 3000 rise 2 fall 3
Start HAProxyβ
-
Start service
/opt/haproxy/haproxy -f /usr/local/haproxy/haproxy.cfg
-
Check service status
netstat -lnatp | grep -i haproxy
Verifyβ
mysql -h 192.168.1.100 -uroot -P6030 -p
04 ProxySQLβ
ProxySQL is an open-source MySQL database proxy software written in C language. It can implement connection management, read-write splitting, load balancing, failover, and other functions. It has advantages such as high performance, configurability, and dynamic management, and is commonly used in Web services, big data platforms, cloud databases, and other scenarios.
Install ProxySQLβ
Please refer to the official documentation to install ProxySQL correctly.
Configure ProxySQLβ
ProxySQL includes configuration file /etc/proxysql.cnf
and configuration database file /var/lib/proxysql/proxysql.db
.
Special attention should be paid that if there is a "proxysql.db" file in the /var/lib/proxysql
directory, the ProxySQL service only reads and parses proxysql.cnf
during the first startup, and subsequent startups will no longer read it.
To make proxysql.cnf
configuration take effect after restart, you need to delete /var/lib/proxysql/proxysql.db
before restarting the service, which is equivalent to initialization startup and will generate a new proxysql.db
file, and the original configuration rules will be cleared.
Here is the main content of the configuration file proxysql.cnf
:
datadir="/var/lib/proxysql" #Data directory
admin_variables=
{
admin_credentials="admin:admin" # Admin database username and password.
mysql_ifaces="0.0.0.0:6032" # Admin database port, used for connecting admin database of ProxySQL
}
mysql_variables=
{
threads=4
max_connections=2048
default_query_delay=0
default_query_timeout=36000000
have_compress=true
poll_timeout=2000
interfaces="0.0.0.0:6030"
default_schema="information_schema"
stacksize=1048576
server_version="5.7.99"
connect_timeout_server=3000
monitor_username="monitor"
monitor_password="monitor"
monitor_history=600000
monitor_connect_interval=60000
monitor_ping_interval=10000
monitor_read_only_interval=1500
monitor_read_only_timeout=500
ping_interval_server_msec=120000
ping_timeout_server=500
commands_stats=true
sessions_sort=true
connect_retries_on_failure=10
}
mysql_servers =
(
)
mysql_users:
(
)
mysql_query_rules:
(
)
scheduler=
(
)
mysql_replication_hostgroups=
(
)
Connect to ProxySQL Admin Databaseβ
mysql -uadmin -padmin -P6032 -hdoris01
ProxySQL > show databases;
+-----+---------------+-------------------------------------+
| seq | name | file |
+-----+---------------+-------------------------------------+
| 0 | main | |
| 2 | disk | /var/lib/proxysql/proxysql.db |
| 3 | stats | |
| 4 | monitor | |
| 5 | stats_history | /var/lib/proxysql/proxysql_stats.db |
+-----+---------------+-------------------------------------+
5 rows in set (0.000 sec)
ProxySQL > use main;
ProxySQL > show tables;
+--------------------------------------------+
| tables |
+--------------------------------------------+
| global_variables |
| mysql_collations |
| mysql_group_replication_hostgroups |
| mysql_query_rules |
| mysql_query_rules_fast_routing |
| mysql_replication_hostgroups |
| mysql_servers |
| mysql_users |
| proxysql_servers |
| runtime_checksums_values |
| runtime_global_variables |
| runtime_mysql_group_replication_hostgroups |
| runtime_mysql_query_rules |
| runtime_mysql_query_rules_fast_routing |
| runtime_mysql_replication_hostgroups |
| runtime_mysql_servers |
| runtime_mysql_users |
| runtime_proxysql_servers |
| runtime_scheduler |
| scheduler |
+--------------------------------------------+
20 rows in set (0.000 sec)
Configure Backend Doris FE in ProxySQLβ
Use INSERT statements to add the FE nodes and ports that need to be proxied to the mysql_servers
table.
Where: hostgroup_id
of 10
indicates write group, 20
indicates read group. We don't need read-write splitting here, so it can be set arbitrarily.
mysql -uadmin -padmin -P6032 -h127.0.0.1
ProxySQL > insert into mysql_servers(hostgroup_id,hostname,port) values(10,'192.168.0.101',9030);
Query OK, 1 row affected (0.000 sec)
ProxySQL > insert into mysql_servers(hostgroup_id,hostname,port) values(10,'192.168.0.102',9030);
Query OK, 1 row affected (0.000 sec)
ProxySQL > insert into mysql_servers(hostgroup_id,hostname,port) values(10,'192.168.0.103',9030);
Query OK, 1 row affected (0.000 sec)
Check results:
ProxySQL > select hostgroup_id,hostname,port,status,weight from mysql_servers;
+--------------+---------------+------+--------+--------+
| hostgroup_id | hostname | port | status | weight |
+--------------+---------------+------+--------+--------+
| 10 | 192.168.0.101 | 9030 | ONLINE | 1 |
| 20 | 192.168.0.102 | 9030 | ONLINE | 1 |
| 20 | 192.168.0.103 | 9030 | ONLINE | 1 |
+--------------+---------------+------+--------+--------+
3 rows in set (0.000 sec)
If you encounter an error during insertion:
ERROR 1045 (#2800): UNIQUE constraint failed: mysql_servers.hostgroup_id, mysql_servers.hostname, mysql_servers.port
This indicates that other configurations may have been defined previously. You can empty this table or delete the configuration for the corresponding host:
ProxySQL > select * from mysql_servers;
ProxySQL > delete from mysql_servers;
Query OK, 6 rows affected (0.000 sec)
Save information:
ProxySQL > load mysql servers to runtime;
Query OK, 0 rows affected (0.006 sec)
ProxySQL > save mysql servers to disk;
Query OK, 0 rows affected (0.348 sec)
Configure Monitoring for Doris FE Nodesβ
After adding Doris FE nodes, these backend nodes need to be monitored.
First, create a user for monitoring in Doris:
mysql -uroot -P9030 -h192.168.0.101
Doris > create user monitor@'192.168.0.100' identified by 'P@ssword1!';
Query OK, 0 rows affected (0.03 sec)
Doris > grant ADMIN_PRIV on *.* to monitor@'192.168.0.100';
Query OK, 0 rows affected (0.02 sec)
Then go back to the mysql-proxy proxy layer node to configure monitoring
mysql -uadmin -padmin -P6032 -h127.0.0.1
ProxySQL > set mysql-monitor_username='monitor';
Query OK, 1 row affected (0.000 sec)
ProxySQL > set mysql-monitor_password='P@ssword1!';
Query OK, 1 row affected (0.000 sec)
Save configuration and exit:
ProxySQL > load mysql servers to runtime;
Query OK, 0 rows affected (0.006 sec)
ProxySQL > save mysql servers to disk;
Query OK, 0 rows affected (0.348 sec)
Verification monitoring results.
The metrics of the ProxySQL monitoring module are all saved in the monitor.log
table.
Connection monitoring:
ProxySQL > select * from mysql_server_connect_log;
+---------------+------+------------------+-------------------------+---------------+
| hostname | port | time_start_us | connect_success_time_us | connect_error |
+---------------+------+------------------+-------------------------+---------------+
| 192.168.0.101 | 9030 | 1548665195883957 | 762 | NULL |
| 192.168.0.102 | 9030 | 1548665195894099 | 399 | NULL |
| 192.168.0.103 | 9030 | 1548665195904266 | 483 | NULL |
| 192.168.0.101 | 9030 | 1548665255883715 | 824 | NULL |
| 192.168.0.102 | 9030 | 1548665255893942 | 656 | NULL |
| 192.168.0.101 | 9030 | 1548665495884125 | 615 | NULL |
| 192.168.0.102 | 9030 | 1548665495894254 | 441 | NULL |
| 192.168.0.103 | 9030 | 1548665495904479 | 638 | NULL |
| 192.168.0.101 | 9030 | 1548665512917846 | 487 | NULL |
| 192.168.0.102 | 9030 | 1548665512928071 | 994 | NULL |
| 192.168.0.103 | 9030 | 1548665512938268 | 613 | NULL |
+---------------+------+------------------+-------------------------+---------------+
20 rows in set (0.000 sec)
Heartbeat monitoring:
ProxySQL > select * from mysql_server_ping_log;
+---------------+------+------------------+----------------------+------------+
| hostname | port | time_start_us | ping_success_time_us | ping_error |
+---------------+------+------------------+----------------------+------------+
| 192.168.0.101 | 9030 | 1548665195883407 | 98 | NULL |
| 192.168.0.102 | 9030 | 1548665195885128 | 119 | NULL |
...........
| 192.168.0.102 | 9030 | 1548665415889362 | 106 | NULL |
| 192.168.0.103 | 9030 | 1548665562898295 | 97 | NULL |
+---------------+------+------------------+----------------------+------------+
110 rows in set (0.001 sec)
Client IP Passthroughβ
In most cases, when connecting to the backend Doris service through a proxy service, client IP information will be lost, and the Doris server can only obtain the IP address information of the proxy server.
Starting from version 2.1.1, Doris supports the Proxy Protocol protocol. Using this protocol, client IP passthrough can be implemented, so that after going through load balancing, Doris can still obtain the client's real IP to implement whitelist and other permission controls.
Below we introduce how to enable Proxy Protocol in Nginx and Haproxy respectively.
Enable Proxy Protocol Support in Dorisβ
Add to fe.conf
in FE:
enable_proxy_protocol = true
-
Only supports Proxy Protocol V1.
-
Only supports and affects MySQL protocol ports, does not support or affect HTTP, ADBC, and other protocol ports.
-
Before Doris 3.1 version, after enabling, you must use the Proxy Protocol protocol to connect, otherwise the connection will fail. Starting from version 3.1, after enabling Proxy Protocol, you can still connect using the standard MySQL connection protocol.
01 Nginxβ
Add proxy_protocol on;
to the server
section in the configuration file:
events {
worker_connections 1024;
}
stream {
upstream mysqld {
hash $remote_addr consistent;
server 192.168.1.101:9030 weight=1 max_fails=2 fail_timeout=60s;
server 192.168.1.102:9030 weight=1 max_fails=2 fail_timeout=60s;
server 192.168.1.103:9030 weight=1 max_fails=2 fail_timeout=60s;
}
server {
# Proxy port
listen 6030;
proxy_connect_timeout 300s;
proxy_timeout 300s;
proxy_pass mysqld;
# Enable Proxy Protocol to the upstream server
proxy_protocol on;
}
}
02 HAProxyβ
Add send-proxy
parameter to the backend
section in haproxy.cfg
:
backend forward-fe
mode tcp
balance roundrobin
server fe-1 192.168.1.101:9030 weight 1 check inter 3000 rise 2 fall 3 send-proxy
server fe-2 192.168.1.102:9030 weight 1 check inter 3000 rise 2 fall 3 send-proxy
server fe-3 192.168.1.103:9030 weight 1 check inter 3000 rise 2 fall 3 send-proxy
Verify IP Passthrough Successβ
Connect to Doris through proxy:
mysql -uroot -P6030 -h192.168.1.100
Verify
mysql> show processlist;
+------------------+------+------+-------------------+---------------------+----------+------+---------+------+-------+-----------------------------------+------------------+
| CurrentConnected | Id | User | Host | LoginTime | Catalog | Db | Command | Time | State | QueryId | Info |
+------------------+------+------+-------------------+---------------------+----------+------+---------+------+-------+-----------------------------------+------------------+
| Yes | 1 | root | 192.168.1.101:34390 | 2024-03-17 16:32:22 | internal | | Query | 0 | OK | 82edc460d93f4e28-8bbed058a068e259 | show processlist |
+------------------+------+------+-------------------+---------------------+----------+------+---------+------+-------+-----------------------------------+------------------+
1 row in set (0.00 sec)
If you see the real client IP in the Host
column, the verification is successful. Otherwise, you can only see the IP address of the proxy service.
At the same time, the real client IP will also be recorded in fe.audit.log.