Galera:多主同步MySQL集群原理解析

Galera Cluster 实现mysql群集

参考:
http://blog.itpub.net/14431099/viewspace-1316643/
http://lovestoned.blog.51cto.com/2904444/1617817
http://www.itbaofeng.com/?p=59
http://lansgg.blog.51cto.com/5675165/1180305
http://www.yunweipai.com/archives/19574.html

1. MySQL Galera介绍
  • MySQL/Galera是MySQL/InnoDB的多主集群,有以下特性:
  • 同步复制
  • Active-active的多主拓扑结构
  • 集群任意节点可以读和写
  • 自动身份控制,失败节点自动脱离集群
  • 自动节点接入
  • 真正的基于”行”级别和ID检查的并行复制
  • 客户端连接跟操作单台MySQL数据库的体验一致

总结下来,官网上给出了大概以上等数条优势,总结下来就是两个比较突出的点:多主和同步。
多主:
Galera Cluster没有MySQL主从集群只有一个主能提供写服务的限制,集群中每个节点都可读可写,无需读写分离。在一个Galera Cluster前直接部署HAProxy做读写负责均衡是比较常用的做法。
同步:
Galera replication具有实时性,能够保障不同节点的数据视图在较小的时间范围内是一致的。MySQL原生replication方案slave中的SQL线程和IO线程是分离的,即便使用半同步甚至同步复制,也可能因为SQL线程的速度跟不上IO线程而导致slave数据落后很多,当然5.7引入并行复制后会好很多,而Galera中除了具有并行复制的功能外,还具有flow control的功能来控制节点间数据同步的速度。

Galera本质是一个wsrep提供者(provider),运行依赖于wsrep的API接口。Wsrep API定义了一系列应用回调和复制调用库,来实现事务数据库同步写集(writeset)复制以及相似应用。目的在于从应用细节上实现抽象的,隔离的复制。虽然这个接口的主要目标是基于认证的多主复制,但同样适用于异步和同步的主从复制。

2. MySQL Galera安装

安装前准备

  1. 机器准备(虚拟机)
  • master 192.168.137.128
  • backup 192.168.137.129
  • joiner 192.168.137.130
  1. 安装依赖
  • 确认安装有gcc和gcc-c++的版本最少为4.4
1
yum install -y gcc gcc-c++
  • 确认安装有boost-devel的版本至少为1.4.1
1
yum install -y boost-devel
  • 安装scons check-devel openssl-devel
1
yum install -y scons check-devel openssl-devel
MySQL Galera安装(从网上拷贝了一份安装文档,里面的版本为5.5.29,所以这里暂时用5.5.29版本)
  1. 安装含wsrep Patch的MySQL 5.5.29

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    yum install libaio
    wget https://launchpad.net/codership-mysql/5.5/5.5.29-23.7.3/+download/mysql-5.5.29_wsrep_23.7.3-linux-x86_64.tar.gz
    tar zxvf mysql-5.5.29_wsrep_23.7.3-linux-x86_64.tar.gz
    mv mysql-5.5.29_wsrep_23.7.3-linux-x86_64 /usr/local/mysql
    cd /usr/local/mysql/
    groupadd mysql
    useradd -r -g mysql mysql
    chown -R mysql:mysql .
    ./scripts/mysql_install_db --no-defaults --datadir=/usr/local/mysql/data --user=mysql
    chown -R root .
    chown -R mysql data
  2. 安装Galera复制插件

    1
    2
    3
    4
    5
    6
    wget https://launchpad.net/galera/2.x/23.2.4/+download/galera-23.2.4-src.tar.gz
    tar zxvf galera-23.2.4-src.tar.gz
    cd galera-23.2.4-src
    scons
    cp garb/garbd /usr/local/mysql/bin/
    cp libgalera_smm.so /usr/local/mysql/lib/plugin/
MySQL Galera配置
  1. MySQL Galera配置例子:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
 cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysqld
mkdir -p /var/lib/mysql
chown mysql:mysql /var/lib/mysql
vi /etc/my.cnf
cat /etc/my.cnf

[client]
port = 3306
socket = /var/lib/mysql/mysql.sock

[mysqld_safe]
log-error = /var/lib/mysql/mysql.log
pid-file = /var/lib/mysql/mysql.pid

[mysqld]
wsrep_node_name = node1
wsrep_provider = /usr/local/mysql/lib/plugin/libgalera_smm.so
#wsrep_provider_options ='gcache.size=1G;socket.ssl_key=my_key;socket.ssl_cert=my_cert'
#wsrep_slave_threads=16
wsrep_sst_method = rsync
#wsrep_sst_auth=root:

port = 3306
socket = /var/lib/mysql/mysql.sock
user = mysql
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data

default_storage_engine=InnoDB
#innodb_buffer_pool_size=1G
#innodb_log_file_size=256M
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
innodb_flush_log_at_trx_commit=0
innodb_doublewrite=0
innodb_file_per_table=1

binlog_format=ROW
log-bin=mysql-bin
server-id=101
relay-log=mysql-relay-bin
#read_only=1
log-slave-updates=1

注:以上配置,参考与Mysql wsrep 参数

Mysql Galera启动与关闭
  1. 初次启动
1
2
3
4
5
6
/usr/local/mysql/bin/mysqld_safe --wsrep_cluster_address=gcomm:// >/dev/null &

# 或
service mysql start --wsrep_cluster_address=gcomm://

# ”gcomm://”是特殊的地址,仅仅是galera cluster初始化启动时候使用,再次启动的时候需要使用具体的IP地址.
  1. 启动完成后发现mysqld的监听端口有两个
    1
    2
    3
    [root@master ~]# netstat -plantu | grep mysqld
    tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 3656/mysqld
    tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 3656/mysqld

这里4567端口是wsrep 使用的默认端口。

MySQL Galera新节点
  1. 节点接入
    添加新节点的时候,新接入的节点叫Joiner,给joiner提供复制的节点叫Donor.新的节点接入需要:
  • 安装带wsrep patch的MySQL版本
  • 安装Galera复制插件
  • 配置好新节点的MySQL(参考Donnor的my.cnf)
  • 配置或启动的gcomm://的地址是需要使用donnor的IP.

接入节点backup:

1
/usr/local/mysql/bin/mysqld_safe --wsrep_cluster_address="gcomm://192.168.137.128:4567,192.168.137.130:4567"    >/dev/null &

接入节点joiner:

1
service mysql start --wsrep_cluster_address="gcomm://192.168.137.128:4567,192.168.137.129:4567"

  1. 修改节点的wsrep_cluster_address
    修改wsrep_cluster_address有两种方式:

i. 使用新的wsrep_cluster_address重启节点:

1
service mysql restart --wsrep_cluster_address="gcomm://192.168.137.129:4567,192.168.137.130:4567"

ii. 直接修改MySQL全局变量

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
mysql> SHOW VARIABLES LIKE 'wsrep_cluster_address';
+-----------------------+----------------------------+
| Variable_name | Value |
+-----------------------+----------------------------+
| wsrep_cluster_address | gcomm://192.168.137.129:4567 |
+-----------------------+----------------------------+
1 row in set (0.00 sec)

mysql> set global wsrep_cluster_address="gcomm://192.168.137.129:4567,192.168.137.130:4567"; Query OK, 0 rows affected (2.20 sec)

mysql> SHOW VARIABLES LIKE 'wsrep_cluster_address';
+-----------------------+-------------------------------------------------------+
| Variable_name | Value |
+-----------------------+-------------------------------------------------------+
| wsrep_cluster_address | gcomm://192.168.137.129:4567,192.168.137.130:4567 |
+-----------------------+-------------------------------------------------------+
1 row in set (0.00 sec)
MySQL Galera监控
  1. 查看相关变量
  • 查看MySQL版本:

    1
    2
    3
    4
    5
    6
    7
    mysql> SHOW GLOBAL VARIABLES LIKE 'version';
    +---------------+------------+
    | Variable_name | Value |
    +---------------+------------+
    | version | 5.5.29-log |
    +---------------+------------+
    1 row in set (0.00 sec)
  • 查看wsrep版本:

    1
    2
    3
    4
    5
    6
    7
    mysql> SHOW GLOBAL STATUS LIKE 'wsrep_provider_version';
    +------------------------+------------+
    | Variable_name | Value |
    +------------------------+------------+
    | wsrep_provider_version | 2.4(rXXXX) |
    +------------------------+------------+
    1 row in set (0.00 sec)

查看wsrep有关的所有变量:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
mysql> SHOW VARIABLES LIKE 'wsrep%' \G
*************************** 1. row ***************************
Variable_name: wsrep_OSU_method
Value: TOI
*************************** 2. row ***************************
Variable_name: wsrep_auto_increment_control
Value: ON
*************************** 3. row ***************************
Variable_name: wsrep_causal_reads
Value: OFF
*************************** 4. row ***************************
Variable_name: wsrep_certify_nonPK
Value: ON
*************************** 5. row ***************************
Variable_name: wsrep_cluster_address
Value: gcomm://192.168.1.222:4567,192.168.1.223:4567
*************************** 6. row ***************************
Variable_name: wsrep_cluster_name
Value: my_wsrep_cluster
*************************** 7. row ***************************
Variable_name: wsrep_convert_LOCK_to_trx
Value: OFF
*************************** 8. row ***************************
Variable_name: wsrep_data_home_dir
Value: /usr/local/mysql/data/
*************************** 9. row ***************************
Variable_name: wsrep_dbug_option
Value:
*************************** 10. row ***************************
Variable_name: wsrep_debug
Value: OFF
*************************** 11. row ***************************
Variable_name: wsrep_drupal_282555_workaround
Value: OFF
*************************** 12. row ***************************
Variable_name: wsrep_forced_binlog_format
Value: NONE
*************************** 13. row ***************************
Variable_name: wsrep_log_conflicts
Value: OFF
*************************** 14. row ***************************
Variable_name: wsrep_max_ws_rows
Value: 131072
*************************** 15. row ***************************
Variable_name: wsrep_max_ws_size
Value: 1073741824
*************************** 16. row ***************************
Variable_name: wsrep_mysql_replication_bundle
Value: 0
*************************** 17. row ***************************
Variable_name: wsrep_node_address
Value:
*************************** 18. row ***************************
Variable_name: wsrep_node_incoming_address
Value: AUTO
*************************** 19. row ***************************
Variable_name: wsrep_node_name
Value: node1
*************************** 20. row ***************************
Variable_name: wsrep_notify_cmd
Value:
*************************** 21. row ***************************
Variable_name: wsrep_on
Value: ON
*************************** 22. row ***************************
Variable_name: wsrep_provider
Value: /usr/local/mysql/lib/plugin/libgalera_smm.so
*************************** 23. row ***************************
Variable_name: wsrep_provider_options
Value: base_host = 192.168.1.221; base_port = 4567; cert.log_conflicts = no; evs.causal_keepalive_period = PT1S; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT1S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = PT5M; gcache.dir = /usr/local/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /usr/local/mysql/data//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 192.168.1.221; pc.checksum = true; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.version = 0; pc.weight = 1; protonet.backend = asio; protonet.version = 0; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
*************************** 24. row ***************************
Variable_name: wsrep_recover
Value: OFF
*************************** 25. row ***************************
Variable_name: wsrep_replicate_myisam
Value: OFF
*************************** 26. row ***************************
Variable_name: wsrep_retry_autocommit
Value: 1
*************************** 27. row ***************************
Variable_name: wsrep_slave_threads
Value: 2
*************************** 28. row ***************************
Variable_name: wsrep_sst_auth
Value:
*************************** 29. row ***************************
Variable_name: wsrep_sst_donor
Value:
*************************** 30. row ***************************
Variable_name: wsrep_sst_donor_rejects_queries
Value: OFF
*************************** 31. row ***************************
Variable_name: wsrep_sst_method
Value: rsync
*************************** 32. row ***************************
Variable_name: wsrep_sst_receive_address
Value: AUTO
*************************** 33. row ***************************
Variable_name: wsrep_start_position
Value: 80cdd13d-8cf2-11e2-0800-e0817023b754:0
33 rows in set (0.00 sec)

  1. 状态监控
    查看Galera集群状态:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    mysql> show status like 'wsrep%';
    +----------------------------+----------------------------------------------------------+
    | Variable_name | Value |
    +----------------------------+----------------------------------------------------------+
    | wsrep_local_state_uuid | 80cdd13d-8cf2-11e2-0800-e0817023b754 |
    | wsrep_protocol_version | 4 |
    | wsrep_last_committed | 3 |
    | wsrep_replicated | 3 |
    | wsrep_replicated_bytes | 522 |
    | wsrep_received | 6 |
    | wsrep_received_bytes | 1134 |
    | wsrep_local_commits | 1 |
    | wsrep_local_cert_failures | 0 |
    | wsrep_local_bf_aborts | 0 |
    | wsrep_local_replays | 0 |
    | wsrep_local_send_queue | 0 |
    | wsrep_local_send_queue_avg | 0.000000 |
    | wsrep_local_recv_queue | 0 |
    | wsrep_local_recv_queue_avg | 0.000000 |
    | wsrep_flow_control_paused | 0.000000 |
    | wsrep_flow_control_sent | 0 |
    | wsrep_flow_control_recv | 0 |
    | wsrep_cert_deps_distance | 1.000000 |
    | wsrep_apply_oooe | 0.000000 |
    | wsrep_apply_oool | 0.000000 |
    | wsrep_apply_window | 1.000000 |
    | wsrep_commit_oooe | 0.000000 |
    | wsrep_commit_oool | 0.000000 |
    | wsrep_commit_window | 1.000000 |
    | wsrep_local_state | 4 |
    | wsrep_local_state_comment | Synced |
    | wsrep_cert_index_size | 5 |
    | wsrep_causal_reads | 0 |
    | wsrep_incoming_addresses | 192.168.1.221:3306,192.168.1.222:3306,192.168.1.223:3306 |
    | wsrep_cluster_conf_id | 13 |
    | wsrep_cluster_size | 3 |
    | wsrep_cluster_state_uuid | 80cdd13d-8cf2-11e2-0800-e0817023b754 |
    | wsrep_cluster_status | Primary |
    | wsrep_connected | ON |
    | wsrep_local_index | 0 |
    | wsrep_provider_name | Galera |
    | wsrep_provider_vendor | Codership Oy |
    | wsrep_provider_version | 2.4(rXXXX) |
    | wsrep_ready | ON |
    +----------------------------+----------------------------------------------------------+
    40 rows in set (0.00 sec)
  2. 监控状态说明

  • 集群完整性检查:

    wsrep_cluster_state_uuid:在集群所有节点的值应该是相同的,有不同值的节点,说明其没有连接入集群.
    wsrep_cluster_conf_id:正常情况下所有节点上该值是一样的.如果值不同,说明该节点被临时”分区”了.当节点之间网络连接恢复的时候应该会恢复一样的值.
    wsrep_cluster_size:如果这个值跟预期的节点数一致,则所有的集群节点已经连接.
    wsrep_cluster_status:集群组成的状态.如果不为”Primary”,说明出现”分区”或是”split-brain”状况.
    
  • 节点状态检查:

    wsrep_ready: 该值为ON,则说明可以接受SQL负载.如果为Off,则需要检查wsrep_connected.
    wsrep_connected: 如果该值为Off,且wsrep_ready的值也为Off,则说明该节点没有连接到集群.(可能是wsrep_cluster_address或wsrep_cluster_name等配置错造成的.具体错误需要查看错误日志)
    wsrep_local_state_comment:如果wsrep_connected为On,但wsrep_ready为OFF,则可以从该项查看原因.
    
  • 复制健康检查:

    wsrep_flow_control_paused:表示复制停止了多长时间.即表明集群因为Slave延迟而慢的程度.值为0~1,越靠近0越好,值为1表示复制完全停止.可优化wsrep_slave_threads的值来改善.
    wsrep_cert_deps_distance:有多少事务可以并行应用处理.wsrep_slave_threads设置的值不应该高出该值太多.
    wsrep_flow_control_sent:表示该节点已经停止复制了多少次.
    wsrep_local_recv_queue_avg:表示slave事务队列的平均长度.slave瓶颈的预兆.
    最慢的节点的wsrep_flow_control_sent和wsrep_local_recv_queue_avg这两个值最高.这两个值较低的话,相对更好.
    
  • 检测慢网络问题:

    wsrep_local_send_queue_avg:网络瓶颈的预兆.如果这个值比较高的话,可能存在网络瓶
    
  • 冲突或死锁的数目:

    wsrep_last_committed:最后提交的事务数目
    wsrep_local_cert_failures和wsrep_local_bf_aborts:回滚,检测到的冲突数目