Contents
  1. Basic concepts
    1.1. Slot
    1.2. Gossip protocol
    1.3. Failover mechanism
  2. Installation
    2.1. System parameter tuning
    2.2. Dependencies
    2.3. ruby
    2.4. rubygems
    2.5. redis.gem
    2.6. tcl
    2.7. redis-4.0.11
  3. Configuration
  4. Cluster
    4.1. Starting the redis instances
    4.2. Creating the redis cluster
  5. Common management commands
    5.1. cluster cmd
    5.2. redis-trib.rb cmd
  6. Redis tools
  7. RedisCluster testing
  8. Standardization conventions
  9. Appendix

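Basic concepts

Slot

RedisCluster partitions the key space into 16384 slots. A key maps to a slot as follows (CRC16 is a cyclic redundancy checksum that turns a string into a 16-bit number):

slot = CRC16(key) mod 16384

Because keys are spread across slots, multi-key operations (scan/mget/keys…) are not supported when the keys live in different slots. For this reason RedisCluster provides hash tags, which ensure that several keys are assigned to the same slot so that multi-key operations work on them. Hash tags can easily cause data skew, so use them with care. Simply prefix the key with {hash_tag}, e.g. {foo}student; only the text inside the braces is hashed.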
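
The mapping and the hash-tag rule can be sketched in Python. This is a minimal illustration, assuming the CRC16 variant is CRC16-CCITT (XMODEM: polynomial 0x1021, initial value 0), which is the one named in the Redis Cluster specification. The function names crc16 and key_slot are made up for this example and are not part of any Redis client.

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the slot for key, hashing only the hash tag if one is present."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16(raw) % 16384


if __name__ == "__main__":
    for key in ("foo", "{foo}student", "{foo}teacher", "student"):
        print(key, key_slot(key))
```

Keys that share the tag {foo} land in the same slot as the plain key foo, which is what makes a multi-key command on them possible, and also why one popular tag can overload a single slot.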

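Gossip protocol

RedisCluster uses the gossip protocol to implement key cluster functions such as synchronizing state between nodes and electing a new master for automatic failover.

The gossip protocol uses several message types (RedisCluster uses redis_port + 10000 as the port for node-to-node communication):

  • meet: a node sends meet to a newly added node so that it joins the cluster; the new node then starts communicating with the other nodes
  • ping: every node frequently sends ping to the other nodes, carrying its own state and the cluster metadata it maintains; nodes exchange metadata through ping
  • pong: the reply to ping and meet, carrying the sender's state and other information; it can also be used to broadcast and update information
  • fail: once a node decides that another node has failed, it sends fail to the other nodes to tell them that the given node is down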
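
The port rule above can be written down as a tiny helper. This is only a sketch assuming the fixed offset of 10000 stated above; bus_port is an illustrative name, not a Redis API.

```python
CLUSTER_BUS_OFFSET = 10000  # offset stated above: bus port = redis_port + 10000


def bus_port(redis_port: int) -> int:
    """Return the node-to-node (cluster bus) port for a client port."""
    port = redis_port + CLUSTER_BUS_OFFSET
    if not 0 < redis_port < port <= 65535:
        raise ValueError(f"redis_port {redis_port} leaves no valid bus port")
    return port


if __name__ == "__main__":
    for p in range(8000, 8003):
        print(f"client port {p} -> bus port {bus_port(p)}")
```

Both the client port and the derived bus port have to be reachable between nodes, otherwise gossip messages never arrive.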

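Failover mechanism

Failover is the fault-tolerance mechanism provided by RedisCluster. It supports two modes:

  • Automatic failover: restores cluster availability automatically after a failure
  • Manual failover: restores cluster availability through an operator's manual action

Failure detection:

  • A node that does not answer PING within the node timeout is marked PFAIL
  • The PFAIL flag spreads through gossip
  • Once a majority of masters have marked the node PFAIL, its state is changed to FAIL and the change is broadcast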
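
The promotion from PFAIL to FAIL can be sketched as a simple majority check. This is a simplified model, assuming (as in the Redis Cluster specification) that only reports from masters count and ignoring the time window in which reports stay valid; should_mark_fail is a hypothetical helper.

```python
def should_mark_fail(pfail_reports: set, masters: set) -> bool:
    """True once more than half of the masters report the node as PFAIL.

    pfail_reports: IDs of nodes that currently flag the target as PFAIL.
    masters: IDs of all masters in the cluster (only their reports count).
    """
    votes = len(pfail_reports & masters)
    return votes > len(masters) // 2


if __name__ == "__main__":
    masters = {"m1", "m2", "m3"}
    print(should_mark_fail({"m1"}, masters))              # one of three masters
    print(should_mark_fail({"m1", "m2", "s1"}, masters))  # two of three masters
```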

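Automatic failover process:

  • A slave detects that its master is FAIL
  • The slave increments its recorded currentEpoch by 1 and broadcasts a Failover Request message (FAILOVER_AUTH_REQUEST)
  • Every node receives the broadcast, but only masters respond: each checks that the request is valid and, if so, replies with FAILOVER_AUTH_ACK
  • The slave collects the FAILOVER_AUTH_ACK messages and promotes itself to master once a majority of masters agree
  • After the promotion succeeds, it notifies all nodes with PONG messages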
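
The election steps above can be modelled in a few lines. This is a toy sketch, not Redis code: it assumes each master grants at most one vote per epoch and leaves out the other validity checks a real master performs. Master and run_election are names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Master:
    """A voting master; it grants at most one vote per epoch."""
    name: str
    last_vote_epoch: int = 0

    def request_vote(self, epoch: int) -> bool:
        # Reject stale requests and epochs this master already voted in.
        if epoch <= self.last_vote_epoch:
            return False
        self.last_vote_epoch = epoch
        return True


def run_election(current_epoch: int, total_masters: int, reachable: list) -> tuple:
    """Return (new_epoch, promoted) for one slave's failover attempt."""
    epoch = current_epoch + 1                             # bump currentEpoch
    acks = sum(m.request_vote(epoch) for m in reachable)  # collect the ACKs
    return epoch, acks > total_masters // 2               # majority of all masters


if __name__ == "__main__":
    survivors = [Master("m2"), Master("m3")]   # m1 has failed
    print(run_election(3, 3, survivors))       # first slave to ask
    print(run_election(3, 3, survivors))       # a second slave in the same epoch
```

Because a master votes only once per epoch, two slaves competing in the same epoch cannot both collect a majority.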

Installation

System parameter tuning

sysctl -w vm.overcommit_memory=1
echo never > /sys/kernel/mm/transparent_hugepage/enabled
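
A quick way to confirm that both settings took effect is to read them back. The sketch below assumes the usual Linux locations (/proc/sys/vm/overcommit_memory and the transparent_hugepage path used above) and the "always madvise [never]" format of the THP file; thp_mode and check are illustrative helpers.

```python
from pathlib import Path


def thp_mode(text: str) -> str:
    """Extract the active mode from e.g. 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode in {text!r}")


def check(overcommit_text: str, thp_text: str) -> list:
    """Return a list of problems (empty when both settings match the guide)."""
    problems = []
    if overcommit_text.strip() != "1":
        problems.append("vm.overcommit_memory is not 1")
    if thp_mode(thp_text) != "never":
        problems.append("transparent_hugepage is not 'never'")
    return problems


if __name__ == "__main__":
    oc_path = Path("/proc/sys/vm/overcommit_memory")
    thp_path = Path("/sys/kernel/mm/transparent_hugepage/enabled")
    # Only read the files where they exist (they are Linux specific).
    if oc_path.exists() and thp_path.exists():
        print(check(oc_path.read_text(), thp_path.read_text()) or "ok")
```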

Dependencies

yum -y install zlib zlib-devel openssl openssl-devel gcc gcc-c++

ruby

wget -O ruby-2.4.4.tar.gz 'https://cache.ruby-lang.org/pub/ruby/2.4/ruby-2.4.4.tar.gz' --no-check-certificate
tar -zxf ruby-2.4.4.tar.gz && cd ruby-2.4.4
./configure --prefix=/usr/local/ruby
make && make install

echo 'export PATH=/usr/local/redis-cluster/bin:/usr/local/tcl/bin:/usr/local/rubygems/bin:/usr/local/ruby/bin:$PATH' >> /etc/profile
source /etc/profile

ruby -v
ruby 2.4.4p296 (2018-03-28 revision 63013) [x86_64-linux]

rubygems

wget -O rubygems-2.7.6.tgz 'https://rubygems.org/rubygems/rubygems-2.7.6.tgz' --no-check-certificate
tar -zxf rubygems-2.7.6.tgz
mv rubygems-2.7.6 /usr/local/rubygems
cd /usr/local/rubygems && ruby setup.rb

gem -v
2.7.6

redis.gem

redis-4.x.x.gem has a bug: reshard|rebalance (scaling out/in) cannot be run and fail with the error Syntax error, try CLIENT (LIST|KILL|GETNAME|SETNAME|PAUSE|REPLY)

With redis-3.x.x.gem installed, reshard|rebalance can be run, but passwords are not supported, so it is not suitable for clusters with password authentication

  • 4.x.x version

    wget -O redis-4.0.2.gem 'https://rubygems.org/downloads/redis-4.0.2.gem' --no-check-certificate
    gem install -l redis-4.0.2.gem

    gem list redis

    *** LOCAL GEMS ***

    redis (4.0.2)
  • 3.x.x version

    # gem uninstall redis --version 4.0.2
    wget -O redis-3.3.3.gem 'https://rubygems.org/downloads/redis-3.3.3.gem' --no-check-certificate
    gem install -l redis-3.3.3.gem

tcl

wget -O tcl868-src.zip 'https://jaist.dl.sourceforge.net/project/tcl/Tcl/8.6.8/tcl868-src.zip' --no-check-certificate
unzip tcl868-src.zip && cd tcl8.6.8/unix
./configure --prefix=/usr/local/tcl
make && make install && make install-private-headers
ln -s /usr/local/tcl/bin/tclsh8.6 /usr/local/tcl/bin/tclsh

redis-4.0.11

wget -O redis-4.0.11.tar.gz 'http://download.redis.io/releases/redis-4.0.11.tar.gz'
tar -zxf redis-4.0.11.tar.gz && cd redis-4.0.11
make && make PREFIX=/usr/local/redis-cluster install
cp src/redis-trib.rb /usr/local/redis-cluster/bin
chown -R redis:redis /usr/local/redis-cluster

Configuration

su - redis -c "mkdir -p /usr/local/redis-cluster/conf /data/logs/redis/ /data/redis/{8000,8001,8002}"

# /usr/local/redis-cluster/redis.conf
bind 0.0.0.0
protected-mode no
port 6379
tcp-backlog 511
timeout 300
tcp-keepalive 300
daemonize yes
supervised no
pidfile nodes.pid
loglevel notice
logfile nodes.log
databases 16
always-show-logo yes
save ""
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir ./
masterauth "<password>"
requirepass "<password>"
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-timeout 120
repl-disable-tcp-nodelay no
repl-backlog-size 100mb
slave-priority 100
rename-command flushdb rflushdb
rename-command flushall rflushall
rename-command keys rkeys
rename-command shutdown rshutdown
rename-command config rconfig
rename-command slaveof rslaveof
rename-command sync rsync
rename-command monitor rmonitor
rename-command save rsave
rename-command bgsave rbgsave
rename-command bgrewriteaof rbgrewriteaof
maxmemory 5gb
#maxclients 10000
maxmemory-policy volatile-lru
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
slave-lazy-flush no
appendonly yes
appendfilename "appendonly.aof"
appendfsync no
no-appendfsync-on-rewrite yes
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 1gb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 15000
cluster-slave-validity-factor 0
cluster-migration-barrier 1
cluster-require-full-coverage no
slowlog-log-slower-than 10000
slowlog-max-len 10000
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes

# /usr/local/redis-cluster/conf/8000.conf
include /usr/local/redis-cluster/redis.conf
port 8000
dir "/data/redis/8000"
logfile "/data/logs/redis/redis_8000.log"

# /usr/local/redis-cluster/conf/8001.conf
include /usr/local/redis-cluster/redis.conf
port 8001
dir "/data/redis/8001"
logfile "/data/logs/redis/redis_8001.log"
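
The remaining per-instance files follow the same pattern, so they can be generated instead of written by hand. The sketch below is one possible way to do it, assuming the paths used in this guide and the redis_{port}.log naming from the 8001 example; render_instance_conf is a made-up helper, and it only prints the text rather than writing files.

```python
CONF_DIR = "/usr/local/redis-cluster/conf"
COMMON_CONF = "/usr/local/redis-cluster/redis.conf"


def render_instance_conf(port: int) -> str:
    """Render the per-instance config that overrides the shared file."""
    return "\n".join([
        f"# {CONF_DIR}/{port}.conf",
        f"include {COMMON_CONF}",
        f"port {port}",
        f'dir "/data/redis/{port}"',
        f'logfile "/data/logs/redis/redis_{port}.log"',
    ]) + "\n"


if __name__ == "__main__":
    for port in (8000, 8001, 8002):
        print(render_instance_conf(port))
```

The include line comes first on purpose: directives that follow it (port, dir, logfile) override the values from the shared file.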

Cluster

Starting the redis instances

su - redis -c "/usr/local/redis-cluster/bin/redis-server /usr/local/redis-cluster/conf/8000.conf"
su - redis -c "/usr/local/redis-cluster/bin/redis-server /usr/local/redis-cluster/conf/8001.conf"
su - redis -c "/usr/local/redis-cluster/bin/redis-server /usr/local/redis-cluster/conf/8002.conf"
su - redis -c "/usr/local/redis-cluster/bin/redis-server /usr/local/redis-cluster/conf/8003.conf"

Creating the redis cluster

  • iptables

    iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 8000:8006 -j ACCEPT
    iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 18000:18006 -j ACCEPT
  • Set the password

    # Edit the :password default in the redis gem so that redis-trib.rb can authenticate
    /usr/local/ruby/lib/ruby/gems/2.4.0/gems/redis-4.0.2/lib/redis/client.rb
    #/usr/local/ruby/lib/ruby/gems/2.4.0/gems/redis-3.3.3/lib/redis/client.rb

    DEFAULTS = {
      :url => lambda { ENV["REDIS_URL"] },
      :scheme => "redis",
      :host => "127.0.0.1",
      :port => 6379,
      :path => nil,
      :timeout => 5.0,
      :password => nil, # change this to the cluster password
      :db => 0,
      :driver => nil,
      :id => nil,
      :tcp_keepalive => 0,
      :reconnect_attempts => 1,
      :inherit_socket => false
    }
  • Create the cluster

    • Assign master/slave roles automatically

      redis-trib.rb create --replicas 1 192.168.1.180:8000 192.168.1.180:8001 192.168.1.180:8002 192.168.1.181:8003 192.168.1.181:8004 192.168.1.181:8005
    • Assign master/slave roles manually

      • Create the masters

        redis-trib.rb create 192.168.1.180:8000 192.168.1.180:8001 192.168.1.180:8002

        >>> Creating cluster
        >>> Performing hash slots allocation on 3 nodes...
        Using 3 masters:
        192.168.1.180:8000
        192.168.1.180:8001
        192.168.1.180:8002
        M: 673e32925e0a6f9beefac8aeaad8a397758c5e47 192.168.1.180:8000
        slots:0-5460 (5461 slots) master
        M: 605950ea5c2214f50d5f3dddd87c80f4e7d1b631 192.168.1.180:8001
        slots:5461-10922 (5462 slots) master
        M: a923e10183ef356bcadc1566503be4ab1ea1adb6 192.168.1.180:8002
        slots:10923-16383 (5461 slots) master
        Can I set the above configuration? (type 'yes' to accept): yes
        >>> Nodes configuration updated
        >>> Assign a different config epoch to each node
        >>> Sending CLUSTER MEET messages to join the cluster
        Waiting for the cluster to join.
        >>> Performing Cluster Check (using node 192.168.1.180:8000)
        M: 673e32925e0a6f9beefac8aeaad8a397758c5e47 192.168.1.180:8000
        slots:0-5460 (5461 slots) master
        0 additional replica(s)
        M: 605950ea5c2214f50d5f3dddd87c80f4e7d1b631 192.168.1.180:8001
        slots:5461-10922 (5462 slots) master
        0 additional replica(s)
        M: a923e10183ef356bcadc1566503be4ab1ea1adb6 192.168.1.180:8002
        slots:10923-16383 (5461 slots) master
        0 additional replica(s)
        [OK] All nodes agree about slots configuration.
        >>> Check for open slots...
        >>> Check slots coverage...
        [OK] All 16384 slots covered.
      • Create the slaves

        # Add a slave: join 192.168.1.181:8003 to the cluster that 192.168.1.180:8000 belongs to, as a slave of the master given by <node_id>
        redis-trib.rb add-node --slave --master-id 673e32925e0a6f9beefac8aeaad8a397758c5e47 192.168.1.181:8003 192.168.1.180:8000

        >>> Adding node 192.168.1.181:8003 to cluster 192.168.1.180:8000
        >>> Performing Cluster Check (using node 192.168.1.180:8000)
        M: 673e32925e0a6f9beefac8aeaad8a397758c5e47 192.168.1.180:8000
        slots:0-5460 (5461 slots) master
        0 additional replica(s)
        M: 605950ea5c2214f50d5f3dddd87c80f4e7d1b631 192.168.1.180:8001
        slots:5461-10922 (5462 slots) master
        0 additional replica(s)
        M: a923e10183ef356bcadc1566503be4ab1ea1adb6 192.168.1.180:8002
        slots:10923-16383 (5461 slots) master
        0 additional replica(s)
        [OK] All nodes agree about slots configuration.
        >>> Check for open slots...
        >>> Check slots coverage...
        [OK] All 16384 slots covered.
        >>> Send CLUSTER MEET to node 192.168.1.181:8003 to make it join the cluster.
        Waiting for the cluster to join...
        >>> Configure node as replica of 192.168.1.180:8000.
        [OK] New node added correctly.

        redis-trib.rb add-node --slave --master-id 605950ea5c2214f50d5f3dddd87c80f4e7d1b631 192.168.1.181:8004 192.168.1.180:8000

        redis-trib.rb add-node --slave --master-id a923e10183ef356bcadc1566503be4ab1ea1adb6 192.168.1.181:8005 192.168.1.180:8000
      • View cluster information

        redis-trib.rb check 192.168.1.180:8000

        >>> Performing Cluster Check (using node 192.168.1.180:8000)
        M: 673e32925e0a6f9beefac8aeaad8a397758c5e47 192.168.1.180:8000
        slots:0-5460 (5461 slots) master
        1 additional replica(s)
        S: dea85bf87e560b6a5074f60965b8ad334bfeb5e8 192.168.1.181:8004
        slots: (0 slots) slave
        replicates 605950ea5c2214f50d5f3dddd87c80f4e7d1b631
        M: a923e10183ef356bcadc1566503be4ab1ea1adb6 192.168.1.180:8002
        slots:10923-16383 (5461 slots) master
        1 additional replica(s)
        M: 605950ea5c2214f50d5f3dddd87c80f4e7d1b631 192.168.1.180:8001
        slots:5461-10922 (5462 slots) master
        1 additional replica(s)
        S: 6f12a4c4e1c60f435f68fbce1b72dc60ac73de83 192.168.1.181:8005
        slots: (0 slots) slave
        replicates a923e10183ef356bcadc1566503be4ab1ea1adb6
        S: 74bfaa76306dd6bc59e559d012203ceed2a8ab24 192.168.1.181:8003
        slots: (0 slots) slave
        replicates 673e32925e0a6f9beefac8aeaad8a397758c5e47
        [OK] All nodes agree about slots configuration.
        >>> Check for open slots...
        >>> Check slots coverage...
        [OK] All 16384 slots covered.

        192.168.1.180:8000> CLUSTER NODES
        dea85bf87e560b6a5074f60965b8ad334bfeb5e8 192.168.1.181:8004@18004 slave 605950ea5c2214f50d5f3dddd87c80f4e7d1b631 0 1541155161000 2 connected
        673e32925e0a6f9beefac8aeaad8a397758c5e47 192.168.1.180:8000@18000 myself,master - 0 1541155160000 1 connected 0-5460
        a923e10183ef356bcadc1566503be4ab1ea1adb6 192.168.1.180:8002@18002 master - 0 1541155162973 3 connected 10923-16383
        605950ea5c2214f50d5f3dddd87c80f4e7d1b631 192.168.1.180:8001@18001 master - 0 1541155160000 2 connected 5461-10922
        6f12a4c4e1c60f435f68fbce1b72dc60ac73de83 192.168.1.181:8005@18005 slave a923e10183ef356bcadc1566503be4ab1ea1adb6 0 1541155162000 3 connected
        74bfaa76306dd6bc59e559d012203ceed2a8ab24 192.168.1.181:8003@18003 slave 673e32925e0a6f9beefac8aeaad8a397758c5e47 0 1541155162000 1 connected

        192.168.1.180:8000> CLUSTER INFO
        cluster_state:ok
        cluster_slots_assigned:16384
        cluster_slots_ok:16384
        cluster_slots_pfail:0
        cluster_slots_fail:0
        cluster_known_nodes:6
        cluster_size:3
        cluster_current_epoch:3
        cluster_my_epoch:1
        cluster_stats_messages_ping_sent:1756
        cluster_stats_messages_pong_sent:1713
        cluster_stats_messages_sent:3469
        cluster_stats_messages_ping_received:1708
        cluster_stats_messages_pong_received:1756
        cluster_stats_messages_meet_received:5
        cluster_stats_messages_received:3469

        192.168.1.180:8000> CLUSTER SLOTS
        1) 1) (integer) 0
           2) (integer) 5460
           3) 1) "192.168.1.180"
              2) (integer) 8000
              3) "673e32925e0a6f9beefac8aeaad8a397758c5e47"
           4) 1) "192.168.1.181"
              2) (integer) 8003
              3) "74bfaa76306dd6bc59e559d012203ceed2a8ab24"
        2) 1) (integer) 10923
           2) (integer) 16383
           3) 1) "192.168.1.180"
              2) (integer) 8002
              3) "a923e10183ef356bcadc1566503be4ab1ea1adb6"
           4) 1) "192.168.1.181"
              2) (integer) 8005
              3) "6f12a4c4e1c60f435f68fbce1b72dc60ac73de83"
        3) 1) (integer) 5461
           2) (integer) 10922
           3) 1) "192.168.1.180"
              2) (integer) 8001
              3) "605950ea5c2214f50d5f3dddd87c80f4e7d1b631"
           4) 1) "192.168.1.181"
              2) (integer) 8004
              3) "dea85bf87e560b6a5074f60965b8ad334bfeb5e8"

        redis-trib.rb info 192.168.1.180:8000

        192.168.1.180:8000 (673e3292...) -> 1 keys | 5461 slots | 1 slaves.
        192.168.1.180:8002 (a923e101...) -> 1 keys | 5461 slots | 1 slaves.
        192.168.1.180:8001 (605950ea...) -> 1 keys | 5462 slots | 1 slaves.
        [OK] 3 keys in 3 masters.
        0.00 keys per slot on average.
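
When the CLUSTER NODES output has to be processed by a script, for example to find a node ID before calling add-node, it helps to parse it into fields. The sketch below assumes the ip:port@cport line format shown in the output above; NodeInfo and parse_cluster_nodes are illustrative names, and SAMPLE reuses two lines from that output.

```python
from dataclasses import dataclass


@dataclass
class NodeInfo:
    node_id: str
    host: str
    port: int
    bus_port: int
    flags: list
    master_id: str   # "-" for masters
    slots: list      # raw slot fields such as "0-5460"


def parse_cluster_nodes(text: str) -> list:
    """Parse CLUSTER NODES output in the ip:port@cport format."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        addr, _, cport = fields[1].partition("@")
        host, _, port = addr.rpartition(":")
        nodes.append(NodeInfo(
            node_id=fields[0], host=host, port=int(port), bus_port=int(cport),
            flags=fields[2].split(","), master_id=fields[3], slots=fields[8:],
        ))
    return nodes


SAMPLE = """\
dea85bf87e560b6a5074f60965b8ad334bfeb5e8 192.168.1.181:8004@18004 slave 605950ea5c2214f50d5f3dddd87c80f4e7d1b631 0 1541155161000 2 connected
673e32925e0a6f9beefac8aeaad8a397758c5e47 192.168.1.180:8000@18000 myself,master - 0 1541155160000 1 connected 0-5460
"""

if __name__ == "__main__":
    for node in parse_cluster_nodes(SAMPLE):
        role = "master" if "master" in node.flags else "slave"
        print(node.node_id[:8], f"{node.host}:{node.port}", role, node.slots)
```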

Common management commands

cluster cmd

  • Native redis cluster commands

    //cluster
    CLUSTER INFO Print information about the cluster.
    CLUSTER NODES List all nodes currently known to the cluster, with information about each of them.
    //node
    CLUSTER MEET <ip> <port> Add the node at ip and port to the cluster, making it a member.
    CLUSTER FORGET <node_id> Remove the node given by node_id from the cluster.
    CLUSTER REPLICATE <node_id> Make the current node a slave of the master given by node_id.
    CLUSTER SAVECONFIG Save the node's configuration file to disk.
    //slot
    CLUSTER ADDSLOTS <slot> [slot ...] Assign one or more slots to the current node.
    CLUSTER DELSLOTS <slot> [slot ...] Remove the assignment of one or more slots from the current node.
    CLUSTER FLUSHSLOTS Remove all slots assigned to the current node, leaving it with no slots.
    CLUSTER SETSLOT <slot> NODE <node_id> Assign slot to the node given by node_id.
    CLUSTER SETSLOT <slot> MIGRATING <node_id> Migrate slot from this node to the node given by node_id.
    CLUSTER SETSLOT <slot> IMPORTING <node_id> Import slot into this node from the node given by node_id.
    CLUSTER SETSLOT <slot> STABLE Cancel the import or migration of slot.
    //key
    CLUSTER KEYSLOT <key> Compute which slot key belongs to.
    CLUSTER COUNTKEYSINSLOT <slot> Return the number of keys currently in slot.
    CLUSTER GETKEYSINSLOT <slot> <count> Return count keys from slot.
    //newer commands
    CLUSTER SLAVES node-id Return the list of slaves of a master node.
  • Change the master a slave belongs to

    # Log in to the slave that should be changed
    redis-cli -c -h 192.168.1.181 -p 8006

    # Call the cluster command to change its master
    192.168.1.181:8006> CLUSTER REPLICATE <new_master_id>
  • Manually scale in a master (migrate slots)

    # Check the cluster state
    > CLUSTER NODES
    74bfaa76306dd6bc59e559d012203ceed2a8ab24 192.168.1.181:8003@18003 slave a923e10183ef356bcadc1566503be4ab1ea1adb6 0 1542794664779 3 connected
    d976e1cc897744d5e5c2a0f754662c9d2a7cc077 192.168.1.181:8006@18006 myself,master - 0 1542794660000 12 connected 5461-5558 10923-10950
    a923e10183ef356bcadc1566503be4ab1ea1adb6 192.168.1.180:8002@18002 master - 0 1542794663778 3 connected 10951-16383
    dea85bf87e560b6a5074f60965b8ad334bfeb5e8 192.168.1.181:8004@18004 slave 6f12a4c4e1c60f435f68fbce1b72dc60ac73de83 0 1542794661000 11 connected
    605950ea5c2214f50d5f3dddd87c80f4e7d1b631 192.168.1.180:8001@18001 master - 0 1542794664000 10 connected 5559-10922
    673e32925e0a6f9beefac8aeaad8a397758c5e47 192.168.1.180:8000@18000 slave 605950ea5c2214f50d5f3dddd87c80f4e7d1b631 0 1542794664000 10 connected
    6f12a4c4e1c60f435f68fbce1b72dc60ac73de83 192.168.1.181:8005@18005 master - 0 1542794665780 11 connected 0-5460

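    Migrate from 8006 to 8001

    # Migration steps
    > On the target node, run cluster setslot <slot> IMPORTING <source node ID> to name the slot to migrate and the source node.
    > On the source node, run cluster setslot <slot> MIGRATING <target node ID> to name the slot to migrate and the target node.
    > On the source node, run cluster getkeysinslot to get the list of keys in the slot.
    > On the source node, run migrate for each key; the command moves the key to the target node synchronously.
    > On the source node, repeat cluster getkeysinslot until the key list of the slot is empty.
    > On both the source node and the target node, run cluster setslot <slot> NODE <target node ID> to complete the migration.

    # Migrate the slots with the script (see the Appendix)
    migrate_slot.sh 192.168.1.181 8006 192.168.1.180 8001 5461 5558 "<redis_password>"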
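
To review the order of operations before touching a live cluster, the steps above can be laid out as a dry-run plan. The sketch below only builds the command strings and never connects to Redis; migration_plan is a hypothetical helper, the keys list stands in for what CLUSTER GETKEYSINSLOT would return, and the AUTH part of MIGRATE needed on password-protected clusters is left out for brevity.

```python
def migration_plan(slot: int, source_id: str, target_id: str, keys: list,
                   target_host: str, target_port: int, timeout_ms: int = 5000) -> list:
    """Return (node, command) pairs for moving one slot, in execution order.

    A real run repeats the key step until GETKEYSINSLOT returns nothing.
    """
    plan = [
        ("target", f"CLUSTER SETSLOT {slot} IMPORTING {source_id}"),
        ("source", f"CLUSTER SETSLOT {slot} MIGRATING {target_id}"),
    ]
    for key in keys:
        plan.append(("source", f"MIGRATE {target_host} {target_port} {key} 0 {timeout_ms}"))
    plan.append(("source", f"CLUSTER SETSLOT {slot} NODE {target_id}"))
    plan.append(("target", f"CLUSTER SETSLOT {slot} NODE {target_id}"))
    return plan


if __name__ == "__main__":
    steps = migration_plan(5461, "<source node ID>", "<target node ID>",
                           ["user:1", "user:2"], "192.168.1.180", 8001)
    for node, command in steps:
        print(f"[{node}] {command}")
```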

redis-trib.rb cmd

Command set of the official redis cluster management tool

  • Create a cluster: redis-trib.rb create

    • Create master nodes only

      redis-trib.rb create <master1 ip:port> <master2 ip:port> <master3 ip:port>
      e.g. redis-trib.rb create 192.168.1.180:8000 192.168.1.180:8001 192.168.1.180:8002
    • Assign master/slave nodes automatically

      redis-trib.rb create --replicas 1 <redis1 ip:port> <redis2 ip:port> <redis3 ip:port> <redis4 ip:port> <redis5 ip:port> <redis6 ip:port>
      e.g. redis-trib.rb create --replicas 1 192.168.1.180:8000 192.168.1.180:8001 192.168.1.180:8002 192.168.1.181:8003 192.168.1.181:8004 192.168.1.181:8005
  • Check the cluster: redis-trib.rb check

    redis-trib.rb check <redis ip:port>
    e.g. redis-trib.rb check 192.168.1.180:8000
  • View cluster information: redis-trib.rb info

    redis-trib.rb info <redis ip:port>
    e.g. redis-trib.rb info 192.168.1.180:8000
  • Repair the cluster: redis-trib.rb fix

    redis-trib.rb fix <redis ip:port>
    e.g. redis-trib.rb fix 192.168.1.180:8000
  • Add a node: redis-trib.rb add-node

    • Add a master node

      redis-trib.rb add-node <new master ip:port> <one of cluster ip:port>
      e.g. redis-trib.rb add-node 192.168.1.181:8006 192.168.1.180:8000
    • Add a slave node

      # Join <new slave ip:port> to the cluster as a slave of the <master_id> node
      redis-trib.rb add-node --slave --master-id <master node id> <new slave ip:port> <one of cluster ip:port>
      e.g. redis-trib.rb add-node --slave --master-id 673e32925e0a6f9beefac8aeaad8a397758c5e47 192.168.1.181:8007 192.168.1.180:8000
  • Delete a node: redis-trib.rb del-node

    redis-trib.rb del-node can only delete nodes that have no slots assigned (a slave or an empty master)

    redis-trib.rb del-node <one of cluster ip:port> <delete node id>
    e.g. redis-trib.rb del-node 192.168.1.180:8000 6f12a4c4e1c60f435f68fbce1b72dc60ac73de83
  • Migrate slots online: redis-trib.rb reshard

    redis-4.x.x.gem has a bug: reshard|rebalance (scaling out/in) cannot be run and fail with the error Syntax error, try CLIENT (LIST|KILL|GETNAME|SETNAME|PAUSE|REPLY)

    With redis-3.x.x.gem installed, reshard|rebalance can be run, but passwords are not supported, so it is not suitable for clusters with password authentication

    # Interactive
    redis-trib.rb reshard <one of cluster ip:port>

    # Non-interactive
    redis-trib.rb reshard --from <all|node1_id,node2_id,node3_id> --to <dest node_id> --slots <numbers of slots> <one of cluster ip:port>
    e.g. redis-trib.rb reshard --from all --to 673e32925e0a6f9beefac8aeaad8a397758c5e47 --slots 5461 192.168.1.180:8000
    e.g. redis-trib.rb reshard --from 673e32925e0a6f9beefac8aeaad8a397758c5e47,a923e10183ef356bcadc1566503be4ab1ea1adb6 --to 605950ea5c2214f50d5f3dddd87c80f4e7d1b631 --slots 5461 192.168.1.180:8000
  • Balance slots: redis-trib.rb rebalance

    redis-4.x.x.gem has a bug: reshard|rebalance (scaling out/in) cannot be run and fail with the error Syntax error, try CLIENT (LIST|KILL|GETNAME|SETNAME|PAUSE|REPLY)

    With redis-3.x.x.gem installed, reshard|rebalance can be run, but passwords are not supported, so it is not suitable for clusters with password authentication

    # Simple rebalance across all nodes
    redis-trib.rb rebalance 192.168.1.180:8000
    # Rebalance with options
    redis-trib.rb rebalance --threshold 1 --weight b31e3a2e=5 --weight 60b8e3a1=5 --use-empty-masters --simulate 192.168.1.180:8000
  • Set the heartbeat timeout between nodes: redis-trib.rb set-timeout

    redis-trib.rb set-timeout <one of cluster ip:port> <timeout>
    e.g. redis-trib.rb set-timeout 192.168.1.180:8000 30000
  • Run a command on every node in the cluster: redis-trib.rb call

    redis-trib.rb call <one of cluster ip:port> <redis command>
    e.g. redis-trib.rb call 192.168.1.180:8000 get key
    e.g. redis-trib.rb call 192.168.1.180:8000 rconfig rewrite

    redis-trib.rb call 10.10.10.171:7004 rconfig set masterauth "<password>"
    redis-trib.rb call 10.10.10.171:7004 rconfig set requirepass "<password>"
    redis-trib.rb call 10.10.10.171:7004 rconfig rewrite
  • Import data: redis-trib.rb import

    redis-trib.rb import --from <source ip:port> <one of cluster ip:port>
    # --copy keeps the keys on the old redis
    # --replace overwrites keys of the same name in the cluster; without it, such keys are not imported

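Redis tools

redis-faina: analyzes hot keys and top commands. Note that this script relies on the monitor command for its analysis

redis-rdb-tools: analyzes big keys across the full data set from an RDB file. redis itself can also report big keys: redis-cli --bigkeys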

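RedisCluster testing

Scenario | Met expectations | Expected business impact | Conclusion
1 slave down | Yes (automatic failure migration took 15s) | Business not impaired | When a master is left without a slave and a spare slave exists, the spare switches over automatically and cluster function is unaffected. The 15s is determined by the cluster-node-timeout parameter
2 slaves down | - | Business not impaired | A master without a slave does not affect cluster function; a slave going down does not affect cluster function
3 slaves down | - | Business not impaired | A master without a slave does not affect cluster function; a slave going down does not affect cluster function
1 master down | Yes (slave promoted to master automatically in 15s) | Business affected for 15s | While the slave is being promoted to master (15s), the slots of the failed master are unavailable and every operation on those slots is affected. Because cluster-require-full-coverage=no is set, the 2 surviving masters keep serving normally. The 15s is determined by the cluster-node-timeout parameter
2 masters down | Recovered with FAILOVER FORCE / TAKEOVER | Business impaired | More than half of the masters are down, the cluster enters the fail state, slaves are not promoted automatically, and the whole cluster is unavailable. Do not deploy several masters on one machine
3 masters down | - | Business impaired | All masters are down, the cluster enters the fail state, and slaves are not promoted automatically. The whole cluster is unavailable
1 master/slave pair down | - | Business affected | When a master and its slave go down together, the slots of that master cannot be operated on. Because cluster-require-full-coverage=no is set, the surviving masters keep serving normally. Do not deploy a master and its slave on the same machine
Scale out/in by 1 master | - | Business not impaired | Migrating slots does not affect reads and writes on the cluster. Because the official tool redis-trib.rb has a bug, scaling requires migrating slots manually, which is very error-prone. Avoid migrating slots purely by hand; if it cannot be avoided, use a script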
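
The master-failure rows come down to one rule: a slave can only be promoted while a majority of masters is still reachable to vote. A minimal sketch of that rule, assuming a simple strict-majority check; can_auto_failover is an illustrative name:

```python
def can_auto_failover(total_masters: int, failed_masters: int) -> bool:
    """Slaves can be promoted only while a majority of masters is still up."""
    alive = total_masters - failed_masters
    return alive > total_masters // 2


if __name__ == "__main__":
    for failed in range(0, 4):
        print(f"{failed} of 3 masters down -> automatic failover possible: "
              f"{can_auto_failover(3, failed)}")
```

With 3 masters, losing one still leaves a voting majority, while losing two does not, which matches the difference between the "1 master down" and "2 masters down" rows.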

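Standardization conventions

  • Shared config file path: /usr/local/redis-cluster/redis.conf
  • Instance config file directory: /usr/local/redis-cluster/conf
  • Config file naming: {redis_port}.conf, named after the redis port (e.g., /usr/local/redis-cluster/conf/8000.conf)
  • Data path: /data/redis/{redis_port}, with the directory named after the redis port (e.g., dir "/data/redis/8000")
  • Log path: /data/logs/redis/
  • Log file naming: redis_{redis_port}.log, named after the redis port (e.g., logfile "/data/logs/redis/redis_8000.log")
  • Port range: redis_port > 20000 (and no higher than 55535, so that redis_port + 10000 is still a valid port)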

Appendix

#!/bin/bash
# Slot migration script (migrate_slot.sh)
. /etc/profile

set -eu

# Check the argument count first: with set -u, reading a missing $7 would abort before the usage message
if [[ $# -ne 7 ]]; then
    echo "Usage: $0 source_ip source_port target_ip target_port start_slot end_slot password"
    exit 1
fi

source_ip=$1
source_port=$2
target_ip=$3
target_port=$4
start_slot=$5
end_slot=$6
password=$7

# Look up both node IDs once, matching the exact "ip:port@" field of CLUSTER NODES
source_id=$(redis-cli -c -h "${source_ip}" -p "${source_port}" -a "${password}" cluster nodes | grep "${source_ip}:${source_port}@" | awk '{print $1}')
target_id=$(redis-cli -c -h "${source_ip}" -p "${source_port}" -a "${password}" cluster nodes | grep "${target_ip}:${target_port}@" | awk '{print $1}')

# On the target node, run cluster setslot <slot> IMPORTING <source node ID> to name the slot to migrate and the source node
for slot in $(seq "${start_slot}" "${end_slot}"); do
    redis-cli -c -h "${target_ip}" -p "${target_port}" -a "${password}" cluster setslot "${slot}" IMPORTING "${source_id}"
done

# On the source node, run cluster setslot <slot> MIGRATING <target node ID> to name the slot to migrate and the target node
for slot in $(seq "${start_slot}" "${end_slot}"); do
    redis-cli -c -h "${source_ip}" -p "${source_port}" -a "${password}" cluster setslot "${slot}" MIGRATING "${target_id}"
done

for slot in $(seq "${start_slot}" "${end_slot}"); do
    while true; do
        # On the source node, run getkeysinslot to fetch up to 20 keys from the slot
        allkeys=$(redis-cli -c -h "${source_ip}" -p "${source_port}" -a "${password}" cluster getkeysinslot "${slot}" 20)
        if [ -z "${allkeys}" ]; then
            # The slot has no keys left: run setslot on the source and the target to assign the slot to the target node
            redis-cli -c -h "${source_ip}" -p "${source_port}" -a "${password}" cluster setslot "${slot}" NODE "${target_id}"
            redis-cli -c -h "${target_ip}" -p "${target_port}" -a "${password}" cluster setslot "${slot}" NODE "${target_id}"
            break
        else
            # The slot still has keys: migrate the keys before reassigning the slot
            for key in ${allkeys}; do
                echo "migrate slot:${slot} key:${key}"
                # On the source node, run migrate to move the key to the target node
                # MIGRATE target_ip target_port key db timeout(millisecond) AUTH password
                redis-cli -c -h "${source_ip}" -p "${source_port}" -a "${password}" MIGRATE "${target_ip}" "${target_port}" "${key}" 0 5000 AUTH "${password}"
            done
        fi
    done
done
