How to Recover a Redis Cluster and Keep Its Data After Node IPs Change

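A while ago I ran into a problem in production: the system had to be migrated from one subnet to another. A Redis cluster records each node's ip:port when it is created, so once the node IPs change the cluster stops working. How do you recover it? In most cases you can simply delete every node's data file (dbfilename), persistence file (appendfilename), and cluster configuration file (cluster-config-file), then recreate the cluster. But what if the data has to be kept?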

The recovery process is demonstrated below on a cluster of three masters and three replicas (one replica per master):

    [root@test1 bin]# ./redis-cli -a password --cluster check 192.168.66.101:7000
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    192.168.66.101:7000 (d1ddeaa7...) -> 334 keys | 5461 slots | 1 slaves.
    192.168.66.102:7003 (d21ce248...) -> 341 keys | 5462 slots | 1 slaves.
    192.168.66.101:7001 (bb5c5e76...) -> 325 keys | 5461 slots | 1 slaves.
    [OK] 1000 keys in 3 masters.
    0.06 keys per slot on average.
    >>> Performing Cluster Check (using node 192.168.66.101:7000)
    M: d1ddeaa7c77e35b3df50953fc09834b662cbac8b 192.168.66.101:7000
       slots:[0-5460] (5461 slots) master
       1 additional replica(s)
    M: d21ce2482179af3b76a9f29d870848bae18a3214 192.168.66.102:7003
       slots:[5461-10922] (5462 slots) master
       1 additional replica(s)
    S: 089b2e16dff1f68c399a1efc73580e7cbbbfa71b 192.168.66.101:7002
       slots: (0 slots) slave
       replicates d21ce2482179af3b76a9f29d870848bae18a3214
    S: 92d8208b582c6111bd383b6fdfc2d80a86f47350 192.168.66.102:7005
       slots: (0 slots) slave
       replicates d1ddeaa7c77e35b3df50953fc09834b662cbac8b
    S: ea68bec54e3deb0bd209f151151098ae6d8cf0b4 192.168.66.102:7004
       slots: (0 slots) slave
       replicates bb5c5e768ab4aff9c92d7fd3f2d55007e2736c65
    M: bb5c5e768ab4aff9c92d7fd3f2d55007e2736c65 192.168.66.101:7001
       slots:[10923-16383] (5461 slots) master
       1 additional replica(s)
    [OK] All nodes agree about slots configuration.
    >>> Check for open slots...
    >>> Check slots coverage...
    [OK] All 16384 slots covered.

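Change the IPs of all nodes in the cluster from 192.168.66.* to 192.168.77.*. If you check the cluster state at this point, you can see that the cluster still tries to connect to nodes on the 192.168.66.* subnet: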

    [root@test1 bin]# ./redis-cli -a password --cluster check 192.168.77.101:7000
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    Could not connect to Redis at 192.168.66.102:7003: Connection timed out
    ......

Shut down all nodes.
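The shutdown step can be scripted. What follows is only a minimal sketch: the variable names REDIS_CLI, REDIS_PASSWORD and NODES and the function shutdown_all_nodes are hypothetical, and the node list assumes the six demo nodes are already reachable on their new 192.168.77.* addresses. It sends the SHUTDOWN command to each node through redis-cli.

```shell
# Assumed settings for illustration; adjust to your own deployment.
REDIS_CLI="${REDIS_CLI:-./redis-cli}"
REDIS_PASSWORD="${REDIS_PASSWORD:-password}"
NODES="${NODES:-192.168.77.101:7000 192.168.77.101:7001 192.168.77.101:7002 192.168.77.102:7003 192.168.77.102:7004 192.168.77.102:7005}"

# Send SHUTDOWN to every host:port listed in NODES.
shutdown_all_nodes() {
    local node host port
    for node in $NODES; do
        host="${node%:*}"    # part before the last colon
        port="${node##*:}"   # part after the last colon
        echo "Shutting down ${host}:${port}"
        # Keep going even if one node is already down or the call fails.
        "$REDIS_CLI" -a "$REDIS_PASSWORD" -h "$host" -p "$port" shutdown || true
    done
}
```

After loading these definitions, run shutdown_all_nodes once, then confirm that no redis-server process is left before editing any file.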

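Find the cluster-config-file setting in each node's configuration file. It specifies where the cluster configuration file is saved (/data/redis/cluster/7000/nodes_7000.conf in this example). Look at the contents of that file: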

    [root@test1 ~]# cat /data/redis/cluster/7000/nodes_7000.conf
    d1ddeaa7c77e35b3df50953fc09834b662cbac8b 192.168.66.101:7000@17000 myself,master - 0 1626244031000 1 connected 0-5460
    ea68bec54e3deb0bd209f151151098ae6d8cf0b4 192.168.66.102:7004@17004 slave bb5c5e768ab4aff9c92d7fd3f2d55007e2736c65 0 1626244034813 5 connected
    d21ce2482179af3b76a9f29d870848bae18a3214 192.168.66.102:7003@17003 master - 0 1626244033803 4 connected 5461-10922
    089b2e16dff1f68c399a1efc73580e7cbbbfa71b 192.168.66.101:7002@17002 slave d21ce2482179af3b76a9f29d870848bae18a3214 0 1626244032793 4 connected
    bb5c5e768ab4aff9c92d7fd3f2d55007e2736c65 192.168.66.101:7001@17001 master - 0 1626244030770 2 connected 10923-16383
    92d8208b582c6111bd383b6fdfc2d80a86f47350 192.168.66.102:7005@17005 slave d1ddeaa7c77e35b3df50953fc09834b662cbac8b 0 1626244031782 6 connected
    vars currentEpoch 6 lastVoteEpoch 0
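Each node line in this file follows, as far as I understand it, the same layout as the output of CLUSTER NODES: node ID, ip:port@cluster-bus-port, flags, master ID (or "-"), ping-sent, pong-received, config epoch, link state, and then the slot ranges for masters. The old IP appears only in the second field, which is why a plain text substitution is enough. The sketch below uses a hypothetical helper, list_nodes, to print just the address, role and node ID, which makes it easier to see which addresses a file still references.

```shell
# list_nodes FILE: print "address role node-id" for each node line
# of a cluster-config-file. The helper name is made up for illustration.
list_nodes() {
    awk 'NF >= 8 && $1 != "vars" {
        split($2, addr, "@")                       # drop the "@cluster-bus-port" suffix
        role = ($3 ~ /master/) ? "master" : "replica"
        print addr[1], role, $1
    }' "$1"
}
```

For example, list_nodes /data/redis/cluster/7000/nodes_7000.conf prints one line per node and skips the trailing "vars" line.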

In the cluster-config-file of every node, change all IP addresses from 192.168.66.* to 192.168.77.*:

    # Run on the host formerly at 192.168.66.101
    sed -i 's/192\.168\.66\./192.168.77./g' \
        /data/redis/cluster/7000/nodes_7000.conf \
        /data/redis/cluster/7001/nodes_7001.conf \
        /data/redis/cluster/7002/nodes_7002.conf
    # Run on the host formerly at 192.168.66.102
    sed -i 's/192\.168\.66\./192.168.77./g' \
        /data/redis/cluster/7003/nodes_7003.conf \
        /data/redis/cluster/7004/nodes_7004.conf \
        /data/redis/cluster/7005/nodes_7005.conf
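Since these files hold the whole cluster topology, it may be worth making the substitution a little more defensive. The following is a sketch only, assuming GNU sed (for -i) and a hypothetical helper named rewrite_cluster_ips. It keeps a .bak copy of each file, escapes the dots so they match literally, and fails if the old prefix is still present afterwards.

```shell
# rewrite_cluster_ips OLD_PREFIX NEW_PREFIX FILE...
# Hypothetical helper: replaces OLD_PREFIX with NEW_PREFIX in each FILE,
# leaving the original next to it as FILE.bak.
rewrite_cluster_ips() {
    local old="$1" new="$2" file old_re
    shift 2
    # Escape dots so that "." in the prefix only matches a literal dot.
    old_re="$(printf '%s' "$old" | sed 's/\./\\./g')"
    for file in "$@"; do
        cp -p "$file" "${file}.bak" || return 1
        sed -i "s/${old_re}/${new}/g" "$file" || return 1
        # Verify that nothing was missed.
        if grep -qF "$old" "$file"; then
            echo "old prefix still present in ${file}" >&2
            return 1
        fi
    done
}
```

With the paths from this example, the call on the first host would be rewrite_cluster_ips 192.168.66. 192.168.77. /data/redis/cluster/7000/nodes_7000.conf /data/redis/cluster/7001/nodes_7001.conf /data/redis/cluster/7002/nodes_7002.conf. Node IDs, epochs and slot ranges are left untouched, so the .bak file and the edited file should differ only in the address field.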

Start all nodes.

Check the cluster state again:

    [root@test1 bin]# ./redis-cli -a password --cluster check 192.168.77.101:7000
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    192.168.77.101:7000 (d1ddeaa7...) -> 334 keys | 5461 slots | 1 slaves.
    192.168.77.102:7003 (d21ce248...) -> 341 keys | 5462 slots | 1 slaves.
    192.168.77.101:7001 (bb5c5e76...) -> 325 keys | 5461 slots | 1 slaves.
    [OK] 1000 keys in 3 masters.
    0.06 keys per slot on average.
    >>> Performing Cluster Check (using node 192.168.77.101:7000)
    M: d1ddeaa7c77e35b3df50953fc09834b662cbac8b 192.168.77.101:7000
       slots:[0-5460] (5461 slots) master
       1 additional replica(s)
    S: 92d8208b582c6111bd383b6fdfc2d80a86f47350 192.168.77.102:7005
       slots: (0 slots) slave
       replicates d1ddeaa7c77e35b3df50953fc09834b662cbac8b
    M: d21ce2482179af3b76a9f29d870848bae18a3214 192.168.77.102:7003
       slots:[5461-10922] (5462 slots) master
       1 additional replica(s)
    M: bb5c5e768ab4aff9c92d7fd3f2d55007e2736c65 192.168.77.101:7001
       slots:[10923-16383] (5461 slots) master
       1 additional replica(s)
    S: 089b2e16dff1f68c399a1efc73580e7cbbbfa71b 192.168.77.101:7002
       slots: (0 slots) slave
       replicates d21ce2482179af3b76a9f29d870848bae18a3214
    S: ea68bec54e3deb0bd209f151151098ae6d8cf0b4 192.168.77.102:7004
       slots: (0 slots) slave
       replicates bb5c5e768ab4aff9c92d7fd3f2d55007e2736c65
    [OK] All nodes agree about slots configuration.
    >>> Check for open slots...
    >>> Check slots coverage...
    [OK] All 16384 slots covered.
    [root@test1 bin]# ./redis-cli -a password --cluster info 192.168.77.101:7000
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    192.168.77.101:7000 (d1ddeaa7...) -> 334 keys | 5461 slots | 1 slaves.
    192.168.77.102:7003 (d21ce248...) -> 341 keys | 5462 slots | 1 slaves.
    192.168.77.101:7001 (bb5c5e76...) -> 325 keys | 5461 slots | 1 slaves.
    [OK] 1000 keys in 3 masters.
    0.06 keys per slot on average.

The cluster state has recovered, and the number of keys is the same as before the IP change.
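The key count comparison can also be scripted rather than read by eye. The sketch below is an assumption-laden helper: total_keys is a made-up name, and it relies on the summary line having exactly the "[OK] N keys in M masters." form seen in the output above, which may differ in other redis-cli versions.

```shell
# total_keys: read `redis-cli --cluster info` output on stdin and print
# the total number of keys from the "[OK] N keys in M masters." line.
total_keys() {
    sed -n 's/^\[OK\] \([0-9][0-9]*\) keys in \([0-9][0-9]*\) masters\..*/\1/p'
}

# Possible usage (not executed here):
#   before=$(./redis-cli -a password --cluster info 192.168.66.101:7000 2>/dev/null | total_keys)
#   ... change IPs, edit the cluster-config-file of every node, restart ...
#   after=$(./redis-cli -a password --cluster info 192.168.77.101:7000 2>/dev/null | total_keys)
#   [ "$before" = "$after" ] && echo "key count unchanged: $after"
```

Matching totals are only a coarse check; reading a few known keys, as done next, is still worthwhile.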

Test reading from and writing to the cluster:

    [root@test1 bin]# ./redis-cli -a password -c -h 192.168.77.101 -p 7000
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    192.168.77.101:7000> keys *
    1) "name725"
    2) "name359"
    ......
    192.168.77.101:7000> get name7
    "hello\n"
    192.168.77.101:7000> get name400
    -> Redirected to slot [11448] located at 192.168.77.101:7001
    "hello\n"
    192.168.77.101:7001> set testkey 'testvalue'
    -> Redirected to slot [4757] located at 192.168.77.101:7000
    OK
    192.168.77.101:7000> get testkey
    "testvalue"

Existing data can be read, and new data can be written and read back, so the cluster has recovered.

Summary

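After the nodes of a Redis cluster change IPs, shut down all nodes, change the IP addresses in every node's cluster-config-file to the new addresses, and start the nodes again. The cluster then recovers on its own with its data intact.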