厦门 网站设计,国家企业信用系统,免费正能量励志网站,江苏省建设证书变更网站错误说明#xff1a; MySQL主从同步的1032错误#xff0c;一般是指要更改的数据不存在#xff0c;SQL_THREAD提取的日志无法应用故报错#xff0c;造成同步失败
#xff08;Update、Delete、Insert一条已经delete的数据#xff09;。 1032的错误本身对数据一致性没什么影…错误说明 MySQL主从同步的1032错误一般是指要更改的数据不存在SQL_THREAD提取的日志无法应用故报错造成同步失败
Update、Delete、Insert一条已经delete的数据。 1032的错误本身对数据一致性没什么影响影响最大的是造成了同步失败、同步停止。 如果主主主从有同步失败要第一时间查看并着手解决。因为不同步会造成读取数据的不一致。应在第一时间恢复同步
尽量减少对业务的影响。然后再具体分析不同步的原因手动或者自动修复数据并做pt-table-checksum数据一致性检查。 目前业务一般是做主主同步主主同步由于是异步更新存在更新冲突的问题且很容易引起SQL ERROR 1032错误。这个应该在业务侧解决
保证同一时间只更新数据库的一个点类似单点写入。我们的解决办法是:写一个底层数据库调用库可能涉及到更新冲突的操作都调用这个库。
在配置文件里配2个点的数据库A、B保证一直都更新A库如果A库不可用就去更新B库。 另外如果是对数据一致性要求较高的场景比如涉及到钱建议用PXC强一致性、真正同步复制
方法一 1.Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND 是主从更新时丢失数据导致主从不一致在从库上mysql show slave status\G;结果如下 2.在master上用mysqlbinlog 分析下出错的binlog日志在干什么
/usr/local/mysql/bin/mysqlbinlog --no-defaults -v -v --base64-outputDECODE-ROWS mysql-bin.013934 | grep -A 10 975912206 /usr/local/mysql/bin/mysqlbinlog --no-defaults -v -v --base64-outputDECODE-ROWS mysql-bin.013934 20200528.log
查找对应POS发现是这么一条SQL操作
3.查询master和slave上对应的数据主库有而从库没有 select * from log_silver where id1019381914\G;
4.把丢失的数据在slave上填补然后跳过报错
mysql insert into t1 values();
mysql stop slave ;set global sql_slave_skip_counter1;start slave;
5.查看同步结果
mysql show slave status\G;
方法二
1.如果用方法一还是不能主从同步查询到这个错误是mysql的bug导致的
2.解决的办法 1.最好的办法是升级数据库 保证bug不会重现。 2.利用配置参数 来躲避这个bug vi /etc/my.cnf slave-skip-errors 1032,xxxx,xxxx .... 3.临时逃避此次错误。 set global sql_slave_skip_counter1; stop slave; start slave; [MySQL] SQL_ERROR 1032解决办法
一、缘由 在主主同步的测试环境由于业务侧没有遵循同一时间只写一个点的原则造成A库上删除了一条数据B库上在同时更新这条数据。
由于异步和网络延时B的更新event先到达A端执行造成A端找不到这条记录故SQL_THREAD报错1032主从同步停止。 二、解决办法 MySQL5.6.30版本binlog模式为ROW。 show slave status\G,可以看到如下报错 Slave_SQL_Running: NO Last_SQL_Errno: 1032 Last_SQL_Error: Worker 3 failed executing transaction at master log mysql-bin.000003, end_log_pos 440267874; Could not execute Delete_rows event on table db_test.tbuservcbgolog; Cant find record in tbuservcbgolog, Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the events master log mysql-bin.000003, end_log_pos 440267874 从上可以看出是SQL_THREAD线程出错错误号码1032。是在应用delete db_test.tbuservcbgolog 表中一行数据的事件时由于这条数据
不存在而出错。此事件在主服务器Master binlog中的位置是 mysql-bin.000003, end_log_pos 440267874。当然可以在从服务器Slave的Relay
log中查找具体方法见最后 方法1跳过错误Event
先跳过这一条错误(event),让主从同步恢复正常。或者N条event一条一条跳过 stop slave; set global sql_slave_skip_counter1; start slave; 方法2跳过所有1032错误
更改my.cnf文件在Replication settings下添加 slave-skip-errors 1032
并重启数据库然后start salve。
注意因为要重启数据库不推荐除非错误事件太多。 方法3还原被删除的数据
根据错误提示信息用mysqlbinlog找到该条数据event SQL并逆向手动执行。如delete 改成insert。
本例中此事件在主服务器Master binlog中的位置是 mysql-bin.000003, end_log_pos 440267874。
1利用mysqlbinlog工具找出440267874的事件
/usr/local/mysql-5.6.30/bin/mysqlbinlog --base64-outputdecode-rows -vv mysql-bin.000003 |grep -A 20 440267874
或者/usr/local/mysql-5.6.30/bin/mysqlbinlog --base64-outputdecode-rows -vv mysql-bin.000003 --stop-position440267874 | tail -20
或者usr/local/mysql-5.6.30/bin/mysqlbinlog --base64-outputdecode-rows -vv mysql-bin.000003 decode.log 或者加上参数-d, --databasename 来进一步过滤 #160923 20:01:27 server id 1223307 end_log_pos 440267874 CRC32 0x134b2cbc Delete_rows: table id 319 flags: STMT_END_F ### DELETE FROM db_99ducj.tbuservcbgolog ### WHERE ### 110561502 /* INT meta0 nullable0 is_null0 */ ### 21683955 /* INT meta0 nullable0 is_null0 */ ### 390003 /* INT meta0 nullable0 is_null0 */ ### 40 /* INT meta0 nullable0 is_null0 */ ### 52016-09-23 17:02:24 /* DATETIME(0) meta0 nullable1 is_null0 */ ### 6NULL /* DATETIME(0) meta0 nullable1 is_null1 */ # at 440267874
以上为检索出来的结果事务语句为delete from db_99ducj.tbuservcbgolog where 110561502 and 21683955 ...
其中1 2 3...分别对应表tbuservcbgolog的列名填补上即可。
我们可以逆向此SQL 将deleter 变成Insert,手动在从库上执行此Insert SQL,之后restart slave就好了。 M-S监控脚本
#!/bin/bash # #check_mysql_slave_replication_status # # # parasum2 help_msg(){ cat help --------------------- Error Cause: you must input $parasum parameters! 1st : Host_IP 2st : Host_Port help exit } [ $# -ne ${parasum} ] help_msg #若参数不够打印帮助信息并退出 export HOST_IP$1 export HOST_PORt$2 MYUSERroot MYPASS123456 MYSQL_CMDmysql -u$MYUSER -p$MYPASS MailTitle #邮件主题 Mail_Address_MysqlStatusrootlocalhost.localdomain #收件人邮箱 time1$(date %Y%m%d%H%M%S) time2$(date %Y-%m-%d %H:%M:%S) SlaveStatusFile/tmp/salve_status_${HOST_PORT}.${time1} #邮件内容所在文件 echo --------------------Begin at: $time2 $SlaveStatusFile echo $SlaveStatusFile #get slave status ${MYSQL_CMD} -e show slave status\G $SlaveStatusFile #取得salve进程的状态 #get io_thread_status,sql_thread_status,last_errno 取得以下状态值 IOStatus$(cat $SlaveStatusFile|grep Slave_IO_Running|awk {print $2}) SQLStatus$(cat $SlaveStatusFile|grep Slave_SQL_Running |awk {print $2}) Errno$(cat $SlaveStatusFile|grep Last_Errno | awk {print $2}) Behind$(cat $SlaveStatusFile|grep Seconds_Behind_Master | awk {print $2}) echo $SlaveStatusFile if [ $IOStatus No ] || [ $SQLStatus No ];then #判断错误类型 if [ $Errno -eq 0 ];then #可能是salve线程未启动 $MYSQL_CMD -e start slave io_thread;start slave sql_thread; echo Cause slave threads doesnots running,trying start slsave io_thread;start slave sql_thread; $SlaveStatusFile MailTitle[Warning] Slave threads stoped on $HOST_IP $HOST_PORT elif [ $Errno -eq 1007 ] || [ $Errno -eq 1053 ] || [ $Errno -eq 1062 ] || [ $Errno -eq 1213 ] || [ $Errno -eq 1032 ]\ || [ Errno -eq 1158 ] || [ $Errno -eq 1159 ] || [ $Errno -eq 1008 ];then #忽略此些错误 $MYSQL_CMD -e stop slave;set global sql_slave_skip_counter1;start slave; echo Cause slave replication catch errors,trying skip counter and restart slave;stop slave ;set global sql_slave_skip_counter1;slave start; $SlaveStatusFile MailTitle[Warning] Slave error on $HOST_IP $HOST_PORT! ErrNum: $Errno else echo Slave $HOST_IP $HOST_PORT is down! $SlaveStatusFile MailTitle[ERROR]Slave replication is down on $HOST_IP $HOST_PORT ! ErrNum:$Errno fi fi if [ -n $Behind ];then Behind0 fi echo $Behind $SlaveStatusFile #delay behind master 判断延时时间 if [ $Behind -gt 300 ];then echo date %Y-%m%d %H:%M:%S slave is behind master $Bebind seconds! $SlaveStatusFile MailTitle[Warning]Slave delay $Behind seconds,from $HOST_IP $HOST_PORT fi if [ -n $MailTitle ];then #若出错或者延时时间大于300s则发送邮件 cat ${SlaveStatusFile} | /bin/mail -s $MailTitle $Mail_Address_MysqlStatus fi #del tmpfile:SlaveStatusFile $SlaveStatusFile 测试 [roothong shell]# sh mon_mysql_.sh --------------------- Error Cause: you must input 2 parameters! 1st : Host_IP 2st : Host_Port [roothong shell]# sh mon_mysql_.sh 192.168.0.112 3306 #参数需要 IP 和 端口 修改后脚本 只做了简单的整理修正了Behind为NULL的判断但均未测试 应可考虑增加: 对修复执行结果的判断;多条错误的循环修复、检测、再修复 取消SlaveStatusFile临时文件。 Errno、Behind两种告警分别发邮件告警正文增加show slave结果原文。 增加PATH以便加到crontab中。 考虑crontab中周期执行(加锁避免执行冲突、执行周期选择) 增加执行日志 #!/bin/sh # check_mysql_slave_replication_status # 参考:http://www.tianfeiyu.com/?p2062 Usage(){ echo Usage: echo $0 HOST PORT USER PASS } [ -z $1 -o -z $2 -o -z $3 -o -z $4 ] Usage exit 1 HOST$1 PORT$2 USER$3 PASS$4 MYSQL_CMDmysql -h$HOST -P$PORT -u$USER -p$PASS MailTitle #邮件主题 Mail_Address_MysqlStatusrootlocalhost.localdomain #收件人邮箱 time1$(date %Y%m%d%H%M%S) time2$(date %Y-%m-%d %H:%M:%S) SlaveStatusFile/tmp/salve_status_${HOST_PORT}.${time1} #邮件内容所在文件 echo --------------------Begin at: $time2 $SlaveStatusFile echo $SlaveStatusFile #get slave status ${MYSQL_CMD} -e show slave status\G $SlaveStatusFile #取得salve进程的状态 #get io_thread_status,sql_thread_status,last_errno 取得以下状态值 IOStatus$(cat $SlaveStatusFile|grep Slave_IO_Running|awk {print $2}) SQLStatus$(cat $SlaveStatusFile|grep Slave_SQL_Running |awk {print $2}) Errno$(cat $SlaveStatusFile|grep Last_Errno | awk {print $2}) Behind$(cat $SlaveStatusFile|grep Seconds_Behind_Master | awk {print $2}) echo $SlaveStatusFile if [ $IOStatus No -o $SQLStatus No ];then case $Errno in 0) # 可能是slave未启动 $MYSQL_CMD -e start slave io_thread;start slave sql_thread; echo Cause slave threads doesnots running,trying start slsave io_thread;start slave sql_thread; $SlaveStatusFile ;; 1007|1053|1062|1213|1032|1158|1159|1008) # 忽略这些错误 $MYSQL_CMD -e stop slave;set global sql_slave_skip_counter1;start slave; echo Cause slave replication catch errors,trying skip counter and restart slave;stop slave ;set global sql_slave_skip_counter1;slave start; $SlaveStatusFile MailTitle[Warning] Slave error on $HOST:$PORT! ErrNum: $Errno ;; *) echo Slave $HOST:$PORT is down! $SlaveStatusFile MailTitle[ERROR]Slave replication is down on $HOST:$PORT! Errno:$Errno ;; esac fi if [ $Behind NULL -o -z $Behind ];then Behind0 fi echo Behind:$Behind $SlaveStatusFile #delay behind master 判断延时时间 if [ $Behind -gt 300 ];then echo date %Y-%m%d %H:%M:%S slave is behind master $Bebind seconds! $SlaveStatusFile MailTitle[Warning]Slave delay $Behind seconds,from $HOST $PORT fi if [ -n $MailTitle ];then #若出错或者延时时间大于300s则发送邮件 cat ${SlaveStatusFile} | /bin/mail -s $MailTitle $Mail_Address_MysqlStatus fi #del tmpfile:SlaveStatusFile $SlaveStatusFile check mysql slave状态并跳过相应错误(守护模式) 一、具体代码 #!/bin/bash #-------------------------------------------------- #Author:jimmygong #Email:jimmygongtaomee.com #FileName:checkslave.sh #Function: #Version:1.0 #Created:2016-01-01 #-------------------------------------------------- if [[ id -u -ne 0 ]] then echo script need root exit 1 fi function usage () { echo Usage:bash $0 start echo Usage:bash $0 stop } if [[ $# -ne 1 ]] then usage exit 1 fi #根据情况修改变量信息 mysqluserroot mysqlpass123456 mysqlport3398 mysqlsocket/opt/mysql/$mysqlport/$mysqlport.sock currdate$(date %Y%m%d) logpath/root/checkslave logfile$logpath/${currdate} #各个错误代码解释 #1158网络错误,出现读错误,请检查网络连接状况 #1159网络错误,读超时,请检查网络连接状况 #1008数据库不存在,删除数据库失败 #1007数据库已存在,创建数据库失败 #1062字段值重复,入库失败 #1452不能删除或更新父行,外键约束失败 #PS:1452这个根据情况跳过错误 allerror(1053 1032 1158 1159 1008 1007 1062 1452) sleeptime60 [[ -d $logpath ]] || mkdir $logpath function echosucc () { succstatus[ Ok ] printf \033[32m $succstatus $* \033[0m\n } function mysqlconn () { comm$1 mysql -u${mysqluser} -p${mysqlpass} -S ${mysqlsocket} -e $comm } function checkrun () { esport$(lsof -i:$mysqlport|wc -l) if [[ $esport -lt 2 ]] then echo Mysql Server Failed exit 1 fi } function statuserror () { for((i0;i${#allerror[*]};i)) do if [[ $1 ${allerror[$i]} ]] then mysqlconn stop slave;set global sql_slave_skip_counter1;start slave; else echo $(date %m-%d %H:%M:%S) Mysql Slave Failed $1 $logfile fi done } function checkslave () { status($(mysqlconn show slave status\G|egrep -i _running|last_errno|awk {print $NF})) if [[ ${status[0]} Yes ]] [[ ${status[1]} Yes ]] then echo $(date %m-%d %H:%M:%S) Mysql Slave Ok $logfile else statuserror ${status[2]} fi } function start () { echosucc Starting Monitor while true do checkrun checkslave sleep $sleeptime done } function stop () { echosucc Stop Monitor echo $$ $logpath/pidfile for PID in $(ps -eo pid,command|grep $0|grep -v grep|awk {print $1}); do [[ $PID ! $(cat $logpath/pidfile) ]] kill -s TERM $PID /dev/null done } case $1 in start) start ;; stop) stop ;; *) usage ;; esac 二、相关日志都记录在 cat checkslave/20160102 |more 01-02 21:19:10 Mysql Slave Failed 1452 01-02 21:19:10 Mysql Slave Failed 1452 01-02 21:19:10 Mysql Slave Failed 1452 01-02 21:19:10 Mysql Slave Failed 1452 01-02 21:19:10 Mysql Slave Failed 1452 01-02 21:34:58 Mysql Slave Ok 01-02 21:35:58 Mysql Slave Ok 01-02 21:36:59 Mysql Slave Ok 01-02 21:37:59 Mysql Slave Ok 三、相关进程 ps -ef |grep -w checkslave.sh root 12260 1 0 21:34 pts/2 00:00:00 bash checkslave.sh start 四、相关启动和关闭 bash checkslave.sh start [ Ok ] Starting Monitor bash checkslave.sh stop [ Ok ] Stop Monitor 3分钟解决MySQL 1032主从错误
Part1:写在最前 1032错误----现在生产库中好多数据在从库误删了生产库更新后找不到了现在主从不同步了再跳过错误也没用因为没这条再更新还会报错
解决方案
Part1:临时方案 mysql stop slave; Query OK, 0 rows affected (0.00 sec) mysql set global sql_slave_skip_counter1; Query OK, 0 rows affected (0.00 sec) mysql start slave; Query OK, 0 rows affected (0.00 sec)
Part2:永久方案
end_log_pos 有了它根据pos值直接就能找到找到delete那条数据反做变成insert 故障模拟 HE1从库误删 mysql delete from helei where id3; Query OK, 1 row affected (0.29 sec) mysql select * from helei; ---------- | id | text | ---------- | 1 | aa | | 2 | bb | | 4 | ee | | 5 | ff | | 6 | gg | | 7 | hh | ---------- 6 rows in set (0.00 sec) mysql show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.1.250 Master_User: mysync Master_Port: 2503306 Connect_Retry: 60 Master_Log_File: mysql-bin.000005 Read_Master_Log_Pos: 3711 Relay_Log_File: HE1-relay-bin.000007 Relay_Log_Pos: 484 Relay_Master_Log_File: mysql-bin.000005 Slave_IO_Running: Yes Slave_SQL_Running: Yes 此时从库状态是正常的但一旦主库对该条记录进行操作 HE3主库更新从库刚刚误删的数据 mysql update helei set textccc where id3; Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql select * from helei; ---------- | id | text | ---------- | 1 | aa | | 2 | bb | | 3 | ccc | | 4 | ee | | 5 | ff | | 6 | gg | | 7 | hh | ---------- 7 rows in set (0.00 sec) HE1从库报错 mysql show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.1.250 Master_User: mysync Master_Port: 2503306 Connect_Retry: 60 Master_Log_File: mysql-bin.000005 Read_Master_Log_Pos: 3918 Relay_Log_File: HE1-relay-bin.000007 Relay_Log_Pos: 484 Relay_Master_Log_File: mysql-bin.000005 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1032 Last_Error: Could not execute Update_rows event on table test.helei; Cant find record in helei, Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the events master log mysql-bin.000005, end_log_pos 3887 Skip_Counter: 0 Exec_Master_Log_Pos: 3711 Relay_Log_Space: 1626 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1032 Last_SQL_Error: Could not execute Update_rows event on table test.helei; Cant find record in helei, Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the events master log mysql-bin.000005, end_log_pos 3887这个mysql-bin.000005,end_log_pos 3887是主库的 Replicate_Ignore_Server_Ids: Master_Server_Id: 2503306 Master_UUID: f7c96432-f665-11e5-943f-000c2967a454 Master_Info_File: /data/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 160331 09:25:02 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 1 row in set (0.00 sec) 此时主从又不同步了,如果还去执行跳过错误操作主从恢复同步而且状态均为yes但这并不能解决该问题如果主库又更新该条记录那么还是会报相同错误而且pos号还会变这就导致了恢复时你不知道前一条的pos号导致丢失数据。 mysql stop slave; Query OK, 0 rows affected (0.00 sec) mysql set global sql_slave_skip_counter1; Query OK, 0 rows affected (0.00 sec) mysql start slave; Query OK, 0 rows affected (0.00 sec) mysql select * from helei; ------------ | id | text | ------------ | 1 | aa | | 2 | bb | | 4 | ee | | 5 | ff | | 6 | gg | | 7 | hh | | 8 | helei1 | ------------ 7 rows in set (0.00 sec) mysql show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.1.250 Master_User: mysync Master_Port: 2503306 Connect_Retry: 60 Master_Log_File: mysql-bin.000005 Read_Master_Log_Pos: 4119 Relay_Log_File: HE1-relay-bin.000008 Relay_Log_Pos: 283 Relay_Master_Log_File: mysql-bin.000005 Slave_IO_Running: Yes Slave_SQL_Running: Yes 这里虽然通过跳过错误达到恢复主从同步但如果主库又对该条记录更新 mysql update helei set textcccc where id3; Query OK, 1 row affected (0.00 sec) mysql show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.1.250 Master_User: mysync Master_Port: 2503306 Connect_Retry: 60 Master_Log_File: mysql-bin.000005 Read_Master_Log_Pos: 4328 Relay_Log_File: HE1-relay-bin.000008 Relay_Log_Pos: 283 Relay_Master_Log_File: mysql-bin.000005 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1032 Last_Error: Could not execute Update_rows event on table test.helei; Cant find record in helei, Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the events master log mysql-bin.000005, end_log_pos 4297 Skip_Counter: 0 Exec_Master_Log_Pos: 4119 Relay_Log_Space: 1435 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1032 Last_SQL_Error: Could not execute Update_rows event on table test.helei; Cant find record in helei, Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the events master log mysql-bin.000005, end_log_pos 4297 Replicate_Ignore_Server_Ids: Master_Server_Id: 2503306 Master_UUID: f7c96432-f665-11e5-943f-000c2967a454 Master_Info_File: /data/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 160331 09:33:34 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0
如何快速解决MySQL 1032 主从错误 GTID跳过SQL错误的脚本 #!/bin/bash #Author:wwa USER PWD HOST PORT REP REPPWD REPH REPP GTID$2 GTID_START$3 GTID_END$4 GTID_PURGE(){ echo GTID_UUID:$GTID, GTID_START:$GTID_START, GTID_END%GTID_END mysql -u$USER -p$PWD -h$HOST -P$PORT -e stop slave;reset slave;reset master;set global gtid_purged$GTID:$GTID_START-$GTID_END;CHANGE MASTER TO MASTER_HOST$REPH, MASTER_PORT$REPP, MASTER_USER$REP,MASTER_PASSWORD$REPPWD, master_auto_position1;start slave; sleep 1 mysql -u$USER -p$PWD -h$HOST -P$PORT -e show slave status\G; } GTID_SKIP(){ mysql -u$USER -p$PWD -h$HOST -P$PORT -e stop slave;set session gtid_next$GTID:$GTID_START;begin;commit;set session gtid_nextAUTOMATIC;start slave; } case $1 in GTID_PURGE) echo Start GTID_PURGE, transaction in $GTID between $GTID_START-$GTID_END will be skipped...... GTID_PURGE echo GTID_PURGE success...... ;; GTID_SKIP) echo Start GTID_SKIP, transaction $GTID:$GTID_START will be skipped...... GTID_SKIP echo GTID_SKIP success...... ;; *) echo $Usage: $0 {GTID_PURGE args1 args2 args3|GTID_SKIP args1 args2} exit 1 ;; esac GTID_PURGE() 当同步发生大量的错误时使用flush table with read lock锁住主库记录GTID的事务编号最后那个例如后面示例里面的142787然后数据同步到从库在参数中加上UUID(空格)起始事务编号(空格)中止事务编号 原理purge掉master log中同步数据的SCN之前的事务从同步时间点以后开始读取binlog 这样做的好处是不用去master操作清理binlog手抖清理了其他东西就不好了~ GTID_SKIP() 当发生少量的错误时使用show slave status\G找到UUID和出错的事务编号参数中加上 UUID(空格)事务编号 原理生成一个空事务来跳过原本出错的事务然后继续往下同步 关于如何判断GTID_SKIP()需要跳过的事务编号 假设出错时slave status是这个样子 箭头所指的代表从主库拉去的日志中包含哪些事务以编号的形式 方框所指的代表从库现在执行了哪些事务这里的意思就是从库已经执行了编号1到编号139595的事务 如果出错了说明139596事务出错了这时候执行脚本里的事务编号写上139596就行 实际使用效果如图截图时间截晚了事务ID对不上不过意思表达清楚了就好~ 结语任何发生于数据库上的操作一定要三思而后回车血的教训数不胜数所以验证无误的固定操作用脚本来实现是个不错的选择。 GTID 跳过脚本
跳过单个error
STOP SLAVE; SET gtid_next 3b977b7e-ed28-11e7-a8ff-b4969113b678:138609841; BEGIN;COMMIT; SET gtid_next AUTOMATIC; START SLAVE; show slave status \G; select * from performance_schema.replication_applier_status_by_worker;
同步事物工作状态查询
(rootlocalhost) 12:13:57 [(none)] select * from performance_schema.replication_applier_status_by_worker; ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | CHANNEL_NAME | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | | 1 | 1800788 | ON | 3b977b7e-ed28-11e7-a8ff-b4969113b678:144139359 | 0 | | 0000-00-00 00:00:00 | | | 2 | 1800789 | ON | 3b977b7e-ed28-11e7-a8ff-b4969113b678:144139248 | 0 | | 0000-00-00 00:00:00 | | | 3 | 1800790 | ON | 3b977b7e-ed28-11e7-a8ff-b4969113b678:144138411 | 0 | | 0000-00-00 00:00:00 | | | 4 | 1800791 | ON | 3b977b7e-ed28-11e7-a8ff-b4969113b678:144128311 | 0 | | 0000-00-00 00:00:00 | | | 5 | 1800792 | ON | 3b977b7e-ed28-11e7-a8ff-b4969113b678:144108749 | 0 | | 0000-00-00 00:00:00 | | | 6 | 1800793 | ON | 3b977b7e-ed28-11e7-a8ff-b4969113b678:144108752 | 0 | | 0000-00-00 00:00:00 | | | 7 | 1800794 | ON | 3b977b7e-ed28-11e7-a8ff-b4969113b678:144108746 | 0 | | 0000-00-00 00:00:00 | | | 8 | 1800795 | ON | 3b977b7e-ed28-11e7-a8ff-b4969113b678:144108747 | 0 | | 0000-00-00 00:00:00 | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- 8 rows in set (0.02 sec)
GTID_error 跳过脚本
#!/bin/bash passiforgot
#sql21mysql -uroot -p${pass} -e show slave status\G; 2/dev/null|grep -v Last_Error | grep executing transaction| awk {print $21}
#Errnomysql -uroot -p${pass} -e show slave status\G; |grep Last_SQL_Errno | awk -F : {print $2} #echo $sql21 #echo $Errno
for((i1;i100000;i)) do sql21mysql -uroot -p${pass} -e show slave status\G; 2/dev/null|grep -v Last_Error | grep executing transaction| awk {print $21}
Errnomysql -uroot -p${pass} -e show slave status\G; 2/dev/null|grep Last_SQL_Errno | awk -F : {print $2}
SBMmysql -uroot -p${pass} -e show slave status\G; 2/dev/null|grep Seconds_Behind_Master | awk -F : {print $2} #echo $sql21 #echo $Errno #sleep 0.5
if [ $Errno -eq 1061 ] || [$Errno -eq 1062 ] || [$Errno -eq 1217 ] || [$Errno -eq 1050 ] then echo ${1} echo $sql21 echo $Errno mysql -uroot -p${pass} -e STOP SLAVE;SET SESSION.GTID_NEXT ${sql21};BEGIN; COMMIT;SET SESSION.GTID_NEXT AUTOMATIC;START SLAVE; # mysql -uroot -p${pass} -e STOP SLAVE;SET SESSION.GTID_NEXT ${sql21};BEGIN; COMMIT;SET SESSION.GTID_NEXT AUTOMATIC;START SLAVE;SHOW SLAVE STATUS \G; echo mysql -uroot -p${pass} -e STOP SLAVE;SET SESSION.GTID_NEXT ${sql21};BEGIN; COMMIT;SET SESSION.GTID_NEXT AUTOMATIC;START SLAVE;SHOW SLAVE STATUS \G; else sleep 3 echo 同步延迟 ${SBM} 秒 fi done 线上MYSQL同步报错故障处理方法总结(必看篇)_Mysql_脚本之家
原脚本 #!/bin/bash # #check_mysql_slave_replication_status # # # parasum2 help_msg(){ cat help --------------------- Error Cause: you must input $parasum parameters! 1st : Host_IP 2st : Host_Port help exit } [ $# -ne ${parasum} ] help_msg #若参数不够打印帮助信息并退出 export HOST_IP$1 export HOST_PORt$2 MYUSERroot MYPASS123456 MYSQL_CMDmysql -u$MYUSER -p$MYPASS MailTitle #邮件主题 Mail_Address_MysqlStatusrootlocalhost.localdomain #收件人邮箱 time1$(date %Y%m%d%H%M%S) time2$(date %Y-%m-%d %H:%M:%S) SlaveStatusFile/tmp/salve_status_${HOST_PORT}.${time1} #邮件内容所在文件 echo --------------------Begin at: $time2 $SlaveStatusFile echo $SlaveStatusFile #get slave status ${MYSQL_CMD} -e show slave status\G $SlaveStatusFile #取得salve进程的状态 #get io_thread_status,sql_thread_status,last_errno 取得以下状态值 IOStatus$(cat $SlaveStatusFile|grep Slave_IO_Running|awk {print $2}) SQLStatus$(cat $SlaveStatusFile|grep Slave_SQL_Running |awk {print $2}) Errno$(cat $SlaveStatusFile|grep Last_Errno | awk {print $2}) Behind$(cat $SlaveStatusFile|grep Seconds_Behind_Master | awk {print $2}) echo $SlaveStatusFile if [ $IOStatus No ] || [ $SQLStatus No ];then #判断错误类型 if [ $Errno -eq 0 ];then #可能是salve线程未启动 $MYSQL_CMD -e start slave io_thread;start slave sql_thread; echo Cause slave threads doesnots running,trying start slsave io_thread;start slave sql_thread; $SlaveStatusFile MailTitle[Warning] Slave threads stoped on $HOST_IP $HOST_PORT elif [ $Errno -eq 1007 ] || [ $Errno -eq 1053 ] || [ $Errno -eq 1062 ] || [ $Errno -eq 1213 ] || [ $Errno -eq 1032 ]\ || [ Errno -eq 1158 ] || [ $Errno -eq 1159 ] || [ $Errno -eq 1008 ];then #忽略此些错误 $MYSQL_CMD -e stop slave;set global sql_slave_skip_counter1;start slave; echo Cause slave replication catch errors,trying skip counter and restart slave;stop slave ;set global sql_slave_skip_counter1;slave start; $SlaveStatusFile MailTitle[Warning] Slave error on $HOST_IP $HOST_PORT! ErrNum: $Errno else echo Slave $HOST_IP $HOST_PORT is down! $SlaveStatusFile MailTitle[ERROR]Slave replication is down on $HOST_IP $HOST_PORT ! ErrNum:$Errno fi fi if [ -n $Behind ];then Behind0 fi echo $Behind $SlaveStatusFile #delay behind master 判断延时时间 if [ $Behind -gt 300 ];then echo date %Y-%m%d %H:%M:%S slave is behind master $Bebind seconds! $SlaveStatusFile MailTitle[Warning]Slave delay $Behind seconds,from $HOST_IP $HOST_PORT fi if [ -n $MailTitle ];then #若出错或者延时时间大于300s则发送邮件 cat ${SlaveStatusFile} | /bin/mail -s $MailTitle $Mail_Address_MysqlStatus fi #del tmpfile:SlaveStatusFile $SlaveStatusFile
修改后脚本
只做了简单的整理修正了Behind为NULL的判断但均未测试
应可考虑增加
对修复执行结果的判断多条错误的循环修复、检测、再修复
取消SlaveStatusFile临时文件。
Errno、Behind两种告警分别发邮件告警正文增加show slave结果原文。
增加PATH以便加到crontab中。
考虑crontab中周期执行(加锁避免执行冲突、执行周期选择)
增加执行日志 #!/bin/sh # check_mysql_slave_replication_status # 参考:http://www.tianfeiyu.com/?p2062 Usage(){ echo Usage: echo $0 HOST PORT USER PASS } [ -z $1 -o -z $2 -o -z $3 -o -z $4 ] Usage exit 1 HOST$1 PORT$2 USER$3 PASS$4 MYSQL_CMDmysql -h$HOST -P$PORT -u$USER -p$PASS MailTitle #邮件主题 Mail_Address_MysqlStatusrootlocalhost.localdomain #收件人邮箱 time1$(date %Y%m%d%H%M%S) time2$(date %Y-%m-%d %H:%M:%S) SlaveStatusFile/tmp/salve_status_${HOST_PORT}.${time1} #邮件内容所在文件 echo --------------------Begin at: $time2 $SlaveStatusFile echo $SlaveStatusFile #get slave status ${MYSQL_CMD} -e show slave status\G $SlaveStatusFile #取得salve进程的状态 #get io_thread_status,sql_thread_status,last_errno 取得以下状态值 IOStatus$(cat $SlaveStatusFile|grep Slave_IO_Running|awk {print $2}) SQLStatus$(cat $SlaveStatusFile|grep Slave_SQL_Running |awk {print $2}) Errno$(cat $SlaveStatusFile|grep Last_Errno | awk {print $2}) Behind$(cat $SlaveStatusFile|grep Seconds_Behind_Master | awk {print $2}) echo $SlaveStatusFile if [ $IOStatus No -o $SQLStatus No ];then case $Errno in 0) # 可能是slave未启动 $MYSQL_CMD -e start slave io_thread;start slave sql_thread; echo Cause slave threads doesnots running,trying start slsave io_thread;start slave sql_thread; $SlaveStatusFile ;; 1007|1053|1062|1213|1032|1158|1159|1008) # 忽略这些错误 $MYSQL_CMD -e stop slave;set global sql_slave_skip_counter1;start slave; echo Cause slave replication catch errors,trying skip counter and restart slave;stop slave ;set global sql_slave_skip_counter1;slave start; $SlaveStatusFile MailTitle[Warning] Slave error on $HOST:$PORT! ErrNum: $Errno ;; *) echo Slave $HOST:$PORT is down! $SlaveStatusFile MailTitle[ERROR]Slave replication is down on $HOST:$PORT! Errno:$Errno ;; esac fi if [ $Behind NULL -o -z $Behind ];then Behind0 fi echo Behind:$Behind $SlaveStatusFile #delay behind master 判断延时时间 if [ $Behind -gt 300 ];then echo date %Y-%m%d %H:%M:%S slave is behind master $Bebind seconds! $SlaveStatusFile MailTitle[Warning]Slave delay $Behind seconds,from $HOST $PORT fi if [ -n $MailTitle ];then #若出错或者延时时间大于300s则发送邮件 cat ${SlaveStatusFile} | /bin/mail -s $MailTitle $Mail_Address_MysqlStatus fi #del tmpfile:SlaveStatusFile $SlaveStatusFile