在我的生产错误日志中,我偶尔会看到:

SQLSTATE[HY000]:一般错误:1205 超过锁等待超时;试一试 重新启动事务

我知道哪个查询在那个时刻试图访问数据库,但是是否有一种方法可以找出哪个查询在那个精确的时刻拥有锁?


暴露这一点的是“事务”这个词。从语句中可以明显看出,该查询试图更改一个或多个InnoDB表中的至少一行。

因为您知道这个查询,所以所有被访问的表都可能是罪魁祸首。

从那里,你应该可以运行SHOW ENGINE INNODB STATUS\G

您应该能够看到受影响的表

你会得到各种额外的锁定和互斥信息。

下面是我的一个客户的例子:

mysql> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
110514 19:44:14 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 4 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 9014315, signal count 7805377
Mutex spin waits 0, rounds 11487096053, OS waits 7756855
RW-shared spins 722142, OS waits 211221; RW-excl spins 787046, OS waits 39353
------------------------
LATEST FOREIGN KEY ERROR
------------------------
110507 21:41:35 Transaction:
TRANSACTION 0 606162814, ACTIVE 0 sec, process no 29956, OS thread id 1223895360 updating or deleting, thread declared inside InnoDB 499
mysql tables in use 1, locked 1
14 lock struct(s), heap size 3024, 8 row lock(s), undo log entries 1
MySQL thread id 3686635, query id 124164167 10.64.89.145 viget updating
DELETE FROM file WHERE file_id in ('6dbafa39-7f00-0001-51f2-412a450be5cc' )
Foreign key constraint fails for table `backoffice`.`attachment`:
,
  CONSTRAINT `attachment_ibfk_2` FOREIGN KEY (`file_id`) REFERENCES `file` (`file_id`)
Trying to delete or update in parent table, in index `PRIMARY` tuple:
DATA TUPLE: 17 fields;
 0: len 36; hex 36646261666133392d376630302d303030312d353166322d343132613435306265356363; asc 6dbafa39-7f00-0001-51f2-412a450be5cc;; 1: len 6; hex 000024214f7e; asc   $!O~;; 2: len 7; hex 000000400217bc; asc    @   ;; 3: len 2; hex 03e9; asc   ;; 4: len 2; hex 03e8; asc   ;; 5: len 36; hex 65666635323863622d376630302d303030312d336632662d353239626433653361333032; asc eff528cb-7f00-0001-3f2f-529bd3e3a302;; 6: len 40; hex 36646234376337652d376630302d303030312d353166322d3431326132346664656366352e6d7033; asc 6db47c7e-7f00-0001-51f2-412a24fdecf5.mp3;; 7: len 21; hex 416e67656c73204e6f7720436f6e666572656e6365; asc Angels Now Conference;; 8: len 34; hex 416e67656c73204e6f7720436f6e666572656e6365204a756c7920392c2032303131; asc Angels Now Conference July 9, 2011;; 9: len 1; hex 80; asc  ;; 10: len 8; hex 8000124a5262bdf4; asc    JRb  ;; 11: len 8; hex 8000124a57669dc3; asc    JWf  ;; 12: SQL NULL; 13: len 5; hex 8000012200; asc    " ;; 14: len 1; hex 80; asc  ;; 15: len 2; hex 83e8; asc   ;; 16: len 4; hex 8000000a; asc     ;;

But in child table `backoffice`.`attachment`, in index `PRIMARY`, there is a record:
PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 30; hex 36646261666133392d376630302d303030312d353166322d343132613435; asc 6dbafa39-7f00-0001-51f2-412a45;...(truncated); 1: len 30; hex 38666164663561652d376630302d303030312d326436612d636164326361; asc 8fadf5ae-7f00-0001-2d6a-cad2ca;...(truncated); 2: len 6; hex 00002297b3ff; asc   "   ;; 3: len 7; hex 80000040070110; asc    @   ;; 4: len 2; hex 0000; asc   ;; 5: len 30; hex 416e67656c73204e6f7720436f6e666572656e636520446f63756d656e74; asc Angels Now Conference Document;;

------------
TRANSACTIONS
------------
Trx id counter 0 620783814
Purge done for trx's n:o < 0 620783800 undo n:o < 0 0
History list length 35
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 29956, OS thread id 1192212800
MySQL thread id 5341758, query id 189708501 127.0.0.1 lwdba
show innodb status
---TRANSACTION 0 620783788, not started, process no 29956, OS thread id 1196472640
MySQL thread id 5341773, query id 189708353 10.64.89.143 viget
---TRANSACTION 0 0, not started, process no 29956, OS thread id 1223895360
MySQL thread id 5341667, query id 189706152 10.64.89.145 viget
---TRANSACTION 0 0, not started, process no 29956, OS thread id 1227888960
MySQL thread id 5341556, query id 189699857 172.16.135.63 lwdba
---TRANSACTION 0 620781112, not started, process no 29956, OS thread id 1222297920
MySQL thread id 5341511, query id 189696265 10.64.89.143 viget
---TRANSACTION 0 620783736, not started, process no 29956, OS thread id 1229752640
MySQL thread id 5339005, query id 189707998 10.64.89.144 viget
---TRANSACTION 0 620783785, not started, process no 29956, OS thread id 1198602560
MySQL thread id 5337583, query id 189708349 10.64.89.145 viget
---TRANSACTION 0 620783469, not started, process no 29956, OS thread id 1224161600
MySQL thread id 5333500, query id 189708478 10.64.89.144 viget
---TRANSACTION 0 620781240, not started, process no 29956, OS thread id 1198336320
MySQL thread id 5324256, query id 189708493 10.64.89.145 viget
---TRANSACTION 0 617458223, not started, process no 29956, OS thread id 1195141440
MySQL thread id 736, query id 175038790 Has read all relay log; waiting for the slave I/O thread to update it
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
519878 OS file reads, 18962880 OS file writes, 13349046 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 6.25 writes/s, 4.50 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 1190, seg size 1192,
174800 inserts, 174800 merged recs, 54439 merges
Hash table size 35401603, node heap has 35160 buffer(s)
0.50 hash searches/s, 11.75 non-hash searches/s
---
LOG
---
Log sequence number 28 1235093534
Log flushed up to   28 1235093534
Last checkpoint at  28 1235091275
0 pending log writes, 0 pending chkp writes
12262564 log i/o's done, 3.25 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 18909316674; in additional pool allocated 1048576
Dictionary memory allocated 2019632
Buffer pool size   1048576
Free buffers       175763
Database pages     837653
Modified db pages  6
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 770138, created 108485, written 7795318
0.00 reads/s, 0.00 creates/s, 4.25 writes/s
Buffer pool hit rate 1000 / 1000
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 29956, id 1185823040, state: sleeping
Number of rows inserted 6453767, updated 4602534, deleted 3638793, read 388349505551
0.25 inserts/s, 1.25 updates/s, 0.00 deletes/s, 2.75 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

1 row in set, 1 warning (0.00 sec)

你应该考虑通过设置innodb_lock_wait_timeout来增加InnoDB的锁等待超时时间,默认值是50秒

mysql> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 50    |
+--------------------------+-------+
1 row in set (0.01 sec)

你可以用这一行在/etc/my.cnf中永久地将它设置为更高的值

[mysqld]
innodb_lock_wait_timeout=120

重新启动mysql。如果你不能在这个时候重新启动mysql,运行这个:

SET GLOBAL innodb_lock_wait_timeout = 120; 

您也可以将它设置为会话的持续时间

SET innodb_lock_wait_timeout = 120; 

然后是你的问题

看看pt-deadlock-logger实用程序的手册页:

brew install percona-toolkit
pt-deadlock-logger --ask-pass server_name

它从上面提到的引擎innodb状态中提取信息 它可以用来创建一个守护进程,每30秒运行一次。

从上面Rolando的回答推断,是这些阻碍了你的查询:

---TRANSACTION 0 620783788, not started, process no 29956, OS thread id 1196472640
MySQL thread id 5341773, query id 189708353 10.64.89.143 viget

如果你需要执行你的查询,不能等待其他人运行,使用MySQL线程id杀死他们:

kill 5341773 <replace with your thread id>

(从mysql内部,而不是shell,显然)

你必须找到线程id从:

show engine innodb status\G

命令,并找出哪个是阻塞数据库的程序。

正如有人在关于这个问题的众多SO线程中的一个线程中提到的:有时已经锁定表的进程在进程列表中显示为休眠!我非常着急,直到我杀死了数据库中打开的所有睡眠线程(当时没有一个是活动的)。这最终解锁了表并让更新查询运行。

评论者说了类似于“有时MySQL线程锁定了一个表,然后在等待与MySQL无关的事情发生时进入睡眠状态。”

在重新检查了show engine innodb状态日志后(一旦我找到了负责锁的客户端),我注意到被卡住的线程被列在事务列表的最底部,在由于锁冻结而即将出错的活动查询下面:

------------------
---TRANSACTION 2744943820, ACTIVE 1154 sec(!!)
2 lock struct(s), heap size 376, 2 row lock(s), undo log entries 1
MySQL thread id 276558, OS thread handle 0x7f93762e7710, query id 59264109 [ip] [database] cleaning up
Trx read view will not see trx with id >= 2744943821, sees < 2744943821

(不确定“Trx读视图”消息是否与冻结锁相关,但与其他活动事务不同的是,这个事务不显示与发出的查询一起,而是声称事务正在“清理”,但有多个行锁)

这个故事的寓意是,即使线程处于睡眠状态,事务也可以是活动的。

你可以使用:

show full processlist

它将列出MySQL中的所有连接和连接的当前状态以及正在执行的查询。还有一个更短的变种show processlist;它显示截断的查询以及连接统计信息。

争用越多,出现死锁的可能性就越大,DB引擎将通过对其中一个死锁事务进行超时处理来解决这个问题。

此外,已修改(例如UPDATE或DELETE)大量条目的长时间运行的事务更有可能与其他事务产生冲突。

虽然InnoDB MVCC,你仍然可以使用FOR UPDATE子句请求显式锁。然而,与其他流行的db (Oracle, MSSQL, PostgreSQL, DB2)不同,MySQL使用REPEATABLE_READ作为默认隔离级别。

现在,您获得的锁(通过修改行或使用显式锁定)将在当前运行的事务期间保持。如果你想了解REPEATABLE_READ和READ COMMITTED在锁方面的区别,请阅读这篇Percona文章。

在REPEATABLE READ中,事务期间获得的每个锁都被持有 在事务持续时间内。 在READ COMMITTED中,不匹配扫描的锁在STATEMENT完成后被释放。 ... 这意味着在READ COMMITTED中,其他事务可以自由地更新它们在update语句完成后无法更新的行(在REPEATABLE READ中)。

因此:隔离级别(REPEATABLE_READ, SERIALIZABLE)越严格,死锁的可能性就越大。这不是“本质上”的问题,而是一种权衡。

使用READ_COMMITTED可以获得非常好的结果,因为在使用跨越多个HTTP请求的逻辑事务时,需要防止应用程序级的丢失更新。乐观锁定方法针对即使使用SERIALIZABLE隔离级别也可能发生的丢失更新,同时通过允许使用READ_COMMITTED来减少锁争用。

这个异常的最大问题是,它通常在测试环境中不可重现,当它发生在prod上时,我们无法运行innodb引擎状态。所以在其中一个项目中,我把下面的代码放入了这个异常的catch块中。这帮助我在异常发生时捕捉引擎状态。这帮了大忙。

Statement st = con.createStatement();
ResultSet rs =  st.executeQuery("SHOW ENGINE INNODB STATUS");
while(rs.next()){
    log.info(rs.getString(1));
    log.info(rs.getString(2));
    log.info(rs.getString(3));
}

在记录中,锁等待超时异常也发生在出现死锁且MySQL无法检测到它时,因此它只是超时。另一个原因可能是一个运行时间非常长的查询,但是它更容易解决/修复,我在这里不描述这种情况。

如果死锁在两个事务中被“正确地”构造,MySQL通常能够处理死锁。然后MySQL只是杀死/回滚一个拥有较少锁的事务(不太重要,因为它影响的行较少),并让另一个事务完成。

现在,让我们假设有两个进程A和B和3个事务:

Process A Transaction 1: Locks X
Process B Transaction 2: Locks Y
Process A Transaction 3: Needs Y => Waits for Y
Process B Transaction 2: Needs X => Waits for X
Process A Transaction 1: Waits for Transaction 3 to finish

(see the last two paragraph below to specify the terms in more detail)

=> deadlock 

这是一个非常不幸的设置,因为MySQL无法看到有死锁(跨越3个事务)。MySQL做的是…没有什么!它只是等待,因为它不知道该做什么。它等待直到第一个获得的锁超过超时(进程A事务1:锁X),然后这将解锁锁X,从而解锁事务2等。

The art is to find out what (which query) causes the first lock (Lock X). You will be able to see easily (show engine innodb status) that Transaction 3 waits for Transaction 2, but you will not see which transaction Transaction 2 is waiting for (Transaction 1). MySQL will not print any locks or query associated with Transaction 1. The only hint will be that at the very bottom of the transaction list (of the show engine innodb status printout), you will see Transaction 1 apparently doing nothing (but in fact waiting for Transaction 3 to finish).

关于如何找到哪个SQL查询导致锁(lock X)被授予正在等待的给定事务的技术,在此处描述了在长时间运行的事务中跟踪MySQL查询历史

如果您想知道示例中的流程和事务究竟是什么。该进程是一个PHP进程。Transaction是由innodb-trx-table定义的事务。在我的例子中,我有两个PHP进程,在每个进程中我手动启动一个事务。有趣的是,即使我在一个进程中启动了一个事务,MySQL内部实际上使用了两个独立的事务(我不知道为什么,也许MySQL开发人员可以解释)。

MySQL is managing its own transactions internally and decided (in my case) to use two transactions to handle all the SQL requests coming from the PHP process (Process A). The statement that Transaction 1 is waiting for Transaction 3 to finish is an internal MySQL thing. MySQL "knew" the Transaction 1 and Transaction 3 were actually instantiated as part of one "transaction" request (from Process A). Now the whole "transaction" was blocked because Transaction 3 (a subpart of "transaction") was blocked. Because "transaction" was not able to finish the Transaction 1 (also a subpart of the "transaction") was marked as not finished as well. This is what I meant by "Transaction 1 waits for Transaction 3 to finish".

激活MySQL general.log(磁盘密集型)并使用mysql_analyse_general_log.pl来提取长时间运行的事务,例如:

——min-duration=你的innodb_lock_wait_timeout值

然后禁用general.log。

如果您正在使用JDBC,那么您可以选择 includeInnodbStatusInDeadlockExceptions = true

https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html

下面是我最终不得不做的事情,以找出是什么“其他查询”导致了锁定超时问题。在应用程序代码中,我们在专用于此任务的单独线程上跟踪所有挂起的数据库调用。如果任何DB调用的时间超过n秒(对我们来说是30秒),我们记录:

-- Pending InnoDB transactions
SELECT * FROM information_schema.innodb_trx ORDER BY trx_started; 

-- Optionally, log what transaction holds what locks
SELECT * FROM information_schema.innodb_locks;

通过上述方法,我们能够精确定位锁定导致死锁的行的并发查询。在我的例子中,它们是像INSERT…与普通SELECT不同,SELECT锁定底层行。然后可以重新组织代码或使用不同的事务隔离(如read uncommitted)。

好运!

在我的例子中,我只能使用SELECT命令查询数据库,而不能使用UPDATE和DELETE命令。

这个问题是由于一根线卡住了。它没有响应,状态为“sleep”。所以,我必须杀了它。

1-查找线程ID:

select * FROM information_schemaprocesslist ORDER BY id

2-对我来说,它返回了两个线程。现在的那个和卡住的那个。我使用“kill THREAD_ID”来杀死它。

更多信息请看这里 https://oracle-base.com/articles/mysql/mysql-killing-threads