In working with SQL Server, you may have encountered errors related to unresolved deadlock and scheduler deadlock. Many people blend these errors together, but they are actually separate issues. In this post, we provide a high-level explanation of both types of deadlock errors.
In general, an unresolved deadlock is a SQL Server product issue. When an unresolved deadlock is identified, SQL Server adds a record on the error log and captures a mini-dump.
NOTE: The product bug mentioned in this section was resolved on Service Pack 2 of SQL Server 2005, and became a permanent fix beginning with SQL Server 2008.
On SQL Server, deadlocks are recognized and resolved by the Lock Monitor. The Lock Monitor constructs a lock graph – a partial visualization of which can be seen from the deadlock trace event. As the lock graph develops, the cycles can be tracked, detecting any deadlock conditions that exist. If a deadlock condition is detected, the Lock Monitor selects and aborts the victim transaction.
Unresolved deadlock occurs when SQL Server has identified the deadlock, but is unable to select a victim transaction to terminate.
In cases like this, SQL Server is not able to select an eligible victim transaction for termination because the target is unsafe. To provide a more clear example of an unsafe victim, consider the following scenario. Let’s assume that a session opens a transaction, completes some work, and encounters an error. Naturally, the session will begin rollback processing – but what if, due to a product bug, it has to wait on a lock that results in a deadlock loop?
Because the session is already in rollback, it is unsafe to force an abort that command. Rollback failure causes SQL Server to take the database offline. This results in the elimination of all active sessions on the database. The database will reopen to complete crash revovery after all connections are terminated. If this process is successful, the database will be available for use. If crash recovery fails, the database will be labeled as suspect. Clearly, this is not a desirable result for your production server.
Although more difficult, reproducing the issue is the quickest way to fix unresolved deadlock. A reproduction plus the associated mini-dump capture will allow your specialized support team to identify the source of the problem.
Scheduler Deadlock (Error 17884)
Scheduler deadlock is a completely separate issue from unresolved deadlock.
On SQL Server, a deadlocked scheduler is recognized by the Scheduler Monitor. A scheduler deadlock is flagged when the scheduler stops making forward progress. The scheduler deadlock algorithm examines the scheduler’s task qeue and monitors whether or not those tasks are being picked up.
Approximately every 5 seconds, the Scheduler Monitor completes a scheduler check, looping over the scheduler to check its operational status, based on criteria such as the number of yields and work processed.
Here is what the Schedule Monitor is checking:
- Are there any tasks in the scheduler queue waiting to be processed?
- Since the last check, have any new worker threads been created?
- Has any new work been processed since the last scheduler check?
After checking all schedulers within the node (there is a Scheduler Monitor per each node) if all schedulers are stuck – making no progress, a 17884 error condition (i.e. deadlocked scheduler) is flagged. If the 17884 condition lasts for an unwanted period, it is officially declared and added in the SQL Server error log. The error log will also record if a majority of the wait states are caused by a single resource.
The quickest way to attack a scheduler deadlock is to login with the dedicated admin connection (DAC) and query these two DMVs: sys.dm_os_wait as well as sys.dm_exec_requests.