Geek City: What gets logged for index rebuild operations?

This blog post was inspired by a question from a future student. Someone who was already booked for my SQL Server Internals class asked for some information on a current problem he was having with transaction log writes causing excessive wait times during index rebuild operations when run in ONLINE mode. He wanted to know if switching to BULK_LOGGED recovery could help.

I knew the difference between ALTER INDEX in FULL vs BULK_LOGGED recovery when doing normal OFFLINE rebuilds, but I wasn't sure about ONLINE. So I dug out some old scripts, massaged them a bit, and ran some tests.

A minimally logged operation is one that does not always log every single row. In FULL mode, SQL Server will log the entire page as each page is filled, so that we end up with individual  log records of over 8K. In BULK_LOGGED mode, SQL Server just logs the fact that a new page was generated (and some information about the page) and doesn't log the full 8K. So the log space used is quite a bit less.

I will use Microsoft's AdventureWorks2014 database that you can download from github. I used the SQL Server 2005 version.

After making sure the database is set to FULL recovery, I make a copy of the Sales.SalesOrderDetail table (which I call Details) and build a clustered index on it.  The table has about 1580 pages and 121,317 rows.


ALTER DATABASE AdventureWorks2014 SET RECOVERY FULL
GO
USE AdventureWorks2014;
GO
DROP TABLE IF EXISTS Details;
GO     
 
SELECT * INTO Details
FROM Sales.SalesOrderDetail;
GO
CREATE CLUSTERED INDEX CL_details_indx ON Details(SalesOrderID);
GO

I then build a table to store log information that I will gather after the different index rebuild operations. After that I back up the database to make sure I have a base for making log backups. (You might need to create the folder to store the backups.)

DROP TABLE IF EXISTS AlterIndexLoggingData;
GO

CREATE TABLE AlterIndexLoggingData
(DBRecModel CHAR(12), OnlineIndexBuild CHAR(3), MaxLogRowSize INT, SumLogSize BIGINT, NumLogRows INT, NumLargeRows INT);
GO
BACKUP DATABASE AdventureWorks2014 TO DISK = 'c:\backups\AW.bak' WITH INIT;
GO

Now starts the testing. There are four sections, each of which basically does the same thing, with the four permutations of FULL vs BULK_LOGGED recovery, and ONLINE vs OFFLINE index builds.  Each section backs up the transaction log to truncate it, and then reports the number of rows in the log, just to verify we really have (logically) cleared it out. It then performs a rebuild of the index on the Details table. Finally, it captures the following information from the log using fn_dblog:

  • Max log record size
  • Sum of log record sizes
  • Number of log records
  • Number of log records larger than 8K

----  FULL Recovery, ONLINE = OFF

BACKUP LOG AdventureWorks2014 TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
GO
ALTER INDEX  CL_details_indx  ON Details REBUILD;
GO

INSERT INTO AlterIndexLoggingData
SELECT 'FULL', 'OFF',
        MAX([Log Record Length]),
        SUM([Log Record Length]),
        COUNT(*),
        (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
        FROM fn_dblog(null, null);
GO
 
 
--- FULL Recovery, ONLINE = ON


BACKUP LOG AdventureWorks2016 TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
GO

ALTER INDEX  CL_details_indx  ON Details REBUILD
WITH (ONLINE = ON);
GO

INSERT INTO AlterIndexLoggingData
SELECT 'FULL', 'ON',
        MAX([Log Record Length]),
        SUM([Log Record Length]),
        COUNT(*),
        (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
        FROM fn_dblog(null, null);
GO
--SELECT * FROM AlterIndexLoggingData;
GO

--- BULK_LOGGED Recovery; ONLINE = OFF


ALTER DATABASE AdventureWorks2014 SET RECOVERY BULK_LOGGED;
BACKUP LOG AdventureWorks2014 TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
GO

ALTER INDEX  CL_details_indx  ON Details REBUILD;
GO

INSERT INTO AlterIndexLoggingData
SELECT 'BULK_LOGGED', 'OFF',
        MAX([Log Record Length]),
        SUM([Log Record Length]),
        COUNT(*),
        (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
        FROM fn_dblog(null, null);
GO
--SELECT * FROM AlterIndexLoggingData;
GO

--- BULK_LOGGED Recovery, ONLINE = ON


BACKUP LOG AdventureWorks2014 TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
GO

ALTER INDEX  CL_details_indx  ON Details REBUILD WITH (ONLINE = ON);
GO

INSERT INTO AlterIndexLoggingData
SELECT 'BULK_LOGGED', 'ON',
        MAX([Log Record Length]),
        SUM([Log Record Length]),
        COUNT(*),
        (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
        FROM fn_dblog(null, null);
GO
SELECT * FROM AlterIndexLoggingData;
GO

Here are my results:

Figure 1.png

Notice the following:

  • Only FULL recovery with an OFFLINE rebuild logged the 8K pages.
  • FULL recovery with an ONLINE rebuild actually did log every single row; in fact, it looks like at least two log records were generated for each row. There is a LOT of logging for this set of options and by far, the largest amount of log space used.
  • BULK_LOGGED mode used substantially less log space than FULL, but the difference between ONLINE and OFFLINE is even more pronounced. In FULL recovery, ONLINE used about twice the log space, in BULK_LOGGED, the logging required more than 8 times the space!

I'm sure there are more fun facts you can pull out of the results from fn_dblog, but for my first look at the logging differences between ONLINE and OFFLINE index rebuilds, this is what I wanted to know. It's not a complete answer to the student's question, but it does indicate there is a lot more logging with ONLINE index builds, so it should not be surprising if there are a lot more log write waits. And I would assume that switching to BULK_LOGGED recovery could make a noticeable difference.

I hope this is useful!

~Kalen