SharePoint on Virtual Machines – Is Disk Defragmenting necessary for Performance?

Recently I got into a bit of a debate with an IT dept. that was confident that defragmenting the guest OS’s NTFS file system (with a SharePoint farm on it) was not necessary because they were using a SAN and VMware. I have always found it a bit weird how little attention is typically paid to disk defragmentation on IIS servers; one gets the vibe that disk defragmenting is such a basic task that it should be relegated to Grandma’s bag of tricks.

The rise of virtual machines, fancier SANs, solid state disks et al. seems to have pushed defragmenting to an even dorkier dinosaur status. It’s a popular belief that huge disk buffer caches and smart host disks sort out the guest machine’s messy kitchen at the lower levels, and that defragging at the guest level is now an obsolete, ineffectual and even I/O-degrading process.

VMware has the following post on the subject:
http://blogs.vmware.com/vsphere/2011/09/should-i-defrag-my-guest-os.html

Sounds good, guess we are all sold, right? The IT dept. gets one less maintenance plan to worry about, more incentive to virtualize everything, life’s great. Aha.. but the commenters are smart cookies too:

Andrew:

So while I am in agreement with some of the assertions in this article, I feel the conclusion is not complete. One overriding rule taken from the experience I’ve had with vmware (primarily storage) is that the entire stack must be optimized for best results or it will cause stress on the next weak point elsewhere in the stack. Focusing on one part while ignoring another will not yield best results. I feel that was done in this write up by focusing on the lower layers (SAN, channel, and VMFS), but completely missed the guest layer!

He is accurate in speaking that as a rule, running defrag against TP (thin provision), LC (linked clone), or auto-tiering, is a bad idea and should be avoided. However, in the case of systems that are designed at the outset to be a high IO/low latency NTFS filesystem, TP and LC wouldn’t be used, and auto-tiering hasn’t been around long enough to employ. Thus, we’ll assume in this conversation we’re using a plain-jane thick-provisioned FC disk on a shared VMFS filesystem.

SAN technology abstracts physical disk from the server. This is well known and understood: the ESX doesn’t talk to the disks, it talks to the cache on the frame, thus a defrag operation (take block at location A and move to location B) doesn’t really “move” the block, as the cache deals with that, so defrag will not have any benefits at the SAN layer. Additionally, by its nature, vmware will always be pure random IO from the frame’s perspective, and defrag can’t gain us anything there either.

Now the big part that the author failed to look at: how things are from the NTFS point of view in the guest OS. This is a HUGE consideration. Every file location on NTFS volume is tracked in the MFT (master file table). The MFT is a flat linear file and 1024 bytes is allocated per MFT entry that holds file attributes and extent data which describes each extent that a file sits on in the file system. An extent in this context is defined as a series of contiguous NTFS clusters (blocks). A contiguous file has one extent entry, essentially “Starting offset and length”. A fragmented file can have many extent entries. Additionally a heavily fragmented file may fill up the 1024 bytes for its MFT table entry and it would have to append a new MFT entry to continue with the extent descriptors. Remember I said it’s a linear table, thus it can only be placed at the end of the MFT. Now lets take the fragmentation to extreme and the MFT reserved space fills up? The system will start taking free space blocks and reserving IT for the MFT, now the MFT itself is fragmented. So instead of the guest OS issuing two reads (one for the MFT table entry and one for the actual data), it would have to do multiple reads just to get the MFT and then many more additional reads to read in each extent. Now multiply this times the quantity of systems chattering down the same FC channel to the VMFS and you quickly have performance degradation and the CIO calling you asking why his email is taking forever to load up. 🙂 Granted, that one can likely mask this by implementing auto-tiering at the SAN or widening the IO channel, or setting up preferential shares/limits on VM IO access, at the end of the day the stack is not optimized and it will have to be addressed.

In conclusion, while I feel that the author makes valid points that defrag is not necessary for the most general of scenarios, it doesn’t look at the entire ecosystem as a whole, and thusly is flawed. For maximum performance and efficiency to scale (up or out), I still advocate that defrag does have real and tangible benefits in the virtual environment and should be implemented on targeted systems where warranted based on the storage characteristics of those workloads.
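
To put rough numbers on Andrew’s MFT argument, here is a toy Python model of it. It is only a sketch: the 1024-byte record size comes from his comment, but the header overhead and the per-extent descriptor size are assumptions I picked for illustration (real NTFS extent descriptors are variable length), and it ignores caching entirely.

```python
import math

MFT_RECORD_BYTES = 1024   # per-entry size quoted in the comment above
HEADER_BYTES = 424        # ASSUMPTION: space used by the header and other attributes
EXTENT_ENTRY_BYTES = 8    # ASSUMPTION: average size of one extent descriptor

USABLE_BYTES = MFT_RECORD_BYTES - HEADER_BYTES


def mft_records_needed(extents: int) -> int:
    """MFT records needed to describe a file made of `extents` fragments."""
    if extents < 1:
        raise ValueError("a file with data has at least one extent")
    return math.ceil(extents * EXTENT_ENTRY_BYTES / USABLE_BYTES)


def reads_to_fetch_file(extents: int) -> int:
    """Uncached worst case: one read per MFT record plus one read per extent."""
    return mft_records_needed(extents) + extents


if __name__ == "__main__":
    for extents in (1, 50, 500, 5000):
        print(
            f"{extents:>5} extents -> "
            f"{mft_records_needed(extents):>3} MFT records, "
            f"{reads_to_fetch_file(extents):>5} reads"
        )
```

With these made-up constants a contiguous file costs the two reads Andrew describes, while a file in 500 pieces spills over into several MFT records before a single byte of data is read. The exact figures mean nothing; the shape of the growth is the point.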

And another one, from Bob Nolan, president of the defrag vendor Raxco, who make PerfectDisk:

Bob Nolan

Andrew’s comments about NTFS are the essence of this problem. The work is being done in the guest and NTFS behavior affects everything downstream including VMware and the disks.

Our company is a VMware Elite TAP member and the developers of a Windows guest optimizer. Last year we worked with Scott Drumonds from VMware’s performance engineering group to quantify the benefits of defragmenting guest servers. We used VMware’s vscsiStats utility to collect the data and there were several metrics that were very interesting.

1. Total I/O across the stack was reduced by 28%.
2. A 12x increase in the largest I/O transfers
3. A 50% reduction in I/O taking more than 30ms to complete
4. Sequential I/O increased by over 50%
5. System throughput increased 28%.

Fewer and larger I/O produce fewer SCSI commands across the stack. This in turn reduces physical I/O to the disk with a positive effect on disk latency and throughput.

Increasingly vendors are including features that work around and/or accommodate VMware. Allowing or disallowing defrag based on VM drive type and setting optimization strategy is one example. There are also strategies for working with thin-provisioning.
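
Bob’s “fewer and larger I/O” point can be sketched the same way. The Python below is my own illustration, not anything from Raxco or VMware: it assumes every extent needs at least one read command of its own (a single read covers one contiguous block range) and that a single command can move at most `max_transfer_kb`, a made-up limit of 1 MB. The file size and fragment size are also just example values.

```python
import math


def io_commands(extent_sizes_kb, max_transfer_kb=1024):
    """Count read commands needed to fetch a file laid out as the given extents.

    Each extent is read separately and split into chunks of at most
    max_transfer_kb (a hypothetical per-command transfer limit).
    """
    return sum(math.ceil(size / max_transfer_kb) for size in extent_sizes_kb)


FILE_KB = 64 * 1024                    # a 64 MB file

contiguous = [FILE_KB]                 # one extent
fragmented = [64] * (FILE_KB // 64)    # the same data in 64 KB fragments

if __name__ == "__main__":
    print("contiguous:", io_commands(contiguous), "commands")
    print("fragmented:", io_commands(fragmented), "commands")
```

Same file, same bytes, but the fragmented layout needs many more, much smaller commands, which is the direction of the effect the vscsiStats numbers above describe.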

Bob backs up this position in the following posts, which expand on a white paper on their company blog (reposted from VMBlog.com):

White Paper: Maximizing VMware ESX Performance Through Defragmentation of Guest Systems

Bob Nolan

Virtualized platforms can suffer from resource contention issues. Why wouldn’t performance be an issue when there are multiple instances of Windows Server running on the same physical machine? Performance issues on virtual machines stem from their competition for a finite pool of CPU, memory and disk resources. Of these, disk resources are the most important since the disk is the slowest component of the three. To the extent you are hammering away at the disk, you are also consuming excess CPU and memory, depriving other VMs access to these resources. In this two-part article, we will look at one potential source of IO problems, its impact on virtual performance and possible solutions.

Just like their physical counterparts, performance issues on virtual machines stem from their competition for a finite pool of CPU, memory and disk resources.
SIOC is a practical workaround, but it doesn’t solve the problem. Rather, it trades symptoms.
The source of resource contention in many virtualized environments is caused by the Windows NT File System (NTFS).
When IO performance issues arise, the tendency is to look at the storage as the problem. But if the IO load from the guest is creating a latency issue by saturating the queues, isn’t it likely NTFS is the source?
To fully understand the relationship between NTFS and poor virtualization performance, we need to understand what is going on inside the Windows guest.

Here are the links to the full posts, where he references Microsoft’s and VMware’s own best practices to present a compelling argument FOR defragmenting guest Windows OSes:

How NTFS Causes IO Bottlenecks on Virtual Machines Part 1
How NTFS Causes IO Bottlenecks on Virtual Machines Part 2

My Conclusion

While Bob Nolan obviously has a vested interest in selling defragmentation software, his is the only analysis of the VM guest defragmentation argument that I have seen so far that is backed up by hard data and tests. But wait, there’s more!

Let’s bring it up a notch and note that in his report he references another white paper, this one from 2008 by David Goebel, one of the 4 engineers who wrote the NTFS file system:

http://www.balder.com/ImpactofFreeSpaceConsolidation.pdf

I don’t know how much more authoritative we can get than that. Who ya got? My money’s on the guy who wrote NTFS.

VMware may state in the earlier linked blog post that defragging is not required, but they also have their own vested interest in pushing SAN and hardware controller product lines.

This data, combined with my decade-plus experience of seeing direct, noticeable performance improvements on IIS boxes after a full defragmentation and the implementation of regular defrags, whether on physical or virtual disks, leads me to stick to my guns: I will continue to recommend disk defragging. Would love to see if someone can change my mind.

