Overview of Shredded Storage Whitepaper

Bill Baer has released a great overview of shredded storage, highlighting how different storage strategies evolved from SharePoint Portal Server 2001 to the present day:

From http://blogs.technet.com/b/wbaer/archive/2013/09/17/overview-of-shredded-storage-in-sharepoint-2013.aspx
“Shredded Storage is a new storage model implementation in SharePoint Server 2013 used to provide smoother I/O patterns, improve data transfer performance, and reduce storage utilization when using historical versions with SharePoint.

This whitepaper provides a background of SharePoint products storage evolution and the implementation specifics and benefits of Shredded Storage in SharePoint 2013.

Download: http://www.microsoft.com/en-us/download/details.aspx?id=39719

Some key FAQs:

Can Shredded Storage be disabled? 
No, Shredded Storage is enabled by default and cannot be disabled.

Does Shredded Storage work with Remote BLOB Storage (RBS)? 
Yes, Shredded Storage works with Remote BLOB Storage.

Can I prevent a file from being shredded? 
No, Shredded Storage cannot be disabled.

Is BLOB data shredded when I upgrade to SharePoint 2013?  
No.  BLOB data is not partitioned until that data has been accessed and saved back to the server.

Are there any changes to IOPS requirements? 
No.  The published I/O recommendations for SharePoint 2010 are applicable to SharePoint 2013.  For capacity planning information, see Capacity planning for SharePoint Server 2013.

Does Shredded Storage work with 3rd party RBS Providers? 
Yes.  Shredded Storage is compatible with 3rd-party RBS Providers.

Why SharePoint 2013? Some great reasons to upgrade from SharePoint 2010

We (Sean, Colin, and I) were recently asked to offer some good reasons to consider upgrading from SharePoint 2010 to SharePoint 2013. Here’s some of the goodness:

Shredded Storage in SharePoint 2013

A hyped but under-explained new feature of SharePoint 2013 that will be of great interest to DBAs and admins is Shredded Storage. If you are familiar with differential database backups, think of Shredded Storage as a differential method of storing documents. For example, in SharePoint 2010, when you save X versions of a document in a library with versioning turned on, each save literally puts a new row into the SharePoint SQL DB containing 100% of that version’s file contents.

With Shredded Storage, SharePoint automagically parses the document contents as it goes into the DB and checks for duplicate elements. When you think about it, the XML-based nature of MS Office documents since Office 2007 makes this conceptually understandable: we are essentially doing a diff compare and de-duplicating XML nodes in the documents.
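To make the de-duplication idea concrete, here is a minimal Python sketch of chunk-level de-duplication across versions. It is not SharePoint's implementation: the fixed 4-byte chunk size, the SHA-256 keying, and the function names `shred` and `store_versions` are assumptions chosen for illustration. It compares the bytes needed to store every version as a full copy (the SharePoint 2010 model described above) against storing each distinct chunk only once.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the example is easy to follow.
CHUNK_SIZE = 4


def shred(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a file into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store_versions(versions: list[bytes]) -> tuple[int, int]:
    """Return (bytes stored as full copies, bytes stored with chunk de-duplication)."""
    full_copy = sum(len(v) for v in versions)
    unique: dict[str, bytes] = {}
    for v in versions:
        for chunk in shred(v):
            # Identical chunks hash to the same key, so each is stored only once.
            unique.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    deduped = sum(len(c) for c in unique.values())
    return full_copy, deduped


versions = [
    b"AAAABBBBCCCCDDDD",      # version 1
    b"AAAABBBBXXXXDDDD",      # version 2: one chunk edited
    b"AAAABBBBXXXXDDDDEEEE",  # version 3: one chunk appended
]
print(store_versions(versions))  # (52, 24)
```

Fixed-size chunks are the simplest choice, but an insertion near the start of a file shifts every later chunk boundary; general-purpose de-duplication systems often use content-defined chunking to avoid that.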

This will mean tremendous savings in SQL, disk, backup, and other resources, and will make us all wonder how we ever put up with systems that blindly clone entire documents just to reflect typical tiny changes in content and structure.

From SharePoint Joel’s blog:

Shredded Storage – This is one of my favorite new features. I can’t wait to see what it does to our farm. Shredded storage will remove file duplicates and reduce the amount of content sent across the wire. You can find more on this in the IT pro decks.

This post by Jason Warren explains:

The first big change Mitch presented was Shredded Storage. This is pretty cool (I can’t find any info on TechNet; I really wish I could link to something). The goal of Shredded Storage is to make changes within the database equal to the size of the change, rather than to the size of the file. Essentially, only the bits in a file that have changed (during an edit) are saved to the database. Let’s take a look at how this works.

SharePoint 2010 uses the File Synchronization via SOAP over HTTP protocol (FSSHTTP, or “Cobalt”) to transfer Office files between the WFE and the user. This protocol sends only the changes; however, once the user saves a document back to SharePoint, the WFE has to query SQL for the full file to compare and merge the change, and then it saves the whole item back to SQL. When working with versions, each record in the table for a file’s version contains a full copy of that version (e.g., a 1 MB file with 10 versions generally takes up 10 MB of disk). SharePoint 2013 uses Shredded Storage to reduce this duplication of data and save only the changes made during an edit to the database. In effect, a file can consist of several records in the database, with the sum of all records constituting the current version.
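The storage arithmetic behind the 1 MB example can be written out as a rough model. The 1 MB size and 10 versions come from the text; the 64 KB changed per edit is a hypothetical figure, and the model ignores shred granularity, metadata, and any other overhead, so treat it as an estimate rather than a measurement.

```python
MB = 1024 * 1024
KB = 1024


def full_copy_bytes(file_size: int, versions: int) -> int:
    """SharePoint 2010 model: every version row holds a complete copy of the file."""
    return file_size * versions


def delta_bytes(file_size: int, versions: int, change_size: int) -> int:
    """Rough shredded estimate: one full copy, then only the changed bytes per later version.

    Assumes every edit changes exactly `change_size` bytes (a hypothetical simplification).
    """
    return file_size + (versions - 1) * change_size


print(full_copy_bytes(MB, 10) / MB)       # 10.0
print(delta_bytes(MB, 10, 64 * KB) / MB)  # 1.5625
```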

Mitch’s presentation had a nice animation of the steps for both how it worked in SP2010 and how it will work in SP2013, which I will attempt to translate into an ordered list:

In SharePoint 2010:
1. User requests document from WFE
2. WFE requests document from SQL Server
3. SQL reads full document
4. SQL sends full document to WFE
5. WFE sends full document to user
6. User edits and updates document
7. The updates are sent to the WFE (via Cobalt)
8. WFE requests full document from SQL Server and reads it
9. WFE merges updates
10. WFE writes full document back to SQL
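The SharePoint 2010 save path (steps 8 to 10) can be modeled to show where the SQL traffic comes from. This is a hypothetical sketch, not SharePoint code: `SqlIoCounter`, `save_2010`, and representing the Cobalt updates as a mapping of byte offsets to replacement bytes are all assumptions. The point it illustrates is that even a tiny edit costs a full read and a full write of the file.

```python
from dataclasses import dataclass


@dataclass
class SqlIoCounter:
    """Tallies bytes moved between a (hypothetical) WFE and SQL Server."""
    bytes_read: int = 0
    bytes_written: int = 0


def save_2010(stored: bytes, updates: dict[int, bytes], counter: SqlIoCounter) -> bytes:
    """Model steps 8-10: read the whole file, merge the updates, write it all back."""
    counter.bytes_read += len(stored)          # step 8: WFE reads the full document
    merged = bytearray(stored)
    for offset, new_bytes in updates.items():  # step 9: WFE merges the updates
        merged[offset:offset + len(new_bytes)] = new_bytes
    counter.bytes_written += len(merged)       # step 10: full document back to SQL
    return bytes(merged)


counter = SqlIoCounter()
doc = bytes(1000)                              # a 1,000-byte document of zeros
new_doc = save_2010(doc, {10: b"hello"}, counter)
print(counter.bytes_read, counter.bytes_written)  # 1000 1000
```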

In SharePoint 2013, this process has changed:
1. User requests document from WFE
2. WFE requests document from SQL Server
3. SQL reads full document
4. SQL sends full document to WFE
5. WFE sends full document to user
6. User edits and updates document
7. The updates are sent to the WFE (via Cobalt)
8. (Changed) WFE saves updates to SQL as a new record (all parts of a document are somehow associated together)
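Step 8 is the key change. Here is a minimal sketch of the idea under several assumptions: the shred table is modeled as an append-only Python list (row id = list index), shreds are a hypothetical fixed 8 bytes, and each shred is compared only with the shred at the same position in the previous version. SharePoint's actual schema and shredding rules are not described in this post, so the names `split` and `save_2013` are illustrative only.

```python
SHRED_SIZE = 8  # hypothetical shred size, kept tiny for illustration


def split(data: bytes) -> list[bytes]:
    """Cut a document into fixed-size shreds; the last one may be shorter."""
    return [data[i:i + SHRED_SIZE] for i in range(0, len(data), SHRED_SIZE)]


def save_2013(rows: list[bytes], previous: list[int], new_doc: bytes) -> tuple[list[int], int]:
    """Write only the shreds that differ from the previous version.

    `rows` is the shred table (appended to in place); `previous` holds the row ids
    of the prior version in order. Returns the new version's row ids and bytes written.
    """
    new_ids: list[int] = []
    written = 0
    for pos, chunk in enumerate(split(new_doc)):
        if pos < len(previous) and rows[previous[pos]] == chunk:
            new_ids.append(previous[pos])  # unchanged: reuse the existing row
        else:
            rows.append(chunk)             # changed or new: add one row
            new_ids.append(len(rows) - 1)
            written += len(chunk)
    return new_ids, written


rows: list[bytes] = []
edited = b"A" * 8 + b"B" * 8 + b"A" * 16
v1, w1 = save_2013(rows, [], b"A" * 32)  # first save writes every shred
v2, w2 = save_2013(rows, v1, edited)     # only the second shred changed
print(w1, w2)                                   # 32 8
print(b"".join(rows[i] for i in v2) == edited)  # True
```

The last line shows the "all parts associated together" idea: the current version is simply its ordered list of row ids. Comparing by position keeps the sketch short but misses shreds that merely moved; a content-hash lookup would catch those at the cost of maintaining an index.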

Shredded Storage is always turned on and is implemented for all files within SharePoint (other items too?). The interesting thing is that because the work is done by SharePoint, Shredded Storage works with RBS as well. My understanding is that you end up with many smaller files on your RBS media that together constitute a single file. In the end, to use our 1 MB example with 10 versions, you end up using less than 10 MB of total storage (assuming there were no large edits), because each version saves only its changes.