Shredded Storage in SharePoint 2013

A hyped but under-explained new feature of SharePoint 2013 that will be of great interest to DBAs and admins is Shredded Storage. If you are familiar with differential database backups, think of Shredded Storage as a differential method of storing documents. By contrast, in SharePoint 2010, when you save X versions of a document in a library with versioning turned on, each version gets its own new row in the SharePoint SQL database containing 100% of that version's file contents.

With Shredded Storage, SharePoint automagically parses the document contents as they go into the database and checks for duplicate elements. When you think about it, the XML-based nature of MS Office documents since Office 2007 makes this conceptually understandable: we are essentially doing a diff compare and de-duplicating XML nodes in the documents.

This should mean tremendous savings in SQL, disk, and backup resources, and make us all wonder how we ever put up with systems that blindly cloned entire documents just to reflect typically tiny changes in content and structure.

From SharePoint Joel’s blog:

Shredded Storage – This is one of my favorite new features. I can’t wait to see what it does to our farm. Shredded storage will remove file duplicates and reduce the amount of content sent across the wire. You can find more on this in the IT pro decks.

This post by Jason Warren explains:

The first big change Mitch presented was Shredded Storage. This is pretty cool (I can’t find any info on TechNet, I really wish I could link to something). The goal of Shredded Storage is to make changes within the database equal to the size of the change, rather than to the size of the file. Essentially only the bits in a file that have changed (during an edit) are saved to the database. Let’s take a look at how this works.

SharePoint 2010 uses the File Synchronization via SOAP over HTTP protocol (FSSHTTP, or “Cobalt”) to transfer Office files between the WFE and the user. This protocol sends only the changes. However, once the user saves a document back to SharePoint, the WFE has to query SQL for the full file to compare and merge the change, and then resaves the whole item back to SQL. When working with versions, each record in the table for a file’s version contains a full copy of that version (e.g., a 1 MB file with 10 versions takes up roughly 10 MB of disk). SharePoint 2013 uses Shredded Storage to reduce this duplication of data and save only the changes made during an edit to the database. In effect, this means a file could consist of several records in a database, the sum of all records constituting the current version.

Mitch’s presentation had a nice animation of the steps for both how it worked in SP2010 and how it will work in SP2013, which I will attempt to translate into an ordered list:

In SharePoint 2010:
1. User requests document from WFE
2. WFE requests document from SQL Server
3. SQL reads full document
4. SQL sends full document to WFE
5. WFE sends full document to user
6. User edits and updates document
7. The updates are sent to the WFE (via Cobalt)
8. WFE requests full document from SQL Server and reads it
9. WFE merges updates
10. WFE writes full document back to SQL

In SharePoint 2013, this process has changed:
1. User requests document from WFE
2. WFE requests document from SQL Server
3. SQL reads full document
4. SQL sends full document to WFE
5. WFE sends full document to user
6. User edits and updates document
7. The updates are sent to the WFE (via Cobalt)
8. (Changed) WFE saves updates to SQL as a new record (all parts of a document are somehow associated together)
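
To make the difference between the two save paths concrete, here is a small Python toy model. It is only a sketch under loud assumptions: the class names FullCopyStore and DeltaStore, the (offset, patch) representation of an edit, and the in-memory lists standing in for SQL rows are all hypothetical illustrations, not SharePoint's actual schema or shredding algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class FullCopyStore:
    """Hypothetical model of the 2010-style path: every save writes the whole file."""
    rows: list = field(default_factory=list)  # one full copy per version

    def save(self, current: bytes, offset: int, patch: bytes) -> bytes:
        # Read the full document, merge the change, write the full document back.
        merged = current[:offset] + patch + current[offset + len(patch):]
        self.rows.append(merged)
        return merged

    def bytes_written(self) -> int:
        return sum(len(row) for row in self.rows)


@dataclass
class DeltaStore:
    """Hypothetical model of the 2013-style path: each save writes only the change."""
    base: bytes = b""
    records: list = field(default_factory=list)  # (offset, patch) per edit

    def save(self, offset: int, patch: bytes) -> None:
        # Only the changed bytes are stored as a new record.
        self.records.append((offset, patch))

    def current(self) -> bytes:
        # The current version is the base plus every stored change, in order.
        doc = bytearray(self.base)
        for offset, patch in self.records:
            doc[offset:offset + len(patch)] = patch
        return bytes(doc)

    def bytes_written(self) -> int:
        return len(self.base) + sum(len(patch) for _, patch in self.records)


# Usage: one 1000-byte document, then nine small edits (10 versions in total).
original = b"A" * 1000
full = FullCopyStore(rows=[original])
delta = DeltaStore(base=original)
doc = original
for i in range(9):
    doc = full.save(doc, i * 10, b"edit")
    delta.save(i * 10, b"edit")

print(full.bytes_written(), delta.bytes_written())
```

Both stores reconstruct the same final document, but the full-copy model grows by the size of the file on every save, while the delta model grows only by the size of each edit.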

Shredded storage is always turned on and is implemented for all files within SharePoint (other items too?). The interesting thing is that because the work is done by SharePoint, Shredded Storage works with RBS as well. My understanding is that you end up with many smaller files on your RBS media that constitute a single file. In the end, to use our 1 MB example with 10 versions, you end up using less than 10 MB of total storage (assuming there were no large edits) as each version would save only the changes.
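
The 1 MB example can be turned into back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions: the function name storage_estimate_kb and the 50 KB average change per version are made-up illustration values, and real shredded storage adds per-chunk overhead that this ignores.

```python
def storage_estimate_kb(file_size_kb: int, versions: int, avg_change_kb: int):
    """Return (full_copy_kb, delta_kb) for a file with the given number of versions.

    full_copy_kb: every version stores the whole file (the 2010 behaviour described above).
    delta_kb: the first version stores the whole file, and each later version
              stores only its change (an idealised view of the 2013 behaviour).
    """
    full_copy_kb = file_size_kb * versions
    delta_kb = file_size_kb + (versions - 1) * avg_change_kb
    return full_copy_kb, delta_kb


# A 1 MB (1024 KB) file with 10 versions and an assumed 50 KB change per edit.
full_copy_kb, delta_kb = storage_estimate_kb(1024, 10, 50)
print(full_copy_kb, delta_kb)
```

The larger the average edit, the smaller the gap: if every edit rewrote the entire file, the two figures would be identical.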

