Automatic ASP.NET connection string encryption

Encrypting your connection strings is a good practice, as it hides your database connection information from an attacker who has gained filesystem access to your web server. It’s a bit of a pain to implement if you want to keep your whole build on a CI server, however, because encryption requires executing aspnet_regiis on the target server: the encryption is keyed to the machine key of the destination server.

The other day I was reading Thomas Stern’s article about encrypting Sitecore connection strings and it got me to thinking of ways to make connection string encryption fully automatic with minimal configuration. Turns out it’s actually quite easy.

There are actually quite nice C# APIs to encrypt and decrypt, so I set out to make the application self-encrypt on startup using these APIs. That way you can deploy an unencrypted configuration file from CI, and as soon as the app starts up (which is immediately in my case since we run some HTTP calls to it as part of the deploy) it will encrypt itself using the local machine key.

To use this code example you will need to install the WebActivator NuGet package that allows it to register itself as an application startup task. You can also just call the AutoEncrypt method in Global.asax for the same effect, but WebActivator makes it completely self-contained.

This example only runs encryption if <compilation debug="false"> is in the web.config (so for local debugging you keep yourself decrypted), and the section has not already been encrypted. It is also fully compatible with the default App_Config\ConnectionStrings.config that Sitecore uses as the config reader is smart enough to save the right file.
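A minimal sketch of what such a self-encrypting startup task might look like is shown here. It is an illustration, not the original listing: the MySite namespace, the ConnectionStringEncryptor class name, the use of the WebActivatorEx flavor of the package, and the choice of the DataProtectionConfigurationProvider are all assumptions.

```csharp
using System.Configuration;
using System.Web.Configuration;
using System.Web.Hosting;

// Assumption: the WebActivatorEx NuGet package is installed; this attribute
// asks it to call AutoEncrypt once after the application has started.
[assembly: WebActivatorEx.PostApplicationStartMethod(
    typeof(MySite.ConnectionStringEncryptor), "AutoEncrypt")]

namespace MySite // hypothetical namespace
{
    public static class ConnectionStringEncryptor
    {
        // Assumption: the built-in DPAPI provider is used.
        // "RsaProtectedConfigurationProvider" is the other built-in option.
        private const string Provider = "DataProtectionConfigurationProvider";

        public static void AutoEncrypt()
        {
            Configuration config = WebConfigurationManager.OpenWebConfiguration(
                HostingEnvironment.ApplicationVirtualPath);

            // Leave the file readable when <compilation debug="true"> is set.
            var compilation = (CompilationSection)config.GetSection("system.web/compilation");
            if (compilation != null && compilation.Debug) return;

            // Nothing to do if the section is missing or already encrypted.
            ConfigurationSection section = config.GetSection("connectionStrings");
            if (section == null || section.SectionInformation.IsProtected) return;

            section.SectionInformation.ProtectSection(Provider);
            config.Save();
        }
    }
}
```

The application pool identity needs write access to the configuration file for the save to succeed.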

Hope that’s helpful.

Synthesis 7.2 released

Synthesis 7.2 has been released on NuGet. This release includes a number of enhancements to the content search APIs to accommodate more real-world situations, as well as several general bug fixes and improvements.

Synthesis 7.2 only runs on Sitecore 7.2.x due to changes in the content search APIs in 7.2. It is hypothetically compatible with Sitecore 7.5 as that is supposed to be based on the 7.2 core but I have not tested it. Previous versions of Synthesis are not compatible with Sitecore 7.2 because they are compiled against the earlier content search APIs.

So what’s changed?

  • The Uri property on template items has been renamed ItemUri to avoid confusion with the similarly named Url property
  • Standard item properties related to searches - namely Statistics and Paths - have been flattened into properties on IStandardTemplateItem itself. This allows you to easily use them in content search queries
  • Additional TemplateName, Path, ParentId, AncestorIds, DatabaseName, and SearchableContent properties have been added to IStandardTemplateItem to support content search queries
  • Added WhereResultIsValidDatabaseItem() and TakeValidDatabaseItems() LINQ extension methods. These allow you to filter index-based result sets and remove any index entries that do not have valid database-based equivalents. Due to the way Synthesis transparently promotes index-based items to database items, this can avoid runtime exceptions during promotion if the index item has no database equivalent or security denies access to the database equivalent.
  • Synthesis no longer requires a global registration of its document type mapper now that Sitecore 7.2 supports overriding the mapper via an ExecutionContext
  • Multiple ExecutionContext parameters can now be passed to GetSynthesisQueryable
  • Support for doing OR and AND searches over collections, such as multilist targets, via the ContainsOr and ContainsAnd extension methods (original implementation by @jason_bert)
  • Integrated @jorgelusar’s web test runner to improve visibility of Synthesis’ unit tests. Added a lot of tests to cover and document the content search functionality available.
  • Added support for using multilist, datetime, and lookup field values from the index without object promotion, if the value is stored in the index.
  • Fixed a bug where the ignoreStandardFilters parameter on GetSynthesisQueryable() was being ignored
  • Fixed a bug where StartupRegenerateProjectPath entries starting with ~ were not correctly resolved (thanks @delgadobyron!)
  • Generating templates whose name starts with a number will now work correctly (prefixed with _), courtesy of @martijn_b0s
  • Using the SelectSingleItem() and SelectItems() queries on a template item will no longer cause an error if the query returns no results, courtesy of @martijn_b0s
  • A number of hacks to fix issues in the Sitecore search expression parser that were fixed in Sitecore 7.2 have been removed.

As always, thanks to the community members who use and contribute to Synthesis - you all rock. Feel free to hit me up on Twitter or report an issue or pull request on GitHub if you run into problems.

Indexing Subcontent

Often in Sitecore solutions we will utilize the page component architecture, where a page is composed of a base item and one or more “subcontent” items that are data sources for renderings that we have added to the page to customize its layout. This grants editors extreme flexibility to customize what is shown on their pages, but causes a major headache if you’re trying to have your search index contain the full content of a page because now the content is spread across potentially many backend items.

Note: While the code presented in this article is mine, the original idea was my coworker Erik Brown’s. Unfortunately he has no Twitter or blog that I know of, but if he ever gets one you should follow it :)

The solution we came up with utilizes Sitecore 7’s computed index fields (blog posts: John West, Martina Welander), and a handy but less than obvious trick.

The trick is that the Sitecore indexes can contain more than one value for the same index field. This means that if you want to augment the value of the system _content field (which contains all the indexable text fields on the item), all you have to do is define a computed field that is also named _content. Now the _content index field stores two values: the system value, and your computed value - and it will search on both values when you query it.

Armed with this knowledge I set out to use the layout field on the base item to find all rendering datasources it points to and use the same code that Sitecore uses to create the _content value for the main item. This turned out to not be all that hard (though I had the advantage of having done a lot of layout field messing around before):
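As a rough sketch of the approach (not the original listing), a computed field along these lines could gather text from the rendering datasources. The SubcontentField class name, the MySite.Search namespace, the "Default" device lookup, and the simplified list of text field types are assumptions; the real _content logic in Sitecore is more thorough than this.

```csharp
using System.Collections.Generic;
using Sitecore;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;
using Sitecore.Layouts;

namespace MySite.Search // hypothetical namespace
{
    public class SubcontentField : IComputedIndexField
    {
        // Assumption: only these field types carry searchable text.
        private static readonly HashSet<string> TextTypes = new HashSet<string>
        {
            "single-line text", "multi-line text", "rich text"
        };

        public string FieldName { get; set; }
        public string ReturnType { get; set; }

        public object ComputeFieldValue(IIndexable indexable)
        {
            Item item = indexable as SitecoreIndexableItem;
            if (item == null) return null;

            // Assumption: only the device named "Default" matters for indexing.
            DeviceItem device = item.Database.Resources.Devices["Default"];
            if (device == null) return null;

            var text = new List<string>();
            foreach (RenderingReference rendering in item.Visualization.GetRenderings(device, false))
            {
                string dataSource = rendering.Settings.DataSource;
                if (string.IsNullOrEmpty(dataSource)) continue;

                // Skip missing datasources and renderings that point back at the page itself.
                Item source = item.Database.GetItem(dataSource);
                if (source == null || source.ID == item.ID) continue;

                source.Fields.ReadAll();
                foreach (Field field in source.Fields)
                {
                    if (TextTypes.Contains(field.TypeKey) && !string.IsNullOrEmpty(field.Value))
                    {
                        // Strip markup so rich text is indexed as plain text.
                        text.Add(StringUtil.RemoveTags(field.Value));
                    }
                }
            }

            // Returning null adds no extra value to the index field.
            return text.Count == 0 ? null : string.Join(" ", text);
        }
    }
}
```

For the trick described above to apply, this class would be registered as a computed field whose field name is _content.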

Registering the computed field to augment the main _content index field requires a config patch file (note that this targets Lucene; your patch for SOLR or Coveo would be different):

Another handy tweak for indexing this kind of item is to use Nick Wesselman’s code to cause the main item to be updated in the index whenever any of its datasource items are modified. Nice work!

Hope you find this helpful :)

Using Web API 2 Attribute Routing with Sitecore

There are a lot of times when developing a Sitecore site that you need to introduce dynamic, API-driven content - such as search autocompletes, single page apps, and other dynamic client-side content. Sitecore itself provides the Item Web API, but sometimes you want a more targeted solution designed for end users to acquire data in a fashion that exposes as little as possible.

Microsoft provides a technology that does this pretty nicely: the aptly named ASP.NET Web API. Web API uses an ASP.NET MVC-like controller to easily expose REST services over JSON or XML. I set out to make it work with Sitecore. Of course, some other folks had already got this to work. Unfortunately their work is also in some cases out of date - so let’s untangle things.

How to get Web API working on Sitecore, by version

Sitecore 6.x or 7.0 (without Sitecore MVC)

If you’re using Sitecore 6 or 7.0 without Sitecore MVC enabled, you can use the directions in the previously referenced blog posts to get Web API configured. By default, Sitecore will not respect ASP.NET routing unless Sitecore MVC is enabled so you have to use the solutions these blogs present to make Sitecore ignore routed URLs.

Sitecore with Sitecore MVC enabled

When Sitecore MVC is enabled (and you cannot turn it off on Sitecore 7.1 and later), you do not need the pipeline handler presented in the existing blog posts about Sitecore and Web API - Sitecore will automatically hand off routed requests to the route handler out of the box.

Web API versioning

At this time there are two versions of Web API:

  • v1, which released concurrently with ASP.NET MVC 4.
  • v2, which released concurrently with ASP.NET MVC 5.

If you’re on Sitecore 6 you’ll be using v1, because Sitecore 6 runs on .NET 4.0, and Web API 2 requires .NET 4.5. On Sitecore 7, you have more options. With 7.0 and MVC disabled, you can run v1 or v2. However, Sitecore 7.0 with MVC enabled and Sitecore 7.1 both depend on ASP.NET MVC 4, so you are limited to Web API v1 unless you follow the instructions in this post to add binding redirects and config modifications to enable MVC 5. The landscape thankfully improves with Sitecore 7.2, as that uses ASP.NET MVC 5.1, which easily adapts to Web API 2. Note that attribute routing, the point of this post, requires Web API 2.

Attribute routing?

About time we started talking about the core of this post, eh? Attribute routing is a snazzy new way to register routes for Web API and ASP.NET MVC that was introduced in Web API 2/MVC 5 (and it was adapted from a community developed module for MVC 4). In earlier versions, you had to register routes in a global route table that was generally constructed all in one global class. This made developing relatively ad-hoc data services, which amount to an “AJAX backend for a CMS page”, somewhat problematic: a dependency of one small part of the site now has to stick its fingers in everyone else’s pie by adding a global route.

Attribute routing solves this issue by allowing you to use standard .NET attributes (you know, like [HttpPost] on an MVC action method) to define the route to the code. This makes a lot more sense to my way of thinking. Here’s an example attribute-routed Web API 2 controller:
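As an illustrative sketch (the controller name, route, and term list are made up for this example), an attribute-routed controller for a search autocomplete might look something like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web.Http;

namespace MySite.Api // hypothetical namespace
{
    // Every route on this controller starts with api/autocomplete.
    [RoutePrefix("api/autocomplete")]
    public class AutocompleteController : ApiController
    {
        // Hypothetical data; a real service would query a search index.
        private static readonly string[] Terms = { "sitecore", "synthesis", "unicorn" };

        // Responds to GET api/autocomplete/{prefix}, e.g. api/autocomplete/uni
        [HttpGet, Route("{prefix}")]
        public IEnumerable<string> Get(string prefix)
        {
            return Terms.Where(t => t.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));
        }
    }
}
```

The route lives right next to the action it maps to, so nothing has to be added to a global route table for this controller.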

Microsoft has written pretty detailed documentation about Web API 2 attribute routing, which I suggest you review before getting serious about using this stuff.

Enabling attribute routing in Sitecore

If you read the aforementioned documentation, you’re probably already well on your way to enabling attribute routing in Sitecore since it’s the same as any other site - all you have to do is register the attribute routes.

The question then becomes how to do that in Sitecore. I’d say the official route is to add a pipeline handler to the initialize pipeline that does the registration. I decided to go with a more pure .NET approach, and register it using WebActivator - look ma, no config file! I created App_Start\WebApiConfig.cs - a pattern which may seem familiar to folks who’ve recently worked on ASP.NET MVC sites - and registered my attribute routes:
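A minimal sketch of such a WebApiConfig.cs, assuming Web API 2 and the WebActivatorEx package are installed (the MySite namespace and the choice of the post-start hook are assumptions), could be:

```csharp
using System.Web.Http;

// Assumption: WebActivatorEx calls Register once after application start.
[assembly: WebActivatorEx.PostApplicationStartMethod(
    typeof(MySite.WebApiConfig), "Register")]

namespace MySite // hypothetical namespace
{
    public static class WebApiConfig
    {
        public static void Register()
        {
            // Scans for ApiController classes and maps their [Route] attributes.
            GlobalConfiguration.Configure(config => config.MapHttpAttributeRoutes());
        }
    }
}
```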

So now all I have to do is:

  • Install WebActivatorEx from NuGet
  • Copy WebApiConfig.cs into my project
  • Create an attribute-routed controller to do stuff

Pretty simple, and - if you’re using Sitecore 7.2 - easy to set up as well. For Sitecore versions before 7.2 there are some potential hoops to jump through to get attribute routing.

Anyhow, I hope this was helpful to someone.

Project Platypus: A content migration solution

Every so often as a developer we discover new and awesome things. Sitecore is a fairly capable enterprise CMS, but there’s another one that is simply better.

It all started when I started working with a beautifully designed, very consistent programming language: PHP. PHP has so many things that .NET could never imagine having, and it’s so easy to use. No more pesky ORMs and other silly tools adding layers of complexity on your CMS - with PHP this is all built in and wonderfully easy.

Once I got started with PHP it was only a short while until I started working with WordPress. I quickly realized that it was perfectly suited to extremely large enterprise multilingual and multisite sites, and that its easy to use page-based model makes it so much easier to develop and maintain than overcomplex and expensive competitors like Sitecore.

Now I set out to design a way to migrate content out of legacy Sitecore sites and into new WordPress instances. Enter Project Platypus. All you have to do is install it on your Sitecore instance, and a few clicks later you’ll have a brand new WordPress site with all your content migrated over. It’ll even transfer markup and Lucene indexes that are using scSearchContrib - all without any developer intervention.

In order to get people started using Project Platypus, I’ve made this short video demonstrating it in action. I’ll be posting the source code to GitHub in the next couple days.

Oh, and happy April Fool’s Day.

Now available: Unicorn 2

I’m happy to announce that the second major version of Unicorn has been released. Unicorn is a free and open source tool to automatically synchronize item changes between Sitecore instances. If you’re new to Unicorn, the README describes what it is in more detail.

Unicorn, like many projects, evolved out of a proof of concept. Like most proof of concept projects, Unicorn suffered from a lack of flexibility because it did just enough to prove that it worked. Almost as soon as the first public version was released, work started on the refactoring that would become Unicorn 2. This second major revision of Unicorn is the result of more than 100 hours of work. Both code and UX have been refactored, decoupled, fixed, and improved. Many of the improvements are due to the input of the Sitecore community. Thank you to the folks who have sent issues, questions, and chatted about Unicorn since it was released. You all are awesome.

What’s new in Unicorn 2

  • More reliable change detection. Instead of detecting changes using event handlers, a data provider is now used. This makes Unicorn immune to EventDisabler, so it no longer misses content changes made while events are disabled.
  • Support for multiple configurations. A configuration is a set of dependencies, such as a predicate, serialization provider, and evaluator. These allow you to configure sets of content to serialize differently.
  • A built-in control panel that walks you through common configuration issues and initial setup, and allows executing sync and reserialize operations once set up.
  • Greatly improved logging. All operations are logged to the Sitecore logs, and the formatting has been streamlined to be terse and relevant.
  • Consistency checking during a sync that finds common merge errors and flags them as errors.
  • Better item comparison. All item fields are now compared when deciding whether to update an item, whereas v1 only evaluated the updated and revision fields.
  • Field-level exclusion. You can now have certain fields be ignored when checking for updates or deserializing an item.
  • Supports syncing media items (such as rendering thumbnails for page editor)
  • Tons of bug fixes for unusual situations (for example, moving and copying items between included and not included paths)
  • Compatible with Sitecore 6.5 and later. The original version was for Sitecore 7 only.

How do I get it and install it?

  • Upgrading from 1.x? Read this
  • You’ll need Sitecore 6.5.0 (121009) or later. Tested with Sitecore through 7.1 Update-1.
  • Install Unicorn. This is as simple as adding the Unicorn NuGet package to your project.
  • Configure what to serialize in the example configuration’s Predicate registration. There will be an App_Config/Include/Serialization.config file installed, which has a commented example of this syntax.
  • Run a build in Visual Studio to make sure the output files are up to date.
  • Visit $yoursite/unicorn.aspx and it will walk you through initial serialization. This will take the preset you configured and serialize all of the included items in it to disk.
    • NOTE: make sure to serialize an authoritative database with all items present. Other databases will be made to look just like this one when sync occurs.
    • NOTE: if you’re using Git, you need to make sure that Git doesn’t fool with the line endings of your serialized files. Add *.item -text to a .gitattributes file in the repo root. See this blog post for details.
  • Commit your serialized items to source control.

If you want to install Unicorn from source, the procedure is quite easy. The README has directions for this.

Neat things you can do with Unicorn

Code Review and Branching

Recently I’ve started doing a lot more code review using GitHub pull requests. Unicorn makes Sitecore code review very easy - all you have to do is switch to the branch under review and sync Unicorn. Now your Sitecore has all the templates and renderings from that branch. When you’re done you checkout the original branch and sync again - and you’re back where you started. Very convenient.

Syncing test content between developers

If your site’s information architecture allows it, you can include a path under the site’s home item, such as ‘Samples’ to be synced using Unicorn. This content can be used by developers to create examples of content types under development to share for QA. The same technique can also be used to share rendering thumbnails in the media library.

Integrating with continuous integration

Unicorn is very simple to run using any CI server or script that can make HTTP requests. Once that is set up, your integration server - or even live - will never have outdated templates again. Example scripts for this can be found here.

Integrate with deployment tools

Several people have integrated Unicorn with tools such as Sitecore Courier and Sitecore.Ship to create update packages to deploy to remote environments.

What’s new in the backend

The largest change in the backend is having it be a modular dependency-based system. Unlike v1, which had issues like the inability to use anything but the default preset, v2 allows you to reconfigure nearly all aspects of the system by changing the classes registered in the Serialization.config file. Each configuration can also register its own set of dependencies, so you can have more than one implementation of these extension points in any given project. This allows you to not only have different behavior per configuration but also to sync configurations separately if you wish to.

There is a rather long description of customization in the repository’s README file on GitHub if you’re interested.

Should I use this now?

That’s up to you. I am using it for my daily team development tasks, and consider it to be generally more stable than v1 because of the safety improvements (data provider, syncing all fields, consistency checking, better log messages).

Hope you enjoy it, and as always you can get ahold of me on Twitter or GitHub if you run across issues or have ideas.

Have two web databases? Don't forget the IndexUpdateStrategy

If you have two web databases in a Sitecore 7 site (e.g. for a pre-production preview or staging site) and you’re using Lucene, here’s something to be aware of:

Simply copying the Sitecore.ContentSearch.Lucene.Index.Web.config file, renaming the index and changing the <locations> entr(ies) is not sufficient to make your new index work correctly! You’ll notice that if you publish items to that database, the updates don’t make it into the index. Why not?

Simply put, the default IndexUpdateStrategies are confusingly named. The web index defaults to the onPublishEndAsync strategy, which makes sense to use for our staging database as well - we want the index to get updated by the event queue after a publish completes. But there’s one minor detail: the onPublishEndAsync strategy should really be called onPublishEndAsyncWebdb or something similar. Let’s look at how the strategy is defined:

<onPublishEndAsync type="Sitecore.ContentSearch.Maintenance.Strategies.OnPublishEndAsynchronousStrategy, Sitecore.ContentSearch">
    <param desc="database">web</param>
    <CheckForThreshold>true</CheckForThreshold>
</onPublishEndAsync>

Derp - that <param desc="database">web</param>! The strategy actually is only watching the event queue for updates from a single database - and that database is not our staging database. The solution is to define a copy of the onPublishEndAsync strategy for your staging index and change the database parameter, then change your index declaration to reference that strategy instead:

In app_config\include\sitecore.contentsearch.lucene.defaultindexconfiguration.config (or a patch thereof):

<onPublishEndAsyncStaging type="Sitecore.ContentSearch.Maintenance.Strategies.OnPublishEndAsynchronousStrategy, Sitecore.ContentSearch">
  <param desc="database">web_staging</param>
  <CheckForThreshold>true</CheckForThreshold>
</onPublishEndAsyncStaging>

In App_Config\Include\Sitecore.ContentSearch.Lucene.Index.Web_Staging.config (or wherever you’re defining your staging index), reference the new strategy instead of the existing strategy:

<strategy ref="contentSearch/indexUpdateStrategies/onPublishEndAsyncStaging" />

Note that if you defined an index that spanned multiple databases you’d need to define multiple strategies for its updates, and reference all of them from the index config - an index can have multiple update strategies.

Hope this is helpful :)

Using MVC 5 with Sitecore 7.1

If you’re using something that is already on ASP.NET MVC 5, for example the latest release of Blade, and want to upgrade to Sitecore 7.1 you may be worried that as of 7.1 Sitecore takes a dependency on ASP.NET MVC 4. Thankfully that’s actually pretty simple to work around in such a way that both your MVC5 code and Sitecore’s MVC4 code can work together.

Tell Sitecore to use MVC5

First you need to redirect bindings from MVC4 to MVC5. You accomplish this by adding a binding redirect to your Web.config file like so:

<runtime>
  <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
    <dependentAssembly>
      <assemblyIdentity name="System.Web.Mvc" publicKeyToken="31bf3856ad364e35" />
      <bindingRedirect oldVersion="1.0.0.0-4.0.0.0" newVersion="5.0.0.0" />
    </dependentAssembly>
  </assemblyBinding>
</runtime>

Once you’ve done this, the new SPEAK backend will actually be making calls to the MVC5 System.Web.Mvc library - which it appears to be quite happy doing.

Comment out the extra Razor section definition

In Sitecore 7.1, there is a Web.config added to sitecore/shell/client that contains a definition of the Razor config section. If you’re defining the Razor config section yourself in the root Web.config - like Blade does - then this is a duplicate declaration and will throw an exception. All you need to do is comment out this duplicate definition and it will fix the issue:

<!--   <configSections>       
    <sectionGroup name="system.web.webPages.razor" type="System.Web.WebPages.Razor.Configuration.RazorWebSectionGroup, System.Web.WebPages.Razor, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35">
        <section name="host" type="System.Web.WebPages.Razor.Configuration.HostSection, System.Web.WebPages.Razor, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35" requirePermission="false" />
        <section name="pages" type="System.Web.WebPages.Razor.Configuration.RazorPagesSection, System.Web.WebPages.Razor, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35" requirePermission="false" />       
    </sectionGroup>        
</configSections> -->

That’s all you need to do to get MVC5 running with Sitecore 7.1. The Web.config in the shell will automatically override whatever Razor settings you may have in your Web.config file and keep SPEAK running when it needs to.

Blade 2.1 - Razor 3 support

I’ve pushed Blade 2.1 out to NuGet. This release updates the MVC references to MVC5, and the Razor references to Razor 3.

This release also uses the latest bits for Sitecore NuGet, which resolve some issues with local IIS site connections.

Sitecore Data Architecture

Introduction

There are many times where you need to extend the Sitecore data architecture in some way. One of the most obvious, and most used, is attaching an event handler to one of the item events (e.g. item:saved, item:renamed) but this is not by any means the only way to manipulate items before they get to the database.

This blog post came out of a discussion I had with Alex Shyba about the DataEngine, a few hours on a plane, and a lot of decompilation. I’m going to attempt to show what happens when an item gets saved and where you can plug into that.

The entry point

The most high level API is of course the Item class. Saving an item using this API is very simple:

using(new EditContext(item))
{
    item["MyField"] = "newValue";
}

When the EditContext goes out of scope, this causes item.Editing.AcceptChanges() to be invoked, and our journey down the rabbit-hole begins there.

The Item Manager

Our next step is the highest level item data API in Sitecore: the ItemManager. At first blush ItemManager appears to be a rather ugly giant static class, with tons of manipulation methods: SaveItem, AddVersion, etc. However, it’s not as bad as it looks. ItemManager is basically a static facade around the current ItemProvider.

The Item Provider

Here’s where things get interesting. You can register multiple item providers in the config under itemManager/providers. While this appears to provide a lot of power as an extension point - being able to plug in your own high level provider - it unfortunately seems to be rather ungainly. If multiple providers are registered you can access them directly using ItemManager.Providers, but the other ItemManager methods only execute against the default provider. You could replace the default provider with your own, but this is less than ideal as it can lead to contention for whose item provider gets used if more than one module wants to extend using it.

Extension points aside, the ItemProvider is the broadest data API Sitecore provides. It deals in high level objects and abstracts lower level concepts like databases and item definitions away. Almost all methods in the ItemProvider result in invocations of the next lower level construct, the DataEngine.

UPDATE: Nick Wesselman pointed out that some queries will bypass the Item Provider and go directly to the Data Engine. This gist has some examples of the types of things that bypass the Item Provider. In light of this I’d say the item provider is generally not a good candidate for extending things.

The Data Engine

DataEngine (generally accessed by databaseObject.DataManager.DataEngine) is a very interesting component for extension. Unlike the item provider, the DataEngine is database-specific. This is also the layer at which item event handlers (e.g. item:saved) are processed, but more interestingly is also a place where you can hook events in an uninterruptible fashion. Regular event handlers, such as item:saved, are vulnerable to being disabled when someone scopes an EventDisabler. Generally this is good, as events are disabled for performance reasons during things like serialization bulk load operations. But occasionally, such as in Unicorn, you need an event handler that cannot be disabled. Having a Unicorn event not fire would likely mean data being lost.

The internal architecture of DataEngine works something like this:

  • An engine method gets invoked
  • The engine creates and initializes a DataEngineCommand<> object, using a stored prototype of the command
  • The DataEngineCommand has its Execute method invoked.

The DataEngineCommand is a generic type, and each action that the engine can take has its own implementation - for example, there is a Sitecore.Data.Engines.DataCommands.SaveItemCommand class. Each of these command types is bootstrapped by a prototype system on the DataEngine itself. The engine, by way of its Commands property, stores copies of each command type. When a new command is needed, the existing prototype command is cloned (Clone()), initialized (Initialize()), and then used to execute the action. These prototypes are settable, including via configuration, so this allows you to inject your own derived implementation of each command and thus inject ‘event handler like’ functionality into the DataEngine.

Item Buckets utilizes this type of functionality injection, replacing the AddFromTemplatePrototype with its own extended implementation. This is the relevant section of Sitecore.Buckets.config:

<database id="master" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">
  <Engines.DataEngine.Commands.AddFromTemplatePrototype>
    <obj type="Sitecore.Buckets.Commands.AddFromTemplateCommand, Sitecore.Buckets" />
  </Engines.DataEngine.Commands.AddFromTemplatePrototype>
</database>

This same functionality could be used to inject non-disableable event handler type functionality into other events, for example by replacing the SaveItemPrototype. Unfortunately this method also suffers from possible contention issues if multiple modules wished to patch these commands, so be careful when using them.
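To make the clone, initialize, execute flow easier to picture, here is a small self-contained model of the prototype pattern. It deliberately does not use Sitecore's classes: EngineCommand, SaveCommand, AuditedSaveCommand, and Engine are hypothetical stand-ins, and the real DataEngineCommand signatures differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base command: the engine clones a stored prototype,
// initializes the clone, then executes it.
public abstract class EngineCommand
{
    protected string ItemName { get; private set; }

    public EngineCommand Clone() { return CreateInstance(); }
    public void Initialize(string itemName) { ItemName = itemName; }
    public string Execute() { return DoExecute(); }

    protected abstract EngineCommand CreateInstance();
    protected abstract string DoExecute();
}

// The default behavior for a save.
public class SaveCommand : EngineCommand
{
    protected override EngineCommand CreateInstance() { return new SaveCommand(); }
    protected override string DoExecute() { return "saved " + ItemName; }
}

// A derived command that adds behavior around the default one. Because it is
// part of the command itself, it runs regardless of any event handlers.
public class AuditedSaveCommand : SaveCommand
{
    public static readonly List<string> Audit = new List<string>();

    protected override EngineCommand CreateInstance() { return new AuditedSaveCommand(); }

    protected override string DoExecute()
    {
        Audit.Add(ItemName);
        return base.DoExecute();
    }
}

// Hypothetical engine holding a settable prototype, in the spirit of SaveItemPrototype.
public class Engine
{
    public EngineCommand SavePrototype { get; set; }

    public Engine() { SavePrototype = new SaveCommand(); }

    public string Save(string itemName)
    {
        EngineCommand command = SavePrototype.Clone();
        command.Initialize(itemName);
        return command.Execute();
    }
}
```

Swapping the prototype (engine.SavePrototype = new AuditedSaveCommand()) changes what every later Save call does without touching the engine, and the contention problem is visible too: only one prototype can occupy the slot at a time.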

The DataEngineCommand

Once the DataEngine creates and executes a command, the command generally does two things:

  • Pass its task down to the Nexus DataApi to be handled
  • Fire off any normal item event handlers (item:saved), as long as events are not disabled

Most of the methods on DataEngineCommand are virtual so you could extend them. The most obvious candidate, of course, is the DoExecute method that performs the basic action of the command.

Further down the rabbit-hole we arrive at the Nexus APIs.

The Nexus DataApi

The Nexus assembly, due to it containing licensing code I believe, is the only obfuscated assembly in Sitecore. This makes following what the data API is doing rather difficult, but I’m pretty sure I traced it back out as a call to the databaseItem.DataManager.DataSource APIs, which thankfully are not obfuscated. It’s worth noting that the Nexus APIs appear to also be using a separate command-class based architecture internally. Given the sensitive nature of the Nexus assembly however, I would look elsewhere for extension points.

The DataSource

The Sitecore.Data.DataSource appears to largely be a translation layer between slightly higher level APIs, where things like Item are used, and the low level API of the data providers (which use constructs like IDs and ItemDefinition instead). The data source is very generic and does not appear to be designed for extension, which is fine because we can extend both above and below it.

Methods within DataSource eventually make calls down to static methods on the DataProvider class.

The Data Provider

There are two faces to the DataProvider class. The internal static methods that the DataSource is invoking are helper methods that invoke non-static methods on all of the data providers attached to the database. (Yes, you can have multiple data providers within the same database, which may not even look at the same backend database at all…hehe) The actual data provider instances are the lowest level data APIs in Sitecore. They deal with primitive objects and have a lot of unspoken rules about them that make them tougher to implement than most other extension points. However they are also the most powerful extension point there is. Using a data provider you can manipulate the content tree in nearly any fashion. One example is Rhino, a data provider that makes serialized item files on disk appear to be real content items in Sitecore.

You’re still reading this?

Hopefully this is a useful post for some crazy nuts like myself who like to bend the guts of Sitecore. It’s definitely a long and byzantine road from saving an item to it getting to the database. Rather amazing that Sitecore is as fast as it is with all of these layers. Part of me suspects that some are baggage from Sitecore 4 for backwards compatibility ;)