Indexing Subcontent

Sitecore solutions often use a page component architecture, where a page is composed of a base item plus one or more “subcontent” items that serve as data sources for the renderings added to the page to customize its layout. This gives editors a great deal of flexibility over what is shown on their pages, but it causes a major headache if you want your search index to contain the full content of a page, because that content is now spread across potentially many backend items.

Note: While the code presented in this article is mine, the original idea was my coworker Erik Brown’s. Unfortunately he has no Twitter or blog that I know of, but if he ever gets one you should follow it :)

The solution we came up with uses Sitecore 7’s computed index fields (blog posts: John West, Martina Welander) and a handy but less-than-obvious trick.

The trick is that a Sitecore index can hold more than one value for the same index field. This means that if you want to augment the value of the system _content field (which contains all the indexable text fields on the item), all you have to do is define a computed field that is also named _content. The _content index field then stores two values, the system value and your computed value, and a query against it searches both.
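
To make the query side of that concrete, here is a minimal sketch (not part of the original solution) of a search against _content using the ContentSearch LINQ API. The namespace, class and method names are hypothetical, and "sitecore_web_index" is only an assumed index name; SearchResultItem.Content is the property Sitecore maps to the _content index field. Nothing in the query changes when a second _content value is added, which is what makes the trick convenient:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

namespace Foo.ContentSearch.Examples
{
    // Hypothetical helper for illustration only; requires the Sitecore ContentSearch assemblies.
    public static class SubcontentSearchExample
    {
        // indexName defaults to an ASSUMED index name; pass whichever index your site searches.
        public static List<SearchResultItem> FindPages(string term, string indexName = "sitecore_web_index")
        {
            var index = ContentSearchManager.GetIndex(indexName);

            using (var context = index.CreateSearchContext())
            {
                // Content maps to _content, so this matches pages whose own text
                // OR whose computed subcontent text contains the term.
                // ToList() runs the query before the search context is disposed.
                return context.GetQueryable<SearchResultItem>()
                    .Where(result => result.Content.Contains(term))
                    .ToList();
            }
        }
    }
}
```

Calling SubcontentSearchExample.FindPages("some phrase") should then return the base page even when the phrase only appears on one of its data source items, assuming the computed field is registered and the index has been rebuilt.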

Armed with this knowledge, I set out to use the layout field on the base item to find all the rendering data sources it points to, and to run their fields through the same check that Sitecore uses to build the _content value for the main item. This turned out not to be all that hard (though I had the advantage of having done a lot of messing around with the layout field before):

using System.Collections.Generic;
using System.Linq;
using Blade.Utility;
using Sitecore;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;
using Sitecore.Layouts;

namespace Foo.ContentSearch.ComputedFields
{
    /// <summary>
    /// Computed field that contains all textual content of items that are rendering data sources on the current item's layout details
    /// </summary>
    public class SubcontentField : IComputedIndexField
    {
        public string FieldName { get; set; }

        public string ReturnType { get; set; }

        public object ComputeFieldValue(IIndexable indexable)
        {
            var sitecoreIndexable = indexable as SitecoreIndexableItem;

            if (sitecoreIndexable == null) return null;

            // find renderings with datasources set
            var customDataSources = ExtractRenderingDataSourceItems(sitecoreIndexable.Item);

            // extract text from data sources
            var contentToAdd = customDataSources.SelectMany(GetItemContent).ToList();

            // returning null adds nothing to the index for this item
            if (contentToAdd.Count == 0) return null;

            return string.Join(" ", contentToAdd);
        }

        /// <summary>
        /// Finds all renderings on an item's layout details with valid custom data sources set and returns the data source items.
        /// </summary>
        protected virtual IEnumerable<Item> ExtractRenderingDataSourceItems(Item baseItem)
        {
            string currentLayoutXml = LayoutField.GetFieldValue(baseItem.Fields[FieldIDs.LayoutField]);
            if (string.IsNullOrEmpty(currentLayoutXml)) yield break;

            LayoutDefinition layout = LayoutDefinition.Parse(currentLayoutXml);

            // loop over devices in the layout
            for (int deviceIndex = layout.Devices.Count - 1; deviceIndex >= 0; deviceIndex--)
            {
                var device = layout.Devices[deviceIndex] as DeviceDefinition;

                if (device == null) continue;

                // loop over renderings within the device
                for (int renderingIndex = device.Renderings.Count - 1; renderingIndex >= 0; renderingIndex--)
                {
                    var rendering = device.Renderings[renderingIndex] as RenderingDefinition;

                    if (rendering == null) continue;

                    // if the rendering has a custom data source, we resolve the data source item and place its text fields into the content to add
                    if (!string.IsNullOrWhiteSpace(rendering.Datasource))
                    {
                        // DataSourceHelper is a component of Blade
                        var dataSource = DataSourceHelper.ResolveDataSource(rendering.Datasource, baseItem);

                        // skip data sources that did not resolve to an item (e.g. deleted items),
                        // and skip the base item itself since Sitecore already indexes its content;
                        // items are compared by ID rather than by object reference
                        if (dataSource != null && dataSource.ID != baseItem.ID)
                        {
                            yield return dataSource;
                        }
                    }
                }
            }
        }

        /// <summary>
        /// Extracts textual content from an item's fields
        /// </summary>
        protected virtual IEnumerable<string> GetItemContent(Item dataSource)
        {
            // load every field defined on the template, not only the ones already read for this item
            dataSource.Fields.ReadAll();

            foreach (Field field in dataSource.Fields)
            {
                // this check is what Sitecore uses to determine if a field belongs in _content (see LuceneDocumentBuilder.AddField())
                if (!IndexOperationsHelper.IsTextField(new SitecoreItemDataField(field))) continue;

                // StripHtml() is an extension method that is not defined in this listing (presumably it comes from Blade.Utility)
                string fieldValue = (field.Value ?? string.Empty).StripHtml();

                if (!string.IsNullOrWhiteSpace(fieldValue)) yield return fieldValue;
            }
        }
    }
}

Registering the computed field to augment the main _content index field requires a config patch file (note that this one targets Lucene; your patch for Solr or Coveo would be different):

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultLuceneIndexConfiguration>
          <fields hint="raw:AddComputedIndexField">
            <!-- indexes subcontent contents into parent's _content field in the index (for better site search) -->
            <field fieldName="_content" type="Foo.ContentSearch.ComputedFields.SubcontentField, Foo" />
          </fields>
        </defaultLuceneIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

Another useful tweak for indexing this kind of item is Nick Wesselman’s code, which causes the main item to be updated in the index whenever any of its data source items are modified. Nice work!

Hope you find this helpful :)